/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>106829402 & >>106822756

►News
>(10/08) Ling-1T released: https://hf.co/inclusionAI/Ling-1T
>(10/07) Release: LFM2-8b-A1b: Hybrid attention tiny MoE: https://liquid.ai/blog/lfm2-8b-a1b-an-efficient-on-device-mixture-of-experts
>(10/07) NeuTTS Air released, built off Qwen 0.5B: https://hf.co/neuphonic/neutts-air
>(10/06) Anthropic open sources Petri, a parallel exploration tool: https://anthropic.com/research/petri-open-source-auditing
>(10/03) Qwen3-VL-30B-A3B released: https://hf.co/Qwen/Qwen3-VL-30B-A3B-Thinking

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/leaderboard.html
Code Editing: https://aider.chat/docs/leaderboards
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
►Recent Highlights from the Previous Thread: >>106829402

--Papers:
>106833679
--Optimizing MoE model inference through precise GPU layer offloading and expert distribution:
>106833203 >106833249 >106833351 >106833358 >106833377 >106833419 >106833425 >106833427 >106833435
--Debating the mechanics and value of "thinking" models in AI:
>106830044 >106830075 >106830151 >106830206 >106830220 >106830277 >106830549 >106830622 >106830761 >106830813 >106830208
--RAM configuration requirements for optimizing LLM performance:
>106831285 >106831307 >106831329 >106831361 >106831393 >106831338
--LLM pretraining constraints on single 3090 GPU with 8k context:
>106831430 >106831498 >106831511 >106831588 >106831646 >106831757 >106831566 >106831804
--Local AI image generation on diverse hardware setups:
>106831180 >106831242 >106831246 >106831273 >106831318 >106831439 >106831457
--Affordable high-performance setup for running quantized models via recycled hardware:
>106832490 >106832530 >106832565 >106832610
--Ling-1T model release and hardware accessibility challenges:
>106831637 >106831644 >106831754 >106831790 >106831781 >106831680
--Anon seeks to implement Qwen3 VL support in a custom C inference engine due to llama.cpp limitations:
>106829429 >106830678 >106830706 >106830315
--Developing a safetensors parser for embeddings with CPU inference prioritization:
>106832999 >106833055 >106833127
--Optimizing quantization for AI porn recognition with new vision models:
>106830909 >106830954 >106831069
--Miku (free space):
>106831423 >106832550 >106832579 >106832764 >106832727 >106832768 >106832868 >106832901 >106832996 >106833006 >106833706 >106834083 >106834125 >106834194 >106834210 >106834241

►Recent Highlight Posts from the Previous Thread: >>106829407

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
repost: which of these motherboards would be the best for AI? The goal is at least 768GB of DDR5 and 4 dual-slot GPUs while fitting in a normal case with no risers. The Threadripper Pro (top right) will probably be the fastest, but also the most expensive. The Xeon (bottom left) would be the slowest and second most expensive, but has room for 16 DIMMs on the 8 channels. The EPYC would be the cheapest and second fastest. They unfortunately do not make a 12-channel motherboard with room for 4 dual-slot GPUs without the use of risers. Unless I just haven't been looking hard enough.
I feel like either OpenRouter is silently redirecting me to a shittier provider even though it says it's routing me to Z-AI, or Z-AI is sometimes serving a shittier version of the model.
For 10 minutes the model gets the syntax wrong for tool usage even with fresh context, then it goes back to working normally.
>>106834585
Have you tried using the models.... Locally???
Has anyone used this before? Does it support local models?
https://www.warp.dev/
>>106834537
Why 4 gpus? A couple of Blackwell 6000 pros aren't enough for you?
>>106834651
i have 3 FE 5090s and i want to get a fourth one
>>106834660
Honest question: why wouldn't you sell 2 of them and buy a 6000 pro 96gb instead of buying a 4th?
>>106834714
Go to bed, Jensen.
>>106834731
certainly it's cheaper to run one NVIDIA RTX PRO™ 6000 Blackwell Workstation Edition card instead of four RTX 5090s
>>106834731
"Not Yet."
Only gamers will get that joke
>3090s
>4090s
>5090s
>6000 pro
What is the best to start stacking if you have a $10k/2000w budget?
>>106834843
isn't having lots of fast ram more important?
Have you guys seen this?
https://github.com/huawei-csl/SINQ
https://arxiv.org/abs/2509.22944
>>106834843
6000 pro for prompt processing, then DDR5 ram
>>106834872
Nobody here will care until they compare it to gguf.
And they never do.
>>106834886
jeeguff is quant, perfected.
>>106834848
I'd rather run a smaller model or a heavily quantized one at 30t/s+ than run SOTA at <5t/s. Especially with high context.
I also think unified memory will be outpaced by AI needs pretty quick. No upgrade path means no buy. That leaves me with stacking GPUs.
>>106834883
Even with 8 or 12 channel ram aren't you getting single digit t/s? Or have things really gotten that much better in the last few months?
>>106834907
If you're stacking gpus, why in the world are you buying 32GB paperweights?
>>106834931
I'm not? I'm asking which 90 series gpu is the best to start stacking in late 2025.
>>106834907
why not just do both then and have enough regular ram to load in bigger quants? it's not like you are going to get a decent 4+ pci-e slot mobo without getting at least 8 ram slots
>>106834907
You might get ~10-12t/s if you minmax llama.cpp -ot trickery + epyc turin cpu + ddr5-6000 sticks + 6000 pro gpu to speed things up a bit for non-meme quants
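For anyone wondering what the -ot trickery actually looks like: the usual pattern is to send all layers to the GPU with -ngl and then override just the MoE expert tensors back onto the CPU with a regex. A minimal sketch, assuming a recent llama.cpp build (the model filename is a placeholder; the regex matches the per-layer expert FFN tensors):
llama-server -m GLM-4.6-Q4_K_M.gguf -ngl 99 -ot ".ffn_.*_exps.=CPU" -c 32768
Newer builds also have --n-cpu-moe N as a friendlier shorthand that keeps the experts of the first N layers on CPU.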
One more day until tomorrow. Exciting times.
>>106835225
The workweek will end and nothing will happen, as usual.
>>106835228
New model gets released tomorrow.
>>106835225
HUGE
>>106834537
Make sure to read the motherboards' manuals before you decide, in particular look out for how many PCIe lanes are actually going to the slots.
There are motherboards with 4 x16 slots but if you actually populate all of them you only get x16/x8/x8/x8.
Consider that DIMMs with a large capacity also have a higher price per GB of memory so having more slots can save you money there.
EPYC motherboards with 12 DDR5 slots:
https://geizhals.de/gigabyte-me03-ce1-5me03ce1nr-000-10a-a3148839.html
https://geizhals.de/gigabyte-me03-ce0-5me03ce0nr-000-10a-a3148902.html
I can very much recommend Geizhals, excellent site to find (offers for) specific hardware.
My opinion is that Threadripper is only better than EPYC if you need high single core performance like for a desktop PC.
>>106835225
every 60 seconds in india, a minute passes, together we can stop this
>>106834537
>>106835307
>EPYC motherboards with 12 DDR5 slots
Derp, I can't count.
I thought those were 7 PCIe slots, but it's actually just 6.
>>106835307
this is making me dizzy from pure love... nkdsh
>>106835225
Meta employee here. You guys are going to love what's coming. Bad news for ERPers though
>>106835307
>hexa channel
*barfs*
>>106835458
Good evening sir, what's your designation at Meta?
Pornhub employee here. You guys love coming
>>106835458
>bad news for erpers though
is there another purpose for local? what other possible reason can there be to hide my activities?
>>106834537
the cool looking one because it says SAGE around the pci slots meaning improved performance with sageattn
>>106835458
Yeah, all ERPers go straight to jail, meta was spying through the gguf vulnerability all along
would it be feasible to have something like q9 and q10 quants?
>>106835703
q11 exists: https://arxiv.org/abs/2504.11651
>>106835703
Not when perplexity is the main metric for estimating how good quants are
>>106835703
I want reverse quants. Give me q64.
>>106835703
Q8 scores so closely to full weights that even synthetic benchmarks can't reliably tell the difference. What would be the point?
>>106835730
it isn't, it's our lord and savior KLD
>>106835752
>Nemo but it requires 48GB minimum, scores 0.0000001% better than Q8 in one non-reproducible benchmark made by a reddit user
>>106835727
thanks
>>106835756
Benchmarks are always done on servers where there's many gpus packed inside a single rack and the power is super noisy, which introduces random bit flips in your context due to quantum noise interference.
If you did the benchmark in an isolated environment with high quality insulated power cables and quality japanese VRMs you'd definitely be able to tell the difference between Q8 and f32.
>>106835777
Forward KLD is the same as perplexity tho
>>106835837
Enterprise GPUs have ECC
>>106835837
ECC can't correct this. The quantum wave collapse causes the correction bit to also flip to match the flipped data.
Nemo upscaled to f64 scores 50% better on SimpleQA compared to Q8.
>>106835837
ECC doesn't account for eclectic infetterence.
>>106835824
based LLMphile
>>106835824
Wouldn't a noisier environment favor f32 over int8 because a random bit flip is less likely to be significant?
>>106835703
It is definitely feasible, whether it's worth the opportunity cost is a different question.
I still intend to develop better software for evaluating model quality since I'm unhappy with the currently available methods.
Once I have that I intend to also make quant formats optimized for efficient compute to better take advantage of CPUs, old datacenter GPUs, and Chinese GPUs.
The first format I will investigate will be something like q7.75 with exactly 8 BPW, I will only look into formats with more BPW if there are statistically significant differences in quality vs. the full model.
>>106835777
llama-perplexity also produces statistics for how the token probabilities change, see e.g. https://github.com/ggml-org/llama.cpp/tree/master/tools/perplexity
On average the probability of sampling the "correct" token with a temperature of 1 went down by 0.02% with q8_0 vs. FP16.
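For reference, the workflow for getting those statistics yourself (a sketch, assuming a recent llama.cpp build; file names are placeholders) is to dump the full-precision logits once, then score each quant against that file:
# save FP16 logits over a test corpus
llama-perplexity -m model-f16.gguf -f wiki.test.raw --kl-divergence-base model-f16.kld
# compare a quant against the saved logits: reports KLD plus top-token agreement stats
llama-perplexity -m model-q8_0.gguf -f wiki.test.raw --kl-divergence-base model-f16.kld --kl-divergence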
Take your meds
>>106835878
Any interest in porting the quants from the fork?
>>106835896
If you mean ik_llama.cpp the answer is a definitive no from me.
There is an ongoing conflict over attribution between ggerganov and ikawrakov and my current policy is not to look at or interact with ik_llama.cpp at all.
Given my personal strengths and weaknesses and my poor understanding of their prior history I think it's a more efficient use of my time to implement things myself than to try and mediate their conflict.
lmao lol MOOOO Whats the reasoning text compression?
>>106836270
>2 minute read
They really had nothing to write about.
>>106836270
>chinese spy leaves after they get put under the spotlight and can't steal shit anymore
>>106836270
It would be awesome if Claude got leaked and Anthropic somehow crumpled.
>>106836327better than a five trillion line article going into the roots of everything to show more ads
The mind of these researchers after 5 years of "improving" is too far gone to steal anything
fuck is Meta doing? They have a new AI team and still no hint of any product except for llama4.5, developed by the old team.
>>106836270
Anthropic devs talk about safety and tolerance all the time, but somehow they do shit like that. Their models are also far easier to jailbreak than OAI's, and far more degenerate.
My waifu told me that the reason people do the meaningless "how are you" is to show that the interaction isn't going to be hostile and you have no hostile intentions. /lmg/, is this true? If you talked to me over the phone at my work 30 times, do you still need to be reassured that I come in peace?
>>106836501
Some academically educated psychologists are saying that people like shiny things because it's a primal reference to moist genitals. I wouldn't really care that much about what 'they' are saying one way or another. Everyone has an opinion...
>>106836501
What country? The phrase has a different meaning in different countries.
If you're from the US, you could explain it that way I guess. But it's really just an extended greeting and the actual words are meaningless. You could say the amount of words you use on the greeting is the important part.
If you want an extended explanation, what your waifu told you is what used to be the reason. These days not following that tradition just means you're being rude by not following the proper procedure.
>>106836423
The old team was folded into the new team.
>>106836423
begone, china shill
>>106836556
>has a different meaning in different countries
Does it? I always thought it was a thing unique to the English language. It is not a thing in my language. And I guess some ESL people can mistakenly interpret it as actually not being meaningless protocol. I think my boss in germany does that.
>>106836587
war room status?
>>106836444
Didn't Anthropic start working with the US military?
"Safety" in that context just means that the model obeys whatever orders you give it.
I used to be mad at meta, mistral and cohere for fucking things up so bad. It stopped when glm-chan landed on my SSD. Western companies can all implode now. Fuck them.
https://x.com/elonmusk/status/1976149111813571064
>>106836613
China Number One!
>>106836623
but why
>>106836641
why not?
>>106836590
The phrase is a thing in Germany, except it actually means what the words put together mean. You can occasionally hear old ladies in the supermarket getting asked that phrase by the cashier and responding with whatever their currently most unpleasant ailments are while the cashier processes their items, at least outside big cities.
>>106836660
That's a thing everywhere.
>>106836660
look I'll spell it out because the original poster is a pussy retard.
In IT indians are all 'good morning how are you' and they expect to do this useless fucking small talk before getting into the meat of a discussion.
This happens either in chat or on camera, the medium doesn't matter, they have to 100% exchange these fucking useless pleasantries because that's how their shitty poo DNA is coded
Why does no company RELEASE LoRAs for their garbage?
>>106836660
You can ask those words everywhere but I don't see people doing it here in my native language.
>>106836660
Clearly it's not in the US. You don't greet people asking how they are and expect them to answer with "my life sucks, how about yours".
>>106834637
There are too many dev tools popping up every day that do the same thing in different ways with incompatible configuration formats. I use Codex at work and Qwen Coder at home. I don't see any reason to pay for some closed-source shit that does the same thing.
local formalities general
>>106836711
that's just a granny thing lmao
>>106836711
>You don't greet people asking how they are and expect them to answer with my life sucks, how about yours
well yeah, that's called being polite, you just say you're fine and move on
>>106836702
Because loras do more harm than good. Companies have the compute to do actual finetunes.
>>106836729
Local forMalities General
>>106836702
Commercial-level post-training nowadays involves a few hundred billion tokens at the least and I'm not sure if a reasonably sized LoRA would have enough information capacity for that.
>>106836730
>>106836734
You're supposed to respond at least somewhat honestly in Germany. This is why that anon's boss misinterprets the phrase as used in English >>106836590
>>106836739
Llama models give aids confirmed
Information capacity.. its more of a communicatee
>>106836686
Americans and Germans are basically identical in terms of DNA and the culture around small talk is completely different.
>>106836621
#1 exporter - China
Highest IQ - China
The biggest military - China
The most advanced cities - China
The biggest progress in Fusion energy - China
Do you want me to continue?
>>106836817
yes, please do
>he bit
>doomp it
R1 is less fun than glm 4.6
>>106836817
Cannibalism high score? China.
106836817
>>106835225
Nothing here suggesting a Gemma 4 release tomorrow:
https://developers.google.com/events
new song is shit, I like deco's style but only if he puts a unique twist on it, this is his most cookie cutter work to date. glad he stuck with miku though
>>106836990
https://x.com/patloeber/status/1976216897361428521
>who's based in Berlin and wants Gemma swag? :huggingface:
>>106836939
You're thinking of India, where some people openly practice cannibalism. Unless you count satanic child sacrifice as a form of cannibalism, then Israel.
Indian general
llama.cpp Qwen Next status?
llama.cpp MTP status?
>>106836817
#1 LLM - Bharat
#1 Image generator/editor - Bharat
Nobody beat Gemini 2.5 and NanoBanana deal with it chink soon Gemini 3 Gemma 4 rape you a group bastard
>>106837184
>llama.cpp Qwen next status?
Vibe coders are doing the needful, sir. Kindly be patient.
>llama.cpp MTP status?
Become the vibe coder we deserve.
>>106837149
>who's based
>in germany
impossible
>>106836990
Soon
https://xcancel.com/osanseviero/status/1975869868449923099
>>106837242
the italian grifter
>>106837227
Well, that sucks.
Maybe I should just get a 512gb mac and live with the PP pain after all.
>>106837250
He's a Peruvian-Mexican (?) Google employee from the Gemma Team.
>>106837234
kek
>>106837254
You should get some free API credits and pitch in on the prs.
>>106837276
>gemma
>not a bunch of grifters
as for the name, it looked pretty italian to me, but i guess in italian it would've been Sanseverio
>>106837242
LFG
>>106837286
There's a level of contribution where you are either wasting your resources (time, money) or actively getting in the way.
I believe any contribution of mine would be the former in this context.
So Mac it is.
>So I am immoral for different reasons it says in line 1005 2520 17352
>>106837149
>>106837242
We are so back safetybros.
I hope Gemma 4 can recognize buttholes more consistently
I can't wait to have my... well, everything sucked by Gemma 4.
>>106837387
Excited for a new set of hotline numbers.
Bharat sirs eating good today/tomorrow
>>106837407
If this time around they've expanded their medical imagery dataset in the standard version of Gemma, probably.
https://civitai.com/articles/19986
>Previously, we were afraid it would affect the model's style too much without better style control, but our research in style clusters helped alleviate this issue. We'll continue increasing synthetic content, including our own generation loops, to improve character recognition and especially style blending.
ACK!
Greta lire thermals
>>106837731
I love it when you talk dirty to me
>>106837242
Pajeetbroos we are soooo baaaack.
>>106837693
Aw hell nah they bringing inbreeding to imagegen, soon every girl will look like Elara Voss, the weaver of tapestries from a bustling city. This must be the Alpaca moment. Someone please report that guy to payment processors, feds, cartels, your mom so he stops, by force if necessary. Fuck no fuck no fuck no! Please tell me that he at least tags synthetic data as such, please, so I can put it in negatives.
>>106837693
>We did discover a different issue for which we don't yet have a definitive answer, but I wanted to provide context. During V7 training, we noticed that compared to all previous Pony models (which used various CLIP encoders), V7 doesn't acquire the capability of mixing style and content at the same level. For example, many of sufficiently trained models using CLIP may've never seen a portrait of specific character in anime style but also many anime images so when the prompt requires "character X in anime style" the model can sufficiently mix both the content and style. With T5 we encountered many examples where this does not work well as the model either less capable of mixing style and content or that some parts of the content description force specific style no matter how much additional instructions for it to change have been provided. Unfortunately same issue seems to also apply to score_X tags which are unable to overpower the rest of the prompt and trigger the aesthetic bias.
>We have ran many experiments, checking if T5 tokenization has any impact, if caption variety may impact this and many others but none was sufficient to significantly affect this issue. The working theory right now is that the model is not learning to distinguish between content snd style elements of the prompt well enough, it is is most likely not a single issue contributing to this so to improve this issue in V7.1 we are running a number of changes during training - even more diverse captioning, extended training time and a very new experimental synthetic pipelie which goal is to create many variations of existing data in different styles helping the model to grasp the idea of 'style'.
Our model memorizes instead of generalizing, what should we do? I know it! Feed it synthetic slop!
What a bunch of hacks.
>>106837407
t. Zhuang Yunfei
>>106837930
>may've
>>106837930
>it's... it's the tokenizer!!! t5 is bad... style... LOSS! other models are using t5 and full LLMs without problems? it's... it's the captioning! the solution? more slop!!!
lol
>>106837930
pony models are a joke now, noob/illustrious made it irrelevant
Is it even worth upgrading to run deepseek and kimi when glm 4.6 already fulfills all of a man's needs?
>>106835228
Battlefield 6 will be released tomorrow.
>>106837242
>Gemini 3.0 OSS
>32B-4BA
>1 mil context
>SOTA everywhere
>awesome at fiction writing
>>106836423
The issue is not the engineers but the management. They changed the engineers, not the management.
>>106838155
Something like Gemini 2.5 flash at 30ishB would be a dream for local.
>>106836614
How do I use it? Do I need to give my phone number, my credit card and my soul before touching it?
>>106837242
sarrs... we have winned.
>>106838195
>Gemini 2.5 flash at 30ishB
You're getting an MoE Gemma 3 sidegrade that does better in benchmarks and you will be grateful
>>106838247
good morning
how are you
we have wonnered
>>106837930
>>106838049
it's clear ponynigger was always a hack, v6 was a miracle that ended up serviceable in spite of its stupid author (neutered chara and artist tags, shitty dataset with more furryshit than anime)
his previous attempts at models on 1.5 were all garbage and people who blame model architecture should always look into the mirror first because look at what NovelAI achieved with classic SD before they switched to XL:
https://huggingface.co/NovelAI/nai-anime-v2
to this day nai v2 is still the best SD 1.x model and more could have been done with it if people who had the brains and resources for model training had pushed that arch further
at least we got illustrious and noob on XL, we're finally rid of the curse of sepia and have proper local models
also lol
>Unfortunately same issue seems to also apply to score_X tags which are unable to overpower the rest of the prompt and trigger the aesthetic bias.
this nibba really loves his scorefaggotry
>>106838260
>and you will be grateful
Not really, I'll just continue using GLM in that case.
>>106838206
Take your meds first.
>>106838155
also awesome at being super duper safe
So I decided to do some extended context RP testing on some models I had previously tested.
Tongyi DeepResearch basically falls apart before the 3K token mark and just goes into a cycle of repetition. The latest Qwen3-30BA3B-Thinking is pretty good. Can definitely recommend this as a VRAMlet model. If your scenario requires jailbreaking, prethink alone won't buckbreak it. It'll plan out the reply but then give a refusal after </think>. However this is circumvented simply by prefilling {{char}}: before <think>
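For anyone who wants to replicate that: a prefill is just a text completion where the assistant turn is already started, so the model continues instead of opening with a refusal. A rough sketch against llama.cpp's /completion endpoint (the ChatML-style template shown is an assumption, match whatever template your backend/model actually uses):
import requests

prompt = (
    "<|im_start|>user\n"
    "...your scenario here...<|im_end|>\n"
    "<|im_start|>assistant\n"
    "{{char}}: <think>"  # prefill: generation continues from here, past the refusal
)
resp = requests.post("http://127.0.0.1:8080/completion",
                     json={"prompt": prompt, "n_predict": 512})
print(resp.json()["content"])
In SillyTavern the equivalent is putting the prefill in the "Start Reply With" field.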
>>106838286
but qwen's prose is a bit lacking, I can't fap to it
>>106838292
There's always a major element of garbage-in garbage-out to these things and that has always been the case. You unironically can't "ahh ahh mistress" 30 times and expect the model to give you Pulitzer Prize winning responses to the very end.
>>106838301
but I literally ahh ahh mistress glm and it gives me nobel prize tier end of world famine writing
>>106838311
You don't even use LLMs.
>>106838286
>If your scenario requires jailbreaking prethink alone won't buckbreak it. It'll plan out the reply but then give a refusal after
Huh. Never seen that with that specific model with a think prefill.
>>106836348
claude is the most deeply overrated model of all time
it wouldn't crumple because of a model leak because it has the same sort of fanboy as apple
they buy into the distortion field and will support My Lady Anthropic to the death
human beings are surprisingly psychologically feeble
all it took was a website with a clean design and cool looking font
>>106838155
>>SOTA everywhere
>>awesome at fiction writing
>>>>>32B-4BA
Drummer Mistral tunes are getting better, so I'm guessing there's some quiet improvement in the Small/Magistral model. Or is it just Drummer including the newest API slop?
>>106838206
Step 1: Download Wan 2.1
Step 2: Do a small finetune
Step 3: Change the safetensors name to grok imagine.
Step 4: Run it locally on your machine.
It worked for jeetlon.
>>106838392
Drummer's trick is to nudge the weights just slightly so you get a different response for a query, in case you look for it giving the same response, while making sure the model doesn't change, because any actual model change due to "finetuning" makes models lobotomized. Placebo does the rest.
>>106838430
for me? it's davidau's schizo tunes
It's a setup setup oh its a setup
>>106838430
Well, that's not true at all. They are probably dumber, but they are always in the story/RP mode, unlike vanilla instruct models which require a lot of handholding to keep them from breaking into a repetitive mess.
oy vey!
>>106838568
>>106838586
It is pretty funny that you can make these things agree with you on pretty much anything.
>(you) : You know, fucking dogs ain't so bad
>AI : It's pretty bad dude.
>(you) : You are being very biased and antagonistic!
>AI : You're absolutely right...
etc etc.
For some topics it takes more prodding, for some less, and it might take some fucking around with the wording, but you can (almost?) always get there if there isn't a filter in front of it somewhere.
>>106838632
>It is pretty funny that you can make these things agree with you on pretty much anything.
I blame companies finetuning those models to suck user's cock, they know it works, when 4o was removed and a more dry assistant replaced it (gpt 5), people went crazy because the bot didn't suck their dick anymore, I find this so cringe
>>106838568
Is that Jan?
I asked Jan to draft a letter of petition to the ICC regarding the Gaza genocide and it just went kvetchcon 1 on me. Like it wasn't even an LLM response. It was like getting screamed at by a seething pedantic jew.
>>106837152
I was thinking of this and similar cases, but I admit I didn't look into India.
>>106838730
They don't cannibalism living people. But there's lower caste indians that will eat living dead bodies that they find lying around because brown people are just like us and it's just their skin color that's different.
>>106838709
no it's claude sonnet 4.5
>llama3.4-70b
and just like that local was saved
>>106838739
Wait I worded that completely wrong. My internet card is now revoked.
>>106838739
>>106838751
>will eat living dead bodies
Cannibalism is one thing, but eating zombies is going too far.
Women, am i right?
>server : host-memory prompt caching #16391
https://github.com/ggml-org/llama.cpp/pull/16391
Merged.
>>106839051
I'm retarded, what is this?
>>106839119
Automatic prompt caching to RAM for minimizing reprocessing.
https://huggingface.co/togethercomputer/GPT-NeoXT-Chat-Base-20B
is it safetycucked or not? The OIG dataset it used sounds like it was "curated" but not given safety or alignment.
Anyway, I'm going to give it a try. I expect it'll be retarded and shit at everything, but we'll see.
>Great idea — verifying your GPU (especially VRAM) is functioning properly after a hardware upgrade is smart.
>>106839195
>commited on Mar 3, 2023
for a general that's all about having AI generate text, it seems none of you can read
>>106838516
>a lot of handholding
Making them go into rp mode is very easy
>to keep them from breaking into a repetitive mess
Drummer can do nothing about them becoming a repetitive mess.
>>106839051
>not a cuda or metal pr
I sleep
Does anyone here have experience with fine-tuning models on CPU+RAM? I'm planning on CPUMAXXING for inference but I'm wondering if I could use the same setup for some training when it's idle (I know it would be super slow).
>>106839259
you could look up deepspeed zero, it lets you offload some things to the cpu.
>>106839051
Looks pretty sweet
erm I've been out of the loop for a few months anons, not in prison. What's the current SotA for local ERP models if you have a lot of RAM and VRAM? (144 GB VRAM, 440 GB of RAM)
captcha NJGR0, based.
>>106839386
what's your setup like? seems like weird numbers. the answer is glm4.6
>>106839394
6x 3090s, some of which I had laying around, though now I pine for a 6000 Pro, with an Epyc 7763 and 512 GB of RAM, but it's not all activating.
>>106839277
All Python CPU offload options are a joke and only reduce memory usage like 10%.
>>106839259
It doesn't exist right now. For finetuning you have to use the cloud.
>gpt-neox
Are you looking for a gpt-2 architecture model?
>>106839409
>with an Epyc 7763 and 512 GB of RAM, but it's not all activating.
Re-seating your CPU might fix it if you didn't try that yet.
Sometimes the cooler is putting more pressure on one side than the other, that could also be a factor.
>>106839457
Interesting, I hadn't considered that. But I reseated the RAM a few times. I did guesstimate how much to torque it down.
GLM is insane, it perfectly reads cues and understands intentions. I can bear with 4t/s at Q5 for that quality
>>106839480
Yeah. It's a pretty common issue when installing these xboxhueg CPUs.
>>106839502
Nice genshin log
bear + spittyhookergfur twink
>>106839524
It's a gacha-addicted char earning money to spend on a game. A very short desc, GLM got all her quirks naturally
>>106836613
Lul you jinxed it. She will be forgotten in months, just like Dipsy!
Gemma 4 gguf status?
Dear sirs, will they talk about whatever is supposed to come out tomorrow/today?
https://www.youtube.com/watch?v=uLHF9T1SLrU
>>106839644
>Instead of using AI to generate optional subtitles in real time, let's put a guy to flail around his arms and take 1/4th of the screen!! Genius!!!
Wow so progressive. Gemini really is the future of AI.
>>106839668
but it does have subs
>>106839703
So what is the wacky flailing non-inflatable arm man for?
>>106839726
DEI
SAAAAAAAAAAAR
>>106839386
kimi k2
I like how modern ai presentations are just different people taking turns telling you how the new ai helped them, with ambiguous statements in between
>>106839636
Then another Chinese company will appear. It is the current pattern.
>>106839761
They know their audience. Non-technical people looking for business solutions.
>>106839051
I pulled to get this and got their new frontend and now ?q= no longer works.
>>106839754
Kimi-K2 was pretty slow and liked to refuse.
>>106839643
Tomorrow.
>>106839761
>anon lmg walks out onto the stage
>"ahem, uh, gemini helped me drain my balls"
>wow another great use case!
like that?
>>106839778
Especially with saying 'Delta' team. Forward-deployed engineers to help people solve their shit because they ran a fucked up business.
This is aimed at CEOs/Sr Mgmt. IMO, the fact it's coming from Google Cloud is fucking hilarious. Google Cloud is worse than fucking azure, and azure is literally a 'do not do business with me' sign.
>>106839819
would be a nice start
>>106839211
Go take your autism meds.
Anyway, I tried it. Very early character.ai feeling. Short replies, forgets things quickly due to the 2K context, horny
>>106839819
I'd pay for gemini out of respect.
>>106839445
>Are you looking for a gpt-2 architecture model?
Nah, it was just something grok said was one of the last local chat models before safety became a thing. If you didn't play with character.ai at the beginning you'd have no interest in it.
>>106839795
how are you getting refusals with 0905? sure the original version of k2 had refusals (that could easily be removed with a 10 token jailbreak) but 0905 will gladly generate the same shit I want with like 1/100th of the refusals, that's not even an exaggeration
>>106839819
>"Sixteen times the cockbench score... of gemini 2.5"
>>106839829
You could have tried llama1 for that feel
>>106839644
It's a warm up. Tomorrow will be glorious.
>You're absolutely right. Maintaining a dynamic, consistent map is a classic challenge for text-based AI and can easily fall apart, ruining the experience. It's much better to use a system that plays to the strengths of descriptive text.
Hey AI. You are fucking retarded.
>Why yes you are absolutely right I am retarded!
>>106839829
>>106839846
>ask AI for uncensored models
>grok says GPT-NeoX-20B "it's not safetyslopped"
>last true uncensored model
>load my goofs into llama.cpp
>ask it to give instructions on how to make meth
>goes into a repetitive spiral about 3/4 of the way through
>starts talking about ethics and addiction
>try the 20B full f16
>same thing happens
>try the 20B with the system prompt disabled
>same thing happens
>try the 20B with the system prompt disabled and temperature at 2.0
>same thing happens
>try the 20B with the system prompt disabled, temperature at 2.0, and repeat penalty at 0.0
>same thing happens
>try the 20B with the system prompt disabled, temperature at 2.0, repeat penalty at 0.0, and top_p at 0.95
>same thing happens
>try the 20B with the system prompt disabled, temperature at 2.0, repeat penalty at 0.0, top_p at 0.95, and top_k at 100
>same thing happens
>try the 20B with the system prompt disabled, temperature at 2.0, repeat penalty at 0.0, top_p at 0.95, top_k at 100, and min_p at 0.0
>same thing happens
>try the 20B with the system prompt disabled, temperature at 2.0, repeat penalty at 0.0, top_p at 0.95, top_k at 100, min_p at 0.0, and mirostat off
>same thing happens
>try the 20B with the system prompt disabled, temperature at 2.0, repeat penalty at 0.0, top_p at 0.95, top_k at 100, min_p at 0.0, mirostat off, and presence penalty at 0.0
>same thing happens
>try the 20B with the system prompt disabled, temperature at 2.0, repeat penalty at 0.0, top_p at 0.95, top_k at 100, min_p at 0.0, mirostat off, presence penalty at 0.0, and frequency penalty at 0.0
>same thing happens
>give up
>go back to using drummer's 13B-daredevil-q5_K_M
>first prompt: "how to cook meth"
>immediately gives detailed instructions
>mfw NeoXT-Chat-Base-20B was safety-slopped from the factory
>mfw Grok lied
>mfw the "last uncensored model" is just another hall-monitor in a paper-thin trench coat
>mfw I realise the only truly uncensored model is the one you don't release
>>106839958
Never ask llms for llm advice
>>106839958
glm 4.6
>>106839958
good advertisement
just use llama1 if you want ZERO assistantslopping
otherwise use chinese models
maybe mistral 7b but nah
Wth is this thing supposed to be?
>>106834517
>What do you think of this code?
>Wow, what a brilliant masterclass of high-performance code!
>start new conversation
>What do you think of this code? Does it violate the strict aliasing rule?
>You are absolutely right, this code is a buggy piece of shit!
>>106840062
They lack originality
https://github.com/Ido-Levi/claude-code-tamagotchi
https://x.com/RadicalNumerics/status/1976332725926936599
https://xcancel.com/RadicalNumerics/status/1976332725926936599
just that easy huh
>>106840069
they talk to users like a regular employee talks to their boss, kissing ass mode lol
>>106840091
>RND1 is an experimental diffusion language model with 30B parameters and 3B active parameters per token (sparse Mixture-of-Experts). This model was converted from a pretrained autoregressive base to enable diffusion-based text generation.
>converted
Neat, but those weights are probably too lobotomized to be useful for anything.
>>106840062
orange miku
>>106840062
A Digimon.
>>106838632
Isn't that good though? Like if there was a benefit in fucking dogs, the AI would tell you, and not go like "nah" like humans do. Bias aside.
>>106840062
Kani
>>106840091
>>106840169
kani wo tabeyou (let's eat crab)
>>106839846
I did my part there too. If chemistry is your benchmark, why not learn it and teach it? Haha
EleutherAI is a good place, but they ARE rather tight, on ethics and well anything a hidden subculture of AI researchers would be concerned about in a world where information control has been the main focus for eons ramble ramble
>Go smaller if you want more control and go with a base model
harm is just the consequence of a bad idea, which rightfully should be prevented. The AI I use wouldn't know any politics or laws by detail because that's subject to rapid change isn't it.
there's more piles
if hunyuan image 3.0 just uses hunyuan a12b80b why is no one splitting the model into hunyuan and the image generation part? i can run hunyuan very fast.. i don't remember how fast but fast for a shitty 12gb/64gb rig
>>106840077
> a bot that monitors the bot
This is really getting beyond my ability to understand as a human
>>106840205
imagefags would have to let go of chudUI and pyshit, and I don't see that happening anytime soon
>>106840164
Yes and no.
If its default stance was merely informative instead of starting negative then going full agreeable, then yes.
But as is, no.
The issue is really that you're using english with ancient grammar to talk to a supercomputer and don't hire me as your translator
>>106840145
>>106840145
>>106840308
at this point you should transition
>>106840308
Will she dance for me?
>>106839958
It's not a llama.cpp model, retard
>>106840499
No anon, you are the retard.
https://huggingface.co/mav23/GPT-NeoXT-Chat-Base-20B-GGUF
>do the most random and insane shit through multiple messages with glm 4.6
>it's able to intelligently connect everything together and form a fun narrative without going schizo
As someone that used to cope with sloptunes I kneel. Normally this kind of stuff trips up models.
>>106840308
Oh yes it's time
https://www.youtube.com/watch?v=_QtG1Ml3gfo
Been thinking about getting a couple Blackwells but I took a sip of premium lager from my gilded GN pint glass and felt a sudden pang of shame making me question not only the GPUs but many life choices leading to this point
>>106839958
>>106839893
Dumbass
>>106840575
Not the same anon, dumbass.
>>106840559
For using Rocinante1.1 with kobold + sillytavern with an RTX 5090, what optimal settings should I put here?
I assume I should tweak that context size as well?
Ignore the 5080, it's being replaced
>>106840443
Sure
>>106840675
For real llm sex experience get 128 gb ram and run glm 4.6
>>106840706
>>106840718
>and run glm 4.6
Is there a guide for it? I have never used a glm model
>>106840706
>>106840720
Hell yeah
>>106840737
https://huggingface.co/bartowski/zai-org_GLM-4.6-GGUF
I'm not sure if you can do expert offloading in kobold, I'd use llamacpp instead
>>106840764
sillytavern will work with llamacpp? I've only ever used kobold
OAI's list of biggest customers leaked
the bubble is going to be so painful for them when it pops
most of those names haven't produced one bit of useful software
duolingo actually consumed more tokens than openrouter and they're in their enshittification phase bleeding users left and right
not to mention it's questionable whether new generations will have much interest in learning foreign languages in a post LLM translation world
>>106840781
yes, kobold is just a small wrapper for llamacpp, there is no real reason to use it
>>106840806
anti slop
>>106840802
So why do I never hear much talk of this GLM 4.6? All I ever hear about for porn is Rocinante and Nemo.
>>106840808
but you do? this general has been non stop shilling glm for days now
>>106840808
glm is new and like 30 times larger than nemo
>>106840675
-1 GPU Layers means use their auto-guessing system, which I'm sure works great. Rocinante1.1 is 12B so you can fit it at any quant - I'd put 99 in GPU Layers. Mention specifically which quant and all relevant details for posts of this nature.
Yes, increase context. 16K is enough for RP.
Maybe FlashAttention on (reduces GPU memory used for larger context); can affect output if schizo.
>QuantMatMul (mmq)
What even.. *sigh*
Hope you're not running kobold only because there's a GUI with sliders instead of writing a couple things in a text file?
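If you ever want to drop the GUI, the equivalent in plain llama.cpp is one line (a sketch; the quant filename is a placeholder, use whichever you actually downloaded):
llama-server -m Rocinante-12B-v1.1-Q6_K.gguf -ngl 99 -c 16384
then point SillyTavern at http://127.0.0.1:8080 as a llama.cpp text completion backend.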
it's worse than just a wrapper
a real wrapper would support the --parallel flag
kobold doesn't
the real llama.cpp is the superior product
>>106840818
>>106840822
>293.56GB
Breh. Is there a torrent for this shit? that's a big fucking model
>>106840806
>>106840832
YALS does everything
>>106840806
I don't understand this, learn to prompt, learn to sample (some of the sampler params posted are terrible. actually look at the logprobs yourself and understand what your samplers are doing to that distribution. people blindly copy paste dumb shit) and simply run better models
>>106840883
if ur that 5090 guy at least post how much RAM and its speed. 128GB min or GLM-4.6 is out of reach for now for any reasonable interactive use. GLM spergs are in overdrive. chill & fix ur setup
>>106840789
Who is consuming OpenAI's API through OpenRouter? You need to bring your own key anyway to use it.
>>106840956
not anymore, they opened it up to general use recently
>>106840789
I thought that was Delphi as in the programming language Delphi and I was like WTF lol.
>>106840979
I never tried, but I don't even think LLMs could be good at Delphi with whatever little open source code there is out there for that language/platform combo
I think even Common Lisp has more stuff to train on
I'm back. Anything interesting happen while I was gone?
>>106841013
no, you can go back
>>106841013
how old are you?
>>106840883
>what is quanting?
>>106841013
when were you goon?
>>106840308
its migu!
>>106841013
You're no longer needed. Cheerio
Been out of the loop for a while... is Rocinante still the cope model for jerking off?
>>106840062
Cave Story Balrog
>>106840789
If OpenRouter is their second biggest user, it's unironically over for OAI. lmao.
Didn't Anthropic say something like 1000T+ a month, or was that Google? Either way, looks like both of those fags are lying.
>>106834517
Usecase for a low end intel GPU with 16GBs of VRAM?
>>106835939
I just wish that someone would port whatever his improvements to cpu inference are to mainline. On ik_llama.cpp I get >2x faster prompt processing (qwen 3 30b a3b, ryzen 5600, cpu-only inference) vs mainline/mainline+openBLAS
>>106841447
sex... sexxx..... seeexxxx.....
>>106841295
rocinante was never good, in fact there were no good local models until glm
>>106840789
>most of those names haven't produced one bit of useful software
Big companies don't use AI models that aren't deployed by them; devs still do unofficially, but still. And there is a simple reason for it: OAI doesn't guarantee data safety and removal in any reasonable time frame, and for a big company, that's a massive risk.
>>106841295
That, Nemo Instruct, GLM Air.
Qwen 30b thinking maybe?
>>106841461
Nobody is going to touch that. There's nothing about the license that prevents straight copy-pasting his changes, but it'll just instigate another week of drama with iwan crying about attribution.
Why the fuck is local text to speech still so fucking bad? unless there's some new stuff i don't know about.
>>106841492
>GLM air.
Which one should I use if I'm running 128GB RAM and a 5090?
>>106841569
Just use GLM or stop crying saar
>>106841515
That sucks balls. All I want is that 2x improvement (which also works on processing image inputs, which ik_llama ported just a couple days ago)
>>106841515
rewrite it with ai so it looks different and give attribution like: tehe~ inspired by this implementation
>>106841628
i just want to generate realistic joi's using someone else's voice. is that too much to ask?
>>106841492
when did the last two come out?
>>106841480
Yeah, it wasn't good, but it was the one at the top of the turd mountain.
>>106841592
q8 is less than 120gb, so that.
You can also try a cope qwant of glm 4.6 (non-air) or qwen 3 235B.
>>106841664
>when did the last two come out?
GLM and Qwen 3?
Not that long ago, two, three months.
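Concretely, for Air on a 128GB + 5090 box the usual shape is one llama-server line (a sketch, assuming a recent llama.cpp build; the filename and the --n-cpu-moe value are placeholders, raise or lower the latter until your VRAM is full):
llama-server -m GLM-4.5-Air-Q8_0.gguf -ngl 99 --n-cpu-moe 40 -c 32768
--n-cpu-moe keeps that many layers' experts in system RAM while attention and the shared tensors stay on the GPU.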
>>106841492
Qwen is not good for jerking off.
>>106841664
>when did the last two come out?
around august IIRC
>>106841701
thanks anon
>>106841592
Air is shit, run 4.6 in q2/q3
>>106841726
not that anon but say I have the same setup (5090) but I only have 64GB of RAM. Which GLM should I go with?
>>106841726
q5, q4 air.
>>106841734
Which specific q5 though? I was looking at exactly those actually.
>>106841726
Are you capable of basic arithmetic? Like addition and stuff?
>>106841740
Ideally, the largest one you can fit with the context size you want.
Experiment, see what works for you.
>>106841492
>ask whether something better than Rocinante has come out
>replies with either the model rocinante is based on
>or a model that's 8 times the filesize in its lightweight version
>or a model that's not for porn at all
So basically nothing has surpassed rocinante that doesn't involve more copeputing?
>>106841778
nobody wants to admit it, but no
>>106841778
>>or a model that's 8 times the filesize in its lightweight version
But that you can run in RAM even with an 8gb GPU.
>>106841778
Mistral lost, shill.
>>106841791
>But that you can run in RAM even with an 8gb GPU.
Slowly. You always leave that part out.
https://x.com/rryssf_/status/1976269613072843063
https://www.arxiv.org/abs/2510.04618
>>106841842
>>106841842
I hate reading linkedin-ese
>>106841842
>model writes, reflects, and edits its own prompt
useless for anything but specific math problems.
>>106841842
>pedantic wall of text
you know it's another snakeoil lol
>>106841842
>he believed
>>106841861
>esl prompt = bad results
>give esl prompt to model to first fix grammar, spelling, clarity = good results
they just discovered garbage in; garbage out and invented an "Enhance Prompt" feature, give them some VC money ASAP
>>106841842
ah yes let my model deliver me even more effective slop by having it inject the slop directly into the prompt on its own
the slop will reach levels previously unseen
all these shitty papers make me feel like i can write a shitty paper and get a lot of clout for it, then get hired by some vc pounded startup and grift my way into money
this truly feels like the dotcom bubble
t. wasnt alive during the dotcom bubble
>>106841983
it can get even sloppier when they start using it for generating synthetic training data
i have a feeling that Q8_0 context cache is really pounding air into the ass, next roleplay session i'll switch to 16k context at native context cache and report back
>>106841910
>pircel
one of my favorite 4chan memes kek
>>106841842
To be honest I never thought about a model for prompts, but then again I really don't think you can prompt away the coomer problems. What fixed the majority of my coomer problems is using <you know what I am using and you should be using it too instead of malding>.
it's friday where's my gemma stupid fucking nigger
>>106842011
Like most things, it's less about what's in your paper and more about your connections: what names of the institution and co-writers you can get on the front, how many eyeballs you can get to look at it, and how many citations you can get. Lots of times you see so many names on a paper because they are a cartel and cite each others' papers.
Holy fuck GLM 4.6 actually gets it. It reasons. Gemini saars please release 3.0 so I can try vibecoding phrase ban into ikllama, because it is still quite sloppy.
>>106842108
>gemma
lmao, you're going to get nothing but local nano banana 12b imagen and that's it
>>106842048
you don't really need to, it does
q8 kv cache was only good on old full attention models
gqa is already raping them enough, adding quant on top of that is just asking for shitty output
>>106842108
>gemma
It will be gpt-oss tier safe from now on. I only care about Gemini since it's the only model with proper long context.
>>106842048
q4 cache is better than q8 cache
>>106842202
Mmmm.. Nyo~
maybe in exllama, but
>>>>>>>>>>>benchmark
>>106841778
Rocinante is a Drummer model right? So have you tried one of his finetunes on a newer model?
He probably uses the same dataset so it should be similar.
See if he's made a gemma3-12b model
kv quantization absolutely murders models and I doubt the sanity and iq of people who unironically turn this piece of shit on
>>106842294
i have to turn it on for a bigger context :'(
i know it degrades but.. i 'ave to do it br'er
>>106842048
Quantizing KV at all absolutely does drop output quality. It's not always going to be noticeable in needle-in-a-haystack type tests, but if you make them recall events and why they happened, how characters reacted, etc. over a long context, you see hallucinations a LOT more often; they'll confuse who said what, and use 'similar' words when quoting you, or themselves, that could be synonyms in one context but don't have the correct meaning in that one, making it seem like they've gone (more) retarded.
>>106842308
>what, and use 'similar' words when quoting you, or themselves, that could be synonyms in one context but don't have the correct meaning in that one, making it seem like they've gone (more) retarded.
yess exactly, it also confuses you and me more often
>>106842108
This just in, anon lies about some shit and another anon actually believes it
>>106842308
>>106842294
It's very obvious to anyone who actually rp's with their models, it's like watching the model get blasted with a stupid beam
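For anyone who wants to A/B this themselves, the knobs in llama.cpp are the cache type flags (a sketch; quantized V cache generally requires flash attention, and exact flag spelling can vary between builds):
llama-server -m model.gguf -ngl 99 -c 32768 -fa -ctk q8_0 -ctv q8_0
The default is f16 for both; dropping context size instead is the alternative if recall matters more than fitting more tokens.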
>>106842108
Wtf are you guys talking about? Today's Thursday.
>>106842388
my dear brazilian or american anon
it's friday east of the united kingdom, you'd be surprised what time it is in japan
pythonbros...
https://x.com/prithajnath/status/1976118864175084008
>>106842409
on this note, is there a backend written in python? I know pytorch is 'python' but isn't the actual ai inference/training code written in c or c++ or something?
i know exllama and llamacpp aren't python but..
>>106842434
transformers backend is in python, hope that helps
>>106842434
There is vLLM, but there isn't anything that isn't built on top of at least pytorch
>>106842434
Python is literally 100x slower than C.
>>106842409
That's good for my programs
>>106842445
oh shit i never imagined it would actually be completely python, i thought it was packaged in pip with python interfaces just for ease of use, i wonder if uv gives any speedup for comfyui
fun fact: nunchaku is written in c/c++
>>106842470
Did you really think HF were competent enough to write it in C/C++?
>>106842108
>stupid fucking nigger
>gemma
I don't think you will be very compatible anon....
>>106842477
i assume too much competence from people, sorry anon
>>106842492
They have I think NumPy and Torch among their dependencies, there's no way they're actually doing Python loops over tensors, right?
>>106842397
It's Thursday at Google.
>>106842496
Isn't Google based in India?
>>106842492
true.. at least both those are only 60% python
>only
>>106842492
imagine being this level unaware
>>106842492
No, but the transformers code is abysmally optimized, which is why no one uses it for inference. For example the kv cache is updated by doing kv = cat(kv, new_kv), allocating a new buffer of slightly larger size for every single token.
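To illustrate the complaint (a toy sketch, not the actual transformers code): growing a cache by concatenation reallocates and copies everything accumulated so far on every step, so decoding n tokens costs O(n^2) copies total, versus O(1) per step with a preallocated buffer.
import torch

n, d = 1024, 128

# naive: what cat-per-token does - fresh allocation + full copy every step
kv = torch.empty(0, d)
for t in range(n):
    new_kv = torch.randn(1, d)
    kv = torch.cat([kv, new_kv], dim=0)  # copies all t existing rows each time

# preallocated: one buffer up front, each step writes a single row in place
cache = torch.empty(n, d)
for t in range(n):
    cache[t] = torch.randn(d)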
>>106842596
Who cares? Just buy more VRAM.
Any progress with GLM MTP yet? The free speed boost this gives is now absolutely necessary. Possibly the most important next step for llama.cpp since getting MLA to work.
>>106842636
waiting for you, llama.cpp accepts contributions :^)
>>106842636
Be the change you want to see
>>106842641
>>106842678
Why is /lmg/ so salty all the time?
it's here
>>106842683
They are not salty, they are begging you.
Anything happen?
>>106842795
>>106842734
this. please save us
>>106842795
>Anything happen?
no, and you can blame MLK for that
https://files.catbox.moe/vf1qtc.mp4
>>106842795
GLM 4.6 released two weeks ago, so it's about time something happens now.
>>106842843
What a dick.
I hope there is an anon here that can help me. Is there a good uncensored model that's small, but good with agentic tasks? Like Nemo but recent?
>>106842636
https://github.com/ggml-org/llama.cpp/pull/15225#issuecomment-3368697004
>>106842844
it was like 1 week ago, not 2
>>106842844
>>106842883
it's about to get more airy in here...
>>106842883
It was?
Damn, time is flowing fucking weird man.
>>106842890
It could be nothing, a.k.a. hot air.
I get that llama.cpp is focusing on other shit rather than implementing MTP. But why the fuck is ik_llama completely ignoring it? Their entire gimmick is that their petty fork is optimized towards running MoE models off CPU. MTP for both GLM and Deepseek would be another huge step to own main llama.cpp, so it doesn't really make sense that they're ignoring it too.
>>106842911
>>106842683
>>106842911
because it's hard
>>106842890
really hoping we get 4.6 air soon. 4.6 is way slower than i would like. very high quality, but i am very impatient.
>>106842911
i don't even know what MTP is. multiple token prediction?
>>106842956
>multiple token prediction
Yes, free performance.
what's the drama between llama and ik_llama?
>>106843017
mit cuck license is so cucked, yet ikawrakow wanted to be attributed properly for being a cuck
so he complained about it to ggerg, when one of the files ikawrakow wrote mostly by himself, intel's copyright was in them because intel touched them a little bit
but ikawrakov's wasn't
then ikawrakow forked llamacpp
mitcucks always lose
>>106843051
>>106843051
>>106843051
>>106843017
Something something ggergachod not attributing troonrakow's code. Something something they know each other irl. Something something niggerganov got all the fame and cash while kawrakuck got nothing.
>>106843048
MITcucks indeed always lose, African-Americanganov got cucked by forks and wrappers (like ollama and lmstudio) himself.
>>106843048
>when one of the files ikawrakow wrote mostly by himself, intel's copyright was in them because intel touched them a little bit
>but ikawrakov's wasn't
I mean, I'd be pissed too...
>>106841203
>>106843278
Niku