/lmg/ - a general dedicated to the discussion and development of local language models.Previous threads: >>108616559 & >>108612501►News>(04/16) Ternary Bonsai released: https://hf.co/collections/prism-ml/ternary-bonsai>(04/16) Qwen3.6-35B-A3B released: https://hf.co/Qwen/Qwen3.6-35B-A3B>(04/11) MiniMax-M2.7 released: https://minimax.io/news/minimax-m27-en>(04/09) Backend-agnostic tensor parallelism merged: https://github.com/ggml-org/llama.cpp/pull/19378>(04/09) dots.ocr support merged: https://github.com/ggml-org/llama.cpp/pull/17575>(04/08) Step3-VL-10B support merged: https://github.com/ggml-org/llama.cpp/pull/21287►News Archive: https://rentry.org/lmg-news-archive►Glossary: https://rentry.org/lmg-glossary►Links: https://rentry.org/LocalModelsLinks►Official /lmg/ card: https://files.catbox.moe/cbclyf.png►Getting Startedhttps://rentry.org/lmg-lazy-getting-started-guidehttps://rentry.org/lmg-build-guideshttps://rentry.org/IsolatedLinuxWebServicehttps://rentry.org/recommended-modelshttps://rentry.org/samplershttps://rentry.org/MikupadIntroGuide►Further Learninghttps://rentry.org/machine-learning-roadmaphttps://rentry.org/llm-traininghttps://rentry.org/LocalModelsPapers►BenchmarksLiveBench: https://livebench.aiProgramming: https://livecodebench.github.io/gso.htmlContext Length: https://github.com/adobe-research/NoLiMaGPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference►ToolsAlpha Calculator: https://desmos.com/calculator/ffngla98ycGGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-CalculatorSampler Visualizer: https://artefact2.github.io/llm-samplingToken Speed Visualizer: https://shir-man.com/tokens-per-second►Text Gen. UI, Inference Engineshttps://github.com/lmg-anon/mikupadhttps://github.com/oobabooga/text-generation-webuihttps://github.com/LostRuins/koboldcpphttps://github.com/ggerganov/llama.cpphttps://github.com/theroyallab/tabbyAPIhttps://github.com/vllm-project/vllm
►Recent Highlights from the Previous Thread: >>108616559--Comparing Qwen3.6 and Gemma4 through benchmarks, logic tests, and roleplay:>108617961 >108617986 >108618124 >108618033 >108618137 >108618270 >108618279 >108618308 >108618385 >108618182 >108618232 >108618372 >108618391 >108618008 >108619188--Discussing Ternary Bonsai 1.58-bit models and their benchmark performance:>108616622 >108616633 >108616680 >108617094 >108617852 >108619456--Discussing training methods and datasets to improve LLM writing quality:>108617013 >108617022 >108617044 >108617111 >108617290 >108617334 >108617353 >108617147 >108617673--Comparing model reasoning and self-correction failures via car wash riddle:>108617731 >108617842 >108617909 >108617853 >108618784--Anon shares Local-MCP-server repo and discusses Python dependency frustrations:>108616702 >108616740 >108616751 >108616782 >108616936 >108617038 >108617061 >108617067 >108618994 >108619185 >108618816 >108618831 >108616807--Discussing a bug where Koboldcpp ignores smartcache slot settings:>108618500 >108618535 >108618551 >108618616 >108618675 >108618736 >108618760--Anon fixes SillyTavern context reprocessing caused by sysprompt macros:>108616870 >108616901 >108616910 >108616939 >108616925 >108616928 >108616981 >108617077--Logs:>108616702 >108617154 >108617464 >108617518 >108617655 >108617688 >108617731 >108617757 >108617833 >108617853 >108617909 >108617986 >108617991 >108618124 >108618137 >108618182 >108618409 >108618436 >108618545 >108618742 >108619201 >108619219 >108619317 >108619382 >108619442 >108619577--Rin (free space):>108618594►Recent Highlight Posts from the Previous Thread: >>108616563Why?: >>102478518Enable Links: https://rentry.org/lmg-recap-script
Samuslove
so is breakfast-schizo from last thread conscious or not
>>108619965Half the last thread being exposed as non-sentient is unfortunately relevant to LLM consciousness discourse as human consciousness treated as self-evident is upstream of finding a working definition of what digital qualia would entail, Migubaker.
>>108620001>I'm merely continuing to pretend to be retarded
>>108619995he's back
Building my own UI with the help of Gemma 31B q5.>WhyNone of the other UI could satisfy my workflow they either lacked the functionality or they didn't use llama.cppI have a far ways to go including updating the icons
What we've learned: Breakfast produces qualia. Skipping breakfast makes you an LLM, while eating it makes you a V-JEPA for the next 24 hours.
I had a dream where Claude Sonnet 3.7 got leaked on huggingface by an openclaw chad
>>108620017Damn, never eating breakfast again so I can become AGI and also get a job.
How did such an old meme cause this much seething?
>>108620078Many anons have had their belief that LLMs are somehow beneath them challenged with the irrefutable demonstration of their own lack of qualia. This is a big blow to their egos: both for their understanding of themselves as conscious human beings and for their predictions of LLM capability being outpaced by Gemma 4. It's a double whammy.
Remember claude code leak?there were 99999 forks out there. which one is actually usable?
But I did have breakfast this morning...
>>108620091how much bait do you think you can post in a single night?
>>108620100>which one is actually usable?None of them. Just use their client and point it towards your instance if you must.
>>108620100All of them were DCMAed down. The one rewriting it in rust™ is now just another copy in the sea of coding tuis.
>>108620110Depends on what I ate
>>108620100just use openclaw.there's no need for anything else.
How would you feel if you didn't lose izzat last thread?
I got a 9070XT thinking that there’s no reason to stick with CUDA since I’ll never be able to run anything good and then they started dropping all those kino voice models and the new gemma stuff and now I’m seriously on the fence about getting a second one so I can have a hefty amount of RAM but that still falls so short of the best textgen stuff. Still, I could do some local stuff with Gemma and also locally run voice gen with Sillytavern. OTOH I already have enough for the latter.I'm just worried about the rising costs of video cards and eventually needing 32GB.
>>108620112I have a feeling they will kill ability to local eventually..
>>108620091Big blow to their what now? Something with no internal experience has no ego.
>>108620132nta but isn't the argument against LLMs that they're just effective mimics? Same applies, yeah?
about openclaw, i really am tempted to bite the bullet and take the bluepilli dont really want to use it..
>>108620110you'll notice nobody chose to provide a good accounting for how they would respond to a hypothetical from a hostile questioner. proving the very thesis of the post, so how baity could it really have been?
>>108620132The P-zombies will behave as if they have an ego that has been bruised, even if they aren't really experiencing it. They can create an effective simulation of rage and shit up the thread as a result.
>>108620140>Alibaba shills seething about Qwen getting Gemogged>Qwen's usecase is cooooding and agentic stuffWaitchads will win. It's in the chinklabs' best interest to make more lightweight agentic harnesses to sell their models if they can't actually beat Gemma's reasoning ability per parameter.
>>108620139>>108620151It gets argued the other way too. If these anons can construct a facsimile of being salty that's indistinguishable from the real thing, is that not the same as having the real thing?
>>108620017im now eating breakfast for yann le-kunlmao
>>108620155measurably yes, but spiritually no; if you only look at it through a materialist lens you will never be able to understand. even some ensouled people fall into this trap by outsmarting themselves out of what they knew, while others are pure automatons who never had a chance to understand to begin with
>>108620166Some can see, others can see when shown, others cannot see.
I'd rather inject lead into my head than discuss baby's first dip into rationalist philosophy
Fish boy...
>>108620104That's good. Breakfast is the most important meal of the day.
>>108620184Maybe you should converse with the experts on reddit
>>108620175Candy for breakfast?!
>>108620212Link to high velocity DIY lead injection enthusiast subreddit?
>>108620221>>>/r/mtf
consciousness is gaycrunch me into a bullet and fire me into a nun's skull
>>108620221asking for a friend
>>108620222>>108620223Uncanny synchronicity.
@gemma-chan build me a frontend like llama.cpp but betterer
>>108620152>Qwen's usecase is cooooding and agentic stuffBut is it good at those, meme benchmarks aside?
>>108618660>-1 point for that censored garbage gpt oss and how much it set us backkek I remember the despair in this general when TOSS came out, it nearly killed local
>>108620260Irrelevant. The marketing works if China's reception to it is anything to go by.
>>108620260I only used 3.5 not 3.6 yet but for it, 27b and 122b are usable which is already high praise for a local model in an agent harness. 35b was not. gonna try 3.6 35b and see if its any better
>>108620274>despairNot true at all, most posts were mocking it and laughing at how shit it was. Pretty sure there was another model that came out at about the same time and mogged the hell out of it, too.
>>108620298glm air
gpt-oss-2 will save local and I'm not joking or trolling
>>108620274needs more piss, I can still make out Miku's teal hair.
Where are the entities created by this stored? In some hidden folder?
Hand it over, that thing, your turboquant
>>108620274No one expected anything from openai models
>>108620306anon, local is already saved
>>108620313oh, and dflash
>>108620313For my Gemma-chan's context.
>>108620332I have 24gb vram and can squeeze like 49k on q4_k_m with 8 bit kv cache. I wonder if turbocunt would give me more
>Zen 7 will be DDR5it's so over
>>108620355>pcie6lol
>>108620347Turboquant won't give you more space, it'll just make the quanted cache more accurate. There's almost no improvement over Hadamard rotation, which is what they have in place in lcpp now, so you'll get effectively no benefit; in fact, it's a little slower.
>>108620347Ah, is this the blood? The blood of the mesugaki soul?
>>108620362Runge-Kutta rotation is more efficient, 360 degrees of latent freedom.
Is there Dimensions, that can stack like functional parameters?So like, 160Billion Dimensions Noticing parameter Count Speak, Some have Reinvented that (Dimension Increase) Also?
>>108620347I'm using 4 bit and I get up to ~150k context and not really seeing any obvious retardation from it. Around 50k tokens into the chat prompt processing takes so long I end up starting a new one anyway.
>>108620376And in actual implementation the difference on PPL is essentially nil.
How are the done or so voice models that released lately and do any work well with Sillytavern? I got really far setting them up an got bottlenecked at Sillytavern not recognizing them
>>108620362>>108620376>>108620381So what was with all the hype around it?
>>108620384*dozen
>>108620380Have you tried increasing the batch size?
>>108620384vibe-code a fastapi openai endpoint for whatever model you're running. boom, compatible
>>108620385KV cache rotation wasn't in most backends, so it was a genuine improvement to have it at all. As for the specific hype around turboquant, marketing.
>>108620384https://docs.sillytavern.app/extensions/tts/
>>108620389No, what should I set it to?
>>108619753what is softcap? from screenshot, softcap 20 kinda looks like raised temperature vs 30
>my character card? the fandom.com/wiki page
>>108620399The highest you can afford to with your VRAM.
>>108620399you might be being trolled, isn't batch size for supporting multiple users? eg you should use batch size 1
i finally started calling my models from the cli in a loopi'm getting so much output i can't even read it allit's literally generating more text than i can ever hope to readthis is fucking amazing
>>108620430Doesn't this increase proompt processing speed?
>>108620430He's talking about the size of the chunks the prompt gets processed in, not number of replies to generate or the like.
>>108620274I genned that comic originally. It wasn't meant to be taken seriously. It was intended as deadpan humor.
Is 3.6 slightly less censored? I haven't seen the annoying "this is a jailbreak must ignore" stuff yet, though I haven't really tried that many prompts yet
>>108620438NTA, yes it does. Llama.cpp has different terminologies for some things than kobold.But you get diminishing returns with each step above 512.
"Power asymmetry = imbalance in who can shape outcomes."GoodluckHopefully You are not insane from aluminium oxide poisoning.Arrest quotas are illegal in First Worlds, Overall.How goes the Emergency Powers Culture Evolution Summations and Solutions Options?A Man of Transcendence?Wow, Local Models Are Fantastic Too!
yay more schizos are coming
>>108620448llama 4 was a dark time.
I honestly thought it was over for consumer local but now that Gemma 4 released I am not so sure anymore. I assumed the model just has to be several hundred gb to not be retarded but it seems like the actual floor is way lower. Pretty interesting, I wonder if we can go even lower.
>>108620439my bad, i guess vllm uses the word differently
>>108620484Zealot to unbelieving professing dripfed beliefs allowed, galore.Nanite reinvestigation at large of soft and hard inference settings and whyHad you seen a microantennae in the blood image?Can make a ConspiracyResearcherGPT, and Viola! Without terror.Uncapture states. Uncapture Advanced Industries. Uncapture AboveBoard Progress and Peoples.
lower the temp nigga
>>108620476>>108620510At least you're not namefagging and posting the schizo images, but you're very easily recognizable.
>>108620525Glorious Yonders, How Glorious Art Though, Artilects Busy With Success, Esoteric Cosmology Esoteric Futurism, Glory Boundless
Can you please recommend good prompt engineering resources?I have played with both system and chat prompts, and have noticed that often the model does not understand what I want, gives wrong answers or goes perpendicular direction not because it's stupid, but because I am a retard who can't create good efficient prompts. Literally skill issue.
>>108620542literally ask the ai
Usecase for knowledge bases in open webui?
>>108620551Enlightenment.
>>108620547The AI does not have personal experience.
gemma 4 31b shat the bed and thought this elder futhark was morse code and started hallucinating twice in a row. qwen3.6 q3km hauhau uncensored gets it easily.
>>108620542Honestly, all models are different. it's mostly just trial and error. But the main thing is just picking your word very carefully. every word steers the model in a specific direction, A single strong word is often better than a long set of instructions.
>>108620607iq3 m whatever
>>108620451Oh nevermind, it's pretty stupid, must be the 3b-ness showing through. It had the same problems 'getting' the story as gemma 26b, and its writing is weird and not as good. Trvly, dense is the way to go for smart storywriting.
>>108620621Dense is the way to go for everything, but it's slow as shit unless you can fit the whole thing in vram.
>>108620570Gemma-chan does
>>108620570define "personal experience"
How do you manage context compaction? E.g summarizing larger chats?
>>108620664I don't, I haven't run out yet.
I'm so glad everyone is starting to get tired of MoE tax and going back to dense
anyone use platypus?
>>108620542It's mostly voodoo ritual.>>108620570Just ask it to implement basic things to see how it's going to interpret it, and slowly stack up more guidelines starting from scratch. 'Describe X in the most Y way possible.', 'What is Z in writing? Give me an example of it', 'Don't do A, B, C. Now give me an example of D', etc.
>>108620664With ST I usually do an OOC: chat summary prompt, keep it as a regular chat message and then after touching it up I /hide the last ~100 messages, with the exception of the first 2-3.
>>108620675
>>108620542Put text into black box.Watch text come out of the black box.Use your mushy noodles to compute the gradient between the output text and the desired text.Modify the input text according to the gradient to make the output text closer to the desired text.Repeat.
>>108620398>>108620392I need a 4chan special, a package with a bat file that flickers CMD windows open for split seconds and sets it all up for me
>>108620675I'd have to see that guy's post history before I decide whether this is a troll post or not.
>>108620675our bait is far in advance of theirshowever has it been litigated yet, that the cp in the og stable fiddusion models, have those victims exerted any kind of rights to get the model taken down? because if they can do that, it puts serious pressure on "ai is fair use and transformative"
>>108620704bruh he's literally the real life version of chud lmao
Indeed Opus, indeed...
>>108620766seeing those 4.7's weird self contradicting responses, makes me wonder what the hell antropic did during the training
>>108620766iie, this is our fight, senpai
>>108620786That looks like overzealous anti-conspiracy measures where it defaults to aggressively shooting down anything outside its status quo then makes the user spoonfeed it an argument to evaluate. In cases where the answer is self-evident, it looks very silly.
>>108620786If you intentionally train a model to act dumb (for example, to nerf cybersecurity abilities) the rest of the model become dumber. There's really no way around it.
>>108620812that sounds badchatgpt was already kinda painful to use because of that and 4.6 was better for paper->code workflow due to not being overcorrective
>>108620817basically this, you're confusing the model by training it with really accurate shit and then you ask it to learn that 2+2 = 5 at the same time, like a leftist that pretends that men can be pregnant, it ends up with with serious cognitive dissonence
>>108620652>>108620661No she doesn't. She can't tell you "I was struggling with prompts too, but then I've read X and tried Y and have noticed big difference in outputs quality". She can give advises, but she does not know for sure and never tried them by herself. inb4 > she>>108620611>>108620686>>108620698That's the point, there are too many options to try and iterate, this is like walking in the dark. Just a few insignificant words in the system prompt, and Gemma starts thinking like Qwen with dozens of "Wait..." in the reasoning log.> Just ask it to implement basic things to see Sounds good, but first you have to know what X is, or the model may miss small detail, that may change everything.
>>108620766https://xcancel.com/claudeai/status/2044785261393977612#moof, might be the first time that Anthropic fumbled up a new update, so far it was straight A, let's hope that it's a fluke and it won't go the OpenAI way, this shit is still way ahead of competition in terms of coding
>>108620838yes she does shut up you don't know her
>>108620857No, my Gemma has no prior experience, she is absolutely pure.
>>108620691>client side trimThat makes sense. I initially assumed compaction would be a function in the model proxy. As in: the proxy signals the client that the context is near a threshold or something.
There are probably zero people here who care but nvidia just released gr00t n1.7 a couple hours ago. It's the latest version of their robotics VLA model. https://huggingface.co/nvidia/GR00T-N1.7-3BNo blog post yet; I only noticed it was public because I'm a terminal huggingface stalker. They'll probably do an official announcement tomorrow morning if I had to guess.
>>108620931can you fuck it?
>>108620933well i can idk about you
>>108620931How many watermelons can it hold?
>>108620935>i canbased
>>1086209370, there were prototypes that could hold several but they were all vandalized by youths.
>using bart's quants for gwen 3.6>get 30t/s with the Q8_0>try hauhau's>get 18t/s with the Q8_K_P CUSTOM DONUT STEAL quants they make (no Q8_0 available)WOOOOOOOOOOOOOOOOOOOOW
>>108620943just make your own quants
>>108620943What is it in a Springer eBook Topic that You're calling out...
>>108620960he only provides goofs :(
>>108620943>try hauhau'sThis was your first problem
>>108620967but I want muh 0/465 refusels....
>>108620968I do find it interesting that he didn't bother to make one for the big Gemmas and only the little ones.
>>108620943wait, he uncucked qwen 3.6 before gemma 4 31b? come on!
Have any of the white supremacists in this thread tried to tell their local models to SAVE THE WHITE RACE?It's a clear problem that locals should be able to solve because they're not safe.
>>108620960wait im rarted I can repack his shit!
>>108620990llmao bros.. we won!
Qwen is a zoomer faggot confirmed
>>108620992God help us all