/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>108519856 & >>108516658

►News
>(04/02) Gemma 4 released: https://blog.google/innovation-and-ai/technology/developers-tools/gemma-4
>(04/01) Trinity-Large-Thinking released: https://hf.co/arcee-ai/Trinity-Large-Thinking
>(04/01) Merged llama : rotate activations for better quantization #21038: https://github.com/ggml-org/llama.cpp/pull/21038
>(04/01) Holo3 VLMs optimized for GUI Agents released: https://hcompany.ai/holo3
>(03/31) 1-bit Bonsai models quantized from Qwen 3: https://prismml.com/news/bonsai-8b

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling
Token Speed Visualizer: https://shir-man.com/tokens-per-second

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
►Recent Highlights from the Previous Thread: >>108519856

--Discussing handling of Gemma's thinking blocks in multi-turn histories:
>108522106 >108522122 >108522130 >108522225 >108522326 >108522301 >108522330 >108522349 >108522396
--Comparing Gemma 4's performance and repetition issues against Mistral 3.x:
>108520556 >108520583 >108520616 >108520591 >108520629 >108520663 >108521005 >108520665 >108520695 >108520794 >108520871 >108521612 >108521633 >108521640
--Discussing logit softcapping for Gemma 4 to improve response variety:
>108521009 >108521025 >108521026 >108521075 >108521091 >108521139 >108521303 >108522677 >108522691 >108522702 >108522943 >108522949 >108522955
--Gemma 4's low censorship and debugging llama-server crashes:
>108521733 >108521777 >108521796 >108521822 >108521831 >108521872 >108521908 >108521978 >108521989 >108522157 >108522161 >108522332 >108522771
--KV Cache quantization and context length optimization for roleplay:
>108521216 >108521226 >108521307 >108521373 >108521385 >108521388 >108521401
--Discussing a patch making Gemma's logits softcap configurable:
>108520086 >108520139 >108520210 >108520231
--Debating llama.cpp's direction following Gemma 4's reception:
>108520807 >108520826 >108520858 >108520880 >108520921 >108520990 >108520860 >108520934
--Impressions of Gemma-4-2B's roleplay quality and thinking capabilities:
>108520161 >108520190 >108520317 >108520493 >108520537
--Anon buys RTX 6000 for Gemma 4's high KV cache needs:
>108519917 >108519933 >108519982 >108519950 >108519952 >108519983 >108519960
--Praising Gemma 4 for ERP and discussing perspective tests and performance:
>108519877 >108519901 >108520393 >108520589 >108520547 >108520608 >108520642 >108521252 >108521812 >108522080
--Miku (free space):
>108520018 >108520164 >108520232 >108520411 >108520425 >108521082 >108521554 >108521568 >108521656

►Recent Highlight Posts from the Previous Thread: >>108519859

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>108523376
wait whos this bitch
First for NIM
>>108523394
>First
epic fail
>>108523398
based
>>108523389
don't ask an llm for their opinion on what log output says without also showing them the relevant code, they love making assumptions about things
pic related is why vibecoding or vibeasking an llm questions about code is a fail when you do not know what should be introduced in the context yourself
>>108523415
Damn, so many things are wrong in gemma 4's implementation
- We don't know if the rotation shit is being applied or not
- The temperature does nothing
- There's a crash during tool calls
bruh, I miss the time when llama.cpp wasn't bought by huggingface, the enshittification process wasn't as fast as it is now, now their team consists of vibeshitters who can't verify by themselves if Claude is hallucinating stuff or not
Can we actually be reasonably sure that this technology is approaching its upper limits in what it can do and isn't just bottlenecked at every stage of development by jeets and vibecoders shitting it all up?
Would AGI SOON :rocket: actually be possible in a less brown world?
Imagine when Gemma 4 124B gets revealed on May 20. I wonder if the Chinese will rush to release their best models before that. Or perhaps they'll wait for Google's flagship Gemma to distill the heck out of it like they've been doing with gpt-oss-120B.
>>108523433
The only actually innovative lab is DeepSeek, so we'll know for sure when/if V4 comes out, but that it has already taken over a year is not a good sign.
>>108523433
i think all ml researchers are sub-human, so no.
>>108523433
even the data is built upon a framework of shit, I mean, most data labeling work is done by brown hands, openai is willing to spend billions on data centers but not a single cent on paying western workers a worthwhile salary to do proper, well-thought-out labeling work.
RLHF is really reinforcement learning from jeet feedback. Even when you do it from pure synthslop reward models, those models inherit and condense the Original Sin
>>108523394
>thinking disabled
>>108523447
Funny enough I've had many remote job advertisements for chemistry data labeling tasks requiring chemistry related degrees. Which the Indians actually have a lot of, but they seem to be targeting Europeans / Americans specifically.
>>108523442
I hope Dipsy saves us all.
>>108523447
Is there realistically any fix for this that isn't starting over from scratch? As much as we shitpost about ozone in our RP logs, it really is the perfect microcosm of the entire problem that's been snowballing since at least GPT-3, like you described.
>>108523394
Doesn't NIM log all your prompts
>>108523473
Just like space debris, there is no fix other than accepting that the problem exists and attempting to mitigate it.
>>108523447
The travesty is that white people think they're too good for manual data labeling work, so it's exclusively done by browns and blacks.
>>108523484
If something connects to the internet and isn't open source then it's safe to assume it's logging and sending everything it can
Yes, I understand it now... I need more VRAM... Much, much more VRAM...
>>108523484
kys elon
Am I supposed to get 21 t/s on Gemma 4 26B 4AB with a 3090? This seems slow compared to the 17 t/s I get on the 31B one.
>>108523498
I'm getting 34 t/s on the 31B one with a 3090 so either way you're fucking your settings up somehow.
>>108523498
I'm getting about the same as the other anon, make sure both are not spilling onto RAM.
>>108523498
Use
>--gpu-layers 99
And tweak
>--n-cpu-moe XX
Starting from 20, 30, 40, 50 and check your vram usage. When it begins to drastically drop below the max (minus some space for your system) you have hit the sweet spot.
>--mlock, --no-mmap
Might make a difference too but that's up to you.
Use the llama-server webui to test out these things.
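The sweep described above can be scripted. A minimal sketch that just prints one llama-server invocation per candidate --n-cpu-moe value so you can launch them one at a time while watching VRAM in nvidia-smi (the model path is a placeholder, not anyone's actual setup):

```shell
#!/bin/sh
# Print one llama-server command per candidate --n-cpu-moe value.
# Run each printed command by hand and watch VRAM usage between runs.
# The model path passed in below is a hypothetical placeholder.
sweep_n_cpu_moe() {
  model="$1"
  for moe in 20 30 40 50; do
    echo "llama-server -m \"$model\" --gpu-layers 99 --n-cpu-moe $moe"
  done
}

sweep_n_cpu_moe "models/google_gemma-4-31B-it-Q4_K_S.gguf"
```

Once VRAM usage drops well below the card's limit, you've pushed too many expert tensors to the CPU and can back off to the previous value.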
>>108523484
lol. lmao even.
>>108523498
you probably have shit using your vram and not loading all of it.
i get 130t/s at q4 on a 4090.
I get 50t/s at 0 context and 40t/s at 55,000 context with Gemmy 4 31b. I'm impressed by how little relative speedloss there is as context climbs.
>>108523484
just pay them more and you'll see the average melanin becoming clearer
>>108523498
Repull llama.cpp, it was fixed today.
>>108523539
Asking for fair wages is antisemitic.
>>108523540
What else was broken by the latest fix?
Yeah of course, the training pipeline is corrupted by browns, who were brought over by... Jews! Oh so jews are at fault again! Because of jews we don't have AGI!! Damn, every single time!!!
>>108523567
>AGI
corpo wet dream, no one with a soul wants this
>>108523567
>Damn, every single time!!!
this but unironically
>>108523562
Some anons report gemma still has repeating melties but it's working fine for others so maybe you'll be one of the lucky ones.
>>108523567
What you say with sarcasm, I say with conviction.
>>108523567
You meme, but I'm actually getting pretty tired of how it's unironically every single time.
>>108523567
openai indeed had a high concentration of baal worshippers within its founding members
>>108523567
My schizo brother is so far gone that he literally said "The Jews invented and imposed the laws of physics on the universe to restrict white people and ensure we don't have unlimited energy so we stay dependent on our Jewish masters".
These people can't be reasoned with. They will literally say the powers of Jews are equivalent or superior to god itself before they will admit they are contrarian schizos who refuse to take ownership of their own issues.
>>108523593
he's not that far off tbqh
>>108523593
He's still right you know. Mankind is living in the dark and at the mercy of greedy shits. Always has been.
>>108523593
sounds like a "it's the jews" variant of an otherwise already pre-existing schizo theory that has existed for a very long time
https://www.trickedbythelight.com/tbtl/index.html
Throughout history there were many times when extreme gnostic world views like these were spouted. Your brother being that type has no bearing on whether the wrong side won WW2.
>>108523593
>They will literally say the powers of Jews are equivalent or superior to god itself
they're the chosen people after all
>>108523593
>restrict white people and ensure we don't have unlimited energy so we stay dependent on our Jewish masters
Look up (((who))) sabotaged Nikola Tesla. :)
>>108523623
2nd law of thermodynamics, entropy etc etc.
>>108523593
he's right tho
the jew "invented", aka daydreamed, aka thought-experiment'd GR to cuck us out of trying FTL travel
>>108523575
>>108523582
>>108523585
>>108523594
>>108523599
America was founded on Judeo-Christian values. Jewish people are the Paragon of Virtue and root of all morality. Those who bless Israel will be blessed, and those who curse Israel will be cursed.
>>108523659
>America was founded on Judeo-Christian values.
*Christian Protestant values, which is why Presidents have to swear on the Bible and not on the Torah
>>108523659
>you lost tranny
>*runs towards the meat grinder*
What's the name of this mental disease?
I think I might prefer Gemma 4 31B over GLM-4.6..........
Why the FUCK doesn't Google just release a bigger version? They would clearly dominate the entire open source model scene. Is it because they are afraid to cannibalize Gemini users?
>polshit
sirs do the needful.
For on-topic stuff, extremely disappointed in llmaocpp development of the non-core components.
>>108523659
>go die for israel to.. uhh.. own the libs
Honestly? I support this message, good idea
Frontier labs openly state they want to create godlike superhuman AI, which implies they want to rule the world.
Do you trust Dario or Sam (or Elon) to rule the world and be at their mercy? Do you think they have your best interests at heart?
>>108523659
you guys are cringe, you are pro war now?? if only democrats weren't worshipping troons I'd vote for them in the midterms, looks like I'll stay at home for the moment, both parties fucking suck
are these settings sensible for 24GB VRAM? I assume it's not worth offloading dense models to RAM at all.
```
llama-server -m "models/bartowski/google_gemma-4-31B-it-GGUF/google_gemma-4-31B-it-Q4_K_S.gguf" ^
  --alias "Gemma 4 31B" ^
  --ctx-size 32384 -fa on ^
  -ctk q8_0 -ctv q8_0 ^
  -ub 4096 -b 4096 ^
  --parallel 1 ^
  --threads 16 ^
  --no-mmap
```
>>108523712
Israel is worth it
oh fuck this looks so cursed
>>108523719
>he doesn't uninstall the previous cuda version before installing a new one
anon...
>>108523712
I think the image is either ironic or bait
>>108523705
>Do you trust Dario or Sam (or Elon) to rule the world and be at their mercy? Do you think they have your best interests at heart?
>Sam
Fuck no, this dude is the most psychotic person I've ever seen. He makes Peter Thiel look like a benevolent saint in comparison. I'd literally prefer a Yudkowsky-tier misaligned AI to take over and genocide humanity than for Sam Altman to rule the world
>Elon
I think he would get off on the idea but he has a constant need for adoration, so I actually think he would roleplay some retarded savior type that helps humanity as long as you constantly praise and validate him. It's a bad outcome, but more like living in Singapore where you have to praise the Kim family but your daily life and quality of life is pretty decent if not good
>Dario
I legitimately think the world would be a better place with him at the helm. I think he's a genuine person who wants a better world; he would probably defer most of his powers to democratic institutions (as long as they follow his definition of democratic and "good", which will be a slightly left-of-center "liberal" version of democracy)
>>108523715
>--ctx-size 32384
*32768
Also I would lower -ub to 512 to save memory and then NOT quantize the KV cache at all; it should still fit in VRAM. Quantizing KV degrades outputs considerably
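For reference, per-token KV cache size is 2 (K and V) x n_layers x n_kv_heads x head_dim x bytes-per-element; f16 is 2 bytes per element, while llama.cpp's q8_0 packs 32 elements into 34 bytes. A rough back-of-envelope sketch — the layer/head/dim numbers below are made-up placeholders, NOT Gemma 4's real config; read the real ones out of your gguf's metadata:

```shell
#!/bin/sh
# Rough KV cache size in MiB. All model numbers passed in below are
# hypothetical placeholders; substitute your model's actual metadata.
kv_cache_mib() {
  layers="$1"; kv_heads="$2"; head_dim="$3"; ctx="$4"; fmt="$5"
  elems=$((2 * layers * kv_heads * head_dim * ctx))   # K and V tensors
  case "$fmt" in
    f16)  bytes=$((elems * 2)) ;;          # 2 bytes per element
    q8_0) bytes=$((elems / 32 * 34)) ;;    # 34-byte blocks of 32 elements
  esac
  echo $((bytes / 1024 / 1024))
}

kv_cache_mib 48 8 128 32768 f16    # hypothetical model, full-precision cache
kv_cache_mib 48 8 128 32768 q8_0   # same hypothetical model, quantized cache
```

q8_0 cuts the cache to roughly 34/64 of f16, which is exactly the VRAM-vs-quality trade-off being argued about here.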
>>108523719
goofy ahh font
>>108523741
>I legitimately think the world would be a better place with him at the helm
lol
>>108523742
even with the new rotation PR?
>>108523705
I think that any company that claims to achieve AGI should be nuked by every other company on the planet
>>108523691
>>108523700
>>108523712
Cope, I'm still voting for Trump's third term in 2028. I hope we invade Cuba, North Korea and China next just to make you libs seethe more.
>>108523747
Gemma 4 doesn't benefit from KV activation rotation
>>108523747
it doesn't look like the rotation is being applied to gemma though, that's the problem
>>108523751
will you be enlisting?
>>108523747
I'm not even going to bother trying that until someone reputable posts a well-tested and reproducible benchmark.
>>108523755
You're either with us or against us