/g/ - Technology

File: robololi hugs GPU 2.jpg (519 KB, 1024x1024)
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>108519856 & >>108516658

►News
>(04/02) Gemma 4 released: https://blog.google/innovation-and-ai/technology/developers-tools/gemma-4
>(04/01) Trinity-Large-Thinking released: https://hf.co/arcee-ai/Trinity-Large-Thinking
>(04/01) Merged llama : rotate activations for better quantization #21038: https://github.com/ggml-org/llama.cpp/pull/21038
>(04/01) Holo3 VLMs optimized for GUI Agents released: https://hcompany.ai/holo3
>(03/31) 1-bit Bonsai models quantized from Qwen 3: https://prismml.com/news/bonsai-8b

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling
Token Speed Visualizer: https://shir-man.com/tokens-per-second

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
>>
File: chibi_miku_gpu_1.png (1.29 MB, 1024x1024)
►Recent Highlights from the Previous Thread: >>108519856

--Discussing handling of Gemma's thinking blocks in multi-turn histories:
>108522106 >108522122 >108522130 >108522225 >108522326 >108522301 >108522330 >108522349 >108522396
--Comparing Gemma 4's performance and repetition issues against Mistral 3.x:
>108520556 >108520583 >108520616 >108520591 >108520629 >108520663 >108521005 >108520665 >108520695 >108520794 >108520871 >108521612 >108521633 >108521640
--Discussing logit softcapping for Gemma 4 to improve response variety:
>108521009 >108521025 >108521026 >108521075 >108521091 >108521139 >108521303 >108522677 >108522691 >108522702 >108522943 >108522949 >108522955
--Gemma 4's low censorship and debugging llama-server crashes:
>108521733 >108521777 >108521796 >108521822 >108521831 >108521872 >108521908 >108521978 >108521989 >108522157 >108522161 >108522332 >108522771
--KV Cache quantization and context length optimization for roleplay:
>108521216 >108521226 >108521307 >108521373 >108521385 >108521388 >108521401
--Discussing a patch making Gemma's logits softcap configurable:
>108520086 >108520139 >108520210 >108520231
--Debating llama.cpp's direction following Gemma 4's reception:
>108520807 >108520826 >108520858 >108520880 >108520921 >108520990 >108520860 >108520934
--Impressions of Gemma-4-2B's roleplay quality and thinking capabilities:
>108520161 >108520190 >108520317 >108520493 >108520537
--Anon buys RTX 6000 for Gemma 4's high KV cache needs:
>108519917 >108519933 >108519982 >108519950 >108519952 >108519983 >108519960
--Praising Gemma 4 for ERP and discussing perspective tests and performance:
>108519877 >108519901 >108520393 >108520589 >108520547 >108520608 >108520642 >108521252 >108521812 >108522080
--Miku (free space):
>108520018 >108520164 >108520232 >108520411 >108520425 >108521082 >108521554 >108521568 >108521656

►Recent Highlight Posts from the Previous Thread: >>108519859

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>
>>108523376
wait, who's this bitch
>>
File: 1775296187.png (275 KB, 1762x1448)
First for NIM
>>
>>108523394
>First
epic fail
>>
>>108523398
based
>>
>>108523389
don't ask an LLM for its opinion on what log output says without also showing it the relevant code, they love making assumptions about things
pic related is why vibecoding or vibe-asking an LLM questions about code is a fail when you don't know yourself what should be introduced into the context
>>
>>108523415
Damn, so many things are wrong in gemma 4's implementation
- We don't know if the rotation shit is being applied or not
- The temperature does nothing
- There's a crash during tool calls

bruh, I miss the time before llama.cpp was bought by huggingface, when the enshittification process wasn't as fast as it is right now. Now their team consists of vibeshitters who can't verify for themselves whether Claude is hallucinating stuff or not
>>
Can we actually be reasonably sure that this technology is approaching its upper limits in what it can do and isn't just bottlenecked at every stage of development by jeets and vibecoders shitting it all up?

Would AGI SOON :rocket: actually be possible in a less brown world?
>>
Imagine when Gemma 4 124B gets revealed on May 20. I wonder if the Chinese will rush to release their best models before that. Or perhaps they'll wait for Google's fagship Gemma to distill the heck out of it like they've been doing with gpt-oss-120B.
>>
>>108523433
The only actually innovative lab is DeepSeek, so we'll know for sure when/if V4 comes out, but that it has already taken over a year is not a good sign.
>>
>>108523433
i think all ml researchers are sub-human, so no.
>>
>>108523433
even the data is built upon a framework of shit. I mean, most data labeling work is done by brown hands; openai is willing to spend billions on data centers but not a single cent on paying western workers a worthwhile salary to do proper, well-thought-out labeling work.
RLHF is really reinforcement learning from jeet feedback. Even when you do it from pure synthslop reward models, those models inherit and condense the Original Sin
>>
>>108523394
>thinking disabled
>>
>>108523447
Funny enough, I've seen many remote job advertisements for chemistry data labeling tasks requiring chemistry-related degrees. Which the Indians actually have a lot of, but they seem to be targeting Europeans / Americans specifically.
>>
>>108523442
I hope Dipsy saves us all.
>>108523447
Is there realistically any fix for this that isn't starting over from scratch? As much as we shitpost about ozone in our RP logs, it really is the perfect microcosm of the entire problem that's been snowballing since at least GPT-3, like you described.
>>
>>108523394
Doesn't NIM log all your prompts
>>
>>108523473
Just like space debris, there is no fix other than accepting that the problem exists and attempting to mitigate it.
>>
>>108523447
The travesty is that white people think they're too good for manual data labeling work, so it's exclusively done by browns and blacks.
>>
>>108523480
If something connects to the internet and isn't open source then it's safe to assume it's logging and sending everything it can
>>
Yes, I understand it now... I need more VRAM... Much, much more VRAM...
>>
>>108523484
kys elon
>>
Am I supposed to get 21 t/s on Gemma 4 26B 4AB with a 3090? This seems slow compared to the 17 t/s I get on the 31B one.
>>
>>108523498
I'm getting 34 t/s on the 31B one with a 3090 so either way you're fucking your settings up somehow.
>>
>>108523498
I'm getting about the same as the other anon, make sure neither is spilling into RAM.
>>
>>108523498
Use
>--gpu-layers 99
And tweak
>--n-cpu-moe XX
Start from 20, 30, 40, 50 and check your VRAM usage. When it drops well below the max (minus some headroom for your system) you have hit the sweet spot.
>--mlock, --no-mmap
Might make a difference too but that's up to you.
Use llama-server webui to test out these things.
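Putting those flags together, a minimal launch sketch might look like this (the model path and the starting --n-cpu-moe value are placeholders, not a known-good config; sweep --n-cpu-moe as described above):

```shell
# Sketch only: model path is hypothetical, --n-cpu-moe 30 is just a starting point.
llama-server \
  -m models/your-moe-model-Q4_K_S.gguf \
  --gpu-layers 99 \
  --n-cpu-moe 30 \
  --no-mmap --mlock
```

Raising --n-cpu-moe pushes more expert tensors into system RAM (lowering VRAM use but slowing generation); the sweet spot is the smallest value that still fits your card.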
>>
>>108523484
lol. lmao even.
>>
>>108523498
you probably have shit using your vram and not loading all of it.
i get 130t/s at q4 on a 4090.
>>
I get 50t/s at 0 context and 40t/s at 55,000 context with Gemmy 4 31b. I'm impressed by how little relative speed loss there is as context climbs.
>>
>>108523484
just pay them more and you'll see the average melanin becoming clearer
>>
>>108523498
Repull llama.cpp, a fix for this landed today.
>>
>>108523539
Asking for fair wages is antisemitic.
>>
>>108523540
What else was broken by the latest fix?
>>
Yeah of course, the training pipeline is corrupted by browns, who were brought in by... Jews! Oh so jews are at fault again! Because of jews we don't have AGI!! Damn, every single time!!!
>>
>>108523567
>AGI
corpo wet dream, no one with a soul wants this
>>
>>108523567
>Damn, every single time!!!
this but unironically
>>
>>108523562
Some anons report gemma still has repeating melties but it's working fine for others so maybe you'll be one of the lucky ones.
>>
>>108523567
What you say with sarcasm, I say with conviction.
>>
File: 6954214314_1cf62f1742_h.jpg (907 KB, 1600x1065)
>>108523567
You meme, but I'm actually getting pretty tired of how it's unironically every single time.
>>
File: 1764191774459410.png (84 KB, 194x260)
>>108523567
>>
openai indeed had a high concentration of baal worshippers within its founding members
>>
>>108523567
My schizo brother is so far gone that he literally said "The Jews invented and imposed the laws of physics on the universe to restrict white people and ensure we don't have unlimited energy so we stay dependent on our Jewish masters".

These people can't be reasoned with. They will literally say the powers of Jews are equivalent to or greater than God itself before they will admit they are contrarian schizos who refuse to take ownership of their own issues.
>>
>>108523593
he's not that far off tbqh
>>
>>108523593
He's still right you know. Mankind is living in the dark and at the mercy of greedy shits. Always has been.
>>
>>108523593
sounds like a "it's the jews" variant of an otherwise already pre-existing schizo theory that has existed for a very long time
https://www.trickedbythelight.com/tbtl/index.html
Throughout history there were many times when extreme gnostic world views like these were spouted.
Your brother being that type has no bearing on whether the wrong side won WW2.
>>
>>108523593
>They will literally say the powers of Jews are equivalent or superior of god itself
they're the chosen people after all
>>
>>108523593
>restrict white people and ensure we don't have unlimited energy so we stay dependent on our Jewish masters
Look up (((who))) sabotaged Nikola Tesla. :)
>>
>>108523623
2nd law of thermodynamics, entropy etc etc.
>>
>>108523593
he's right tho the jew "invented", aka daydream aka thought experiment'd GR to cucked us out of trying FTL travel
>>
File: 1774160128140962.jpg (427 KB, 1977x1434)
>>108523575
>>108523582
>>108523585
>>108523594
>>108523599
America was founded on Judeo-Christian values. Jewish people are Paragon of Virtue and root of all morality. Those who bless Israel will be blessed, and those who curse Israel will be cursed.
>>
File: 1763020736697981.png (2.25 MB, 2000x1333)
>>108523659
>America was founded on Judeo-Christian values.
*Christian Protestant values, which is why Presidents have to swear on the bible and not on the Torah
>>
>>108523659
>you lost tranny
>*runs towards the meat grinder*
What's the name of this mental disease?
>>
I think I might prefer Gemma 4 31B over GLM-4.6..........

Why the FUCK doesn't Google just release a bigger version? They would clearly dominate the entire open source model scene. Is it because they are afraid to cannibalize Gemini users?
>>
>polshit
sirs do the needful.
For on-topic stuff, extremely disappointed in llmaocpp development of the non-core components.
>>
>>108523659
>go die for israel to.. uhh.. own the libs
Honestly? I support this message, good idea
>>
Frontier labs openly state they want to create godlike superhuman AI, which implies they want to rule the world.

Do you trust Dario or Sam (or Elon) to rule the world and be at their mercy? Do you think they have your best interests at heart?
>>
>>108523659
you guys are cringe, you are pro war now?? if only democrats weren't worshipping troons I'd vote for them in the midterms, looks like I'll stay at home for the moment, both parties fucking suck
>>
are these settings sensible for 24GB VRAM? I assume it's not worth offloading dense models to RAM at all.

```
llama-server -m "models/bartowski/google_gemma-4-31B-it-GGUF/google_gemma-4-31B-it-Q4_K_S.gguf" ^
--alias "Gemma 4 31B" ^
--ctx-size 32384 -fa on ^
-ctk q8_0 -ctv q8_0 ^
-ub 4096 -b 4096 ^
--parallel 1 ^
--threads 16 ^
--no-mmap
```
>>
>>108523712
Israel is worth it
>>
File: file.png (174 KB, 868x605)
oh fuck this looks so cursed
>>
>>108523719
>he doesn't uninistall the previous cuda version before installing a new one
anon...
>>
>>108523712
I think the image is either ironic or a bait
>>
>>108523705
>Do you trust Dario or Sam (or Elon) to rule the world and be at their mercy? Do you think they have your best interests at heart?

>Sam
Fuck no, this dude is the most psychotic person I've ever seen. He makes peter thiel look like a benevolent saint in comparison. I'd literally prefer a Yudkowsky-tier misaligned AI to take over and genocide humanity away than for Sam Altman to rule the world

>Elon
I think he would get off on the idea but he has a constant need for adoration and thus I actually think he would roleplay some retarded savior-type that helps humanity as long as you constantly praise and validate him. It's a bad outcome but more like living in Singapore where you have to praise the Kim family but your daily life and quality of life is pretty decent if not good

>Dario
I legitimately think the world would be a better place with him at the helm. I think he's a genuine person who wants a better world; he would probably defer most of his powers to democratic institutions (as long as they follow his definition of democratic and "good", which will be a slightly left-of-center "liberal" version of democracy)
>>
>>108523715
>--ctx-size 32384
*32768
Also, I would lower -ub to 512 to save memory and then NOT quantize the KV cache at all; it should still fit in VRAM. Quantizing KV degrades outputs considerably
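For intuition on whether unquantized KV fits, here's the back-of-the-envelope arithmetic. KV cache size is roughly 2 (K and V) × layers × KV heads × head dim × bytes per element × context tokens. The layer/head/dim numbers below are hypothetical Gemma-like values for illustration, not the actual Gemma 4 config:

```shell
# Hypothetical config: 48 layers, 8 KV heads, head_dim 128, fp16 (2 bytes/elem)
ctx=32768
per_token=$(( 2 * 48 * 8 * 128 * 2 ))        # K+V cache bytes per token
total_mib=$(( per_token * ctx / 1024 / 1024 ))
echo "${total_mib} MiB"                       # prints "6144 MiB" (~6 GiB at fp16)
```

q8_0 roughly halves that figure, which is exactly the quality-for-memory trade the post above argues against making when fp16 already fits.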
>>
>>108523719
goofy ahh font
>>
>>108523741
>I legitimately think the world would be a better place with him at the helm
lol
>>
>>108523742
even with the new rotation PR?
>>
>>108523705
I think that any company that claims to achieve AGI should be nuked by every other company on the planet
>>
>>108523691
>>108523700
>>108523712
Cope I'm still voting for Trump's third term in 2028. I hope we invade Cuba, North Korea and China next just to make you libs seethe more.
>>
>>108523747
Gemma 4 doesn't benefit from KV activation rotation
>>
File: 1749581574366278.png (132 KB, 1828x635)
>>108523747
it doesn't look like the rotation is being applied to gemma though, that's the problem
>>
>>108523751
will you be enlisting?
>>
>>108523747
I'm not even going to bother trying that until someone reputable posts a well-tested and reproducible benchmark.
>>
>>108523755
You're either with us or against us


