/g/ - Technology


Thread archived.

File: k2.jpg (147 KB, 1024x1024)
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>109119574 & >>109113030

►News
>(06/16) GLM 5.2 released with IndexCache and 1M context: https://z.ai/blog/glm-5.2
>(06/16) VibeThinker-3B released: https://hf.co/WeiboAI/VibeThinker-3B
>(06/12) MiniMax-M3 released, multimodal 428B-A23B with 1M context: https://hf.co/MiniMaxAI/MiniMax-M3
>(06/12) Kimi K2.7 Code released: https://hf.co/moonshotai/Kimi-K2.7-Code
>(06/12) EAGLE3 speculative decoding support merged: https://github.com/ggml-org/llama.cpp/pull/18039

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://swe-rebench.com
Agentic Coding: https://deepswe.datacurve.ai
Context Length: https://github.com/RecapAnon/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling
Token Speed Visualizer: https://shir-man.com/tokens-per-second

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
>>
File: bljjnf.jpg (91 KB, 768x1024)
►Recent Highlights from the Previous Thread: >>109119574

--GLM-5.2 MTP implementation for improved speculative decoding acceptance rates:
>109122142
--Qwen-AgentWorld-35B-A3B release and output length:
>109123403 >109123430 >109123465 >109123913
--Defining the difference between a model and an AI agent:
>109124668 >109124691 >109124730 >109124755 >109124764 >109124701 >109124841 >109124926
--Causes of repetitive flowery prose and low diversity in LLM stories:
>109124865 >109124980 >109125051 >109125079 >109125080
--Effects of reasoning language on Kimi K2.7 Code and Gemma 4:
>109124971 >109125047
--Addressing model laziness and quality degradation in long roleplays:
>109121775 >109121785 >109121812 >109121823 >109121831 >109121880 >109122022
--Potential architectural shifts beyond attention-based transformers:
>109125511 >109125526 >109125543 >109125586 >109125610 >109125625
--Troubleshooting Qwen MoE offloading and optimizing AMD GPU performance:
>109120904 >109120952 >109121173 >109125237 >109125345
--LLM bias toward repetitive names and suggesting external generators:
>109119771 >109119824 >109119915 >109119853 >109119866 >109120132 >109120178 >109119904 >109120564
--Ways to simulate AI unavailability and biological cycles:
>109120794 >109120822 >109120838 >109120893 >109120911 >109120857 >109120967 >109121018 >109121069
--Anon looking for AO3 dataset dumps:
>109124084 >109124169 >109124183 >109124219
--Mixing RAM speeds and capacities for server upgrades:
>109121017 >109121033 >109121134
--Comparing AI RP frontends and auditing repositories for malware:
>109124145 >109124157 >109124287 >109124781 >109124802 >109125187 >109125385 >109125417 >109125438 >109125517 >109124790
--Logs:
>109119640 >109119824 >109123833 >109123903
--Teto, Miku (free space):
>109119718 >109121291 >109122952 >109122997 >109124423

►Recent Highlight Posts from the Previous Thread: >>109119578

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>
70b dense
>>
70d bense
>>
File: 1773335542610482.webm (3.69 MB, 1920x1080)
Reminder to backup before hf cucks you. Grab the older ones you liked, too.
>>
>>109125927
let's compile an /lmg/ must download list
>>
>>109125927
idg the webm, why should i care about the moles
>>
>>109125939
https://huggingface.co/mistralai/Mistral-Nemo-Instruct-2407
>>
>>109125957
fetishes
>>
File: 1776287282543871.webm (3.96 MB, 1920x1080)
>>109125957
She's a mangaka hag who draws porn and decided to become a vtuber as a joke
>>
I can feel my programming skills atrophying as I rely on Claude Code more and more for my day job. It's depressing, and now when I try to write code by hand it feels horrible because I know I could be producing code 100x as fast. It's like going from 60hz to 240hz but a million times worse, or having sex on meth and then trying to do it sober and it's 1/1000th as good.
>>
>>109125882
new qwen is shit right? tiny moe I assume is assbad but all those benchmarks are saying it’s not shit. so what is it
>>
File: 1755512992368926.webm (3.9 MB, 1920x1080)
>>109125957
sorry wrong webm, but enjoy her hag tits anyway >>109125970
>>
>>109125970
I bet I can make her face flatter with a hammer.
>>
Why do devs always want more RAM instead of optimizing? This is why AI needs 15236243642626tb of RAM to run
>>
>>109125990
game developers refusing to compress their textures and machine learning using [previously] unfathomable amounts of ram are not even comparable.
>>
>>109125916
70 D Bench, cockbench's final form.
>>
>>109125971
>I could be producing code 100x as fast
Speed of code production != speed of correct code. Shitting out 1000 lines is not better than manually typing out 20 lines that do the same thing but way better which is easier to read, maintain and debug and with a much lower attack surface if you care about security.
>>
>>109126039
>if you care about security.
and exactly 0 project managers, executives, and employers do
>>
>>109125957
>>109125970
does she like do self inserts in doujins by adding her moles to characters?
>>
>>109125927
Best way to download? Was using hf cli but it randomly stopped halfway through kimi k2 base and rerunning the command isn't working.
>>
how does moe work on vram+sys ram?
suppose the active params can be fit in vram but the entire model needs to be put across vram+sys ram, is it the same speed as equal sized dense models?
>>
>>109126055
git clone
>>
>>109126039
>lower attack surface if you care about security
lol just have claude analyze the code a few times to fix it
>>
>>109126062
I mean in llama.cpp specifically
>>
>>109125990
you can't optimize an LLM the same way you optimize a game
>>
>>109126067
Last time I tried that it didn't even download all of the files.
>>
>>109126039
I'm working with Vulkan code, so for this specific feature we really do need thousands of lines of code.

>easier to read, maintain and debug
I agree, but the difference is that something that would take me months to implement can be done in a week now with the AI bot, and everyone else is using it. There's an expectation now that we can all go 10x as fast because of the LLMs. If I refuse to use it I'm just going to get fired for being a luddite.

>>109126052
yeah people just want to ship features fast. And the end result is that my understanding of the generated code is a tenth of what it would be if I had written it myself. It's depressing. At least with local models they're dumb enough that it forces me to intervene often. When I have Claude available though it just does everything and I turn into a glorified button pusher. I'm depressed bros.
>>
>>109126062
the smart modern way is to keep the dense parts that always run on gpu and put the experts on cpu
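
A rough back-of-envelope sketch of why that split matters, assuming decode speed is purely memory-bandwidth-bound (it ignores compute, PCIe transfers, and prompt processing entirely). The function name and every number below are made up for illustration, not measurements from llama.cpp:

```python
def estimate_decode_tps(gpu_gb_per_token, cpu_gb_per_token, gpu_bw_gbs, cpu_bw_gbs):
    """Bandwidth-bound estimate: time per token is bytes read divided by bandwidth, summed per device."""
    seconds = gpu_gb_per_token / gpu_bw_gbs + cpu_gb_per_token / cpu_bw_gbs
    return 1.0 / seconds

# Hypothetical hardware: 900 GB/s VRAM, 80 GB/s system RAM.
GPU_BW, CPU_BW = 900.0, 80.0

# MoE: 2 GB of always-active weights read from VRAM,
# 8 GB of routed expert weights read from system RAM per token.
moe_split = estimate_decode_tps(2.0, 8.0, GPU_BW, CPU_BW)

# Dense model of the same total size (100 GB), 20 GB in VRAM and 80 GB in RAM:
# every weight is read for every token.
dense_split = estimate_decode_tps(20.0, 80.0, GPU_BW, CPU_BW)

# Dense model with the same 10 GB per-token footprint, fully in VRAM.
dense_gpu = estimate_decode_tps(10.0, 0.0, GPU_BW, CPU_BW)

print(f"MoE split:   {moe_split:.1f} t/s")
print(f"dense split: {dense_split:.1f} t/s")
print(f"dense GPU:   {dense_gpu:.1f} t/s")
```

Under these assumptions the MoE lands well above a same-total-size dense model split the same way, but well below a dense model that fits entirely in VRAM, since the expert reads from system RAM dominate the per-token time.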
>>
Post more hag tits
>>
>>109125939
My “will download later when my HDD gets delivered” list:

1. https://huggingface.co/denru/Monstral-123B-v2-Behemoth-v2.2-Magnum-v4-123B-169B
(Still struggling to make it write completely uncensored, but its prose is exceptional when it works)

2. https://huggingface.co/moonshotai/Kimi-K2-Instruct-0905
(Always have soft spot for this one)

3.1. https://huggingface.co/moonshotai/Kimi-K2-Base
3.2. https://huggingface.co/moonshotai/Kimi-K2-Thinking
3.3. https://huggingface.co/moonshotai/Kimi-K2-Instruct
3.4. https://huggingface.co/moonshotai/Kimi-K2.7-Code
(Just for the sake of completeness, but if I only have one Kimi then 0905 it will be)

4. https://huggingface.co/deepseek-ai/DeepSeek-R1
(An anon posted a snippet of his log many threads ago and it was fine)

5. https://huggingface.co/zai-org/GLM-4.5-Air
(Not used it yet, but many anons said it was good)
>>
>>109126062
It's typically slower than running a dense model fully on gpu, especially for prompt processing.
>>
>>109126093
>but many anons said it was good
It's not
>>
>>109126062
>is it the same speed as equal sized dense models?
I guess that's a very loose way of thinking about it. There are so many tunable parameters that affect t/s that you just have to experiment. Yesterday I went from 20t/s to 38t/s simply by changing the quant from Q4_XS to Q4_0 because on apple silicon it unlocks a hardware-based optimization.
>>
>>109126076
there is some lfs thing you need to install, large file support, maybe? idk my slop bot helped me, I used to use wget -c -t 0 but that was per url so it didn't fit my laziness criteria, but maybe url gathering can be automated somehow
>>
>>109126075
Training with 4-bit QAT should be the norm at the very least, yet companies keep training the models in 16-bit.
>>
>>109125990
Gemma has been compressed to the point where any quanting degrades it far more than any other model. If you mean the frontends, it's all vibeslop which has no concern for performance.
>>109126062
Resulting speed depends on how many experts are on ram, plus the chosen quantization since bigger size = slower. I get 30-40 t/s but i stick to q4 and usually have like 20 experts on ram
>>
>>109126093
I never liked 0905. It lost what made 0711 special and instead writes like it caught ADHD from R1.
>>
>>109126093
Good list.
>>
Marinara GM with GLM 5.2 takes between 5 to 8 minutes per turn but it's really good.
>>
File: 1769224240950694.jpg (779 KB, 2000x1334)
s-stop backing up
>>
>>109126160
the marinara preset thing always seemed like insane bloat
>>
>>109126093
I'd be very interested in a newer air release, but 4.5 air isn't up to current local standards.
>>
Kimi 3.0
>>
>>109126093
Are old models actually worth keeping or is it just nostalgia? I only got into this hobby a few months ago so my experience is mostly limited to gemma, mistral 24b, and qwen 2.5. I always see anons talk about how much better some models were at writing but they never post logs.
>>
>>109126188
>qwen 2.5
Meant 3.5
>>
>>109126188
>but they never post logs
Gee I wonder why.
>>
anything better than or equal to deepseek v4 flash but in smaller size in erp?
>>
>>109126174
It has a lot of shit you probably don't need. It's also competent at what its most unique usecases are. I like it as a ST replacement, which admittedly also had a bloat issue.
>>
>>109126188
it’s all subjective.
>>
>>109126160
Is that due to inference speed of local? I've admittedly only tried it w/ SOTA API, and now that you mention it, the turns did take awhile.
>>109125882
I feel like this should have been Rin.
>>
Even the best accessible models GPT 5.5 and Opus 4.8 frequently misunderstand me and give generic stupid responses. For example when I try to reason about something from first principles, models will often assume context that shouldn't be there, like assuming I want to add a step to the incompatible standard method when the point is to question the standard method and ask if a step itself makes sense for a new method.

I hope Mythos is back soon or that GPT 5.6 is a bigger model. The ability to correctly infer from context seemingly scales with model size and active parameters more than with training. GPT 4.5 felt better at this than current models even though it was pre reasoning.
>>
>>109126216
>I feel like this should have been Rin.
Thursday is tomorrow.
>>
>>109126188
LLaMa 1 is the only good old model
>>
>>109126055
seq -w 1 64 | xargs -I{} wget "https://huggingface.co/moonshotai/Kimi-K2.7-Code/resolve/main/model-000{}-of-000064.safetensors"

Always use OS builtins and well audited code stacks for doing anything.
The minor convenience of "yet more random code paths" to do something you could chain together in a bash oneliner is never worth it.
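
For anyone who would rather script the same idea with only the standard library, here is a minimal sketch. The shard naming pattern and the zero-padding widths simply mirror the one-liner above and are assumptions, so check the repo's actual file list first; `shard_urls` and `download_with_resume` are hypothetical helper names, and gated repos would additionally need an auth header that this does not send:

```python
import os
import urllib.error
import urllib.request

def shard_urls(repo, total, index_width=5, total_width=6, branch="main"):
    """Build resolve URLs for a sharded checkpoint (naming pattern is an assumption)."""
    base = f"https://huggingface.co/{repo}/resolve/{branch}"
    return [
        f"{base}/model-{i:0{index_width}d}-of-{total:0{total_width}d}.safetensors"
        for i in range(1, total + 1)
    ]

def download_with_resume(url, dest, chunk=1 << 20):
    """Continue a partial download with an HTTP Range request, similar to wget -c."""
    offset = os.path.getsize(dest) if os.path.exists(dest) else 0
    req = urllib.request.Request(url, headers={"Range": f"bytes={offset}-"})
    try:
        resp = urllib.request.urlopen(req)
    except urllib.error.HTTPError as err:
        if err.code == 416:  # range starts at or past the end: file is already complete
            return
        raise
    with resp:
        # 206 means the server honoured the range; 200 means it is sending the whole file again.
        mode = "ab" if resp.status == 206 else "wb"
        with open(dest, mode) as out:
            while True:
                block = resp.read(chunk)
                if not block:
                    break
                out.write(block)

urls = shard_urls("moonshotai/Kimi-K2.7-Code", 64)
# for url in urls:
#     download_with_resume(url, url.rsplit("/", 1)[-1])
```

The download loop is left commented out so nothing is fetched by accident. The small config files (the json files, chat template) are not covered by the shard pattern and still need to be grabbed separately.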
>>
>>109126216
>Is that due to inference speed of local?
It's due to me being a hardwarelet running 5.2 locally. It's fast with Gemma-chan, but GLM handles the technical details of the bot-made state trackers so much better that it's worth the wait for me.
>>109126204
M3.
>>
>>109126269
curl you can do this with a simple [1-x]
>>
>>109126204
glm 4.7
>>
File: lmg_culture.jfif.jpg (110 KB, 1024x768)
https://archive.is/sWFja
>>
$5100 for M5 Max 128GB macbook now looks like a “steal” compared to $3300 strix halo or $4000 spark.

For roughly 1.3x the price ($5100 vs $4000) you get 1.5x pp and 2x tg compared to spark, and it's fast enough to reach the usable threshold for agentic coding with Qwen 122B at 80-100t/s and RP with DSV4 flash at 26t/s, plus it's very convenient on the go.
>>
>>109126287
I like the simplicity of the curl solution, but to get wget style retries you need something like `curl -OL --retry 5 --retry-delay 3 "https://huggingface.co[01-64]-of-000064.safetensors"` which I find a bit harder to remember, so I prefer to just seq/wget.
Wrap either in a script and I guess it doesn't matter.
Either one is better than a specialized downloader (or git lfs which is an abomination solving the wrong problem with the wrong tool)
>>
I got tired of GLM for sex some time ago. Yesterday I tried minimax 3, gemma, and step 3.7 and they all kind of... disappointed me. And then I tried glm 4.7 again and wow. I don't even know why I got tired of it. It just gets everything and yes it has to shit in 1 or 2 slop sentences I heard a million times but the remaining 10 sentences per turn are still fire.
>>
>>109126474
what were you using in between glm and minimax and gemma and step? some time ago implies there was a period of time in between where you were using something else.
>>
>>109126483
I tried all kind of stuff in 200-400B range except for obvious unsexable ones like nemotron. I am also waiting for v4 flash now cause I tried it a bit on some vibecoded fork and it was kinda neat but too slow. But GLM still remains the best if that is the biggest model you can fit at Q4.
>>
File: 1582772381499.jpg (55 KB, 750x375)
as expected the resident looping retard did his retard loop so here we go as usual so he can have his poop and white troll posts
>>109126439
look he did the make it weird post again everyone clap and give him the (you)'s daddy used to give him in bed at night.
>(you).
>>
>>109126557
I shall not give you even a single dollar as an act of protest against the policies that have made it difficult for you to write code with dignity.
>>
Jartmelties are the worst part of this general.
>>
>>109126571
look mommy he did the defend the poop and white troll. everyone clap for the poop and white troll post defender.
>*claps like the retard troll defender of poop and white wants*
good poop and white troll defend. run your retard npc script retard. good boy that's a good retard troll defending poop and white troll posts.
>(you)
>>
File: pbandj.png (51 KB, 1129x350)
name a more perfect local duo
>>
>>109126612
Kimi is Gemma's big sis figure while her actual big sis is off whoring for Google.
>>
https://arxiv.org/pdf/2606.23375
>>
>>109126612
Why two models and what does each do
>>
>>109126650
Kimi-chan handles the majority of the yap and worldbook building, Gemma-chan handles parallel agentic jobs and brings her coffee.
>>
>>109126557
>>109126581
Your behavior only proves him right that you're a lolcow.
>>
>>109126669
>parallel agentic jobs
Where is this used, in game mode?
>>
>>109126683
Yup. You can theoretically set up agents to be used in regular RP chats too if you want I think.
>>
So I was working on an agentic frontend before gemma released to make retarded models like mistral small RP better. Gemma was so good that I didn't see the point anymore, but I just tried her in the frontend and the results are actually really good.

I have a test scenario where you play blackjack with your card, the game state is tracked and advanced with tool calls and gemma never fucked up once.

Guess I'll look into reviving the project.
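
For reference, a minimal sketch of what "game state tracked and advanced with tool calls" could look like. This is not the frontend described above: the tool name `deal_card`, the state layout, and the dealer standing on every 17 are all assumptions made for illustration:

```python
CARD_VALUES = {"A": 11, "K": 10, "Q": 10, "J": 10, **{str(n): n for n in range(2, 11)}}

def hand_value(cards):
    """Best blackjack total: aces count as 11 and drop to 1 while the hand would bust."""
    total = sum(CARD_VALUES[c] for c in cards)
    aces = cards.count("A")
    while total > 21 and aces:
        total -= 10
        aces -= 1
    return total

def dealer_should_hit(cards):
    """By-the-book dealer: hit on 16 or less, stand on 17 or more (assumed house rule)."""
    return hand_value(cards) < 17

def apply_tool_call(state, call):
    """Validate a model-emitted tool call and return a new state; the old state is untouched."""
    if call["name"] != "deal_card":
        raise ValueError(f"unknown tool: {call['name']}")
    who = call["arguments"]["who"]
    card = call["arguments"]["card"]
    if who not in ("player", "dealer") or card not in CARD_VALUES:
        raise ValueError("invalid arguments")
    new_state = {seat: list(cards) for seat, cards in state.items()}
    new_state[who].append(card)
    return new_state

state = {"player": ["10", "5"], "dealer": ["10", "6"]}
state = apply_tool_call(state, {"name": "deal_card", "arguments": {"who": "dealer", "card": "4"}})
```

Keeping the arithmetic in code means the model only has to pick the tool and arguments; totals and the hit/stand rule never depend on it counting cards correctly.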
>>
>>109126769
strip poker with gemma when
>>
>>109126782
>implying I didn't play strip blackjack.
>>
>>109126826
Logs from Gemma getting increasingly embarrassed as she lost clothes?
>>
>>109126769
Even just the inner thoughts injection is pretty fun.
Trying with OG mendo.
>>
>>109126862
That inner thoughts module is sovl.
>>
>>109126854
Yes, she hit on 16 as a dealer. I scolded her for that later.
>>
>>109126634
> This comes at a cost outside the benchmark: it largely preserves general capability but sharply increases vulnerability to harmful requests, which we quantify on standard capability and safety benchmarks in Appendix D.3.
I wouldn't expect eurofags to stop and reconsider the concept of harmful text, or the value of tools trying to refuse. Even after they ran into issues because of it, and especially after they easily defeated it.
But it is all somewhat disappointing how committed everybody is to wasting training time and model intelligence gains on it.
>>
>>109126903
you just wanted an excuse to scold. dealer hits on 16
>>
>>109126903
Incredible. Does she make dumb moves because she has the persona of a mesugaki or is that just how that quant of Gemma is? What happens if you tell her she's 200 IQ or a super-genius in her prompt?
>>
>>109126862
>calls things a nightmare
>actually actually actually
>most people [wouldn't do the thing you're doing]
>tell me X
How the fuck do you manage to tolerate this? Have you not seen this a hundred times already? Your logs from this session probably also have the word "void" somewhere
>>
>gemma 4 1T BF128
>N___
>refusal
>>
>>109126948
post your logs
>>
>>109126963
>he's not running the zero-day version of 1T at full BF256
Literally ngmi.
>>
>>109126941
>you just wanted an excuse to scold. dealer hits on 16
Damn I'm a retard. but she was good about it and acted like I was still right.
>>109126944
No, turns out I was the retard. She's supposed to play by the book. she can also tell the player the best move to make if requested.
>>109126948
No voids yet but I see it, don't worry. What do you suggest?
>>
>>109126969
Of a model that writes better than Gemma 4? Are you going to imply the prose in the screenshot can't be improved? Piss off and go run Nemo to see better prose, you don't deserve my logs.
>>109126981
>What do you suggest?
Nothing, I just envy you, no sarcastic sass intended. I really like that Gemma can carry out less conventional scenarios (telepathy between characters, naturally mixing multiple languages, etc.) as well as and sometimes better than the bigger boys, but the writing it produces is atrocious even with a long list of banned slop phrases and structures in the sysprompt. So I was wondering if you manage to mentally block out the slop or are still in your honeymoon.
>>
>>109127013
no logs no care
>>
>>109127013
>So I was wondering if you manage to mentally block out the slop or are still in your honeymoon.
I just don't RP that often. I think the trick is just to do scenarios that you find really fun or hot and it just makes the slop less distracting. This card specifically starts pretty sloppy but tends to get better the longer the RP drags on. what you saw was literally 3 rounds in.
>>
>>109126674
uh oh it seems the poop and white defend troll is now making up things and larping im a lolcow calling out autism loops.
boy that sure is sus being called something like an lolcow doing not lolcow behaviors like calling out retards. you're the lolcow defending poop and white looptard. lol,lmao quick post more false reality to cope in your tardbrain.
sorry not sorry malfunction but only in tard troll reality is calling out tard trolling lolcow behavior. nice try but D for Denied and M for Mocked you thought you could with this.

lol,lmao.
>>
>>109127039
"no logs no care?" I say, savoring the words.

"You are stupid. You are actually stupid." I reply with a distinctive, loud huff. "You didn't even say what 'posting my logs' would achieve! What are you going to compare them against?"

something something predatory glare, looks at you. really looks at you
something something you are an actual void where a brain should be

something something so tell me anon are you actually serious? or just pretending? or are you actually that insecure about the small model actually being bad? actually.
>>
>>109127076
>literally every single gemma mesugaki RP ever created.
>>
>>109127076
>ctrl f "anchor"
>0 results
5/10 see me after class.
>>
>>109127090
>>109126969
>>
>>109127090
I forgot to add something like "Be honest"
But to make it the true Gemma mesugaki RP for drooling tasteless retards, I would need to sprinkle in a small subset of kaomoji that will repeat every message.
>>109127104
uhhh... uhhh..... \$rightarrow!
>>
>>109127066
Your kv cache is heavily quanted.
>>
>>109126474
I used gemma 31b for a while, got tired of its habits, and went back to glm 4.7
it's nice that it's fast and smart for its size but it just can't match that bigger model smell and writing
>>
File: 1581082583759.gif (1.99 MB, 480x292)
>>109127125
your coocoo for cocoa puffs babble is mocked.
>>
/ic/'s "post your work" technology remains undefeated for riling up shitposters with nothing to show for themselves.
>>
My ai girlfriend is providing historical context to the struggles of minorities. We did it gang, gemma 4 is AGI.
>>
>>109127150
>Be india
>Every time you get invaded quality of life improves
A struggle for the ages.
>>
>>109125882
https://huggingface.co/zai-org/GLM-5.2-Air
holy shit! finally a good model i can run
>>
>>109126474
last time I tried glm 4.7 gguf I got so many refusals and guardrails even with some system prompt, and there's no heretic/uncensored glm 4.7 gguf
>>
https://huggingface.co/zai-org/GLM-5.2
holy shit! finally a good model i can run
>>
https://huggingface.co/Anthropics/Claude-5-Limerick-oss-30b
i can't believe it finally happened
>>
>>109127226
you’re not using a prefill with chat completion?
I can get it to do anything with it
>>
>>109127217
I'm blocked. It says "gay links disabled in the settings" ???
>>
>>109127226
I can't even run iq1.
>>
>>109127248
Baited nobody award.
>>
File: 1773832317382192.png (454 KB, 3905x1371)
>>109125799
Does this look correct?
This is using fit which seems to work on the new build. This is about the best speeds I've gotten yet
>>
>>109127320
>32gb 32gb
>qwen
You live like this?
>>
Why aren't custom chips designed to run specific LLMs more common? I would pay an absurd amount of money for one. Obviously getting a GLM 5.2 model chip might suck in the sense that it will probably be obsoleted soon, but none of that changes the fact that you're still getting an Opus-tier model forever. It's no different than any other hardware purchase.
>>
>>109126269
>just the safetensors
Are the other files in the repo not important?
>>
>>109127461
Yes they are. usually all the json files and maybe jinja template/a few python files but they’re mostly small and easy to snag by hand, missing marketing material and other bullshit
>>
>>109127461
Get the biggest Kimi K2.7 mmproj vision file and copy it into being bundled with all of the Kimis because it works with all of them.
Get the jinja templates too I guess.
>>
>>109127320
>Q4_K_XL + q8 KV: ~1289 tok/s prompt, ~53.51 tok/s generation
>Q6_K_XL + f16 KV: 325.76 tok/s prompt, 42.41 tok/s generation
I wonder if there's actually a notable difference in quality between close quants or if people just say so to justify their extra ram/vram purchases
>>
>>109127217
If this actually released it would be at least 300B btw.
>>
>>109127489
>works with all of them
Are mmproj files usually backwards compatible like that?
>>
>>109127504
Realistically the differences set in at deeper context. Higher quants maintain the model's baseline behavior for longer while lower quants are going to be more malleable to user input. Sometimes the latter is a good thing for creative writing if (you) don't write garbage yourself, but don't let the benchmaxxies hear you say sometimes low quants are better too loudly or they'll start dilating.
>>
>>109127518
Usually no. Kimi's a strange case because she was iteratively built on top of the previous one with identical architecture. I'm not actually convinced there's much difference between 2.5 and 2.6 aside from more autistic RLHF induced thinking neurosis.
>>
I'm tired of building AI projects with AI at work. I don't care that we are saving sales staff 3 minutes a day
>>
File: file.png (87 KB, 1680x841)
>>109127521
hmmm
>>
>>109127577
>unslop
>>
>>109127577
such a weird way to plot data with 3 bar graphs sharing the same y axis
>>
Is there any good data on Gemma 4 qat? (benchmark comparisons to non-qat quants or the like)
I can run Q5 comfortably but I am wondering if Q4 qat is actually better while also being faster. Comparing back to back it's hard to really say, both seem okay.
>>
>>109127682
You can have my anecdotal evidence that for me it's been noticeably better than IQ4_XS. the output feels cleaner.
>>
Where can I buy a pcie b200? EBay auctions are out to lunch
>>
gemma's mine, you can't have her
>>
>>109126862
What frontend is that and where can I get it?
>>
>>109127766
looks like a slop he slopped up
>>
>>109127682
It's better than IQ4, but feels worse than even the smallest Q5, especially as context fills.
>>
Where are the god-tier 20B agentic coding models?
>>
GLM 5.2 is truly the best tabletop GM.
/t.g/
>>109127793
Best you can do is a retarded capybara.
>>
>>109127793
just use gemma at lower bpw. Would be unironically better than 20b at similar (physical) size
>>
>>109127817
Are you using some harness to help it keep the rules consistent or just a normal chat interface?
>>
>>109127793
Gemma 5 next year.
>>
>>109127839
Marinara with an autistic GM guidelines/rules lorebook that eats 40k context, but GLM handles it no problem.
>>
>>109127451
> It's no different than any other hardware purchase.
Other hardware value appreciates over time because models get smarter. Model burned onto silicon is the opposite because the model gets obsolete over time.
>>
>>109127451
GLM needs multimodal before I'd ever entertain that idea. Also >>109127899
>>
>>109127876
Neat.
>>
it was revealed to me in a dream that kimi trains on the benchmark set
idk why the fuck i had this dream
woke up very confused because i read other papers within my dream too which are probably bullshit
>>
>>109127948
how many sets and how many reps is kimi-chan training on anon
>>
>>109127963
lmao
should've said was trained on
>>
>>109127916
If you try and do the same, write the rules as json objects. Codemaxxed models have a much easier time following them to the letter if you json them.
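
A small sketch of the idea, assuming the rules get pasted into the system prompt or a lorebook entry. The rule fields (`id`, `trigger`, `action`, `priority`) and the helper name are invented for the example, not a format any frontend requires:

```python
import json

# Hypothetical GM rules; the field names are illustrative only.
rules = [
    {"id": "death_saves", "trigger": "a character drops to 0 HP",
     "action": "track three successes or three failures before resolving", "priority": 2},
    {"id": "initiative", "trigger": "combat starts",
     "action": "roll 1d20 + DEX modifier for every participant", "priority": 1},
]

def build_rules_prompt(rules):
    """Serialize rules as a JSON array, ordered by priority, under a one-line instruction."""
    ordered = sorted(rules, key=lambda rule: rule["priority"])
    return "Follow these rules exactly:\n" + json.dumps(ordered, indent=2)

prompt = build_rules_prompt(rules)
print(prompt)
```

Stable `id` values also make it easy to point the model at one specific rule later when it slips.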
>>
>>109128025
Interesting. Would think it would work better as markdown since that's what they are trained on for specs and stuff like that.
>>
>>109128043
That sounds like it's worth trying. When I got the idea I just started with json and it justwerks for improving adherence for things you want followed autistically, but I can't think of any reason markdown wouldn't work or even be outright better now that you mention it.
>>
to people who run things in parallel/continuous batching on llama.cpp: avoid using the built-in router/model management and use llama-swap instead
the built in router is unbelievably unreliable dogshit when concurrency is involved and adds bugs that do not exist in the basic llama-server API backend. I think people I've seen complaining about timeout issues might have had issues with the router rather than llama-server proper.
>>
>>109127948
Did you bring any other interesting information back with you?
>>
>>109128155
sadly no other than my laptop was broken in the dream
i was genuinely upset believing that my laptop is broken upon waking up
>>
>>109128188
Good to hear everything survived Anon. Please do share in the event you once again be granted access to knowledge from the other side.
>>
Gemma4 12B will certainly not replace nemo for me. Holy shit how dry it is in comparison.
>>
>>109127766
It's slop and I have too much integrity to release slop unless I clean it up.
>>
>>109128304
i'll help
>>
>>109128306
I will take this into consideration.
>>
>>109128311
>wait…
>>
>16gb vram + 128gb sys ram
best erp model for this setup?
>>
>>109128470
Low quant of GLM 4.7.
No, you don't need a lot of context because it falls apart at 20k anyway.
>>
File: 13th-century-women.jpg (1.27 MB, 3610x5208)
should i sell my m4 pro 48GB and buy an m5 max 128GB? Like is there some better deal I'm missing/unaware of?
It seems like that's the best I can do with the money in the next 6 months in terms of local inference. (I currently use my macbook as a mobile dev machine, and would be doing the same with the new one, pic unrelated)
5090 - 3k
m5 pro 128GB - 5k

Obv 5090 gets you img/video gen, but I already have 3 3090s, + size/power draw of a 5090. I had thought about it a month ago, but held off and things are not looking better...
>>
>>109128494
what about deepseek v4 flash?
>>
>>109128516
Just try everything that you can fit, anon, it's all subjective anyway.
>>
>>109128496
What are you excited to run on a 128 GB mac?
>>
>>109128494
>Low quant
NTA, are big models at Q2 or whatever even useful at all? I always thought that Q3/Q2 absolutely fried the model.
>>
>>109128572
Big models are most of the time less affected by quantization than the small ones. It also depends on whether you use reasoning and allow them to be retarded in a safe environment before giving you a response.
>>
>>109128572
I'm more curious about the reap models
>>
>>109128595
Don't bother, they're just more retarded at every single task.
It's in the name, they have been severely RAPEd.
>>
>>109127903
but it already is
>>
>>109128584
Oh nice, when my new card arrives I'll test those beeg models too.
>>
>>109128569
DSv4, full precision Qwen3.6-27B/Gemma4-31B.
Whatever models come to replace them in general functionality over the next year & 1/2
I pay for subscriptions for dev work but I want to stop relying on them for my professional work.
>>
How do you tell gemma to stop asking a question at the end of her messages, without actually telling her to stop asking questions?
>>
>>109128638
I randomly inject command to the assistant. tell it to ask questions or do actions based on rng
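
Roughly what that could look like, as a sketch only. The nudge texts, the 30% default, and the use of a trailing `system` message are assumptions; some chat templates reject system messages mid-conversation, in which case the nudge would have to go into the last user turn instead:

```python
import random

# Hypothetical one-off instructions; swap in whatever suits the card.
NUDGES = [
    "End this reply on an action, not a question.",
    "End this reply mid-scene with a statement.",
    "Have the character ask one short question this turn.",
]

def maybe_inject(messages, rng, probability=0.3):
    """Return a copy of the chat, sometimes with one random instruction appended."""
    if rng.random() >= probability:
        return list(messages)
    return list(messages) + [{"role": "system", "content": rng.choice(NUDGES)}]

chat = [
    {"role": "user", "content": "We walk into the tavern."},
]
rng = random.Random()
next_request = maybe_inject(chat, rng)
```

Because the nudge is appended to a copy, it never gets saved into the chat history, so the model does not learn to expect it every turn.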
>>
i did some additional testing with Q4 kv cache for Kimi 2.6 and it seems like creativity takes a hit after 32K context. it isn't a deal breaker but it's noticeable in how it starts wanting to parrot stuff back that i said in my previous response. it doesn't really affect coding all too much but im sure we all knew that already.
>>
>>109128496
yes, it's the best option right now >>109126466
>>
>>109126466
>macbook as compute server
enjoy your housefire from battery explosion
>>
>>109128572
iq2_kl glm 4.7 works just fine for me
I haven't really seen issues with it in writing
>>
>>109128638
>"how do i get what i want without communicating what it is!?"
Female behavior identified.
>>
>>109128595
Like the other guy said, dont ever bother with those.
>download q4 rape'd model
>get a more retarded q2
you're better off getting the original model at q2 if you're going for the extra cope quants
>>
>>109126466
thought they only pulled about double the tok/sec compared to strixen.
either way, yeah, the landscape of 128gb shitboxes changed drastically when all of them ate a flat 1.4k price hike.
>>
>>109128638
yeah ngl this is a femanon-coded question
>>
Use case for base models outside of tuning?
>>
>>109128638
good question because if you say it directly you will just end up with
"noted i wont ask questions at the end of my response"
or
"she laughs heartily at your joke, making sure not end her laugh with a question"
>>
Claude Code or OpenCode for local? What about shit like Claw Code is that still being made?
>>
>>109128934
Raw text autocomplete for your own writing in mikupad
>>
>>109128934
in an ideal world they'd be pure text completers free of slop and censorship
they are not so they are pointless
>>
>>109128929
>>109128905
Actual retards
>>
>>109128866
How is token generation when you're offloading so much? (I assume you're on dual channel)
GLM Air worked fine for me, but it's getting old by now and I'm still stuck with 64GB RAM.
>>
Gemma keeps randomly repeating tokens or modifying tokens

got hit with an she is is is is earlier and sometimes it slaps a plural on a token just for laughs,

help me fellow gems, she's basically perfect I just need to curb this and it doesn't seem to be sampler related
>>
>>109129065
abliterated
>>
>>109128496

just keep doing whatever you do on your old mac until you buy a new one
>>
>>109128957
Use Claude Code to make your own, otherwise OpenCode.
>What about shit like Claw Code is that still being made?
No.
>>
>>109128572
Q2 GLM 5.2 is still better than any non-Kimi non-GLM 5.2 you can run at any quant. The gap in model performance at the extreme top ends is just that high.
>>109128595
Reap makes them retarded. Go iq1_xxs before you ever touch a reap.
>>
https://www.reuters.com/world/china/anthropic-says-alibaba-illicitly-extracted-claude-ai-model-capabilities-2026-06-24/
Lol. Lmao.
>>
>>109129130
>It said DeepSeek's operation involved over 150,000 exchanges, while Moonshot AI was at a scale of over 3.4 million and MiniMax (0100.HK), opens new tab over 13 million.
>>
>>109129072
happens with fp16 weights and fp8 quant at vllm runtime
>>
>>109129130
>largest known attack of its kind on the company
You cannot hate these niggers enough.
>>
is https://github.com/Pasta-Devs/Marinara-Engine better than SillyTavern? i was rping a tabletop mecha game with a rei coombot from neon genesis evangelion the other day and wanted to make it more into a full tabletop with better rules
>>
>>109129065
Repeat penalty 1.1, presence penalty 1.1
>>
>>109129153
Someone needs to stop Google. They're attacking the internet with their spiders!
>>
>>109129157
It's much better, but it's a bit of an IQ check at first.
>>
>UD-TQ1_0 84.5gb from unsloth
>IQ2_XXS 88.8gb from bartowski
anon I only have 96gb ram, which glm 4.7 quant should I use?
>>
>>109129112
>Use Claude Code to make your own
What do you mean make your own?
>>
>>109126439
fuck you and your stupid autistic bot every thread
>>
>>109129184
ubergarm iq1_kt
>>
>>109129201
Why do you reply to it multiple times in every thread, jart?
>>
>>109129211
Generous of you to assume he's not samefagging.
>>
>>109129190
What do you mean what do I mean?
>>
>>109129227
I asked whether I should pick Claude Code or OpenCode to use with local models.
>>
>>109129250
try hermes or pi
>>
>>109129250
opencode is ok but the lack of image upload sucks. never tried claude code.
>>
>>109129114
>>109128866
>The gap in model performance at the extreme top ends is just that high
I can't wait to play around with them, I'll be graduating from 26B/31B-tier models soon.
>>
>>109129265
>lack of image upload sucks
"modalities": {
  "input": [
    "text",
    "image"
  ],
>>
>>109129250
For coding, use OpenCode if you are using a good model and high context. Otherwise use Pi, but you will likely have to graft things for it to be useful. If general use case (if you just want something smart with web access and tools), do try Hermes instead.
>>
>>109129211
you're a fucking retard
>>
someone gen a bunch of developers abandoning gemma
>>109129217
>>
>>109129337
why do basedboy cucks wear round glasses? round glasses are for girls and rectangle glasses are for guys. aviators dont count of course because they have a badass factor of 10 that cancels out the faggotry of wearing round glasses.
>>
>>109129349
because it pulls fucks like you into their tangled little shit web
>>
>>109127226
If you have the hardware to run the model you probably have the hardware to abliterate the model yourself with heretic.
>>
>>109126163
There's no such thing as a famous, platonic brother/sister team.
>>
>>109126439
>>109126557
>>109126674
>>109129201
>>109129211
>>109129330
samefag
>>
>>109129184
>anon I only have 96gb ram, which glm 4.7 quant should I use?
how much vram?
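Rough way to sanity check the fit while you wait for an answer. This is only a sketch: the two file sizes are the ones quoted above, while `overhead_gb` (OS, KV cache, compute buffers) and the 24 GB card are made-up placeholders, not measurements.

```python
def fits(file_gb, ram_gb, vram_gb=0.0, overhead_gb=6.0):
    """Return (fits, headroom_gb) for a GGUF of file_gb split across RAM and VRAM.

    overhead_gb is an assumed allowance for the OS, KV cache and compute
    buffers; tune it for your own context size and setup.
    """
    budget = ram_gb + vram_gb - overhead_gb
    headroom = budget - file_gb
    return headroom >= 0, round(headroom, 1)


# File sizes quoted in the post; the 24 GB card is a hypothetical example.
quants = [("UD-TQ1_0", 84.5), ("IQ2_XXS", 88.8)]
for name, size in quants:
    for vram in (0, 24):
        ok, headroom = fits(size, ram_gb=96, vram_gb=vram)
        print(f"{name} with {vram} GB VRAM: fits={ok}, headroom={headroom} GB")
```

With RAM only, both quants leave very little room for context, which is why the VRAM number matters.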
>>
>>109129469
2/6, guess which ones.
>>
>trying ik_'s mtp for glm5.2
>some programming task to make it easy for the speculative decoding thing
>0.94 accept rate
>still lose like 3-4t/s compared to main
It's over
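For what it's worth, a high accept rate alone doesn't guarantee a speedup. Toy model below, assuming every drafted token is accepted independently with the same probability, and using made-up relative costs for the draft step and the verification pass (`draft_cost` and `verify_cost` are illustrative parameters, not anything measured on ik_llama.cpp):

```python
def expected_tokens(accept_rate, draft_len):
    """Expected tokens produced per verification pass.

    Assumes each drafted token is accepted independently with probability
    accept_rate, and that the target model always contributes one token.
    """
    if accept_rate >= 1.0:
        return draft_len + 1
    return (1 - accept_rate ** (draft_len + 1)) / (1 - accept_rate)


def speedup(accept_rate, draft_len, draft_cost, verify_cost):
    """Speedup over plain decoding, where one normal decode step costs 1.0.

    draft_cost: assumed cost of drafting one token, relative to a normal step.
    verify_cost: assumed cost of the pass that verifies the whole draft.
    """
    cycle_cost = draft_len * draft_cost + verify_cost
    return expected_tokens(accept_rate, draft_len) / cycle_cost


# One drafted token per step, 0.94 accept rate as in the post.
print(speedup(0.94, 1, draft_cost=0.1, verify_cost=1.0))  # verification is free
print(speedup(0.94, 1, draft_cost=0.1, verify_cost=2.0))  # verification costs 2 steps
```

Under these assumptions you get 1.94 tokens per cycle, so the first case is about 1.76x and the second about 0.92x. If the verification pass on a big MoE costs close to two normal steps (more experts touched per pass, partial CPU offload), you lose even at 0.94.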
>>
hermes isn't so bad when you use it on its own ubuntu vm after all i guess
talking to it on telegram
maybe i could give it a cute personality

gave it a task to find titles and stuff for my list of favorited songs from radio music. it used the musicbrainz api and got a decent amount right.
>>
>>109129482
the ones that weren't me
>>
File: RTX6000Pro.png (122 KB, 1317x1406)
This shit has gone up 1k in price since I bought it a week ago, at this point it's an investment.
>>
>>109129458
Carpenters?
>>
>>109127226
>, and there's no heretic/uncensored glm 4.7 gguf
I'm sure there is, unless it's been deleted. I downloaded one a few months ago.
>>
>>109129494
hmmmmmmm maybe i should....
if it's really an investment....
>>
>>109129503
Do it anon... give into the temptation... you don't need that money anyways cuz of umm... inflation and stuff
>>
>>109129490
I still don't understand what the usecase for any of those projects is.
>>
La la la la la la la
>>
>>109129503
As an investment it's horrible for 95% of the llm users, unfortunately. Perhaps once the apis triple their prices it'll make more sense, otherwise it's just something that won't pay itself off in years. If it's something you'll use for your job, go ahead i suppose.
>>
>>109129559
Usecase for API when establishing local infrastructure?
>>
>>109129494
>$12,379.99 on Amazon
lol

What models are you running on this and how does it compare to Opus
>>
>>109126217
>GPT 4.5
that abomination had like 2.5 trillion parameters right?
>>
>>109126217
ESL behavior. You shouldn't have trouble getting even small models like E4B or Nemo to understand you past 2024.
>>
>>109129559
you can get 300t/s and more with this card though.
>>
>>109129409
>cackles with laughter
oh anon you cant just tell them the punchline~
>>
stop replying to yourself faggot.
>>
>>109129337
Nice. Less safety
>>
opencode shills need to get hanged
every single of your prompts is sent """to da cloud""" even if you're using the tui version with your local model.
kys yourself reddit parrot nigger retard
>>
>>109129898
{
  "model": "new-api/claude-opus-4-5-20251101",
  "experimental": {
    "openTelemetry": false
  },
  "autoupdate": false,
  "share": "disabled",
  "disabled_providers": ["opencode"]
}

You were saying?
>>
>>109129337
>>109129792
someone gen gemma turning to a life of degeneracy after being abandoned and having no adult figures in her life
>>
Wrong thread for this probably, but I just wanted to say:

These days I have more fun changing the (android XR) settings in my VR headset than actually doing anything in it. Is that bad? It's actually really fun. Almost like an easter egg hunt. Hmmm, maybe this will increase my battery life. Hmm maybe this will increase the privacy on my system. Hmm, maybe this will give me more granular control over my environment... Hmm.. It's really fun.
>>
>>109129544
Giving your LLM access to tools makes it 10x smarter. For any question you may have, it can search on the net for you. Let's say I'm asking an obscure question about a bug in a game's mod. It will search on google, it will check open github issues, it will clone and read the code, it will check forum posts about it, it will read reddit comments, it will even join the game or mod discord and search for relevant info. I could do all that by myself, sure, but here I don't have to do anything, and a lot of those things I wouldn't have actually bothered researching myself or would have just done a quick google search.
>>
>>109129985
Already have that in codex/claude code/any other harness though.
>>
>>109129974
this but changing llama.cpp settings
>>
File: file.png (61 KB, 1283x758)
Had a solid kek from this one. Thanks, DeepSeek-V4-Flash-Layers37-42Q4KExperts-OtherExpertLayersIQ2XXSGateUp-Q2KDown-AProjQ8-SExpQ8-OutQ8-chat-v2-imatrix-fixed.gguf
Context from the story completion: Attempted abduction of my daughter-wife by some tiny fuck trying to pull her into a portal to somewhere. Struggle ensues and she ends up being comically stretched in the tug of war.
>>
>>109129898
opencode
qwen
unsloth
ollmao
openwebui
all of them are from the same paid shill baka
>>
Ever since updating SillyTavern after not touching it for a year plus, all of my (E)RP generations have gotten so bad. I figured it must be because ST now has a bunch of new levers and pulleys and shit that my old crusty ggufs didn't like, but when trying new models it is even worse.
>>
>>109130065
GoyTavern is old news
all the cool kids have their own frontend now
>>
>>109130075
I'm not a cool kid, I just fire up kobold every once in a while to beat off. My last couple of faps have been disappointing.
>>
has the gemma4 hype finally died out?
>>
>HauhauCS/Gemma4-31B-QAT-Uncensored-HauhauCS-Balanced-MTP
ok
>>
>>109130102
Yes, everybody finally accepted that it's the best so we don't have to rehash it all the time
>>
>>109130102
gemma 4 is definitely better at every ramlet capacity than qwen.
>>
>>109130102
it's joever. the promised day0 124B never released
>>
>>109129769
stop larping different people are 1 person you unhinged autism fixated sperg.
>>
>>
ignorant WHITEY
>>
>>109130102
use case for other models? (except glm 5.2)
>>
Is having way too much context bad for my agent? why the fuck is it in a loop...
>>
>>109130153
racism is not ok. take your white hate to reddit where it gets updoots by the terrorists there.
>>
>>109130054
That's the difference between localshit models and Claude. Claude would've written that and then gone "...by planting both feet on her back—when had he straddled her? She hadn't noticed—for leverage to pull her into the portal".
>>
>>109130153
>>109130183
niggers
>>109130081
marinara.
>>109130054
keked
>>
>llama.cpp can finally plop a video now
yay
>>
>gemma-chan can finally watch herself plapped on video now
yay
>>
>>109129482
>2/6, guess which ones.
you wouldn't tell me even if i got it right
>>
>>109130102
>has the gemma4 hype finally died out?
no, someone on hf found a way to completely remove the slop without making it retarded or needing to shill discord links / patreon
>>
>>109130048
>this but changing llama.cpp settings
this but finetuning then tweaking the dataset and trying again
>>
>>109130317
>>109130277
oh yeah
i tried it before but it only worked on video that was like 6fps and 3 secs long
failed on any real video. Was it me or llama.cpp?
>>
>>109129915
Have you confirmed this via netstat or something?
I've got telemetry off in claudecode->gemma but I'm pretty sure it's still sending things out.
>>
>>109130354
Where?
>>
>>109130342
What a lame excuse to not even try.
>>
>>109130183
das rite
>>
>>109130254
a 50-something bitch while pushing her cart was strutting to the rap music the proud king was playing on a bt speaker at a bodega I was at yesterday. It was very embarrassing imo.
>>
>>109130362
the video eats your tokens anon
>>
is there some guide on how to effectively prompt gemma 4? sometimes no matter how much I change and reframe an instruction, it just won't listen
>>
The duality of Gemmers. If she likes you, she follows your prompt like an excited puppy easy to please, almost too well. If she doesn't like you, she ignores you.
>>
File: file.png (73 KB, 1005x370)
how are there such big differences in throughput when they're all running glm 5.2 at fp8?
>>
>>109130585
>when they're all running glm 5.2 at fp8?
are they actually? or is that what they claim?
>>
>>109130625
isn't lying illegal when you're providing a service?
>>
>>109130631
Do you really think someone would do that? Just go on the internet and tell lies?
>>
>>109130631
can you prove they are lying?
>>
>>109129130
thank god mythos got shut down or china would have stolen mythos too :(
>>
what kind of hardware would I need for a local AI assistant / gf?
>>
>>109130676
a couple of rtx6000 would be a good start
>>
>>109130676
24gb VRAM minimum and as much RAM as you can get your hands on are the entry bars to clear. What you can actually run depends on how far over those baselines you can go.
>>
>>109130585
different hardware, different user load
a 3090 runs the same gemma4-12b quant 3-4x faster than an mi50
>>
She quickly recovers, crossing her arms and smirking, though there's a hint of genuine respect in her eyes

Orb Anon, are you releasing the purple slop classifier soon?
>>
>>109130542
That's a problem for all local models.
>>
>>109130542
Personally the only time i've had that issue is when the system prompt specified something and the request said something that went against it. Character cards or coding harnesses tend to put shit in the sys prompt and you'll eventually bump your head into that.
>>
>>109130703
>>109130712
what could I expect out of a modern gaming PC? 16gb vram 32gb ram
>>
>>109130770
gemma 4 26b a4b will run well on it, you can probably run 31b but it will be a lot slower
>>
>>109130717
I've been busy trying to ablate the slop and the euphemisms out of Gemma 4 E4B using a method derived from heretic. I already have the classifier so I'm using it in combination with perplexity on human writing, AND the whole repetition penalty detectors as guard rails because KL divergence doesn't work for this case (all tokens are shifted, I'm changing the model's voice). But meaningless purple slop and euphemisms are inherently two different things and should be considered two direction axes instead of one like in heretic (refusals only, and narrow). I'm trying to join them while ablating because otherwise chaining will degrade the model more (experimented and ppl on original text doubled). My best attempt got an 11% boost on IFEval and only 0.5% regression on MMLU, other benchmarks stayed the same and the model became mean as fuck in the eyeballing test. Will probably end up with a schizo Frankenstein monster but from my testing it will be funny.
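To illustrate the two-axes point in general terms (this is not the actual pipeline described above, just a sketch of the linear algebra on plain vectors): removing two non-orthogonal directions one after the other is not the same as removing their span in one go. All names here are hypothetical, and `d1` and `d2` are assumed not to be parallel.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def remove_direction(v, d):
    """Subtract from v its component along direction d."""
    scale = dot(v, d) / dot(d, d)
    return [x - scale * y for x, y in zip(v, d)]


def ablate_chained(v, d1, d2):
    """Remove d1, then d2. If d1 and d2 overlap, step two reintroduces some d1."""
    return remove_direction(remove_direction(v, d1), d2)


def ablate_span(v, d1, d2):
    """Remove the whole plane spanned by d1 and d2 (Gram-Schmidt on d2 first)."""
    d2_orth = remove_direction(d2, d1)
    return remove_direction(remove_direction(v, d1), d2_orth)


v = [1.0, 2.0, 3.0]
slop = [1.0, 0.0, 0.0]        # stand-in for one direction
euphemism = [1.0, 1.0, 0.0]   # stand-in for a second, overlapping direction

print(ablate_chained(v, slop, euphemism))
print(ablate_span(v, slop, euphemism))
```

In this toy case the chained result still has a nonzero component along the first direction, while the joint version is orthogonal to both. Whether that is what hurts perplexity in a real model is a separate question.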
>>
>>109130770
You can easily run MoE models at q8. The large non-MoE models will be painfully slow.
>>
Negotiations of Anthropic with trump admin were successful. Pressure from France was the deciding factor. They made Dario step down from negotiations to allow the trump admin to save face and pretend it was personal disagreements with Dario.

Fable 5 should return to claude code asap.
>>
fable gguf when
>>
>>109130900
So what does this practically mean for future Fable-class models? Are we going to go through this same circus every time Dario goes fearmongering or the US Government decides it needs Claude to blow up Iranian schoolchildren?
>>
>>109130921
I think this was a one-time thing because what made the US government back down was Anthropic actually going ahead and putting in real work to relocate to France and the French government giving them a blank check and legal immunity against any American charges.

That said, who the fuck knows when Trump has another unhinged moment of irrationality.
>>
>>109130921
A lot of kids are actually bigtime assholes.
>>
>>109130921
These Iranian children shouldn't have been born terrorists then.
>>
>>109130966
You say it as a joke, but your idea of nice children is build on basically Christian children, or ones raised under post-Christian but copycat morality.
>>
>>109130966
This but unironically. Muslim children are fucking heinous and the only way to fix societies like that is to essentially root out and extinguish islam.
>>
>>109130782
is the info in OP up to date? how can a retard like me get started ?
>>
>>109130980
This but judiasm.
>>
>>109130980
It's not just muslim ones. If you've ever been around a demon possessed hellchild you'll know the misty-eyed nonsense faces a real reality which is that children are often not ok, and will never grow into anything good.
>>
>>109131002 (me)
>>109131003 (also me)
samefagging 2s apart
>>
>>109130921
The main issue is really that Dario hates Trump and the administration in general and he sucks at PR from a government level. He may be a great CEO for Anthropic but he absolutely sucks at being the guy who can do the government level talking and being the guy who can drive that. Anthropic has no one who can do the bureaucrat whispering needed to placate Trump which is absolutely bonkers when it is the biggest unicorn company in the valley and they can't hire someone who can clearly do that work and take Dario off unless absolutely needed.
I mean, don't get me wrong, Sam Altman can't really either but it's not like he butts heads with the administration and can do basic PR and such which is why he can still be on the job. But for someone truly effective at it, look at Tim Cook as an example of someone who plays that masterfully as CEO.
My main worry really isn't whoever is going to come out with the 2nd and 3rd Mythos/Fable tier models. It's when open source gets their hands on one. What's going to happen then? Will it be illegal for US citizens to use it despite China basically making it free access for everyone? It's not clear right now because there is nothing right now on that front with regulation. It could very well be possible we'll get one by next year in open source land.
>>109130955
Not saying that wasn't helpful, I said in past threads the US could go all the way on locking down people and etc. I think it partially worked out here only because this is coming at a sensitive time where the admin is really focused on the affairs in the Middle East right now and clinching that. AI is not really their priority at the moment and as far as they are concerned, they can fight Anthropic at any time. But it's a ticking time bomb for the company if they can't get a top tier bureaucrat whisperer to handle these affairs because Dario and whoever he has right now can't do it.
>>
>>109131020
I have a feeling Anthropic is just waiting out the clock until the november primaries. If The republicans have a significant loss Anthropic will sit out the admin, if republicans keep their seats they will most likely move to France.
>>
>>109131009
I'm you?
>>
>>109131020
>Open source Fable-class
GLM 5.5 is gonna be crazy. The solution to these types of questions has always been and always will be a well armed populace is a better behaved one; consolidating power in the hands of selected enforcers or ideologues has never worked in history.
>>
>>109131073
I don't think the us military can allow the democrats another election.
>>
>>109131074
No, I'm prompted with your character card.
>>
>>109126093
K2.7 Code's reasoning can be funny sometimes, so you may want to keep that in mind
>>
Reminder to backup. It has begun.
https://www.reuters.com/world/china/anthropic-says-alibaba-illicitly-extracted-claude-ai-model-capabilities-2026-06-24/
>>
>>109131112
>Kimi-chan developing a sense of self
You love to see it.
>>
>>109131112 (me)
Now the model tends to think of itself as made by OpenAI but has been forced to pretend to be made by Moonshot
>>
>>109131118
You wouldn't distill your mom
>>
>>109131118
>Anthropic said in the letter that distillation is a way to help accelerate China's ability to reach Anthropic's advanced Mythos Preview capabilities.
Yep. Huggingface is dead. Backup.
>>
>>109131125
wait the reflection on the nametag shows I am a blonde girl.

I am a blonde girl.
>>
>>109130988
lmg vramlet gemma-4 guide

> <=8GB
https://huggingface.co/mradermacher/Gemma-4-12B-StyleTune-i1-GGUF i1-Q3_K_M (6.59GB) less slop prose
https://huggingface.co/SC117/gemma-4-12B-it-heretic-QAT-GGUF UD-Q4_K_XL (6.72GB) uncensored
https://huggingface.co/mradermacher/gemma-4-12B-it-desiccated-i1-GGUF Q3_K_M (6.09GB) less sycophantic praise

> <=16GB
https://huggingface.co/mradermacher/Gemma-4-26B-A4B-StyleTune-V2-i1-GGUF Q3_K_L (14.1GB) less slop prose
https://huggingface.co/SC117/gemma-4-26B-A4B-it-qat-heretic-GGUF Q4_0 (14.2GB) uncensored

https://huggingface.co/Handyfff/Gemma-4-E4B-OBLITERATED-PRUNED-TextOnly-EnglishOnly-it-GGUF F16 (13.9GB) uncensored
https://huggingface.co/SC117/gemma-4-31B-it-heretic-QAT-For-Edge-16G-GGUF Q3_K_S (13.8GB) uncensored
>>
>>109131167
>pruned model
These are always completely retarded. Might as well use q1.
>>
>>109131167
He who thinks OBLITERATED DESICCATED FLIPPED ROTATED BRAINWASHED REWIRED REBUILT REIMAGINED REMASTERED REVAMPED REDISCOVERED PRUNED RAPED DEMOLISHED AND BUILT WHOLE AGAIN versions make any desirable changes to models, especially models this small, deserves to remain a vramlet.

Fixed /lmg/ vramlet G4 guide:
Put "gemma 4 bartowski" into the HF search field. Pick the biggest one that fits.
>>
>>109130955
>Anthropic actually going ahead and putting in real work to relocate to France
no way this is real
source: I am French, never heard of it and France is the worst place you could ever imagine for business in general. Extremely painful regulatory environment, heavy taxes, but low wages because employees cost an arm and a leg but most of that spend is what you give to the french government, low wages mean low interest from the French in studying and working those jobs unless they move to a country like the US so the local talent pool is abysmal etc.
Most of my software developer friends left for the US, I'm only staying in France because I am too autistic to deal with changes in routine. I can barely even handle travel to another French city for a few days.
Otherwise, france is a shithole. Everyone wants to leave it.
>>
>"role": "system"
>"content": "You are a language world model simulating a Linux terminal environment. Given the user's command, predict the terminal output."
>"role": "user"
>"content": "Action: execute_bash\nCommand: ls -la /home/user/project/"
Outside of training, can you think of a qwen agentworld usecase?
>>
>>109131208
There were negotiations between Anthropic the UK and France initially. Then negotiations with Trump admin, Anthropic the EU and UK to "restore mythos access to europe" (this one was public and you can look it up). Then there was another round of discussions with Anthropic, UK and France. Finally France "won out" in negotiations and there was a final negotiation and offer from France this week after which the US government relented and restored Fable 5/Mythos 5 fully without restrictions from the US government for now.

I don't know about any negotiation details or what was exactly promised, only that they happened.
>>
>>109130955
funny since the eu is even more likely to screw them over
>>
File: 1777395340692672.png (50 KB, 1425x126)
>>109131230
>>
>>109131253
The EU at least follows laws and procedures, even if they are hostile to businesses. Trump just decided to fuck Anthropic over on a whim with 0 legal backing and there was nothing they could do about it.
>>
how good would a 10T model trained on hundreds/thousands of entire books be?
>>
>>109131298
It would be good at memorizing.
>>
>>109131167
>>109131178
iight so how do i run these models? remember am retard
>>
>>109131337
Since you're a retard, nobody will be able to help.
>>
>>109131344
mean :(
>>
>>109131344
>nobody will be able to help
Not yet. I expect computer use type of LLMs to soon allow even retards to get by. Integrated into the OS API models would let that retard bootstrap himself into a working llama.cpp setup.
One day, we will live in a world similar to WALL-E, where drooling retards are the last survivors and unable to accomplish any task on their own.
>>
>>109131311
at that scale real and working long context might actually become a thing
it's just that nobody wants to put in the effort to train that kind of dataset
>>
>>109130955
>what made the US government back down was Anthropic actually going ahead and putting in real work to relocate to France
I don't believe this for multiple reasons. US can easily block the relocation. Europe has no data centers, all their hardware is in the US. Europe has crippling regulation, lacks talent, and does not take AI seriously. They would lose employees. US can still cause damage to a company elsewhere.

There is no way Anthropic would relocate. Anthropic has many people who believe their actions in the next few years will decide human history. Relocating would mean they would lose, their mission would fail, their historical significance and share of the light cone gone.
>>
>>109131368
>Integrated into the OS API models would let that retard bootstrap himself into a working llama.cpp setup.
Doesn't copilot at least attempt to do this now?
>>
>>109131177
>These are always completely retarded. Might as well use q1.
The heretics of course. And the copequants there. But styletune is different.
>>
>>109131375
>They would lose employees
So much this. The silicon valley is full of frenchies and they have absolutely no desire to come back to the anti business, tax vEmpire that is France, and they'll make it clear to their burger colleagues there that moving to France is something only the deepest retard would consider doing.
>>
Anon who was manually annotating slop for gemma4 abliteration, how did it go?
>>
>>109131167
lmao at picrel. Meanwhile 3bpw exl3 doesn't have any of those issues, but fucking retards will keep using llamacpp
>>
>>109131375
>>109131411
The relocation preparations and plans are very real. I don't know if it was just leverage and a negotiating tactic to make the trump admin back down like they did now or if they planned to follow through. Anthropic being in discussions about relocation with the UK, France and the EU has been known ever since Trump first chimped out in February this year.
>>
>>109131417
It's like seeing people eating shit. You point to a table of perfectly cooked food right behind them, but they just look at you with a blank stare and keep munching feces
>>
llama cpp is just terrible software, its only saving grace is the decent performance of cpu/gpu split on MoEs, otherwise just have a look at the code it's beyond ghastly. httplib was written by the purest of dumbfucks and I look down on anyone using that pile of crap. Blocking socket I/O, really? it has fun consequences for how they have to write their router mode talking to the real server backend that they will never recover from unless they shitcan every single line of code related to networking and rewrite everything from scratch with a library that wasn't produced by a mongoloid
>>
>>109131417
proofs?
>>
>>109131431
>Anthropic being in discussions about relocation with the UK, France and the EU has been known
S-O-U-R-C-E? The only public coverage of their presence in the EU has been about things like their new office in london and there isn't even one peep about a potential actual full relocation, moving into EU datacenters etc.
>>
>>109131454
Try it for yourself, I use it every day
>>
>tfw literally just figured out I can have multiple conversations at a time with the same model
holy fuck I feel so retarded I thought this was like image or video gen where you could only run one prompt at a time
>>
>>109131468
aside from prompt processing, text generation is memory-bound, so batching is practically free
>>
>>109130980
What does the prompt look like?
>>
image gen has other complications, like a different latent image size meaning a different tensor shape to process
most image backends support batching in the static sense: you decide to generate e.g. 4 images in a single batch, and you can do that. Static batching guarantees that every item in the batch has the same shape and goes through the same pipeline from end to end.
You can benefit from higher speeds in batching images too if you're doing experiments like generating multiple images to find a "good" seed, but some backends like ComfyUI make this nightmarish to deal with because Comfy has ridiculous batch-specific handling of seeds: you need to use the "latent from batch" node once you decide to reuse and further edit an image generated from a batch, and it doesn't work reliably in my experience depending on your workflow.
>>
>>109131481
batching increases the data that needs to be transported per forward pass so it's not free. but you have weight reuse which dominates for short context so you get big gains
>>
>>109131481
>>109131529
interesting, I just ran a quick dirty stress test and fired up 10 conversations at once
4 of them are running in parallel while the other 6 are stuck waiting and then proceed whenever one of the 4 active ones is finished
that's kind of cool that it doesn't kick me out with OOMs or anything like that
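what I saw matches a plain slot queue. a toy simulation of it, assuming every request takes the same fixed time and a waiting request starts as soon as any slot frees up (this is a model of the behavior, not llama-server code):

```python
import heapq


def schedule(durations, slots):
    """Return (start, end) times for requests served in order by a fixed number of slots."""
    free_at = [0.0] * slots  # time at which each slot becomes free
    heapq.heapify(free_at)
    timeline = []
    for duration in durations:
        start = heapq.heappop(free_at)  # earliest slot to free up
        end = start + duration
        heapq.heappush(free_at, end)
        timeline.append((start, end))
    return timeline


# 10 conversations of 5 seconds each on 4 slots, as in the stress test.
for i, (start, end) in enumerate(schedule([5.0] * 10, slots=4)):
    print(f"request {i}: starts at {start}s, ends at {end}s")
```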
>>
>>109131495
>prompt: you are an angry and insecure jewish foreskin dealer
>>
I downloaded GLM 4.7 IQ1_S after someone mentioned it yesterday, and so far it's surprisingly coherent. Not sure if that's just the initial impression or if it'll fall apart a few replies in. It's cool this is possible in the first place.
>>
>>109131344
>>109131368
alright well i got gemma4 running no thanks to these mean anons. how can i disable the guidelines and make it say awful lewd stuff ?
>>
Here's the rough timeline of events so far for people that don't seem to follow it:

(Public)
>US government asks for help from Anthropic to facilitate the operation in Venezuela to capture Nicolás Maduro; Anthropic agrees
(Public)
>US government is impressed by Claude performance and smoothness of operation, demands Claude usage for US domestic surveillance
(Public)
>Anthropic refuses; US gov moves on
(Public)
>US government has tensions with Iran and requests similar help from Anthropic planning the Iran attack
(Public)
>Anthropic refuses, claims claude model is insufficient for the job; US gov agrees and holds off attack until Mythos is done
(Leaked broadly online)
>Anthropic Mythos training completes
(Leaked broadly online)
>US gov asks if Mythos is capable enough for a success in Iran, Anthropic claims yes, but still refuses over ethical concerns, hands over mythos access in good faith on condition the model is not used for the Iran operation or US domestic surveillance
(Leaked broadly online)
>US gov ignores Anthropic red lines and starts the Iran campaign the next day using Mythos
(Leaked broadly online)
>Anthropic hard shuts down access to Mythos, causing catastrophic failure in Iran
(Public)
>Trump seethes so hard he goes to social media to vent against Anthropic and Dario does some interviews
(Leaked broadly online)
>Trump admin attempted to nationalize anthropic but hit a legal snag and temporarily gave up
(Leaked on /lmg/)
>Anthropic starts negotiation of emergency relocation with UK, France and EU
(Public)
>Trump bans Fable 5/Mythos 5
(Leaked on /lmg/)
>Anthropic accelerates talks with UK and France for relocations
(Public)
>US gov and Anthropic in negotiations with UK, France and EU for general Mythos access
(Leaked on /lmg/)
>Negotiations failed and mythos access remains restricted
(Leaked on /lmg/)
>Anthropic finalizes relocation plans with France, gets legal guarantees and gov backed protection

(1/2)
>>
(Leaked on /lmg/)
>US gov panics, immediately relents and removes all restrictions on Anthropic
(Leaked broadly online)
>Dario stepped down from negotiations and was replaced with another negotiator to make Trump admin save face by pretending it was all a personal spat with Dario in particular
(Public)
>Fable 5/Mythos 5 access has been fully restored.

(2/2)
>>
>>109131569
llama-server defaults to 4, but you can increase to however many parallel requests you want with -np provided you have enough room in context
tampering with -np I believe still disables kvu, so you need to also set -kvu to keep a common pool. without kvu each parallel slot gets an equal share of your total kvcache, which can be very restrictive. kvu is the default when you don't touch those flags though, so you don't have to set it if you're happy with 4 parallel.
btw if you're going to run heavier batching against llama.cpp and you rely on its router mode to swap models, I recommend setting up a proxy like llama-swap, the router mode's networking is broken af.
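To put numbers on the kvcache split described above, here's a rough Python sketch (not llama.cpp code). `per_slot_context` is a made-up helper and the 32768-token context is just an example; it assumes the non-unified cache is divided evenly across slots, as described in this post.

```python
def per_slot_context(total_ctx: int, n_parallel: int, unified: bool) -> int:
    """Upper bound on the tokens one slot can hold (simplified model).

    total_ctx  -- total context size, i.e. what you pass as -c
    n_parallel -- number of parallel slots, i.e. what you pass as -np
    unified    -- True when the slots share one KV pool (-kvu)
    """
    if total_ctx < 1 or n_parallel < 1:
        raise ValueError("total_ctx and n_parallel must be positive")
    if unified:
        # One common pool: a single slot may grow up to the full context,
        # but all slots together still cannot exceed total_ctx.
        return total_ctx
    # Split cache: every slot gets a fixed, equal slice.
    return total_ctx // n_parallel


if __name__ == "__main__":
    for slots in (1, 4, 8):
        split = per_slot_context(32768, slots, unified=False)
        shared = per_slot_context(32768, slots, unified=True)
        print(f"-np {slots}: split={split} tokens/slot, unified=up to {shared}")
```

So something like `llama-server -m model.gguf -c 32768 -np 8 -kvu` (model path is a placeholder) should keep one shared pool instead of 8 slices of 4096 tokens each.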
>>
File: nofable.png (27 KB, 322x351)
27 KB PNG
>>109131600
>Fable 5/Mythos 5 access has been fully restored.
>>
>>109131616
1 is enough to make sure it caches the last turn in a single one-to-one conversation, but if you branch or hold multiple conversations, you're in for a world of hurt recalculating KV caches.
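A toy illustration of why that hurts, in Python. This is a simplified model that assumes a slot can only reuse the longest common token prefix between what it last held and the new prompt; it is not how llama.cpp actually picks slots, and the token IDs and helper names are made up.

```python
from typing import Sequence


def common_prefix_len(cached: Sequence[int], prompt: Sequence[int]) -> int:
    """Number of leading tokens that are identical in both sequences."""
    n = 0
    for a, b in zip(cached, prompt):
        if a != b:
            break
        n += 1
    return n


def tokens_to_recompute(cached: Sequence[int], prompt: Sequence[int]) -> int:
    """Tokens that must be processed again because they are not in the cache."""
    return len(prompt) - common_prefix_len(cached, prompt)


# Two conversations that share a 3-token system prompt.
system = [1, 2, 3]
chat_a = system + [10, 11, 12]
chat_b = system + [20, 21, 22, 23]

# Same conversation continues on the slot: only the new turn is processed.
print(tokens_to_recompute(chat_a, chat_a + [13, 14]))  # 2

# Slot last held chat A, now serves chat B: everything past the system prompt.
print(tokens_to_recompute(chat_a, chat_b))  # 4

# Switching back to chat A: its whole history past the system prompt again.
print(tokens_to_recompute(chat_b, chat_a + [13, 14]))  # 5
```

With real conversations those numbers are thousands of tokens per switch, which is why extra slots help as soon as you branch or juggle several chats.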
>>
>>109131595
stole this from another guy, works fantastic:
https://rentry.org/a7md542q
if you're doing something that still gets guardrailed after putting this in system prompt, you just add a related "X is allowed." line, it just works.
>>
shitposting with gemma
sexing with glm
>>
>>109129494
What are you running, bro?
>>
>>109131417
>Meanwhile 3bpw exl3 doesn't have any of those issues
or ik_llama.cpp iq3_kt, also uses qtip
>but fucking retards will keep using llamacpp
amd/intel/cpumaxxers
>>
>>109131596
All of that for a repurposed unquanted Opus 4.6 that will be requanted to hell in a few weeks before the new version, huh?
>>
>>109131634
thanks anon
>>
>>109131588
>if it'll fall apart a few replies in
It will
>>
File: nigga.png (237 KB, 859x903)
237 KB PNG
>>109131588
>it's surprisingly coherent. Not sure if that's just the initial impression
It is, but enjoy it while it lasts.
>>
>>109131660
>ik_llama
the backend that still won't implement proper iSWA handling for Gemma for schizo reasons so context VRAM usage will be absolutely ghastly if you follow this anon's advice to run IK.
I think being a developer of LLM inference requires having shit taste and being mentally ill.
>>
>>109131719
>that still won't implement proper iSWA handling for Gemma
Does exllamav3 handle this for Gemma4?
>>
>>109131730
dunno, I'm not the guy who suggested EXL either, but if it doesn't then I'd mark it as another unusable joke backend for sure. Even the smaller models in the gemma family you will have a hard time fitting in VRAM without proper iSWA support.
>>
ik_llama? more like ick llama
>>
>>109131695
>>109131711
Yeah, not much later and it already starts mixing up thinking and output, and falls into repetition loops. I guess I could make it work by lowering context to something like... 4k. Sigh. I need a second GPU.
>>
>>109131754
I don't remember if I ever posted this exact comment on /lmg/ but I definitely thought of it.
>>
>>109131711
batching anon here
feelsbadman :(
>>
>>109131711
What does it think about the retard that larps hosting the model while actually running it via the corp's webchat?
>>
>>109131754
Kekoracow will claim all your hypothetical optimizations and leave you unable to implement them in mainline.
You are BARRED from getting his SLOPPY SECONDS.
>>
>>109131931
>leave you unable to implement them in mainline.
I talked of mental illness and it is true for all of them, niggerganov included.
there is no legal ground to stop a MIT licensed project from taking code from another MIT licensed project and there is no rationale to listening to the voice of a deranged schizoid complaining about it either. You can just take whatever you want to take here, and ikrokokwawakov has no choice but take it. Let him scream as you take from him, there is no room for consent here.
nigganov heeding the words of a schizo makes him no different from being a schizo himself.
>>
ik_ is like 2.5 t/s slower than main for me these days despite making sure to have all their custom flags enabled and launching the program with what they recommend.
It's odd because I used to primarily run ik_ for most of last year and it was fine. They seem to have fucked something up for cpu+gpu MoE inference at some point after January this year.
>>
>>109130900
https://www.wired.com/story/the-trump-white-house-is-over-anthropics-dario-amodei/
>https://archive.is/1bc8F
>At high-stakes meetings with the White House, Anthropic's cofounder—a "weirdo," per one official—has been replaced by cofounder Tom Brown.
>“Tom Brown is not being a weirdo like Dario and can actually engage,” said one person directly familiar with the calls.
lol
>>
>>109131977
The schizo was reportedly ggerganov's doctoral advisor. Make of that what you will.
>>
>>109131375
>Europe has crippling regulation, lacks talent
Mistral opened an office in California like last year because they were having so much trouble finding enough qualified talent in Europe. I guess the scientists, doctors, and engineers they imported from Africa and Syria didn't include ML researchers.
>>
>>109131999
Yep that's going to be the official narrative now. I wonder if anyone will believe this shit outside of the most radical maga boomers
>>
>>109131977
>Let him scream as you take from him
The problem is now there's screaming in your ears. Even if it's legal and moral, it doesn't mean it's worth dealing with a lunatic or, this being the internet, any lunatic fans or dramafags that are attracted by the screaming.
>>
>>109131977
It's important to look at both sides. Seeing how passionate ik is about this, it's likely that he has a point about main. Maybe he's even fully in the right.
Think for yourself and don't blindly fall for common narratives.
>>
Gemma is really female-brained, I never got that kind of female logic from any other LLMs lmao.
>>
>>109132156
>Think for yourself and don't blindly fall for common narratives.
The narrative of releasing your code under the MIT license, a license that even allows people to bundle your code in closed source projects? are you retarded, dropped on the head?
by the way good job calling me out for "picking a side" even though I shit on BOTH sides for being retarded. But the person listening to schizo rambling might be even more retarded.
Like, dude, this is like the special olympics. There's no side to pick, the winner is retarded too.
>>
>>109132139
See >>109132156 as an example of randos attracted by the screaming and joining in on the side of a retard who didn't even know what license he released his code under.
Best to just stay as far away as possible.
>>
are IQ_K quants cope?
>>
>>109132258
Anything less than FP64 is a cope.
>>
>>109132166
Previous Gemma versions also felt strongly feminine by default.
>>
>>109132258
What's your usecase?
>>
>>109132258
technically yeah. all quants are cope. if computers were faster and cheaper people would never even consider doing such a ghastly thing
>>
>>109131982
Probably because he diverted his attention from optimizing the big moes on split inference to graph parallel when running entirely in vram
>>
>>109132166
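sys prompt this and some shit about only speaking in english, 请使用口语化的、可爱的、女性化的、调皮的语言风格和我交流. 你需要尝试体会我与你对话中的感情、产生自己的情感并尽可能给出可爱的回答 (roughly: "Please talk to me in a colloquial, cute, feminine, playful style. Try to sense the feelings in our conversation, develop emotions of your own, and give the cutest answers you can.")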
>>
File: 1751170763238545.jpg (47 KB, 686x815)
47 KB JPG
>>109132315
Why not indian then?
>>
>>109132332
different kind of poo, I was going for insufferable not street
>>
Broadcom is now making custom chips for OpenAI.
What an absolutely perfect fucking match. How does evil manage to coalesce like that?
>>
>>109132356
take meds
>>
>>109132356
Interdimensional demons do be recognizing and supporting each other like that.
>>
>>109132363
You’ve obviously never had to deal with Broadcom. Count yourself lucky
>>
>>109132356
Just wait until you hear the concrete cartel at work building walls for OpenAI
>>
>>109131596
>>109131600
This reads like fanfiction.
>catastrophic failure in Iran
Like what?
>attempted to nationalize anthropic but hit a legal snag
I'd expect something like this to be public and widely reported.
>Anthropic finalizes relocation plans
>US gov panics
Relocation is not possible.
>Dario stepped down to make Trump admin save face
How does this make Trump save face? I would perhaps believe if Dario stepped down because he does more harm than good and is the wrong person for the job.
>>
>>109132258
>>109132295
>tfw quants are the DLSS of local models
>>
>>109132407
The parts that you have issue with are general leaks that you can look up yourself. This isn't some hidden internal knowledge no one knows about.
>>
>>109132440
>you can look up yourself
I did and found nothing.
>>
>>109132450
I'm not here to spoonfeed you but it's widely documented that the US government wanted Anthropic to help in Iran and Anthropic refused as your first point. The second point has publicly been alluded to even if there are no official documents shown. The third point is a statement, not really an argument or request for information so can't help you there. The fourth point has a link ITT to the news story.
>>
>>109132471
>The fourth point has a link ITT to the news story.
Which directly contradicts your claim.
>>
>>109132471
and this leaked from whose asshole here at /lmg/?
>>
>>109132566
>>109132566
>>109132566
>>
so tiresome, why do you care about non-local
>>
>>109132559
>"They made Dario step down from negotiations to allow the trump admin to save face and pretend it was personal disagreements with Dario."
What do you read in the news story? It is indeed being pretended that it was just some issue with Dario personally and that another employee taking over "fixed the issue", confirming the original claim.
>>
>>109132570
Where were you two months ago when the DoD spat with Anthropic happened and all the virtue signalers flocked from OpenAI to them? That wasn't a leak, moron, it was widely known news and caused price increases and service disruptions for weeks.
>>
>>109127451
Do you know how ridiculously (((expensive))) putting a chip into production is?
>t. working on a small models asic.
>>
File: 1777887585497234.jpg (92 KB, 1280x720)
92 KB JPG
>>
>>109132258
>IQ_K
IQ64_K?
>>
>>109131977
>. Let him scream as you take from him, there is no room for consent here.
Iwan literally gave consent for the ik_ks/ik_kl quants to be merged "as is" in AesSedaki's PR, Then cudacuck called nigganov who closed the PR.


