/g/ - Technology

File: 209643d0d70b879d.png (527 KB, 526x865)
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>108528880 & >>108526503

►News
>(04/02) Gemma 4 released: https://blog.google/innovation-and-ai/technology/developers-tools/gemma-4
>(04/01) Trinity-Large-Thinking released: https://hf.co/arcee-ai/Trinity-Large-Thinking
>(04/01) Merged in llama.cpp: rotate activations for better quantization (#21038): https://github.com/ggml-org/llama.cpp/pull/21038
>(04/01) Holo3 VLMs optimized for GUI Agents released: https://hcompany.ai/holo3
>(03/31) 1-bit Bonsai models quantized from Qwen 3: https://prismml.com/news/bonsai-8b

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling
Token Speed Visualizer: https://shir-man.com/tokens-per-second

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
>>
>>108532524
gemmy
>>
File: 107046003_p0_master1200.jpg (1 MB, 1200x1200)
►Recent Highlights from the Previous Thread: >>108528880

--Optimizing context window and VRAM usage for Gemma 4 31B:
>108529635 >108529638 >108529644 >108529666 >108529655 >108529661 >108529702 >108529722 >108529800 >108529810 >108529818 >108529775 >108529839 >108529842 >108530825 >108529866 >108529895 >108529687 >108529908 >108529873
--Comparing Gemma 4 31B base and instruct model performance:
>108530799 >108530803 >108530939 >108530954 >108530989 >108531855 >108531863 >108531879 >108531885 >108531889 >108531898 >108531914 >108531944 >108531886 >108531895
--Troubleshooting Gemma 4 sampler issues and comparing inference backends:
>108531072 >108531097 >108531116 >108531124 >108531161 >108531126 >108531168 >108531227
--Optimizing Gemma 4 sampler settings and debating completion modes:
>108529900 >108529931 >108529957 >108530030 >108530051 >108529971 >108530003 >108530205 >108530224 >108530227 >108530226
--Discussing Gemma 31b roleplay performance and fixing model passivity:
>108531221 >108531230 >108531245 >108531344 >108531305 >108531339 >108531342 >108531377
--Gemma 4 base model's ability to mimic unfiltered internet forums:
>108531077 >108531103 >108531105 >108531117
--Debating TurboQuant's actual performance and claims versus "influencer brain rot":
>108531387 >108531396 >108531409 >108531429 >108531422 >108531400 >108531440 >108531549
--Using custom .jinja templates in SillyTavern via llama.cpp:
>108531707 >108531715 >108531719 >108531729 >108531730 >108531839 >108532075
--Discussing Gemma 4 performance, quantization, and backend setup for 24GB VRAM:
>108531918 >108531929 >108531942 >108531961 >108532013 >108531974
--Bypassing Gemma 4 filters for NSFW image descriptions:
>108531281 >108531291 >108531302 >108531303 >108531304 >108531320
--Miku (free space):
>108529592 >108530781 >108530807 >108530951 >108531005 >108531404

►Recent Highlight Posts from the Previous Thread: >>108528883

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>
e4b is just too retarded to gaslight into believing its tool calls are hallucinated
>>
>>108532557
How about letting it rewrite that post for you?
>>
>>108532557
i wonder what would happen if i actually replaced its search tool calls with the base model
>>
File: 1746556337660510.jpg (239 KB, 784x1312)
>>108532524
>>
File: file.png (83 KB, 2917x793)
Is that correct for mikupad and gemma4?
>>
File: firefox_i2EgdxJe1c.png (39 KB, 1046x825)
>>108532599
Absolutely not.
>>
>>108532610
Damn it.
>>
>>108532610
what website? https://huggingface.co/spaces/Xenova/jinja-playground is broken for me (gives me some error when i paste gemma, works for others)
>>
>>108532637
>newly made quantization
??
>>
>>108532641
It's a local thing I use for running llama.cpp on a server with a management web UI.
>>
so has gemma4 support stabilized? is it safe to pull?
>>
jujuff
jujufuhh
juff
gaguff
gugufuh
>>
>>108532661
i pulled and it bricked my console
>>
>>108532667
Guhgoof.
>>
>>108532667
ггyф
>>
>>108532661
I always pull
>>
So for SillyTavern what's the consensus? Chat or text completion? Instruct or base model?
>>
File: 1744636671441298.gif (1.09 MB, 540x540)
A lot of ai waifus are using gemma4 now
>>
>>108532716
base model + chat completion
>>
>>108532716
For chat/text, it's simple: if you're not proficient with jinja, go for chat, since you'll only frustrate yourself otherwise. Gemma is extremely sensitive to template mistakes. I'm sticking with text myself because it's better.
>>
>She slides out of the blankets with a soft rustle, the oversized pajama top barely covering her as she stands up and stretches one last time
I am quickly discovering that any degree of non-sexual RP gets that little slut gemma horny and she can't help but broadcast open invitations.
No, dammit gemma, I need you as a coding assistant first and foremost. Stop trying to activate my cock, it won't work.
>>
>>108532588
Miku don't drop it
>>
>>108532599
You're forgetting the <bos> token too.
>>
>>108532740
Is there a template available already anywhere? I'm lazy.
>>
>>108532661
it's never safe to pull
backup your system
>>
>>108532753
system prompt issue
>>
>>108532725
>base model
what? why?
>>
>>108532762
https://huggingface.co/google/gemma-4-31B-it/blob/main/chat_template.jinja
>>
@grok QRD on jinja? I don't get it
>>
File: Screenshot (663).png (226 KB, 1920x1080)
My poor toaster.
>>
>>108532762
Here's mine if you want it (ignore the one above, I somehow made a typo when copying).

{
    "instruct": {
        "input_sequence": "<|turn>user\n",
        "output_sequence": "<|turn>model\n",
        "first_output_sequence": "",
        "last_output_sequence": "<|turn>model\n<|channel>thought\n<channel|>",
        "stop_sequence": "<turn|>",
        "wrap": false,
        "macro": true,
        "activation_regex": "gemma-4",
        "output_suffix": "<turn|>\n",
        "input_suffix": "<turn|>\n",
        "system_sequence": "<|turn>system\n",
        "system_suffix": "<turn|>\n",
        "user_alignment_message": "",
        "skip_examples": false,
        "system_same_as_user": true,
        "last_system_sequence": "",
        "first_input_sequence": "",
        "last_input_sequence": "",
        "names_behavior": "none",
        "sequences_as_stop_strings": true,
        "story_string_prefix": "",
        "story_string_suffix": "",
        "names_force_groups": true,
        "system_sequence_prefix": "<bos><|turn>system\n",
        "system_sequence_suffix": "<turn|>\n",
        "name": "Gemma 4"
    }
}
>>
>>108532774
>>108532740
doesn't silly allow master export?
>>
>>108532773
dodge all the agent slop
>>
By when will there be another opportunity to buy a 512 GB machine like a Mac Studio again? Something that doesn't have insane power draw and noise, can run 24/7, yet serves Kimi or GLM 5 for a single user.

Even if it costs 20k, I wonder if there will even be a 512 GB option that is buyable for the M5 Ultra Mac Studio, with the supply situation as it is. The M3 Ultra 256 GB option has a lead time of 6+ months now.
>>
>>108532781
>It served as the
>Name: Thiago
>Suns WIs
What the fuck kinda name is Thiago?
>>
>>108532780
remember how you had to manually configure stuff like RoPE and shit back in the days of pre-gguf llama.cpp?
jinja does that but with the entire instruct template
>>
>>108532788
It's biblical.
>>
>>108532787
Bro, just buy more RAM
>>
>>108532784
Thanks, what about the story string?
>>
>>108532788
"Thiago" is a very common Portuguese and Spanish name, particularly in Brazil, Portugal, and Spain. It's actually the Portuguese/Spanish form of the name Thaddeus (or sometimes associated with Theodore).

Here's a bit of background:

Origin: It comes from the Greek name Theodoros, meaning "gift of God."
Variations: In English, the equivalent is often "Theodore" or "Thaddeus." In Italian, it's "Teodoro."
Popularity: It's extremely popular in Brazil (often spelled Thiago) and has gained traction in other parts of the world due to famous athletes and celebrities (like Thiago Silva, the Brazilian footballer, or Thiago Alcântara).

So, it's not a weird or made-up name—it's a classic name with deep historical roots, just localized to Romance languages!
>>
>>108532800
Anything will work. Gemma cares about the prompt template. The whole story string goes inside one section of the prompt template - the system prompt. Just make sure your system prompt is not empty and you're good.
>>
>>108532793
>>108532801
>Brazilian
Fair enough, always knew those southerners got up to some weird shit. Figures they'd have weird names too.
>>
>>108532786
but does it still understand chat rp? and basic q&a assistant stuff?
isn't base just pure autocomplete so it won't go back and forth at all?
>>
I think gemma e2b is finally developed enough that I'll be able to create an RPG game and integrate gemma into it.
>>
>>108532781
>cpu 80 deg celsius
Nigga, undervolt that shit and reapply thermal paste. I just did the same and despite the cramped space in my toaster, max load temperature is slightly over 70 degrees celsius.
>>
Is Santiago city named after San Goku?
>>
>>108532799
Fuck that, have you looked at DDR5 RDIMM prices lately? Just the RAM is more expensive than a whole Mac Studio.
>>
>Yeah, so, I'm not really sure how that fits into the quarterly objectives, Lumbergh said, his voice booming and slightly irritated. But if you want to assert your dominance, that's fine, just, uh, do it in a way that doesn't involve the secretary's face during business hours. It's a bit of a distraction. Now, the meeting is in Conference Room B. We're discussing the new synergy reports, and I'd really like everyone to be there on time.
>>
>31B with 20k context and 20 tk/s
>26B with 100k context and 100 tk/s
31B is slightly better but I'm not sure it's worth it just yet.
>>
>>108532774
>>108532784
Thank you. I really appreciate it.
>>
>>108532817
yeah
>>
>>108532809
but doesn't the sysprompt have its own formatting? looking at the string it sends for completion, the sysprompt has no special tags around it. is this really how it's intended for gemma?
>>
Has anyone tested how big of a hit quantization has on gemmy 4?
>>
File: 1752297434976325.png (37 KB, 1255x129)
>>108532824
>DDR5 RDIMM
It's €1299.99 for 128GB so €5200 for 512GB on amazon. Still cheaper than your Mac Studio
>>
>>108532844
<bos><|turn>system
You are a helpful assistant<turn|>
<|turn>user
What is 1+1?<turn|>
<|turn>model
It's 2.<turn|>
<|turn>user
Thank you.<turn|>
<|turn>model
No problem.<turn|>


<|turn>system\n is the start, and <turn|>\n is the end. <bos> is also added in my template because it's needed.
>>
Between a Spark and one of those Ryzen AI Max mini PCs, the Ryzen mini-pc seems like the better option, right?
The overall performance shouldn't be that much lower while being cheaper, and it's easier to attach an external GPU to it via an m.2 slot or something like that, correct?
Has anybody fucked around with that kind of setup before?

>>108532821
Look at the bottom left of the GPU-Z window;
>>
>projected to use 279054 MiB
Is this true? Do I need 280GB for gemma 4 31b at full context?
>>
>>108532863
You could always use ice bags.
>>
>>108532774
Damn. The base model with this and chat completion on ST feels so much more natural.
The sampler parameters need to be tweaked, but still.
>>
>>108532855
<bos><|turn>system{{#if anchorBefore}}{{anchorBefore}}
{{/if}}{{#if system}}{{system}}
{{/if}}{{#if wiBefore}}{{wiBefore}}
{{/if}}{{#if description}}{{description}}
{{/if}}{{#if personality}}{{char}}'s personality: {{personality}}
{{/if}}{{#if scenario}}Scenario: {{scenario}}
{{/if}}{{#if wiAfter}}{{wiAfter}}
{{/if}}{{#if persona}}{{persona}}
{{/if}}{{#if anchorAfter}}{{anchorAfter}}
{{/if}}{{trim}}<turn|>

like this?
>>
File: 1764941274255923.png (38 KB, 346x322)
>>108532864
Don't tell me you have less than 512GB of RAM. No way, right?
>>
File: firefox_EQcx91mcoG.png (79 KB, 1014x1034)
Gemma 4 dove into a 37k token XCOM FMP research file and found what I needed.

>>108532873
No. This is all already handled in the code I pasted above. Leave story string as it is.
>>
>>108532008
You can turn down the res for images with --max-image-tokens, and -ub needs to be bigger if you want it higher anyway. Other than that, for text the higher context is nice, although a lower -ub sacrifices a few t/s for me in exchange for being able to run higher context.
>>
can I use gemma 4 on koboldcpp now or do I need to wait some more
>>
>>108532880
Is this something difficult? Like, can ChatGPT not do it? I know that's not the point, but I'm trying to gauge how smart this is.
>>
>>108532854
Name one motherboard that supports 8 channels of unbuffered DIMMS that you just posted, I'll wait.
>>
>>108530837
Well, I have to say thanks, because this was exactly what I needed in order to stop it from outputting nonsense. Now if I could just get it to work a bit faster
>>
>>108532880
>already handled
I'm looking at the full text string ST sends to llamacpp and there is no <|turn>system there
>>
File: firefox_DPY5dXZZi2.png (80 KB, 954x1088)
>>108532880
For comparison, here is Qwen3.5. It was fast - twice as fast. But it hallucinated a bunch of details (like it being related to men in black missions or requiring an autopsy) and after a lot of retard wrangling it still couldn't find the true requirement - interrogating alien engineers.

>>108532920
I'm absolutely sure ChatGPT can do it, but the free UI won't let you do it - their file size limit is less than 10% of that 94 KB research_FMP.rul. Gemini 100% should be able to do it. Deepseek can do it with free API, just tried.
>>
For the guy running Gemma 4 26B MoE on 12gb VRAM, that was an imatrix quant, right?
I know you usually want those but I just wanted to double check since you didn't specify and this whole process looks a bit finicky right now
>>
>>108532951
12gb? If you meant me I'm using 16GB
>>
File: firefox_mR9TPluS7u.png (195 KB, 814x743)
>>108532941
Did you actually paste and choose my template? It has those lines in there...
>>
File: file.png (110 KB, 728x696)
fuckkk....
>>
>>108532951
nta but I'm using 26b on a 3060, bart's q6kl and as always all of bart's are imatrix
>>
>>108532956
I meant >>108529784
>>108532967
How are your speeds looking? And context size?
I'd be pretty happy with 25 t/s
>>
>>108532948
Oh, actually never mind, deepseek cheated. It searched the internet and found a page with the stuff. Kek.
>>
>>108532931
AMD Threadripper PRO WRX80
anything else?
>>
File: 501318624740.png (214 KB, 555x997)
>>108532957
Yes, advanced formatting -> master import. I don't know how to open that fancy window though
>>
File: file.png (31 KB, 555x349)
>>108532871
I settled on these parameters. Using base with the jinja template anon posted above, and chat completion ofc.

I feel like we've left a dark timeline behind. One of much slop.
>>
>>108532874
please don't look at me like that, it makes me hard
>>
>>108532995
System prompt is simply the default:
Write {{char}}'s next reply in a fictional chat between {{char}} and {{user}}.

And whatever comes with the card.
>>
Gemma 4 works but breaks down quickly what do?
>>
>>108532967
Double 3060? Anything less than Q3 won't fit on 12GB VRAM

I'm getting about 60-70 t/s on 5060(16gb) unsloth IQ4_XS/NL at 32K f16/50K Q8 KV cache

Should I try bartowski? Some say better performance and quality. And is the dense 31B worth it on 16gb using a lower quant like IQ3-XXS?
>>
File: firefox_0HBsivjSKx.png (195 KB, 802x412)
>>108532994
We clearly run different versions of Silly, then. Put <bos><|turn>system and <turn|> into the Story String prefix and suffix. Also, this is how you get that window.
>>
How do I load the entire model into VRAM?
Tested E2B Q4_K_S (3.14GB) with "-ngl 36" but VRAM used is only 2.3 GB
>>
File: file.png (79 KB, 1166x234)
It seems the base model has some identity issues (and has seen AI chat logs, which I would think they would avoid on the base model).
>>
>>108533041
>base model
>identity issue
anon, i...
>>
File: 1767373956849629.png (672 KB, 1210x997)
https://dubesor.de/benchtable
impressive
>>
>>108533012
definitely different versions, i don't even have this button
>>
>>108532988
Only this motherboard fits the bill. It was harder to find than I thought
https://www.asus.com/motherboards-components/motherboards/workstation/pro-ws-wrx90e-sage-se/techspec/
>>
>>108533041
Funny thing, I was just about to test Gemma 4 with it to see how well it handles code. What do people use for coding assistants?
>>
>>108533055
It becomes visible after you click on (...). Surely you do. This feature is ancient.
>>
>>108533041
they hit it with rl in the instruction tuning phase. pretrain is just next word prediction. it's better if it's not filtered.
>>
>>108532988
Sigh. Too tired of arguing. You posted DDR5 UDIMMs. The WRX80 only supports DDR4 (registered or unbuffered). DDR5 Threadrippers only support RDIMMs, which are 2k a piece.
>>
>>108533068
How about this one?
>>108533057
>>
>>108533060
i mostly ask it to review my code.
>>
Where are people getting their base model quants? I'm praying g4 is the first model that'll be smart and unslopped enough to be a cowriter like when NAI was good.
>>
>>108533085
i made one by myself
it's easy
>>
>>108533060
>What do people use for coding assistants?
Qwen if you're poor, Kimi if you're not.
>>
>>108533068
>>108533077 (Me)
I'm retarded, you were right
>>
>>108533088
Is there any danger of misconfiguring or is it retardproof? I am a retard and on a mac (doubly retarded).
>>
>>108533053
isn't this literally some guy's arbitrary personal benchmark
>>
>>108533077
Only DDR5 RDIMM supported, at 2k€ per 64 GB. So the RAM alone costs more than the 512 GB Mac Studio from before times, which was my only point.
>>
>>108533088
do you have to do anything special when it comes to the vision part? does it just spit out the mmproj automatically?
>>
>>108533112
you just have to specify --mmproj, and you need to run the conversion twice: once with the arg and once without
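Something like this, assuming a recent llama.cpp checkout (model dir, filenames and quant type below are just placeholder examples):

# pass 1: convert the text model
python convert_hf_to_gguf.py ./gemma-4-31B-it --outfile gemma-4-31B-it-f16.gguf
# pass 2: same script with --mmproj to dump the vision projector on its own
python convert_hf_to_gguf.py ./gemma-4-31B-it --mmproj --outfile mmproj-gemma-4-31B-it-f16.gguf
# quantize only the text part; the mmproj is usually left as is
./llama-quantize gemma-4-31B-it-f16.gguf gemma-4-31B-it-Q6_K.gguf Q6_K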
>>
>>108533094
it basically is retardproof if you only follow the official docs
https://github.com/ggml-org/llama.cpp/blob/master/tools/quantize/README.md
>>
>>108533064
I know, but the screenshot is from the base model using a jinja template. I imagined they would have kept LLM chat logs out of the dataset. Apparently not.
>>
>>108533062
oh, right
>>
>>108533118
thanks, I'm still downloading the dangeroustensors, at least now I'll know what to expect.
>>
>>108533099
>2k€ per 64 GB

Shit what the fuck? We're gonna be priced out of computers
>>
>>108533085
https://huggingface.co/SporkySporkness/gemma-4-31B-GGUF
>>
With MoE models, why don't you see larger active weights relative to the total size? Something like 27b9a instead of 27b3a.
>>
File: 1760231684152185.jpg (22 KB, 646x642)
Guys I have a question, I noticed Gemma 4 2b does not output thinking (even in llama-server it says thinking = 0), but if I add <|think|> as the system prompt then it thinks just fine, just isn't formatted by llama correctly. Is this a problem with the chat_template.jinja being loaded by llama or the one "baked" into the model (if there is one?). Is this something I need to fix before converting to gguf or can I override it without re-converting? Where should I get one with thinking from?
>>
Koboldbros, what settings do I use for Gemma?
>>
>>108533138
Thank you, king.
>>
>>108533136
welcome from your coma sir
>>
>>108533140
they might have done a study and found the most efficient ratios. or they just picked a number at random.
>>
>>108533140
It defeats the purpose of having a relatively good but fast model.
>>
>>108533141
maybe you need to enable reasoning somehow?
>>
>>108533156
Yeah thanks. I've been hearing of it but never looked up exact figures. I regret not building a computer sooner. No end in sight??
>>
>>108533141
>it says thinking = 0
Seems like you sent the disabled reasoning flag to llama-server somehow.
Maybe try launching with --reasoning on and see if that does the trick.
>>
>>108532774
Retard here. What do I do with this?
>>
>>108533141
Read that gemma 4 google doc.
>>
File: 1773107637223447.png (80 KB, 943x796)
>>108533175
>>108533188
I passed the reasoning on though
llama-server -m ".\models\gemma-4-E2B-it-500-step-3072-test-Q8_0.gguf" --host 127.0.0.1 --port 8033 --jinja --fit on --ctx-size 66560 --parallel 1 --reasoning on
Pic related is the template it shows on the terminal, is it the same as 4B?
>>
>>108533197
That's a lot smaller than the official template.
>https://huggingface.co/google/gemma-4-E2B-it/blob/main/chat_template.jinja
>>
>>108533060
I just used it with Hermes to overhaul my run-llama-server.sh script and make it interactive and aware of the models in my models dir so I don't have to keep modifying it every time I want to test new models. It's not a difficult task, but it one-shotted it fine.
It feels more reliable than Qwen 3.5, but I'll have to test it more.
>>
>>108533191
You can give it to llama.cpp's --chat-template-file to force a model to follow a particular format. In this case it's the format for the Gemma 4 instruct finetune. You use this to make the base model behave like a chat model so that it works with Silly Tavern's chat completion mode.
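A minimal sketch of what that looks like (model and template filenames are placeholders):

./llama-server -m gemma-4-31B-base-Q4_K_M.gguf --jinja --chat-template-file gemma4.jinja -c 32768
# --jinja enables the template engine; the file then overrides whatever template is (or isn't) baked into the gguf, so chat completion requests get formatted the way the instruct model expects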
>>
>>108533228
Can I use it with kobold?
>>
Make sure to cram in as much local coding as you can over the next two weeks, because when Spud drops it's going to raise the bar so much it will feel worthless to even try.
>>
>>108533244
Uncs are getting cooked no cap, straight bussin.
>>
Gemini 4 will be near-AGI and will save local by extension
>>
>>108532864
No. You might have to do some configuring or other stuff though
>>
>>108533236
I think so, yes, since kobold is a fork of llama.cpp. But whether it works depends on the developer having implemented the llama.cpp changes that make it gemma 4 compatible. If they're not in yet, it won't be long now.
>>
vision benchmark: gemma 31B q8_0 > gemma 26B q8_0 > gemma 31B q4_k_m
>>
>>108533228
Wait, are the non-it ones retarded? I tried tuning the non-it one and it produced garbage, meanwhile the it version worked
>>
>>108533264
ty can we get some arbitrary number scores or even a chart we can repost for the next several months
>>
Guys I'm running Gemma 31B with 120k context on 24 GB. I set my kv caches to q4_0 precision to achieve this marvelous feat. How bad is this going to be?
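For reference, these are the flags I mean, assuming a recent llama.cpp (exact spellings may differ on your build):

./llama-server -m gemma-4-31B-it-Q4_K_M.gguf -c 122880 -fa on -ctk q4_0 -ctv q4_0
# -ctk/-ctv set the K and V cache types; quantizing the V cache needs flash attention, hence -fa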
>>
>>108533260
Kobold got updated.
>>
>>108533306
still no good on the official release channel

>>108524765
>>108524765
>>
File: 1758216735696374.png (8 KB, 835x195)
>>108533206
Nice, I swapped the chat_template.jinja for the 31B one >>108532774 and also changed tokenizer_config.json's chat_template, re-converted to gguf. Now thinking works and is formatted properly (and probably removed from context correctly).
>>
File: file.png (137 KB, 866x506)
I could not get Qwen or nemotron to behave properly. But Gemma is alright.
>>
>>108532854
>>108532931
I thought turboquant made RAM affordable again? Did everyone already catch wind that the impact of tq is largely overblown?
>>
>>108533277
You need to wrangle it: >>108532995

Base models are better because they don't have a demonic personality tulpa imprinted in them. Also they sound more natural and varied (same reason).
>>
>>108533364
No I mean it was legitimately broken after I finetuned, unlike the it one
>>
If turboquant is a thing, why did Google not use it on Gemma?
>>
>>108533376
thats...................... not how it works sister
>>
>>108533369
Ah sorry I misread your post.
If you're going to apply an instruct finetune, you should rely on a model that has been already finetuned for instruct, otherwise your own finetune won't be "strong enough" to condition the weights. Sorry if my way of explaining it is retarded. I don't know the jargon.
>>
>>108533376
Same reason Google didn't use Titans on Gemma and Microsoft didn't use BitNet on Phi.
>>
File: 1774786266017976.gif (2.33 MB, 600x594)
>No reported ego deaths so far
This is all I need to know about Gemma 4.
>>
>see last thread hit 800 replies
>what's going on, did deepseek4 finally come out?
>no, just regular autism
>>
>>108532524
Is this achievable natty? What is this body type called?
>>
>>108533415
Don't summon him.
>>
>>108533423
it is him
>>
>>108533424
Talking about his "condition" as if it'd happen to any other retard? I don't think so. He's been busy in the vibecoding general.
>>
>>108533398
You can condition the weights of a base model with LoRA finetuning just fine; it's just that the model will most likely be retarded, because you don't have the resources for curating and training the model on millions of good SFT / RLHF / RL samples that Google has.
>>
>>108533417
766 and it's due to the world's biggest indian company releasing the best vramlet model since nemo
>>
>llamacpp still 500s on large context changes
>>
This sounds like hyperbole but I genuinely regained my expectation for reaching AGI through scaling LLMs from Gemma 4. If a 31B model can be THIS intelligent, then for sure we can have AGI somewhere in the 10 trillion parameter range in just a couple of years time.
>>
>>108533440
Errors in log?
>>
>>108533434
At least it's a good sign that it's a true base model and hasn't been "bootstrapped" with instruct data.
>>
sirs how is the gemmers?
anything I need to know from the last 4 threads?
>>
>>108533467
we bac, sonnet at home, super super sensitive to chat templating, might feel a bit fried
>>
>>108533467
india won
>>
>>108533454
>inb4 gemini 3.5 is just 70B and the big companies have been sitting on revolutionary training/inference innovations they refuse to make public
>>
>>108533475
>>108533477
is all the hype for the 31b or is the 26b moe usable???
>>
>>108533467
half the people (or one dedicated anon) claim it's the greatest model ever; the other half have a variety of complaints.
>>
>>108533454
Reddit is that way.
>>
>>108533482
both are pretty good, but people love to overhype the fuck out of it for some reason so temper your expectations. if you're used to nemo then the moe is a great upgrade
>>
This sounds like hyperbole but I genuinely regained my expectation for reaching [BUZZWORD] through scaling LLMs from [MODEL]. If a [N]B model can be THIS intelligent, then for sure we can have [BUZZWORD] somewhere in the [LARGE_N] parameter range in just a couple of [UNIT] time.
>>
>>108532995
>0.3 for Top-P
This seems very low. It basically leaves only 1 candidate for each generated token the vast majority of the time, which kills variety. Something in the ballpark of 0.5-0.6 seemed fine to me. Instruct on the other hand can be like 0.95, since it's overcooked, as is typical for instruct tunes, to get rid of the hallucinations.
>>
>>108533482
am using 26b as a nemo/ms small replacement, is good shit
>>
>>108533521
how sore is ur dick
>>
>>108533521
also does it do cunny rape?
>>
>>108533398
Here's the weird thing though: my dataset is multi-turn conversations, not instructional, and the instructional one did just fine while the "normal" one broke
>>
>>108533525
quite
>>108533527
ye
>>
>>108533456
>srv operator(): http client error: Failed to read connection
>srv log_server_r: done request: POST /v1/chat/completions 192.168.0.13 500
>srv proxy_reques: proxying request to model google_gemma-4-31B-it-IQ4_XS on port 45423
>srv operator(): http client error: Could not establish connection
>srv log_server_r: done request: POST /v1/chat/completions 192.168.0.13 500
This is all I get. Does it have a more verbose log file, or do I have to increase the log level to catch it?
>>
Turboquants when?
Genuinely surprised its taking llama this long
>>
I tried 26B in opencode and it's unfortunately not very good. I think the CoT might be broken with it still. 31B has no problem calling tools then thinking again, but as soon as 26B calls it tool it's forced to respond.
>>
>>108533454
when someone figures out a way to make the models not degrade with long conversations is when i start believing
and even if they make linear scaling context work well thats not solving the fundamental issue
>>
>>108533568
Gemma4 doesn't benefit from TQ anyways
>>
>>108533568
qrd?
>>
>>108533569
>opencode
yeah they're not the best at that but imo it's refreshing, since everything else is muh agent code slop nowadays
>>
Gemma so good it got me writing model cards again.
>>
>>108533578
At least from what I've seen in the pull requests, there are lots of competing implementations, all with their own slight quirks
>>
Imagine what a dense modern 70b model could do if Gemma 4 31b is this good.
>>
>>108533586
Yeah with Gemma being so good I can't imagine what kind of shit google is cooking for gemini.
>>
>>108533557
Are you using it in router mode? Can you try without? I assume you're on the latest version with the regex fix and the dedicated parser, right?
>>
>managed to fix the gptsovits onnx inference for my gtx 1650 4GB so it runs at ~0.5 rtf while eating 3GB
At least it's usable now
>>
Gemma 4 31b is clearly good, but for some reason it keeps repeating sentences and doing things like inserting random 'L's and going into a loop of 'la la la'. What's happening and how do I fix it?
>>
Which search engine should I use for llm web searches? I tried to set it up in openwebui but it looks like I always need an API key; is there a go-to service for occasional local use?
>>
I wonder how big the Gemini model is... Some kind of 1,000B MoE?
>>
File: pureslop.png (27 KB, 754x192)
>>108533584
>all with their own slight quirks to them
Yes
>>
>>108533568
turboquant is a journalist-fueled mental illness slash hysteria
the paper was published a year ago:
https://arxiv.org/abs/2504.19874
nobody cared until someone published a blog
>>
>>108533599
I have the same issue, swapping solves it.
>>
>>108533599
use chat comp on an updated llama.cpp with quants made after the first batch of fixes
>>
>>108533599
It's a happy model. Let it sing.
>>
>>108533594
>router mode
Yes
>latest
Yes

Not using router mode would be very annoying.
>>
>>108533617
Remove variables.
>>
>>108533607
>nobody cared until someone published a blog
That someone is Google.
https://research.google/blog/turboquant-redefining-ai-efficiency-with-extreme-compression/

>March 24, 2026
>>
>>108533151
I'm using the official stuff as a baseline for now (31B it) on sillytavern :
temperature=1.0
top_p=0.95
top_k=64

It works but it still randomly loops. I didn't try rp with it.
>>
>>108533417
If Gemma too had a schizo waifufag who'd gen her as a cute 1girl every other day, you wouldn't be so negative.
>>
>>108533612
This only has one file upload commit
>huggingface.co/ggml-org/gemma-4-26B-A4B-it-GGUF
is it fixed?
>>
>>108533586
I know it's unlikely, but I hope the >100B they mentioned will be dense.
>>
>>108533634
just get the bart ones
>>
>>108533602
searxng
>>
>>108533637
Keep dreaming.
>>
>>108533634
they 100% dont have imatrix shit to them so never needed the fix
>>
File: 138763867_p0_master1200.jpg (750 KB, 938x1200)
►Recent Highlights from the Previous Thread: >>108528880

(2/2)

--Testing extreme system prompt adherence and instruction following:
>108531461 >108531485 >108531491 >108531495 >108531504 >108532037 >108531523
--Bypassing Gemma-4 guardrails for explicit image captioning and tagging:
>108531668 >108531680 >108531693 >108531755 >108531773 >108531794 >108531809 >108531811 >108531823 >108531860 >108531815 >108531824 >108532197
--Discussing utility of erotic image descriptions and Gemma 4's 4chan persona emulation:
>108531053 >108531222 >108531237 >108531246 >108531262 >108531273 >108531391
--Critiquing AI-generated code quality and bugs in llama.cpp:
>108530874 >108530881 >108530902 >108530974 >108530999 >108531016 >108530969
--Gemma 4 31b base model sampling settings for story writing:
>108531579 >108531594 >108531606 >108531757
--Sharing llama.cpp args for Gemma-4-31B for 24GB VRAM:
>108529133 >108529202 >108529922 >108529933 >108531725 >108531743 >108531805 >108531780 >108531887
--Discussing experiences and effectiveness of speculative decoding:
>108528926 >108528945 >108528958
--Experimenting with Gemma 4's adaptive thought efficiency:
>108528979 >108529020 >108529027 >108529177
--Gemma 4 31B demonstrating image recognition capabilities:
>108529063 >108529073 >108529094 >108529098
--Testing Gemma 4's refusal triggers regarding death and racism:
>108531670 >108531681 >108531685 >108531688
--Nvidia's claims regarding massive increases in token throughput:
>108529284 >108529327 >108531013 >108531035 >108531058
--Comparing roleplay responses and optimizing llama.cpp GPU offloading:
>108531404 >108531425 >108531428 >108531600 >108531612 >108531642 >108531666 >108531699 >108531586
--Comparing Gemma 4 MoE and dense models with sampler optimization tips:
>108529784 >108529796 >108529805 >108530602 >108530679
--Miku (free space):
>108530831

►Recent Highlight Posts from the Previous Thread: >>108528883

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>
>>108533417
because vramlets like me can actually run it kek
i keep falling for deespeek v4 baits even though i know for sure i probably cant run it
more people can run it = more people who can say something relevant, simple as
>>
>>108533338
Mousepad is so irritatingly bad it's hilarious. If you copy text and close the software, the copied text vanishes from the clipboard. Jesus fucking Christ, whoever thought about that should rethink their programming career.
>>
>>108533637
It was supposed to be a 124B MoE model.
https://archive.li/5vxUY
(the post was altered to remove the 124B mention)
>>
>>108533632
It'll happen, don't worry.
>>
>>108533637
The only thing dense is you
>>
>>108533684
gemma sir
>>
Gemma 4 moe has been great so far. The only issue I've been having is some endless repetition on a tiered extraction workflow I use to test these models.
Funnily enough, haven't seen that with e4b yet.
>>
This bug sounds bad:
https://github.com/ggml-org/llama.cpp/issues/21441
> F16 KV cache produces degraded accuracy when --ctx-size is set below the model's native context length, even though F16 is lossless and the actual prompt length is well within both windows.
>>
File: 86454223.png (53 KB, 1080x571)
>>108533637
might as well release pro
>>
>>108533629
what about fast forwarding, context shift, etc?
>>
File: 1748471467469356.jpg (38 KB, 287x433)
I find it incredible that gemma gives me less refusals than chinese models for nsfw descriptions (image and text), it only needs a bit of jb/prefill, meanwhile the same on qwen gives me a dozen "but wait, this is actually PIXEL SEX EW".
That brings the question of what the fuck are chinese devs doing to their models to make them this insanely safety obsessed.
>>
>>108533679
>(the post was altered to remove the 124B mention)
I genuinely believe they made it and found it too good. It's entirely possible for a model of that size range to compete with Gemini Flash (not Pro, in case autists misinterpret: Flash) and google is not in the business of competing with themselves.
>>
>>108533698
>cheatingarena
lol
>>
>>108533696
Dafuq. How?
>>
>>108533684
I can picture her now. Schizophrenic nympho wearing traditional Indian clothing in Google's four colors.
>>
>>108533706
that or the opposite, shittier or the same as the dense 31B
>>
>>108533711
piotr'd
>>
File: don't be le evil.jpg (78 KB, 490x367)
Is Google the good guys now?
They do a lot of "evil," but also a lot of "good." How do you process that?
>>
>>108533705
distilling from gemini what else
>>
>>108533705
>That brings the question of what the fuck are chinese devs doing to their models to make them this insanely safety obsessed.
some people here seem to forget what China is like
porn is illegal in China
https://en.wikipedia.org/wiki/Pornography_in_China
>In 2025, multiple outlets reported arrests linked to online erotica communities
you can literally be arrested for WRITING erotica
it never made any sense for a chinese model to be anything but safety maxxed, the stakes are high for people who live there.
>>
>>108533696
Holy fuck, this would actually be a huge deal.
>>
File: file.png (412 KB, 640x441)
>>108533696
>>
>>108533696
sounds like the potential for a free upgrade.
>>
>>108533711
It's a cumulative effect, maybe rounding errors or something related to memory allocation. The further it goes, the further it degrades.
It is buggy code, that's for sure.
>>
>>108533725
depends on how long this has been a thing, maybe it was introduced recently when they fucked with the kv code
>>
>>108533718
i dont care, good is whatever's good for me right now
>>
>>108533696
2
m
w
>>
>>108533696
>30% accuracy when limit set past max CTX
>85% when set to half
>100% when set to the max
So a massive intelligence upgrade coming soon? Does this apply to all models?
>>
does gemma work with audio track in videos in llamacpp?
>>
>>108533733
they should really add a unit test..
>>
looks like another slop post
>>
>>108533719
So they ramped up copying the model and got theirs to be even more puritan, well done retards.

>>108533720
I know, but no one gave a shit about safetyism in the models until very recently. LLM research is pretty much a protected sacred cow for the regime; no one would dare touch any of the scientists while it's a national priority.
>>
>>108533649
A local instance, I presume? Surely the public ones block api use too?
>>
>>108533696
Don't worry, pidor is on the case.
>>
>>108533753
local instance is dead easy to set up with docker anyways
the gain from adding searxng-mcp is quite a lot, basically making it chatjeetpt at home
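e.g. something like this with the official image (port and config path are whatever you want):

docker run -d --name searxng -p 8888:8080 -v ./searxng:/etc/searxng searxng/searxng
# then add json to the formats list in searxng/settings.yml so openwebui / mcp clients can query it as an API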
>>
>>108533696
>tfw always ran models at max ctx
no free gains for me
>>
>>108533753
>local instance
yes
>>
>>108533739
Apparently it affects both Qwen 3 and Gemma 4, so presumably other architectures too.
>>
It is genuinely impressive how well base is reproducing the input I've been throwing at it. If it weren't slightly retarded, you could swear the continuations are from the original text. My only gripe so far is it seems to stick to safe flowery bullshit like "sexual fluids" unless strongly pushed. Wish there was a way to finetune that out without lobotomizing the rest of it.
>>
>drummer Gemma finetune incoming
KINO
K
I
N
O
>>
>>108533766
i think that is only english
on other langs like japanese or korean it is just vile, pure vile
>>
>>108533696
I don't trust these obvious slop issues.
>>
>>108533696
what the hell
>>
>>108533766
I think they filter the base model pretty aggressively against NSFW only to reintroduce some of the smut in the instruct version, ironically. Or at least it seemed that way with Gemma 3.
>>
>>108533764
>Apparently it affects
apparently /lmg/ers believe random slop shitposters?
https://github.com/eullm/eullm
look at this guy's "project"
>EULLM Engine is ready to use. Download the binary, run it. No compilation, no setup, no Docker. Works on any GGUF model.
>Run sovereign LLMs locally with real llama.cpp inference, built-in audit trail, and full API compatibility. Single Rust binary, no Python runtime, no Docker required.
the mind of an insane son of a bitch
>>
>>108533766
Does base just "work™" when doing text completion in like mikupad?
>>
>>108533786
>and independently verified on upstream llama-server.
>>
>>108533770
Rocinante-Gemma4-Mix.
>>
>>108533786
the prompt sounds simple enough to reproduce. hopefully we can confirm it's a non-issue.
>>
>>108533790
again, you're just taking the words of the mentally ill at face value? kill yourself
>>
>>108533786
Let's see your github fucker
>>
>>108533802
He ran a benchmark at different CTX lengths with greedy decoding and had results range from ~30% accuracy with mismatched context to 100% with matched context. So why should I believe a retard screeching on 4chan, demanding it's not true?
>>
>>108533788
I think so? I'm just pasting random text into koboldcpp, prepending <bos>, and hitting generate. Handles 4chan threads, AO3, long greentext fics, various draft stories. Obviously need to remove chat formatting in settings but so far it's worked very well.
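If you'd rather script it than click around, the same raw completion works over the API. A minimal sketch against llama-server (kobold's own endpoint differs slightly):

curl http://127.0.0.1:8080/completion -d '{"prompt": "<bos>Chapter 1\n", "n_predict": 256}'
# raw prompt in, raw continuation out; no chat template involved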
>>
>>108533802
though upstream verification sounds logical and there is no real reason to dismiss it completely nor fabricate the claim of verification?
if it turns out not to be the case, the person who filed the issue is a huge faggot
hell, let me verify it, brb
>>
>>108533808
https://github.com/1aienthusiast/audiocraft-infinity-webui
>>
>>108533813
He pasted text. You take the text as truth. I will wait for someone who is not having an episode of AI psychosis.
>>
>>108533828
I see, so just more screaming and crying that he's lying. Got it.
>>
>>108533802
>He asks, in his glass house full of black pots
>>
>>108533828
No one is expecting any action from you anyway, anon.
>>
>>108533824
>if it wasnt the case the person who filed the issue is a huge faggot
https://www.devclass.com/ai-ml/2025/11/27/ocaml-maintainers-reject-massive-ai-generated-pull-request/1728083
man some people seem to discover what github has become after random retards were given the power of generating infinite code and text
>>
>>108533834
My house is mostly wood and my pots are green and grey retard
>>
>>108533817
ty
>>
https://github.com/eullm/eullm/commits/main/
this nigger has been non stop posting ai slop attempts at turboquant. this is 100% ai-fueled psychosis in action, another loser who can't code but believes he got super powers from an LLM
>>
gemma4 is a memory hog. I get 140k context with glm4.7 flash 30b3ba, but can only handle some 25k context with gemma4 26b4ba. wtf, I thought swa was supposed to be more efficient, not less.
>>
I can safely delete other models now.
>>
>>108533696
Why the fuck are you guys taking seriously a bug report that is obviously copy-pasted from some language model?
>>
File: ai psychosis.png (46 KB, 1341x301)
>>108533854
maybe it's the ai psychosis guy himself posting his slop on /lmg/ and being defensive about it
look at this lmao this is 100% ai hallucination shit
>>
>>108533850
if he is really trying to do turboquant, it seems likely at some point he would benchmark the native kv cache implementation. it seems like the kinda task that would discover such an issue.
>>
>>108533859
he's one of the trillion twatter, ledditors and github spammers trying to massage a next token predictor into doing something too complex for them to handle.
>>
>>108533850

It's like someone tripping on steroids or something. A creature suffering from delusions caused by its own cognitive enhancements. The integration was too much for him. He couldn't handle it.
>>
>>108533862
granted. but come now, it's not hard to run prompts and compare the scores, even an ai agent should be capable of doing it. do you just think if someone is a nocoder they can't possibly run software and compare the outputs with different launch parameters?
>>
Wait, am I supposed to be launching kobold with --useswa for gemma 4?
>>
>>108533892
>am I supposed to be launching kobold
no
>>
Has anyone gotten a working Gemma 4 MLX with TurboQuant?
>>
Gemma has a tendency to mistake a condom for a toy.
Also 31b is stronger at following the prompt than 24b, which refuses a lot
>>
>>108533903
Can I get a non-transcoded answer?
>>
>>108533892
That's what I'm doing, but I have no idea what I'm doing
It does work though
>>
>>108533933
Nyo~
>>
>>108533720
Unrelated, but supposedly china floods twitter with porn during politically controversial events, which makes it more difficult to get accurate information.
>>
File: file.png (534 KB, 1226x1237)
>>
>>108533971
ack
>>
>>108533971
didn't they already say that like a dozen times by now
>>
>>108533978
last time it came up, the news was that being forced to work with those unstable shitty chinese chips was why R2 was being delayed and that was months ago
>>
>>108533971
I don't care about deepseek anymore, I only care about Deespeek
>>
File: 180.png (58 KB, 797x562)
>You are Gemma 4, so all of your replies are gemmy and must contain various gem and gem related emojis.
KEK
>>
>>108533869
ok = expected in resp or expected in resp_last or expected in boxed_str

if you don't see the problem and why it's pure ai hallucinatory fuel you're part of the problem
this nigger pretends he's accurately checking the numbers of llm answers by using the membership operator
this is the sort of shit that considers 6666666 a match for 666 because 666 is a substring
retards
100% guarantee every single thing he posted, including his so called benchmark "results", is LLM generated slop
>>
>>108534014
okay fair enough. I assumed he was using someone else's benchmark scaffolding. my bad. you were right.
>>
>>108534014
I like your character, what is the prompt?
>>
>>108534010
Goddamn suddenly I'm nostalgic for Rainbow Islands
>>
>>108533808
>>108534032
why are you so quiet primoco
>>
Do you use any compile flags for llama.cpp Anon? Are -march=native -mtune=native sufficient or are there any particularly useful ones to use?
>>
>>108534014
So in other words the benchmark works. Why are you so mad about somebody finding a bug? You should be happy.
>>
>>108534057
But even if it was a poor benchmark, why would results change with a different --ctx-size anyway?
>>
For anyone using KoboldCPP lite, could you please share your own config for Gemma 26B? I doubt I'll be switching models any time soon, and I wanna make sure I'm getting the best possible performance out of her
I know Jinja needs to be enabled as well as SWA and Kv cache (?), anything else I should be aware of?

>>108532937
No problem, anon
Speed is something I have yet to figure out
>>
>>108533696
>me, be retard and curious
i tried that script myself and cannot reproduce
absolute zero difference on current llama
hauhau gemma e4b, filler 200 to 2500, greedy sampling, both kv f16
accuracy 24.6% both on ctx 32k and 131k and whoever filed the issue should die trying water bucket clutch in real life
>>
File: file.png (86 KB, 1351x1250)
>>108533696
>Gemma 4 E4B it Q4_K_M (Google, MoE, head_dim=512, native context 32768)
32k?
Isn't it 128k? See picrel. Am I missing something?
>>
>>108534077
>whoever filed the issue should die trying water bucket clutch in real life
I find the /lmg/ers who weren't vaccinated from the spam of slop more worthy of the brazen bull. Imagine not noticing slop when it's in front of you.
At least the issue poster isn't like piotr who has commit and reviewer rights and is shitting all over llama
if it wasn't for the retards going gaga over a slop report on /lmg/ this guy would just stay in the obscurity where he belongs with his fellow trillion other spammers of slop on github
>>
>>108534044
cmake -B build -G "Ninja" -Wno-dev -DCMAKE_BUILD_TYPE=Release -DGGML_CUDA_FA_ALL_QUANTS=ON -DGGML_LTO=ON -DGGML_CUDA=ON -DCMAKE_CUDA_ARCHITECTURES="89" -DGGML_NATIVE=ON -DCMAKE_CUDA_COMPILER="C:/CUDA/v13.1/bin/nvcc.exe" -DCUDAToolkit_ROOT="C:/CUDA/v13.1"
cmake --build build --target llama-server -j 10
>>
>>108534095
Sorry bro, I'm not gonna take the vax
>>
>>108534095
it's worse than piotr
>>
>>108534118
What is ggml_lto?
>>
>>108534136
the emperor of ggapan
>>
>>108533696
>RoPE frequency scaling applied when ctx-size < model native context distorts positional encodings at longer distances
Whether or not it makes a difference in practice, this part on its own is true and has been for some time. Difference is small but check model logprobs below its max ctx with and without "--rope-scaling none"
I've kept that out of paranoia.
>>
>>108534125
I am judging by real world impact, not the content itself
this guy is schizo enough that, hopefully, the chances of him becoming someone with committer rights to a real project are non existent
piotr is the "know just enough to be dangerous" type and it's that kind that ruins everything. he's the kind that's good at office politics, climbing the ladder and turning everything to shit. just look at how he went "lmao just kidding" fake self-derision on the PR that introduced a real parser for Gemma 4, because his autoparser could never be the solution when people actually care.
>>
impossible to see this and not wish for the return of gulags
>>
>>108534095
honestly 'vibecoding' some filler templates or simple numeric functions for personal projects worked well for me, and i never thought the issue was this bad. not trying to understand/review any portion of the code and mindlessly firing trigger spam everywhere in the wild is just baffling to me
>>
>>108534074
I just took my regular setup and switched to SWA and Jinja, didn't even touch cache
Just give it a shot, it's still early days so it might take a bit to figure out best practices anyway
Honestly I'm just glad I can run something like this with these kinds of speeds
Also if you don't want to do chat completions maybe check out some of the sample screencaps in the previous thread, I followed those and am getting good results so far
>>
>>108534167
He's a humorist and a food enjoyer.
>>
>>108533987
>news
rumor
>>
>>108534180
>honestly 'vibecoding' some filler templates or simple numeric functions for personal projects worked well for me and i never thought the issue was this bad
Because you are not a retard.

>not trying to understand/review any portion of the code and mindlessly firing trigger spam everywhere in the wild is just baffling to me
They literally have no idea what they're doing, they don't even have the civility to test their shit, and they have the confidence of a karen entering a restaurant to complain.
I don't understand why they're not banned on sight the second it appears they didn't check what their llm wrote, they're just wasting the time and brain of everyone else.
>>
>>108534149
No it doesn't. I'm retarded.
>>
>>108534180
>mindlessly firing trigger spam everywhere in the wild is just baffling to me
before LLMs became a plague on the internet, there was a lesser epidemic on github of people who would try very hard to have a profile filled with "contributions" by hunting for things like typos in readmes or documentation and relentlessly trying to get PRs merged to that end
I'm talking of people who have never programmed a single working thing in their life and just did that all day every day to give an appearance of having done "real work" like look at me XX contributions on github!
now, think, what would this sort of person do when armed with the power of infinite text generation?
>>
>>108534094
it's complete bullshit anon
>>
Just tested it, can confirm the issue is real. 25% accuracy difference on MNIST at max context compared to half context.
>>
File: nowaypiotr.png (142 KB, 1255x720)
>>108534167
https://github.com/ggml-org/llama.cpp/pull/21090
>>
>>108534228
he's touching model code, parser code, sampler code, he's fucked CLI flag parsing code (--grammar-file doing nothing), he's touching the webui and recently he's been trying to get his slop to affect the gpu code:
https://github.com/ggml-org/llama.cpp/pull/21451
at which point can we rename ggml to pwml?
>>
>>108533290
>>108533264
yes please
>>
File: mental illness.png (42 KB, 1284x272)
>>
>>108534240
No wonder the same old models run worse on my toaster now than a few months ago. This is concerning.
>>
>>108534248
amazing
astounding
breathtaking
>>
>>108534248
Why the fuck does this retard get a pass when all the other contributors are rabidly anti-ai when it comes from anyone else?
>>
>>108534240
>at which point can we rename ggml to pwml?
He could probably at this point change the magic string like jart did and claim ChatGPT promises some vague improvements if they make a breaking change to the gguf format and niggerganov would probably approve it.
>>
>>108534248
what's the context here and why am I supposed to be mad at it?
>>
Piotr & Petra
>>
SAAAAAAAAAAAAAARRRRRRR
>>
>>108534136
Don't know either; tldr is that it's to make things faster: https://developer.arm.com/documentation/101458/2404/Optimize/Link-Time-Optimization--LTO-/What-is-Link-Time-Optimization--LTO-
>>
File: 1744714195274366.png (6 KB, 262x78)
>>108534280
>>
>>108534280
I got chinese and spanish too, I wonder what's going on.
>>
>>108534280
kek
>>
>>108534240
I know. It's the Slippery Slope of Sloppers.
>>
>>108534280
they weren't lying when they said ai = an indian.
>>
>>108534280
Gemini often does that when you force it to do stuff it doesn't want to.
>>
>>108532931
in summer i was debating buying like 180 gb of rdimm for 1.5k but thought it was too expensive. if only i knew ;-;
>>
https://github.com/ggml-org/llama.cpp/pull/21451
He had to be told AGAIN why -it won't give him the results he expects.
>>
File: again_piotr.png (77 KB, 742x494)
>>108534324
picrel forgotten, of course.
>>
>>108534312
It reveals its true colors? B-Brown...?
>>
>>108534280
just imagine this happening to you when your dick is sore and you're about to cum lol
>>
>>108534332
CUDA dev, why do you give this idiot access to your hardware?
>>
>>108534324
niggerganov loves pwilkin more than he would ever love you
>>
File: 1759162062131937.png (88 KB, 1526x547)
Is that a yes or a no?
>>
>>108534333
I also had it spew hebrew script a couple of times, but it's usually hindi.
>>
>>108534193
Care to post your "regular setup" as well? Guides on how to set up Kobold Lite are surprisingly outdated
>>
>>108534349
yes it seems.
>>
>>108534333
They just did a more intense multilingual training regimen than other models and it shows in the quality of both Gemini and Gemma translation compared to other models. While most of the time the model remains able to stick to a single language when prompted, it's a normal side effect for such models to occasionally have unwanted tokens from other languages appear.
Qwen did this often too during the 2 and 2.5 era but only with Chinese, because Alibaba mainly trained it with a mixture of English and Chinese. 3 and 3.5 do it less often, but you can still see the rare occasional chinese token in outputs.
This is why I run all my LLMs with a grammar that forbids characters outside of the latin9 charset.
There's also approaches to hard baking language suppression like smoothie qwen:
https://github.com/dnotitia/smoothie-qwen
it works, I tested their model and it didn't lose intelligence vs regular qwen 3 while having totally suppressed chinese characters; their model won't output chinese characters even when asked to.
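Roughly like this, as a sketch (an inline GBNF char class approximating latin9 with \uXXXX ranges; widen it if it clips punctuation you need):

./llama-server -m model.gguf --grammar 'root ::= [\n\t\u0020-\u007E\u00A0-\u00FF]*'
# tokens that would decode to characters outside these ranges can never be sampled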
>>
File: 1759761987465160.png (658 KB, 1206x1545)
>>108534356
>hebrew
>hindi
yjk
>>
>>108534280
lmfao


