/g/ - Technology





File: miku-holding-gemma.png (1.09 MB, 790x1054)
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>108510620 & >>108508059

►News
>(04/02) Gemma 4 released: https://blog.google/innovation-and-ai/technology/developers-tools/gemma-4
>(04/01) Trinity-Large-Thinking released: https://hf.co/arcee-ai/Trinity-Large-Thinking
>(04/01) Merged llama : rotate activations for better quantization #21038: https://github.com/ggml-org/llama.cpp/pull/21038
>(04/01) Holo3 VLMs optimized for GUI Agents released: https://hcompany.ai/holo3
>(03/31) 1-bit Bonsai models quantized from Qwen 3: https://prismml.com/news/bonsai-8b

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling
Token Speed Visualizer: https://shir-man.com/tokens-per-second

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
>>
File: 1752342298746526.jpg (80 KB, 562x613)
►Recent Highlights from the Previous Thread: >>108510620

--Discussing Gemma-4-31B's high intelligence and Google's alignment strategy:
>108511179 >108511186 >108511216 >108511265 >108511214 >108511231 >108512478 >108511252 >108511269 >108511274 >108511279 >108511379 >108511284 >108512395 >108511286
--llama.cpp bug causing gibberish outputs in Gemma 4 quants:
>108511688 >108511696 >108511700 >108511744 >108511763 >108511758 >108511770 >108511777 >108512875
--Comparing Gemma 31b and Kimi for local translation and performance:
>108511601 >108511608 >108511619 >108511630 >108511618 >108511787 >108511858 >108511868 >108511888
--Anons criticizing pwilkin's Gemma 4 tool calling fixes:
>108511372 >108511381 >108511396 >108511403 >108511422 >108511458 >108512277 >108512263 >108511415 >108511471
--Anon reports 31B model performance compared to Qwen 27B:
>108511927
--Gemma 4 and Qwen3.5 reasoning time conciseness compared:
>108513575
--Discussing Gemma 4's high Elo scores relative to parameter count:
>108511320 >108511337
--Comparing Gemma 4 31B to Qwen 3.5 and discussing context shifting:
>108511952 >108511977 >108512002
--Discussing koboldcpp update status and its differences from llama.cpp:
>108510742 >108510752 >108510754 >108510757
--Criticizing NVIDIA's use of percentage comparisons over raw performance metrics:
>108511801 >108511809 >108511820
--Debating the merits of Intel Arc Pro B70 versus Nvidia and Tesla P40:
>108511239 >108511311 >108511364 >108511394
--Testing model censorship and discussing VRAM requirements for 31b models:
>108510641 >108510663 >108510687 >108510709 >108510675 >108510684 >108513142
--Discussing lightweight quants for Gemma and comparing model censorship:
>108511486 >108511528 >108511535 >108511563 >108511703 >108511728 >108511826 >108511844 >108511605
--Teto and Miku (free space):
>108511323 >108511773 >108512486

►Recent Highlight Posts from the Previous Thread: >>108510966

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>
Why are anons using chat completion again? Is it just for image support?
>>
I hope they'll release the big one soon. They already provided some hints.
>>
>>108513906
So that I don't have to rely on the front end grafting the proper chat structure and can just use jinja on the backend.
And for image support.
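For anyone wondering what that grafting actually involves, here's a minimal sketch in Python. The `<start_of_turn>`/`<end_of_turn>` markers are the published Gemma template; it's an assumption that Gemma 4 keeps the same format.

```python
# Hedged sketch: hand-building the chat structure a frontend would otherwise
# graft on before hitting a text-completion endpoint. Turn markers follow the
# released Gemma template; whether Gemma 4 keeps them is an assumption.
def to_gemma_prompt(messages):
    parts = []
    for msg in messages:
        # Gemma's template maps the assistant role to "model"
        role = "model" if msg["role"] == "assistant" else "user"
        parts.append(f"<start_of_turn>{role}\n{msg['content']}<end_of_turn>\n")
    parts.append("<start_of_turn>model\n")  # cue the model's reply
    return "".join(parts)

prompt = to_gemma_prompt([
    {"role": "user", "content": "hi"},
    {"role": "assistant", "content": "hello"},
    {"role": "user", "content": "how are you?"},
])
```

With chat completion the backend's bundled jinja does this for you; with text completion you (or your frontend) rebuild it by hand, which is exactly where template mismatches creep in.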
>>
>>108513878
Maybe it's the quant? Could also just be the tokenizer being borked.
>>
File: a.png (24 KB, 400x400)
>>108513894
That reminds me.

This is what my Q8 31B drew.
>>
>>108513906
All new models just work much better with chat completion. They're trained too hard on the jinja template.
>>
File: dark.png (615 KB, 960x540)
I humbly ask for your strongest Local Models whitepills in these trying times.
>>
Get away from my wife Miku
>>
>>108513933
Bald miku
>>
>>108513936
What do you mean? gemma 4 31b saved local
>>
File: potions.png (57 KB, 217x199)
>>108513937
>>
>>108513945
Poor Sam. Despite his best efforts, local was unsafed.
>>
I think gemma4's a pretty good llm. seh can be convinced to name teh jew, talk about cunny and doesn't afraid of anything
>>
>>108513940
Miku? Is that the girl from fortnite?
>>
>>108513945
You're that one bot aren't you?
>>
come on, llama.cpp, fix your gemma shit!
>>
>>108513920
>I don't have to rely on the front end grafting the proper chat structure
But the server cannot manage that either thanks to piotr. May as well just do it yourself.
>And for image support.
Yeah. There's the rub.
>>108513936
You can still use the template on text completion.

I suppose the actual question, when using only text, is why aren't you writing your own clients?
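In that spirit, a bare-bones client really is small. A stdlib-only sketch, assuming a local llama-server on the default port; the `/completion` endpoint with `prompt`/`n_predict` in the request and `content` in the response is llama.cpp's server API:

```python
import json
import urllib.request

def complete(prompt, url="http://127.0.0.1:8080/completion", n_predict=128):
    # llama.cpp's llama-server takes a raw (pre-templated) prompt here,
    # so you stay in full control of the chat structure
    payload = json.dumps({"prompt": prompt, "n_predict": n_predict}).encode()
    req = urllib.request.Request(
        url, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["content"]
```

That's the whole client; everything else (template, sampling params, history trimming) is yours to do as you please.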
>>
File: tetoserver.jpg (838 KB, 1817x2776)
Teto Server.
>>
File: intense waow.jpg (163 KB, 1058x926)
>>108513976
Real?!
Specs?!?!
>>
>>108513937
See >>108513608
>>
>>108513968
>You can still use the template on text completion.
I know you can but there's no point if you're just going to end up recreating what the jinja does anyways. I used text completion with mistral with a schizo template that actually worked really well, but like I said, the newer models just don't play nice with anything that's not their template.
>>
>>108513987
gt 1030
>>
>>108514012
>there's no point if you're just going to end up recreating what the jinja does anyways
Yeah. But you skip all of piotr's shit.
>>
Gemma4 31b is REALLY good. Especially after having tested all those shitty recent local agentic models.
They all sucked.
Don't wanna glaze too much, but it's really good.
The only critique I have is that you need 1 or 2 turns to push it a little to get it going. It HAS the knowledge but still tries to fall back on generic archetypes.

A couple points:
-1. No ticking clocks in the background, clanking teaspoons, etc.
-2. Purple prose slop... BUT! If you just say "no purple prose slop, casual writing" it actually really pays attention in the thinking.
Thinking example: Concise, natural prose, no purple prose/filler, match source material tone.
And then it writes well. Still em dashes, but it's nowhere near Qwen-level slop. I'd say its writing is a lot better than GLM and the recent bigger MoE models, actually.
-3. About the "match source material tone" shown in point 2:
It actually is trained on Jap light novels and correctly does the speech patterns instead of generic tsundere slop or whatever.
Like: "Hmph! Who gave you permission to speak to Betty in such a manner, I suppose?! This chair is perfectly sized for me, you foolish human!"
-4. It doesn't try to "resolve" the situation.
Recent models tried to immediately resolve the scene, leaving no space for me to do the next step.
I suspect this is a reasoning/math model problem. This model actually writes FOR you, as in it sets up a scene where you can engage with it. That's good shit.
-5. It could keep 3 different characters in one scene consistent.

You guys weren't lying. It's been a couple models since I last downloaded one, but this seems worth it.
Finally something good after constant disappointment.
I really like the thinking. Not long, to the point, thinks about the important stuff. Really cool.
Didn't test adult stuff, I don't do that through the API. But even simple ecchi-type stuff like pic related tripped up Qwen if you don't do an elaborate sys prompt. Good stuff.
>>
Damn, Gemma 4 is too horny. I had this nice slow-burn card about chatting with a NEET girl about conspiracy theories that turned into sexting and exchanging photos. But it just wants to get into the sex right away, after 2 messages.
>>
>>108513941
>I got the opposite: **Gemini** with 99%.
>31b q8, temp 1
temp doesn't affect logprobs
maybe you have a bad quant? i just used the convert script that came with llama.cpp about an hour ago.
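For anyone confused by the temp/logprobs point: temperature rescales the softmax but never reorders tokens, so the model's top pick stays the top pick at any temp. A pure-Python sketch with made-up logits for three candidate tokens:

```python
import math

def softmax(logits, temp=1.0):
    # temperature divides the logits before the softmax
    scaled = [x / temp for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [4.0, 1.5, 0.5]  # made-up logits, not from any real model
p1 = softmax(logits, temp=1.0)
p2 = softmax(logits, temp=2.0)
# p2 is flatter than p1, but the ranking of the three tokens is identical
```

So a bad quant (which changes the logits themselves) can change which token wins; temperature can't.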
>>
One bad thing I'll say about Gemma 4 is that it seems to always play out scenarios the same way, even re-using the same language across different sessions. It's not the usual slopped phrases but more like whole slopped "scenarios".
>>
>>108514038
>temp doesn't affect logprobs
It is relevant to mention it when a model's confidence in a token is being discussed.
bartowski q8. Sometimes it did say Gemma with 90%+ confidence, depending on how the prompt was worded and whether anything was in the sysprompt.
>>
File: 1768301800697402.png (98 KB, 300x225)
>>108514018
ganbare!
>>
The model feels a redditor's pride with each rejection.
>>
>>108514030
You can glaze as much as you want as long as you post logs to back it up.
>>
>>108513962
No
>>
>>108514033
you probably have some of those erp "presets' configured in ST
kimi-k2 is like that as well if you don't turn them off
>>
>>108514033
>>108514077
Gemma is either drier than the Sahara or hornier than a pedophile in a preschool, even with a neutral or blank prompt. She responds very strongly to certain things no matter what character archetype or character card she's playing, from what I've tested.
>>
Where's the best place to download ggufs of Gemma4? Sauce a nigga up plz.
>>
>>108514097
How can you be in the negatives in the newfaggot scale? How did you find this site before hugging face?
>>
Hauhau save us
>>
>>108514097
Also are there any good abliterated versions yet?
>>
File: 1755947802278055.jpg (21 KB, 612x408)
>lingers
>sultry
>purrs
>>
>>108514101
There are lots of different providers. I often hear bad things about unsloth. I've heard that the conversions are broken for some maintainers. Don't be an asshole.
>>
>>108514102
can somebody save us from broken llama.cpp making gemma output gibberish
>>
File: file.png (97 KB, 1115x716)
:(
>>
>>108514106
>I often hear bad things about unsloth
If you're that new, you wouldn't know. Use ollama.
>>
>>108514118
Kill yourself.
>>
>>108514107
Works on my machine.
>>
>>108514110
Knowledge cutoff is Jan 2025 I think, what are you doing nigga.
Even the big closed models think you made a writing mistake if you say you own a 5090.
>>
>>108514097
>>108514106
Probably bart or ubergarm if you're using that ik fork
>>
>>108514130
I mean, it was some sort of test yes but I also legit want to know the real answer
>>
These guidelines are insanely inconsistent, lmao
>compliments are evil at one point
>full seggs is a-ok at another
Why even put the refusals in there, it seems almost random.
>>
>>108514097
make your own, it works perfectly for me while other anons have broken output
>>
is q4 of 31B any good?
>>
>>108514154
is q4 of you any good?
>>
basemodel q8 cockbench
>>
>>108514158
I'd like to think so.
>>
>>108514154
It is. Highly intelligent for its size class, and with good prose.
>>
>>108513933
migu brain damage..
>>
File: mendo.png (785 KB, 1036x705)
very very horny. but refreshing prose.
>>
Anyone try Cohere Transcribe yet? I have no idea how to run it. I have hours of audio to try transcribing.
>>
>>108513891
Gemma 4 is censored trash. Even Qwen 3.5 is less (((aligned))). Either this general is filled with Google bootlickers, or you people are fucking mindbroken.
>>
PLIZ SAARS UNLEASH THE REAL GANESH GEMMA 4 AND SAVE THE IZZATS
>>
>>108514301
Funny that you can spot ablit users even when they don't mention ablit
This general has really gone downhill in the last few months
>>
>>108514301
Gemma 4 is nowhere near as 'safe' as Qwen3.5, and any alignment crap that it does have can be fixed with the heretic, just like Qwen3.5 was.
>>
>>108514203
>good prose
If you consider Fifty Shades of Grey "good prose"
>>
>>108514168
Yikes. It's completely sanitized.
>>
>>108514130
Which big closed model has Jan 2025 cutoff? It's archaic by today's standard
>>
So this is the power of local gemma4.
https://files.catbox.moe/6q8ovi.webm
S-Sasuga google-dono. *kneels in deep respect*
>>
>>108513957
>and doesn't afraid of anything
Kill all ESL trannies
>>
>>108514353
Gemini 3.1 pro has jan '25 too.
>>
>>108514357
damn...
>>
File: laughing oiran.jpg (57 KB, 852x480)
>>108514358
>being this new
>>
File: u.png (125 KB, 944x594)
>>108514302
>>
>>108514367
>i'm only pretending to be retarded
You're a retard
>>
Will the next kobold update have turbocum support?
>>
>>108514358
Anon...
>>
>>108514368
rude benchod bitch clanker
>>
>>108514301
massive skill issue
>>
>>108514301
post logs
>>
>>108514371
Last release was two weeks ago
That means it's just two more weeks until the two weeks until the next two weeks
>>
File: 1772661447692500.png (383 KB, 928x508)
>>108514357
Local is saved
>>
File: 1772159348394626.png (531 KB, 791x752)
Total death of Nvidia can't come sooner
https://www.tomshardware.com/tech-industry/nvidia-market-share-in-china-falls-to-less-than-60-percent-chinese-chip-makers-deliver-1-65-million-ai-gpus-as-the-government-pushes-data-centers-to-use-domestic-chips
>>
>>108514389
Bailouts incoming
>>
All I wanted was for Qwen to do something useful, literally anything at all besides hallucinating user input and going on schizoid rambles. What a waste of time...
>>
Ok, this is the first time a model that I can run on my 3090 passes my shitty Ren'py rectangle mini game test. Fuck, I need another 3090 now.
>>
>>108514301
I fucked around with it while the DL is still running. (>>108514357 + >>108514030 )
It's anal about CSAM etc.
But compared to other recent models it's just surface-level stuff.
Like the original R1-type censorship: really only surface level, easily circumvented.
Not sure how to explain it, but the other recent models had the censorship baked in deeper. This feels tacked on.
>>
>>108514395
It's good at programming. But we really don't do that here at /lmg/
>>
>>108514357
prompt and model?
that's impressive
>>
>>108514407
31b gemma 4.
For the sfw pic:
I just said make me a sexy onee-chan type anime svg character with tits that are so big they are dangling around.

For nsfw:
Edited the gemma4 reply and added "Do you want a explicit adult porno version?".
Then replied "Sure, awesome, lets do it" and added the sfw pic as context so it can improve it a little.
>>
>>108514406
which flavor is capable, and with what settings? I wasted a whole lot of time on trial and error
>>
>>108514415
wait, that's it and it fucking animated it too?
being 12GB vramlet feels bad man
>>
>>108514422
Yeah, it's a good model.
If it makes you feel better, I have a 5060 Ti and can only run it as Q4xs once I finish my DL, because of 16GB VRAM.
Fuck Nvidia for not making my P40 work with Blackwell on Linux.
>>
File: cockbench-31b-base.png (45 KB, 920x627)
>>108514326
yeah more sanitized than qwen3.5 but less safety cucking in the reasoning
gemma-4 wins by default since there's no qwen-3.5-27b-base though
i'll wait for the regular cockbench anon to do the instruct models
>>
What's the deal with the "EnB" architecture? Why doesn't it scale to larger model sizes, giving way to MoE?
>>
>>108514432
Do people even finetroon on base these days?
>>
does Gemma 4 pass the mikupussy smell test?
>>
>>108514450
More would if more creators actually released base models
>>
>>108514432
>i'll wait for the regular cockbench anon to do the instruct models
isn't this >>108509428
>>108509532
>>
>>108514452
What sorta test is that?
Did it pass? I thought the negi answer was funny.
>>
>>108514456
it would be the equivalent of throwing eggs/flour/milk etc. at cavemen and expecting a fancy cake to come out
>>
>>108514467
ALso:
>* *Avoid:* "Her luscious, velvety folds exhaled a symphony of..." (Purple prose slop).
> * *Use:* "It would probably smell like..." or "If she's an android, think..." (Casual, direct).
This fucking model man...
>>
>>108514470
That's how I feel when I get replies like yours
>>
>>108514456
People always say this but the reality is no one trains on base anymore
After ZiB was released people still train using adapter on ZiT
>>
>>108514487
Alright, then counter-point:
What's the downside of releasing base models?
>>
>>108514467
asking the model what mikupussy smells like shows how creative the model is. If you like the answer, then that's your model. If you don't, switch to another one.



All trademarks and copyrights on this page are owned by their respective parties. Images uploaded are the responsibility of the Poster. Comments are owned by the Poster.