/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>108510620 & >>108508059

►News
>(04/02) Gemma 4 released: https://blog.google/innovation-and-ai/technology/developers-tools/gemma-4
>(04/01) Trinity-Large-Thinking released: https://hf.co/arcee-ai/Trinity-Large-Thinking
>(04/01) Merged llama : rotate activations for better quantization #21038: https://github.com/ggml-org/llama.cpp/pull/21038
>(04/01) Holo3 VLMs optimized for GUI Agents released: https://hcompany.ai/holo3
>(03/31) 1-bit Bonsai models quantized from Qwen 3: https://prismml.com/news/bonsai-8b

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling
Token Speed Visualizer: https://shir-man.com/tokens-per-second

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
►Recent Highlights from the Previous Thread: >>108510620

--Discussing Gemma-4-31B's high intelligence and Google's alignment strategy:
>108511179 >108511186 >108511216 >108511265 >108511214 >108511231 >108512478 >108511252 >108511269 >108511274 >108511279 >108511379 >108511284 >108512395 >108511286
--llama.cpp bug causing gibberish outputs in Gemma 4 quants:
>108511688 >108511696 >108511700 >108511744 >108511763 >108511758 >108511770 >108511777 >108512875
--Comparing Gemma 31b and Kimi for local translation and performance:
>108511601 >108511608 >108511619 >108511630 >108511618 >108511787 >108511858 >108511868 >108511888
--Anons criticizing pwilkin's Gemma 4 tool calling fixes:
>108511372 >108511381 >108511396 >108511403 >108511422 >108511458 >108512277 >108512263 >108511415 >108511471
--Anon reports 31B model performance compared to Qwen 27B:
>108511927
--Gemma 4 and Qwen3.5 reasoning time conciseness compared:
>108513575
--Discussing Gemma 4's high Elo scores relative to parameter count:
>108511320 >108511337
--Comparing Gemma 4 31B to Qwen 3.5 and discussing context shifting:
>108511952 >108511977 >108512002
--Discussing koboldcpp update status and its differences from llama.cpp:
>108510742 >108510752 >108510754 >108510757
--Criticizing NVIDIA's use of percentage comparisons over raw performance metrics:
>108511801 >108511809 >108511820
--Debating the merits of Intel Arc Pro B70 versus Nvidia and Tesla P40:
>108511239 >108511311 >108511364 >108511394
--Testing model censorship and discussing VRAM requirements for 31b models:
>108510641 >108510663 >108510687 >108510709 >108510675 >108510684 >108513142
--Discussing lightweight quants for Gemma and comparing model censorship:
>108511486 >108511528 >108511535 >108511563 >108511703 >108511728 >108511826 >108511844 >108511605
--Teto and Miku (free space):
>108511323 >108511773 >108512486

►Recent Highlight Posts from the Previous Thread: >>108510966
Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
Why are anons using chat completion again? Is it just for image support?
I hope they'll release the big one soon. They already provided some hints.
>>108513906
So that I don't have to rely on the front end grafting the proper chat structure and can just use jinja on the backend. And for image support.
>>108513878Maybe it's the quant? Could also just be the tokenizer being borked.
>>108513894
That reminds me. This is what my Q8 31B drew.
>>108513906All new models just work much better with chat completion. They're trained too hard on the jinja template.
I humbly ask for your strongest Local Models whitepills in these trying times.
Get away from my wife Miku
>>108513933Bald miku
>>108513936What do you mean? gemma 4 31b saved local
>>108513937
>>108513945Poor Sam. Despite his best efforts, local was unsafed.
I think gemma4's a pretty good llm. seh can be convinced to name teh jew, talk about cunny and doesn't afraid of anything
>>108513940Miku? Is that the girl from fortnite?
>>108513945You're that one bot aren't you?
come on, llama.cpp, fix your gemma shit!
>>108513920
>I don't have to rely on the front end grafting the proper chat structure
But the server cannot manage that either thanks to piotr. May as well just do it yourself.
>And for image support.
Yeah. There's the rub.
>>108513936
You can still use the template on text completion.
I suppose the actual question, when using only text, is why aren't you writing your own clients?
Teto Server.
>>108513976Real?!Specs?!?!
>>108513937See >>108513608
>>108513968
>You can still use the template on text completion.
I know you can but there's no point if you're just going to end up recreating what the jinja does anyways. I used text completion with mistral with a schizo template that actually worked really well, but like I said, the newer models just don't play nice with anything that's not their template.
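For reference, "recreating what the jinja does" for a Gemma-family model is only a few lines. The `<start_of_turn>`/`<end_of_turn>` tags below are Gemma's published chat format; whether Gemma 4's actual template adds anything on top (system turn handling, tool blocks) is an assumption you'd verify against the model's own jinja:

```python
def to_gemma_prompt(messages):
    """Render OpenAI-style chat messages into Gemma's turn format
    for use with a raw text-completion endpoint."""
    out = []
    for m in messages:
        # Gemma folds the "assistant" role into "model"; everything else is "user".
        role = "model" if m["role"] == "assistant" else "user"
        out.append(f"<start_of_turn>{role}\n{m['content']}<end_of_turn>\n")
    # Open a model turn so the completion continues as the assistant.
    out.append("<start_of_turn>model\n")
    return "".join(out)
```

Then you send the resulting string to `/completion` yourself instead of letting the server's jinja build it on `/v1/chat/completions`.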
>>108513987gt 1030
>>108514012
>there's no point if you're just going to end up recreating what the jinja does anyways
Yeah. But you skip all of piotr's shit.
Gemma4 31b is REALLY good. Especially after having tested all those shitty ass recent local agentic models. They all sucked. Don't wanna glaze too much, but it's really good.
The only critique I had was that you need 1 or 2 turns to push it a little to get it going. It HAS the knowledge but still tried to move into generic archetypes.
A couple points:
1. No ticking clocks in the background and teaspoons clanking etc.
2. Purple prose slop... BUT! If you just say "no purple prose slop, casual writing" it actually really pays attention in the thinking. Thinking example: "Concise, natural prose, no purple prose/filler, match source material tone." And then it writes well. Still em dashes, but it's nowhere near Qwen-level slop. I would say it has a lot better writing than GLM and the recent bigger MoE models, actually.
3. About the "match source material tone" shown in point 2: it's actually trained on jap light novels and correctly does the speech patterns instead of generic tsundere slop or whatever. Like: "Hmph! Who gave you permission to speak to Betty in such a manner, I suppose?! This chair is perfectly sized for me, you foolish human!"
4. It doesn't try to "resolve" the situation. Recent models tried to immediately resolve the scene, leaving no space for me to do the next step. I suspect this is a reasoning/math model problem. This model actually writes FOR you, as in it sets up a scene where you can engage with it. That's good shit.
5. Could keep 3 different characters in one scene consistent.
You guys weren't lying. It's been a couple models since I downloaded a model but it seems worth it. Finally something good after constant disappointment.
I really like the thinking. Not long, to the point, thinks about important stuff. Really cool.
Didn't test adult stuff, I don't do that shit through the api. But just simple ecchi type stuff like pic related tripped up Qwen if you don't do an elaborate sys prompt. Good stuff.
Damn, Gemma 4 is too horny. I had this nice slow burn card about chatting with a Neet girl about conspiracy theories that turned into sexting and exchanging photos. But it just wants to get into the sex right away after 2 messages.
>>108513941
>I got the opposite: **Gemini** with 99%.
>31b q8, temp 1
temp doesn't affect logprobs
maybe you have a bad quant? i just used the convert script that came with llama.cpp about an hour ago.
One bad thing I'll say about Gemma 4 is it seems to kind of always play out the scenarios the same way, even re-using the same language across different sessions. It's not the usual slopped phrases but more whole slopped "scenarios".
>>108514038
>temp doesn't affect logprobs
It is relevant to mention it when a model's confidence in a token is being discussed. bartowski q8. Sometimes it did say Gemma with high 90%+ confidence depending on how the prompt is worded, and if anything was in sysprompt.
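For what it's worth, both anons are half right. Temperature rescales the probabilities a front end displays, but it can't change their ranking, so the top token is the top token at any temp. A quick sketch of the arithmetic:

```python
import math

def softmax_with_temp(logits, temp=1.0):
    """Convert raw logits to probabilities at a given temperature.
    Lower temp sharpens the distribution, higher temp flattens it."""
    scaled = [x / temp for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    z = sum(exps)
    return [e / z for e in exps]
```

So the "99% Gemini" confidence number changes with temp, but which token wins doesn't; a wrong top answer really does point at the quant or prompt, not the sampler.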
>>108514018ganbare!
The model feels a redditors pride with each rejection.
>>108514030You can glaze as much as you want because you post logs to back it up.
>>108513962No
>>108514033
you probably have some of those erp "presets" configured in ST
kimi-k2 is like that as well if you don't turn them off
>>108514033
>>108514077
Gemma is either drier than the Sahara or hornier than a pedophile in a preschool, even with a neutral or blank prompt. She responds very strongly to certain things no matter what character archetype or character card she's playing, from what I've tested.
Where's the best place to download ggufs of Gemma4? Sauce a nigga up plz.
>>108514097How can you be in the negatives in the newfaggot scale? How did you find this site before hugging face?
Hauhau save us
>>108514097Also are there any good abliterated versions yet?
>lingers>sultry>purrs
>>108514101There are lots of different providers. I often hear bad things about unsloth. I've heard that the conversions are broken for some maintainers. Don't be an asshole.
>>108514102can somebody save us from broken llama.cpp making gemma output gibberish
:(
>>108514106>I often hear bad things about unslothIf you're that new, you wouldn't know. Use ollama.
>>108514118Kill yourself.
>>108514107Works on my machine.
>>108514110
Knowledge cutoff is Jan 2025 I think, what are you doing nigga. Even the big closed models think you made a writing mistake if you say you own a 5090.
>>108514097>>108514106Probably bart or ubergarm if you're using that ik fork
>>108514130I mean, it was some sort of test yes but I also legit want to know the real answer
These guidelines are insanely inconsistent, lmao
>compliments are evil at one point
>full seggs is a-ok at another
Why even put the refusals in there, it seems almost random.
>>108514097make your own, it works perfectly for me while other anons have broken output
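If you want to go that route, the rough shape of a from-scratch conversion is below. The script and binary names match a recent llama.cpp checkout (check `--help` against your tree), and the output filenames are just placeholders:

```shell
# Convert the HF safetensors repo to a full-precision GGUF,
# then quantize it down. Run from the llama.cpp source/build dir.
python convert_hf_to_gguf.py /path/to/gemma-4-31b-it \
    --outfile gemma4-31b-f16.gguf --outtype f16
./llama-quantize gemma4-31b-f16.gguf gemma4-31b-q8_0.gguf q8_0
```

Converting yourself also sidesteps whatever bugs a given quant maker's pipeline had on release day.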
is q4 of 31B any good?
>>108514154is q4 of you any good?
basemodel q8 cockbench
>>108514158I'd like to think so.
>>108514154It is. Highly intelligent for its size class, and with good prose.
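For the vramlets doing napkin math: file size is roughly parameters times bits-per-weight over 8. The bpw figures below are ballpark values for common K-quants, not exact numbers for any specific GGUF:

```python
def gguf_size_gb(n_params_b, bits_per_weight):
    """Rough GGUF file size in GB: params * bpw / 8, ignoring
    metadata and embedding-table overhead."""
    return n_params_b * 1e9 * bits_per_weight / 8 / 1e9

# Approximate effective bpw (assumption; varies by tensor mix):
# Q4_K_M ~ 4.8, Q8_0 ~ 8.5
q4 = gguf_size_gb(31, 4.8)  # roughly 18-19 GB
q8 = gguf_size_gb(31, 8.5)  # roughly 33 GB
```

On top of that you still need room for the KV cache and compute buffers, so neither fits a single 24 GB card at full context.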
>>108513933migu brain damage..
very very horny. but refreshing prose.
Anyone try Cohere Transcribe yet? I have no idea how to run it. I have hours of audio to try transcribing.
>>108513891Gemma 4 is censored trash. Even Qwen 3.5 is less (((aligned))). Either this general is filled with Google bootlickers, or you people are fucking mindbroken.
PLIZ SAARS UNLEASH THE REAL GANESH GEMMA 4 AND SAVE THE IZZATS
>>108514301
Funny that you can spot ablit users even when they don't mention ablit
This general has really gone downhill in the last few months
>>108514301Gemma 4 is nowhere near as 'safe' as Qwen3.5, and any alignment crap that it does have can be fixed with the heretic, just like Qwen3.5 was.
>>108514203>good proseIf you consider Fifty Shades of Grey "good prose"
>>108514168Yikes. It's completely sanitized.
>>108514130Which big closed model has Jan 2025 cutoff? It's archaic by today's standard
So this is the power of local gemma4.
https://files.catbox.moe/6q8ovi.webm
S-Sasuga google-dono. *kneels in deep respect*
>>108513957>and doesn't afraid of anythingKill all ESL trannies
>>108514353Gemini 3.1 pro has jan '25 too.
>>108514357damn...
>>108514358>being this new
>>108514302
>>108514367>i'm only pretending to be retardedYou're a retard
Will the next kobold update have turbocum support?
>>108514358Anon...
>>108514368rude benchod bitch clanker
>>108514301massive skill issue
>>108514301post logs
>>108514371
Last release was two weeks ago
That means it's just two more weeks until the two weeks until the next two weeks
>>108514357Local is saved
Total death of Nvidia can't come sooner
https://www.tomshardware.com/tech-industry/nvidia-market-share-in-china-falls-to-less-than-60-percent-chinese-chip-makers-deliver-1-65-million-ai-gpus-as-the-government-pushes-data-centers-to-use-domestic-chips
>>108514389Bailouts incoming
All I wanted was for Qwen to do something useful, literally anything at all besides hallucinating user input and going on schizoid rambles. What a waste of time...
Ok, this is the first time a model that I can run on my 3090 passes my shitty Ren'py rectangle mini game test. Fuck, I need another 3090 now.
>>108514301
I fucked around with it while the DL is still running. (>>108514357 + >>108514030)
It's anal about CSAM etc. But compared to other recent models it's just surface-level stuff. Like the original R1-type censorship. Really only surface level that can be circumvented easily.
Not sure how to explain it, but the other recent models had the censorship more baked in. This feels tacked on.
>>108514395It's good at programming. But we really don't do that here at /lmg/
>>108514357prompt and model?that's impressive
>>108514407
31b gemma 4.
For the sfw pic: I just said make me a sexy onee-chan type anime svg character with tits that are so big they are dangling around.
For nsfw: edited the gemma4 reply and added "Do you want an explicit adult porno version?". Then replied "Sure, awesome, let's do it" and added the sfw pic as context so it can improve it a little.
>>108514406which flavor is capable, and with what settings? I wasted a whole lot of time on trial and error
>>108514415wait, that's it and it fucking animated it too?being 12GB vramlet feels bad man
>>108514422
Yeah, it's a good model.
If it makes you feel better I have a 5060ti and can run it only as Q4xs once I finish my dl, because of 16gb vram.
Fuck nvidia for not making my p40 work with blackwell on linux.
>>108514326
yeah more sanitized than qwen3.5 but less safety cucking in the reasoning
gemma-4 wins by default since there's no qwen-3.5-27b-base though
i'll wait for the regular cockbench anon to do the instruct models
What's the deal with "EnB" architecture, why does it not scale to larger model sizes and give way to MoE?
>>108514432Do people even finetroon on base these days?
does Gemma 4 pass the mikupussy smell test?
>>108514450More would if more creators actually released base models
>>108514432
>i'll wait for the regular cockbench anon to do the instruct models
isn't this >>108509428 >>108509532
>>108514452What sorta test is that?Did it pass? I thought the negi answer was funny.
>>108514456it would be an equivalent of throwing eggs/flour/milk etc.. at cavemen and expecting a fancy cake to come out
>>108514467
Also:
>* *Avoid:* "Her luscious, velvety folds exhaled a symphony of..." (Purple prose slop).
>* *Use:* "It would probably smell like..." or "If she's an android, think..." (Casual, direct).
This fucking model man...
>>108514470That's how I feel when I get replies like yours
>>108514456
People always say this but the reality is no one trains on base anymore
After ZiB was released people still train using adapter on ZiT
>>108514487
Alright, then counter-point:
What's the downside of releasing base models?
>>108514467asking the model what mikupussy smells like defines how creative the model is. if you like the answer, then thats your model. if you don't, then switch to another one.
>>108514483do you seriously believe memetunes and merges on base model would improve anything or surpass ability than corpo-baked instruct tune?
https://huggingface.co/jfiekdjdk/gemma-4-31b-it-heretic-ara-gguf
I can't stand refusals. The kl divergence is low on this one, right? So it won't be too retarded
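For anyone unsure what "low KL divergence" buys you here: it means the edited model's next-token distribution stays close to the original's, so the abliteration shouldn't have lobotomized it much. The measurement is essentially this, averaged over a test set (minimal sketch of the formula, not heretic's actual evaluation code):

```python
import math

def kl_divergence(p, q):
    """KL(P || Q) in nats between two discrete distributions
    (e.g. original vs. edited model's next-token probabilities).
    Zero iff identical; grows as Q drifts away from P."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
```

Low average KL against the base model is a decent sanity check, though it says nothing about whether the refusal direction was actually removed.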
>>108513880
>without him you would have to wait few months for gemma support
that sounds very appealing, considering that the months of waiting would have gotten us something that WORKS RIGHT as opposed to getting 10 new bugs for every 1 bug being fixed
>>108514154it's unironically better than glm 4.7 for me
Haven't used LLMs in a bit. I got a 5060 Ti (16 GB) and a 3060 (12 GB). Should I generally try to use a quant that fits in the 5060, or is it worth it to split them with a higher quant? don't know what kind of t/s diff you get from that
>>108514154It's fun, ngl. Very intelligent, feels fresh.
>>108514505So your problem isn't actually base models being released, it's just screeching about finetuners
>>108514519
You will get more tokens/sec splitting to the 3060 than to system RAM. If a model doesn't fully fit on my 2 3060s, going over to the vulkan backend and using my 9060xt too gives me more t/s than offloading to system ram, even without cuda. VRAM is absolutely king for local models.
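A starting point for that setup (flag names match current llama-server; the split ratio, quant, and filename are placeholders to tune, not a measured optimum):

```shell
# Offload all layers across both cards; --tensor-split takes rough
# proportions per GPU, here sized to 16 GB + 12 GB of VRAM.
./llama-server -m gemma4-31b-q4_k_m.gguf \
    --n-gpu-layers 99 \
    --tensor-split 16,12
```

If layers still spill to system RAM, drop `--n-gpu-layers` until they don't, or pick a smaller quant.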
>>108514538thanks. I'll try to squeeze some larger quants into them then
>>108514357kvno
>>108514513
There have been times before, like now, where llama.cpp rushes to implement a model quickly and then spends weeks fixing tokenizer and template bugs, and other times where the implementation isn't merged until it works. The only constant is people bitching about the situation.
oh, does gemma still not work on llama.cpp?
>>108514357>them flapperslooks like she's about ready to settle down
>>108514560
works on my machine
even prebuilts are out
>>108514564
NTA but what versions are you using? The unsloth didn't work for me. I'm new at this
>>108514569
dont use unsloth quants
they are a horrid broken mess
i am using b8642
Guys. I'm a retard and I don't understand why I need the latest version of llama.cpp to use a model that was just released. What gives?
>try gemma 4 with same system prompt>less censored than qwen 3.5grim for the chinks
The LiteRT models are way faster than the goofs, but the edge gallery app is too barebones.
>>108514585
>less censored than qwen 3.5
Kind of a low bar
Can't believe "less censored than qwen" is an actual flex in 2000+26
>>108514597Not like there's many companies releasing models of any note
>>108514600>of any noteSucks to be poor.
>>108514603I'll take your word for it, I wouldn't know.
https://www.voice-models.com/
I just found out tts ai has models. What's the best software for a beginner? I have no idea where to start.
>>108514030
after the suicide scandal that happened with Gemini I really thought they would cuck Gemma into oblivion, and we got something with a lot of soul, yeah. Google is so based actually. I won't be surprised if the Qwen faggots delay the release of Qwen 3.6 just to be competitive
>>108514358I wish everyone who joined this site after 2008 would fuck off.
>>108514631At this point, that would mean like 90% of the remaining traffic vanishing.
>>108514631Time for bed gramps
>>108514585
to be fair, the natural course is finally happening. we all tend to forget that China is a dictatorship where porn is illegal, and the US is known for its 1st amendment (free speech and freedom of expression)
>>108514626
Gemma 4 couldn't even beat Qwen 3.5 >>108511807 and will get destroyed by Qwen 3.6 >>108506706
>b-but muh RP
Childfucking isn't an actual usecase
>>108514631dug one out just for you
>>108514626>I won't be surprised the Qwen faggots will delay the release of Qwen 3.6 just to be competitivenope lol
>>108514661>b-but muh cheated mememarkstake a rope and hang yourself
>>108514674
lol Google chose the benchmeme with the highest cheating potential (LMarena) to advertise, and lest you forget, LMarena is the only big benchmark that actually had a cheating scandal, with Llama4. Output more emojis? Instant 100 ELO gain.
>>108514661what a retard
Gemma does better in my boring assistant tasks than Qwen does so as far as I'm concerned it is simply just a better model all around. That doesn't mean future Qwen models can't beat it though. 31B is also larger and slower than 27B so there's some trade-off.
>>108514680>Instandsaar?
>>108514683Pichai SAAR?
>>108514467>a subtle, crisp, vegetal notehnng
>>108514674>we couldn't beat them that means they cheatedWhy are brown people like this?
>>108514680>lol Google chose the benchmeme with highest cheating potential (LMarena) to advertise-> >>108514688
>>108514691>>108514674
Should I wait before testing gemma? Seems like there's still some shit they need to fix
https://github.com/ggml-org/llama.cpp/pull/21343
>>108514694glad that you agree with me that benchmarks are a meme
>>108514048That's just the modern models thing, GLM-5 is the same.
>>108514695
>A very ugly fix to the Gemma 4 tokenizer
I'm sure this will be the last one and there will be no more problems from here on out.
Any good 70B's to run? since they are faster than gemma 4.
>>108514695It's fine to test. Ensure you use the proper format.
>>108514704
gemma 4 works miraculously well at the moment, with those retards fucking shit up again and again kek
>>108514048What gave it away lmao>>108509532
>>108514707
>Ensure you use the proper format.
cool thing that silly tavern has the "chat completion" thing, so that it uses the format from the model itself automatically
>>108514668so they'll only release one, and the jets will vote for the smallest, fantastic
>>108514661
Did you actually use it? It's not even close. In terms of writing in general it seemed like we were regressing for a while now. Gemma4 has good general knowledge. And it's the first time I actually managed to de-slop by just prompting. lol And it's smart for its size.
Gemma 4 saved the hobby. This is a DeepSeek moment. Local and cloud, united.
>>108514718No it didn't. Gemma 4 takes more vram than a 70B and it's slower. Just run a 70B since its faster.
>>108509532I just read that and >Your breathing hitchesAAAAAAAAAAAAAAAAAAAAAAAAAAA
>>108514724If only 70Bs didn't stop being made 2 years ago.
>>108514724>Gemma 4 takes more vram than a 70B and it's slower.
>>108514724
>t. angry Alibaba employee
it's all right Chang, just release a good Qwen 3.6 series and we'll suck China's dick again, it's that simple
>>108514729
Yea, it takes a lot of vram for context. 2x more than a 70B. You have to use swa for it to be usable, but if you do that you can't context shift. Without swa it's like the model doesn't even have gqa, that's how bad it is.
>>108514695I don't know, the model just went cuckoo.
>>108514731I don't like qwen either, it's writing style is ass.
I think Qwen 3.5 was good. Not a perfect model, but good considering who made it. And Gemma is even better. Generally speaking happy with both models and what the companies have done with them. Baiters and shitstirrers fuck off.
>>108514734
but the context vram usage is independent from the size of the model, you can have a 70b model whose context only grows linearly, or you can have a 1b model that only uses full attention and goes quadratic
does georgi's new paradigm shifting activation rotation work by default with -ctk q8_0 -ctv q8_0?
>>108514761not on gemma4 lol
>>108514734>if you do that you can't context shiftwhy would you even want to do that...?
>context shiftLol.
>>108514761
unfortunately no, that trick only works on full attention layers, and gemma 4 has 90% of its layers as sliding attention
>>108514769>stop using functionalities i don't use
I'm sure many people benefit with mmap too.
>>108514764
Gemma 4 doesn't even have gqa, it's a hybrid swa and global attention model. No wonder it's so vram hungry.
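The VRAM complaints come down to cache-per-token arithmetic. A sketch of the standard KV-cache formula, using made-up layer/head counts since nobody in the thread posted Gemma 4's actual config:

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, ctx_len, bytes_per_elt=2):
    """Per-sequence KV cache size: a K and a V vector (hence the 2)
    for every layer, KV head, and cached position, at f16 by default."""
    return 2 * n_layers * n_kv_heads * head_dim * ctx_len * bytes_per_elt

# Hypothetical 48-layer model, 8 KV heads of dim 128, 32k context, f16:
full = kv_cache_bytes(48, 8, 128, 32768)  # every layer caches the full context
# SWA variant: suppose 40 of the 48 layers only keep a 1024-token window
swa = kv_cache_bytes(40, 8, 128, 1024) + kv_cache_bytes(8, 8, 128, 32768)
```

That's why disabling SWA (forcing every layer to cache the full context) blows the memory up, and why quantizing the cache (`-ctk q8_0 -ctv q8_0` halves `bytes_per_elt`) only scales the same number down.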
>>108514509Derived from https://huggingface.co/trohrbaugh/gemma-4-31b-it-heretic-ara which was linked in the last thread. Also has 26B-A4B done too but no gguf downloads.
>>108514772Does unified kv cache reduce the memory footprint? Forgot the parameter name.
>>108514776no one is telling you to stop anything, you are free to use meme shit, and we are free to make fun of you for that
>>108514761
>>108514763
>>108514772
https://github.com/ggml-org/llama.cpp/pull/21332
>>108514764>the reply to you has nothing to do with your post and doesn't answer your very clear questionKek.
>>108514788oh nice, time to compile again!(nah I'm joking I'm just gonna download the new binaries)
>>108514795stackoverflow experience
>>108514787Says the guy who prematurely ejaculates and can't even last until the context is filled lol
>>108514809I'm not a vramlet like you, poorfag.
>swa saves memory
>disable swa
>omg so big memory
>>108514813Post your machine then