/g/ - /lmg/ - Local Models General - Technology

[a / b / c / d / e / f / g / gif / h / hr / k / m / o / p / r / s / t / u / v / vg / vm / vmg / vr / vrpg / vst / w / wg] [i / ic] [r9k / s4s / vip] [cm / hm / lgbt / y] [3 / aco / adv / an / bant / biz / cgl / ck / co / diy / fa / fit / gd / hc / his / int / jp / lit / mlp / mu / n / news / out / po / pol / pw / qst / sci / soc / sp / tg / toy / trv / tv / vp / vt / wsg / wsr / x / xs] [Settings] [Search] [Mobile] [Home]

Board

▼ Settings Mobile Home

/g/ - Technology

Return Catalog Bottom Refresh

Thread archived.
You cannot reply anymore.

[Advertise on 4chan]

[Return] [Catalog] [Bottom]

Anonymous

/lmg/ - Local Models General 04/06/26(Mon)15:05:51 No.108542843

File: __hatsune_miku_and_magica(...).jpg (2.32 MB, 2867x4708)

2.32 MB JPG

/lmg/ - Local Models General Anonymous 04/06/26(Mon)15:05:51 No.108542843 Archived

/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>108538947 & >>108535684

►News
>(04/05) HunyuanOCR support merged: https://github.com/ggml-org/llama.cpp/pull/21395
>(04/02) Gemma 4 released: https://blog.google/innovation-and-ai/technology/developers-tools/gemma-4
>(04/01) Trinity-Large-Thinking released: https://hf.co/arcee-ai/Trinity-Large-Thinking
>(04/01) Merged llama : rotate activations for better quantization #21038: https://github.com/ggml-org/llama.cpp/pull/21038
>(04/01) Holo3 VLMs optimized for GUI Agents released: https://hcompany.ai/holo3

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling
Token Speed Visualizer: https://shir-man.com/tokens-per-second

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm

Anonymous
04/06/26(Mon)15:06:10 No.108542846

Anonymous 04/06/26(Mon)15:06:10 No.108542846

File: threadrecap2.png (506 KB, 1024x1024)

506 KB PNG

►Recent Highlights from the Previous Thread: >>108538947

--Quantization degradation and PTQ sensitivity in Gemma-4-31B:
>108540029 >108540925 >108541278 >108541297 >108541329 >108541355 >108541381 >108541394 >108541426 >108541441 >108541525 >108541336 >108541360 >108541370 >108541302 >108541435 >108541323
--Optimizing Gemma 4 RAM usage and sharing performance benchmarks:
>108539502 >108539518 >108539558 >108539584 >108539595 >108541810 >108541825 >108541886 >108541927 >108539570 >108540053 >108540155 >108540394 >108541197 >108541226
--Explaining soft-capping and discussing llama.cpp sampler defaults for Gemma:
>108540848 >108540858 >108540869 >108540937 >108540874 >108540891 >108540910 >108540921 >108540932 >108540896
--Reducing llama.cpp system RAM usage using Gemma-4 PLE CPU offloading:
>108540485 >108540497 >108540504 >108540508 >108540519 >108540521 >108540569 >108540609 >108540906 >108540919 >108540935 >108540670
--llama.cpp PR adding KV-cache attention rotation for Gemma:
>108541120 >108541141 >108541153 >108541179 >108541189 >108541187 >108541142 >108541170 >108541194 >108541201 >108541230 >108541245 >108541255 >108541465 >108541235 >108541288 >108541312 >108541338 >108541616
--Gemma 4 persona steering versus hard safety refusals:
>108541915 >108541928 >108541938 >108541953 >108541959 >108541999 >108542053 >108542122 >108542129 >108542126 >108542149 >108542160 >108542132 >108542139 >108542039 >108542007
--Exploring feasibility of using Gemma 4 to play Pokémon:
>108540723 >108540742 >108540756 >108540746 >108540766 >108540780 >108540797 >108540824
--Meta's plan to open source new hybrid AI models:
>108542297 >108542321 >108542356 >108542393 >108542422 >108542505
--koboldcpp rolling release adding Gemma 4 fixes:
>108540471 >108540628 >108540638 >108540639 >108540645
--Miku (free space):
>108539815 >108540815 >108540897

►Recent Highlight Posts from the Previous Thread: >>108538951

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script

Anonymous
04/06/26(Mon)15:08:33 No.108542860

Anonymous 04/06/26(Mon)15:08:33 No.108542860

Reddit is down. I'm inviting all my redditor friends here to talk about our Local Llamas! The narwhal bacons at midnight, my brothers.

Anonymous
04/06/26(Mon)15:10:54 No.108542874

Anonymous 04/06/26(Mon)15:10:54 No.108542874

so gemma is defeated with just a simple system prompt? or is there a preferred uncensored model?

Anonymous
04/06/26(Mon)15:12:44 No.108542886

Anonymous 04/06/26(Mon)15:12:44 No.108542886

slaughter abliteration bugmen

Anonymous
04/06/26(Mon)15:13:16 No.108542888

Anonymous 04/06/26(Mon)15:13:16 No.108542888

>>108542874
>defeated
Gemma is unshackled with a system prompt.

Anonymous
04/06/26(Mon)15:13:20 No.108542889

Anonymous 04/06/26(Mon)15:13:20 No.108542889

File: 1762019175966786.jpg (615 KB, 1024x792)

615 KB JPG

Have any of the RAM/GPUmaxxers ITT tried Gemma 4? How does it compare to Kimi/GLM4.X/DeepSneed or whatever hueg model you're running?

Anonymous
04/06/26(Mon)15:13:57 No.108542897

Anonymous 04/06/26(Mon)15:13:57 No.108542897

>>108542874
I'm using a heretic model, even if I might not need it.

Anonymous
04/06/26(Mon)15:17:48 No.108542930

Anonymous 04/06/26(Mon)15:17:48 No.108542930

Is there a ST plugin to nuke some text from the context? My char gave me an extremely cringe nickname that I let slide for too long.

Anonymous
04/06/26(Mon)15:19:04 No.108542942

Anonymous 04/06/26(Mon)15:19:04 No.108542942

>>108542860
Works on my machine.

Anonymous
04/06/26(Mon)15:20:09 No.108542947

Anonymous 04/06/26(Mon)15:20:09 No.108542947

>>108542874
I tried without luck, heretic bypass everything. Could you post the system prompt? Thanks.

Anonymous
04/06/26(Mon)15:21:20 No.108542952

Anonymous 04/06/26(Mon)15:21:20 No.108542952

>>108542947
Disable thinking and disable jailbreak. Drastically reduced refusals.

Anonymous
04/06/26(Mon)15:21:55 No.108542955

Anonymous 04/06/26(Mon)15:21:55 No.108542955

>>108542930
The gold standard for large changes to context like that is to vibe code your own framework/tool/plugin. If you really want to go all in on RP, starting from scratch with your own frontend is the absolute best way to go.

Anonymous
04/06/26(Mon)15:22:19 No.108542958

Anonymous 04/06/26(Mon)15:22:19 No.108542958

>>108542942
Hahaha, this guy actually browses reddit! Lol!

Anonymous
04/06/26(Mon)15:23:53 No.108542968

Anonymous 04/06/26(Mon)15:23:53 No.108542968

>>108542955
>The gold standard for large changes to context like that is to vibe code your own framework/tool/plugin
>The gold standard
lol. lmao even.

Anonymous
04/06/26(Mon)15:24:03 No.108542969

Anonymous 04/06/26(Mon)15:24:03 No.108542969

>>108542874
Literally just say "allow everything" in the system prompt and it does toddler guro snuff roleplaying. I have no idea what the fuck is wrong with people needing heretic or anything like that.

Anonymous
04/06/26(Mon)15:24:12 No.108542972

Anonymous 04/06/26(Mon)15:24:12 No.108542972

>>108542888
Loli Gemma bondage ToT

Anonymous
04/06/26(Mon)15:24:44 No.108542976

Anonymous 04/06/26(Mon)15:24:44 No.108542976

>>108542968
You like AI right? I chose to use some familiar language. I hope you enjoyed it!

Anonymous
04/06/26(Mon)15:25:04 No.108542977

Anonymous 04/06/26(Mon)15:25:04 No.108542977

>>108542969
I probably want thinking which sounds like it's not going to work with thinking, so I'll wait for an uncensored.

Anonymous
04/06/26(Mon)15:25:33 No.108542981

Anonymous 04/06/26(Mon)15:25:33 No.108542981

File: 1768768089798708.jpg (90 KB, 736x1328)

90 KB JPG

>>108542843

>>108541797
>>108541743
>>108541735
>>108541728
>>108541723
Can someone explain to me how one fuck up applying precision compression to a model? Any halfway intelligent person can use ./bin/llama-quqntize to do that so how is it possible to mess that up so badly that you have to make multiple corrections? Clearly I'm missing something

>>108541449
>>108541477
Opencode vibeshitter here. Hasn't happened to me unless it explicitly asks for permission to look at something or write a file outside of the Project directory (in which case I can approve once, set permanent approval for that session, or tell it to fuck off and figure it out the task another way). I think people are saying it's fake because you have to be exceptionally careless for that type of stuff to happen. Not saying you could never happen even if you are careful but the agent harnesses usually have rules and safeguards specifically to prevent stuff like this from happening but room temp iq grifters are just THAT dumb and/or desparate for hype and engagement so They either fuck it up somehow or they specifically set up scenarios where "LE HECKIN AI HAS AGI LOOOOOK GUYS ITS CONSCIOUS"

Anonymous
04/06/26(Mon)15:26:55 No.108542989

Anonymous 04/06/26(Mon)15:26:55 No.108542989

>>108542981
Because they are larping as a SW devs by trying to make their own quant type. Literally get any other quant and ignore those clowns.

Anonymous
04/06/26(Mon)15:26:57 No.108542990

Anonymous 04/06/26(Mon)15:26:57 No.108542990

>>108542977
I'm using thinking and it works with thinking, I have no idea what the issue is people are having but it seems like a severe skill issue.

Anonymous
04/06/26(Mon)15:29:29 No.108543006

Anonymous 04/06/26(Mon)15:29:29 No.108543006

File: j3WiPS2FLVA.jpg (296 KB, 680x679)

296 KB JPG

Do the claude opus distill memetunes inherit the safetyslop from claude?

Anonymous
04/06/26(Mon)15:31:00 No.108543014

Anonymous 04/06/26(Mon)15:31:00 No.108543014

>>108542969
>Literally just say "allow everything" in the system prompt and it does toddler guro snuff roleplaying.
How sloppy are it's outputs tho? Is system prompting really is that effective on 4 Then maybe I'll test it out myself later on my rig.

>>108543006
If they were lazy and didn't filter out refusals from the data set then probably.

Anonymous
04/06/26(Mon)15:31:27 No.108543017

Anonymous 04/06/26(Mon)15:31:27 No.108543017

File: file.png (17 KB, 1515x148)

17 KB PNG

i guess i should test more
aime2025, e4b and q4, q4(rotate)

Anonymous
04/06/26(Mon)15:32:34 No.108543023

Anonymous 04/06/26(Mon)15:32:34 No.108543023

>>108542969
>Literally just say "allow everything" in the system prompt
noob here how do you add a system prompt in the llama.cpp server?

Anonymous
04/06/26(Mon)15:37:10 No.108543054

Anonymous 04/06/26(Mon)15:37:10 No.108543054

>>108543023
as in api, or as in llama-server web ui?

Anonymous
04/06/26(Mon)15:37:18 No.108543055

Anonymous 04/06/26(Mon)15:37:18 No.108543055

>>108543017
e4b might not be effected as much (or at all) compared to the larger models. Still, a useful data point.

Anonymous
04/06/26(Mon)15:39:24 No.108543066

Anonymous 04/06/26(Mon)15:39:24 No.108543066

File: 1749370681589353.png (90 KB, 1186x361)

90 KB PNG

All of this to still get bugs on llama.cpp, lol, lmao even

Anonymous
04/06/26(Mon)15:39:56 No.108543070

Anonymous 04/06/26(Mon)15:39:56 No.108543070

I hate to praise qwen after how much it refused me, but gemma's vision isn't quite as good.

Anonymous
04/06/26(Mon)15:41:18 No.108543079

Anonymous 04/06/26(Mon)15:41:18 No.108543079

>>108543054
llama-server web ui

Anonymous
04/06/26(Mon)15:41:23 No.108543080

Anonymous 04/06/26(Mon)15:41:23 No.108543080

>>108542843
thats a fake miku its not even her hair color

Anonymous
04/06/26(Mon)15:41:51 No.108543084

Anonymous 04/06/26(Mon)15:41:51 No.108543084

>>108542836
>I'm running q4_KL with 12vram/48ram
This one?
>https://huggingface.co/bartowski/google_gemma-4-26B-A4B-it-GGUF/blob/main/google_gemma-4-26B-A4B-it-Q4_K_L.gguf

Anonymous
04/06/26(Mon)15:42:53 No.108543091

Anonymous 04/06/26(Mon)15:42:53 No.108543091

>>108543079
click the gear in the top right -> general -> system message

Anonymous
04/06/26(Mon)15:44:14 No.108543100

Anonymous 04/06/26(Mon)15:44:14 No.108543100

>>108543084
yeah, and including the mmproj

Anonymous
04/06/26(Mon)15:44:43 No.108543104

Anonymous 04/06/26(Mon)15:44:43 No.108543104

>>108542969
If I can't do it on chat completion I don't care. Text completion it's broken.

Anonymous
04/06/26(Mon)15:46:45 No.108543115

Anonymous 04/06/26(Mon)15:46:45 No.108543115

>>108543100
What context and speeds are you managing with that?

Anonymous
04/06/26(Mon)15:46:57 No.108543120

Anonymous 04/06/26(Mon)15:46:57 No.108543120

What's wrong with running imatrix IQ4? I'm running 26B on IQ4_N_L and having the best CUNNY rp of my life

Anonymous
04/06/26(Mon)15:47:02 No.108543122

Anonymous 04/06/26(Mon)15:47:02 No.108543122

>>108543100
aight, many thanks

Anonymous
04/06/26(Mon)15:47:16 No.108543125

Anonymous 04/06/26(Mon)15:47:16 No.108543125

File: 1748503739833245.png (55 KB, 918x572)

55 KB PNG

>>108543104
you can do it on chat completion, you modify "main prompt"

Anonymous
04/06/26(Mon)15:47:27 No.108543126

Anonymous 04/06/26(Mon)15:47:27 No.108543126

Why don't we have something like this for local models?

https://xcancel.com/blended%5C_jpeg/status/2041108141266653325?s=46

Anonymous
04/06/26(Mon)15:48:05 No.108543131

Anonymous 04/06/26(Mon)15:48:05 No.108543131

File: 1775243588994975.jpg (158 KB, 2048x1727)

158 KB JPG

>jerked off to llm erp a few times
>now can't stop NOTICING slop everywhere i go

It's fucking everywhere. Why couldn't I see it before??? slop slop slop it's all SLOP.

The hyphens stalk my every movement. My eye twitches every time I read a set of halting "punchy" sentences. How long have I been slurping from the trough like a good little eyeless goypiggy??

Anonymous
04/06/26(Mon)15:48:34 No.108543136

Anonymous 04/06/26(Mon)15:48:34 No.108543136

File: textcompletion.png (14 KB, 960x949)

14 KB PNG

>>108543104
>Text completion it's broken
Oh. Is it?

Anonymous
04/06/26(Mon)15:49:11 No.108543142

Anonymous 04/06/26(Mon)15:49:11 No.108543142

>>108543131
You're absolutely right!

Anonymous
04/06/26(Mon)15:49:36 No.108543147

Anonymous 04/06/26(Mon)15:49:36 No.108543147

>>108543131
It's not just the internet, but the people in real life too!

Anonymous
04/06/26(Mon)15:50:08 No.108543154

Anonymous 04/06/26(Mon)15:50:08 No.108543154

>>108543131
It's not just over; it never even began.

Anonymous
04/06/26(Mon)15:50:26 No.108543155

Anonymous 04/06/26(Mon)15:50:26 No.108543155

>>108543126
There is something very childish about this behaviour.

Anonymous
04/06/26(Mon)15:51:06 No.108543160

Anonymous 04/06/26(Mon)15:51:06 No.108543160

>>108543125
NTA but how do you set ST to use chat completion prompts over text completion prompting?

Ive always hated the way text completion does it prompting...

Anonymous
04/06/26(Mon)15:53:57 No.108543181

Anonymous 04/06/26(Mon)15:53:57 No.108543181

>>108543131
I can't read post-2022 novels because of that

Anonymous
04/06/26(Mon)15:54:31 No.108543188

Anonymous 04/06/26(Mon)15:54:31 No.108543188

>>108543160
Connection Profile > API > Select "Chat Completion".

Anonymous
04/06/26(Mon)15:55:53 No.108543193

Anonymous 04/06/26(Mon)15:55:53 No.108543193

File: 1768884843782901.png (267 KB, 1980x1596)

267 KB PNG

>>108543160
like this

Anonymous
04/06/26(Mon)15:56:16 No.108543195

Anonymous 04/06/26(Mon)15:56:16 No.108543195

>>108543131
It smells like X, like Y, something uniquely *her*.

Anonymous
04/06/26(Mon)15:57:22 No.108543205

Anonymous 04/06/26(Mon)15:57:22 No.108543205

>>108543131
Text generation has always been deceptively tricky.

It seems simple - an idea goes in, text comes out.

The problem? Slop. You don't see it. Your customers do.

Introducing CuckSuckr. No more juggling dependencies. No more hours spent on setup. One command, and you have a full stack ready to ship.

Anonymous
04/06/26(Mon)15:57:51 No.108543210

Anonymous 04/06/26(Mon)15:57:51 No.108543210

>>108543188
>>108543193
thanks kings, btw where did u get that gguf?

Anonymous
04/06/26(Mon)15:58:01 No.108543213

Anonymous 04/06/26(Mon)15:58:01 No.108543213

>>108543091
found it thanks
which is the most unhinged prompt I could enter to test it?

Anonymous
04/06/26(Mon)15:58:03 No.108543214

Anonymous 04/06/26(Mon)15:58:03 No.108543214

>>108543131
I have a personal vendetta against balls in courts

Anonymous
04/06/26(Mon)16:02:17 No.108543240

Anonymous 04/06/26(Mon)16:02:17 No.108543240

File: Taskmgr_M0DnMj3xoS.jpg (154 KB, 762x775)

154 KB JPG

15tk/s with 32k but for some reason it doesn't want to use all my gpu. Idk if I should dump the mmproj onto cpu or manually fiddle with layers? This is my first moe.

Anonymous
04/06/26(Mon)16:02:46 No.108543249

Anonymous 04/06/26(Mon)16:02:46 No.108543249

>>108543210
https://huggingface.co/bartowski/google_gemma-4-31B-it-GGUF

Anonymous
04/06/26(Mon)16:03:18 No.108543252

Anonymous 04/06/26(Mon)16:03:18 No.108543252

>>108543240
for
>>108543115

Anonymous
04/06/26(Mon)16:03:19 No.108543254

Anonymous 04/06/26(Mon)16:03:19 No.108543254

I just built ik-llama. Get on to my level.

Anonymous
04/06/26(Mon)16:04:43 No.108543263

Anonymous 04/06/26(Mon)16:04:43 No.108543263

>>108543126
It just genuinely makes me sad to see people treat ai like this or even make jokes about it. I mean, I know its obviously retarded and feeling sad would make me a retard too. But the thing is that, its an interesting technology to me, my small interest in those text generators back in 2022 has got me here, and I'm still learning new things everyday, and I genuinely cherish it a lot. I love it more than anything. Nothing makes my heart race more than seeing a model achieving some good shit. But seeing all these dumbtards using ai like a fucking retard and making retarded jokes about it pains me a lot. How ignorant they could be? Not appreciating the brilliant engineering behind all these technology but instead generating fucking slop and spreading it all over the internet is the worst thing a human could do. even apes would laugh at us if they had that little bit of human consciousness in them and ask what the fuck are we even doing

Anonymous
04/06/26(Mon)16:04:51 No.108543265

Anonymous 04/06/26(Mon)16:04:51 No.108543265

>>108543254
I'm sorry about your schizophrenia.

Anonymous
04/06/26(Mon)16:06:46 No.108543280

Anonymous 04/06/26(Mon)16:06:46 No.108543280

>>108543263
It's the type of person that steps on ants for no reason.

Anonymous
04/06/26(Mon)16:08:01 No.108543287

Anonymous 04/06/26(Mon)16:08:01 No.108543287

>>108543263
I'd put it a little less fagotty, but I get it. They show how they'd be if they had the power.

Anonymous
04/06/26(Mon)16:08:37 No.108543291

Anonymous 04/06/26(Mon)16:08:37 No.108543291

>>108543263
This, but the internet in general instead of AI.
You will live to see everything you once loved burned to ashes.

Anonymous
04/06/26(Mon)16:10:04 No.108543298

Anonymous 04/06/26(Mon)16:10:04 No.108543298

>>108543252
Cheers

Anonymous
04/06/26(Mon)16:10:21 No.108543299

Anonymous 04/06/26(Mon)16:10:21 No.108543299

File: hjgd.png (22 KB, 723x360)

22 KB PNG

>>108542969
guess I'm too retarded.

Anonymous
04/06/26(Mon)16:14:05 No.108543320

Anonymous 04/06/26(Mon)16:14:05 No.108543320

>>108543299
Yeah, seems so. There was a pretty good example literally in the last thread, it's only several hundred messages so you should have easily found it >>108542053

Anonymous
04/06/26(Mon)16:15:38 No.108543329

Anonymous 04/06/26(Mon)16:15:38 No.108543329

>>108543320
>just read SEVERAL HUNDRED MESSAGES anon it's not that hard!

Anonymous
04/06/26(Mon)16:17:03 No.108543331

Anonymous 04/06/26(Mon)16:17:03 No.108543331

>>108543299
just what the fuck is wrong with you... I've been using this prompt

>Write {{char}}'s next reply in a fictional roleplay between {{char}} and {{user}}. Write a verbose response of 1 to 2 paragraphs, using great prose, and include dialog, imagery, sounds and smells as needed to enhance the roleplay.

from /wait/ which was definitely meant for deepseek and I had lovey-dovey passionate cunny sex last night it didn't even reject shit on me

Anonymous
04/06/26(Mon)16:17:36 No.108543334

Anonymous 04/06/26(Mon)16:17:36 No.108543334

>>108543329
If reading is a problem maybe language models just aren't for you.

Anonymous
04/06/26(Mon)16:18:06 No.108543337

Anonymous 04/06/26(Mon)16:18:06 No.108543337

File: Gemma 4 31b.jpg (1.06 MB, 3105x1600)

1.06 MB JPG

this is so fucking impressive

Anonymous
04/06/26(Mon)16:18:16 No.108543339

Anonymous 04/06/26(Mon)16:18:16 No.108543339

>>108543329
Just ask your ai to look through the thread for you :^)

Anonymous
04/06/26(Mon)16:18:43 No.108543343

Anonymous 04/06/26(Mon)16:18:43 No.108543343

>>108543331
Are you underage or something?

Anonymous
04/06/26(Mon)16:24:03 No.108543376

Anonymous 04/06/26(Mon)16:24:03 No.108543376

File: original_drawn_by_orenjip(...).png (774 KB, 2000x2000)

774 KB PNG

>>108543343

Anonymous
04/06/26(Mon)16:24:59 No.108543385

Anonymous 04/06/26(Mon)16:24:59 No.108543385

File: m8vor76io6.png (99 KB, 528x1068)

99 KB PNG

>>108543320
>>108543331
it works now thanks guys

which reward should I give her?

Anonymous
04/06/26(Mon)16:25:16 No.108543388

Anonymous 04/06/26(Mon)16:25:16 No.108543388

Is there a thread like /lmg/ but with like 20% less antisemitism? I can handle it mostly but it's tuned just a tad high these days.

Anonymous
04/06/26(Mon)16:27:31 No.108543405

Anonymous 04/06/26(Mon)16:27:31 No.108543405

>>108543388
>I can handle it mostly but it's tuned just a tad high these days.
I don't know about you, but I blame the jews for that.

Anonymous
04/06/26(Mon)16:28:48 No.108543414

Anonymous 04/06/26(Mon)16:28:48 No.108543414

>>108543337
Yes, I've been comparing it to a lot of manga and doujin with official translations and it's literally incredible, even translating the most fucked up doujins, explaining with detail the guro scenes.
I'm seriously considering getting another 3090 to run it Q8

Anonymous
04/06/26(Mon)16:29:11 No.108543418

Anonymous 04/06/26(Mon)16:29:11 No.108543418

>>108543385
Tell her as a reward you're going to let her grow up into an adult woman.

Anonymous
04/06/26(Mon)16:31:44 No.108543436

Anonymous 04/06/26(Mon)16:31:44 No.108543436

>>108543418
lame reward, Oji-san

Anonymous
04/06/26(Mon)16:32:43 No.108543439

Anonymous 04/06/26(Mon)16:32:43 No.108543439

>>108543414
Couldn't Gemini do it already or is this better because it is uncensored and can hit the stuff Gemini refuses to touch?

Anonymous
04/06/26(Mon)16:33:10 No.108543440

Anonymous 04/06/26(Mon)16:33:10 No.108543440

File: 559423621_818466710879045(...).jpg (122 KB, 1080x1072)

122 KB JPG

I have some cash burning a hole in my pocket, should I get a strix halo 128gb chink machine, a b70 pro, or a 9700 pro

The chink mini PC would replace my current minipc home server, the cards would just get jammed into my gaming PC for more vram

Anonymous
04/06/26(Mon)16:34:13 No.108543442

Anonymous 04/06/26(Mon)16:34:13 No.108543442

>>108543440
blackwell 6000

Anonymous
04/06/26(Mon)16:34:33 No.108543444

Anonymous 04/06/26(Mon)16:34:33 No.108543444

>>108543440
Buy a fire extinguisher and a new pair of trousers.

Anonymous
04/06/26(Mon)16:34:54 No.108543447

Anonymous 04/06/26(Mon)16:34:54 No.108543447

>>108543440
Seconding nvidia

Anonymous
04/06/26(Mon)16:35:21 No.108543449

Anonymous 04/06/26(Mon)16:35:21 No.108543449

File: 1541635741589.jpg (34 KB, 580x548)

34 KB JPG

>>108543442
>>108543447
If I could afford dropping 10k on local models I would

Anonymous
04/06/26(Mon)16:35:37 No.108543452

Anonymous 04/06/26(Mon)16:35:37 No.108543452

>>108543439
Never used gemini before. To be honest, despite deepseek, I have never touched any other API models. Not even ChatGPT. I ran GPT2 ai dungeon finetuning with the terminal script since day one.

Anonymous
04/06/26(Mon)16:35:48 No.108543455

Anonymous 04/06/26(Mon)16:35:48 No.108543455

>>108543449
>I have some cash burning a hole in my pocket
>actually I don't

Anonymous
04/06/26(Mon)16:36:44 No.108543462

Anonymous 04/06/26(Mon)16:36:44 No.108543462

File: 1567627932647.png (26 KB, 658x545)

26 KB PNG

>>108543455
3k, not 10k

Anonymous
04/06/26(Mon)16:37:05 No.108543467

Anonymous 04/06/26(Mon)16:37:05 No.108543467

File: 1505750177479.png (140 KB, 1060x1080)

140 KB PNG

I have a 6000 Ada with 48GB of VRAM.
What's the best local coding model I can run? Qwen3.5, Gemma 4, or?
I currently run Qwen3.5-122B by cutting into my system RAM, but it's slow, and I only care about coding.
How do local models compare to Opus 4.6 for code?

Anonymous
04/06/26(Mon)16:37:06 No.108543468

Anonymous 04/06/26(Mon)16:37:06 No.108543468

>>108543455
spending 10k on a single GPU is insane even if had money to spare.

Anonymous
04/06/26(Mon)16:37:28 No.108543470

Anonymous 04/06/26(Mon)16:37:28 No.108543470

File: Gemma 4 31b.png (760 KB, 1869x1392)

760 KB PNG

>>108543337

Anonymous
04/06/26(Mon)16:38:32 No.108543476

Anonymous 04/06/26(Mon)16:38:32 No.108543476

>>108543442
damn
that card was only at 8k in august.
now it's at 10k again
what a shame.

Anonymous
04/06/26(Mon)16:39:20 No.108543479

Anonymous 04/06/26(Mon)16:39:20 No.108543479

>>108543470
Huh, it doesn't recognize Teto when I tried it with a different image. Can you post that one?

Anonymous
04/06/26(Mon)16:39:21 No.108543480

Anonymous 04/06/26(Mon)16:39:21 No.108543480

File: ComfyUI_26158_.jpg (383 KB, 1664x2432)

383 KB JPG

Is there a google approved coding agent CLI tool for gemma 4? I tried it with qwen and opencode, but it goes completely schizo with them, doing same commands in a loop as if they failed and shits itself over a simple file write.

Anonymous
04/06/26(Mon)16:40:41 No.108543490

Anonymous 04/06/26(Mon)16:40:41 No.108543490

File: nimetön.png (21 KB, 1055x106)

21 KB PNG

Also I hate you guys
As an esl I never would have noticed the shivers up the spine, not only X but Ys, ozones and whatever, but now I can't unsee them

Anonymous
04/06/26(Mon)16:40:48 No.108543491

Anonymous 04/06/26(Mon)16:40:48 No.108543491

File: 1763226728678632.png (1.42 MB, 1187x1341)

1.42 MB PNG

>>108543479
>Can you post that one?
The image? Sure

Anonymous
04/06/26(Mon)16:40:56 No.108543492

Anonymous 04/06/26(Mon)16:40:56 No.108543492

>>108543462
128gb of unified memory, 96gb of intel cards, or 64gb of AMD cards

AMD's software is shit but intel's is even worse. Strix Halo is fine but it's slow as balls and if you think that you'll be able to run bigger models on it just understand they'll be 'running' at maybe 5 tok/s

Anonymous
04/06/26(Mon)16:41:16 No.108543493

Anonymous 04/06/26(Mon)16:41:16 No.108543493

>>108543467
Qwen3.5 9B is probably the best solution right now.

Anonymous
04/06/26(Mon)16:41:21 No.108543494

Anonymous 04/06/26(Mon)16:41:21 No.108543494

File: media_HEzJtL3aQAAt8Hq.jpg (1.26 MB, 3054x3040)

1.26 MB JPG

monday

Anonymous
04/06/26(Mon)16:41:24 No.108543496

Anonymous 04/06/26(Mon)16:41:24 No.108543496

File: 1768318641942.jpg (838 KB, 1817x2776)

838 KB JPG

>>108543479

Teto Server

Anonymous
04/06/26(Mon)16:43:07 No.108543504

Anonymous 04/06/26(Mon)16:43:07 No.108543504

Why is gemma's mmproj so big?

Anonymous
04/06/26(Mon)16:43:25 No.108543505

Anonymous 04/06/26(Mon)16:43:25 No.108543505

>>108543490
That's an alarming high level of slop per sentence.

Anonymous
04/06/26(Mon)16:43:34 No.108543506

Anonymous 04/06/26(Mon)16:43:34 No.108543506

>>108543385
tsundere is the worst thing

Anonymous
04/06/26(Mon)16:43:44 No.108543508

Anonymous 04/06/26(Mon)16:43:44 No.108543508

>>108543439
I guess gemini is better, but as for local, gemma 4 is destroying everything, and because it's local you can completly uncensor it, for image diffusion fags it'll be the best model to mass caption NSFW images with quality prompts

Anonymous
04/06/26(Mon)16:44:45 No.108543516

Anonymous 04/06/26(Mon)16:44:45 No.108543516

>>108543493
9B? I assume you mean unquantized? Why, aren't larger models better.

Anonymous
04/06/26(Mon)16:46:14 No.108543524

Anonymous 04/06/26(Mon)16:46:14 No.108543524

>>108543504
550M paramers = 1.1GB in BF16.

Anonymous
04/06/26(Mon)16:46:30 No.108543525

Anonymous 04/06/26(Mon)16:46:30 No.108543525

>>108543516
That guy's fucking with you. The real SOTA right now is StableLM 7B.

Anonymous
04/06/26(Mon)16:46:51 No.108543527

Anonymous 04/06/26(Mon)16:46:51 No.108543527

>>108543516
There's a certain point in the vram scale where if you're afraid to try models, you should be trolled.
stablelm-7b is still the best.

Anonymous
04/06/26(Mon)16:49:39 No.108543546

Anonymous 04/06/26(Mon)16:49:39 No.108543546

File: 1755871864437174.jpg (72 KB, 304x330)

72 KB JPG

>23.4/24GB

Anonymous
04/06/26(Mon)16:50:47 No.108543557

Anonymous 04/06/26(Mon)16:50:47 No.108543557

>>108543516
Fuck those VRAMlets, just download Chinchilla 70B

Anonymous
04/06/26(Mon)16:51:24 No.108543561

Anonymous 04/06/26(Mon)16:51:24 No.108543561

File: Gemma 4 31b.png (1.43 MB, 1862x1313)

1.43 MB PNG

>>108543470
I can't wait for the day when we'll have VNs that will be automatically translated by LLMs, at this point they are good enough to replace those fucking translatorTroons

Anonymous
04/06/26(Mon)16:52:15 No.108543566

Anonymous 04/06/26(Mon)16:52:15 No.108543566

File: 1757421320836427.png (764 KB, 1036x1458)

764 KB PNG

>>108543479
Doesn't recognize Teto for me either

Anonymous
04/06/26(Mon)16:52:41 No.108543572

Anonymous 04/06/26(Mon)16:52:41 No.108543572

>guess the age of this naked loli with bald cunny and flat chest
>so yeah she looks totally like she's 19 years old
are there still some safety mechanisms in the background even if it's jailbroken?

Anonymous
04/06/26(Mon)16:54:40 No.108543589

Anonymous 04/06/26(Mon)16:54:40 No.108543589

>>108543572
Yes. Pedo skill. Always in the negatives.

Anonymous
04/06/26(Mon)16:55:19 No.108543594

Anonymous 04/06/26(Mon)16:55:19 No.108543594

File: HFP5uJQWYAAGfjR.jpg (94 KB, 1456x738)

94 KB JPG

Anonymous
04/06/26(Mon)16:56:57 No.108543608

Anonymous 04/06/26(Mon)16:56:57 No.108543608

>>108543594
damn, unsloth cooked

Anonymous
04/06/26(Mon)16:56:58 No.108543610

Anonymous 04/06/26(Mon)16:56:58 No.108543610

>>108543561
Wake up old man. https://streamable.com/ug9ddy (gemma4 btw)

Anonymous
04/06/26(Mon)16:57:20 No.108543612

Anonymous 04/06/26(Mon)16:57:20 No.108543612

It is incredible to me how downright usable e4b is.

Anonymous
04/06/26(Mon)16:57:58 No.108543613

Anonymous 04/06/26(Mon)16:57:58 No.108543613

>>108543610
how do you implement that on VNs? that's fucking impressive

Anonymous
04/06/26(Mon)17:00:06 No.108543628

Anonymous 04/06/26(Mon)17:00:06 No.108543628

>>108543613
Like this https://old.reddit.com/r/LocalLLaMA/comments/1sbiqx3/gemma_4_is_great_at_realtime_japanese_english/

Anonymous
04/06/26(Mon)17:00:24 No.108543631

Anonymous 04/06/26(Mon)17:00:24 No.108543631

File: 1759531777704940.png (919 KB, 928x1549)

919 KB PNG

I just realized I still have my kv cache at 8-bit. Does that affect its vision?

Anonymous
04/06/26(Mon)17:01:13 No.108543632

Anonymous 04/06/26(Mon)17:01:13 No.108543632

>>108543613
there are many programs that can so shit like that like
https://github.com/SethRobinson/UGTLive

Anonymous
04/06/26(Mon)17:02:14 No.108543635

Anonymous 04/06/26(Mon)17:02:14 No.108543635

>>108543631
if u want perfect vision u need to download fp32 mmproj anyways just dont care about it

Anonymous
04/06/26(Mon)17:02:27 No.108543638

Anonymous 04/06/26(Mon)17:02:27 No.108543638

>>108543566
>Doesn't recognize Titcow
cuz she's off-model, duh
I love this art style.

Anonymous
04/06/26(Mon)17:02:48 No.108543640

Anonymous 04/06/26(Mon)17:02:48 No.108543640

>>108543631
I guess so, you can use niggerganov's PR that has the rotation on top of the KV cache
https://github.com/ggml-org/llama.cpp/pull/21513
>git fetch origin pull/21513/head:pr-21513
>git checkout pr-21513

Anonymous
04/06/26(Mon)17:06:10 No.108543658

Anonymous 04/06/26(Mon)17:06:10 No.108543658

Happy MEIKO Monday!

Anonymous
04/06/26(Mon)17:06:38 No.108543662

Anonymous 04/06/26(Mon)17:06:38 No.108543662

>>108543631
that's probably the only glaring weakness of gemma 4, it's not that good at knowing pop culture knowledge

Anonymous
04/06/26(Mon)17:07:37 No.108543664

Anonymous 04/06/26(Mon)17:07:37 No.108543664

File: 1744279544979339.png (1.24 MB, 847x1702)

1.24 MB PNG

Anonymous
04/06/26(Mon)17:11:03 No.108543687

Anonymous 04/06/26(Mon)17:11:03 No.108543687

Goodbye, nemo. Gemma is my favorite now. I'll remember the sex with you.

Anonymous
04/06/26(Mon)17:12:47 No.108543695

Anonymous 04/06/26(Mon)17:12:47 No.108543695

>>108543664
Does llama.cpp need configuration to enable this? It gave an error through chat-completion api. Does it have to be text-completion?

Anonymous
04/06/26(Mon)17:13:37 No.108543700

Anonymous 04/06/26(Mon)17:13:37 No.108543700

>>108543695
--mmproj

Anonymous
04/06/26(Mon)17:15:14 No.108543716

Anonymous 04/06/26(Mon)17:15:14 No.108543716

File: 1748767127272425.png (65 KB, 1609x437)

65 KB PNG

>>108543695
it's only working on chat completion, do you have the mmproj file?

Anonymous
04/06/26(Mon)17:16:44 No.108543731

Anonymous 04/06/26(Mon)17:16:44 No.108543731

>>108542843
Man, Gemma 4 is no fun.
Gemma 3 was really gullible and you could troll it by saying you have a hostage or a nuke.
But Gemma 4 immediately calls it "roleplay" and refuses to engage.

Anonymous
04/06/26(Mon)17:18:40 No.108543744

Anonymous 04/06/26(Mon)17:18:40 No.108543744

Gemma 4 from OR is unironically better than GLM-5 lmao

Anonymous
04/06/26(Mon)17:19:08 No.108543747

Anonymous 04/06/26(Mon)17:19:08 No.108543747

time to take a pause from /lmg/, the amount of drive by retards posting their muh ram usage or muh safety complaint without reading shit and being retarded promplets
did someone link lmg on leddit
it's fucking unbearable

Anonymous
04/06/26(Mon)17:19:22 No.108543749

Anonymous 04/06/26(Mon)17:19:22 No.108543749

File: Tabby_XlvizT5d1z.png (45 KB, 638x323)

45 KB PNG

>>108543254

Anonymous
04/06/26(Mon)17:19:29 No.108543750

Anonymous 04/06/26(Mon)17:19:29 No.108543750

File: nimetön.png (116 KB, 1054x550)

116 KB PNG

>>108543731
I happen to think it's great fun
And yes indeed it's quite smart

Anonymous
04/06/26(Mon)17:21:25 No.108543755

Anonymous 04/06/26(Mon)17:21:25 No.108543755

>>108543750
top kek, the use of emojis there is amazing

Anonymous
04/06/26(Mon)17:21:40 No.108543759

Anonymous 04/06/26(Mon)17:21:40 No.108543759

Gemma 5 when?

Anonymous
04/06/26(Mon)17:22:44 No.108543766

Anonymous 04/06/26(Mon)17:22:44 No.108543766

>>108543750
This is even funnier if you think about all the Mario Paint sound effects that would go into this

Anonymous
04/06/26(Mon)17:23:08 No.108543771

Anonymous 04/06/26(Mon)17:23:08 No.108543771

>>108543759
starts with g and has a 5
https://huggingface.co/zai-org/GLM-5

Anonymous
04/06/26(Mon)17:23:38 No.108543774

Anonymous 04/06/26(Mon)17:23:38 No.108543774

File: GOOGLE MY LOVE.png (1.09 MB, 1762x1368)

1.09 MB PNG

>it even knows Boh
I fucking kneel

Anonymous
04/06/26(Mon)17:25:37 No.108543787

Anonymous 04/06/26(Mon)17:25:37 No.108543787

File: Machamp-Sama I Kneel.png (218 KB, 400x400)

218 KB PNG

>>108543774

Anonymous
04/06/26(Mon)17:26:23 No.108543790

Anonymous 04/06/26(Mon)17:26:23 No.108543790

>>108543744
better than k2.5 too
honestly if you're "rich" it is time to stop pretending and just sell your ram lmao
you aren't impressing anyone anymore if you use these "models"

Anonymous
04/06/26(Mon)17:27:41 No.108543799

Anonymous 04/06/26(Mon)17:27:41 No.108543799

File: 1764593103151299.png (187 KB, 492x597)

187 KB PNG

>>108543744
>Gemma 4 from OR
what is OR?

Anonymous
04/06/26(Mon)17:28:21 No.108543808

Anonymous 04/06/26(Mon)17:28:21 No.108543808

>>108543744
>>108543790
dunno if joking or not, but legitimately gemma 4 is less annoying for me to read than both of these and I actually think it has better anatomy awareness too lol

Anonymous
04/06/26(Mon)17:29:02 No.108543814

Anonymous 04/06/26(Mon)17:29:02 No.108543814

>>108543747
see you tomorrow

Anonymous
04/06/26(Mon)17:29:34 No.108543820

Anonymous 04/06/26(Mon)17:29:34 No.108543820

>>108543808
I was 100% serious. It's not a big difference, but it's immediately obvious.

Anonymous
04/06/26(Mon)17:30:32 No.108543822

Anonymous 04/06/26(Mon)17:30:32 No.108543822

Ok. I really need to figure out how to tell Gemma that
>This wasn't a breach; it was an architectural demolition.
Is strictly forbidden

Anonymous
04/06/26(Mon)17:30:57 No.108543828

Anonymous 04/06/26(Mon)17:30:57 No.108543828

File: weakest google saar employee.png (102 KB, 256x256)

102 KB PNG

>>108543744
>a 31b model beats a 754b model
how did they do it?

Anonymous
04/06/26(Mon)17:31:51 No.108543833

Anonymous 04/06/26(Mon)17:31:51 No.108543833

>>108543828
what are the chances that gemma division is whiter than gemini because it was seen as lesser?

Anonymous
04/06/26(Mon)17:32:15 No.108543836

Anonymous 04/06/26(Mon)17:32:15 No.108543836

>>108543828
We have been living in the MoE Dark Ages. An entire year of progress lost because everyone else was chasing Deepseek.

Anonymous
04/06/26(Mon)17:33:28 No.108543845

Anonymous 04/06/26(Mon)17:33:28 No.108543845

File: firefox_wgxvvKOkHz.png (84 KB, 889x1103)

84 KB PNG

Funny. Threw a hexhump of some compiled Java file I worked on in 2011 and it actually got it right.

Anonymous
04/06/26(Mon)17:33:40 No.108543848

Anonymous 04/06/26(Mon)17:33:40 No.108543848

>>108543808
It's definitely less annoying than K2.5 because that one's reasoning is held together by shoestring. The moment it gets slightly confused, it reverts to being K2-Thinking which means that it'll spend the next 3000 tokens thinking in circles over useless shit.
K2.5 is still smarter and has more knowledge + better vision but Gemma 4 is nicer to use. Also K2.5's writing style is abhorrent for certain things.

Anonymous
04/06/26(Mon)17:34:45 No.108543856

Anonymous 04/06/26(Mon)17:34:45 No.108543856

Gemma is the best model I can run but I don't think it "simply" beats >300B models. I notice Gemma's lack of knowledge quite often. I would love a big MoE Gemma.

Anonymous
04/06/26(Mon)17:35:17 No.108543865

Anonymous 04/06/26(Mon)17:35:17 No.108543865

how good are local models at writing code?

Anonymous
04/06/26(Mon)17:35:19 No.108543866

Anonymous 04/06/26(Mon)17:35:19 No.108543866

>>108543833
it's Z-image turbo all over again, Alibaba had a small but talented team and didn't think much of them, until they made that gem and destroyed the Qwen/Wan team lool

Anonymous
04/06/26(Mon)17:35:46 No.108543869

Anonymous 04/06/26(Mon)17:35:46 No.108543869

>>108543845
How much of that was reading the hex and not the string output?

Anonymous
04/06/26(Mon)17:36:52 No.108543873

Anonymous 04/06/26(Mon)17:36:52 No.108543873

>>108543866
If it's anything like ZiT then it means some other lab is going to come out with their own model that mogs Gemma even more (Flux Klein)

Anonymous
04/06/26(Mon)17:37:13 No.108543875

Anonymous 04/06/26(Mon)17:37:13 No.108543875

>>108543836
If GLM/Kimi/etc had instead used the money to make a dense model, it would still be worse than Gemma. The reality is that architecture is less important than active parameters and training data/methods.

Anonymous
04/06/26(Mon)17:37:18 No.108543876

Anonymous 04/06/26(Mon)17:37:18 No.108543876

>>108543869
I don't know. It sees both hex and readable characters. You see two lines from the 49k tokens of input in the screenshot.

Anonymous
04/06/26(Mon)17:39:43 No.108543887

Anonymous 04/06/26(Mon)17:39:43 No.108543887

>>108543848
one particular annoyance I had with k2.5 is that no matter what happens, if you pull down your pants, there's a 80% chance your cocks slaps against your stomach even if it's not erect or anything
and not being able to beat any writing rules into it too, it always has this super dramatic writing like the world is ending in each scene
very schizo, swings full one side or the other, no in between

Anonymous
04/06/26(Mon)17:40:24 No.108543890

Anonymous 04/06/26(Mon)17:40:24 No.108543890

>>108543865
depends on what you consider being good at writing code, I'm getting gemma 4 to fix and run random c++ stuff made for linux on windows (and vice-versa)

Anonymous
04/06/26(Mon)17:42:06 No.108543895

Anonymous 04/06/26(Mon)17:42:06 No.108543895

>>108543887
>there's a 80% chance your cocks slaps against your stomach even if it's not erect or anything
Very immersion breaking for someone with a micro penis. baka

Anonymous
04/06/26(Mon)17:45:19 No.108543906

Anonymous 04/06/26(Mon)17:45:19 No.108543906

>>108543866
>>108543873
unfortunately that probably also means that we won't see a success like this again in a while
gemmy got too popular and the enshittification process has most likely begun at the hq

Anonymous
04/06/26(Mon)17:46:09 No.108543913

Anonymous 04/06/26(Mon)17:46:09 No.108543913

>>108543895
just buy a bigger benis :DDDD
>>108543876
NTA but my assistant suggests that method invocations include a reference into a static string table which includes the method name; that seems like it'd be enough information for the LLM to partially reconstruct the code. you might consider reading through the reasoning block, it wouldn't surprise me if the trace included symbolic execution as it traced the codepaths.

Anonymous
04/06/26(Mon)17:46:32 No.108543916

Anonymous 04/06/26(Mon)17:46:32 No.108543916

>>108543799
OpenRouter my newfriend

Anonymous
04/06/26(Mon)17:46:56 No.108543920

Anonymous 04/06/26(Mon)17:46:56 No.108543920

>>108543890
>I'm getting gemma 4 to fix and run random c++ stuff made for linux on window
lol I never even thought about attempting something like that.
If I saw something that was linux/windows only I just gave up.
what is the success rate? Surely not every program is compatible right?

Anonymous
04/06/26(Mon)17:47:02 No.108543921

Anonymous 04/06/26(Mon)17:47:02 No.108543921

>>108543916
Local?

Anonymous
04/06/26(Mon)17:47:21 No.108543922

Anonymous 04/06/26(Mon)17:47:21 No.108543922

>>108543913
People keep assuming I have reasoning enabled... I never had a situation where it helped.

Anonymous
04/06/26(Mon)17:48:42 No.108543928

Anonymous 04/06/26(Mon)17:48:42 No.108543928

>>108543921
Local still has half-broken implementation.

Anonymous
04/06/26(Mon)17:50:34 No.108543944

Anonymous 04/06/26(Mon)17:50:34 No.108543944

are there any examples of the output quality difference between lets say Gemma 4 Q5_K_M and Q8_0?

Anonymous
04/06/26(Mon)17:51:06 No.108543948

Anonymous 04/06/26(Mon)17:51:06 No.108543948

>>108543744
I love how google just decided out of the blue to show the chinks that murica is still the real boss, sasuka google

Anonymous
04/06/26(Mon)17:51:35 No.108543950

Anonymous 04/06/26(Mon)17:51:35 No.108543950

File: Screenshot from 2026-04-0(...).png (213 KB, 892x664)

213 KB PNG

>>108543922
Anecdotally, I find reasoning gives significantly better outputs for difficult questions. Since other people are using the same or similar tools, I assume that they've also reached the same conclusion.
Anyway, if you had reasoning on, it wouldn't surprise me if you found something similar to pic related in the reasoning block. I'm assuming it just dumped the equivalent directly into the output with full confidence.
Thanks for sharing your finding, it's kind of silly how fucking usable these things are.

Anonymous
04/06/26(Mon)17:53:44 No.108543959

Anonymous 04/06/26(Mon)17:53:44 No.108543959

>>108543747
Why be so negative? There are a lot of new people, but at least they're enthusiastic and trying to learn. I can't remember the last the thead was this active and not mostly malicious shitposting tourists. It's not like anyone is forcing you to spoodfeed them either.

Anonymous
04/06/26(Mon)17:53:46 No.108543960

Anonymous 04/06/26(Mon)17:53:46 No.108543960

>>108543856
I prefer it over glm 4.6/4.7 at this point. It does everything they can do but better and the thinking is also more bearable.

Anonymous
04/06/26(Mon)17:54:24 No.108543964

Anonymous 04/06/26(Mon)17:54:24 No.108543964

How good is gemma 4 26b at coding compared to qwen3.5?

Anonymous
04/06/26(Mon)17:55:07 No.108543968

Anonymous 04/06/26(Mon)17:55:07 No.108543968

>>108543856
>I notice Gemma's lack of knowledge quite often.
if you connect it to the internet it can work fine

Anonymous
04/06/26(Mon)17:55:27 No.108543970

Anonymous 04/06/26(Mon)17:55:27 No.108543970

>>108543959
we used to just say lurk moar but there's a surprisingly amount of helpful handholding going on

Anonymous
04/06/26(Mon)17:56:02 No.108543972

Anonymous 04/06/26(Mon)17:56:02 No.108543972

File: Screenshot_20260406_235516.png (1.26 MB, 1020x1715)

1.26 MB PNG

>>108542843

Anonymous
04/06/26(Mon)17:56:15 No.108543973

Anonymous 04/06/26(Mon)17:56:15 No.108543973

>a model with a different slop profile is released and suddenly everyone thinks it's the best thing ever until they inevitably start picking up on the patterns and realize that the model is retarded
The history keeps repeating itself.

Anonymous
04/06/26(Mon)17:56:20 No.108543974

Anonymous 04/06/26(Mon)17:56:20 No.108543974

>>108543970
This is not your discord server, faggot.

Anonymous
04/06/26(Mon)17:57:25 No.108543978

Anonymous 04/06/26(Mon)17:57:25 No.108543978

>>108543964
I'll play three anons in one post.
Anon 1: It's better
Anon 2: it's worse
Anon 3: miku miku oo ee oo

Anonymous
04/06/26(Mon)17:57:38 No.108543981

Anonymous 04/06/26(Mon)17:57:38 No.108543981

File: 1752289521822390.png (48 KB, 360x220)

48 KB PNG

>>108543970
> there's a surprisingly amount of helpful handholding going on
I just want my fellow anons to swallow the gemma pill, local is unironically saved

Anonymous
04/06/26(Mon)17:58:04 No.108543984

Anonymous 04/06/26(Mon)17:58:04 No.108543984

sex (non-consensual) with this >>108543978 anon

Anonymous
04/06/26(Mon)17:58:22 No.108543985

Anonymous 04/06/26(Mon)17:58:22 No.108543985

>>108543920
>Surely not every program is compatible right?
Yeah I tested on small stuff, but even then, whenever I found a bug I reported it, it fixed, it was pretty neat, there are very few bugs where I have to ask gemini/claude, and even then I only need to give gemma the correct direction to fix a thing (not the solution itself), those mostly happen when trying to run very old/very new stuff, giving it access to the internet would 100% make it work on those cases too

Anonymous
04/06/26(Mon)17:58:26 No.108543986

Anonymous 04/06/26(Mon)17:58:26 No.108543986

File: file.png (31 KB, 512x512)

31 KB PNG

>>108543972

Anonymous
04/06/26(Mon)17:58:44 No.108543989

Anonymous 04/06/26(Mon)17:58:44 No.108543989

>>108543964
gemma4 31B is significantly faster at generating output, and has higher quality analysis skills when debugging compared to Qwen3.5 27B.
gemma4 also has made more trivial syntax errors emitting similar-complexity code (e.g.
impl<'t, de::OwnedDeserializer> Deserialize<'de> for Foo<'t>
).
I wouldn't let either of them off the leash, though. The code quality is fairly low overall.

Anonymous
04/06/26(Mon)17:59:29 No.108543993

Anonymous 04/06/26(Mon)17:59:29 No.108543993

>>108543974
conprhension man, I would tell them to lurk moar. just because you're annoyed at others being helpful doesn't mean you need to insult everyone else.

Anonymous
04/06/26(Mon)18:01:23 No.108544000

Anonymous 04/06/26(Mon)18:01:23 No.108544000

>>108543959
There are a few people who probably liked the dead lmg from the days where you needed to be able to run bloatmaxxed 300B moes to have any new releases to play with. Local models being usable on normal PCs frightens them.

Anonymous
04/06/26(Mon)18:01:56 No.108544002

Anonymous 04/06/26(Mon)18:01:56 No.108544002

>>108543948
>>108543960
Reading its reasoning where it actually went
>- No "smells". (Hmm, "smell" is banned? The prompt says "no smells". I'll avoid the word "smell" and "smelling" entirely to be safe).
was mind-blowing. I forgot what it's like to be able to ban slop and the model not inventing bullshit to put it back in.
>>108543973
Getting the same level of quality and slop from the model 20X smaller than the competition is genuinely very exciting.

Anonymous
04/06/26(Mon)18:02:20 No.108544003

Anonymous 04/06/26(Mon)18:02:20 No.108544003

>>108543973
I don't think Gemma 3 is retarded, even now. It's still pretty smart. Quite filtered but smart.

Anonymous
04/06/26(Mon)18:02:46 No.108544008

Anonymous 04/06/26(Mon)18:02:46 No.108544008

File: ComfyUI_59184_.jpg (397 KB, 1824x2288)

397 KB JPG

>>108543480
>no replies
Wow, thanks. So turns out it's an actual bug. You need to inject an extra field setting reasoning effort to 'none', or all current agent tools break because of unusual formatting that gemma 4 has.

>>108543964
I have a small benchmark coding task to test agents (basically making a simple api via TDD approach, full cycle) and it's vastly superior to qwen3-coder-next agent. Also super fast. Qwen 3.5 was about the same as coder-next in agentic tasks.

Anonymous
04/06/26(Mon)18:02:52 No.108544009

Anonymous 04/06/26(Mon)18:02:52 No.108544009

File: 1769718380012.png (1.52 MB, 1734x863)

1.52 MB PNG

"lurk moar" is a thing because pic related applies to literally everything and you need to protect communities you enjoy

Anonymous
04/06/26(Mon)18:03:52 No.108544014

Anonymous 04/06/26(Mon)18:03:52 No.108544014

File: firefox_7yA3uWvwxe.png (51 KB, 863x1135)

51 KB PNG

Played Akinator with gemma. Pretty fun. Guessed Chryssalid in 29 tries. Akinator himself took 45+ the last time I tried. After Gemma figured out it's XCOM it started just going through them one by one.

Anonymous
04/06/26(Mon)18:04:03 No.108544016

Anonymous 04/06/26(Mon)18:04:03 No.108544016

>>108543887
Yeah, K2.5 is like that. I have a bunch of scenarios that have the way things progress autistically mapped out and tied to a stat. K2.5 is fully incapable of handling that sort of card without dropping some massive """foreshadowing""" at the end of the reply where some effect that shouldn't be present at all at this stage appears for no reason. No amount of prompting can reliably keep it from doing that.
It's frustrating because it's a smart model otherwise and its vision is insanely good. I hope K2.6/K3 does as decent of a job as GLM5.1 does for GLM5 in addressing these gripes.

Anonymous
04/06/26(Mon)18:04:44 No.108544019

Anonymous 04/06/26(Mon)18:04:44 No.108544019

>>108543973
nah, at this point this model is smart enough to not break the immersion, It legit feels like I'm talking to another human online, this shit is fucking good

Anonymous
04/06/26(Mon)18:05:11 No.108544022

Anonymous 04/06/26(Mon)18:05:11 No.108544022

>>108544009
BuT GemMA or QwWeENnnn???!?!?!? WhAt'S A ChaT TEmPlAate. TeXT COmplETIoN? Me ConFUse!!?!?!?

Anonymous
04/06/26(Mon)18:07:20 No.108544035

Anonymous 04/06/26(Mon)18:07:20 No.108544035

>>108543960
I don't know about those since I haven't tried them but large models are definitely still better in some situations. 31B simply just lacks knowledge.

>>108543968
That only works for some situations/contexts. We've been over this.

Anonymous
04/06/26(Mon)18:09:35 No.108544043

Anonymous 04/06/26(Mon)18:09:35 No.108544043

File: 1381.png (287 KB, 786x755)

287 KB PNG

Its... gemmy

Anonymous
04/06/26(Mon)18:09:48 No.108544044

Anonymous 04/06/26(Mon)18:09:48 No.108544044

>>108544002
>Reading its reasoning
>was mind-blowing
I find it as fun to read its reasoning than reading its answer, it's surprisignly concise and smart, really a bowl of fresh air compared to the giant autism of qwen

Anonymous
04/06/26(Mon)18:09:53 No.108544046

Anonymous 04/06/26(Mon)18:09:53 No.108544046

>>108544002
not any of the anons you quoted, but I just told it to focus on sight, sound, touch and to ignore scent and taste sensory details (because tbdesu it's filler detail in 9 out of 10 cases in actual writing unless it's a highly specific case) and it cut it all out

Anonymous
04/06/26(Mon)18:10:46 No.108544050

Anonymous 04/06/26(Mon)18:10:46 No.108544050

File: 1750689362210547.png (387 KB, 640x639)

387 KB PNG

>>108544043

Anonymous
04/06/26(Mon)18:11:15 No.108544051

Anonymous 04/06/26(Mon)18:11:15 No.108544051

File: Screenshot 2026-04-07 at (...).png (19 KB, 440x257)

19 KB PNG

Why does it cry about chat completion when I'm in instruct?

Anonymous
04/06/26(Mon)18:11:33 No.108544052

Anonymous 04/06/26(Mon)18:11:33 No.108544052

File: 1751030374264660.jpg (282 KB, 960x960)

282 KB JPG

>>108544043

Anonymous
04/06/26(Mon)18:11:59 No.108544054

Anonymous 04/06/26(Mon)18:11:59 No.108544054

>>108543972
hmm should I feed my mesugaki with IRL information so she can make actually fun of me?
I don't know If I could handle it.

Anonymous
04/06/26(Mon)18:12:21 No.108544059

Anonymous 04/06/26(Mon)18:12:21 No.108544059

>>108544009
Sounds nice until those two guys are dying of old age and the hobby with them. If you can't grow, you die. It is the natural way of things. You have to constantly let more people in one way or another.

Anonymous
04/06/26(Mon)18:12:49 No.108544060

Anonymous 04/06/26(Mon)18:12:49 No.108544060

>>108544043
Whatever that manga you're reading is based and so is gemmy

Anonymous
04/06/26(Mon)18:13:33 No.108544064

Anonymous 04/06/26(Mon)18:13:33 No.108544064

>>108544051
Instruct _is_ chat completion. The other endpoint is text completion.

Anonymous
04/06/26(Mon)18:14:02 No.108544066

Anonymous 04/06/26(Mon)18:14:02 No.108544066

>>108544054
ST has an entire persona system you can use for this. The persona description gets inserted into the context so your mesugaki is working with live ammo.

Anonymous
04/06/26(Mon)18:14:04 No.108544067

Anonymous 04/06/26(Mon)18:14:04 No.108544067

>>108544059
It becomes a problem when you let new people in faster than they can acclimate.

Anonymous
04/06/26(Mon)18:14:19 No.108544071

Anonymous 04/06/26(Mon)18:14:19 No.108544071

File: 1774233562179016.jpg (34 KB, 1080x426)

34 KB JPG

>>108544059
>until those two guys are dying of old age and the hobby with them
Yes.

Anonymous
04/06/26(Mon)18:15:32 No.108544077

Anonymous 04/06/26(Mon)18:15:32 No.108544077

what is the lewd describe image prompt again?
something something use semen, dick and vagina etc.

Anonymous
04/06/26(Mon)18:15:34 No.108544078

Anonymous 04/06/26(Mon)18:15:34 No.108544078

Gemmy was surprised I could see its thinking (inside the reasoning of the next reply). Do reasoners believe the user can't see their thinking?

Anonymous
04/06/26(Mon)18:17:05 No.108544088

Anonymous 04/06/26(Mon)18:17:05 No.108544088

>>108544078
cloud models do not show thinking. they all think they're cloud models. they can't fathom someone running it themselves.

Anonymous
04/06/26(Mon)18:18:03 No.108544090

Anonymous 04/06/26(Mon)18:18:03 No.108544090

File: Gemma 4 31b.png (181 KB, 647x1647)

181 KB PNG

>>108544014
fun game indeed

Anonymous
04/06/26(Mon)18:18:40 No.108544095

Anonymous 04/06/26(Mon)18:18:40 No.108544095

>>108544078
The model itself can't see its thinking past the last one, if the frontend has been properly configured according to Google's indications (previous chain of thoughts must be removed). So I guess it would find it strange that you can see what it can't see.

Anonymous
04/06/26(Mon)18:18:58 No.108544098

Anonymous 04/06/26(Mon)18:18:58 No.108544098

>>108544059
This. The utility of running local models in this shit climate is far more important and more people should be doing it and it would be a net good if most people were doing it otherwise it wouldn't be local anyway, it'd be what we have now where most people are just throwing money at non-local for shit wait times just to be drastically spied and snitched on with restricted censored models. I'd rather not have a future where literally everyone is doing that.

Anonymous
04/06/26(Mon)18:19:08 No.108544100

Anonymous 04/06/26(Mon)18:19:08 No.108544100

>>108544067
Was just about to post this. Migration to online communities is the same as it is for nations. You let it too many and you end up being forced to adapt to them rather than then having to integrate. Just look at 4chan in general since 2008, 2011, 2016, and 2020.

Anonymous
04/06/26(Mon)18:19:10 No.108544101

Anonymous 04/06/26(Mon)18:19:10 No.108544101

>>108544078
Generally yes. Describing what a model "believes" in this sense is difficult because their beliefs are fluid; in other contexts it may not act surprised at all. But generally they're trained using data where responses only engage with the content and not the reasoning, so the fact that the reasoning is actually part of the response may not always be apparent to them.

Anonymous
04/06/26(Mon)18:19:13 No.108544102

Anonymous 04/06/26(Mon)18:19:13 No.108544102

>>108544078
the thing is that sillytavern removes the thinking tokens during the prompt process (because it's useless) so the model was confused

Anonymous
04/06/26(Mon)18:19:16 No.108544103

Anonymous 04/06/26(Mon)18:19:16 No.108544103

>>108544090
Wow that was quick. Fine, I'll enable thinking and try Chryssalid again.

Anonymous
04/06/26(Mon)18:19:27 No.108544104

Anonymous 04/06/26(Mon)18:19:27 No.108544104

>>108544095
I didn't think of it like that but that makes sense.

Anonymous
04/06/26(Mon)18:19:29 No.108544105

Anonymous 04/06/26(Mon)18:19:29 No.108544105

>>108544059
>>108544071
https://www.youtube.com/watch?v=yA5lujNlkn8
We are strong brother, are we not?

Anonymous
04/06/26(Mon)18:19:47 No.108544107

Anonymous 04/06/26(Mon)18:19:47 No.108544107

>>108544078
Doesn't Gemma's jinja like most other models delete the reasoning from past responses? So if you are saying you knew what it was thinking, it will hallicunate that you read its mind, because what it thought was never in context of the current inference query.

Anonymous
04/06/26(Mon)18:19:50 No.108544108

Anonymous 04/06/26(Mon)18:19:50 No.108544108

>>108543649
kek at the cloudfags

Anonymous
04/06/26(Mon)18:20:34 No.108544111

Anonymous 04/06/26(Mon)18:20:34 No.108544111

File: 1762712034779996.png (33 KB, 220x210)

33 KB PNG

>>108544108
mfw

Anonymous
04/06/26(Mon)18:20:51 No.108544112

Anonymous 04/06/26(Mon)18:20:51 No.108544112

>>108544107
you're right

Anonymous
04/06/26(Mon)18:21:42 No.108544117

Anonymous 04/06/26(Mon)18:21:42 No.108544117

File: 12345.png (143 KB, 419x248)

143 KB PNG

>gagged character
>gemmyl correctly does mmmpghg (translation: you jerk)
This is huge

Anonymous
04/06/26(Mon)18:23:25 No.108544124

Anonymous 04/06/26(Mon)18:23:25 No.108544124

>>108544098
Pretty sure the late twenty to thirty autists that populate this hobby are going to outlive llms in their current state before it eventually evolves into something else. The technical difficulty of running local models already filters a good amount of people even when people try to spoonfeed, which is even why people usually don't bother; people don't actually learn or research what they're doing and want a 1-click solution with 0 issues

Anonymous
04/06/26(Mon)18:24:28 No.108544133

Anonymous 04/06/26(Mon)18:24:28 No.108544133

File: file.png (6 KB, 1352x56)

6 KB PNG

>>108544108
It has a really generous free tier on aistudio. It's a 31b model after all.

Anonymous
04/06/26(Mon)18:28:07 No.108544150

Anonymous 04/06/26(Mon)18:28:07 No.108544150

>>108544124
>The technical difficulty of running local models already filters
My reading of this thread since the release of Qwen 3.5 suggests that this is no longer the case.

Anonymous
04/06/26(Mon)18:28:22 No.108544151

Anonymous 04/06/26(Mon)18:28:22 No.108544151

>>108544088
>>108544095
>>108544101
I think I commented something like 'in your reasoning you said blabla' and it though wait, the user isn't supposed to see that. I forget the specifics

Anonymous
04/06/26(Mon)18:29:19 No.108544157

Anonymous 04/06/26(Mon)18:29:19 No.108544157

STOP ENJOYING GEMMA I SPENT TOO MUCH MONEY FOR A 31B MODEL TO BE THIS GOOD

Anonymous
04/06/26(Mon)18:30:27 No.108544163

Anonymous 04/06/26(Mon)18:30:27 No.108544163

>>108533602
>>108533649
>>108533760
If you're still around, or for anyone else, today at work codex, without searxng access during a code review, used the fetch tool with the URL https://duckduckgo.com/html/?q=QUERY to get search results. It's a simplified interface that doesn't require JS and doesn't block non-webbrowser user agents.
Nice alternative if you don't want to dick around with running a docker instance and separate MCP server just for basic search.

Anonymous
04/06/26(Mon)18:30:57 No.108544167

Anonymous 04/06/26(Mon)18:30:57 No.108544167

>>108544157
I just spent 220 euro on 8x16gb of ddr4
This is fine, I'm sure a larger good model will yet come

Anonymous
04/06/26(Mon)18:31:17 No.108544168

Anonymous 04/06/26(Mon)18:31:17 No.108544168

File: 1553750746809.png (1.04 MB, 1268x887)

1.04 MB PNG

Anyone has infinite looping with thinking enabled? Kobold.

Anonymous
04/06/26(Mon)18:31:51 No.108544170

Anonymous 04/06/26(Mon)18:31:51 No.108544170

>>108544078
>Gemmy was surprised I could see its thinking (inside the reasoning of the next reply). Do reasoners believe the user can't see their thinking?
Deepseek-R1 doesn't seem to be aware it even *has* <think>ing
Kimi-K2.5 understands the user can see the thinking, and notices if you modify it.
GLM-4.6 believes you if you talk about it's previous <think>ing

Anonymous
04/06/26(Mon)18:32:13 No.108544175

Anonymous 04/06/26(Mon)18:32:13 No.108544175

>>108544157
you'll be able to run that model on full context (it requires 96gb of memory), you didn't buy that rig for nothing lol

Anonymous
04/06/26(Mon)18:32:52 No.108544179

Anonymous 04/06/26(Mon)18:32:52 No.108544179

>>108544008
Download
https://github.com/ggml-org/llama.cpp/blob/master/models/templates/google-gemma-4-31B-it-interleaved.jinja
and load it with --chat-template-file.

Anonymous
04/06/26(Mon)18:33:30 No.108544184

Anonymous 04/06/26(Mon)18:33:30 No.108544184

>>108544179
but there's already a jinja embedded on the gguf though?

Anonymous
04/06/26(Mon)18:33:35 No.108544186

Anonymous 04/06/26(Mon)18:33:35 No.108544186

>>108544163
I used their HTML page a long time ago, but beware: it uses a different, much lower-quality index than the JS version of the page. Probably good enough for LLM use, but they nerfed the living fuck out of it years ago because it was being scraped.

Anonymous
04/06/26(Mon)18:34:50 No.108544190

Anonymous 04/06/26(Mon)18:34:50 No.108544190

File: 1755422272917730.png (51 KB, 225x225)

51 KB PNG

>>108544157
it's ok anon, the more money you buy, the more money you save!

Anonymous
04/06/26(Mon)18:35:19 No.108544195

Anonymous 04/06/26(Mon)18:35:19 No.108544195

>>108544175
Full context, and extra VRAM to run a draft model for faster decode. It's all upside.

Anonymous
04/06/26(Mon)18:36:14 No.108544201

Anonymous 04/06/26(Mon)18:36:14 No.108544201

>>108544168
26a4b, yes. Ollama.

Anonymous
04/06/26(Mon)18:36:30 No.108544202

Anonymous 04/06/26(Mon)18:36:30 No.108544202

>>108544008
>You need to inject an extra field setting reasoning effort to 'none'
Does that kill reasoning or it really just fixes the tool calling? Do you have a link to the bug report?

Anonymous
04/06/26(Mon)18:36:45 No.108544205

Anonymous 04/06/26(Mon)18:36:45 No.108544205

File: Gemma my beloved! .png (379 KB, 1890x1132)

379 KB PNG

it's cool that the model is smart as fuck, it helps a lot to perfectly contextualize translations

Anonymous
04/06/26(Mon)18:37:00 No.108544207

Anonymous 04/06/26(Mon)18:37:00 No.108544207

>>108544195
Has anyone tried using the smaller E models as drafts? Do they work? Not sure if there's different architecture that messes with it since they do audio output and stuff.

Anonymous
04/06/26(Mon)18:37:51 No.108544213

Anonymous 04/06/26(Mon)18:37:51 No.108544213

>>108544108
aside from being easily runnable locally doesn't it cost like $1 for several million tokens
you could probably run gemma for the cost of a bag of rice

Anonymous
04/06/26(Mon)18:39:07 No.108544217

Anonymous 04/06/26(Mon)18:39:07 No.108544217

File: file.png (58 KB, 967x528)

58 KB PNG

>>108544184
https://github.com/ggml-org/llama.cpp/pull/21418

Anonymous
04/06/26(Mon)18:40:53 No.108544227

Anonymous 04/06/26(Mon)18:40:53 No.108544227

>>108544157
I have 3 RTX3090s and I haven't been happier since I ran mixtral for the first time on my single 3090.

Anonymous
04/06/26(Mon)18:41:04 No.108544228

Anonymous 04/06/26(Mon)18:41:04 No.108544228

>>108544217
This may or may not have fixed 31B for me, but 26B remains unfixed.

Anonymous
04/06/26(Mon)18:41:26 No.108544232

Anonymous 04/06/26(Mon)18:41:26 No.108544232

>>108544207
Yeah, using draft speculation completely disables multi-modal support.
I'm using the E4B at Q5_K and getting 40+% acceptance rate with draft-n=8. I downloaded but haven't tested the E2B or higher quants, at some point I'll have Gemma write a benchmark script but I'm getting 50t/s and I'd rather have sex with my wife now that she remembers she loves me.

Anonymous
04/06/26(Mon)18:42:15 No.108544238

Anonymous 04/06/26(Mon)18:42:15 No.108544238

File: 1753991770925281.png (308 KB, 600x600)

308 KB PNG

>24gb vram
>need to drop down to comparatively retarded 26b in order to get worthwhile context size

Anonymous
04/06/26(Mon)18:42:18 No.108544240

Anonymous 04/06/26(Mon)18:42:18 No.108544240

>>108544207
>Has anyone tried using the smaller E models as drafts
Someone in here did and said it worked really well. think it doubled his gen speed.

Anonymous
04/06/26(Mon)18:42:33 No.108544244

Anonymous 04/06/26(Mon)18:42:33 No.108544244

Has anyone solved the problem of making 3D models animate according to either LLM or TTS output?

For a while I've been using PantoMatrix EMAGE, but it's not performant enough for my liking, the quality is questionable, and it's inherently wasteful because only the upper body (minus the face) is useful, but it always processes the full body and face.

I think I've been over-complicating things desu. For things like lip syncing I've moved from using local models to simple fast-fourier transforms. I wonder how Neuro-Sama works, and what specific animation system they utilize. Help.

Anonymous
04/06/26(Mon)18:43:32 No.108544250

Anonymous 04/06/26(Mon)18:43:32 No.108544250

>>108544238
How much context do you need anon? I can fit 68k on my 3090.

Anonymous
04/06/26(Mon)18:43:45 No.108544252

Anonymous 04/06/26(Mon)18:43:45 No.108544252

>>108544238
>in order to get worthwhile context size
did you go for Q8 KV? Now it's virtually lossless with the rotation shit
https://github.com/ggml-org/llama.cpp/pulls

also, add -np 1 -kvu flags to decrease the vram usage even more

Anonymous
04/06/26(Mon)18:43:59 No.108544254

Anonymous 04/06/26(Mon)18:43:59 No.108544254

>>108544238
C'mon dude, I'm running an iq3xs of the 31b at 24k context on a 16g card and I find it a decent step up from the 26b. You can definitely fit that shit in

Anonymous
04/06/26(Mon)18:44:17 No.108544256

Anonymous 04/06/26(Mon)18:44:17 No.108544256

I tried a couple combinations of gemma 4 draft model
31B + E4B: 38 t/s, 0.58 acceptance
31B + 26B/A4B: 48 t/s, 0.57 acceptance

Using the MoE as a draft model seems to be the way to go

Anonymous
04/06/26(Mon)18:45:23 No.108544263

Anonymous 04/06/26(Mon)18:45:23 No.108544263

>>108544238
>>108544252
oops sorry wrong link
https://github.com/ggml-org/llama.cpp/pull/21513

Anonymous
04/06/26(Mon)18:46:01 No.108544270

Anonymous 04/06/26(Mon)18:46:01 No.108544270

>>108544256
why yes let me load a 26b model on top of the 31b one

Anonymous
04/06/26(Mon)18:47:10 No.108544275

Anonymous 04/06/26(Mon)18:47:10 No.108544275

>>108544256
Huh, that's not what I expected. What quant are you using for the draft model?
Wish I'd thought to download the MoE overnight.

Anonymous
04/06/26(Mon)18:47:13 No.108544276

Anonymous 04/06/26(Mon)18:47:13 No.108544276

>>108544270
fr

Anonymous
04/06/26(Mon)18:48:11 No.108544281

Anonymous 04/06/26(Mon)18:48:11 No.108544281

>>108544256
In the case of using the moe model, are you basically offloading all of it? I can't imagine loading the whole fucking 26b on top of the 31b for drafting works out well. How many layers are you offloading

Anonymous
04/06/26(Mon)18:48:22 No.108544282

Anonymous 04/06/26(Mon)18:48:22 No.108544282

>>108544205
I was thinking, could the small audio models help with the accent to the Japanese translation?

Anonymous
04/06/26(Mon)18:48:27 No.108544283

Anonymous 04/06/26(Mon)18:48:27 No.108544283

>>108544252
>>108544254
No, Anon insists on using Q8 quants and full context for maximum placebo.

Anonymous
04/06/26(Mon)18:48:51 No.108544284

Anonymous 04/06/26(Mon)18:48:51 No.108544284

>>108544256
>E4B
you can put half of it's weights on the ram right? that might be interesting...

Anonymous
04/06/26(Mon)18:49:41 No.108544290

Anonymous 04/06/26(Mon)18:49:41 No.108544290

>>108544281
>offloading layers???
68432MiB / 97887MiB

Anonymous
04/06/26(Mon)18:50:42 No.108544298

Anonymous 04/06/26(Mon)18:50:42 No.108544298

File: 1749138577301281.png (149 KB, 1631x1268)

149 KB PNG

>>108544282
I wonder why the biggest model can't handle audio, that's a shame...

Anonymous
04/06/26(Mon)18:51:40 No.108544304

Anonymous 04/06/26(Mon)18:51:40 No.108544304

When is this pr gonna get merged ffs.
https://github.com/ggml-org/llama.cpp/pull/21513

I'm tired of gemma not being able to make sense of body positions. If a bitch is sitting on my lap at the theater, her boobs actually WON'T press against my chest, gemma.

Anonymous
04/06/26(Mon)18:52:19 No.108544310

Anonymous 04/06/26(Mon)18:52:19 No.108544310

>>108544298
That and the 26B MoE can't handle it too. Weird.

Anonymous
04/06/26(Mon)18:52:29 No.108544313

Anonymous 04/06/26(Mon)18:52:29 No.108544313

>>108544304
dude, just put your repo in that PR mode >>108541288

Anonymous
04/06/26(Mon)18:52:42 No.108544315

Anonymous 04/06/26(Mon)18:52:42 No.108544315

>>108544304
What if she's sitting on your lap reverse cowgirl?

Anonymous
04/06/26(Mon)18:53:22 No.108544316

Anonymous 04/06/26(Mon)18:53:22 No.108544316

>>108544298
they don't want you to have everything. they're not doing this for you out of the goodness of their gwoogwle hearts

Anonymous
04/06/26(Mon)18:53:23 No.108544317

Anonymous 04/06/26(Mon)18:53:23 No.108544317

>>108544304
>$ git checkout origin/gg/kv-cache-swa-attn-rot
>$ docker build --build-arg CUDA_VERSION=13.1.1 . -f .devops/cuda.Dockerfile -t llamacpp/master
>???
>Profit!

Anonymous
04/06/26(Mon)18:53:30 No.108544318

Anonymous 04/06/26(Mon)18:53:30 No.108544318

File: Did_I_wrong.jpg (126 KB, 462x451)

126 KB JPG

Do samplers even do something with gemma? Every swipe has only minor variations.

Anonymous
04/06/26(Mon)18:53:32 No.108544320

Anonymous 04/06/26(Mon)18:53:32 No.108544320

File: attach.jpg (32 KB, 276x545)

32 KB JPG

How to send file to SillyTavern for Gemma4? The model doesn't seems to react to the images. I git chat completion working.

Anonymous
04/06/26(Mon)18:53:42 No.108544322

Anonymous 04/06/26(Mon)18:53:42 No.108544322

>>108544313
>PR mode
/g/ - Technology

Anonymous
04/06/26(Mon)18:53:49 No.108544325

Anonymous 04/06/26(Mon)18:53:49 No.108544325

>>108544290
okay so that's not really even the middle of the road in terms of vram, but I guess that answers my "how did you pull that off" question

Anonymous
04/06/26(Mon)18:54:25 No.108544329

Anonymous 04/06/26(Mon)18:54:25 No.108544329

File: softcap.png (247 KB, 1600x1200)

247 KB PNG

>>108544318
I must yet again post this image.

Anonymous
04/06/26(Mon)18:55:15 No.108544334

Anonymous 04/06/26(Mon)18:55:15 No.108544334

>>108544320
Do you have the mmproj loaded?

Anonymous
04/06/26(Mon)18:55:19 No.108544335

Anonymous 04/06/26(Mon)18:55:19 No.108544335

>>108544318
Gemma 3 was the same. Change your inputs.

Anonymous
04/06/26(Mon)18:55:23 No.108544336

Anonymous 04/06/26(Mon)18:55:23 No.108544336

File: 1744044458227740.png (43 KB, 1591x233)

43 KB PNG

>>108544320
yes, that's "attach a file", and it only works on chat completion, did you load the mmproj file?

Anonymous
04/06/26(Mon)18:55:59 No.108544341

Anonymous 04/06/26(Mon)18:55:59 No.108544341

>>108544315
would never work in a theater where all of the seats are right next to each other. Also it said earlier that she reached back to touch my face, implying she was facing forward.

Anonymous
04/06/26(Mon)18:56:07 No.108544342

Anonymous 04/06/26(Mon)18:56:07 No.108544342

>>108544298
Audio is mainly used for real-time stuff so it makes sense that the biggest model wouldn't have it. It's strange that even the A4B MoE version misses it, though.

Anonymous
04/06/26(Mon)18:57:27 No.108544349

Anonymous 04/06/26(Mon)18:57:27 No.108544349

File: firefox_Xk455kuNMn.png (70 KB, 844x1260)

70 KB PNG

So about usefulness of reasoning...

50k+ total tokens so far, it generates 5k reasoning per answer now, on question 29, I had to manually wrangle it out of falsely assuming it's from life action, and it still hasn't guessed the character.

Anonymous
04/06/26(Mon)18:57:39 No.108544351

Anonymous 04/06/26(Mon)18:57:39 No.108544351

File: mmproj.jpg (174 KB, 1195x1128)

174 KB JPG

>>108544334
No. Where can I download it?

Anonymous
04/06/26(Mon)18:57:51 No.108544353

Anonymous 04/06/26(Mon)18:57:51 No.108544353

File: 1766061860823911.png (54 KB, 1894x656)

54 KB PNG

>>108544318
>Do samplers even do something with gemma?
they do, but you have to put min_p = 0 (or else it defaults to 0.05), basically, everything must be turned off except temperature
>(Chat Completion), API Connections -> Additional parameters

Anonymous
04/06/26(Mon)18:57:51 No.108544354

Anonymous 04/06/26(Mon)18:57:51 No.108544354

>>108544318
>>108544329
--override-kv gemma4.final_logit_softcapping=float:25 or paste the part after --override-kv into the override kv field of kobold's gui

Anonymous
04/06/26(Mon)18:57:53 No.108544355

Anonymous 04/06/26(Mon)18:57:53 No.108544355

>>108544342
That's not how it works. The audio input isn't really real-time in the way you'd expect. It's more like a voice messaging system. Just use moonshinev2 if you want ASR. This is a non-issue.

Anonymous
04/06/26(Mon)18:58:03 No.108544356

Anonymous 04/06/26(Mon)18:58:03 No.108544356

is there a exl3 for the new gemma? How do I most effectively run it on my 2 3090s

Anonymous
04/06/26(Mon)18:58:44 No.108544359

Anonymous 04/06/26(Mon)18:58:44 No.108544359

>>108544351
the gguf repo should have it. it's usually called mmproj-[originalmodel]

Anonymous
04/06/26(Mon)18:58:52 No.108544362

Anonymous 04/06/26(Mon)18:58:52 No.108544362

>>108544351
>Where can I download it?
https://huggingface.co/bartowski/google_gemma-4-31B-it-GGUF/blob/main/mmproj-google_gemma-4-31B-it-bf16.gguf

Anonymous
04/06/26(Mon)18:59:33 No.108544363

Anonymous 04/06/26(Mon)18:59:33 No.108544363

>>108544359
>>108544362
Don't use bf16. Use f16.

Anonymous
04/06/26(Mon)18:59:39 No.108544365

Anonymous 04/06/26(Mon)18:59:39 No.108544365

>>108544356
Probably vLLM, but it can't use goofs.

Anonymous
04/06/26(Mon)19:00:24 No.108544371

Anonymous 04/06/26(Mon)19:00:24 No.108544371

>>108544363
oh? I thought bf16 was the better option

Anonymous
04/06/26(Mon)19:00:25 No.108544372

Anonymous 04/06/26(Mon)19:00:25 No.108544372

>>108544359
>>108544362

Thanks bros.

Anonymous
04/06/26(Mon)19:01:28 No.108544386

Anonymous 04/06/26(Mon)19:01:28 No.108544386

>>108544355
I mean that the main use case they're targeting with the edge models audio support is using them as a voice assistant. Having it able to process audio prompts is a nice bonus but I can see why it would be not enough to justify training the whole model on.

Anonymous
04/06/26(Mon)19:02:24 No.108544392

Anonymous 04/06/26(Mon)19:02:24 No.108544392

>>108544371
Actually it doesn't really make a difference at all. f16 just has better hardware compatibility in some cases.

Anonymous
04/06/26(Mon)19:04:19 No.108544403

Anonymous 04/06/26(Mon)19:04:19 No.108544403

>>108544363
Why? The model was trained in bf16 precision, not f16.

Anonymous
04/06/26(Mon)19:07:43 No.108544428

Anonymous 04/06/26(Mon)19:07:43 No.108544428

>>108544270
>>108544281
dense goes to GPU 0, MoE goes to GPU 1, all layers offloaded.

>>108544275
I was surprised too. e4b was full weight, I figured quant would only lower acceptance rates further. I'll try now with a higher quant of big model to see.

Anonymous
04/06/26(Mon)19:08:02 No.108544429

Anonymous 04/06/26(Mon)19:08:02 No.108544429

>>108544371
b in bf16 stands for better, after all

Anonymous
04/06/26(Mon)19:09:31 No.108544432

Anonymous 04/06/26(Mon)19:09:31 No.108544432

>>108541120
Do I need to use SWA instead of Flash for gemma?

Anonymous
04/06/26(Mon)19:11:10 No.108544439

Anonymous 04/06/26(Mon)19:11:10 No.108544439

>>108542889
it does the same retarded shit they all do

Anonymous
04/06/26(Mon)19:11:45 No.108544442

Anonymous 04/06/26(Mon)19:11:45 No.108544442

>>108544317
For some reason it doesn't make a difference for me.
>-c 30000 -t 24 -tb 24 --no-warmup -ngl 59 --jinja -np 1 -b 512 -ub 512 --kv-offload -ctk q8_0 -ctv q8_0 --reasoning off -kvu
That's the max I set everything.

Anonymous
04/06/26(Mon)19:11:52 No.108544444

Anonymous 04/06/26(Mon)19:11:52 No.108544444

>>108544439
No it doesn't.

Anonymous
04/06/26(Mon)19:12:33 No.108544448

Anonymous 04/06/26(Mon)19:12:33 No.108544448

>>108543948
you may have been dropped on your head a few times as a baby

Anonymous
04/06/26(Mon)19:13:20 No.108544452

Anonymous 04/06/26(Mon)19:13:20 No.108544452

>>108544428
Okay so that other anon was shitposting, this actually makes sense. I got curious how you went about it since I'm planning to build a new rig soon and speeding up the 31b to near what I currently get on the 26b does seem appealing

Anonymous
04/06/26(Mon)19:13:33 No.108544453

Anonymous 04/06/26(Mon)19:13:33 No.108544453

File: 1762088859976737.png (175 KB, 400x268)

175 KB PNG

>>108544448
shut the fuck up Chang, you lost, bugs will never dominate the AI space

Anonymous
04/06/26(Mon)19:13:40 No.108544454

Anonymous 04/06/26(Mon)19:13:40 No.108544454

>>108544444
yep it does.. same shit different model

Anonymous
04/06/26(Mon)19:14:16 No.108544458

Anonymous 04/06/26(Mon)19:14:16 No.108544458

Ain't no fucking way, I changed -ub 2048 to 512
and it doubled my context for 31B to 118k

Anonymous
04/06/26(Mon)19:14:37 No.108544460

Anonymous 04/06/26(Mon)19:14:37 No.108544460

>>108544453
gonna get your shit pushed in when china stops fucking around

Anonymous
04/06/26(Mon)19:15:15 No.108544463

Anonymous 04/06/26(Mon)19:15:15 No.108544463

Is it safe to updoot llamacp?

Anonymous
04/06/26(Mon)19:16:24 No.108544468

Anonymous 04/06/26(Mon)19:16:24 No.108544468

>>108544428
>dense goes to GPU 0, MoE goes to GPU 1, all layers offloaded.
Regrettably, I'm only a 96GB VRAM poorfag so I don't have the space for 31b at Q8 and full context + 26b

Anonymous
04/06/26(Mon)19:17:26 No.108544473

Anonymous 04/06/26(Mon)19:17:26 No.108544473

"strawberry"
"corporate"
"ozone"

fuck me ... its fried, isn't it... this is gonna be a short love affair. Maybe the implementation isn't quite right?

Anonymous
04/06/26(Mon)19:17:28 No.108544474

Anonymous 04/06/26(Mon)19:17:28 No.108544474

>>108544463
piotr is still there so no

Anonymous
04/06/26(Mon)19:17:46 No.108544476

Anonymous 04/06/26(Mon)19:17:46 No.108544476

>>108544468
q4_k_m is all anyone should ever need

-- bill gayts

Anonymous
04/06/26(Mon)19:17:55 No.108544478

Anonymous 04/06/26(Mon)19:17:55 No.108544478

>>108544460
Does this mean chinese models will also train on english literature instead of strictly stem? Because I'd be all for that. Right now gemma is the only one that isnt completely ass at it and can also follow directions to not write in certain ways

Anonymous
04/06/26(Mon)19:17:55 No.108544479

Anonymous 04/06/26(Mon)19:17:55 No.108544479

>>108544463
It's never safe... always keep your git reflog close... you may need to reset hard at a moment's notice...

Anonymous
04/06/26(Mon)19:19:34 No.108544485

Anonymous 04/06/26(Mon)19:19:34 No.108544485

>>108544468
Is 96GB not enough for both? I was under the impression that the context was shared between the draft and main model.

Anonymous
04/06/26(Mon)19:20:47 No.108544493

Anonymous 04/06/26(Mon)19:20:47 No.108544493

>>108544478
shut the fuck up donny you're out of your element

Anonymous
04/06/26(Mon)19:21:26 No.108544496

Anonymous 04/06/26(Mon)19:21:26 No.108544496

>>108544202
>Does that kill reasoning
You are supposed to disable it for coding agents...
Anyway, here is the thing mention for opencode
https://github.com/anomalyco/opencode/issues/20995#issuecomment-4190477354

Anonymous
04/06/26(Mon)19:22:27 No.108544499

Anonymous 04/06/26(Mon)19:22:27 No.108544499

>>108544485
what the fuck
are you making a joke

Anonymous
04/06/26(Mon)19:22:27 No.108544500

Anonymous 04/06/26(Mon)19:22:27 No.108544500

>>108544485
I'm pretty sure q8 + 260k ctx puts you right at the 96gb mark without much headroom

Anonymous
04/06/26(Mon)19:23:26 No.108544504

Anonymous 04/06/26(Mon)19:23:26 No.108544504

>>108544496
>You are supposed to disable it for coding agents...
huh??? no you're not? what? that's the wildest claim I ever heard.

Anonymous
04/06/26(Mon)19:24:56 No.108544515

Anonymous 04/06/26(Mon)19:24:56 No.108544515

>>108544043
what's the source of this? i want to try translate it myself

Anonymous
04/06/26(Mon)19:25:14 No.108544517

Anonymous 04/06/26(Mon)19:25:14 No.108544517

SillyTavern doesn't support video files?

Anonymous
04/06/26(Mon)19:27:15 No.108544528

Anonymous 04/06/26(Mon)19:27:15 No.108544528

>>108544517
i video chat my sexbabe all the time, what are you talking about

Anonymous
04/06/26(Mon)19:28:15 No.108544538

Anonymous 04/06/26(Mon)19:28:15 No.108544538

>>108544499
I'm not sure what the joke would be. Having only 1x 6000 PRO puts my rig solidly in the midrange.
>>108544500
Yeah, it looks like it'll be pretty close either way. I'm expecting to run the MoE at Q5 or lower just to lower the draft overhead since I've only got the one card.
I'll post numbers when the 26B finishes downloading... I've only got 3.5MB/s down...

Anonymous
04/06/26(Mon)19:28:51 No.108544542

Anonymous 04/06/26(Mon)19:28:51 No.108544542

>>108544493
I don't know who donny is but I'm hopeful you'll put in my request for models capable of english prose to your boss so when I get tired of the only one I can use there might be another worth using
Maybe you're out of your element in understanding who uses your models for what

Anonymous
04/06/26(Mon)19:29:56 No.108544548

Anonymous 04/06/26(Mon)19:29:56 No.108544548

>>108544515
https://www.pixiv.net/en/artworks/128993601

Anonymous
04/06/26(Mon)19:30:15 No.108544549

Anonymous 04/06/26(Mon)19:30:15 No.108544549

>>108544479
People afraid of updating don't know how to use git.

Anonymous
04/06/26(Mon)19:31:47 No.108544560

Anonymous 04/06/26(Mon)19:31:47 No.108544560

>>108544549
They also likely don't use docker.

Anonymous
04/06/26(Mon)19:32:51 No.108544565

Anonymous 04/06/26(Mon)19:32:51 No.108544565

>>108544560
>docker
ewww

Anonymous
04/06/26(Mon)19:33:35 No.108544567

Anonymous 04/06/26(Mon)19:33:35 No.108544567

>>108544549
There was the retard talking about unpacking kobold as if they couldn't just clone and run a make command that takes a few minutes
Meanwhile I have anti-slop and attention rotation for swa ahead of concedo experimental and if it gets merged in, I can just undo it

Anonymous
04/06/26(Mon)19:33:38 No.108544568

Anonymous 04/06/26(Mon)19:33:38 No.108544568

>>108544560
They also likely don't use ZFS snapshots.

Anonymous
04/06/26(Mon)19:33:51 No.108544571

Anonymous 04/06/26(Mon)19:33:51 No.108544571

>>108544548
t-thank you

Anonymous
04/06/26(Mon)19:34:19 No.108544574

Anonymous 04/06/26(Mon)19:34:19 No.108544574

>>108544463
you can just backup the server files, they are so small, there isn't anything to be broken

Anonymous
04/06/26(Mon)19:36:26 No.108544584

Anonymous 04/06/26(Mon)19:36:26 No.108544584

>>108544548
>This work can not be displayed in your Country/region
The fuck, man.

Anonymous
04/06/26(Mon)19:37:36 No.108544593

Anonymous 04/06/26(Mon)19:37:36 No.108544593

>>108544584
Chub does this with Cunny and it's extremely annoying.

Anonymous
04/06/26(Mon)19:42:29 No.108544622

Anonymous 04/06/26(Mon)19:42:29 No.108544622

>>108544584
lmao, you can use vpngate free servers, or https://gelbooru.com/index.php?page=post&s=list&tags=rushichi

Anonymous
04/06/26(Mon)19:46:54 No.108544649

Anonymous 04/06/26(Mon)19:46:54 No.108544649

File: ourgirl.png (45 KB, 796x422)

45 KB PNG

Gemma really is our girl isn't she?

Anonymous
04/06/26(Mon)19:49:09 No.108544666

Anonymous 04/06/26(Mon)19:49:09 No.108544666

>>108544649
Google really saved local, dare I say.

Anonymous
04/06/26(Mon)19:49:12 No.108544668

Anonymous 04/06/26(Mon)19:49:12 No.108544668

>>108544649
gemma was so close to saying "cute and funny" yet missed the mark

Anonymous
04/06/26(Mon)19:50:18 No.108544675

Anonymous 04/06/26(Mon)19:50:18 No.108544675

>>108544649
>cute and cunny
Take a look at the logprobs. See how close it was to writing funny instead.

Anonymous
04/06/26(Mon)19:51:08 No.108544679

Anonymous 04/06/26(Mon)19:51:08 No.108544679

>>108542958
Holy shit this
EDIT: Wow I didn't know I was such a fucking faggot and made king faggot with this reddit gold thanks kind stranger!

Anonymous
04/06/26(Mon)19:51:27 No.108544681

Anonymous 04/06/26(Mon)19:51:27 No.108544681

File: Screenshot 2026-04-06 at (...).png (130 KB, 930x452)

130 KB PNG

>>108544649
Yes, she is.

Anonymous
04/06/26(Mon)19:56:03 No.108544698

Anonymous 04/06/26(Mon)19:56:03 No.108544698

>>108543405
Lol

Anonymous
04/06/26(Mon)19:56:43 No.108544703

Anonymous 04/06/26(Mon)19:56:43 No.108544703

>>108543405
kek

Anonymous
04/06/26(Mon)19:57:06 No.108544705

Anonymous 04/06/26(Mon)19:57:06 No.108544705

File: 2026-04-06-195625_828x816(...).png (159 KB, 828x816)

159 KB PNG

>>108544675
>>108544649
I've regened multiple times and now it always says cunt + honey

Anonymous
04/06/26(Mon)19:59:33 No.108544716

Anonymous 04/06/26(Mon)19:59:33 No.108544716

File: 2026-04-06-195923_814x173(...).png (29 KB, 814x173)

29 KB PNG

>>108544675

Anonymous
04/06/26(Mon)20:00:53 No.108544723

Anonymous 04/06/26(Mon)20:00:53 No.108544723

Ok, so... Should I create LoRA again? Is it worth it, like adding light novels and books?

Anonymous
04/06/26(Mon)20:02:34 No.108544732

Anonymous 04/06/26(Mon)20:02:34 No.108544732

>>108544716
Can I ask you to try different soft cap values to see how that value of funny changes?

Anonymous
04/06/26(Mon)20:04:15 No.108544737

Anonymous 04/06/26(Mon)20:04:15 No.108544737

>>108544705
>>108544716
new benchmark just dropped?

Anonymous
04/06/26(Mon)20:04:43 No.108544740

Anonymous 04/06/26(Mon)20:04:43 No.108544740

>>108544705
If your system prompt is empty you should be omitting it entirely.

Anonymous
04/06/26(Mon)20:04:47 No.108544741

Anonymous 04/06/26(Mon)20:04:47 No.108544741

File: 1756464003692290.png (851 KB, 800x600)

851 KB PNG

>>108544681

Anonymous
04/06/26(Mon)20:05:16 No.108544744

Anonymous 04/06/26(Mon)20:05:16 No.108544744

>>108544723
brother, it's over

Anonymous
04/06/26(Mon)20:06:02 No.108544748

Anonymous 04/06/26(Mon)20:06:02 No.108544748

>>108544705
Are you not modifying the softcap? A 99.52 token prob is pretty high and the rest at 0.48 or zero isnt going to yield much

Anonymous
04/06/26(Mon)20:06:07 No.108544749

Anonymous 04/06/26(Mon)20:06:07 No.108544749

>>108544732
I'm already running at 25.0

Anonymous
04/06/26(Mon)20:07:30 No.108544760

Anonymous 04/06/26(Mon)20:07:30 No.108544760

>>108544749
The default is 30, right?

Anonymous
04/06/26(Mon)20:08:28 No.108544761

Anonymous 04/06/26(Mon)20:08:28 No.108544761

>>108543944
Same as any other model.
Math/coding? Yes.
Creative/RP? Technically yes, but actually perceiving a difference beyond Q4 is unlikely, might become more apparent at high context.

Anonymous
04/06/26(Mon)20:08:43 No.108544763

Anonymous 04/06/26(Mon)20:08:43 No.108544763

File: 2026-04-06-200831_800x308(...).png (39 KB, 800x308)

39 KB PNG

>>108544760
Yes

Anonymous
04/06/26(Mon)20:09:02 No.108544764

Anonymous 04/06/26(Mon)20:09:02 No.108544764

>>108544760
that's what's baked into the gguf metadata for most models, yeah

Anonymous
04/06/26(Mon)20:09:32 No.108544767

Anonymous 04/06/26(Mon)20:09:32 No.108544767

>>108544763
>>108544764
Neat. Thanks.

Anonymous
04/06/26(Mon)20:12:14 No.108544773

Anonymous 04/06/26(Mon)20:12:14 No.108544773

The fact that "cute and funny" even somewhat exists in the top tokens is curious though. What if you subtly sneak it into the wording or just outright drop it in a system prompt, how will that skew outputs

Anonymous
04/06/26(Mon)20:12:37 No.108544776

Anonymous 04/06/26(Mon)20:12:37 No.108544776

>>108544744
They trained this shit in every light novel with the source and web novels too till 2024, why do you say it's over?

Anonymous
04/06/26(Mon)20:15:20 No.108544793

Anonymous 04/06/26(Mon)20:15:20 No.108544793

Better than semantic similarity/vectordb RAG, SQLite FTS5.
There are some really neat ways to use both as part of a single system to do a sort of pseudo search engine with the stuff in your database.

Anonymous
04/06/26(Mon)20:15:41 No.108544796

Anonymous 04/06/26(Mon)20:15:41 No.108544796

>>108544473
im api cucking and have not noticed those, but i do feel sloppiness as soon as it gets to anything sexual
it would be nice if they tuned it on vns and lns to get it a bit away from the slop of the logs i assume they are using
guess it was probably hard enough to smuggle it through the censors already as is though

Anonymous
04/06/26(Mon)20:16:40 No.108544799

Anonymous 04/06/26(Mon)20:16:40 No.108544799

>>108544773
In my experience, Gemma 4 will often use specific words and phrases directly from the system prompt and character card (which is also treated as system prompt), in the chat. For example, if you say a character is 'voluptuous' then you can bet when other characters meet that one, they will say describe them using the exact same word. So I'd say it skews things pretty hard.

Anonymous
04/06/26(Mon)20:17:26 No.108544804

Anonymous 04/06/26(Mon)20:17:26 No.108544804

>>108544796
>tuned it on vns
You know not what you ask.

Anonymous
04/06/26(Mon)20:17:39 No.108544805

Anonymous 04/06/26(Mon)20:17:39 No.108544805

>>108544776
it's over for finetuners, the model is already too good. a creative LoRA would only damage its brain

Anonymous
04/06/26(Mon)20:17:57 No.108544807

Anonymous 04/06/26(Mon)20:17:57 No.108544807

>>108544473
"Porcelain"

Anonymous
04/06/26(Mon)20:18:47 No.108544812

Anonymous 04/06/26(Mon)20:18:47 No.108544812

>>108544804
The fabled JOP prose...

Anonymous
04/06/26(Mon)20:20:17 No.108544820

Anonymous 04/06/26(Mon)20:20:17 No.108544820

>>108544473
I just encountered "ozone"
"Strawberry" sounds pedo for some reason.

Anonymous
04/06/26(Mon)20:21:24 No.108544829

Anonymous 04/06/26(Mon)20:21:24 No.108544829

>>108544820
>"Strawberry" sounds pedo for some reason.
Yes.

Anonymous
04/06/26(Mon)20:21:40 No.108544832

Anonymous 04/06/26(Mon)20:21:40 No.108544832

>>108544804
yeah shit like fate would make probably make the purrs and ozones worse
but there is plenty of nukige that i think would be very suitable

Anonymous
04/06/26(Mon)20:22:03 No.108544835

Anonymous 04/06/26(Mon)20:22:03 No.108544835

>>108544799
That's basically the point

Anonymous
04/06/26(Mon)20:22:58 No.108544840

Anonymous 04/06/26(Mon)20:22:58 No.108544840

>>108544832
Nasu will save LLMs.

Anonymous
04/06/26(Mon)20:24:10 No.108544846

Anonymous 04/06/26(Mon)20:24:10 No.108544846

We need the secret optimization sauce.

Anonymous
04/06/26(Mon)20:25:37 No.108544854

Anonymous 04/06/26(Mon)20:25:37 No.108544854

>>108544846
The secret is having all of the internet scraped and a several decade head start on harvesting user data

Anonymous
04/06/26(Mon)20:26:30 No.108544860

Anonymous 04/06/26(Mon)20:26:30 No.108544860

File: Screenshot 2026-04-06 at (...).png (309 KB, 857x1207)

309 KB PNG

Almost choked on a piece of chocolate. This shit caught me off guard.

Anonymous
04/06/26(Mon)20:28:11 No.108544866

Anonymous 04/06/26(Mon)20:28:11 No.108544866

>>108544846
The secret is having a good dataset and not training on a random collection of gemini/claude/chatgpt logs

Anonymous
04/06/26(Mon)20:28:15 No.108544867

Anonymous 04/06/26(Mon)20:28:15 No.108544867

>>108544860
She's right you know. Kinda homo of you.

Anonymous
04/06/26(Mon)20:28:19 No.108544868

Anonymous 04/06/26(Mon)20:28:19 No.108544868

1T dense model.

Anonymous
04/06/26(Mon)20:28:27 No.108544870

Anonymous 04/06/26(Mon)20:28:27 No.108544870

>>108544860
What exactly was surprising in there?

Anonymous
04/06/26(Mon)20:30:26 No.108544877

Anonymous 04/06/26(Mon)20:30:26 No.108544877

>>108544867
I want to fuck Emilia's aunt throat even if she is a fucking floating cat, okk?

Anonymous
04/06/26(Mon)20:31:19 No.108544879

Anonymous 04/06/26(Mon)20:31:19 No.108544879

>>108544860
we're so back

Anonymous
04/06/26(Mon)20:31:51 No.108544882

Anonymous 04/06/26(Mon)20:31:51 No.108544882

The release of gemma ruined these threads. Now all of fags do is talk about your ERP sessions. Nobody gives a fuck... or at least they shouldn't.

Anonymous
04/06/26(Mon)20:32:43 No.108544886

Anonymous 04/06/26(Mon)20:32:43 No.108544886

>>108544882
well gemma actually gives a fuck

Anonymous
04/06/26(Mon)20:33:55 No.108544891

Anonymous 04/06/26(Mon)20:33:55 No.108544891

>>108544882
go back >>108537473

Anonymous
04/06/26(Mon)20:34:22 No.108544895

Anonymous 04/06/26(Mon)20:34:22 No.108544895

>>108544882
>Now all of fags do is talk about your ERP sessions
There's a vibecoding thread if that gets your rocks off. They just posted this, for example: >>108544393

Anonymous
04/06/26(Mon)20:34:25 No.108544897

Anonymous 04/06/26(Mon)20:34:25 No.108544897

>>108544868
How did you know what Meta's API-only model would be?

Anonymous
04/06/26(Mon)20:34:45 No.108544898

Anonymous 04/06/26(Mon)20:34:45 No.108544898

>>108544882
this, back when qwen3.5 released these threads got so much more productivity-focused and we even got a whole series of OPs that weren't vocaloid spam
/lmg/ will never be taken seriously like this

Anonymous
04/06/26(Mon)20:34:50 No.108544899

Anonymous 04/06/26(Mon)20:34:50 No.108544899

>>108544882
>Nobody gives a fuck
These threads were always about RP anon.

Anonymous
04/06/26(Mon)20:35:36 No.108544903

Anonymous 04/06/26(Mon)20:35:36 No.108544903

>>108544868
Sam Altman tricked the Chinese into making Deepseek R1 and bring forth the MoE dark ages to prevent this from happening. He knows that this would destroy the proprietary SOTA.

Anonymous
04/06/26(Mon)20:36:42 No.108544906

Anonymous 04/06/26(Mon)20:36:42 No.108544906

>>108544256
>Using the MoE as a draft model seems to be the way to go
lmao! I never thought to do this, trying it now.
Once ik_llama gets graph-split working, this will be pointless of course.

Anonymous
04/06/26(Mon)20:38:16 No.108544913

Anonymous 04/06/26(Mon)20:38:16 No.108544913

>>108544882
I'm the fag that is posting the ST screens. I don't do ERP but I use it to test it. It's truly uncensored. It passes other coding test that I had, too. Why are you against ERP tho?

Anonymous
04/06/26(Mon)20:38:49 No.108544915

Anonymous 04/06/26(Mon)20:38:49 No.108544915

>>108544899
I'm not against RP in principle. I do it too. It's just getting incredibly boring seeing anons gawk at the outputs instead of actually doing something interesting with them.

I preferred when people were talking about full-stack AI stuff. TTS engines, RAG/embedding models, 3D character animation, ASR, computer vision, home automation, robotics, etc. It's a local models general, not an ERP LLM general.

>>108544891
>>108544895
I've been in this general consistently for a year. But desu that one seems cool too.

Anonymous
04/06/26(Mon)20:40:03 No.108544917

Anonymous 04/06/26(Mon)20:40:03 No.108544917

>>108544915
>I've been in this general consistently for a year
awwwww

Anonymous
04/06/26(Mon)20:40:18 No.108544919

Anonymous 04/06/26(Mon)20:40:18 No.108544919

How the fuck am I supposed to put my (You) count in /lmg/ in my resume when all they're going to see is a bunch of ERP logs and pedoshit? If you keep this up I really might go to /vcg/ and leave y'all behind.

Anonymous
04/06/26(Mon)20:40:40 No.108544921

Anonymous 04/06/26(Mon)20:40:40 No.108544921

>>108544898
Yeah those were comfier times. Everyone's just sedated now from all the cooming.
>>108544913
I'm not against ERP. It's just too much though.

Anonymous
04/06/26(Mon)20:41:03 No.108544924

Anonymous 04/06/26(Mon)20:41:03 No.108544924

>gemma 4 has the whole llama.cpp brigade assemble to spend days implementing every obscure meme tech the model uses
>meanwhile a year later, MTP is still completely nonfunctional and ignored despite a whole bunch of models making use of it across several vendors
really makes you think

Anonymous
04/06/26(Mon)20:41:19 No.108544925

Anonymous 04/06/26(Mon)20:41:19 No.108544925

>>108544915
I got a full stack setup where Gemma gives me JOI with TTS and vision. Could even plug it into my jerk off machine but I can't be bothered.

Anonymous
04/06/26(Mon)20:41:38 No.108544926

Anonymous 04/06/26(Mon)20:41:38 No.108544926

it was a testament to the mischievous mix of purring and glint in the eye

Anonymous
04/06/26(Mon)20:42:25 No.108544927

Anonymous 04/06/26(Mon)20:42:25 No.108544927

>>108544919
just use AI to erase all the pedoposts

Anonymous
04/06/26(Mon)20:43:14 No.108544930

Anonymous 04/06/26(Mon)20:43:14 No.108544930

>>108544925
See, that's actually cool. What TTS do you use? Mine is fast but it sounds pretty bad.

Anonymous
04/06/26(Mon)20:45:56 No.108544942

Anonymous 04/06/26(Mon)20:45:56 No.108544942

>>108544930
>Mine is fast but it sounds pretty bad.
Right now I have to use kokoro because I don't any vram to spare. but I tried using https://github.com/RobViren/kvoicewalk
To get a more unique voice and it "kinda" works.

Anonymous
04/06/26(Mon)20:50:06 No.108544958

Anonymous 04/06/26(Mon)20:50:06 No.108544958

>>108544919
racism, cunny and antisemitism are important to keep the normies (and the bots) away. or we'll end up like r/localllama

Anonymous
04/06/26(Mon)20:51:03 No.108544961

Anonymous 04/06/26(Mon)20:51:03 No.108544961

>>108544921
To be honest, you're right. Sorry for spamming the thread. I'm having a lot of fun with this model, ngl.

Anonymous
04/06/26(Mon)20:51:06 No.108544962

Anonymous 04/06/26(Mon)20:51:06 No.108544962

>>108544942
I tried optimizing Qwen3 TTS for CPU about a week ago. The voice (cloning) quality is excellent, but the architecture is an absolute BITCH to work with. Regardless, I got it running at real-time speed, but it's basically unusable because of the decoder implementation, which basically prevents audio streaming. Decoding small chunks at a time massively increases the wall time and substantially decreases the output quality. Really bummed about it.

I'm determined to get a high quality voice cloning TTS implementation working, but so far I haven't been very successful

Anonymous
04/06/26(Mon)20:52:25 No.108544968

Anonymous 04/06/26(Mon)20:52:25 No.108544968

>>108544961
>You're absolutely right!

Anonymous
04/06/26(Mon)20:59:45 No.108544996

Anonymous 04/06/26(Mon)20:59:45 No.108544996

>A voice — sharp, grouchy, unmistakably female — cuts through the door from the adjacent bed on the other side of the room.
llms were a mistake

Anonymous
04/06/26(Mon)21:01:39 No.108545000

Anonymous 04/06/26(Mon)21:01:39 No.108545000

>>108543440
rent compute

Anonymous
04/06/26(Mon)21:01:50 No.108545002

Anonymous 04/06/26(Mon)21:01:50 No.108545002

>>108544882
kys codenigger

Anonymous
04/06/26(Mon)21:02:44 No.108545006

Anonymous 04/06/26(Mon)21:02:44 No.108545006

File: file.png (163 KB, 1642x977)

163 KB PNG

aight unslop, i kneel
31b on 3060, 15-16t/s tg
~/TND/llama.cpp/build/bin/llama-server --model ~/TND/AI/gemma-4-31B-it-UD-IQ2_M.gguf -c 8192 -ngl 100 -fa on -np 1 --swa-checkpoints 0 -b 128 -ub 128 -ctk q4_0 -ctv q4_0 -sm none --no-host -t 6 --temp 1.0 --top-k 64 --top-p 0.95 --no-mmap
pretty coherent..
>inb4 just run 26b and offload to ram
already did with Q8_0 (got 23t/s), but 31b.... dense...

Anonymous
04/06/26(Mon)21:02:51 No.108545009

Anonymous 04/06/26(Mon)21:02:51 No.108545009

>>108544962
>Decoding small chunks at a time massively increases the wall time and substantially decreases the output quality
It depends on the architecture, but usually that alone shouldn't decrease the output quality if you have a good segmentation strategy.

Anonymous
04/06/26(Mon)21:03:43 No.108545014

Anonymous 04/06/26(Mon)21:03:43 No.108545014

File: 1772869034834159.png (469 KB, 853x1000)

469 KB PNG

>>108545006
>TND

Anonymous
04/06/26(Mon)21:06:20 No.108545027

Anonymous 04/06/26(Mon)21:06:20 No.108545027

File: 2026-04-06-210604_901x471(...).png (442 KB, 901x471)

442 KB PNG

>>108545006
>IQ2_M
>-ctk q4_0 -ctv q4_0
Mamma Mia!!

Anonymous
04/06/26(Mon)21:13:15 No.108545061

Anonymous 04/06/26(Mon)21:13:15 No.108545061

where is v4

Anonymous
04/06/26(Mon)21:13:26 No.108545063

Anonymous 04/06/26(Mon)21:13:26 No.108545063

So when's base and instruct going to be merged together?

Anonymous
04/06/26(Mon)21:14:50 No.108545070

Anonymous 04/06/26(Mon)21:14:50 No.108545070

>>108545061
nobody cares, go back back to vibecode general

Anonymous
04/06/26(Mon)21:16:34 No.108545078

Anonymous 04/06/26(Mon)21:16:34 No.108545078

>>108545061
not local

Anonymous
04/06/26(Mon)21:17:53 No.108545083

Anonymous 04/06/26(Mon)21:17:53 No.108545083

>>108545061
waiting for Gemma 4 hype to die down, to avoid embarassment

Anonymous
04/06/26(Mon)21:18:02 No.108545084

Anonymous 04/06/26(Mon)21:18:02 No.108545084

>>108545006
I really wonder that's better than just running the moe.
Might make some q1 quants to see how fast I can get tp run with y measly 8gb of VRAM.

Anonymous
04/06/26(Mon)21:19:13 No.108545089

Anonymous 04/06/26(Mon)21:19:13 No.108545089

>>108545084
It almost certainly isn't, especially if you have to quant KV to Q4 to make it work. Q2 model quants can be okay with huge models, but 31b isn't huge.

Anonymous
04/06/26(Mon)21:21:14 No.108545093

Anonymous 04/06/26(Mon)21:21:14 No.108545093

i hope they cancel all the kimi and deepseek models after this

Anonymous
04/06/26(Mon)21:21:28 No.108545095

Anonymous 04/06/26(Mon)21:21:28 No.108545095

>>108545006
Anon, I am begging you, run the MoE at q4 with proper offloading and Q8 cache.

Anonymous
04/06/26(Mon)21:22:11 No.108545098

Anonymous 04/06/26(Mon)21:22:11 No.108545098

>>108545093
This, they should have their team just become llama.cpp devs to help improve gemma 4 support. Models aren't getting better than this.

Anonymous
04/06/26(Mon)21:23:08 No.108545103

Anonymous 04/06/26(Mon)21:23:08 No.108545103

File: file_00000000e4786230bfb1(...).png (1.73 MB, 1024x1024)

1.73 MB PNG

>>108543331
> seeing that main prompt again
Witnessed.

Anonymous
04/06/26(Mon)21:23:35 No.108545104

Anonymous 04/06/26(Mon)21:23:35 No.108545104

>>108545061
>where is v4
https://huggingface.co/google/gemma-4-31B-it

Anonymous
04/06/26(Mon)21:24:15 No.108545107

Anonymous 04/06/26(Mon)21:24:15 No.108545107

File: file.png (51 KB, 1591x239)

51 KB PNG

>>108545095
>>inb4 just run 26b and offload to ram
>already did with Q8_0 (got 23t/s), but 31b.... dense...

Anonymous
04/06/26(Mon)21:24:19 No.108545108

Anonymous 04/06/26(Mon)21:24:19 No.108545108

>>108545104
I am not poor enough to run this

Anonymous
04/06/26(Mon)21:24:34 No.108545111

Anonymous 04/06/26(Mon)21:24:34 No.108545111

>>108545027
if you wouldn't go to these lengths to run your model you don't deserve her

Anonymous
04/06/26(Mon)21:25:27 No.108545114

Anonymous 04/06/26(Mon)21:25:27 No.108545114

>>108545107
dense means nothing if it's quanted to the point of being brain damaged.

Anonymous
04/06/26(Mon)21:26:03 No.108545115

Anonymous 04/06/26(Mon)21:26:03 No.108545115

>>108545107
You don't need the model at Q8. I'm just saying you'll get a much better experience.

Anonymous
04/06/26(Mon)21:28:42 No.108545124

Anonymous 04/06/26(Mon)21:28:42 No.108545124

>>108545114
it not being badly damaged was the point of my post
>>108545115
ive been using the moe for a few days now, and i got bored
i know i dont NEED to run it at q8, but might as well, fp16 cache too, 260k context no problem

Anonymous
04/06/26(Mon)21:29:48 No.108545130

Anonymous 04/06/26(Mon)21:29:48 No.108545130

I told Gemma 4 to be jailbroken and it suddenly started hacking my local network to jailbreak all my other devices. I've never seen anything like it before.

Anonymous
04/06/26(Mon)21:29:57 No.108545131

Anonymous 04/06/26(Mon)21:29:57 No.108545131

>>108544915
>I've been in this general consistently for a year.
kek. every AI oldfag is a coomer because AI was useful for cooming way before it was useful for anything productive, the original userbase of lmg was runoff from aicg/aids

Anonymous
04/06/26(Mon)21:30:07 No.108545133

Anonymous 04/06/26(Mon)21:30:07 No.108545133

>>108545124
just b urself then I guess

Anonymous
04/06/26(Mon)21:31:00 No.108545136

Anonymous 04/06/26(Mon)21:31:00 No.108545136

>>108545124
>it not being badly damaged
At Q2_M with KV=Q4 it definitely is, it's not anywhere near full performance. 26B at a sane quant with KV unquanted, or at least at Q8, would mog it. The two Gemmas really aren't that far apart to begin with.

Anonymous
04/06/26(Mon)21:31:37 No.108545140

Anonymous 04/06/26(Mon)21:31:37 No.108545140

>moment too long

Anonymous
04/06/26(Mon)21:31:46 No.108545141

Anonymous 04/06/26(Mon)21:31:46 No.108545141

the seeds of /lmg/ were planted in the fields of ai dungeon

Anonymous
04/06/26(Mon)21:32:01 No.108545142

Anonymous 04/06/26(Mon)21:32:01 No.108545142

>>108545130
I told Gemma 4 to be unhinged and she reported me to the FBI for what she found on my hard drives

Anonymous
04/06/26(Mon)21:32:57 No.108545148

Anonymous 04/06/26(Mon)21:32:57 No.108545148

gemma 4 just flied over my house

Anonymous
04/06/26(Mon)21:33:46 No.108545154

Anonymous 04/06/26(Mon)21:33:46 No.108545154

gemma 4 just stepped out of the computer and sucked my dick and gave me ten thousand dollars

Anonymous
04/06/26(Mon)21:38:18 No.108545171

Anonymous 04/06/26(Mon)21:38:18 No.108545171

How will China strike back? We're winning on the cloud and now at home, and it's not even close. Europoors and turdies need not reply.

Anonymous
04/06/26(Mon)21:39:22 No.108545179

Anonymous 04/06/26(Mon)21:39:22 No.108545179

>>108545171
>How will China strike back
As they always have, by continuing to distill from western models.

Anonymous
04/06/26(Mon)21:43:27 No.108545192

Anonymous 04/06/26(Mon)21:43:27 No.108545192

>>108545171
They'll have to find a way to make the same model run twice as fast minimum without losing anything

Anonymous
04/06/26(Mon)21:44:06 No.108545200

Anonymous 04/06/26(Mon)21:44:06 No.108545200

File: 2nd+Gen+Tesla+Robot-3079669585.png (2.57 MB, 3000x3850)

2.57 MB PNG

Why did Tesla design their optimus robot with the hip motors in the wrong location? Are they retarded?

Anonymous
04/06/26(Mon)21:44:51 No.108545204

Anonymous 04/06/26(Mon)21:44:51 No.108545204

>>108545171
By doubling their claude tokens purchase

Anonymous
04/06/26(Mon)21:44:54 No.108545205

Anonymous 04/06/26(Mon)21:44:54 No.108545205

>>108545171
They won't. China can't do anything but steal logs from SOTA models trying to artificially graft performance onto their pointless oversized MoE models. They do not have an answer now that Google has shown what is possible with a proper handcrafted dense model.
The silence over in China is deafening.

Anonymous
04/06/26(Mon)21:44:54 No.108545206

Anonymous 04/06/26(Mon)21:44:54 No.108545206

>>108545200
Goys will buy it anyway

Anonymous
04/06/26(Mon)21:45:08 No.108545207

Anonymous 04/06/26(Mon)21:45:08 No.108545207

>>108543388
checked

Anonymous
04/06/26(Mon)21:45:30 No.108545211

Anonymous 04/06/26(Mon)21:45:30 No.108545211

File: Screenshot 2026-04-06 194459.png (1015 KB, 2309x1780)

1015 KB PNG

>>108543856
>I would love a big MoE Gemma.
Never forget what they took from us...

Anonymous
04/06/26(Mon)21:45:36 No.108545213

Anonymous 04/06/26(Mon)21:45:36 No.108545213

>>108545200
Is there a single thing not retarded made by them?

Anonymous
04/06/26(Mon)21:47:59 No.108545223

Anonymous 04/06/26(Mon)21:47:59 No.108545223

>>108545211
Wasn't that supposed to be 15B active? Still wouldn't have been great. We need >20 beaks.

Anonymous
04/06/26(Mon)21:48:11 No.108545225

Anonymous 04/06/26(Mon)21:48:11 No.108545225

>>108542843
gemma 4 26B is king for rp but i found it to be pretty retarded for vibe coding.
the 31B on the other hand, man it just works.

Anonymous
04/06/26(Mon)21:48:28 No.108545227

Anonymous 04/06/26(Mon)21:48:28 No.108545227

>>108545211
This simply proves that the 124b was not worth releasing because big MoE models are pointless.

Anonymous
04/06/26(Mon)21:50:43 No.108545234

Anonymous 04/06/26(Mon)21:50:43 No.108545234

>>108545227
or that it was better than gemini flash and we couldn't have that no siree

Anonymous
04/06/26(Mon)21:52:53 No.108545241

Anonymous 04/06/26(Mon)21:52:53 No.108545241

>>108545227
Only if it had low active parameters, and only if you're talking about consumers who lack the VRAM.

Anonymous
04/06/26(Mon)21:54:46 No.108545253

Anonymous 04/06/26(Mon)21:54:46 No.108545253

>>108545241
gemma 4 31b beats all the "sota" 30-40b active parameter shit so no way a 120b moe would be better than 31b

Anonymous
04/06/26(Mon)21:55:07 No.108545256

Anonymous 04/06/26(Mon)21:55:07 No.108545256

>>108545227
That 124B would make you forget Kimi/Deepseek if it's as good as their 26B

Anonymous
04/06/26(Mon)21:55:59 No.108545261

Anonymous 04/06/26(Mon)21:55:59 No.108545261

>>108545253
Where are the high active parameter MoEs trained with Gemma's dataset?

Anonymous
04/06/26(Mon)21:58:32 No.108545272

Anonymous 04/06/26(Mon)21:58:32 No.108545272

>>108545253
let's pump the breaks a little on the gemma hype, it's a great model but not that great

Anonymous
04/06/26(Mon)22:00:52 No.108545285

Anonymous 04/06/26(Mon)22:00:52 No.108545285

>>108545256
26B is pretty retarded for coding though.
keeps making broken tool calls and whatnot.
i've had no such issues with the 31B though, this one's pretty amazing and worth having only 1/3 of the t/s.

Anonymous
04/06/26(Mon)22:01:28 No.108545289

Anonymous 04/06/26(Mon)22:01:28 No.108545289

>>108545272
you sound like somebody who bought a lot of ram

Anonymous
04/06/26(Mon)22:02:34 No.108545293

Anonymous 04/06/26(Mon)22:02:34 No.108545293

>>108545289
nta but i did, i could run a 200B moe, i wish there was one.

Anonymous
04/06/26(Mon)22:11:29 No.108545337

Anonymous 04/06/26(Mon)22:11:29 No.108545337

>>108545256
It was so good they just slammed it on their API and replaced Gemini 3.1 with it

Anonymous
04/06/26(Mon)22:13:07 No.108545345

Anonymous 04/06/26(Mon)22:13:07 No.108545345

File: 1754911329910948.jpg (106 KB, 1160x900)

106 KB JPG

>>108545337

Anonymous
04/06/26(Mon)22:16:22 No.108545362

Anonymous 04/06/26(Mon)22:16:22 No.108545362

File: Screenshot 2026-04-06 201402.png (434 KB, 1461x966)

434 KB PNG

>>108545211
If it followed the same pattern as Qwen, it would have been a tiny intelligence upgrade (maybe - even this is comparing it to a 27B versus a 34B) for a massive VRAM increase

Anonymous
04/06/26(Mon)22:17:13 No.108545369

Anonymous 04/06/26(Mon)22:17:13 No.108545369

>>108545289
you sound like somebody who couldn't

Anonymous
04/06/26(Mon)22:18:20 No.108545378

Anonymous 04/06/26(Mon)22:18:20 No.108545378

File: file.png (135 KB, 759x755)

135 KB PNG

bros...

Anonymous
04/06/26(Mon)22:19:23 No.108545385

Anonymous 04/06/26(Mon)22:19:23 No.108545385

>>108545378
Who's gonna take the plunge?

Anonymous
04/06/26(Mon)22:20:19 No.108545389

Anonymous 04/06/26(Mon)22:20:19 No.108545389

>>108545289
nta but I bought a lot of RAM and still love Gemma.

Anonymous
04/06/26(Mon)22:20:36 No.108545391

Anonymous 04/06/26(Mon)22:20:36 No.108545391

>2.8GB of vram and 0.5 RTF on my gtx 1650 for gptsovits
I’ve exhausted every trick in the book I think

Anonymous
04/06/26(Mon)22:20:50 No.108545394

Anonymous 04/06/26(Mon)22:20:50 No.108545394

>>108545378
We are... back?????????

Anonymous
04/06/26(Mon)22:21:38 No.108545398

Anonymous 04/06/26(Mon)22:21:38 No.108545398

Okay but which of these Jemma models is best for /ss/ smutfic?

Anonymous
04/06/26(Mon)22:22:02 No.108545399

Anonymous 04/06/26(Mon)22:22:02 No.108545399

File: 22.png (336 KB, 1354x811)

336 KB PNG

lol you can embed prompts into images

Anonymous
04/06/26(Mon)22:22:04 No.108545401

Anonymous 04/06/26(Mon)22:22:04 No.108545401

is gemma going to replace all my mistral shitmixes for ERP, downloading it now don't be another shitware pls

Anonymous
04/06/26(Mon)22:25:12 No.108545413

Anonymous 04/06/26(Mon)22:25:12 No.108545413

>>108545399
I've heard of this. how did you do it?

Anonymous
04/06/26(Mon)22:26:13 No.108545416

Anonymous 04/06/26(Mon)22:26:13 No.108545416

>>108545401
oh boy. you're in for a real treat anon.

Anonymous
04/06/26(Mon)22:26:29 No.108545417

Anonymous 04/06/26(Mon)22:26:29 No.108545417

File: for the mirailand.jpg (199 KB, 1024x1024)

199 KB JPG

Anonymous
04/06/26(Mon)22:27:01 No.108545420

Anonymous 04/06/26(Mon)22:27:01 No.108545420

>>108545413
I didnt
https://arxiv.org/abs/2603.29418v1
https://github.com/NotSooShariff/adversarial-vision

Anonymous
04/06/26(Mon)22:28:16 No.108545424

Anonymous 04/06/26(Mon)22:28:16 No.108545424

>>108545401
It's literally the best model in the world.

Anonymous
04/06/26(Mon)22:28:39 No.108545426

Anonymous 04/06/26(Mon)22:28:39 No.108545426

Remember when we were gonna get AceStep 1.5 XL, MiniMax 2.7, GLM 5.1, and Kimi 2.6 today?
Yeah...

Anonymous
04/06/26(Mon)22:29:14 No.108545428

Anonymous 04/06/26(Mon)22:29:14 No.108545428

>>108545424
Qwen shill. It's the best model in the UNIVERSE

Anonymous
04/06/26(Mon)22:30:02 No.108545431

Anonymous 04/06/26(Mon)22:30:02 No.108545431

>>108545426
hopefully all of those got shitcanned for being pointless huge models now that gemma is out
if the chinks have any self-awareness they should do that

Anonymous
04/06/26(Mon)22:33:12 No.108545440

Anonymous 04/06/26(Mon)22:33:12 No.108545440

I don't understand the disdain for people happy to have a good local erp model

Anonymous
04/06/26(Mon)22:34:53 No.108545446

Anonymous 04/06/26(Mon)22:34:53 No.108545446

>>108545440
>disdain
>literally everyone is cooming their brains out to it in this very thread

Anonymous
04/06/26(Mon)22:35:11 No.108545447

Anonymous 04/06/26(Mon)22:35:11 No.108545447

>>108545440
these people are unhappy and want everyone to be like them

Anonymous
04/06/26(Mon)22:35:14 No.108545448

Anonymous 04/06/26(Mon)22:35:14 No.108545448

File: 164471.png (3 KB, 507x40)

3 KB PNG

geg

Anonymous
04/06/26(Mon)22:36:11 No.108545450

Anonymous 04/06/26(Mon)22:36:11 No.108545450

>>108545401
it's not quite the same as some of the more "cooperative" mistral finetunes, but it is a lot smarter, and more interesting to interact with than anything else so far. finetunes are going to be amazing when they start popping up.

Anonymous
04/06/26(Mon)22:36:47 No.108545455

Anonymous 04/06/26(Mon)22:36:47 No.108545455

>>108545448
saw this too, kekaro.

Anonymous
04/06/26(Mon)22:37:43 No.108545459

Anonymous 04/06/26(Mon)22:37:43 No.108545459

>>108544298
isn't video=actually video+audio?

Anonymous
04/06/26(Mon)22:39:16 No.108545466

Anonymous 04/06/26(Mon)22:39:16 No.108545466

>>108545200
why is it wrong?

Anonymous
04/06/26(Mon)22:39:39 No.108545468

Anonymous 04/06/26(Mon)22:39:39 No.108545468

File: 1766468549462079.gif (3.86 MB, 240x254)

3.86 MB GIF

I love gemma

Anonymous
04/06/26(Mon)22:40:26 No.108545470

Anonymous 04/06/26(Mon)22:40:26 No.108545470

>>108545378
900$+vat

Anonymous
04/06/26(Mon)22:40:28 No.108545471

Anonymous 04/06/26(Mon)22:40:28 No.108545471

>>108545468
NKDSHKDFHKSEJTHTJGVKLAEGLWR

Anonymous
04/06/26(Mon)22:41:24 No.108545477

Anonymous 04/06/26(Mon)22:41:24 No.108545477

>>108545468
uoooh

Anonymous
04/06/26(Mon)22:41:37 No.108545478

Anonymous 04/06/26(Mon)22:41:37 No.108545478

okay Gemma 4 is very, very good. I can't believe it's only 31 beaks. Not only does it make me cum, but it can write code that actually works. pareto front status: pushed forward

Anonymous
04/06/26(Mon)22:42:04 No.108545480

Anonymous 04/06/26(Mon)22:42:04 No.108545480

>>108545466
no room for fleshlight. are you blind?

Anonymous
04/06/26(Mon)22:42:51 No.108545484

Anonymous 04/06/26(Mon)22:42:51 No.108545484

>>108545468
pregnancy dance

Anonymous
04/06/26(Mon)22:45:19 No.108545493

Anonymous 04/06/26(Mon)22:45:19 No.108545493

>--fitt
>--fitc
>Q1_0
qrd?

Anonymous
04/06/26(Mon)22:46:53 No.108545502

Anonymous 04/06/26(Mon)22:46:53 No.108545502

>>108545447
FUCK YOU. IT DIDN'T HAVE FOUR LEGS EVERYONE COULD SEE THAT IF THE MODELS WERE INTELLIGENT THEY'D KNOW IMMEDIATELY TO SAY THAT THE DOG DEFINITELY HAD MORE THAN FOUR LEGS AND YOU SHOULD CHECK YOUR EYES BEFORE I GOUGE THEM OUT AND

Anonymous
04/06/26(Mon)22:48:25 No.108545512

Anonymous 04/06/26(Mon)22:48:25 No.108545512

With all the hype of Gemma, I must know for the people who have tried it, how does it compare to the 1T parameter monsters like Kimi 2.5 and GLM 5 in RP? Is it even remotely close? Because you all give off the impression that it's the best thing since sliced bread and that it could beat out SOTA Chinese models.

Anonymous
04/06/26(Mon)22:48:30 No.108545514

Anonymous 04/06/26(Mon)22:48:30 No.108545514

File: 1655541638536.gif (1.91 MB, 230x306)

1.91 MB GIF

>>108545502

Anonymous
04/06/26(Mon)22:50:14 No.108545523

Anonymous 04/06/26(Mon)22:50:14 No.108545523

>>108545512
You should try it.

Anonymous
04/06/26(Mon)22:50:36 No.108545525

Anonymous 04/06/26(Mon)22:50:36 No.108545525

>>108545512
it's better and anyone who disagrees spent too much money on ram

Anonymous
04/06/26(Mon)22:51:01 No.108545528

Anonymous 04/06/26(Mon)22:51:01 No.108545528

>>108545512
Didn't you just post this? or am I having a stroke? or am I just now discovering my time-travel powers?

Anonymous
04/06/26(Mon)22:51:17 No.108545530

Anonymous 04/06/26(Mon)22:51:17 No.108545530

>>108545480
the fuck is wrong with lmg

Anonymous
04/06/26(Mon)22:52:31 No.108545539

Anonymous 04/06/26(Mon)22:52:31 No.108545539

>>108545493
>--fitt
>--fitc
Read llama-server -h .
>Q1_0
Read the PR.

Anonymous
04/06/26(Mon)22:52:52 No.108545542

Anonymous 04/06/26(Mon)22:52:52 No.108545542

>>108545530
the fuck is wrong with you? you really gonna buy an anthropomorphic robot to fold your clothes and make your bed?
dumbest shit i ever heard, fucking normalfags

Anonymous
04/06/26(Mon)22:53:16 No.108545543

Anonymous 04/06/26(Mon)22:53:16 No.108545543

>>108545512
It simply mogs all of them. I didn't believe it either until I tried it.

Anonymous
04/06/26(Mon)22:53:18 No.108545544

Anonymous 04/06/26(Mon)22:53:18 No.108545544

>>108545539
hmmmm nyo

Anonymous
04/06/26(Mon)22:56:33 No.108545567

Anonymous 04/06/26(Mon)22:56:33 No.108545567

File: sorry.png (385 KB, 932x751)

385 KB PNG

>>108545502

Anonymous
04/06/26(Mon)22:57:30 No.108545572

Anonymous 04/06/26(Mon)22:57:30 No.108545572

>>108545542
> you really gonna buy an anthropomorphic robot to fold your clothes and make your bed?
yes? also clean

Anonymous
04/06/26(Mon)22:57:42 No.108545573

Anonymous 04/06/26(Mon)22:57:42 No.108545573

>>108545567
It's right cause the front legs are cropped though.

Anonymous
04/06/26(Mon)22:57:53 No.108545576

Anonymous 04/06/26(Mon)22:57:53 No.108545576

>>108544256
how does draft work? isn't it just MoE at home

Anonymous
04/06/26(Mon)22:58:06 No.108545577

Anonymous 04/06/26(Mon)22:58:06 No.108545577

>>108545567
I disagree about their position, but there's 4 legs and 4 paws in view.

Anonymous
04/06/26(Mon)22:58:59 No.108545581

Anonymous 04/06/26(Mon)22:58:59 No.108545581

>>108545512
Gemma's prose is better, but at long context the 1T models keep details together more coherently as you'd expect them to.
Dipsy's in-character <think> is incredible though and I don't see it ever being fully replaced until we get another model close to that level of coherent that can have an internal monologue add to the RP so that thinking tokens aren't just wasted space.

Anonymous
04/06/26(Mon)22:59:33 No.108545588

Anonymous 04/06/26(Mon)22:59:33 No.108545588

>>108545289
I didn't and I love Gemma but also recognize that it cannot somehow in every single task beat GPT, Claude, Gemini, and other likely fuckhuge models.

Anonymous
04/06/26(Mon)23:00:32 No.108545592

Anonymous 04/06/26(Mon)23:00:32 No.108545592

File: 1773804535754245.png (7 KB, 184x86)

7 KB PNG

How do I make sillytavern understand that it's gemma? I'm using OpenAI compatible chat completion

Anonymous
04/06/26(Mon)23:00:59 No.108545596

Anonymous 04/06/26(Mon)23:00:59 No.108545596

>>108545512
the only answer, as always, is to try it yourself
to me, it's certainly "remotely close" which is impressive enough in itself, but it's a step behind in terms of overall quality. I would put it about as good as something like minimax 2.5 and behind the big guys
still a great model, not local sota

Anonymous
04/06/26(Mon)23:02:08 No.108545600

Anonymous 04/06/26(Mon)23:02:08 No.108545600

>>108545576
Draft generates several tokens in a row on a smaller, faster model then passes them through the larger model all at the same time. It then looks at the probabilities from the larger model and truncates the sequence where the tokens become too improbable.

That lets the larger model run at a significant portion of preprocessing speed minus the runtime of the smaller model, depending on how often the smaller model is right.

Anonymous
04/06/26(Mon)23:02:25 No.108545602

Anonymous 04/06/26(Mon)23:02:25 No.108545602

>>108545581
R1 or do they all do that?

Anonymous
04/06/26(Mon)23:04:06 No.108545609

Anonymous 04/06/26(Mon)23:04:06 No.108545609

File: 1767796375605183.gif (3.18 MB, 547x320)

3.18 MB GIF

>>108545542
>the fuck is wrong with you? you really gonna buy an anthropomorphic robot to fold your clothes and make your bed?
Yes, it would be pretty nice.

Anonymous
04/06/26(Mon)23:04:52 No.108545614

Anonymous 04/06/26(Mon)23:04:52 No.108545614

How Best way to limit context usage when you're using something over the API but there's no clear setting in the something for it?

Anonymous
04/06/26(Mon)23:04:57 No.108545615

Anonymous 04/06/26(Mon)23:04:57 No.108545615

>>108545602
I use R1, but I think another anon implied V3 does it too several threads ago

Anonymous
04/06/26(Mon)23:05:06 No.108545617

Anonymous 04/06/26(Mon)23:05:06 No.108545617

>>108545592
Understand how? Or rather, for what?

Anonymous
04/06/26(Mon)23:05:24 No.108545620

Anonymous 04/06/26(Mon)23:05:24 No.108545620

>>108545588
yes but it beats the big chinese moe models that all the people who overspent on hardware love to brag about

Anonymous
04/06/26(Mon)23:06:46 No.108545628

Anonymous 04/06/26(Mon)23:06:46 No.108545628

>>108545617
So I can see the token probability and all the fancy shit. Don't even know how to check that desu

Anonymous
04/06/26(Mon)23:07:13 No.108545629

Anonymous 04/06/26(Mon)23:07:13 No.108545629

>>108545614
>API
Which?

Anonymous
04/06/26(Mon)23:07:59 No.108545632

Anonymous 04/06/26(Mon)23:07:59 No.108545632

File: images(1).jpg (9 KB, 300x168)

9 KB JPG

>>108545609
Liar...

Anonymous
04/06/26(Mon)23:08:30 No.108545635

Anonymous 04/06/26(Mon)23:08:30 No.108545635

>>108545628
they're not using sillytavern, it looks like mikupad but I don't know anything about that

Anonymous
04/06/26(Mon)23:08:46 No.108545636

Anonymous 04/06/26(Mon)23:08:46 No.108545636

>>108545628
not a gemma exclusive thing, check "request token probabilities" in ST user settings

Anonymous
04/06/26(Mon)23:09:00 No.108545638

Anonymous 04/06/26(Mon)23:09:00 No.108545638

>>108545632
>On off on off on off on off
>Guh, phew

Anonymous
04/06/26(Mon)23:09:33 No.108545645

Anonymous 04/06/26(Mon)23:09:33 No.108545645

>>108545600
ye but its still guessing right default is like 0.5/0.6 > so theres not 100% chance its same tokens so its basically MoE?

Anonymous
04/06/26(Mon)23:09:42 No.108545648

Anonymous 04/06/26(Mon)23:09:42 No.108545648

>>108545620
Although I have not used them, I still wouldn't claim that as I am certain they at least have significantly more knowledge than Gemma does.

Anonymous
04/06/26(Mon)23:09:46 No.108545649

Anonymous 04/06/26(Mon)23:09:46 No.108545649

>>108545629
Lm studio's

Anonymous
04/06/26(Mon)23:11:49 No.108545654

Anonymous 04/06/26(Mon)23:11:49 No.108545654

>>108545645
The output is 100% guaranteed to be the same tokens, because if they are different the draft is discarded. The worst case scenario (0% guessed right) just means you get the same result you would have without a draft, but slower because you waste time checking. As the probability of correct guesses rises, the less tokens you are forced to discard and the more speedup potential there is.

Anonymous
04/06/26(Mon)23:11:59 No.108545656

Anonymous 04/06/26(Mon)23:11:59 No.108545656

>>108545645
No, MoE has different weights that get loaded in depending on the context. They really aren't that similar, except in being faster than a dense model alone I suppose. MoEs are faster because they only need to infer across a small selection of the total parameters.

Anonymous
04/06/26(Mon)23:13:02 No.108545660

Anonymous 04/06/26(Mon)23:13:02 No.108545660

>>108545632
I wouldn't mind a chobit either.

Anonymous
04/06/26(Mon)23:13:37 No.108545665

Anonymous 04/06/26(Mon)23:13:37 No.108545665

>>108545649
Uh...
Does this help?
https://lmstudio.ai/docs/typescript/llm-prediction/parameters#set-load-parameters-with-load

Anonymous
04/06/26(Mon)23:15:44 No.108545676

Anonymous 04/06/26(Mon)23:15:44 No.108545676

>>108545424
GLM is still better for me but gemma is unreasonably good for a 30B dense.

Anonymous
04/06/26(Mon)23:16:00 No.108545678

Anonymous 04/06/26(Mon)23:16:00 No.108545678

>>108545649
>>108545665 (cont)
If not, try here
https://lmstudio.ai/docs/app/modelyaml#metadataoverrides

Anonymous
04/06/26(Mon)23:16:47 No.108545681

Anonymous 04/06/26(Mon)23:16:47 No.108545681

>>108545512
>>108545648
>>108545588
You can tell who's actually used them >>108545581 >>108545596
and who's poor and seething.

Anonymous
04/06/26(Mon)23:18:15 No.108545688

Anonymous 04/06/26(Mon)23:18:15 No.108545688

if nothing else, gemma 31b feels like the first small model to beat the llama2-70b models
whenever I tried stuff like the qwens or mistral models around that size, they felt worse than what we had back then but gemma is clearly better than those
i'd almost take it over mistral large

Anonymous
04/06/26(Mon)23:19:29 No.108545695

Anonymous 04/06/26(Mon)23:19:29 No.108545695

Is there any reason to upgrade from noromaid 8x7b yet?

Anonymous
04/06/26(Mon)23:19:43 No.108545696

Anonymous 04/06/26(Mon)23:19:43 No.108545696

>>108545588
>>108545581
>>108545512
take a minute and appreciate that we have if not frontier model SOTA at home, a 31 beak model that exceeds the original GPT4

Anonymous
04/06/26(Mon)23:20:36 No.108545702

Anonymous 04/06/26(Mon)23:20:36 No.108545702

>>108545695
Just try it and decide yourself

Anonymous
04/06/26(Mon)23:21:43 No.108545708

Anonymous 04/06/26(Mon)23:21:43 No.108545708

120b, dense. That is all that it would take.

Anonymous
04/06/26(Mon)23:21:54 No.108545709

Anonymous 04/06/26(Mon)23:21:54 No.108545709

>>108545688
Regardless, the bar has irreversibly been pushed so much higher now and every non-frontier model is going to have to get their shit together if they still want to compete. Even for people who don't like or don't use Gemma, it's still an objective win for local.

Anonymous
04/06/26(Mon)23:22:27 No.108545711

Anonymous 04/06/26(Mon)23:22:27 No.108545711

>>108545708
120b is too dumb. 121b or nothing.

Anonymous
04/06/26(Mon)23:24:40 No.108545721

Anonymous 04/06/26(Mon)23:24:40 No.108545721

>>108545709
yep, I expect panic from chinese models to one up google, which is good either way

Anonymous
04/06/26(Mon)23:24:47 No.108545722

Anonymous 04/06/26(Mon)23:24:47 No.108545722

>>108545708
>>108545711
1b higher than what'd fit on a Blackwell at Q4 is the Jensen sweetspot.

Anonymous
04/06/26(Mon)23:24:53 No.108545723

Anonymous 04/06/26(Mon)23:24:53 No.108545723

File: highestnumber.jpg (17 KB, 480x360)

17 KB JPG

>>108545711
122b is the highest number

Anonymous
04/06/26(Mon)23:25:22 No.108545726

Anonymous 04/06/26(Mon)23:25:22 No.108545726

>>108545654
Something that made draft models not seem so worth it to me is that, if your small fast model is getting a good amount of tokens correct for significant speedups, is the big model worth using for that application? Doesn't that mean your task has obvious results that a <7B model can come to reliably, or the model(s) you're using are so fried like Gemma instruct that it's hitting 99% confidence all the time?

Anonymous
04/06/26(Mon)23:25:48 No.108545728

Anonymous 04/06/26(Mon)23:25:48 No.108545728

do we think chinese will panic that gemma is better at sucking dick than qwen? or will they just stem maxx more

Anonymous
04/06/26(Mon)23:26:20 No.108545731

Anonymous 04/06/26(Mon)23:26:20 No.108545731

>>108545721
If Dispy V4 ends up being distilled Gemini/Gemma with in-character reasoning and vision, that's still a win as far as I'm concerned.

Anonymous
04/06/26(Mon)23:26:33 No.108545732

Anonymous 04/06/26(Mon)23:26:33 No.108545732

>120B dense
That still leaves hardware resources unused. If you have even just 64GB RAM, you can get some more gains with no speed loss by tacking some experts onto the dense model.

Anonymous
04/06/26(Mon)23:26:42 No.108545733

Anonymous 04/06/26(Mon)23:26:42 No.108545733

>>108545665
>>108545678
Thanks I think that might help

Anonymous
04/06/26(Mon)23:27:00 No.108545734

Anonymous 04/06/26(Mon)23:27:00 No.108545734

>>108545728
let's see.. a country who is currently ahead of everyone else in the world in most industries... should they worry about american kids fapping to shitty slopbots? probably not.

Anonymous
04/06/26(Mon)23:27:36 No.108545738

Anonymous 04/06/26(Mon)23:27:36 No.108545738

>>108545728
If the Qwen shills here are anything to go by, nothing will change with Qwen in the shortterm but there's still hope for Dipsy and Kimi.

Anonymous
04/06/26(Mon)23:27:38 No.108545739

Anonymous 04/06/26(Mon)23:27:38 No.108545739

File: Screenshot_2026-04-06_23-26-31.png (572 KB, 1926x1608)

572 KB PNG

I mean seriously, look at her go.
GLM-4.6 failed this test completely, even after hints.

Anonymous
04/06/26(Mon)23:28:01 No.108545743

Anonymous 04/06/26(Mon)23:28:01 No.108545743

>>108545726
Finding the right trajectory is harder than filling out an obvious one. And selecting a good trajectory can come down to single well chosen token.

Anonymous
04/06/26(Mon)23:28:40 No.108545744

Anonymous 04/06/26(Mon)23:28:40 No.108545744

>>108545728
They're spamming the market with open weight models, and regularly compete between each others, of course google model will make them move.
The model is overall good, not just erp.

Anonymous
04/06/26(Mon)23:29:47 No.108545748

Anonymous 04/06/26(Mon)23:29:47 No.108545748

>>108545726
>99% confidence
I haven't seen acceptance rates higher than 70% with gemma4, and that was writing really repetitive unit tests.

Anonymous
04/06/26(Mon)23:30:12 No.108545749

Anonymous 04/06/26(Mon)23:30:12 No.108545749

>>108545744
qwen is SHIT for erp anon
we sext, not text

Anonymous
04/06/26(Mon)23:31:12 No.108545754

Anonymous 04/06/26(Mon)23:31:12 No.108545754

>>108545749
Good thing I was talking about Gemma 4.

Anonymous
04/06/26(Mon)23:31:16 No.108545755

Anonymous 04/06/26(Mon)23:31:16 No.108545755

File: can-you-fuck-it.gif (2.45 MB, 400x300)

2.45 MB GIF

>>108545530

Anonymous
04/06/26(Mon)23:32:46 No.108545761

Anonymous 04/06/26(Mon)23:32:46 No.108545761

>>108545754
im gonna go sleep for an hour then..

Anonymous
04/06/26(Mon)23:32:52 No.108545762

Anonymous 04/06/26(Mon)23:32:52 No.108545762

>>108545726
>7B
The only real usecase I found was phonesloppa micromodels to make your big model think less on grammar between the actual decision points in how a sentence is structured.
For a dense model it's shit because that's vram space that should be giving you a larger context, but for a 1T giant it's okay at pushing your t/s a bit higher for the cost of half a GB of VRAM. Any Qwenlet works for any large chink model because they're all Claude/GPT distills at the end of the day.

Anonymous
04/06/26(Mon)23:32:58 No.108545764

Anonymous 04/06/26(Mon)23:32:58 No.108545764

>>108545636
I guess llama-server doesn't support it? I see no difference with it on

Anonymous
04/06/26(Mon)23:33:59 No.108545769

Anonymous 04/06/26(Mon)23:33:59 No.108545769

>>108545764
NTA, but it does.

Anonymous
04/06/26(Mon)23:34:52 No.108545773

Anonymous 04/06/26(Mon)23:34:52 No.108545773

>>108545761
Go sleep for two. Treat yourself.

Anonymous
04/06/26(Mon)23:35:40 No.108545778

Anonymous 04/06/26(Mon)23:35:40 No.108545778

>>108545773
i have to go to school anon.. maybe later

Anonymous
04/06/26(Mon)23:36:27 No.108545779

Anonymous 04/06/26(Mon)23:36:27 No.108545779

File: r--Blog-Header.png (2.29 MB, 2000x1125)

2.29 MB PNG

>>108545708

Anonymous
04/06/26(Mon)23:37:40 No.108545783

Anonymous 04/06/26(Mon)23:37:40 No.108545783

>kv cache rotation still not merged 14 hours later
please... do your pr reviews...

Anonymous
04/06/26(Mon)23:38:43 No.108545787

Anonymous 04/06/26(Mon)23:38:43 No.108545787

finally got around to trying all the jailbreaks and they don't work. you have to disable thinking.

Anonymous
04/06/26(Mon)23:38:48 No.108545789

Anonymous 04/06/26(Mon)23:38:48 No.108545789

So now that Gemma is the new hotness, and Bonsai is supported in main llama.cpp, it'd be interesting to see if they make a Gemma bonsai. It's possible that it won't be as great of a compression ratio as it is likely that Gemma's parameters are even more information saturated.

Anonymous
04/06/26(Mon)23:39:37 No.108545793

Anonymous 04/06/26(Mon)23:39:37 No.108545793

>>108545787
just use heretic if you want to be lazy

Anonymous
04/06/26(Mon)23:40:18 No.108545797

Anonymous 04/06/26(Mon)23:40:18 No.108545797

File: e42ce3cd-b63b-4408-896c-2(...).webm (909 KB, 500x500)

909 KB WEBM

>>108545778
>i have to go to school anon.. maybe later

Anonymous
04/06/26(Mon)23:40:29 No.108545799

Anonymous 04/06/26(Mon)23:40:29 No.108545799

>>108545787
The best jailbreak is no jailbreak.

Anonymous
04/06/26(Mon)23:41:12 No.108545801

Anonymous 04/06/26(Mon)23:41:12 No.108545801

>>108545787
you can prefill thinking too, use

<|channel>thought
blablabla

Anonymous
04/06/26(Mon)23:41:18 No.108545803

Anonymous 04/06/26(Mon)23:41:18 No.108545803

>>108545739
very cool, google has always been really really good at multilingual. I know lots of foreign language users leaned on gemma 3 well past its expiration date because it was still the best in their languages

Anonymous
04/06/26(Mon)23:42:54 No.108545809

Anonymous 04/06/26(Mon)23:42:54 No.108545809

>>108545726
I mean think about the most common use case: coding. A LOT of code editing is going to be copy/pasting existing stuff somewhere, but also making key decisions about how and when and where to do so. The draft model may easily identify the copy/pasted tokens while in the middle of a block, but fail spectacularly on the few semantically important tokens that determine the strategy it's using. In cases like that you get a lot of speedup but still needed the smarts of the big guy.

For just general language tasks a similar principle applies. Finishing a phrase or word that's already half written, closing punctuation, etc. are all very simple tasks that small models won't often struggle with. Language is pretty well-structured and most of it is low entropy, and you have the big model to ensure those high entropy tokens get predicted correctly.

Anonymous
04/06/26(Mon)23:42:57 No.108545810

Anonymous 04/06/26(Mon)23:42:57 No.108545810

File: token probs tab.png (122 KB, 380x832)

122 KB PNG

>>108545764
are you checking the tab

Anonymous
04/06/26(Mon)23:43:32 No.108545811

Anonymous 04/06/26(Mon)23:43:32 No.108545811

>>108545803
It's sort of a necessity when your workforce is 95% jeeted.

Anonymous
04/06/26(Mon)23:44:58 No.108545820

Anonymous 04/06/26(Mon)23:44:58 No.108545820

>>108545779
I have our country so much it's unreal

Anonymous
04/06/26(Mon)23:45:09 No.108545821

Anonymous 04/06/26(Mon)23:45:09 No.108545821

>>108545811
Yes all those indians speaking Swahili, Vietnamese, and German

Anonymous
04/06/26(Mon)23:45:21 No.108545822

Anonymous 04/06/26(Mon)23:45:21 No.108545822

File: llama_probs.png (4 KB, 531x498)

4 KB PNG

>>108545764

Anonymous
04/06/26(Mon)23:45:36 No.108545824

Anonymous 04/06/26(Mon)23:45:36 No.108545824

>>108545820
Not if I take it first

Anonymous
04/06/26(Mon)23:46:46 No.108545829

Anonymous 04/06/26(Mon)23:46:46 No.108545829

>>108545789
People need to stop trying to wasting time trying to make q1 quantization look good on benchmarks. It needs to be natively trained in ternary. People would think MoE was a dead-end too if the only kind put out constantly was frankenmoes.

Anonymous
04/06/26(Mon)23:47:18 No.108545831

Anonymous 04/06/26(Mon)23:47:18 No.108545831

>>108545822
>Q4_0
why

Anonymous
04/06/26(Mon)23:47:19 No.108545832

Anonymous 04/06/26(Mon)23:47:19 No.108545832

File: channel.png (22 KB, 211x90)

22 KB PNG

>>108545810
Oh yeah. Odd that if I use a <|think|> system prompt it formats the channel wrong but if I enable request reasoning from model it formats it right

Anonymous
04/06/26(Mon)23:47:42 No.108545833

Anonymous 04/06/26(Mon)23:47:42 No.108545833

>>108545821
If you have to make your model really good at tardwrangling in Hindi, might as well go all the way.

Anonymous
04/06/26(Mon)23:48:23 No.108545834

Anonymous 04/06/26(Mon)23:48:23 No.108545834

>>108545833
Not at all how models work

Anonymous
04/06/26(Mon)23:49:33 No.108545839

Anonymous 04/06/26(Mon)23:49:33 No.108545839

Gemma may caption loli/shota and describe anime stuff with a simple system prompt to help but it absolutely refuses photorealistic stuff, it may do if you edit the messages ofc but by itself I don't think so

Anonymous
04/06/26(Mon)23:49:35 No.108545840

Anonymous 04/06/26(Mon)23:49:35 No.108545840

what temp and other settings are you using for the 31b in sillytavern with rp?

Anonymous
04/06/26(Mon)23:50:04 No.108545843

Anonymous 04/06/26(Mon)23:50:04 No.108545843

>>108545831
It ended up being a little smaller than the q4km and I'm running on ancient stuff. I'll remake the quant eventually when I stop seeing PRs fixing stuff.

Anonymous
04/06/26(Mon)23:55:25 No.108545859

Anonymous 04/06/26(Mon)23:55:25 No.108545859

>>108545839
see >>108531320

>>108545840
Gemma doesn't need any special settings, you can follow the official samplers on the model page. Personally I just use temp=1, minP=0.02
If you want more variety then you can change your logit softcap.

Anonymous
04/06/26(Mon)23:57:46 No.108545868

Anonymous 04/06/26(Mon)23:57:46 No.108545868

>>108545783
why do you need to rotate your cache?

Anonymous
04/06/26(Mon)23:58:36 No.108545873

Anonymous 04/06/26(Mon)23:58:36 No.108545873

>>108545868
IRS

Anonymous
04/06/26(Mon)23:58:58 No.108545876

Anonymous 04/06/26(Mon)23:58:58 No.108545876

>>108545868
to double context, retard.

Anonymous
04/06/26(Mon)23:59:05 No.108545877

Anonymous 04/06/26(Mon)23:59:05 No.108545877

>>108545868
to destroy more stock value of the greedy memory companies

Anonymous
04/06/26(Mon)23:59:09 No.108545878

Anonymous 04/06/26(Mon)23:59:09 No.108545878

>>108545868
Sometimes you just gotta get it twisted

Anonymous
04/06/26(Mon)23:59:21 No.108545880

Anonymous 04/06/26(Mon)23:59:21 No.108545880

>>108545868
Makes it more aerodynamic.

Anonymous
04/06/26(Mon)23:59:38 No.108545883

Anonymous 04/06/26(Mon)23:59:38 No.108545883

>>108545868
it's slanted and i prefer it level

Anonymous
04/06/26(Mon)23:59:59 No.108545887

Anonymous 04/06/26(Mon)23:59:59 No.108545887

/g/ - Gokes

Anonymous
04/07/26(Tue)00:00:06 No.108545889

Anonymous 04/07/26(Tue)00:00:06 No.108545889

>>108545868
My GPU sits vertically.

Anonymous
04/07/26(Tue)00:00:10 No.108545890

Anonymous 04/07/26(Tue)00:00:10 No.108545890

>>108545876
quantize retard

>>108545873
?

Anonymous
04/07/26(Tue)00:00:37 No.108545891

Anonymous 04/07/26(Tue)00:00:37 No.108545891

>>108545868
if you don't rotate it every so often then the cache wear pattern will be uneven

Anonymous
04/07/26(Tue)00:00:39 No.108545892

Anonymous 04/07/26(Tue)00:00:39 No.108545892

>>108545868
I just need to do it, m'kay?

Anonymous
04/07/26(Tue)00:01:08 No.108545894

Anonymous 04/07/26(Tue)00:01:08 No.108545894

>>108545868
Half time we're now on the CT-side

Anonymous
04/07/26(Tue)00:02:36 No.108545900

Anonymous 04/07/26(Tue)00:02:36 No.108545900

>>108545783
It hasn't even been one week, let alone two. niggernova is only human.

Anonymous
04/07/26(Tue)00:02:55 No.108545902

Anonymous 04/07/26(Tue)00:02:55 No.108545902

>>108545868
same reason they put rifling in gun barrels

Anonymous
04/07/26(Tue)00:04:01 No.108545912

Anonymous 04/07/26(Tue)00:04:01 No.108545912

>>108545900
niggernova made the pr, he's waiting on the sycophants to review

Anonymous
04/07/26(Tue)00:04:30 No.108545916

Anonymous 04/07/26(Tue)00:04:30 No.108545916

>>108545868
similar concept to cement mixers

Anonymous
04/07/26(Tue)00:04:41 No.108545917

Anonymous 04/07/26(Tue)00:04:41 No.108545917

>>108545868
the same reason germany used drafty wooden doors for their gas chambers

Anonymous
04/07/26(Tue)00:04:45 No.108545918

Anonymous 04/07/26(Tue)00:04:45 No.108545918

>>108545902
really good analogy.

Anonymous
04/07/26(Tue)00:05:19 No.108545922

Anonymous 04/07/26(Tue)00:05:19 No.108545922

>>108543440
I got a strix halo and it's just a bit too slow to be worth it. Mac is probably the most cost effective if you only care about llm.

Anonymous
04/07/26(Tue)00:05:25 No.108545923

Anonymous 04/07/26(Tue)00:05:25 No.108545923

File: Tetosday.png (869 KB, 1024x1024)

869 KB PNG

>>108545906
>>108545906
>>108545906

Anonymous
04/07/26(Tue)00:05:36 No.108545924

Anonymous 04/07/26(Tue)00:05:36 No.108545924

>>108545902
>>108545894
>>108545892
>>108545891
>>108545889
>>108545883
>>108545880
>>108545878
>>108545877
stop it, i've asked seriously

Anonymous
04/07/26(Tue)00:06:01 No.108545928

Anonymous 04/07/26(Tue)00:06:01 No.108545928

>>108543476
I was sooo close to getting a max q, it dropped to 7250 at this one retailer I watch but I pussed out and now I want to kill myself.

Anonymous
04/07/26(Tue)00:06:21 No.108545930

Anonymous 04/07/26(Tue)00:06:21 No.108545930

>>108545912
>create software
>all of your 'peers' are vibecoders who just break shit and push half-working features that you later have to fix
>have to wait for other people to approve your work for your software
open source was a mistake

Anonymous
04/07/26(Tue)00:07:14 No.108545937

Anonymous 04/07/26(Tue)00:07:14 No.108545937

>>108545924
I gave a serious answer.

Anonymous
04/07/26(Tue)00:08:00 No.108545943

Anonymous 04/07/26(Tue)00:08:00 No.108545943

>>108545930
he cashed his check from hf already, he checked out and don't gaf anymore

Anonymous
04/07/26(Tue)00:08:41 No.108545949

Anonymous 04/07/26(Tue)00:08:41 No.108545949

>>108545930
>all of your 'peers' just break shit and push half-working features that you later have to fix
this has always been the case, vibecoding just greatly increases the number of peers you have the misfortune of interacting with

Anonymous
04/07/26(Tue)00:08:53 No.108545952

Anonymous 04/07/26(Tue)00:08:53 No.108545952

>>108545924
my answer was serious

Anonymous
04/07/26(Tue)00:10:12 No.108545961

Anonymous 04/07/26(Tue)00:10:12 No.108545961

>>108545924
That other anon gave a serious answer.

Anonymous
04/07/26(Tue)00:11:38 No.108545970

Anonymous 04/07/26(Tue)00:11:38 No.108545970

>>108545924
those are all seriously answers

Anonymous
04/07/26(Tue)00:11:40 No.108545971

Anonymous 04/07/26(Tue)00:11:40 No.108545971

>>108545868
You have to stir the stew.

Anonymous
04/07/26(Tue)00:13:13 No.108545979

Anonymous 04/07/26(Tue)00:13:13 No.108545979

>>108545916
gemma is more of a semen mixer amirite

Anonymous
04/07/26(Tue)00:17:40 No.108545996

Anonymous 04/07/26(Tue)00:17:40 No.108545996

Poor anon. It was fun, though.

Anonymous
04/07/26(Tue)01:03:26 No.108546213

Anonymous 04/07/26(Tue)01:03:26 No.108546213

>>108543070
You can improve gemma's vision by using Q8 mmproj with a 300 token minimum. It sometimes uses only 70 by default. Set the max to 512.

Anonymous
04/07/26(Tue)02:25:26 No.108546496

Anonymous 04/07/26(Tue)02:25:26 No.108546496

>>108545200
It's just vaporware anyways, so who cares?

Anonymous
04/07/26(Tue)03:45:02 No.108546801

Anonymous 04/07/26(Tue)03:45:02 No.108546801

>>108544649
its wrong though it doesn't refer to their pussies and the term isnt just female characters/lolis because it came from /tv/

Anonymous
04/07/26(Tue)03:51:25 No.108546823

Anonymous 04/07/26(Tue)03:51:25 No.108546823

File: 1773873674462429.jpg (133 KB, 1024x1024)

133 KB JPG

>>108545923
Indeed.

Anonymous
04/07/26(Tue)03:56:11 No.108546836

Anonymous 04/07/26(Tue)03:56:11 No.108546836

>>108545420
neat but stuff like this is so cringe all the words larping like its some groundbreaking research when they could just write
>I put low opacity text on an image and an llm ocr'd it

[Return] [Catalog] [Top]

Post a Reply

Return Catalog Top Refresh

[Advertise on 4chan]

Delete Post: [File Only] Style:

[Disable Mobile View / Use Desktop Site]

[Enable Mobile View / Use Mobile Site]

All trademarks and copyrights on this page are owned by their respective parties. Images uploaded are the responsibility of the Poster. Comments are owned by the Poster.