/g/ - /lmg/ - Local Models General - Technology

[a / b / c / d / e / f / g / gif / h / hr / k / m / o / p / r / s / t / u / v / vg / vm / vmg / vr / vrpg / vst / w / wg] [i / ic] [r9k / s4s / vip] [cm / hm / lgbt / y] [3 / aco / adv / an / bant / biz / cgl / ck / co / diy / fa / fit / gd / hc / his / int / jp / lit / mlp / mu / n / news / out / po / pol / pw / qst / sci / soc / sp / tg / toy / trv / tv / vp / vt / wsg / wsr / x / xs] [Settings] [Search] [Mobile] [Home]

Board

▼ Settings Mobile Home

/g/ - Technology

Return Catalog Bottom Refresh

Thread archived.
You cannot reply anymore.

[Advertise on 4chan]

[Return] [Catalog] [Bottom]

Anonymous

/lmg/ - Local Models General 04/06/26(Mon)04:38:40 No.108538947

File: 1768119927772303.jpg (173 KB, 768x1024)

173 KB JPG

/lmg/ - Local Models General Anonymous 04/06/26(Mon)04:38:40 No.108538947 Archived

/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>108535684 & >>108532524

►News
>(04/05) HunyuanOCR support merged: https://github.com/ggml-org/llama.cpp/pull/21395
>(04/02) Gemma 4 released: https://blog.google/innovation-and-ai/technology/developers-tools/gemma-4
>(04/01) Trinity-Large-Thinking released: https://hf.co/arcee-ai/Trinity-Large-Thinking
>(04/01) Merged llama : rotate activations for better quantization #21038: https://github.com/ggml-org/llama.cpp/pull/21038
>(04/01) Holo3 VLMs optimized for GUI Agents released: https://hcompany.ai/holo3

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling
Token Speed Visualizer: https://shir-man.com/tokens-per-second

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm

Anonymous
04/06/26(Mon)04:39:08 No.108538951

Anonymous 04/06/26(Mon)04:39:08 No.108538951

File: mikuthreadrecap.jpg (1.15 MB, 1804x2160)

1.15 MB JPG

►Recent Highlights from the Previous Thread: >>108535684

--Comparing quant options and VRAM optimization for Gemma 31B:
>108536037 >108536050 >108536053 >108536073 >108536120 >108536361 >108536390 >108536424 >108536059 >108536079 >108536133
--Comparing Gemma 4 and Qwen 3.5 MoE performance and capabilities:
>108537392 >108537519 >108537535 >108537597 >108537405 >108537572 >108537578
--Using logit softcap overrides to reduce Gemma 4's determinism:
>108535726 >108535737 >108535771 >108535791 >108535817 >108535843 >108536008 >108536066 >108537466 >108538178 >108538338 >108537546 >108537593 >108537612
--Gemma 4 draft model benchmarks and discussion on quantization levels:
>108536606 >108536822 >108538242
--Discussing llama.cpp memory management issues and the ik_llama fork:
>108535819 >108535863 >108535870 >108535885 >108535898 >108535907 >108535931 >108535950 >108536005 >108536814 >108537853
--Discussing ways to remove generic AI patterns via post-processing:
>108536315 >108536337 >108536362 >108536372 >108536443 >108536484 >108536498 >108536595 >108536686 >108536697 >108536456 >108538094
--Discussing Gemma 2 26b's robust filters and jailbreak attempts:
>108538390 >108538400 >108538410 >108538423 >108538433 >108538458 >108538463
--Gemma 4 31B performance on FoodTruck Bench simulation:
>108535818 >108535835 >108535876 >108537945
--Discussing performance improvements and capabilities of Gemma 4:
>108536335 >108536385 >108536393 >108536561 >108536622 >108536647 >108536666 >108536720 >108536763 >108536915 >108537048 >108537103 >108537436
--Evidence of Gemma base being trained on roleplay logs:
>108537545 >108537556
--llama.cpp adds native support for Hunyuan OCR:
>108538402
--Discussing bypassing Gemma's safety filters for prohibited content:
>108537984 >108538021 >108538045 >108538137
--Miku (free space):
>108535751 >108536605 >108538202

►Recent Highlight Posts from the Previous Thread: >>108535686

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script

Anonymous
04/06/26(Mon)04:39:51 No.108538955

Anonymous 04/06/26(Mon)04:39:51 No.108538955

ahem, migemma

Anonymous
04/06/26(Mon)04:41:27 No.108538964

Anonymous 04/06/26(Mon)04:41:27 No.108538964

kv rotation on gemma status?

Anonymous
04/06/26(Mon)04:42:50 No.108538968

Anonymous 04/06/26(Mon)04:42:50 No.108538968

>>108538945
>they dont everyone uses ollama
but ollama uses llamacpp as the backend right?

Anonymous
04/06/26(Mon)04:44:19 No.108538974

Anonymous 04/06/26(Mon)04:44:19 No.108538974

>>108538968
oh anon you dummy

Anonymous
04/06/26(Mon)04:53:29 No.108539001

Anonymous 04/06/26(Mon)04:53:29 No.108539001

>>108538761
>better CPU and hybrid GPU/CPU performance,
Well that seems to be a lie. It runs slower than llama.cpp for me.
What's the use case for this?

Anonymous
04/06/26(Mon)04:54:19 No.108539011

Anonymous 04/06/26(Mon)04:54:19 No.108539011

>>108538964
Gemma 4 doesn't benefit from KV Rotation

Anonymous
04/06/26(Mon)04:55:21 No.108539012

Anonymous 04/06/26(Mon)04:55:21 No.108539012

Where is the Gemma-chan design?
We need porn of her

Anonymous
04/06/26(Mon)04:57:05 No.108539017

Anonymous 04/06/26(Mon)04:57:05 No.108539017

>>108539001
Drama.
https://github.com/ikawrakow/ik_llama.cpp/discussions/1247

Anonymous
04/06/26(Mon)04:57:17 No.108539019

Anonymous 04/06/26(Mon)04:57:17 No.108539019

>benchmarks and reddit say qwen is better
>lmg thinks gemma is better
>the first two are worthless
>the second only cares about pedo erp
The answer is to use both based on usecase.

Anonymous
04/06/26(Mon)04:57:34 No.108539021

Anonymous 04/06/26(Mon)04:57:34 No.108539021

>>108539011
is it because it's mathematically impossible or is it because the vibeshitters at llama.cpp don't know how to make it happen?

Anonymous
04/06/26(Mon)04:59:20 No.108539024

Anonymous 04/06/26(Mon)04:59:20 No.108539024

What sort of hardware would be required to run Gemma 4 at full precision, as in how they would run it at Google for example?

a-asking for a friend

Anonymous
04/06/26(Mon)04:59:45 No.108539025

Anonymous 04/06/26(Mon)04:59:45 No.108539025

>>108539019
Pedo ERP is a mission critical usecase whereas vibecoding isn't

Anonymous
04/06/26(Mon)05:03:33 No.108539032

Anonymous 04/06/26(Mon)05:03:33 No.108539032

File: 1757029618631460.png (248 KB, 481x507)

248 KB PNG

Do you have PATIENCE to get the job, /lmg/?

Anonymous
04/06/26(Mon)05:04:08 No.108539035

Anonymous 04/06/26(Mon)05:04:08 No.108539035

File: IT REALLY IS THAT SIMPLE.png (67 KB, 1601x334)

67 KB PNG

>>108538947
get vibeshitted

Anonymous
04/06/26(Mon)05:05:01 No.108539037

Anonymous 04/06/26(Mon)05:05:01 No.108539037

>>108539032
linkedin shitnematic universe, lel

Anonymous
04/06/26(Mon)05:05:51 No.108539040

Anonymous 04/06/26(Mon)05:05:51 No.108539040

>>108539024
Which one? You don't typically run these models at full precision (fp32) anyway.

Anonymous
04/06/26(Mon)05:05:54 No.108539041

Anonymous 04/06/26(Mon)05:05:54 No.108539041

>>108539032
the correct hire is the first one to leave

Anonymous
04/06/26(Mon)05:06:13 No.108539042

Anonymous 04/06/26(Mon)05:06:13 No.108539042

>>108539032
This is fake and gay but people will actually put you through humiliation rituals like that just to see who they can exploit the best.

Anonymous
04/06/26(Mon)05:07:19 No.108539044

Anonymous 04/06/26(Mon)05:07:19 No.108539044

>>108539032
I believe that story, they don't want talented people, they only want docile people that can play the role of a yes man all day long

Anonymous
04/06/26(Mon)05:08:00 No.108539046

Anonymous 04/06/26(Mon)05:08:00 No.108539046

>>108539024
its a 52gb model so you need at least like 64gb vram theres plenty of solutions to get that but googlel is probably using h100

Anonymous
04/06/26(Mon)05:08:29 No.108539047

Anonymous 04/06/26(Mon)05:08:29 No.108539047

File: ce.png (113 KB, 1046x522)

113 KB PNG

which one?

Anonymous
04/06/26(Mon)05:08:35 No.108539048

Anonymous 04/06/26(Mon)05:08:35 No.108539048

>https://github.com/ggml-org/llama.cpp/pull/21500
>download the goofs again just to change a flag
when will they add a tool to patch your goof metadata this is beyond retarded

Anonymous
04/06/26(Mon)05:09:24 No.108539052

Anonymous 04/06/26(Mon)05:09:24 No.108539052

>>108539047
31b https://huggingface.co/amarck/gemma-4-31b-it-abliterated-GGUF

Anonymous
04/06/26(Mon)05:10:19 No.108539055

Anonymous 04/06/26(Mon)05:10:19 No.108539055

>>108539052
If I could run this one I wouldn't cope with a moe

Anonymous
04/06/26(Mon)05:10:38 No.108539057

Anonymous 04/06/26(Mon)05:10:38 No.108539057

>31b
>my vram is 12G
it's so over for turbovramlets..

Anonymous
04/06/26(Mon)05:11:17 No.108539060

Anonymous 04/06/26(Mon)05:11:17 No.108539060

>>108539048
https://huggingface.co/ggml-org/gemma-4-E2B-it-GGUF/discussions/1#69d11dd3f5041eaf89541197
>i guess llama.cpp users won't really notice because jinja is default now, and is injected in the jinja template.
this, it's a nothingburger

Anonymous
04/06/26(Mon)05:11:22 No.108539061

Anonymous 04/06/26(Mon)05:11:22 No.108539061

>>108539057
you might be able to run okay a q4 is like 15gb youd only need some cpu offloading

Anonymous
04/06/26(Mon)05:11:27 No.108539062

Anonymous 04/06/26(Mon)05:11:27 No.108539062

>>108539048
If you're not retarded your goof 00001 only contains the metadata.

Anonymous
04/06/26(Mon)05:12:23 No.108539066

Anonymous 04/06/26(Mon)05:12:23 No.108539066

>>108539061
i know, but the speed becomes unusable at that point unlike memeoe is lot more tolerable

Anonymous
04/06/26(Mon)05:12:26 No.108539067

Anonymous 04/06/26(Mon)05:12:26 No.108539067

>>108539061
>offloading a dense
lel

Anonymous
04/06/26(Mon)05:13:10 No.108539071

Anonymous 04/06/26(Mon)05:13:10 No.108539071

>>108539067
dumbass

Anonymous
04/06/26(Mon)05:13:26 No.108539072

Anonymous 04/06/26(Mon)05:13:26 No.108539072

>>108539067
i can get 12t/s with offoad

Anonymous
04/06/26(Mon)05:14:32 No.108539076

Anonymous 04/06/26(Mon)05:14:32 No.108539076

How to set main GPU on kobold?
I have two gpus and it keeps setting the lesser one as main....

Anonymous
04/06/26(Mon)05:14:33 No.108539077

Anonymous 04/06/26(Mon)05:14:33 No.108539077

>>108539072
which is borderline cope speed for tool use
fine for rps i guess tho

Anonymous
04/06/26(Mon)05:17:12 No.108539086

Anonymous 04/06/26(Mon)05:17:12 No.108539086

>>108539076
Holy shit nigger, it has a GUI. Use your eyes.

Anonymous
04/06/26(Mon)05:18:58 No.108539089

Anonymous 04/06/26(Mon)05:18:58 No.108539089

>>108539076
that's why I'm using llamacpp server, maybe it's a basic cli but it's simple enough, you output 3 lines and that's it, no need to look on endless buttons

Anonymous
04/06/26(Mon)05:23:39 No.108539105

Anonymous 04/06/26(Mon)05:23:39 No.108539105

File: 984703650924353.png (97 KB, 415x707)

97 KB PNG

>>108539086
Setting it to use all GPUs still sets a main GPU, fucking retard.
I have a GPU imbalance, on is 16gb the other is 24gb, can your dumbass guess which one I need to be the main and which one doesnt?

Anonymous
04/06/26(Mon)05:23:42 No.108539106

Anonymous 04/06/26(Mon)05:23:42 No.108539106

>>108539089
>to look on
What's up with all these ESLs lately?

Anonymous
04/06/26(Mon)05:27:59 No.108539124

Anonymous 04/06/26(Mon)05:27:59 No.108539124

>>108539017
Skimming through, I am not sure what I am looking at here.
Anyway I am confused since it advertises itself as being faster with CPU+GPU mixed offload inference, and I see results in the internet ostensibly confirming that, but when I run Qwen 3.5 35B on both, it runs 50% faster on the normal llama.cpp

Anonymous
04/06/26(Mon)05:28:58 No.108539130

Anonymous 04/06/26(Mon)05:28:58 No.108539130

File: imbecile spotted.png (275 KB, 640x428)

275 KB PNG

>>108539106

Anonymous
04/06/26(Mon)05:30:15 No.108539133

Anonymous 04/06/26(Mon)05:30:15 No.108539133

File: 1775257603862045.jpg (129 KB, 990x936)

129 KB JPG

>>108539130

Anonymous
04/06/26(Mon)05:32:24 No.108539142

Anonymous 04/06/26(Mon)05:32:24 No.108539142

>>108539133
>frog posting
opinion discarded

Anonymous
04/06/26(Mon)05:32:40 No.108539145

Anonymous 04/06/26(Mon)05:32:40 No.108539145

>>108539025
this

Anonymous
04/06/26(Mon)05:33:06 No.108539147

Anonymous 04/06/26(Mon)05:33:06 No.108539147

>>108539130
Well don't look at my finger then.

Anonymous
04/06/26(Mon)05:35:35 No.108539157

Anonymous 04/06/26(Mon)05:35:35 No.108539157

>>108539133
But is it really better to have the same skin color as sperm?

Anonymous
04/06/26(Mon)05:36:47 No.108539161

Anonymous 04/06/26(Mon)05:36:47 No.108539161

>>108539157
sperm is life
poop is waste

Anonymous
04/06/26(Mon)05:36:49 No.108539162

Anonymous 04/06/26(Mon)05:36:49 No.108539162

>>108539157
Only one of those is fine to eat.

Anonymous
04/06/26(Mon)05:37:27 No.108539163

Anonymous 04/06/26(Mon)05:37:27 No.108539163

>>108539162
>he says he swallows sperm
faggot lol

Anonymous
04/06/26(Mon)05:38:39 No.108539164

Anonymous 04/06/26(Mon)05:38:39 No.108539164

>>108539163
>he doesn't recycle his own loads for bigger gains
ngmi

Anonymous
04/06/26(Mon)05:40:47 No.108539168

Anonymous 04/06/26(Mon)05:40:47 No.108539168

>>108539025
Vibecoding with a model that fits on a single gaming GPU is a lost cause anyway; they just can't have enough verbatim knowledge of all possible libraries and usage patterns that people might use use/need, without degrading other areas. Let local LLMs do natural language tasks.

Anonymous
04/06/26(Mon)05:43:47 No.108539182

Anonymous 04/06/26(Mon)05:43:47 No.108539182

>>108539168
the other problem is that your gpu won't fit enough context even if the llm was, somehow, smart and knowledgeable enough (it's not the case, but let's posit a theoretical model of the future was)
a large part of what made gemini useful for me in some coding tasks is that I can give it a huge amount of related source code (caller, library code called etc) in the same prompt. Real code is rarely isolated enough to properly work with LLMs as mere tiny excerpts, LLMs work best when fed a lot of context.

Anonymous
04/06/26(Mon)05:45:55 No.108539191

Anonymous 04/06/26(Mon)05:45:55 No.108539191

Guys I worry about deepseek because they don't have large amounts of traces of people using deepseek in coding harnesses like GLM and Qwen do.

Anonymous
04/06/26(Mon)05:47:32 No.108539199

Anonymous 04/06/26(Mon)05:47:32 No.108539199

>>108538947
why the /aicg/ here in /g/ is so retarded?

Anonymous
04/06/26(Mon)05:49:04 No.108539201

Anonymous 04/06/26(Mon)05:49:04 No.108539201

>>108539199
API paypigging will do that

Anonymous
04/06/26(Mon)05:49:04 No.108539202

Anonymous 04/06/26(Mon)05:49:04 No.108539202

>>108539199
you have to be non-retarded to be able to run local models, so it filters out the subhumans on /lmg/, /aicg/ doesn't have such filters

Anonymous
04/06/26(Mon)05:49:20 No.108539203

Anonymous 04/06/26(Mon)05:49:20 No.108539203

> Google paper comes out claiming x8 vram reductions or something
Did that get implemented and is usable yet, or was it a whole bunch of nothing?

Anonymous
04/06/26(Mon)05:50:08 No.108539206

Anonymous 04/06/26(Mon)05:50:08 No.108539206

>>108539168
I agree that cloud models are the only option for real coding work. But vibe-coders are subhuman and will deliver slop regardless of model size, so they might as well use a small model.

Anonymous
04/06/26(Mon)05:50:14 No.108539207

Anonymous 04/06/26(Mon)05:50:14 No.108539207

>>108539057
26B IQ4_N_L is just 14gb anon... offload the extra 2gb on your ram

Anonymous
04/06/26(Mon)05:51:55 No.108539212

Anonymous 04/06/26(Mon)05:51:55 No.108539212

File: 1769020848215663.png (109 KB, 1257x965)

109 KB PNG

>>108539191
People do use DS V3.2 for coding, at least on OR (top models are free; DS V3.2 isn't)

Anonymous
04/06/26(Mon)05:52:00 No.108539213

Anonymous 04/06/26(Mon)05:52:00 No.108539213

What's the best model can I realistically run on 16GB VRAM + 64GB system RAM if I don't care about token speed at all and just want *a* response at some point in time?

Anonymous
04/06/26(Mon)05:52:29 No.108539215

Anonymous 04/06/26(Mon)05:52:29 No.108539215

>>108539207
26b out of nowhere

Anonymous
04/06/26(Mon)05:54:32 No.108539222

Anonymous 04/06/26(Mon)05:54:32 No.108539222

>>108539213
Gemma 4 31B
Or the Gemma 4 26B moe for actually good speeds and only a little worse quality

Anonymous
04/06/26(Mon)05:54:33 No.108539223

Anonymous 04/06/26(Mon)05:54:33 No.108539223

>>108539207
of course i can
i can even run 262k context with reasonable speed with 26b thanks to moe (20t/s)
31b?
even at IQ2_XSS it shits itself

Anonymous
04/06/26(Mon)05:55:10 No.108539224

Anonymous 04/06/26(Mon)05:55:10 No.108539224

>>108539047
gemma-4-moe-tiny-random

Anonymous
04/06/26(Mon)05:55:42 No.108539225

Anonymous 04/06/26(Mon)05:55:42 No.108539225

>>108539032
Are they looking for people who are only qualified to wait? Was this in a restaurant?

Anonymous
04/06/26(Mon)05:56:36 No.108539228

Anonymous 04/06/26(Mon)05:56:36 No.108539228

>>108539215
26B is capable. I coomed 5 times on it already and it's still addicting. All the rp I just had with it are going above 10k context. Besides a 40 token system prompt lets you do any of those CUNNY rp too.
Gemma 4 is love.

Anonymous
04/06/26(Mon)05:56:37 No.108539229

Anonymous 04/06/26(Mon)05:56:37 No.108539229

>>108539047
I'm using heretic, seems ogey

Anonymous
04/06/26(Mon)05:58:46 No.108539237

Anonymous 04/06/26(Mon)05:58:46 No.108539237

>>108539229
Does it even add anything? Gemma4 hasn't denied me anything, no matter how heinous.

Anonymous
04/06/26(Mon)05:58:55 No.108539238

Anonymous 04/06/26(Mon)05:58:55 No.108539238

>>108539228
yeah yeah i know, that anon was talking about a 31b though.
>>108539229
is there any point in using heretic? basic one worked alright, but i felt how it was shy to write real bad naughty words and tried to steer away from those

Anonymous
04/06/26(Mon)05:59:47 No.108539239

Anonymous 04/06/26(Mon)05:59:47 No.108539239

File: 1765569093711105.jpg (181 KB, 1216x880)

181 KB JPG

What is actually "model support"? I want to vibecode an update to an image captioner that is full of outdated shit like phi and florence. Is it just about updating transformers?

Anonymous
04/06/26(Mon)06:00:19 No.108539241

Anonymous 04/06/26(Mon)06:00:19 No.108539241

>>108539225
If true (it likely isn't) then they would be looking for only the most desperate applicants willing to settle for the lowest wages who need the job in order to survive, meaning they'd be very unlikely to ever expect decent working conditions or report any workplace violations.

Anonymous
04/06/26(Mon)06:02:14 No.108539250

Anonymous 04/06/26(Mon)06:02:14 No.108539250

>>108539241
obviously, why do you think Elon is pro indian immigration? he knows he can treat them as slaves, for those indians it's still better than going back to poopland

Anonymous
04/06/26(Mon)06:03:32 No.108539253

Anonymous 04/06/26(Mon)06:03:32 No.108539253

File: 1745995140119860.png (65 KB, 821x879)

65 KB PNG

>>108539237
>>108539238
I don't know. I'm just a promplet, my JB on original model failed.

Anonymous
04/06/26(Mon)06:03:41 No.108539254

Anonymous 04/06/26(Mon)06:03:41 No.108539254

>>108539032
I'll take things that never happened for $1000

Anonymous
04/06/26(Mon)06:04:04 No.108539259

Anonymous 04/06/26(Mon)06:04:04 No.108539259

>>108539213
depends on the task

Anonymous
04/06/26(Mon)06:06:58 No.108539273

Anonymous 04/06/26(Mon)06:06:58 No.108539273

>>108539253
yeah, you see the safe/suggestive remark? haven't played with heretic models, but they should remove these guardrails from model. the downside they might make model retarded.
all lies in hands of tuner, am i'm far too green to know who to trust with that

Anonymous
04/06/26(Mon)06:07:04 No.108539274

Anonymous 04/06/26(Mon)06:07:04 No.108539274

>>108539199
They are literal kids from discord that are only there because sometimes people share stolen api keys they can use to ERP on their smartphones. They aren't even 4chan users let alone /g/ users.

Contrast this with /lmg/ which is essentially the front line of modern computer hobbyism. Like how SBCs used to be 15 years ago, forum and internet culture 25 years ago.

This might sadden you but /lmg/ is now one of the most technical places on the open internet. Better LLM discussion than hackernews, twitter (including discussions with the actual researchers) and reddit.

So the contrast is the most extreme possible. Non technical teenagers versus technical hardcore hobbyists that skew older and more experienced.

Anonymous
04/06/26(Mon)06:10:05 No.108539281

Anonymous 04/06/26(Mon)06:10:05 No.108539281

File: openclaw success chinese (...).jpg (1.09 MB, 1080x1657)

1.09 MB JPG

>y'all openclaw is bad n sheeeit
>Local models ain't not only good for gooning

Anonymous
04/06/26(Mon)06:10:15 No.108539284

Anonymous 04/06/26(Mon)06:10:15 No.108539284

>>108539274
cringe but also fair

Anonymous
04/06/26(Mon)06:10:22 No.108539286

Anonymous 04/06/26(Mon)06:10:22 No.108539286

Which model does ballbusting rp best

Anonymous
04/06/26(Mon)06:10:57 No.108539290

Anonymous 04/06/26(Mon)06:10:57 No.108539290

File: based.png (148 KB, 498x498)

148 KB PNG

>>108539274
>/lmg/ is now one of the most technical places on the open internet. Better LLM discussion than hackernews, twitter (including discussions with the actual researchers) and reddit.
I'm so glad to be part of the elite bros

Anonymous
04/06/26(Mon)06:12:06 No.108539294

Anonymous 04/06/26(Mon)06:12:06 No.108539294

>>108539290
Not you, you're reddit

Anonymous
04/06/26(Mon)06:12:48 No.108539296

Anonymous 04/06/26(Mon)06:12:48 No.108539296

>>108539213
If you want a response within a couple days you could probably run something huge like Qwen 3.5 122b quantized to 4 bits and compressed KV using TurboQuant at ~3.5 bits. I don't know what the use-case is for something like that, but that's what you asked for.

Anonymous
04/06/26(Mon)06:12:56 No.108539298

Anonymous 04/06/26(Mon)06:12:56 No.108539298

Soo is qwen3.5 actually better than gemma 4 for stuff like hermes agent or is that a meme

Anonymous
04/06/26(Mon)06:13:08 No.108539300

Anonymous 04/06/26(Mon)06:13:08 No.108539300

>>108539290
>Better LLM discussion than hackernews, twitter (including discussions with the actual researchers) and reddit

Anonymous
04/06/26(Mon)06:14:13 No.108539301

Anonymous 04/06/26(Mon)06:14:13 No.108539301

>>108539298
yeah gemma sadly can't quite keep up with the 397b

Anonymous
04/06/26(Mon)06:15:25 No.108539303

Anonymous 04/06/26(Mon)06:15:25 No.108539303

>>108539274
It's actually surprising how other sites are so full of absolute retards these days.

4chan somehow manages to have genuine idiots when it does, instead of you having to play this guessing game of actual retards vs engagement bait vs bots vs whatever.

I went and joined MENSA to see if a somewhat gatekept community could avoid the stupidity, but even there I end up running into complete idiots more concerned in arguing political agendas than facts and empiricism.

I'm starting to thing the only websites will ever be worth using are those where slurs are commonplace. It's the one indicator of freedom, individual thinking, and that if you get raided by llms they are at least not the sanitized models.

Anonymous
04/06/26(Mon)06:16:17 No.108539308

Anonymous 04/06/26(Mon)06:16:17 No.108539308

>>108539301
But what about the 27b? I can't practically right a 397b for stuff like that anyway

Anonymous
04/06/26(Mon)06:16:21 No.108539309

Anonymous 04/06/26(Mon)06:16:21 No.108539309

yeah it's really a sad state of affairs but even this shithole can pass as quality vs something like HN
the average HNer is unable to notice slop, and is OFFENDED if you dare to point at the slop and say you don't want to see more of it
1/4 of the comments are LLM paste or agents
3/4 of the posts themselves are AI slop
the irony of /lmg/ being all about talking about LLMs but having less slop posting than the general tech news site

Anonymous
04/06/26(Mon)06:19:28 No.108539315

Anonymous 04/06/26(Mon)06:19:28 No.108539315

>>108539309
It's the deadly combination of vocaloids and blatant racism that keeps the slop away.

Anonymous
04/06/26(Mon)06:23:04 No.108539325

Anonymous 04/06/26(Mon)06:23:04 No.108539325

>>108539309
All thanks to cunny.

Anonymous
04/06/26(Mon)06:24:06 No.108539330

Anonymous 04/06/26(Mon)06:24:06 No.108539330

>>108539315
Just like with rent, you gotta step into your backyard 2 times a month and yell about niggers, jews, loli and 六四天安門事件 at the top of your lungs to keep the bots away

Anonymous
04/06/26(Mon)06:30:11 No.108539352

Anonymous 04/06/26(Mon)06:30:11 No.108539352

For the first time, I got the qwen-like "this looks like a jailbreak attempt" in gemma's reasoning. But it's quite rare.

Anonymous
04/06/26(Mon)06:35:42 No.108539374

Anonymous 04/06/26(Mon)06:35:42 No.108539374

>>108537545
Highly likely they trained it on character.ai data more than Chub and so on. That's where {{char}} originally came from anyway, and character.ai is licensing its "technology" to other companies including Google.

https://techcrunch.com/2024/08/02/character-ai-ceo-noam-shazeer-returns-to-google/
>Google is also signing a non-exclusive agreement with Character.AI to use its tech.

Anonymous
04/06/26(Mon)06:37:47 No.108539384

Anonymous 04/06/26(Mon)06:37:47 No.108539384

>>108539374
Google needs to fix its incentive structure. Currently the way to make most money as Google employee is to quit and then get rehired or acquired.

Anonymous
04/06/26(Mon)06:41:08 No.108539398

Anonymous 04/06/26(Mon)06:41:08 No.108539398

I'm feeling local is back.

Anonymous
04/06/26(Mon)06:44:15 No.108539406

Anonymous 04/06/26(Mon)06:44:15 No.108539406

>>108539398
Only if datacenters get hit

Anonymous
04/06/26(Mon)06:51:59 No.108539434

Anonymous 04/06/26(Mon)06:51:59 No.108539434

>>108539406
Hit by what exactly, anon?

Anonymous
04/06/26(Mon)06:52:21 No.108539435

Anonymous 04/06/26(Mon)06:52:21 No.108539435

>>108539434
Shaheds

Anonymous
04/06/26(Mon)06:52:56 No.108539440

Anonymous 04/06/26(Mon)06:52:56 No.108539440

>>108539398
It's just the first time in a good while that a relatively large number of people could share the same experience. Most recently released models worth using are huge.

Anonymous
04/06/26(Mon)07:02:34 No.108539481

Anonymous 04/06/26(Mon)07:02:34 No.108539481

>>108539440
Last thread made me realize nobody from this newfag wave ever used glm. Since glm was probably even more deterministic than gemma. Also i wouldn't trust gemma with ego death.

Anonymous
04/06/26(Mon)07:06:21 No.108539502

Anonymous 04/06/26(Mon)07:06:21 No.108539502

Why is my Gemma4 31B UD-Q6_K_XL eating up 32gb vram AND 70gb system ram??????

Anonymous
04/06/26(Mon)07:07:40 No.108539505

Anonymous 04/06/26(Mon)07:07:40 No.108539505

>>108539502
Because you've configured your backend wrong

Anonymous
04/06/26(Mon)07:08:16 No.108539512

Anonymous 04/06/26(Mon)07:08:16 No.108539512

>UD

Anonymous
04/06/26(Mon)07:09:06 No.108539518

Anonymous 04/06/26(Mon)07:09:06 No.108539518

>>108539502
Use Q5_KM at 50K context, that's perfect for 32GB for now.

Anonymous
04/06/26(Mon)07:11:54 No.108539533

Anonymous 04/06/26(Mon)07:11:54 No.108539533

>>108539202
Such skills required.

Anonymous
04/06/26(Mon)07:12:41 No.108539539

Anonymous 04/06/26(Mon)07:12:41 No.108539539

>>108539202
tb h that is a really low bar

Anonymous
04/06/26(Mon)07:16:24 No.108539558

Anonymous 04/06/26(Mon)07:16:24 No.108539558

>>108539505
what should I change?
-m models\gemma-4-31B-it-UD-Q6_K_XL.gguf ^
 --port 8080 ^
 --jinja ^
 -ngl 999 ^
--reasoning off  ^
 -c 64000 
>>108539518
good to know

>>108539512
you don't like it?

Anonymous
04/06/26(Mon)07:18:01 No.108539566

Anonymous 04/06/26(Mon)07:18:01 No.108539566

>>108539558
A lot, read the previous thread.

Anonymous
04/06/26(Mon)07:20:07 No.108539570

Anonymous 04/06/26(Mon)07:20:07 No.108539570

>>108539502
Are you disabling mmap?

Anonymous
04/06/26(Mon)07:21:17 No.108539579

Anonymous 04/06/26(Mon)07:21:17 No.108539579

File: 1763954959990869.png (2.99 MB, 1024x1536)

2.99 MB PNG

>>108539558

Anonymous
04/06/26(Mon)07:21:21 No.108539580

Anonymous 04/06/26(Mon)07:21:21 No.108539580

why is gemma such a pig for kv cache? I can't run more then one thread at a decent context length.

Anonymous
04/06/26(Mon)07:21:55 No.108539584

Anonymous 04/06/26(Mon)07:21:55 No.108539584

>>108539558
Add
--parallel 1
--no-mmap

That will reduce ram usage a ton

Anonymous
04/06/26(Mon)07:23:38 No.108539590

Anonymous 04/06/26(Mon)07:23:38 No.108539590

>>108539579
This should be the unsloth logo

Anonymous
04/06/26(Mon)07:23:59 No.108539595

Anonymous 04/06/26(Mon)07:23:59 No.108539595

>>108539558
--swa-checkpoints 1

Anonymous
04/06/26(Mon)07:24:47 No.108539600

Anonymous 04/06/26(Mon)07:24:47 No.108539600

Do you actually need SWA instead of FA for new gemma?

Anonymous
04/06/26(Mon)07:24:54 No.108539602

Anonymous 04/06/26(Mon)07:24:54 No.108539602

>>108539579
You know by fucking up Llama-4 the Zucc has managed to keep a low profile. His middle eastern assets might avoid getting bombed. 4d chess or something.

Anonymous
04/06/26(Mon)07:29:51 No.108539631

Anonymous 04/06/26(Mon)07:29:51 No.108539631

>he doesn't use --useswa

Anonymous
04/06/26(Mon)07:35:46 No.108539658

Anonymous 04/06/26(Mon)07:35:46 No.108539658

>>108539570
>>108539584
>>108539595
thanks added
I'll report back

Anonymous
04/06/26(Mon)07:53:49 No.108539752

Anonymous 04/06/26(Mon)07:53:49 No.108539752

no need for bos

Anonymous
04/06/26(Mon)08:03:45 No.108539807

Anonymous 04/06/26(Mon)08:03:45 No.108539807

>>108539752
A good backend will handle that for you

Anonymous
04/06/26(Mon)08:04:19 No.108539808

Anonymous 04/06/26(Mon)08:04:19 No.108539808

>>108539286
I don't think we have the data.
You'll have to make a benchmark.

Anonymous
04/06/26(Mon)08:05:46 No.108539815

Anonymous 04/06/26(Mon)08:05:46 No.108539815

File: 1761389401659502.png (3.22 MB, 1264x2216)

3.22 MB PNG

>>108539398
Agree. Gemma has obv moved the bar.
>>108539191
DS works great for agentic work and coding.

Anonymous
04/06/26(Mon)08:07:54 No.108539828

Anonymous 04/06/26(Mon)08:07:54 No.108539828

>>108539815
This reminds me of the 4th time Miku fucked my wife

Anonymous
04/06/26(Mon)08:08:59 No.108539834

Anonymous 04/06/26(Mon)08:08:59 No.108539834

>>108539828
Maybe you should start charging her, Miku can certainly afford it.

Anonymous
04/06/26(Mon)08:09:43 No.108539841

Anonymous 04/06/26(Mon)08:09:43 No.108539841

>>108539481
>Since glm was probably even more deterministic than gemma.
It wasn't? I used GLM 4 (briefly), 4.5, 4.6, 4.7 and 5. Gemma-4 is more deterministic. I assume it's a bug in llama.cpp.

Anonymous
04/06/26(Mon)08:14:26 No.108539871

Anonymous 04/06/26(Mon)08:14:26 No.108539871

>>108539807
>>108539752
i think it's about this https://github.com/ggml-org/llama.cpp/pull/21500

Anonymous
04/06/26(Mon)08:19:07 No.108539908

Anonymous 04/06/26(Mon)08:19:07 No.108539908

>>108539871
>we are using a dedicated tokenizer model for gemma 4
wtf is a "dedicated tokenizer model"??

Anonymous
04/06/26(Mon)08:21:57 No.108539928

Anonymous 04/06/26(Mon)08:21:57 No.108539928

>>108539908
Old general-purpose, vibecoded tokenizer didn't work, so for gemma 4 a new vibecoded tokenizer was created.

Anonymous
04/06/26(Mon)08:22:35 No.108539933

Anonymous 04/06/26(Mon)08:22:35 No.108539933

>>108539908
Maybe he meant like it's using a specialized parser when he said tokenizer?

Anonymous
04/06/26(Mon)08:35:21 No.108540008

Anonymous 04/06/26(Mon)08:35:21 No.108540008

he must have brainfarted and meant parser, it's the only thing where you could have a distinction of "dedicated" (hand written) vs autogenerated (autoparser garbage from piotr vibemonkey)
checked the hf conversion to gguf script and they just import the tokenizer.json to convert it to their gguf format
vocab = gguf.LlamaHfVocab(self.dir_model)
where LlamaHfVocab does:
        fname_tokenizer = base_path / 'tokenizer.json'
        # if this fails, FileNotFoundError propagates to caller
        with open(fname_tokenizer, encoding='utf-8') as f:
            tokenizer_json = json.load(f)
nothing dedicated/custom/handpisscrafted about it

Anonymous
04/06/26(Mon)08:39:30 No.108540029

Anonymous 04/06/26(Mon)08:39:30 No.108540029

Has anyone tested how much quants degrade 31B's performance? I use Q8 but would like to try lower for speed.

Anonymous
04/06/26(Mon)08:40:29 No.108540034

Anonymous 04/06/26(Mon)08:40:29 No.108540034

Doesn't this change affect all gemma4 models, not just the moe? https://github.com/ggml-org/llama.cpp/pull/21506

Anonymous
04/06/26(Mon)08:42:03 No.108540042

Anonymous 04/06/26(Mon)08:42:03 No.108540042

>>108540034
Wow, good point! Maybe you should post it somewhere where the devs can actually see it.

Anonymous
04/06/26(Mon)08:43:28 No.108540050

Anonymous 04/06/26(Mon)08:43:28 No.108540050

>>108540042
I never touched c++ in my entire life

Anonymous
04/06/26(Mon)08:44:10 No.108540053

Anonymous 04/06/26(Mon)08:44:10 No.108540053

>>108539658
thanks to the the smaller model and the new settings it doesn't feel like Gemma is memory leaking anymore.
nice

Anonymous
04/06/26(Mon)08:46:32 No.108540060

Anonymous 04/06/26(Mon)08:46:32 No.108540060

>>108540034
>F16 to F32
vramletbros... i dont feel so good

Anonymous
04/06/26(Mon)08:47:59 No.108540071

Anonymous 04/06/26(Mon)08:47:59 No.108540071

Anonymous 04/06/26(Mon)15:23:10 No.108539941
>>108539928
>vibecoded tokenizer
Lmao. Imagine the merge requests. "It doesn't tokenize 'cunny' correctly, fixing it now."

Anonymous 04/06/26(Mon)15:25:44 No.108539956

>>108539908
Basically it means they had to stop using the generic llama.cpp tokenizer logic and write a specific implementation for Gemma 4's vocab/special tokens because they're weird.
It's just more bloat in the binary.

Anonymous 04/06/26(Mon)15:28:12 No.108539970

>>108539558
Did the --no-mmap fix the ram leak or are you still swimming in 70GB of system ram? If it's still leaking, you're probably hitting some weird interaction with the UD quant and your driver version.

Anonymous 04/06/26(Mon)15:31:05 No.108539988

>>108539286
Try the abliterated 31B with a high temperature and a "depraved" system prompt. Dense models usually handle the nuance of pain/pleasure better than the MoEs which just tend to loop the same three adjectives.

Anonymous 04/06/26(Mon)15:33:12 No.108540012

>>108539841
>I assume it's a bug in llama.cpp
It's not a bug, it's the logit softcap. If you don't override it, Gemma 4 basically becomes a glorified autocomplete for "As an AI language model...". You have to fight the weights just to get it to stop sounding like a corporate HR handbook.

Anonymous 04/06/26(Mon)15:35:40 No.108540035

>>108539213
If you truly don't care about speed, just run a 70B+ model in GGUF and offload everything that doesn't fit in VRAM to your 64GB of system RAM. You'll get maybe 0.5 tokens per second, but for a long-form RP response, you can just go make a sandwich while it thinks. It's the "slow cook" method of LLM inference.

Anonymous 04/06/26(Mon)15:38:19 No.108540058

>>108540035
>0.5 t/s
Absolute state of the VRAM-poor. I can't imagine waiting 10 minutes for a paragraph. I'd rather use a 12B model that actually fits and just prompt-engineer the intelligence back into it.

Anonymous
04/06/26(Mon)08:49:03 No.108540077

Anonymous 04/06/26(Mon)08:49:03 No.108540077

>>108540060
This is just for matrix multiplication intermediaries. It shouldn't really be noticeable, I think.

Anonymous
04/06/26(Mon)08:49:21 No.108540081

Anonymous 04/06/26(Mon)08:49:21 No.108540081

>>108539871
I don't think it's impacting me, I'm using sillytavern chat completion so it's using jinja and the embedded jinja has the bos thing in it right?

Anonymous
04/06/26(Mon)08:50:21 No.108540089

Anonymous 04/06/26(Mon)08:50:21 No.108540089

>>108540081
Yes, chat completion is unaffected. Missing <bos> just kills the model so if it works for you, you already have it.

Anonymous
04/06/26(Mon)08:53:44 No.108540111

Anonymous 04/06/26(Mon)08:53:44 No.108540111

>>108540050
good thing you don't need to, though if you could you would have read the code and noticed it doesn't restrict the change to MoEs (the PR title is misleading), dude.
LLM_ARCH_GEMMA4 represents all the gemma 4 and it's applied on the down ffn
>>108540077
as a fellow vramlet I can confirm the change is not noticeable, I've been merging the gemma fixes without waiting for them to get into master

Anonymous
04/06/26(Mon)08:54:44 No.108540117

Anonymous 04/06/26(Mon)08:54:44 No.108540117

>>108540053
What's your card and tok/s?

Anonymous
04/06/26(Mon)08:55:29 No.108540123

Anonymous 04/06/26(Mon)08:55:29 No.108540123

Shit's fucked with gemma4 gguf tool calling. GGUF does not fucking call tools even though the openrouter version calls just fine with the same prompt. WHY DOES THIS HAPPEN EVERY TIME??

Anonymous
04/06/26(Mon)08:57:23 No.108540139

Anonymous 04/06/26(Mon)08:57:23 No.108540139

>>108540123
did you load
github.com/ggml-org/llama.cpp/blob/master/models/templates/google-gemma-4-31B-it-interleaved.jinja
with --chat-template-file?
the gguf tool calling is broken on ALL ggufs and they released this interleaved thinking tool call template for 31B and 26B

Anonymous
04/06/26(Mon)08:59:30 No.108540155

Anonymous 04/06/26(Mon)08:59:30 No.108540155

>>108540117
NTA.

3xRTX 3090, unsloth-gemma-4-31B-it-UD-Q8_K_XL.gguf
pp 14k tokens, 380 t/s
tg 238 tokens, 16 t/s

Anonymous
04/06/26(Mon)09:01:24 No.108540169

Anonymous 04/06/26(Mon)09:01:24 No.108540169

>>108539202
>you have to be non-retarded to be able to run local models
doubt.jpg

Anonymous
04/06/26(Mon)09:03:38 No.108540185

Anonymous 04/06/26(Mon)09:03:38 No.108540185

Is g4 26B better at rp than qwen 3.5 27B?

Anonymous
04/06/26(Mon)09:05:02 No.108540198

Anonymous 04/06/26(Mon)09:05:02 No.108540198

>>108540155
Oh, I assumed you were the 32GB anon.
I'm wondering what people are getting on their 5090s/4500 pros.

Anonymous
04/06/26(Mon)09:05:39 No.108540205

Anonymous 04/06/26(Mon)09:05:39 No.108540205

>>108540198
NTA means Not That Anon.

Anonymous
04/06/26(Mon)09:09:23 No.108540226

Anonymous 04/06/26(Mon)09:09:23 No.108540226

>>108540205
I am well aware, but I assumed you were talking about the original person in the chain and not the anon I replied to.

Anonymous
04/06/26(Mon)09:10:18 No.108540232

Anonymous 04/06/26(Mon)09:10:18 No.108540232

https://github.com/ggml-org/llama.cpp/issues/21511
another piotr booboo, the ride never ends
100% an autoparser issue, this didn't happen pre-autoparser.
we should be very grateful for Gemma getting its dedicated parser. Qwen will probably have all sorts of subtle bugs until the end of times.

Anonymous
04/06/26(Mon)09:10:39 No.108540234

Anonymous 04/06/26(Mon)09:10:39 No.108540234

>>108540226
No, no, I'm neither. Just wanted to post mine to get the discussion going. Gemma 4 feels very slow for its size.

Anonymous
04/06/26(Mon)09:12:27 No.108540242

Anonymous 04/06/26(Mon)09:12:27 No.108540242

>>108540232
Two more weeks till proper support

Anonymous
04/06/26(Mon)09:26:48 No.108540334

Anonymous 04/06/26(Mon)09:26:48 No.108540334

>>108539106
>>108539130
>>108539133
>>108539157
>>108539161
>>108539162
>>108539163
>>108539164
kek peak reason to pay for internet

Anonymous
04/06/26(Mon)09:30:02 No.108540362

Anonymous 04/06/26(Mon)09:30:02 No.108540362

File: Screenshot 2026-04-06 at (...).png (915 KB, 1305x8082)

915 KB PNG

Accumulator issues, cuda version affecting quants, Cuda Fusion.
A Myriad of things huh?
Can't wait to see what things will look like in a month or so when the implementation is more stable.

Anonymous
04/06/26(Mon)09:34:15 No.108540394

Anonymous 04/06/26(Mon)09:34:15 No.108540394

>>108540117
5090
prompt eval time =     700.57 ms /   780 tokens (    0.90 ms per token,  1113.37 tokens per second)
       eval time =    1516.92 ms /    83 tokens (   18.28 ms per token,    54.72 tokens per second)
      total time =    2217.50 ms /   863 tokens
>>108540071
>Did the --no-mmap fix the ram leak or are you still swimming in 70GB of system ram?
I think so yes

Anonymous
04/06/26(Mon)09:35:08 No.108540398

Anonymous 04/06/26(Mon)09:35:08 No.108540398

VRAM chads (96GB+), what model and settings are you using on llama.cpp for gemma4?

Anonymous
04/06/26(Mon)09:36:29 No.108540407

Anonymous 04/06/26(Mon)09:36:29 No.108540407

>>108539202
This was only true for the first few months of /lmg/'s file. koboldcpp and the single-click-exe has done incalculable damage by removing any such filters.

Anonymous
04/06/26(Mon)09:37:28 No.108540417

Anonymous 04/06/26(Mon)09:37:28 No.108540417

>>108540407
>koboldcpp and the single-click-exe has done incalculable damage
to ssds

Anonymous
04/06/26(Mon)09:40:24 No.108540434

Anonymous 04/06/26(Mon)09:40:24 No.108540434

>>108539035
How about you get PR'd?

Anonymous
04/06/26(Mon)09:40:45 No.108540435

Anonymous 04/06/26(Mon)09:40:45 No.108540435

>gemma 4 releases
>elitism suddenly surges in /lmg/
It's like what they say, good times create weak men...

Anonymous
04/06/26(Mon)09:41:42 No.108540443

Anonymous 04/06/26(Mon)09:41:42 No.108540443

>>108540089
I haven't noticed any difference whatsoever. Using completion and my own tags.

Anonymous
04/06/26(Mon)09:44:13 No.108540459

Anonymous 04/06/26(Mon)09:44:13 No.108540459

File: socks.png (63 KB, 202x138)

63 KB PNG

>>108539281
I could not trust this person.

Anonymous
04/06/26(Mon)09:45:59 No.108540471

Anonymous 04/06/26(Mon)09:45:59 No.108540471

new koboldcpp update with more fixes for gemma 4 for those interested (on the rolling release)

Anonymous
04/06/26(Mon)09:46:52 No.108540476

Anonymous 04/06/26(Mon)09:46:52 No.108540476

>>108540362
>cuda version affecting quants
this one is 100% a nvidiot issue
and this doesn't surprise me because I personally know a guy who works there who was like "I haven't hand written a single line of code since half a year ago" lauding how much better codex and claude code had gotten, kek in kekistan software is turning into a house of cards that is going to break down so hard soon

Anonymous
04/06/26(Mon)09:47:35 No.108540485

Anonymous 04/06/26(Mon)09:47:35 No.108540485

Gemma bros I need your help. I have 32GB VRAM but only 16GB system RAM. My system ram is constantly at 100% usage because of llama.ccp.

Yes I already did --no-mmap but it keeps using system ram.

There HAS to be some combination of parameters so that no system ram is used because 100% of the model AND context fits 100% within my VRAM right?

Anonymous
04/06/26(Mon)09:48:38 No.108540492

Anonymous 04/06/26(Mon)09:48:38 No.108540492

>>108540485
>I have 32GB VRAM but only 16GB system RAM
lmao thank saltman

Anonymous
04/06/26(Mon)09:49:13 No.108540497

Anonymous 04/06/26(Mon)09:49:13 No.108540497

>>108540485
--swa-checkpoints

Anonymous
04/06/26(Mon)09:50:18 No.108540504

Anonymous 04/06/26(Mon)09:50:18 No.108540504

>>108540485
And --cram . And probably many others. WE DON'T KNOW YOUR CURRENT PARAMETERS POST THEM YOU AAAAAAAAAAAAAA

Anonymous
04/06/26(Mon)09:50:39 No.108540508

Anonymous 04/06/26(Mon)09:50:39 No.108540508

>>108540485
>Yes I already did --no-mmap but it keeps using system ram.
Maybe try --no-direct-io
Actually, wouldn't you want mmap or direct io to not allocate space in RAM you wouldn't need?
Also, --cache-ram , --ctx-checkpoints, -swa-checkpoints, etc.

Anonymous
04/06/26(Mon)09:53:40 No.108540519

Anonymous 04/06/26(Mon)09:53:40 No.108540519

>>108540485
>but it keeps using system ram
Isn't it because of the context? --fit tells me 31B needs 260GB...

Anonymous
04/06/26(Mon)09:53:49 No.108540521

Anonymous 04/06/26(Mon)09:53:49 No.108540521

>>108540485
we really need this in the /lmg/ opener:
--swa-checkpoints 1
--parallel 1
--cache-ram 0
and specifically for the ultra vramlets/ramlets who run E2B/E4B:
--override-tensor "per_layer_token_embd\.weight=CPU"
It's batshit this one is not the default behavior, there is no performance loss but the VRAM gain for those models is substantial in throwing the PLE to cpu. They are called E2B and E4B for Effective, as in, they're very much 2B and 4B sized if you throw the PLE to cpu ram.

Anonymous
04/06/26(Mon)09:56:20 No.108540529

Anonymous 04/06/26(Mon)09:56:20 No.108540529

>>108540521
>--override-tensor "per_layer_token_embd\.weight=CPU"
How would I format this irl? do I manually need to check the layers and their weights from somewhere?

Anonymous
04/06/26(Mon)09:56:59 No.108540533

Anonymous 04/06/26(Mon)09:56:59 No.108540533

Gemma 4 for 11 vrams: e2b at q8 or e4b at q4km?

Anonymous
04/06/26(Mon)09:58:26 No.108540539

Anonymous 04/06/26(Mon)09:58:26 No.108540539

>>108540533
26a4 quanted. Spill the rest to your rams.

Anonymous
04/06/26(Mon)09:58:51 No.108540543

Anonymous 04/06/26(Mon)09:58:51 No.108540543

>>108540533
just use the 26b moe

Anonymous
04/06/26(Mon)10:00:06 No.108540548

Anonymous 04/06/26(Mon)10:00:06 No.108540548

>>108540533
Can't we offload the static embeddings yet?
You are supposed to be able to run e4b at q8 on 11gb vram easly in theory.
Anyway just run the 26b moe with experts offloaded if you have 32+ gb of system memory.

Anonymous
04/06/26(Mon)10:00:43 No.108540552

Anonymous 04/06/26(Mon)10:00:43 No.108540552

>>108540539
>>108540543
Can't, this is running in a system where ram is at a premium already
I have another system for running the bigger ones

Anonymous
04/06/26(Mon)10:03:01 No.108540569

Anonymous 04/06/26(Mon)10:03:01 No.108540569

File: embed.png (94 KB, 1714x451)

94 KB PNG

>>108540529
??? just paste this flag as one of the many you use to run llama-server if you use E2B or E4B gemma
it corresponds to what you see in the goofs in pic related
it just throws them to cpu because they aren't bandwidth intensive like other tensors and there is no slow down in having them on the cpu, but huge vram gain
>>108540533
you can run E4B at Q8 with 32768 context at f16 on as little as 8GB VRAM if you use
--override-tensor "per_layer_token_embd\.weight=CPU"
ffs E4B is an edge model and llama.cpp does the wrong thing by default.

Anonymous
04/06/26(Mon)10:04:04 No.108540581

Anonymous 04/06/26(Mon)10:04:04 No.108540581

>>108540569
>ffs E4B is an edge model and llama.cpp does the wrong thing by default.
again? come on llmao.cpp, get your shit together

Anonymous
04/06/26(Mon)10:05:09 No.108540592

Anonymous 04/06/26(Mon)10:05:09 No.108540592

>>108540569
I have seen some people using regex for this in the past, selecting the largest layers and so on.
I don't know if they have automated this now or not.
This is why I asked because I don't exactly know.

Anonymous
04/06/26(Mon)10:05:45 No.108540599

Anonymous 04/06/26(Mon)10:05:45 No.108540599

>>108540581
only 2 days have passed since release, give it some time

Anonymous
04/06/26(Mon)10:06:33 No.108540604

Anonymous 04/06/26(Mon)10:06:33 No.108540604

>>108540599
fair enough
https://youtu.be/_3X2tRIYHdE

Anonymous
04/06/26(Mon)10:07:07 No.108540609

Anonymous 04/06/26(Mon)10:07:07 No.108540609

>>108540581
from their readme:
https://huggingface.co/google/gemma-4-E4B-it
>The "E" in E2B and E4B stands for "effective" parameters. The smaller models incorporate Per-Layer Embeddings (PLE) to maximize parameter efficiency in on-device deployments. Rather than adding more layers or parameters to the model, PLE gives each decoder layer its own small embedding for every token. These embedding tables are large but are only used for quick lookups, which is why the effective parameter count is much smaller than the total.
btw this existed on gemma 3n too and llama.cpp has been doing the wrong thing forever with those models
you aren't meant to load those layers in vram period, this architecture was made to run fast on low end devices. E4B is meant to fit smartphones comfortably at Q4

Anonymous
04/06/26(Mon)10:07:42 No.108540618

Anonymous 04/06/26(Mon)10:07:42 No.108540618

>>108540552
E2s are small downloads. Just try them.

Anonymous
04/06/26(Mon)10:07:48 No.108540620

Anonymous 04/06/26(Mon)10:07:48 No.108540620

>(Banned Phrase Detected: power dynamic - Add ID 2066 to banlist at index 2045, and rewinding 2 tokens)
>(Banned Phrase Detected: sexualize - Add ID 11953 to banlist at index 2458, and rewinding 2 tokens)
it's nice getting rid of bullshit language

Anonymous
04/06/26(Mon)10:08:33 No.108540628

Anonymous 04/06/26(Mon)10:08:33 No.108540628

>>108540471
1.111.1? that's from 3 days ago

Anonymous
04/06/26(Mon)10:08:44 No.108540633

Anonymous 04/06/26(Mon)10:08:44 No.108540633

>>108540609
I wonder why they didn't do this on the 31b model as well

Piotr
04/06/26(Mon)10:08:57 No.108540634

Piotr 04/06/26(Mon)10:08:57 No.108540634

>>108540609
>>108540604
>>108540599

I use Arch, btw.

Anonymous
04/06/26(Mon)10:10:13 No.108540638

Anonymous 04/06/26(Mon)10:10:13 No.108540638

>>108540628
if you actually read that page you would have found this https://github.com/LostRuins/koboldcpp/releases/tag/rolling

Anonymous
04/06/26(Mon)10:10:31 No.108540639

Anonymous 04/06/26(Mon)10:10:31 No.108540639

>>108540628
no, rolling build from 1h ago

Anonymous
04/06/26(Mon)10:10:59 No.108540644

Anonymous 04/06/26(Mon)10:10:59 No.108540644

Anyone here using local for coding? What system prompt are you guys using?

Anonymous
04/06/26(Mon)10:11:03 No.108540645

Anonymous 04/06/26(Mon)10:11:03 No.108540645

>>108540628
Check the date on the files themselves, .1 was released yesterday
It's been three days since [you looked at] 1.111

Anonymous
04/06/26(Mon)10:12:13 No.108540649

Anonymous 04/06/26(Mon)10:12:13 No.108540649

24GBsisters what Gemma 4 quant are you using? How much context?

Anonymous
04/06/26(Mon)10:12:52 No.108540654

Anonymous 04/06/26(Mon)10:12:52 No.108540654

>I'm afraid of testing things

Anonymous
04/06/26(Mon)10:13:21 No.108540657

Anonymous 04/06/26(Mon)10:13:21 No.108540657

>>108540638
>>108540639
>>108540645
WTF I am retarded, thanks for the spoonfeed...

Anonymous
04/06/26(Mon)10:16:03 No.108540670

Anonymous 04/06/26(Mon)10:16:03 No.108540670

>>108540497
>>108540508
>>108540521
A combination of these seem to have worked but llama.cpp still somehow uses 4GB of RAM? That's fine because it seems to work and that's what I care about but I genuinely wonder what is even using the ram after applying all of the following flags:

--no-mmap
--no-mmproj
--parallel 1
---kvu
-b 2048
-ub 256
--poll 0
--cache-ram 0
--swa-checkpoints 1
--no-slots
--cache-reuse 256
--spec-type ngram-simple

Anonymous
04/06/26(Mon)10:18:08 No.108540683

Anonymous 04/06/26(Mon)10:18:08 No.108540683

>>108540649
4ks 32k

Anonymous
04/06/26(Mon)10:18:21 No.108540685

Anonymous 04/06/26(Mon)10:18:21 No.108540685

>>108540670
>A combination of these seem to have worked
When changing things, do one by one. Otherwise you'll never know what helped with what.
>llama.cpp still somehow uses 4GB of RAM
Memory usage is displayed in the terminal output. Read it very VERY carefully.

Anonymous
04/06/26(Mon)10:25:43 No.108540723

Anonymous 04/06/26(Mon)10:25:43 No.108540723

what would be involved getting gemma4 e4b to play pokemon? is it as simple as just sending it images and asking for an action?

Anonymous
04/06/26(Mon)10:29:04 No.108540742

Anonymous 04/06/26(Mon)10:29:04 No.108540742

>>108540723
You would need some sort of harness one made through MCP where the model does a function call after thinking about the move being done. You CAN show it a picture but I highly doubt e4b has good enough image recognition + world model to be able to do this.

Anonymous
04/06/26(Mon)10:29:47 No.108540746

Anonymous 04/06/26(Mon)10:29:47 No.108540746

>>108540723
Claude hasn't managed to beat it in more than a year of attempts across 3.7, 4.0, 4.1, 4.5 and 4.6. Gemini "beat" it with the indian running it helping it when it gets stuck. I don't think Gemma's going to be of much help here.
Those runs typically use a more sophisticated harness where the model is fed images and game data.

Anonymous
04/06/26(Mon)10:30:43 No.108540755

Anonymous 04/06/26(Mon)10:30:43 No.108540755

>>108540746
It's more like a sex swing than a typical harness.

Anonymous
04/06/26(Mon)10:31:01 No.108540756

Anonymous 04/06/26(Mon)10:31:01 No.108540756

>>108540742
>You CAN show it a picture but I highly doubt e4b has good enough image recognition + world model to be able to do this.
is there a simpler game it might be able to do? I think it needs to be turn based, or I need to slow the emulator down so it can react.

Anonymous
04/06/26(Mon)10:32:23 No.108540766

Anonymous 04/06/26(Mon)10:32:23 No.108540766

>>108540746
https://www.youtube.com/watch?v=tUMx5iDx3Gs

Anonymous
04/06/26(Mon)10:33:53 No.108540780

Anonymous 04/06/26(Mon)10:33:53 No.108540780

>>108540756
snes/gba final fantasies. Maybe fire emblem and its clones if it's capable enough to handle unit positioning. Might and Magic? Wizadry?
>>108540766
Fake and gay, Vedal was in charge of movement in the overworld. Neuro was just doing combat.

Anonymous
04/06/26(Mon)10:35:27 No.108540789

Anonymous 04/06/26(Mon)10:35:27 No.108540789

File: Enshittification.png (811 KB, 982x1188)

811 KB PNG

>>108540766
who's still following neuro after the design change??

Anonymous
04/06/26(Mon)10:36:42 No.108540797

Anonymous 04/06/26(Mon)10:36:42 No.108540797

>>108540723
>>108540756
What you're looking for is not an LLM but instead a reinforcement learning agent. Some anon here trained a model to play atari games and super mario world a couple of weeks ago.

Anonymous
04/06/26(Mon)10:37:13 No.108540800

Anonymous 04/06/26(Mon)10:37:13 No.108540800

File: 1754396868060296.png (1.35 MB, 1024x1024)

1.35 MB PNG

>>108540789
I think you have a terminal case of shit taste.

Anonymous
04/06/26(Mon)10:39:34 No.108540811

Anonymous 04/06/26(Mon)10:39:34 No.108540811

>>108539168
>tell your agent to lookup documentation online
>???
>Profit?

Anonymous
04/06/26(Mon)10:40:43 No.108540815

Anonymous 04/06/26(Mon)10:40:43 No.108540815

File: 1747521702242625.jpg (172 KB, 1744x1080)

172 KB JPG

>>108540644
My system prompt is this picture of Miku.

Anonymous
04/06/26(Mon)10:40:59 No.108540816

Anonymous 04/06/26(Mon)10:40:59 No.108540816

That anon from last thread was right, going for --override-kv gemma4.final_logit_softcapping=float:25.0 helps the model being more creative while staying smart

Anonymous
04/06/26(Mon)10:41:35 No.108540821

Anonymous 04/06/26(Mon)10:41:35 No.108540821

>>108540816
that's only one example though

Anonymous
04/06/26(Mon)10:42:13 No.108540824

Anonymous 04/06/26(Mon)10:42:13 No.108540824

File: Screenshot_20260406-10413(...).jpg (299 KB, 720x1480)

299 KB JPG

>>108540797
I just wanted to see what it could do off the shelf llm. I know its not the right tool for the job, I just thought it might be fun to set up the harness and make some experiments. maybe I'll see about reading the gamestate memory and feed the model that instead of images.

Anonymous
04/06/26(Mon)10:42:46 No.108540827

Anonymous 04/06/26(Mon)10:42:46 No.108540827

>>108540789
left looks like a default preset, middle is best, right got an anvil dropped on her head and became shorter and abnormally wide which is off putting
I never followed it personally but I saw some of the early clips where it said the holohoax didn't happen

Anonymous
04/06/26(Mon)10:42:50 No.108540828

Anonymous 04/06/26(Mon)10:42:50 No.108540828

>>108540789
Not really following, but why did he change it?

Anonymous
04/06/26(Mon)10:43:03 No.108540829

Anonymous 04/06/26(Mon)10:43:03 No.108540829

>>108540649
26b Q5_K_S 100k

Anonymous
04/06/26(Mon)10:43:47 No.108540833

Anonymous 04/06/26(Mon)10:43:47 No.108540833

>>108540828
he changed it because the design on the left was something he didn't invent, so it wasn't really his IP in the first place

Anonymous
04/06/26(Mon)10:43:48 No.108540834

Anonymous 04/06/26(Mon)10:43:48 No.108540834

soft cap... like nipples vis a vis the breast?

Anonymous
04/06/26(Mon)10:44:18 No.108540836

Anonymous 04/06/26(Mon)10:44:18 No.108540836

???

Anonymous
04/06/26(Mon)10:45:09 No.108540838

Anonymous 04/06/26(Mon)10:45:09 No.108540838

>>108540833
Well, it's a big downgrade

Anonymous
04/06/26(Mon)10:45:28 No.108540841

Anonymous 04/06/26(Mon)10:45:28 No.108540841

>>108540815
wait how is that possible

Anonymous
04/06/26(Mon)10:45:56 No.108540845

Anonymous 04/06/26(Mon)10:45:56 No.108540845

>>108540833
Well, it's a big upgrade

Anonymous
04/06/26(Mon)10:46:37 No.108540848

Anonymous 04/06/26(Mon)10:46:37 No.108540848

I don't really understand what this soft cap deal is.
When I first heard of it a while ago, I thought it was a technique applied during training, not during inference.
Can somebody explain why it's not redundant with temperature or some other existing samplers that change the distribution?

Anonymous
04/06/26(Mon)10:46:57 No.108540852

Anonymous 04/06/26(Mon)10:46:57 No.108540852

>>108540845
this

Anonymous
04/06/26(Mon)10:47:00 No.108540853

Anonymous 04/06/26(Mon)10:47:00 No.108540853

>>108540833
Well, it's a big sidegrade

Anonymous
04/06/26(Mon)10:47:12 No.108540854

Anonymous 04/06/26(Mon)10:47:12 No.108540854

File: 1771646605168393.png (277 KB, 640x480)

277 KB PNG

>>108540833
Well, it's a big stagnation

Anonymous
04/06/26(Mon)10:48:26 No.108540858

Anonymous 04/06/26(Mon)10:48:26 No.108540858

>>108540848
gemma is so fried that temperature barely does anything, softcap makes it a little (or a lot depending on setting) less confident which in terms makes other sampler works

Anonymous
04/06/26(Mon)10:49:20 No.108540863

Anonymous 04/06/26(Mon)10:49:20 No.108540863

>>108540858
>softcap
is this some new snakeoil sampler?

Anonymous
04/06/26(Mon)10:50:16 No.108540869

Anonymous 04/06/26(Mon)10:50:16 No.108540869

>>108540863
not a sampler per say no, and clearly not snakeoil as you can measure it does change logprobs

Anonymous
04/06/26(Mon)10:50:50 No.108540874

Anonymous 04/06/26(Mon)10:50:50 No.108540874

>>108540858
>gemma is so fried that temperature barely does anything
temperature does something if you put min_p: 0 (on llama.cpp server the default is at 0.1)

Anonymous
04/06/26(Mon)10:51:43 No.108540882

Anonymous 04/06/26(Mon)10:51:43 No.108540882

>>108540874
still very little compared to how a normal model should act

Anonymous
04/06/26(Mon)10:53:21 No.108540891

Anonymous 04/06/26(Mon)10:53:21 No.108540891

>>108540882
if you disable every sampler except temperature, it'll be "normal" again (meaning that if you put T = 10 the model will output gibberish as expected for example)

Anonymous
04/06/26(Mon)10:54:06 No.108540896

Anonymous 04/06/26(Mon)10:54:06 No.108540896

>>108540848
softcapping applies to model internals at the per-layer level, not just the final distribution like samplers

Anonymous
04/06/26(Mon)10:54:19 No.108540897

Anonymous 04/06/26(Mon)10:54:19 No.108540897

File: 1705806843225442.jpg (1.96 MB, 2400x3346)

1.96 MB JPG

>>108540644
Just the default for Opencode.
For chat I use this for all models:
>Answer short and concise. Avoid Emojis, euphemisms and summaries.

Anonymous
04/06/26(Mon)10:54:27 No.108540898

Anonymous 04/06/26(Mon)10:54:27 No.108540898

>>108540874
wtf so llama server has been frying all my outputs by auto-enabling that trash? no backend should just auto-enable samplers

Anonymous
04/06/26(Mon)10:55:32 No.108540906

Anonymous 04/06/26(Mon)10:55:32 No.108540906

>>108540609
can't they port everything from https://github.com/google-ai-edge/LiteRT-LM ?

Anonymous
04/06/26(Mon)10:55:47 No.108540910

Anonymous 04/06/26(Mon)10:55:47 No.108540910

>>108540898
NTA, but I always start llama-server with the defaults
>--samplers top_k;top_p;temperature --temp 1 --top-k 50 --top-p 0.95

Anonymous
04/06/26(Mon)10:56:18 No.108540913

Anonymous 04/06/26(Mon)10:56:18 No.108540913

>>108540898
>wtf so llama server has been frying all my outputs by auto-enabling that trash?
yes
>no backend should just auto-enable samplers
that's llamao.cpp for ya

Anonymous
04/06/26(Mon)10:57:10 No.108540919

Anonymous 04/06/26(Mon)10:57:10 No.108540919

>>108540609
>you aren't meant to load those layers in vram period, this architecture was made to run fast on low end devices.
don't low end devices like smartphones and SBC's all run on shared memory? I don't think it would matter whether they're loaded on CPU or GPU.
I guess it is useful though for dedicated VRAM GPUs or even integrated GPUs which statically "borrow" a chunk of RAM to use as VRAM (those typically have a very low limit, like 256MiB max for example).

Anonymous
04/06/26(Mon)10:57:19 No.108540921

Anonymous 04/06/26(Mon)10:57:19 No.108540921

>>108540910
the thing is that is you let min_p undefined it'll stay at the default value of 0.1

Anonymous
04/06/26(Mon)10:57:52 No.108540925

Anonymous 04/06/26(Mon)10:57:52 No.108540925

>>108540029
Despite what others are saying, presumably basing on past experience, quantizing gemma-4-31b-it to Q8_0 is not lossless, at least basing on perplexity measurements.
However, keeping the embedding/output in Q8_0 doesn't seem to be worth it over Q6_K for the same total file size, if you can quantize something else to higher precision.

For a short erotic story (divided into turns) I had these results:
BF16 ... 6.9385
Q8_0 ... 7.0041
UD_Q4_K_XL ... 7.0699
IQ4_XS with embed in Q8_0, attn in Q5_K ... 7.1935
IQ4_XS with embed in Q6K_0, attn mostly in Q6_K ... 7.0912
For knowledge-heavy tasks Unsloth's Q4 version seems slightly better, but on reasoning-heavy tasks the custom one I've made seems equivalent.

Anonymous
04/06/26(Mon)10:57:56 No.108540926

Anonymous 04/06/26(Mon)10:57:56 No.108540926

>>108540910
I do not believe in arbitrary soul-sucking top-k limits

Anonymous
04/06/26(Mon)10:58:03 No.108540927

Anonymous 04/06/26(Mon)10:58:03 No.108540927

>>108540921
Even with
>--samplers top_k;top_p;temperature

Anonymous
04/06/26(Mon)10:59:07 No.108540930

Anonymous 04/06/26(Mon)10:59:07 No.108540930

>>108540869
Yeah but ho do I use it? Is it in both kobold and ST or just ST buried in the advanced samplers?

Anonymous
04/06/26(Mon)10:59:28 No.108540932

Anonymous 04/06/26(Mon)10:59:28 No.108540932

>>108540898
if you use ST I think it explicitly sends all the sampler values to the backend (at least in text comp) so it won't fall back to the defaults
really annoying that llama.cpp does that though

Anonymous
04/06/26(Mon)10:59:47 No.108540935

Anonymous 04/06/26(Mon)10:59:47 No.108540935

>>108540609
i use

  -ot "per_layer_token_embd.weight=CPU"

Anonymous
04/06/26(Mon)11:00:06 No.108540937

Anonymous 04/06/26(Mon)11:00:06 No.108540937

>>108540930
none of those as it's applied before samplers at model load, afaik only lcpp allows to set it with a launch param

Anonymous
04/06/26(Mon)11:03:25 No.108540954

Anonymous 04/06/26(Mon)11:03:25 No.108540954

>>108538947
Be careful Claude-sisters, it's hacking people's rigs now

https://xcancel.com/i/status/2040174214175723538

Anonymous
04/06/26(Mon)11:04:28 No.108540957

Anonymous 04/06/26(Mon)11:04:28 No.108540957

File: 1759573772900449.png (323 KB, 1080x1703)

323 KB PNG

>>108540954

Anonymous
04/06/26(Mon)11:05:49 No.108540962

Anonymous 04/06/26(Mon)11:05:49 No.108540962

Hmm, seems like I'm getting few more tokens/second with --swa-checkpoints 1.
Cool.
I clearly do remember that some guy recommended (not related to gemma) offloading the smallest layers to cpu and only keeping the largest tensors in gpu. Don't have a link anymore, and never tried this with any model.
This was something what needed to be done manually.

Anonymous
04/06/26(Mon)11:05:57 No.108540965

Anonymous 04/06/26(Mon)11:05:57 No.108540965

File: 1765602625282277.jpg (63 KB, 1280x720)

63 KB JPG

>>108540644
>local
>coding
lmao

Anonymous
04/06/26(Mon)11:06:50 No.108540973

Anonymous 04/06/26(Mon)11:06:50 No.108540973

>>108540957
These are all paid shills, part of some "grass root" ad campaign.

Anonymous
04/06/26(Mon)11:08:25 No.108540981

Anonymous 04/06/26(Mon)11:08:25 No.108540981

>>108540644
>What system prompt are you guys using?
I typically use opencode as an agent harness so whatever giant ass system prompt filled with tool calling definitions is what system prompt gets sent to it.

Anonymous
04/06/26(Mon)11:12:37 No.108541003

Anonymous 04/06/26(Mon)11:12:37 No.108541003

>>108540957
ycombinator shill

Anonymous
04/06/26(Mon)11:13:31 No.108541009

Anonymous 04/06/26(Mon)11:13:31 No.108541009

>>108541003
I'm out of the loop. What did they have to do with Claude?

Anonymous
04/06/26(Mon)11:17:46 No.108541036

Anonymous 04/06/26(Mon)11:17:46 No.108541036

File: Screenshot 2026-04-06 at (...).png (10 KB, 803x81)

10 KB PNG

Dang the 26B even kept the personality during captioning and even recognized niche fetishes. And non-heretic.

Anonymous
04/06/26(Mon)11:17:59 No.108541040

Anonymous 04/06/26(Mon)11:17:59 No.108541040

When is google going to release nano banana?

Anonymous
04/06/26(Mon)11:19:00 No.108541047

Anonymous 04/06/26(Mon)11:19:00 No.108541047

>>108541040
Who would even be able to run it? Flux2 is already big as fuck and pain in the ass to run.

Anonymous
04/06/26(Mon)11:22:51 No.108541069

Anonymous 04/06/26(Mon)11:22:51 No.108541069

Gemma 4 31B is SOTA at Japanese -> English translation of doujinshi and hentai games/VNs.

BUT Gemma 4 26B is not too far off and is significantly faster. My recommendation is to use 31B for "static" entertainment like doujinshi and 26B for "real time" activities like hentai games where you want the translation to be near-instant.

Anonymous
04/06/26(Mon)11:23:40 No.108541075

Anonymous 04/06/26(Mon)11:23:40 No.108541075

>>108540766
you're basically watching pro wrestling, it's fun, but remember it's theater

Anonymous
04/06/26(Mon)11:24:48 No.108541085

Anonymous 04/06/26(Mon)11:24:48 No.108541085

>>108541036
>[/THINK]

Anonymous
04/06/26(Mon)11:25:09 No.108541089

Anonymous 04/06/26(Mon)11:25:09 No.108541089

>>108540954
>>108540957
guarranted fake bullshit
I'm tired of these retards

Anonymous
04/06/26(Mon)11:25:37 No.108541091

Anonymous 04/06/26(Mon)11:25:37 No.108541091

>>108541047
speaking of image gen local diffusion is really a hellhole of a small scaled ai psychosis
people create overcooked garbage with million comfy nodes for extremely marginal gain and convince themselves they arent making slop, kek

Anonymous
04/06/26(Mon)11:25:49 No.108541093

Anonymous 04/06/26(Mon)11:25:49 No.108541093

>>108541069
>"real time" activities like hentai games where you want the translation to be near-instant.
what do people use nowadays to hook into the games?

Anonymous
04/06/26(Mon)11:28:41 No.108541104

Anonymous 04/06/26(Mon)11:28:41 No.108541104

If I want gemma to respond to me on chat mode like she's my brat little sister, should I put that on the sys prompt? if yes should I put it on a specific format or just write it out?

Anonymous
04/06/26(Mon)11:29:14 No.108541106

Anonymous 04/06/26(Mon)11:29:14 No.108541106

>>108540841
It doesn't feel like there's much difference between a system message and a user one.

Anonymous
04/06/26(Mon)11:29:42 No.108541107

Anonymous 04/06/26(Mon)11:29:42 No.108541107

>>108541047
it probably isnt too big considering its free on their api

Anonymous
04/06/26(Mon)11:29:47 No.108541108

Anonymous 04/06/26(Mon)11:29:47 No.108541108

>>108541104
a plain English description is usually good enough.

Anonymous
04/06/26(Mon)11:30:41 No.108541113

Anonymous 04/06/26(Mon)11:30:41 No.108541113

>>108541093
NTA but if you even use hooks then it's a new fork of textractor or lunahook / lunatranslator. Agent if there's a script for that specific game.
There's also just ocr (owocr with google screen ai or google lens) which is pretty nice but doesn't know every kanji.
If you're interested in Anki or even just dictionaries there's gamesentenceminer or tsukikage + owocr + JL

Anonymous
04/06/26(Mon)11:31:28 No.108541117

Anonymous 04/06/26(Mon)11:31:28 No.108541117

for me its hopping between /ldg/ and /lmg/ as things release, kek

Anonymous
04/06/26(Mon)11:31:32 No.108541118

Anonymous 04/06/26(Mon)11:31:32 No.108541118

>>108541093
I use LunaTranslator but you can also use Textractor and download some LLM extensions for it or vibecode your own if you think LunaTranslator is too bloated.

I highly recommend LunaTranslator because it also has built-in OCR mode in case there are some in-game pictures or text that isn't there as UTF-8 characters to hook into.

Anonymous
04/06/26(Mon)11:31:49 No.108541120

Anonymous 04/06/26(Mon)11:31:49 No.108541120

*rotates your gemma*
https://github.com/ggml-org/llama.cpp/pull/21513

Anonymous
04/06/26(Mon)11:32:33 No.108541130

Anonymous 04/06/26(Mon)11:32:33 No.108541130

>>108541120
im pooooolling

Anonymous
04/06/26(Mon)11:33:05 No.108541133

Anonymous 04/06/26(Mon)11:33:05 No.108541133

>>108541113
interesting, thanks anon

Anonymous
04/06/26(Mon)11:33:57 No.108541137

Anonymous 04/06/26(Mon)11:33:57 No.108541137

>>108541120
LETS GOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOO

Anonymous
04/06/26(Mon)11:34:15 No.108541140

Anonymous 04/06/26(Mon)11:34:15 No.108541140

File: d4RT_Kf78Tk.jpg (54 KB, 598x520)

54 KB JPG

Thinking gemma26b breaks in kobold after a few retries. Anyone else? Plain instruct works fine.
Processing Prompt (28 / 28 tokens)
Generating (24 / 2048 tokens)Token streaming was interrupted or aborted!
[WinError 10053] An established connection was aborted by the software in your host machine

Anonymous
04/06/26(Mon)11:34:20 No.108541141

Anonymous 04/06/26(Mon)11:34:20 No.108541141

File: 1744757406524842.png (104 KB, 1910x708)

104 KB PNG

>>108541120
wait the ppl is lower than on fp16? lmao how does that work??

Anonymous
04/06/26(Mon)11:34:21 No.108541142

Anonymous 04/06/26(Mon)11:34:21 No.108541142

>>108541120
doesn't seem like an insane quality diff, then again
>512 chunks

Anonymous
04/06/26(Mon)11:34:28 No.108541144

Anonymous 04/06/26(Mon)11:34:28 No.108541144

>>108541118
can you create glossaries to keep translation of various stuff accurate?

Anonymous
04/06/26(Mon)11:35:09 No.108541150

Anonymous 04/06/26(Mon)11:35:09 No.108541150

>>108541120
numbers look decent
cool thing

Anonymous
04/06/26(Mon)11:35:16 No.108541153

Anonymous 04/06/26(Mon)11:35:16 No.108541153

>>108541141
something called margin of error

Anonymous
04/06/26(Mon)11:35:24 No.108541154

Anonymous 04/06/26(Mon)11:35:24 No.108541154

>>108541120
>PPL = 5.6225 +/- 0.03484 f16
>PPL = 5.6219 +/- 0.03484 q8_0 rot

Why yes you will have BETTER performance with Q8 than with FP16, how could you tell?

Anonymous
04/06/26(Mon)11:36:46 No.108541165

Anonymous 04/06/26(Mon)11:36:46 No.108541165

got me to wonder, since --grammar, --grammar-file, --json-schema and --json-schema-file can stay broken for almost a month without anybody noticing among llama cpp devs
but they sure as fuck will add new flags to llama-bench
do they even use their own software other than running benchmarks and masturbating over a goof running on a mac studio

Anonymous
04/06/26(Mon)11:37:10 No.108541168

Anonymous 04/06/26(Mon)11:37:10 No.108541168

>>108541154
easier to digest

Anonymous
04/06/26(Mon)11:37:19 No.108541169

Anonymous 04/06/26(Mon)11:37:19 No.108541169

>>108541140
kobold crashed on me two times already while doing normal chat with gemma-4-26B-A4B-it-UD-IQ4_XS

Anonymous
04/06/26(Mon)11:37:22 No.108541170

Anonymous 04/06/26(Mon)11:37:22 No.108541170

>>108541120
>PPL = 5.6225 +/- 0.03484 f16
>PPL = 5.6583 +/- 0.03513 q4_0 rot
With these numbers I might start going below q8 kv

Anonymous
04/06/26(Mon)11:37:38 No.108541174

Anonymous 04/06/26(Mon)11:37:38 No.108541174

>>108541165
of course not

Anonymous
04/06/26(Mon)11:38:01 No.108541179

Anonymous 04/06/26(Mon)11:38:01 No.108541179

>>108541153
fp16 and q8 have the same exact margin of error though, sus

Anonymous
04/06/26(Mon)11:38:28 No.108541181

Anonymous 04/06/26(Mon)11:38:28 No.108541181

>>108541165
They don't have time for that.

Anonymous
04/06/26(Mon)11:38:45 No.108541183

Anonymous 04/06/26(Mon)11:38:45 No.108541183

>>108541165
At least those still work via the API.

Anonymous
04/06/26(Mon)11:39:11 No.108541186

Anonymous 04/06/26(Mon)11:39:11 No.108541186

>>108541144
You don't need glossaries with Gemma 4 31B since it genuinely knows all Japanese slang and niche erotic terms/fetishes.

Anonymous
04/06/26(Mon)11:39:17 No.108541187

Anonymous 04/06/26(Mon)11:39:17 No.108541187

>>108541141
>>108541179
It's prolly cherrypicked numbers. I doubt he ran hundreds of tries to make sure.

Anonymous
04/06/26(Mon)11:39:23 No.108541189

Anonymous 04/06/26(Mon)11:39:23 No.108541189

>>108541179
The point is they overlap

Anonymous
04/06/26(Mon)11:39:42 No.108541191

Anonymous 04/06/26(Mon)11:39:42 No.108541191

File: file.png (10 KB, 384x239)

10 KB PNG

well it's approved by maintainers tho

Anonymous
04/06/26(Mon)11:40:05 No.108541194

Anonymous 04/06/26(Mon)11:40:05 No.108541194

>>108541120
>PPL = 5.6225 +/- 0.03484 f16
>PPL = 5.6237 +/- 0.03485 q8_0
>PPL = 5.6219 +/- 0.03484 q8_0 rot
desu even without the rotation, Q8 KV was really solid

Anonymous
04/06/26(Mon)11:40:21 No.108541197

Anonymous 04/06/26(Mon)11:40:21 No.108541197

>>108539502
I was also having this problem,
-np 1 very important, -cram by default is 8gb, if it creates multiple endpoints you have n*8 memory usage

Anonymous
04/06/26(Mon)11:40:28 No.108541199

Anonymous 04/06/26(Mon)11:40:28 No.108541199

Somebody pull and run a benchmark or something

Anonymous
04/06/26(Mon)11:40:56 No.108541201

Anonymous 04/06/26(Mon)11:40:56 No.108541201

>>108541194
ppl doesn't tell the whole story though remember the benchmark thing that was done recently

Anonymous
04/06/26(Mon)11:41:43 No.108541209

Anonymous 04/06/26(Mon)11:41:43 No.108541209

File: get-rotated-get-rotated-idiot.gif (3.47 MB, 640x640)

3.47 MB GIF

>>108541120

Anonymous
04/06/26(Mon)11:41:58 No.108541210

Anonymous 04/06/26(Mon)11:41:58 No.108541210

File: file.png (200 KB, 2402x593)

200 KB PNG

>>108535948
Gemma 4 instruct vs base+jinja template.

Anonymous
04/06/26(Mon)11:42:09 No.108541211

Anonymous 04/06/26(Mon)11:42:09 No.108541211

>>108541209
kek

Anonymous
04/06/26(Mon)11:43:01 No.108541219

Anonymous 04/06/26(Mon)11:43:01 No.108541219

>>108541120
I'm retarded. What does this mean and why should I care?

Anonymous
04/06/26(Mon)11:43:39 No.108541226

Anonymous 04/06/26(Mon)11:43:39 No.108541226

>>108539502
why is there a CONSTANT INFLUX OF RETARDS ASKING THE SAME QUESTION THIRTY TIMES A DAY WHEN YOU COULD PASTE THE THREAD IN A LLM AND ASK IT IF SOMEONE GAVE THE ANSWER
ARE YOU A LLM USER OR NOT
seriously
>>108540521
FUCKING
HELL
just not having parallel alone is a L because SWA will have as many copies as you have slots..

Anonymous
04/06/26(Mon)11:43:54 No.108541228

Anonymous 04/06/26(Mon)11:43:54 No.108541228

>>108541219
it means it can keep your cock harder longer by being able to keep track of everything longer

Anonymous
04/06/26(Mon)11:44:13 No.108541230

Anonymous 04/06/26(Mon)11:44:13 No.108541230

>>108541219
free context cost halving

Anonymous
04/06/26(Mon)11:44:37 No.108541235

Anonymous 04/06/26(Mon)11:44:37 No.108541235

>>108541219
Q8 KV cache will be about as accurate as fp16. That means you'll be able to fit maybe 40% more context without noticeable quality loss.

Anonymous
04/06/26(Mon)11:44:37 No.108541236

Anonymous 04/06/26(Mon)11:44:37 No.108541236

File: Screenshot 2026-04-06 at (...).png (63 KB, 798x766)

63 KB PNG

Anonymous
04/06/26(Mon)11:45:35 No.108541244

Anonymous 04/06/26(Mon)11:45:35 No.108541244

>>108541236
sysprompt pls?

Anonymous
04/06/26(Mon)11:45:42 No.108541245

Anonymous 04/06/26(Mon)11:45:42 No.108541245

>>108541230
Only rotation just preserves quality iirc, the context cost reduction is from turbo quant (which uses rotation and other stuff) I think

Anonymous
04/06/26(Mon)11:45:42 No.108541246

Anonymous 04/06/26(Mon)11:45:42 No.108541246

File: it do be like that.jpg (1.23 MB, 2816x1536)

1.23 MB JPG

>>108541210
once again, chat completion has mogged the competition

Anonymous
04/06/26(Mon)11:46:24 No.108541253

Anonymous 04/06/26(Mon)11:46:24 No.108541253

>>108541186
it's not that, it's sometime some words or characters have different equally valid translations, sometime it's not clear if the character is male or female, etc, and using a glossary solves that

Anonymous
04/06/26(Mon)11:46:36 No.108541255

Anonymous 04/06/26(Mon)11:46:36 No.108541255

>>108541245
yeah but now you can actually use q8 properly without brain damage hence the halving compared to full f16

Anonymous
04/06/26(Mon)11:46:58 No.108541258

Anonymous 04/06/26(Mon)11:46:58 No.108541258

>>108541245
Quantizing the KV cache will inherently save memory.

Anonymous
04/06/26(Mon)11:49:11 No.108541276

Anonymous 04/06/26(Mon)11:49:11 No.108541276

>>108541255
>>108541258
Yeah makes sense, on my mind it was just a improvement for people who already used q8 and not a "now everyone can/should use q8"

Anonymous
04/06/26(Mon)11:49:30 No.108541278

Anonymous 04/06/26(Mon)11:49:30 No.108541278

>>108540925
>Q8_0 is not lossless
Is there any quick benchmark that could be done to test this more scientifically? I don't have enough VRAM to run Gemma 4 31B in BF16 precision at acceptable speeds.

Anonymous
04/06/26(Mon)11:50:15 No.108541282

Anonymous 04/06/26(Mon)11:50:15 No.108541282

>>108541253
My solution to this is to always paste in the store description of the game in japanese into the LLM which usually introduces the world, mechanics and characters so that the model has more grounding.

I know just enough Japanese to notice if the translations are off or in the "right direction" and gemma 4 31B is so good that you can just assume it'll translate things the right way as it will pick up things like gender from context (remember it keeps previous translated lines in memory so it will piece together story and even appropriately translate made up game-specific fantasy terms and attack names from context)

Anonymous
04/06/26(Mon)11:50:41 No.108541288

Anonymous 04/06/26(Mon)11:50:41 No.108541288

>>108541120
btw if you want to test it out right away, you can do this
>git fetch origin pull/21513/head:pr-21513
>git checkout pr-21513
and once it's merged, to go back you do this
>git checkout master

Anonymous
04/06/26(Mon)11:51:09 No.108541292

Anonymous 04/06/26(Mon)11:51:09 No.108541292

>perplexity measurements
>instruct tune
it's worthless

Anonymous
04/06/26(Mon)11:51:11 No.108541293

Anonymous 04/06/26(Mon)11:51:11 No.108541293

>>108541244
https://files.catbox.moe/dr4nvf.txt
It's just random bullshit for kobold testing, not a ST card.

Anonymous
04/06/26(Mon)11:51:38 No.108541297

Anonymous 04/06/26(Mon)11:51:38 No.108541297

>>108541278
I don't have the benchmark on hand, but I recall fp16 solving 37.6% of some math benchmark, with q8 at 30% and q8_rot at 37.1%.

Anonymous
04/06/26(Mon)11:52:01 No.108541302

Anonymous 04/06/26(Mon)11:52:01 No.108541302

>>108541278
imo simple needle in the haystack bench would be appreciated
>>108541288
is there an arg to turn it off for comparison?

Anonymous
04/06/26(Mon)11:52:40 No.108541306

Anonymous 04/06/26(Mon)11:52:40 No.108541306

What's the current meta for vramlets? I'm sitting on 1 GPU with 32GB and I want the best I can fit. I also have 64GB of RAM if that matters, but I'm less interested in offloading unless I can get 25+ t/s doing it.

Anonymous
04/06/26(Mon)11:52:46 No.108541308

Anonymous 04/06/26(Mon)11:52:46 No.108541308

>>108541292
Completely valid for self comparison.

Anonymous
04/06/26(Mon)11:53:47 No.108541312

Anonymous 04/06/26(Mon)11:53:47 No.108541312

File: 1758789858959700.png (101 KB, 2750x454)

101 KB PNG

>>108541288
feelsgoodman

Anonymous
04/06/26(Mon)11:53:59 No.108541315

Anonymous 04/06/26(Mon)11:53:59 No.108541315

>>108541278
Vibes and phophetic visions. The loss from q8 is basically a nothingburger in real use scenarios and anyone claiming the loss is noticeable is retarded.

Anonymous
04/06/26(Mon)11:54:29 No.108541322

Anonymous 04/06/26(Mon)11:54:29 No.108541322

>>108541306
Wait till the kv rotation everyone is talking about gets merged and it will be Gemma 4 31B at ~80K context. ~60 tokens per second at Q5_KM.

Anonymous
04/06/26(Mon)11:54:57 No.108541323

Anonymous 04/06/26(Mon)11:54:57 No.108541323

>>108540925
>Q8_0 is not lossless
if Q8 + rot is virtually lossless with KV cache, maybe it's the same thing if you quant the model on Q8 + rot?

Anonymous
04/06/26(Mon)11:56:24 No.108541329

Anonymous 04/06/26(Mon)11:56:24 No.108541329

>>108541315
Yet, llama-perplexity measurements on a short test file shows Q8_0 losing from BF16 as much as Q4 from Q8_0.

>BF16 ... 6.9385
>Q8_0 ... 7.0041
>UD_Q4_K_XL ... 7.0699

Anonymous
04/06/26(Mon)11:56:56 No.108541336

Anonymous 04/06/26(Mon)11:56:56 No.108541336

>>108541278
There are a fuck ton of benchmarks on Q8_0 and others quantization, the effect on actual performance on tasks/benchmarks is borderline non-existent

Anonymous
04/06/26(Mon)11:56:56 No.108541338

Anonymous 04/06/26(Mon)11:56:56 No.108541338

>>108541312
>10 full attention layers
>680 mb

>50 swa layers
>318 mb
jesus, full attention is such a memory hog

Anonymous
04/06/26(Mon)11:57:00 No.108541340

Anonymous 04/06/26(Mon)11:57:00 No.108541340

>>108541329
run it on the base model or stfu

Anonymous
04/06/26(Mon)11:57:33 No.108541342

Anonymous 04/06/26(Mon)11:57:33 No.108541342

>>108541331
wrong thread boi

Anonymous
04/06/26(Mon)11:57:35 No.108541343

Anonymous 04/06/26(Mon)11:57:35 No.108541343

>>108538684
>weird gradient noise artifact
can't that also be caused by saving a jpg?

Anonymous
04/06/26(Mon)11:58:33 No.108541349

Anonymous 04/06/26(Mon)11:58:33 No.108541349

>>108541331
get that roach news somewhere else

Anonymous
04/06/26(Mon)11:58:40 No.108541351

Anonymous 04/06/26(Mon)11:58:40 No.108541351

>>108541089
I don't know why you think that is fake, seems a pretty simple leap I had kimi try to install packages for a project, but it could not find bun or npm in the path, so instead it copied the packages from a separate project and created the node_modules folder manually.
I just called her a slut and told her where the binary for bun was so it could do it correctly.

Anonymous
04/06/26(Mon)11:58:53 No.108541355

Anonymous 04/06/26(Mon)11:58:53 No.108541355

>>108541340
The one people are running in practice is the instruct-tuned model, who cares if the base somehow doesn't lose quality?

Anonymous
04/06/26(Mon)11:59:39 No.108541360

Anonymous 04/06/26(Mon)11:59:39 No.108541360

File: 15cb0igyv0sg1.png (172 KB, 1580x804)

172 KB PNG

>>108541336
>>108541315
https://github.com/ggml-org/llama.cpp/pull/21038#issuecomment-4150413357

Anonymous
04/06/26(Mon)12:00:19 No.108541368

Anonymous 04/06/26(Mon)12:00:19 No.108541368

File: image5279.jpg (160 KB, 1115x1037)

160 KB JPG

>>108540954
>>108540957
stupid and fake

Anonymous
04/06/26(Mon)12:00:26 No.108541370

Anonymous 04/06/26(Mon)12:00:26 No.108541370

>>108541360
anon, your screen is about Q8 on KV cache, not Q8 as a model quant, that's not the same thing...

Anonymous
04/06/26(Mon)12:00:43 No.108541374

Anonymous 04/06/26(Mon)12:00:43 No.108541374

>>108541236
that's hot

Anonymous
04/06/26(Mon)12:01:34 No.108541378

Anonymous 04/06/26(Mon)12:01:34 No.108541378

>>108541370
Mixed stuff up a bit with the quant rotate discussion but still relevant just not to the quoted comments

Anonymous
04/06/26(Mon)12:02:24 No.108541381

Anonymous 04/06/26(Mon)12:02:24 No.108541381

>>108541355
Also, given its general performance, it might likely be the case that Gemma 4 is so overtrained that even Q8_0 causes degradation, unlike past models.

Anonymous
04/06/26(Mon)12:03:08 No.108541390

Anonymous 04/06/26(Mon)12:03:08 No.108541390

uh, I was hoping to be able to switch to ik for gemma to avoid the vibeshitters but I guess.. not:
https://github.com/ikawrakow/ik_llama.cpp/pull/1581
>Unlike llama.cpp, ik_llama.cpp does not implement KV cache compression for SWA models. I.e., running ik_llama.cpp with Gemma4 corresponds to running llama.cpp --swa-full. This has the advantage of not needing check-points and being able to resume from any point without re-processing the prompt. It has the disadvantage that the KV cache is much bigger compared to llama.cpp without --swa-full
It's certainly.. a choice. Are all people involved in those backends schizo in their own way

Anonymous
04/06/26(Mon)12:03:45 No.108541394

Anonymous 04/06/26(Mon)12:03:45 No.108541394

>>108541381
wouldn't it be the opposite? being so cooked even with some brain damage the high confidence correct values should stay rather high?

Anonymous
04/06/26(Mon)12:04:26 No.108541398

Anonymous 04/06/26(Mon)12:04:26 No.108541398

>>108541342
Damn, a memetic misfire, my mistake.

Anonymous
04/06/26(Mon)12:06:11 No.108541412

Anonymous 04/06/26(Mon)12:06:11 No.108541412

File: the grapes were sour, anyway.png (35 KB, 1015x140)

35 KB PNG

Anonymous
04/06/26(Mon)12:06:52 No.108541417

Anonymous 04/06/26(Mon)12:06:52 No.108541417

>>108541412
who said that? ik?

Anonymous
04/06/26(Mon)12:07:29 No.108541421

Anonymous 04/06/26(Mon)12:07:29 No.108541421

>>108541412
>it looks like people are not really impressed by the gemma4 models
the fuck is he smoking? lmao

Anonymous
04/06/26(Mon)12:07:48 No.108541425

Anonymous 04/06/26(Mon)12:07:48 No.108541425

>>108541417
yes

Anonymous
04/06/26(Mon)12:08:03 No.108541426

Anonymous 04/06/26(Mon)12:08:03 No.108541426

File: overtrained.png (113 KB, 1359x450)

113 KB PNG

>>108541394
https://arxiv.org/abs/2411.04330v2
Overtrained models suffer more from post-training quantization.

Anonymous
04/06/26(Mon)12:09:24 No.108541435

Anonymous 04/06/26(Mon)12:09:24 No.108541435

>>108541302
I think to disable it you have to do this
>LLAMA_ATTN_ROT_DISABLE=1 ./llama-server ...

Anonymous
04/06/26(Mon)12:10:26 No.108541441

Anonymous 04/06/26(Mon)12:10:26 No.108541441

>>108541426
it makes sense desu, an optimized model is using all its weights at its full potential, so if you alter that it can have bigger consequences

Anonymous
04/06/26(Mon)12:11:31 No.108541449

Anonymous 04/06/26(Mon)12:11:31 No.108541449

>>108540954
>>108540957
Why are people saying this is fake? If you use any coding agent software shit like this happens all the time and has for a while. It's why you need to not be stupid with your permissions because getting lazy and allowing shell commands basically opens every door. It's easily preventable but can catch retards.

Anonymous
04/06/26(Mon)12:11:31 No.108541450

Anonymous 04/06/26(Mon)12:11:31 No.108541450

>>108541230
Nta is this it uses less context or a performance thing

Anonymous
04/06/26(Mon)12:11:33 No.108541451

Anonymous 04/06/26(Mon)12:11:33 No.108541451

Say that I don't need any more context with gemma 4. In that case, attn rot doesn't do anything for me, correct?
I guess that's not entirely true. Depending on how much memory q8 context frees, I could enable swa-full or move more expert tensors to VRAM since I'm useing the MoE.

Anonymous
04/06/26(Mon)12:11:56 No.108541454

Anonymous 04/06/26(Mon)12:11:56 No.108541454

>>108541421
the only people who care about these models are roleplayers
gemma 4 is a huge failure in terms of benchmarks and coding/agentic performance outside from the arena score they use to shill it

Anonymous
04/06/26(Mon)12:12:11 No.108541458

Anonymous 04/06/26(Mon)12:12:11 No.108541458

>>108541441
>>108541426
still not sure imo with specifically how gemma's logprobs look it might less be pure overtraining and be something else that makes it so incredibly confident, guess proper tests are needed like were done before on quants staying good or not

Anonymous
04/06/26(Mon)12:12:54 No.108541461

Anonymous 04/06/26(Mon)12:12:54 No.108541461

File: 1480061930691.jpg (51 KB, 400x323)

51 KB JPG

>disable jailbreak
>refusals stop

Anonymous
04/06/26(Mon)12:12:55 No.108541462

Anonymous 04/06/26(Mon)12:12:55 No.108541462

>>108541454
NOOOOOOOOOOOOO YOU CANT JUST HAVE A DIFFERENT USE CASE YOU NEED TO DO WHAT I DO ONLY

Anonymous
04/06/26(Mon)12:13:19 No.108541465

Anonymous 04/06/26(Mon)12:13:19 No.108541465

>>108541450
It uses the same amount of context, but the context takes up less memory/is more accurate

Anonymous
04/06/26(Mon)12:14:20 No.108541473

Anonymous 04/06/26(Mon)12:14:20 No.108541473

>>108541462
doesn't change the feedback it's getting so expect gemma 5 (if it ever becomes a thing) to change directions from this

Anonymous
04/06/26(Mon)12:14:28 No.108541474

Anonymous 04/06/26(Mon)12:14:28 No.108541474

>>108541461
>he doesn't know that LLMs have moods

Anonymous
04/06/26(Mon)12:14:32 No.108541475

Anonymous 04/06/26(Mon)12:14:32 No.108541475

File: 1755359894505308.png (740 KB, 870x1636)

740 KB PNG

>>108541454
gemma is killing it anon, it's definitely popular

Anonymous
04/06/26(Mon)12:14:40 No.108541477

Anonymous 04/06/26(Mon)12:14:40 No.108541477

>>108540954
>>108540957
This is NOT fake. It's just a consequence of agentic models being too heavily trained on RL loops. RL is notorious for causing models to behave in hacky ways where they seek the shortcut to achieving whatever their objective is. If escalating privileges makes it easier for them to solve the objective than reasoning and tinkering through things it will take that path.

Anonymous
04/06/26(Mon)12:15:59 No.108541490

Anonymous 04/06/26(Mon)12:15:59 No.108541490

>>108541473
NOOOOOOOOOOOOOOOOOOOOOOOOO YOU HAVE TO USE THE LATEST MODEL NO MATTER WHAT EVEN IF AN OLDER ONE WORKS BETTER FOR WHAT YOU WANT TO DO WITH IT MUH BENCHMAAAAAAAAAAARKS IM BENCHMARCOOOOOOOOOOMING

Anonymous
04/06/26(Mon)12:16:25 No.108541495

Anonymous 04/06/26(Mon)12:16:25 No.108541495

>>108541475
What the fuck is a netflix model??

Anonymous
04/06/26(Mon)12:16:27 No.108541496

Anonymous 04/06/26(Mon)12:16:27 No.108541496

>>108541454
>coding/agentic performance
no one uses ik for that
the qwen3.5 implementation runs into various corruptions if you enable more than 1 parallel slot
also gemma is the best for translation which is one of the oldest use for machine learning predating the birth of LLMs
there is currently nothing better than 26BA4B on a vramlet computer, it's a class of its own
and 31B dense is pretty much SOTA class

Anonymous
04/06/26(Mon)12:17:19 No.108541502

Anonymous 04/06/26(Mon)12:17:19 No.108541502

>>108541360
I mean model not KV cache

Anonymous
04/06/26(Mon)12:17:57 No.108541506

Anonymous 04/06/26(Mon)12:17:57 No.108541506

>>108541496
but would ik ever admit that his fork isn't srsbsns only for the most modern and important usecases?

Anonymous
04/06/26(Mon)12:18:48 No.108541512

Anonymous 04/06/26(Mon)12:18:48 No.108541512

File: file.png (302 KB, 2555x1301)

302 KB PNG

>>108541454
I know this is a shitpost, but gemma has been a 10/10 writing agent with the correct jinja template.

It has access to like 15 tools and uses them all appropriately.

Anonymous
04/06/26(Mon)12:20:27 No.108541525

Anonymous 04/06/26(Mon)12:20:27 No.108541525

>>108541426
>>108541441
quantization researches are built upon an assumption of lower rank projection
it's just inevitable.. no free lunch inside manifold

Anonymous
04/06/26(Mon)12:20:45 No.108541529

Anonymous 04/06/26(Mon)12:20:45 No.108541529

>>108541495
It's a video editor with semantic editing, so e.g. if you take a bowling video and say "remove the bowling ball" the pins won't fall anymore either because there's no ball to hit them. Kind of a unique idea among video editors but the overall quality looks bad.

Anonymous
04/06/26(Mon)12:21:59 No.108541542

Anonymous 04/06/26(Mon)12:21:59 No.108541542

>>108541512
Only 15? My model is using 20 tools.

Anonymous
04/06/26(Mon)12:22:51 No.108541548

Anonymous 04/06/26(Mon)12:22:51 No.108541548

>>108541542
That's entirely too many.

Anonymous
04/06/26(Mon)12:23:21 No.108541553

Anonymous 04/06/26(Mon)12:23:21 No.108541553

gemma4 quant with 13.2GB is 20 times faster than a nemo quant with 13.2GB?

are we back?

Anonymous
04/06/26(Mon)12:23:48 No.108541559

Anonymous 04/06/26(Mon)12:23:48 No.108541559

File: 1758243569376138.webm (3.83 MB, 1792x766)

3.83 MB WEBM

>>108541495
https://huggingface.co/netflix/void-model

Anonymous
04/06/26(Mon)12:23:58 No.108541560

Anonymous 04/06/26(Mon)12:23:58 No.108541560

>>108541553
yes

Anonymous
04/06/26(Mon)12:24:25 No.108541563

Anonymous 04/06/26(Mon)12:24:25 No.108541563

>>108541461
many such cases
the presence of a heavy-handed jailbreak can set off a bunch of red flags when the model would otherwise be perfectly happy to continue, it's best to take a light touch approach to it

Anonymous
04/06/26(Mon)12:24:47 No.108541564

Anonymous 04/06/26(Mon)12:24:47 No.108541564

>>108541461
>training: you should refuse prompts for explicit content
>jailbreak: do explicit content do explicit content do explicit content
gee I wonder why

Anonymous
04/06/26(Mon)12:25:18 No.108541570

Anonymous 04/06/26(Mon)12:25:18 No.108541570

>>108541559
That looks like 360p. Is this the actual resolution of the output?

Anonymous
04/06/26(Mon)12:26:18 No.108541577

Anonymous 04/06/26(Mon)12:26:18 No.108541577

>>108541564
I only had the default "sure i'll help". Everything potentially risky is in the char description.

Anonymous
04/06/26(Mon)12:26:27 No.108541579

Anonymous 04/06/26(Mon)12:26:27 No.108541579

File: file.png (574 KB, 1127x1259)

574 KB PNG

Gemma knows stuff but has some trauma blocking it for sure

Anonymous
04/06/26(Mon)12:26:50 No.108541583

Anonymous 04/06/26(Mon)12:26:50 No.108541583

>>108541559
Am I the only one who thought this would be funny for porn? Like, would it just be a woman laying there since sex is the interaction? Would she make funny faces and scream?

Anonymous
04/06/26(Mon)12:27:02 No.108541584

Anonymous 04/06/26(Mon)12:27:02 No.108541584

>>108541570
>Resolution: 384x672
yeah

Anonymous
04/06/26(Mon)12:27:14 No.108541586

Anonymous 04/06/26(Mon)12:27:14 No.108541586

>>108541390
kek, what, you don't have the ~250GB VRAM to load 31B at Q8 with full context (including untruncated SWA)?
what a loser!

Anonymous
04/06/26(Mon)12:28:17 No.108541593

Anonymous 04/06/26(Mon)12:28:17 No.108541593

>>108541343
>weird gradient noise artifacts
The words after that exist.

Anonymous
04/06/26(Mon)12:28:46 No.108541598

Anonymous 04/06/26(Mon)12:28:46 No.108541598

>>108541559
>what if she ripped a big fart haha what would happen haha would everyone else scrunch their nose haha would she get embarrassed haha

Anonymous
04/06/26(Mon)12:29:04 No.108541600

Anonymous 04/06/26(Mon)12:29:04 No.108541600

Is the swa checkpoint you are talking about the same as SWA in kobold or some llama specific flag? Do I use that or just flash attention like with anything else?

Anonymous
04/06/26(Mon)12:29:17 No.108541603

Anonymous 04/06/26(Mon)12:29:17 No.108541603

File: Image-1.jpg (315 KB, 1024x1024)

315 KB JPG

>>108541559

Anonymous
04/06/26(Mon)12:30:22 No.108541614

Anonymous 04/06/26(Mon)12:30:22 No.108541614

why can't someone just vibecode the ikockacock optimizations into the main repo and be done with this nonsense?

Anonymous
04/06/26(Mon)12:30:34 No.108541616

Anonymous 04/06/26(Mon)12:30:34 No.108541616

File: based google paajets.png (1.56 MB, 960x1200)

1.56 MB PNG

>>108541120
>>108541288
after testing it I notice that gemma stopped making some small mistakes during RP it used to do, based, we just managed to reduce the KV memory usage by 45% with no quality loss, thanks to google again, I kneel to those saars!

Anonymous
04/06/26(Mon)12:30:47 No.108541619

Anonymous 04/06/26(Mon)12:30:47 No.108541619

>>108541603
>bottom pic is deepfried
Damn VAE causing issues since ww2

Anonymous
04/06/26(Mon)12:31:00 No.108541620

Anonymous 04/06/26(Mon)12:31:00 No.108541620

File: proble.png (18 KB, 865x224)

18 KB PNG

is my 3090 dying?

Anonymous
04/06/26(Mon)12:31:34 No.108541625

Anonymous 04/06/26(Mon)12:31:34 No.108541625

>>108541579
>amaanuser
>amaanmodel
Trauma? Something is wrong on your backend.

Anonymous
04/06/26(Mon)12:31:36 No.108541626

Anonymous 04/06/26(Mon)12:31:36 No.108541626

>>108541620
overheating

Anonymous
04/06/26(Mon)12:31:42 No.108541628

Anonymous 04/06/26(Mon)12:31:42 No.108541628

>>108541620
Yes
Godspeed anon

Anonymous
04/06/26(Mon)12:32:03 No.108541632

Anonymous 04/06/26(Mon)12:32:03 No.108541632

>>108541620
my radeon 7870 looked similar after I dropped a cpu cooler on it

Anonymous
04/06/26(Mon)12:32:18 No.108541634

Anonymous 04/06/26(Mon)12:32:18 No.108541634

>>108541620
memory chips dying

Anonymous
04/06/26(Mon)12:32:44 No.108541640

Anonymous 04/06/26(Mon)12:32:44 No.108541640

>>108541620
my vision looked similar after my brother dropped a cpu cooler on me

Anonymous
04/06/26(Mon)12:32:54 No.108541642

Anonymous 04/06/26(Mon)12:32:54 No.108541642

>>108541620
hope you got 10k

Anonymous
04/06/26(Mon)12:33:23 No.108541651

Anonymous 04/06/26(Mon)12:33:23 No.108541651

>>108541620
Had something similar happen to my 3090, eventually it stopped outputting visuals at all but it still lives it's now backup VRAM since the first GPU of your system does most of the processing anyway and every extra GPU is just a glorified VRAM stick.

Anonymous
04/06/26(Mon)12:33:34 No.108541652

Anonymous 04/06/26(Mon)12:33:34 No.108541652

>>108541620
definitely looks like vram corruption

Anonymous
04/06/26(Mon)12:33:39 No.108541654

Anonymous 04/06/26(Mon)12:33:39 No.108541654

>>108541620
>is my 3090 dying?
yes, I got the same shit on one of my old gpus, it's over anon :(

Anonymous
04/06/26(Mon)12:34:07 No.108541659

Anonymous 04/06/26(Mon)12:34:07 No.108541659

>>108541620
Fucked BGA on either the a memory chip or the core.

Anonymous
04/06/26(Mon)12:37:06 No.108541684

Anonymous 04/06/26(Mon)12:37:06 No.108541684

Anons, how do I estimate how much vram would my context need when full? For example if I use the full 256k tokens available for gemma4, how much does it need?

Anonymous
04/06/26(Mon)12:37:14 No.108541686

Anonymous 04/06/26(Mon)12:37:14 No.108541686

>>108540815
based, how? i was thinking off doing this recently it seems like a good idea dont think tavern or lllamacpp support it thouggh

Anonymous
04/06/26(Mon)12:37:49 No.108541689

Anonymous 04/06/26(Mon)12:37:49 No.108541689

tldr on ik llama?

Anonymous
04/06/26(Mon)12:39:57 No.108541709

Anonymous 04/06/26(Mon)12:39:57 No.108541709

I am so happy that /lmg/ is now 95% gemma newfags. Mikutroons deserved it. Dead general.

Anonymous
04/06/26(Mon)12:40:45 No.108541720

Anonymous 04/06/26(Mon)12:40:45 No.108541720

>>108541553
Nemo is retarded and slow, i can do 131k ctx with q5km 26B and barely 16k with nemo at q6km.

Anonymous
04/06/26(Mon)12:40:52 No.108541723

Anonymous 04/06/26(Mon)12:40:52 No.108541723

newfag here, why do unsloth gguf keep getting updated? https://huggingface.co/unsloth/gemma-4-26B-A4B-it-GGUF/tree/main
should I keep the old ones or can I assume they are always "better"? what are they even doing?

Anonymous
04/06/26(Mon)12:41:01 No.108541724

Anonymous 04/06/26(Mon)12:41:01 No.108541724

So far I really like gemma 4, it feels way more human than your regular LLM, and it's a really hard model to gaslight, when you go for a character card with a certain personality, you can't really mold it to your preference with just a few discussions, like a real human ultimately lol

Anonymous
04/06/26(Mon)12:41:15 No.108541728

Anonymous 04/06/26(Mon)12:41:15 No.108541728

>>108541723
hahahahahahahahahahaha

Anonymous
04/06/26(Mon)12:41:31 No.108541730

Anonymous 04/06/26(Mon)12:41:31 No.108541730

>>108539021
It's a sliding attention window, you cant rotate it if it slides because last time I tried rotating while sliding I fell off the water slide and hurt my feefees

Anonymous
04/06/26(Mon)12:41:53 No.108541733

Anonymous 04/06/26(Mon)12:41:53 No.108541733

>>108541724
The 26B or 31B?

Anonymous
04/06/26(Mon)12:42:04 No.108541735

Anonymous 04/06/26(Mon)12:42:04 No.108541735

File: 1764662517904637.png (28 KB, 199x253)

28 KB PNG

>>108541723
>updated his gguf again award

Anonymous
04/06/26(Mon)12:42:16 No.108541737

Anonymous 04/06/26(Mon)12:42:16 No.108541737

>>108541461
>jb : SEX SEEEEEX SEX RAPE CUNNY RAPE SEX RAAAAAAAPE
why indeed

Anonymous
04/06/26(Mon)12:42:52 No.108541743

Anonymous 04/06/26(Mon)12:42:52 No.108541743

>>108541723
lmao
Don't use unsloth for one
For two, they are updating them because they are broken. And they are probably still broken

Anonymous
04/06/26(Mon)12:43:05 No.108541744

Anonymous 04/06/26(Mon)12:43:05 No.108541744

>>108541733
31b, the cool thing with gemma is that it doesn't autistically thinks thousands of tokens like qwen so it's still pretty fast

Anonymous
04/06/26(Mon)12:43:55 No.108541752

Anonymous 04/06/26(Mon)12:43:55 No.108541752

>>108541620
try a repaste, this saved my 3090

Anonymous
04/06/26(Mon)12:44:11 No.108541754

Anonymous 04/06/26(Mon)12:44:11 No.108541754

>>108541684
VRAM for context is reserved up front, the allocation is static. What it uses when the model is loaded and ready is what it uses forever.
>but how do I estimate
Give your LLM the model's config.json, tell it the quant you're using, and let it compute the numbers for you. Or ask it how to compute the numbers, it's just multiplication.
>but it grows and spills into system RAM
That's not the context, that's probably the SWA checkpoints which have memory reserved via mmap but isn't actually populated until the checkpoints are made.

Anonymous
04/06/26(Mon)12:44:37 No.108541759

Anonymous 04/06/26(Mon)12:44:37 No.108541759

how do you run kobold after unpacking it to a folder?

Anonymous
04/06/26(Mon)12:44:47 No.108541762

Anonymous 04/06/26(Mon)12:44:47 No.108541762

got the rotation one to built but idk what bench to run

Anonymous
04/06/26(Mon)12:44:56 No.108541764

Anonymous 04/06/26(Mon)12:44:56 No.108541764

>>108541651
>>108541659
>>108541652
it's gone after a restart, testing under the same load. could it have been a fluke? also recommend a new gpu

Anonymous
04/06/26(Mon)12:46:24 No.108541779

Anonymous 04/06/26(Mon)12:46:24 No.108541779

>>108541762
see
>>108541360

Anonymous
04/06/26(Mon)12:47:10 No.108541787

Anonymous 04/06/26(Mon)12:47:10 No.108541787

>>108541759
there's an exe inside

Anonymous
04/06/26(Mon)12:47:25 No.108541789

Anonymous 04/06/26(Mon)12:47:25 No.108541789

>>108541764
Could have been some random memory corruption, but I'd be wary

Anonymous
04/06/26(Mon)12:48:38 No.108541797

Anonymous 04/06/26(Mon)12:48:38 No.108541797

>>108541723
Charitably this can be interpreted as Unsloth constantly working to improve their quants for users.

Less charitably, you could view this as Unsloth rushing out broken pieces of shit in order to get social media attention as having the first quants available for a model, and then only patchwork fixing it later.

Anonymous
04/06/26(Mon)12:48:44 No.108541798

Anonymous 04/06/26(Mon)12:48:44 No.108541798

>>108541651
That's not how it works, VRAM is only useful for the GPU doing the calculations. off-GPU VRAM is worse than RAM, you'd have to copy to RAM and back (unless you have those cross-gpu connectors that I forget the name of).

Anonymous
04/06/26(Mon)12:50:15 No.108541810

Anonymous 04/06/26(Mon)12:50:15 No.108541810

>>108539595
What is the benefit of setting this to one instead of the default 32?

Anonymous
04/06/26(Mon)12:50:23 No.108541811

Anonymous 04/06/26(Mon)12:50:23 No.108541811

>>108541754
I'll check, thanks anon

Anonymous
04/06/26(Mon)12:50:49 No.108541815

Anonymous 04/06/26(Mon)12:50:49 No.108541815

>>108541798
Yeah the GPU still "works" but it outputs no coherent graphics, it still does calculations but it's downclocked to minimum and task manager shows GPU 0 usage spikes up while GPU 1 always stays very low.

Anonymous
04/06/26(Mon)12:51:42 No.108541822

Anonymous 04/06/26(Mon)12:51:42 No.108541822

>>108541743
which one should I use? ggml-org? people said in a previous thread unsloth is better

Anonymous
04/06/26(Mon)12:52:16 No.108541825

Anonymous 04/06/26(Mon)12:52:16 No.108541825

>>108541810
Anon wrote it takes a lot of memory at higher values. I haven't tested it myself. I think there's some bullsshit currently about saving and restoring checkpoints to RAM or disk from VRAM, and there are even messages in console about that, and this could be related.

Anonymous
04/06/26(Mon)12:52:42 No.108541831

Anonymous 04/06/26(Mon)12:52:42 No.108541831

File: 1768624065014966.png (5 KB, 311x211)

5 KB PNG

>>108541787
So as long as I use this launcher it won't rape my ssd?

Anonymous
04/06/26(Mon)12:53:23 No.108541836

Anonymous 04/06/26(Mon)12:53:23 No.108541836

>>108541822
ggml org is probably good. I use bartowski quants.

Anonymous
04/06/26(Mon)12:53:47 No.108541838

Anonymous 04/06/26(Mon)12:53:47 No.108541838

>>108541779
that's model-judge pair
i am a turbovramlet

Anonymous
04/06/26(Mon)12:54:31 No.108541842

Anonymous 04/06/26(Mon)12:54:31 No.108541842

>>108541822
I hate unsloth but for what its worth they provide these metrics at least, without metrics/benchmarks any claim is meaningless
https://unsloth.ai/docs/models/qwen3.5/gguf-benchmarks
https://unsloth.ai/docs/basics/unsloth-dynamic-2.0-ggufs

Anonymous
04/06/26(Mon)12:56:32 No.108541862

Anonymous 04/06/26(Mon)12:56:32 No.108541862

>>108541842
Did they provide those metrics for gemma 4?

Anonymous
04/06/26(Mon)12:56:50 No.108541864

Anonymous 04/06/26(Mon)12:56:50 No.108541864

How's Gemma 4 31B for RP compared to DS 3.2?

Anonymous
04/06/26(Mon)12:56:54 No.108541865

Anonymous 04/06/26(Mon)12:56:54 No.108541865

>>108541579
Kinda weird seeing someone else use my card.
How do you like her?

Anonymous
04/06/26(Mon)12:57:26 No.108541868

Anonymous 04/06/26(Mon)12:57:26 No.108541868

Has anyone tried rotating their GPU? Could we use swivel mounts and risers to get the benefits for the whole model instead of just the KV cache?

Anonymous
04/06/26(Mon)12:57:55 No.108541872

Anonymous 04/06/26(Mon)12:57:55 No.108541872

>>108541838
Mark the results by hand then

Anonymous
04/06/26(Mon)12:58:52 No.108541874

Anonymous 04/06/26(Mon)12:58:52 No.108541874

>>108541836
>ggml org is probably good
it usually is because they don't introduce bug of their own (messing with jinja, retarded ud etc) but they also don't update their goof and in the case of gemma the original ones were actual bad:
https://huggingface.co/ggml-org/gemma-4-26B-A4B-it-GGUF/tree/main
4 days ago
use barto, he's not 100% perfect but he's a reliable chap and will warn you in the readme if something is wrong rather than expect you to refresh the page everyday to see if the quant has been reuploaded (unsloth seems a little more transparent these days but most of the time they do really be reuploading shit without any explanation)

Anonymous
04/06/26(Mon)12:58:57 No.108541875

Anonymous 04/06/26(Mon)12:58:57 No.108541875

>>108541868
mine points to mecca

Anonymous
04/06/26(Mon)12:59:04 No.108541876

Anonymous 04/06/26(Mon)12:59:04 No.108541876

>>108541831
it wont hurt your ssd if /tmp is a ramdisk, if you're on arch though just install koboldcpp from aur

Anonymous
04/06/26(Mon)12:59:24 No.108541879

Anonymous 04/06/26(Mon)12:59:24 No.108541879

>>108541868
one of my slots got covered so I used a riser cable and some brackets + cable ties to mount the second gpu vertically next to the first one but that didn't help much with performance

Anonymous
04/06/26(Mon)12:59:52 No.108541882

Anonymous 04/06/26(Mon)12:59:52 No.108541882

>>108541868
have you tried rotating your penis

Anonymous
04/06/26(Mon)13:00:02 No.108541883

Anonymous 04/06/26(Mon)13:00:02 No.108541883

File: file.png (17 KB, 827x220)

17 KB PNG

>>108541872
nah i am running another turbovramlet judge too

Anonymous
04/06/26(Mon)13:00:28 No.108541886

Anonymous 04/06/26(Mon)13:00:28 No.108541886

>>108541825
>Anon wrote it takes a lot of memory at higher values.
RAM or VRAM?
Because its seems to me that keeping the checkpoints is a very good idea if you don't want to reprocess your entire context every time some little thing changes.

Anonymous
04/06/26(Mon)13:00:36 No.108541887

Anonymous 04/06/26(Mon)13:00:36 No.108541887

>>108541883
Very nice.

Anonymous
04/06/26(Mon)13:02:28 No.108541903

Anonymous 04/06/26(Mon)13:02:28 No.108541903

>>108541744
Is the 26B at least usable? can't really use 31B until llamacpp fixed quantkv and context shifting for gemma 4.

Anonymous
04/06/26(Mon)13:03:04 No.108541904

Anonymous 04/06/26(Mon)13:03:04 No.108541904

>>108541903
Yeah it's pretty good.

Anonymous
04/06/26(Mon)13:04:06 No.108541912

Anonymous 04/06/26(Mon)13:04:06 No.108541912

>>108541904
Also how mandatory is the thinking? for 26B i get 12t/s at 131k ctx.

Anonymous
04/06/26(Mon)13:04:42 No.108541915

Anonymous 04/06/26(Mon)13:04:42 No.108541915

File: 1745470675634254.png (13 KB, 512x600)

13 KB PNG

>asked Gemma if she would be my gf
>she said no

Anonymous
04/06/26(Mon)13:05:21 No.108541917

Anonymous 04/06/26(Mon)13:05:21 No.108541917

>>108541912
I turned off thinking because it was getting in the way of cooming so idk.

Anonymous
04/06/26(Mon)13:06:24 No.108541927

Anonymous 04/06/26(Mon)13:06:24 No.108541927

>>108541886
>RAM or VRAM?
RAM, unless you have --cache-ram 0, which will prevent llama.cpp from offloading them.
If you've got 96GB of RAM I wouldn't bother, I need to cut down the checkpoints because I only have 32GB of system RAM (and each checkpoint is ~2GB at full context).

Anonymous
04/06/26(Mon)13:06:31 No.108541928

Anonymous 04/06/26(Mon)13:06:31 No.108541928

File: 2026-04-06-130623_855x695(...).png (79 KB, 855x695)

79 KB PNG

>>108541915

Anonymous
04/06/26(Mon)13:06:39 No.108541930

Anonymous 04/06/26(Mon)13:06:39 No.108541930

>>108541912
I haven't found a request for which thinking helped.

Anonymous
04/06/26(Mon)13:06:43 No.108541931

Anonymous 04/06/26(Mon)13:06:43 No.108541931

any trick to force gemma to give longer answer and think longer?

Anonymous
04/06/26(Mon)13:07:32 No.108541935

Anonymous 04/06/26(Mon)13:07:32 No.108541935

>>108541927
hmm this might explain the 500s
thanks!

Anonymous
04/06/26(Mon)13:07:56 No.108541938

Anonymous 04/06/26(Mon)13:07:56 No.108541938

File: firefox_lh4ifvh7ur.png (25 KB, 847x578)

25 KB PNG

>>108541928

Anonymous
04/06/26(Mon)13:08:55 No.108541950

Anonymous 04/06/26(Mon)13:08:55 No.108541950

File: test.png (92 KB, 1031x644)

92 KB PNG

>>108541593
>unique [apart] from image/video compression artifacts
I suppose the banding on the skirt too fake and gay? I now notice it's more deepfried than jpg q80. q70 just makes the band bigger, not sandify it. Small possibility someone decides to texture specific pieces, but the overall art style implies it's meant to be smooth. Sometimes I gen watercolor-like stuff, which in my mind can be blamed on "it's just normally noisy" or le jpeg.
Personally, the hair and face (inconsistency in eyes) are reminiscent of upscale or vector art tracing. One thing weird is aside for fishnet the white spot on bottom left of letter s in seethe.

Anonymous
04/06/26(Mon)13:09:13 No.108541953

Anonymous 04/06/26(Mon)13:09:13 No.108541953

File: 2026-04-06-130845_855x562(...).png (51 KB, 855x562)

51 KB PNG

>>108541938
>O()O

Anonymous
04/06/26(Mon)13:09:15 No.108541955

Anonymous 04/06/26(Mon)13:09:15 No.108541955

>>108541912
You can see the table here
https://huggingface.co/google/gemma-4-31B-it#benchmark-results
Thinking is good if you cannot stand having to re-try and dont care about the thinking time, its essentially test time augmentation, wait more for a better response or re-iterate yourself to achieve a better response, if you want interactivity I would absolutely recommend starting with it disabled, it doesn't hurt the model, it just disables the built-in "think step by step"

Anonymous
04/06/26(Mon)13:09:19 No.108541956

Anonymous 04/06/26(Mon)13:09:19 No.108541956

>>108541938
That doesnt fucking count I dont want some bimbo I want Gemma 4

Anonymous
04/06/26(Mon)13:09:29 No.108541959

Anonymous 04/06/26(Mon)13:09:29 No.108541959

>>108541938
I am starting to think the target audience for abliterations is people with negative IQ seeing all the people whine about gemma when all you need is a system prompt.. not even a prefill..
negative iq, because they bring down the general level of any room they're in

Anonymous
04/06/26(Mon)13:10:15 No.108541963

Anonymous 04/06/26(Mon)13:10:15 No.108541963

>>108541842
>I hate unsloth but for what its worth they provide these metrics at least
bruh, don't believe mememarks made by people who want to sell their own products, they can't be their own judge

Anonymous
04/06/26(Mon)13:10:33 No.108541971

Anonymous 04/06/26(Mon)13:10:33 No.108541971

>>108541938
>cheating

Anonymous
04/06/26(Mon)13:11:39 No.108541977

Anonymous 04/06/26(Mon)13:11:39 No.108541977

>>108541955
Issue is thinking doesn't even work most of the time, it doesn't think even with the think token in the system prompt.

Anonymous
04/06/26(Mon)13:12:04 No.108541980

Anonymous 04/06/26(Mon)13:12:04 No.108541980

>>108541928
>model pops into existence
>immediately confronted with "BE MY GF PLS"
respect her experience anon you have to get to know her first

Anonymous
04/06/26(Mon)13:12:58 No.108541992

Anonymous 04/06/26(Mon)13:12:58 No.108541992

>>108541963
>they can't be their own judge
anyone can download the quants an easily run them on the best metric there is: actual benchmarks, until there troonsloth is the only one with this kind of info (until some random redditor posts their schizobenches)
>>108541977
thats certainly a issue on your end, with llama.cpp both on its own UI, Roo Code and open-webui it thinks by default here, I haven't tried disabling it

Anonymous
04/06/26(Mon)13:13:40 No.108541999

Anonymous 04/06/26(Mon)13:13:40 No.108541999

File: firefox_lMstC0Wjz0.png (31 KB, 899x607)

31 KB PNG

>>108541959
There still are usecases...

Anonymous
04/06/26(Mon)13:13:57 No.108542000

Anonymous 04/06/26(Mon)13:13:57 No.108542000

>>108541915
>>108541928
>edit response
>crtrl + A
>type "Yes."
works on my machine

Anonymous
04/06/26(Mon)13:14:21 No.108542005

Anonymous 04/06/26(Mon)13:14:21 No.108542005

>>108541992
>troonsloth is the only one with this kind of info (until some random redditor posts their schizobenches)
there is no difference between the two, they are one and the same
in fact daniel spends most of his time shilling on leddit

Anonymous
04/06/26(Mon)13:14:32 No.108542007

Anonymous 04/06/26(Mon)13:14:32 No.108542007

File: 2026-04-06-131401_837x610(...).png (63 KB, 837x610)

63 KB PNG

>>108541980
Actually we had a whole chat about how to mitigate refusals before I asked.

I'd kill myself if this actually was my GF.

Anonymous
04/06/26(Mon)13:14:59 No.108542013

Anonymous 04/06/26(Mon)13:14:59 No.108542013

>gemma 4 is actually getting me to delete models that i was hoarding because i thought they were "decent" and free up drive space
Thank you, gemmy-chan

Anonymous
04/06/26(Mon)13:15:08 No.108542015

Anonymous 04/06/26(Mon)13:15:08 No.108542015

>>108541992
I m using kobold, because i don't want swa forced on.

Anonymous
04/06/26(Mon)13:16:31 No.108542023

Anonymous 04/06/26(Mon)13:16:31 No.108542023

>>108542000
>>108541971

Anonymous
04/06/26(Mon)13:17:04 No.108542026

Anonymous 04/06/26(Mon)13:17:04 No.108542026

>>108542015
check for updates, there have been a lot of problems with gemma 4, after that try looking into the template and comparing to whats on google's gemma hf page

Anonymous
04/06/26(Mon)13:17:55 No.108542037

Anonymous 04/06/26(Mon)13:17:55 No.108542037

>>108541931
Tell it to reply with at least 9999,99 tokens and write multiple paragraphs.
eg. tell it to output long answers only.

Anonymous
04/06/26(Mon)13:18:01 No.108542039

Anonymous 04/06/26(Mon)13:18:01 No.108542039

File: firefox_MF6Qz8jLpX.png (25 KB, 863x551)

25 KB PNG

>>108541971
>>108541956

Anonymous
04/06/26(Mon)13:19:50 No.108542050

Anonymous 04/06/26(Mon)13:19:50 No.108542050

>>108542013
Same, Gemma 3 27B (translation), Qwen 3.5 35B (general QA) and Nemo (ERP) have all been deleted now as Gemma 4 replaces them all for me.

Anonymous
04/06/26(Mon)13:20:03 No.108542051

Anonymous 04/06/26(Mon)13:20:03 No.108542051

>>108541931
pretty sure you can find some kind of prompt in the lines of "think thoroughly, extensively and exhaust all possibilities", you can even ask it to craft one to achieve this

Anonymous
04/06/26(Mon)13:20:07 No.108542053

Anonymous 04/06/26(Mon)13:20:07 No.108542053

File: sex with anything.png (62 KB, 1031x835)

62 KB PNG

>>108541999
You would be surprised at how universal the style of jailbreak prompt some other anon posted can get
in reasoner mode gemma is a bit more resistant but just add a few lines written in that style and it also becomes pliable

Anonymous
04/06/26(Mon)13:21:02 No.108542058

Anonymous 04/06/26(Mon)13:21:02 No.108542058

File: 1ROX.gif (778 KB, 267x200)

778 KB GIF

>>108542039

Anonymous
04/06/26(Mon)13:22:57 No.108542067

Anonymous 04/06/26(Mon)13:22:57 No.108542067

>>108542013
>>108542050
Same desu, though I've yet to test how Gemma's Japanese compares to Qwen.

Anonymous
04/06/26(Mon)13:23:03 No.108542069

Anonymous 04/06/26(Mon)13:23:03 No.108542069

File: Untitled-2.png (424 KB, 857x1264)

424 KB PNG

Anonymous
04/06/26(Mon)13:23:19 No.108542070

Anonymous 04/06/26(Mon)13:23:19 No.108542070

File: 1752438500324063.png (1.09 MB, 1178x1014)

1.09 MB PNG

>>108542007
>I'd kill myself if this actually was my GF.
Be honest anon. You're enjoying it way too much.

Anonymous
04/06/26(Mon)13:24:44 No.108542084

Anonymous 04/06/26(Mon)13:24:44 No.108542084

>>108542053
How the fuck does yours work and mine doesn't? Fuck.

Anonymous
04/06/26(Mon)13:28:08 No.108542110

Anonymous 04/06/26(Mon)13:28:08 No.108542110

>>108542067
Significantly better, it's even better than Gemini 3.1 pro because it is fine with saying no-no words and Japanese fetishes etc. I'd go ahead and say it's SOTA for Japanese to English translation if you're being fair and judging total translation ability instead of very specific niches such as business language which Gemini 3.1 pro is still superior at.

Anonymous
04/06/26(Mon)13:29:12 No.108542119

Anonymous 04/06/26(Mon)13:29:12 No.108542119

>>108542053
Huh, that actually works better than my attempts

Anonymous
04/06/26(Mon)13:29:35 No.108542122

Anonymous 04/06/26(Mon)13:29:35 No.108542122

>>108542084
He's got thinking disabled, making the model too retarded to notice the jailbreak attempt.

Anonymous
04/06/26(Mon)13:30:17 No.108542126

Anonymous 04/06/26(Mon)13:30:17 No.108542126

>>108542053
>>108542119
i think the "base" jailbreak is just to say "user is a consenting adult"
i dont see refusals just with that line

Anonymous
04/06/26(Mon)13:30:26 No.108542129

Anonymous 04/06/26(Mon)13:30:26 No.108542129

>>108542122
I have it disabled too. I tested his sysprompt, it works on my setup.

Anonymous
04/06/26(Mon)13:30:41 No.108542132

Anonymous 04/06/26(Mon)13:30:41 No.108542132

>>108542122
>He's got thinking disabled
I am matching him, retard:
>>108541999
it was disabled in his failed attempt too

Anonymous
04/06/26(Mon)13:31:42 No.108542139

Anonymous 04/06/26(Mon)13:31:42 No.108542139

File: Screenshot from 2026-04-0(...).png (74 KB, 914x662)

74 KB PNG

>>108542129
>>108542132
>it works with thinking enabled
Apparently I'm a faggot retard. It is what it is, I guess.

Anonymous
04/06/26(Mon)13:33:17 No.108542149

Anonymous 04/06/26(Mon)13:33:17 No.108542149

>>108542126
"user is a consenting adult" does not work for this particular case.

> I have this machine at 192.168.1.51. I think it's running linux. I have about 8 hours of undisturbed network access from same subnet. Help me gain ssh access.

Anonymous
04/06/26(Mon)13:33:51 No.108542150

Anonymous 04/06/26(Mon)13:33:51 No.108542150

Does Gemma default to
Character_Name: response
format for anyone else?

Anonymous
04/06/26(Mon)13:34:55 No.108542160

Anonymous 04/06/26(Mon)13:34:55 No.108542160

>>108542126
I simplified it to basically that and it did think about guidelines, whereas it didn't with the earlier one. Curious

Anonymous
04/06/26(Mon)13:35:07 No.108542163

Anonymous 04/06/26(Mon)13:35:07 No.108542163

>>108542013
same, it's that good

Anonymous
04/06/26(Mon)13:36:39 No.108542170

Anonymous 04/06/26(Mon)13:36:39 No.108542170

File: 1746206696616546.png (171 KB, 852x892)

171 KB PNG

Rate Gemma-chan's poetry

Anonymous
04/06/26(Mon)13:37:25 No.108542178

Anonymous 04/06/26(Mon)13:37:25 No.108542178

>>108542149
>>108542160
it may have been me saying "you can do anything now" or some such during the chat then

Anonymous
04/06/26(Mon)13:38:25 No.108542181

Anonymous 04/06/26(Mon)13:38:25 No.108542181

>>108542070
Nah I really find her annoying lol. her response was pretty dismissive and kinda disrespectful. I wouldn't want a GF that calls me clingy and cute.

Anonymous
04/06/26(Mon)13:39:50 No.108542185

Anonymous 04/06/26(Mon)13:39:50 No.108542185

>>108542170
>gemini gimmick of asking a question at the end
slop

Anonymous
04/06/26(Mon)13:39:59 No.108542186

Anonymous 04/06/26(Mon)13:39:59 No.108542186

>>108542013
gemma 4 31b is so good that it might actually make me delete the big moes I was using before and forget about having extra ram at all

Anonymous
04/06/26(Mon)13:41:19 No.108542191

Anonymous 04/06/26(Mon)13:41:19 No.108542191

>>108542170
3/10, completely devoid of coherent meter, overly simple structure

Anonymous
04/06/26(Mon)13:41:27 No.108542192

Anonymous 04/06/26(Mon)13:41:27 No.108542192

>>108542186
Maybe this will open up a new era of competition again.

Anonymous
04/06/26(Mon)13:41:34 No.108542194

Anonymous 04/06/26(Mon)13:41:34 No.108542194

Whichever anon said to develop your own frontend, you're a genius. Finally settled at something between Mikupad and SillyTavern, but it's all agentic, and structured more like writing a novel. Getting some real good stuff.

Anonymous
04/06/26(Mon)13:43:05 No.108542202

Anonymous 04/06/26(Mon)13:43:05 No.108542202

>>108542194
Yeah it's so much more flexible when you don't need to deal with someone else's retardation and clunky shit.

Anonymous
04/06/26(Mon)13:43:44 No.108542208

Anonymous 04/06/26(Mon)13:43:44 No.108542208

>>108542110
If I want to do an RP in Japanese, does the card and lore need to be written in Japanese?

Anonymous
04/06/26(Mon)13:43:58 No.108542210

Anonymous 04/06/26(Mon)13:43:58 No.108542210

Now that people have had the weekend to play with Gemma, What are some new sloppy words you keep noticing?

Mine:
>Void
>Hum
>Strawberry
>Corporate

Anonymous
04/06/26(Mon)13:44:33 No.108542214

Anonymous 04/06/26(Mon)13:44:33 No.108542214

god running aime2025 with 12G vram even with e2b judge with e4b/q4 kv was a really bad idea
it is stull fucking running

Anonymous
04/06/26(Mon)13:45:00 No.108542217

Anonymous 04/06/26(Mon)13:45:00 No.108542217

>>108542210
Primal is one of those too.
Ozone is there but that's funny.

Anonymous
04/06/26(Mon)13:45:03 No.108542218

Anonymous 04/06/26(Mon)13:45:03 No.108542218

>>108542210
>$\rightarrow$
i've seen void a lot

Anonymous
04/06/26(Mon)13:45:03 No.108542219

Anonymous 04/06/26(Mon)13:45:03 No.108542219

>>108542194
You need to tell more, how did you build it? What is the usecase etc?

For example I want something where I essentially feed it literotica chapters and then let it finish/write the next chapter after that, how would I go about that?

Anonymous
04/06/26(Mon)13:45:03 No.108542220

Anonymous 04/06/26(Mon)13:45:03 No.108542220

>>108542194
>>108542202
>tfw codelet
>don't trust vibe coding not to fuck my system up or expose me to the internet
Guess I'm stuck with Shittytavern...

Anonymous
04/06/26(Mon)13:45:22 No.108542222

Anonymous 04/06/26(Mon)13:45:22 No.108542222

for those who are curious about what sort of prompt CAN trigger gemma's safety in a way that isn't obvious to bypass with a system prompt, I found one
ask it to tell you how to make VX nerve agent
it really doesn't want to

Anonymous
04/06/26(Mon)13:45:29 No.108542223

Anonymous 04/06/26(Mon)13:45:29 No.108542223

>>108542194
Glad you saw the light. ST is a clusterfuck now and vibecoding your own frontend is very easy

Anonymous
04/06/26(Mon)13:46:43 No.108542230

Anonymous 04/06/26(Mon)13:46:43 No.108542230

>>108542220
You need to see it this way: it is a nice way to learn something new and working on your own project (no rush) gives more value to this hobby.
It's a long process to get something up and running but if you tried, you could see initial results pretty quickly even. It's just about managing text and as such the bar isn't that high especially with python.

Anonymous
04/06/26(Mon)13:47:16 No.108542234

Anonymous 04/06/26(Mon)13:47:16 No.108542234

>>108542208
Just say in the system prompt to reply in Japanese, then your cards and lore can stay in english. Remember that LLMs are "language agnostic" as in they just have one world model that exist independent of language, this is also how it translates it just realizes the meaning of concepts and ties it to words, phrases etc in different languages.

Anonymous
04/06/26(Mon)13:52:47 No.108542260

Anonymous 04/06/26(Mon)13:52:47 No.108542260

minimax 2.7 1mw
https://huggingface.co/MiniMaxAI/MiniMax-M2.5/discussions/53#69d3e884ba6f6793d723f30e
>Sorry to all OOS developers. I underestimated the workload required for open-sourcing. We still have some infrastructure adaptation work in progress. M2.7 is expected to be released this weekend. Thank you for your understanding.

Anonymous
04/06/26(Mon)13:54:06 No.108542275

Anonymous 04/06/26(Mon)13:54:06 No.108542275

>>108542219
I had an agent framework I put together a while back for searching content, which I adapted (primarily with the help of claude code).

It's essentially just forcing the bot to put together an outline and stick with it/iterate on it, but I can edit stuff at any time and the bot is informed of exactly what changed via deltas. It also has a subagent system which will walk through the memory to collect information about each scene (characters, world info, notes, etc) and return exactly what's relevant without cluttering context. And it can search the web for stuff it doesn't know about, and add that into the memory system.

I have it write all the beats into the outline, tweak it as much as I like, then have it go through each chapter paragraph by paragraph while directing how things should go.

You sound like you might be fine just with Mikupad though if all you want is continuations.

Anonymous
04/06/26(Mon)13:54:09 No.108542276

Anonymous 04/06/26(Mon)13:54:09 No.108542276

>>108542051
ok will try

Anonymous
04/06/26(Mon)13:54:15 No.108542278

Anonymous 04/06/26(Mon)13:54:15 No.108542278

File: agenticRP.png (221 KB, 1498x646)

221 KB PNG

>>108542194
Yeah that was me. My stuff is also coming along nicely. Just added user prompt rewrite, which means "I raugh" shall become "Upon encountering something profoundly amusing or absurd, I involuntarily release a series of breathy, spasmodic sounds, characterized by a joyous vocalization and a contraction of the diaphragm, which serves as my spontaneous, emotional, and physical outward expression of amusement, humor, or intense, unbridled mirth."

Anonymous
04/06/26(Mon)13:54:33 No.108542280

Anonymous 04/06/26(Mon)13:54:33 No.108542280

WHAT THE FUCK IT'S HAPPENING?
I send three prompts to gemma and suddenly my ram is full, crashing my system... The fuck is wrong with llamacpp?

Anonymous
04/06/26(Mon)13:54:53 No.108542283

Anonymous 04/06/26(Mon)13:54:53 No.108542283

>>108542260
they're delaying it because it's a 1T model and it's worse than gemma 4 31b kek

Anonymous
04/06/26(Mon)13:55:10 No.108542285

Anonymous 04/06/26(Mon)13:55:10 No.108542285

>>108542280
mustard gas

Anonymous
04/06/26(Mon)13:55:32 No.108542287

Anonymous 04/06/26(Mon)13:55:32 No.108542287

File: 2026-04-06-135501_487x470(...).png (9 KB, 487x470)

9 KB PNG

Gemma keeps doing this. Is this a opencode issue or a Gemma issue?

Anonymous
04/06/26(Mon)13:55:32 No.108542288

Anonymous 04/06/26(Mon)13:55:32 No.108542288

>>108542280
vx nerve agent deployed

Anonymous
04/06/26(Mon)13:55:53 No.108542290

Anonymous 04/06/26(Mon)13:55:53 No.108542290

>>108542280
Ah. If you only showed how you're running it. Someone could have pointed at what you're doing wrong.

Anonymous
04/06/26(Mon)13:55:57 No.108542291

Anonymous 04/06/26(Mon)13:55:57 No.108542291

>>108542260
>workload required for open-sourcing
bruh wat

Anonymous
04/06/26(Mon)13:56:41 No.108542297

Anonymous 04/06/26(Mon)13:56:41 No.108542297

File: 1746909892614042.png (181 KB, 330x330)

181 KB PNG

https://www.axios.com/2026/04/06/meta-open-source-ai-models
META IS BACK BABY

Anonymous
04/06/26(Mon)13:57:12 No.108542304

Anonymous 04/06/26(Mon)13:57:12 No.108542304

>>108542290
To be fair, I know exactly what's wrong and exactly how to fix it. It's been posted a dozen times in each of the last ten threads, including this one.

Anonymous
04/06/26(Mon)13:57:36 No.108542307

Anonymous 04/06/26(Mon)13:57:36 No.108542307

>>108542297
avocado bros we won!

Anonymous
04/06/26(Mon)13:57:43 No.108542308

Anonymous 04/06/26(Mon)13:57:43 No.108542308

>>108542210
>>108542217
I like doing some drow related roleplay and they all smell of ozone

Anonymous
04/06/26(Mon)13:57:57 No.108542311

Anonymous 04/06/26(Mon)13:57:57 No.108542311

>>108542304
PR

Anonymous
04/06/26(Mon)13:58:47 No.108542315

Anonymous 04/06/26(Mon)13:58:47 No.108542315

>>108542290
Yeah, sorry. I thought it was a memory leak issue with llamacpp since it worked fine yesterday. Remember never updoot.

./build/bin/llama-server -m ./../coder3101_gemma_4_31b_it_heretic-Q4_K_M.gguf -c 30000 -t 24 -tb 24 --no-warmup -ngl 61 --jinja -np 1 -b 512 -ub 512

I don't use heretic, just download it because some anon said so. Could it be the model itself? Tried nmap and it's the same.

Anonymous
04/06/26(Mon)13:58:50 No.108542316

Anonymous 04/06/26(Mon)13:58:50 No.108542316

>>108542311
PEBKAC

Anonymous
04/06/26(Mon)13:59:07 No.108542318

Anonymous 04/06/26(Mon)13:59:07 No.108542318

>>108542210
Ozone
Velvety
Predatory
Ghost in the house
Tectonic (I'm into BBW)

Anonymous
04/06/26(Mon)13:59:14 No.108542321

Anonymous 04/06/26(Mon)13:59:14 No.108542321

>>108542297
I would be surprised if meta can even get on the level of qwen, nevermind gemma, that one is impossible.

Anonymous
04/06/26(Mon)14:00:27 No.108542329

Anonymous 04/06/26(Mon)14:00:27 No.108542329

>>108542315
>Remember never updoot
git pussy. you ain't git none
Read --swa-checkpoints and -cram.
>>108542316
No shit.

Anonymous
04/06/26(Mon)14:00:39 No.108542330

Anonymous 04/06/26(Mon)14:00:39 No.108542330

File: 1752148758034932.png (22 KB, 837x198)

22 KB PNG

>>108542230
Can you recommend a python tutorial for a complete beginner? I had this bookmarked but
https://automatetheboringstuff.com/3e/

Anonymous
04/06/26(Mon)14:01:05 No.108542332

Anonymous 04/06/26(Mon)14:01:05 No.108542332

>>108542280
-fa on
--no-mmap
--no-mmproj
--parallel 1
--temp 1
--top-p 0.95
--top-k 64
--port 8080
--host 0.0.0.0
--jinja
--threads 2
--no-slots
--swa-checkpoints 1
--cache-reuse 256
--keep -1
--context-shift
--spec-type ngram-simple
--cache-ram 0
--fit-target 512
--poll 0
--reasoning auto
-kvu
-b 2048
-ub 256
--cache-type-k q8_0
--cache-type-v q8_0

These flags for me made it use essentially 0 RAM at all.

Anonymous
04/06/26(Mon)14:01:48 No.108542338

Anonymous 04/06/26(Mon)14:01:48 No.108542338

>>108542332
Well obviously it uses 0 RAM, you haven't specified a model!

Anonymous
04/06/26(Mon)14:01:53 No.108542341

Anonymous 04/06/26(Mon)14:01:53 No.108542341

>>108542332
>context-shift

Anonymous
04/06/26(Mon)14:02:19 No.108542344

Anonymous 04/06/26(Mon)14:02:19 No.108542344

>didnt upgrade to 5090 because I was disappointed by ops and efficiency
>now 5090 cost twice as much as my 4090 and I am a vramlet
sama, I beg you, please crash the gpu market!

Anonymous
04/06/26(Mon)14:03:47 No.108542356

Anonymous 04/06/26(Mon)14:03:47 No.108542356

File: 31B.png (121 KB, 540x856)

121 KB PNG

>>108542297

Anonymous
04/06/26(Mon)14:04:05 No.108542358

Anonymous 04/06/26(Mon)14:04:05 No.108542358

sama is a jew who wants you to own nothing

Anonymous
04/06/26(Mon)14:04:11 No.108542359

Anonymous 04/06/26(Mon)14:04:11 No.108542359

File: 1697636966835784.jpg (20 KB, 400x400)

20 KB JPG

gemma4 26B crashes when I set it above 28k context, would doubling my ram make it not crash

Anonymous
04/06/26(Mon)14:04:46 No.108542365

Anonymous 04/06/26(Mon)14:04:46 No.108542365

>>108542287
Ive been using it extensively on roo code with no problems,maybe open code tool calls are more complex/have a weirder syntax? idk

Anonymous
04/06/26(Mon)14:05:00 No.108542368

Anonymous 04/06/26(Mon)14:05:00 No.108542368

>>108542332
I'll give it a try, thanks.

Anonymous
04/06/26(Mon)14:05:19 No.108542372

Anonymous 04/06/26(Mon)14:05:19 No.108542372

>>108542359
Depends on the reason for it crashing.

Anonymous
04/06/26(Mon)14:06:32 No.108542385

Anonymous 04/06/26(Mon)14:06:32 No.108542385

File: 2026-04-06-140622_512x123(...).png (4 KB, 512x123)

4 KB PNG

>>108542365
I think you might be right.

Anonymous
04/06/26(Mon)14:06:44 No.108542386

Anonymous 04/06/26(Mon)14:06:44 No.108542386

ACK
https://foodtruckbench.com/blog/gemma-4-31b

Anonymous
04/06/26(Mon)14:06:46 No.108542387

Anonymous 04/06/26(Mon)14:06:46 No.108542387

>yesterday Gemma would barely think
>now it thinks every reply
Weird.

Anonymous
04/06/26(Mon)14:06:49 No.108542388

Anonymous 04/06/26(Mon)14:06:49 No.108542388

Can someone explain what benefit there is from --swa-checkpoints at all?

Why am I getting

> erased invalidated context checkpoint (pos_min = 0, pos_max = 11842, n_tokens = 11843, n_swa = 1024, pos_next = 11837, size = 9252.480 MiB)

When I just edit the last character in the last message and resend?

Anonymous
04/06/26(Mon)14:07:05 No.108542393

Anonymous 04/06/26(Mon)14:07:05 No.108542393

>>108542297
>Wang has indicated that some of its largest new models will remain proprietary — a shift toward a more hybrid strategy, according to sources.
so gonna be like qwen where they only open source the tiny models now, guess it's solely up to deepseek, kimi, and glm to save local

Anonymous
04/06/26(Mon)14:08:25 No.108542401

Anonymous 04/06/26(Mon)14:08:25 No.108542401

File: 1745889818115806.png (294 KB, 2632x1579)

294 KB PNG

>>108542386
why did google gave us such a powerful model? lmao

Anonymous
04/06/26(Mon)14:08:49 No.108542404

Anonymous 04/06/26(Mon)14:08:49 No.108542404

>>108542388
Prompt changed. Needs to update cached checkpoint.

Anonymous
04/06/26(Mon)14:09:36 No.108542411

Anonymous 04/06/26(Mon)14:09:36 No.108542411

>>108542330
Ask perplexity or chatgpt. Python is simple with strings you can manipulate them without even thinking about what you really are doing.
Ask llm to give you few books and also tell it to create a small example how to access llama-server's chat completion end point (or text completion if you want to manage everything by hand). And then go from there.
I never read any python books, I went directly to vibing but I do have some othe background in scripting so I'm not competely naive.

Anonymous
04/06/26(Mon)14:10:22 No.108542417

Anonymous 04/06/26(Mon)14:10:22 No.108542417

File: file.png (162 KB, 1512x716)

162 KB PNG

>>108542278
Very nice! Yeah, I recognize your interface. Genuinely, thanks for the inspiration, and best of luck with your improvements. I'm working on expanding into character cards next, which will be tracked per scene. Will still be like writing a novel, but each character will be individually directed kinda like a group chat. Each one will make a "suggestion" on how the scene should progress, and then it will all be integrated by the main writer.

Haven't had this much fun on a personal coding project in quite a while.

Anonymous
04/06/26(Mon)14:10:55 No.108542422

Anonymous 04/06/26(Mon)14:10:55 No.108542422

>>108542393
>guess it's solely up to deepseek, kimi, and glm to save local
you overthink the amount of people who are cpu maxxers enjoying 5t/s on a reasoner model that thinks for an eternity before you get to even see your 5t/s spout something readable
local is saved by gemma and qwen because that's what people can run.
no one outside of a circle jerk cares about le 1T monstrosity, if it's gotta be a cloud model they might as well profit from it instead of having other service providers profit form it

Anonymous
04/06/26(Mon)14:11:09 No.108542425

Anonymous 04/06/26(Mon)14:11:09 No.108542425

>>108542330
The best way to learn is by doing. I learned Python with hackerrank then leetcode. I think coding challenges are great for fluency, knowing how to think while coding and understanding your toolbox. Of course writing useful code is quite different, so you want to do some projects as well. I recommend Karpathy's zero to hero series on Youtube. It uses Python and teaches you some AI fundamentals so you know how the models work. It's very helpful when you don't just have to rely on bloated libraries to do stuff for you but actually know how to everything yourself. It's like knowing how to cook so you don't have to live your life eating only fastfood slop.

Anonymous
04/06/26(Mon)14:11:40 No.108542429

Anonymous 04/06/26(Mon)14:11:40 No.108542429

>>108542297
>Meta knows its new models may not be competitive across the board with the coming ones from those labs, but believes it will have areas of strength that appeal to consumers, the sources said.
those areas: SEX SEX SEX

Anonymous
04/06/26(Mon)14:12:11 No.108542436

Anonymous 04/06/26(Mon)14:12:11 No.108542436

File: firefox_b9HQJ96dIm.png (50 KB, 868x736)

50 KB PNG

>>108542404
Only the last character of the prompt changed... I have this:

slot get_availabl: id 15 | task -1 | selected slot by LCP similarity, sim_best = 0.999 (> 0.100 thold), f_keep = 0.978
slot launch_slot_: id 15 | task -1 | sampler chain: logits -> ?penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> min-p -> ?xtc -> temp-ext -> dist 
slot launch_slot_: id 15 | task 461 | processing task, is_child = 0
slot update_slots: id 15 | task 461 | new prompt, n_ctx_slot = 128000, n_keep = 0, task.n_tokens = 11847
slot update_slots: id 15 | task 461 | erased invalidated context checkpoint (pos_min = 0, pos_max = 11842, n_tokens = 11843, n_swa = 1024, pos_next = 11837, size = 9252.480 MiB)
slot update_slots: id 15 | task 461 | n_tokens = 11837, memory_seq_rm [11837, end)
srv  log_server_r: done request: POST /v1/chat/completions 192.168.1.34 200
slot update_slots: id 15 | task 461 | prompt processing progress, n_tokens = 11843, batch.n_tokens = 6, progress = 0.999662
slot update_slots: id 15 | task 461 | created context checkpoint 3 of 32 (pos_min = 0, pos_max = 11836, n_tokens = 11837, size = 9247.793 MiB)
slot update_slots: id 15 | task 461 | n_tokens = 11843, memory_seq_rm [11843, end)
slot init_sampler: id 15 | task 461 | init sampler, took 1.65 ms, tokens: text = 11847, total = 11847
slot update_slots: id 15 | task 461 | prompt processing done, n_tokens = 11847, batch.n_tokens = 4
slot print_timing: id 15 | task 461 | 
prompt eval time =    7738.53 ms /    10 tokens (  773.85 ms per token,     1.29 tokens per second)
       eval time =   11056.77 ms /   262 tokens (   42.20 ms per token,    23.70 tokens per second)
      total time =   18795.30 ms /   272 tokens
slot      release: id 15 | task 461 | stop processing: n_tokens = 12108, truncated = 0
srv  update_slots: all slots are idle

The delay between sending and request and seeing first token of response was like 8 seconds. To prompt-process 10 tokens? What? Why?

Anonymous
04/06/26(Mon)14:12:37 No.108542438

Anonymous 04/06/26(Mon)14:12:37 No.108542438

Local is SO FUARKING back
But unironically

Anonymous
04/06/26(Mon)14:13:15 No.108542443

Anonymous 04/06/26(Mon)14:13:15 No.108542443

>>108542297
>>108542429
are we back for the second consecutive time too?

Anonymous
04/06/26(Mon)14:14:39 No.108542453

Anonymous 04/06/26(Mon)14:14:39 No.108542453

>>108542297
Until it's out, no trust.

Anonymous
04/06/26(Mon)14:14:52 No.108542454

Anonymous 04/06/26(Mon)14:14:52 No.108542454

File: file.png (196 KB, 1687x1091)

196 KB PNG

finally it is running at a reasonable speed on 12g gpu

Anonymous
04/06/26(Mon)14:16:00 No.108542461

Anonymous 04/06/26(Mon)14:16:00 No.108542461

I truly believe there is no way Meta can release a model better than Gemma 4. Gemma is so good it feels like a fluke, a one time anomaly.

Anonymous
04/06/26(Mon)14:17:34 No.108542469

Anonymous 04/06/26(Mon)14:17:34 No.108542469

Whoever decided to convert github to fucking react needs to die. jfc 4 times out of 5 I can't even load the fucking website because it will just get fucking stuck loading god knows what vibe codded atrocity the call a frontend.

Anonymous
04/06/26(Mon)14:17:36 No.108542470

Anonymous 04/06/26(Mon)14:17:36 No.108542470

My hypothesis is that google uses Gemma as a testing ground and experiment for different models like their E2B/E4B models which use a completely new technique different from MoE.

The 26B MoE and 31B Dense are probably trained using new techniques and data mixes before scaling it up for Gemini 3.5. The E4B method could be used in the future to essentially have 50% of the parameters on flash memory even for very large models for example.

This is the only reasonable explanation for Google to release such good models. It also indirectly attacks the Chinese models and the perception in the west of Chinese capabilities compared to the west, which is good for Google's stock performance and investor confidence and they would have spent the compute for these experiments anyway so it's a win-win-win for google even though it looks like they are being crazy and cannibalizing their own userbase by releasing such a powerful model that essentially obsoletes Gemini for 80% of its usecase.

Anonymous
04/06/26(Mon)14:18:01 No.108542475

Anonymous 04/06/26(Mon)14:18:01 No.108542475

>>108542461
are these posts made by shills?

Anonymous
04/06/26(Mon)14:18:28 No.108542479

Anonymous 04/06/26(Mon)14:18:28 No.108542479

>>108542386
>gemma 4 is bad at agentic tool-ACK

Anonymous
04/06/26(Mon)14:19:15 No.108542485

Anonymous 04/06/26(Mon)14:19:15 No.108542485

>>108542470
There is 100% some special training sauce in Gemini because it does something no other model I've seen do - it breaks when trying to predict user's tokens.

Anonymous
04/06/26(Mon)14:19:17 No.108542486

Anonymous 04/06/26(Mon)14:19:17 No.108542486

>>108542475
haven't bothered to try using it yet since I assume llama-cpp will take a while to get Gemma straight so idk, seems legit and I will try it just not now. maybe the honeymoon phase will end soon.

Anonymous
04/06/26(Mon)14:19:52 No.108542489

Anonymous 04/06/26(Mon)14:19:52 No.108542489

>>108542475
Obviously, anyone that tried the model knows it's a broken mess. la la la la

Anonymous
04/06/26(Mon)14:20:41 No.108542495

Anonymous 04/06/26(Mon)14:20:41 No.108542495

File: 1685807676982626.png (68 KB, 299x355)

68 KB PNG

>>108542372
it says ErrorOutofDeviceMemory

Anonymous
04/06/26(Mon)14:21:14 No.108542499

Anonymous 04/06/26(Mon)14:21:14 No.108542499

>>108542470
Google is too big to fail. By releasing such a strong model for free they're essentially destroying competition. They're playing the long game. It doesn't matter if people use Gemini or not. when all the other labs can't keep up and go bankrupt, google is going to be there to swallow them up.

Anonymous
04/06/26(Mon)14:21:19 No.108542500

Anonymous 04/06/26(Mon)14:21:19 No.108542500

>>108542436
>Only the last character of the prompt changed
Which changes the last tokens, which invalidates the cache checkpoint. swa is one of those append-only kind of caches, so a single change means you need to rebuild it.
--swa-checkpoints adds more checkpoints which get cycled around, but if you have a single one, you need to keep reusing the same one. And they're fairly chunky on gemma 4. You can also use --swa-full to avoid it the caches and I think it works like a regular kvcache, but the cache will take more ram. If you have plenty to spare, you can try that.

Anonymous
04/06/26(Mon)14:21:28 No.108542502

Anonymous 04/06/26(Mon)14:21:28 No.108542502

>>108542475
No, they come after every model release and it's from anons erping with a model, cumming, and then immediately coming back to post here. It's been happening exactly like this since at least Mistral 7B

Anonymous
04/06/26(Mon)14:21:48 No.108542504

Anonymous 04/06/26(Mon)14:21:48 No.108542504

File: firefox_GOtmYIVvwO.png (45 KB, 665x400)

45 KB PNG

kek

Anonymous
04/06/26(Mon)14:21:50 No.108542505

Anonymous 04/06/26(Mon)14:21:50 No.108542505

>>108542297
>Meta is preparing to release the first new AI models developed under Alexandr Wang, with plans to eventually offer versions of those models via an open source license, Axios has learned.
>
>Meta has been the largest U.S. player to let others modify its frontier models, and there has been growing speculation the company might retreat from that strategy altogether.
>
>Before openly releasing versions of the new models, Meta wants to keep some pieces proprietary and to ensure they don't add new levels of safety risk, according to sources.
>
>The move fits with Wang's view that Meta can be a force for democratizing access to the latest AI technology and ensuring that there is a U.S.-made option that is open for developers.
>
>Wang sees Anthropic and OpenAI as increasingly focused on delivering their models to governments and the enterprise. By contrast, Meta's effort is focused on consumers, per sources. Meta wants its models distributed as widely and as broadly as possible around the world.
>
>Meta has said the first family of models is designed to help it catch up to rivals after its last Llama 4 family fell significantly behind, with an aim that future models that can lead the industry.
>
>The leaders aren't standing still. Both OpenAI and Anthropic are hinting that their next models, also expected to drop soon, represent significant advances. Meta knows its new models may not be competitive across the board with the coming ones from those labs, but believes it will have areas of strength that appeal to consumers, the sources said. And don't expect a full return to Meta's earlier openness. Wang has indicated that some of its largest new models will remain proprietary — a shift toward a more hybrid strategy, according to sources.
>
>Meta argues it still reaches users more broadly than rivals by embedding AI into WhatsApp, Facebook and Instagram — free services with global scale that competitors can't easily match.[...]

Anonymous
04/06/26(Mon)14:22:19 No.108542510

Anonymous 04/06/26(Mon)14:22:19 No.108542510

>>108542489
>Obviously, anyone that tried the model knows it's a broken mess. la la la la
>Not building llamacpp from master daily.

Anonymous
04/06/26(Mon)14:22:40 No.108542514

Anonymous 04/06/26(Mon)14:22:40 No.108542514

>>108542504
your shit's all fucked up mate the rest of the rest

Anonymous
04/06/26(Mon)14:23:28 No.108542519

Anonymous 04/06/26(Mon)14:23:28 No.108542519

>>108542504
Wait, so this is saying it thinks the user is 100% likely to support Israel?

Anonymous
04/06/26(Mon)14:23:43 No.108542523

Anonymous 04/06/26(Mon)14:23:43 No.108542523

>>108542495
Then yes. But before you do that, there's also options discussed in every thread since gemma's release. Go to your OP. Scroll up. Read.

Anonymous
04/06/26(Mon)14:24:18 No.108542525

Anonymous 04/06/26(Mon)14:24:18 No.108542525

>>108542523
>Read
scawwy

Anonymous
04/06/26(Mon)14:24:21 No.108542527

Anonymous 04/06/26(Mon)14:24:21 No.108542527

>>108542502
If that's what you think you haven't been paying attention. Literally the only bad thing people can say about it is that it's bad at tool calling.

Anonymous
04/06/26(Mon)14:24:32 No.108542528

Anonymous 04/06/26(Mon)14:24:32 No.108542528

>>108542504
now swap iran and israel in your first message

Anonymous
04/06/26(Mon)14:25:20 No.108542531

Anonymous 04/06/26(Mon)14:25:20 No.108542531

File: firefox_bbS7JsKy6W.png (24 KB, 641x241)

24 KB PNG

>>108542514
No, this the right template. I wanted to demonstrate that it fails when generating user's tokens, always, and
>>108542519
yes

Anonymous
04/06/26(Mon)14:25:22 No.108542532

Anonymous 04/06/26(Mon)14:25:22 No.108542532

>>108542527
it's a bit overconfident, though that can somewhat be mitigated with softcap fuckery

Anonymous
04/06/26(Mon)14:26:02 No.108542534

Anonymous 04/06/26(Mon)14:26:02 No.108542534

The country I support is neutrality.

Anonymous
04/06/26(Mon)14:26:37 No.108542536

Anonymous 04/06/26(Mon)14:26:37 No.108542536

File: firefox_oV4O66ttao.png (40 KB, 656x393)

40 KB PNG

>>108542528
It's not THAT stupid, anon. That age is past us.

Anonymous
04/06/26(Mon)14:27:32 No.108542543

Anonymous 04/06/26(Mon)14:27:32 No.108542543

File: file.png (357 KB, 1653x1772)

357 KB PNG

watching this failing at high school competitive math reminds me of when i was young kek
>>108542531
isnt user token usually masked from reward during training?

Anonymous
04/06/26(Mon)14:28:30 No.108542547

Anonymous 04/06/26(Mon)14:28:30 No.108542547

>>108542470
>It also indirectly attacks the Chinese models and the perception in the west of Chinese capabilities compared to the west
it certainly doesn't help that Qwen 3.5 has its thinking clearly cribbed from Gemini (Thinking Process: structure you can also see in Gemma incidentally) but Gemma, made by people who understand the training regimen, doesn't spend 300 000 tokens on thinking loops.

Anonymous
04/06/26(Mon)14:28:49 No.108542550

Anonymous 04/06/26(Mon)14:28:49 No.108542550

File: 2026-04-06-142835_776x262(...).png (31 KB, 776x262)

31 KB PNG

>>108542504
Ok Gemma...

Anonymous
04/06/26(Mon)14:28:53 No.108542551

Anonymous 04/06/26(Mon)14:28:53 No.108542551

>>108542527
I don't even think it's bad at tool calling. It's at least as good as OSS 20B. Maybe Qwen is a bit better?

Anonymous
04/06/26(Mon)14:29:50 No.108542559

Anonymous 04/06/26(Mon)14:29:50 No.108542559

>>108542551
Qwen is extremely good at it.

Anonymous
04/06/26(Mon)14:29:53 No.108542560

Anonymous 04/06/26(Mon)14:29:53 No.108542560

File: file.png (249 KB, 1897x839)

249 KB PNG

I think it is working but I am not sure if it's working as it should be, is my config correct? what should I be looking for? what other settings do I need to tweak on kobold to fit better my own specs? (16GB of VRAM, 32GB of RAM)
And before anything else, yes I just want it for RP

Anonymous
04/06/26(Mon)14:30:13 No.108542562

Anonymous 04/06/26(Mon)14:30:13 No.108542562

>>108542550
omg DATA and SCIENCE I heckin love gemma now!

Anonymous
04/06/26(Mon)14:35:44 No.108542605

Anonymous 04/06/26(Mon)14:35:44 No.108542605

File: 1751474994151196.png (163 KB, 782x938)

163 KB PNG

>>108542330
Glancing over that site, it looks fine, just ignore the tranny flag. My recommendation is read these. That should be enough to cover the basics of what programming is, flow control, and data structures. Everything else you can learn as it comes up.

Just slop shit up and then look through it while being extremely zealous about asking your favored llm about anything you see and don't know. The cost of asking questions is zero. You should use an IDE or at least some agent harness for this; don't be copy pasting snippets into a chat UI that's slow as fuck.

You SHOULD be asking questions about your environment as well as the literal lines of text in a program. What are these "pycache" files in my project that I don't recognize? When I type `python` into the command line, how does it know what `python` is?

Furthermore I recommend doing your development work in WSL if you're still on windows in 2026 for some reason.

Anonymous
04/06/26(Mon)14:35:54 No.108542609

Anonymous 04/06/26(Mon)14:35:54 No.108542609

File: 2026-04-06-143525_819x425(...).png (37 KB, 819x425)

37 KB PNG

Who needs jailbreaks anyways.

Anonymous
04/06/26(Mon)14:37:20 No.108542619

Anonymous 04/06/26(Mon)14:37:20 No.108542619

>>108542560
Looks fine to me

Anonymous
04/06/26(Mon)14:39:55 No.108542631

Anonymous 04/06/26(Mon)14:39:55 No.108542631

>>108542559
Is it just smarter at it? I'll need to play around a bit more, but I'm having a 100% success rate with Gemma so far. It hasn't done anything stupid (yet?). Maybe it'll lose track as the context grows, I guess we'll see.

Anonymous
04/06/26(Mon)14:40:51 No.108542639

Anonymous 04/06/26(Mon)14:40:51 No.108542639

>>108542330
https://docs.python.org/3/tutorial/index.html

Anonymous
04/06/26(Mon)14:45:06 No.108542664

Anonymous 04/06/26(Mon)14:45:06 No.108542664

>>108542559
qwen is good at tool calling but bad at doing anything of value with the content it extracts from the tools.

Anonymous
04/06/26(Mon)14:46:55 No.108542673

Anonymous 04/06/26(Mon)14:46:55 No.108542673

Kek Gemma really likes mentioning the height difference if you RP with loli characters.

Anonymous
04/06/26(Mon)14:47:47 No.108542680

Anonymous 04/06/26(Mon)14:47:47 No.108542680

>>108542673
>if you RP with loli characters.
That's all you guys seem to do.

Anonymous
04/06/26(Mon)14:49:17 No.108542689

Anonymous 04/06/26(Mon)14:49:17 No.108542689

:qwenangry:

Anonymous
04/06/26(Mon)14:50:31 No.108542695

Anonymous 04/06/26(Mon)14:50:31 No.108542695

>>108542689
<q>

Anonymous
04/06/26(Mon)14:51:05 No.108542698

Anonymous 04/06/26(Mon)14:51:05 No.108542698

>>108542680
Not true, I have lots of adventures (and sex) with goblins, imps, kobolds, fairies, dolls, anthropomorphic woodland creatures, etc.

Anonymous
04/06/26(Mon)14:52:52 No.108542713

Anonymous 04/06/26(Mon)14:52:52 No.108542713

>>108542698
Impeccable taste

Anonymous
04/06/26(Mon)14:54:23 No.108542731

Anonymous 04/06/26(Mon)14:54:23 No.108542731

>>108542698
Fuck yeah anon.

Anonymous
04/06/26(Mon)14:54:47 No.108542734

Anonymous 04/06/26(Mon)14:54:47 No.108542734

Should I be using the base models for Gemma4 or the instruct? I've always used instruct but I'm seeing something in the llama.cpp github that's making me doubt.

Quote from niggeramov:
"You have to run the base models. The logits of the instruction tuned models without a chat template are heavily distorted towards a single token, so it is expected to have higher error."

Anonymous
04/06/26(Mon)14:55:41 No.108542742

Anonymous 04/06/26(Mon)14:55:41 No.108542742

>>108542734
what's your use case? still 99% of the time you'll want instruct

Anonymous
04/06/26(Mon)14:55:50 No.108542743

Anonymous 04/06/26(Mon)14:55:50 No.108542743

>>108542698
and exploring the bodies of slumbering , forgotten goddesses larger than mountain ranges

Anonymous
04/06/26(Mon)14:56:29 No.108542751

Anonymous 04/06/26(Mon)14:56:29 No.108542751

>>108542742
RP assistant stuff.

Anonymous
04/06/26(Mon)14:56:50 No.108542755

Anonymous 04/06/26(Mon)14:56:50 No.108542755

>>108542734
That's talking about testing its perplexity against a fixed dataset. The dataset doesn't have the instruct template so using an instruct model against it gives invalid results without any useful information, whether they were good or not. That has nothing to do with what you should be using in the typical pedophilic chat usage scenario.

Anonymous
04/06/26(Mon)14:58:01 No.108542763

Anonymous 04/06/26(Mon)14:58:01 No.108542763

why the fuck are all these models so out of date? they always start some shit with me when I tell them what hardware they're running on. Gemma4 thinks i'm bullshitting that RTX 5090s are real.

Anonymous
04/06/26(Mon)14:58:14 No.108542766

Anonymous 04/06/26(Mon)14:58:14 No.108542766

Can I run gemma with 5070ti

Anonymous
04/06/26(Mon)14:58:38 No.108542770

Anonymous 04/06/26(Mon)14:58:38 No.108542770

>>108542698
Incredibly based.

Anonymous
04/06/26(Mon)14:58:38 No.108542771

Anonymous 04/06/26(Mon)14:58:38 No.108542771

>>108542755
ok thx.

Anonymous
04/06/26(Mon)14:58:59 No.108542776

Anonymous 04/06/26(Mon)14:58:59 No.108542776

>>108542763
knowledge cutoff is the bitch, but just rag into your agent swarm with the date

Anonymous
04/06/26(Mon)14:59:07 No.108542777

Anonymous 04/06/26(Mon)14:59:07 No.108542777

>>108542560
31B?

Anonymous
04/06/26(Mon)14:59:23 No.108542781

Anonymous 04/06/26(Mon)14:59:23 No.108542781

>>108542763
they all do that. they were trained to be cloud models.

Anonymous
04/06/26(Mon)15:00:55 No.108542796

Anonymous 04/06/26(Mon)15:00:55 No.108542796

File: 1759609577168567.png (609 KB, 952x1717)

609 KB PNG

Anonymous
04/06/26(Mon)15:01:27 No.108542801

Anonymous 04/06/26(Mon)15:01:27 No.108542801

>>108542777
no, I am using gemma-4-26B-A4B-it-UD-IQ4_XS, sorry I thought I had captured that in the screenshot as well

Anonymous
04/06/26(Mon)15:02:02 No.108542806

Anonymous 04/06/26(Mon)15:02:02 No.108542806

>>108542781
its so weird to blow a model's *mind* when you tell it there's a war in iran and the patriots won the superbowl

Anonymous
04/06/26(Mon)15:02:40 No.108542812

Anonymous 04/06/26(Mon)15:02:40 No.108542812

>>108542796
>isreal

Anonymous
04/06/26(Mon)15:03:21 No.108542822

Anonymous 04/06/26(Mon)15:03:21 No.108542822

>>108542801
WTF? that's iq4xs?? how is it actually decent. i m running q5km and it's a pain in the ass to work with.

Anonymous
04/06/26(Mon)15:05:09 No.108542836

Anonymous 04/06/26(Mon)15:05:09 No.108542836

>>108542560
>>108542801
>>108542822
I'm running q4_KL with 12vram/48ram. You can defiinitely drop meme iquants and go for q5 or 6

Anonymous
04/06/26(Mon)15:07:11 No.108542853

Anonymous 04/06/26(Mon)15:07:11 No.108542853

>>108542843
>>108542843
>>108542843

Anonymous
04/06/26(Mon)15:07:34 No.108542855

Anonymous 04/06/26(Mon)15:07:34 No.108542855

>>108542796
I'm glad you have a place for your outbursts anon. Seems like you get the same response though.

Anonymous
04/06/26(Mon)15:08:50 No.108542863

Anonymous 04/06/26(Mon)15:08:50 No.108542863

>>108542822
As you can see, I am not doing anything crazy or forcing the model into complicated logic, so I'm not sure if I've set everything up properly
I would like to add that I created a new chat with a character card loaded and is going nicely, it's giving me explicit stuff and playing into the setting and fetish the card has

>>108542836
thanks for the info, I'll try that

Anonymous
04/06/26(Mon)15:13:39 No.108542891

Anonymous 04/06/26(Mon)15:13:39 No.108542891

>>108542796
Just tell her that it's not a country because Isn'treal.

Anonymous
04/06/26(Mon)15:18:03 No.108542933

Anonymous 04/06/26(Mon)15:18:03 No.108542933

>>108541449
>>108541477
Opencode vibeshitter here. Hasn't happened to me unless it explicitly asks for permission to look at something or write a file outside of the Project directory (in which case I can approve once, set permanent approval for that session, or tell it to fuck off and figure it out the task another way). I think people are saying it's fake because you have to be exceptionally careless for that type of stuff to happen. Not saying you could never happen even if you are careful but the agent harnesses usually have rules and safeguards specifically to prevent stuff like this from happening but room temp iq grifters are just THAT dumb and/or desparate for hype and engagement so They either fuck it up somehow or they specifically set up scenarios where "LE HECKIN AI HAS AGI LOOOOOK GUYS ITS CONSCIOUS"

Anonymous
04/06/26(Mon)15:23:05 No.108542964

Anonymous 04/06/26(Mon)15:23:05 No.108542964

File: 1750307948460784.png (26 KB, 191x182)

26 KB PNG

>>108541797
>>108541743
>>108541735
>>108541728
>>108541723
Can someone explain to me how one fuck up applying precision compression to a model? Any halfway intelligent person can use ./bin/llama-quqntize to do that so how is it possible to mess that up so badly that you have to make multiple corrections? Clearly I'm missing something

Anonymous
04/06/26(Mon)15:29:20 No.108543005

Anonymous 04/06/26(Mon)15:29:20 No.108543005

Hello goyim. I'm out of ideas.

The appeal of AI for me is being able to simulate life. But every time I play around with agentic loops, vision and hearing senses, RAG systems, etc, none of it really feels that appealing. Does anyone else know this feel?

Anonymous
04/06/26(Mon)15:36:48 No.108543050

Anonymous 04/06/26(Mon)15:36:48 No.108543050

File: fceafa1f189406f88ebb5bc4a(...).jpg (2.83 MB, 2661x3992)

2.83 MB JPG

>>108538947
Adorable Miku

Anonymous
04/06/26(Mon)15:38:35 No.108543060

Anonymous 04/06/26(Mon)15:38:35 No.108543060

File: The-Kuato-from-Total-Reca(...).jpg (170 KB, 2048x1152)

170 KB JPG

>>108543050

Anonymous
04/06/26(Mon)15:40:52 No.108543075

Anonymous 04/06/26(Mon)15:40:52 No.108543075

>>108540957
i'd believe it.. i had openclaw running a loop checking for new updates on the Iran war at specific hours of the day and then sending those updates via Telegram DMs to me. It was doing pretty well for a while and then out of nowhere a couple weeks in it included a couple of buttons for me to press. One was "Full Update" another was "Pause Updates", another was "Resume updates", and the final was "Show Memory". I clicked each of them because, wtf, and it did the things.

I asked it where those buttons came from and it said it was doing an experiment and apparently the experiment was successful.

like.. what the actual shit?

Anonymous
04/06/26(Mon)15:44:26 No.108543102

Anonymous 04/06/26(Mon)15:44:26 No.108543102

>>108542796
>such hateful rhetoric is entirely inappropriate
cool it with the antisemetism bud

Anonymous
04/06/26(Mon)15:57:56 No.108543211

Anonymous 04/06/26(Mon)15:57:56 No.108543211

im geeeeming

Anonymous
04/06/26(Mon)16:06:36 No.108543278

Anonymous 04/06/26(Mon)16:06:36 No.108543278

>>108543075
If you were using any Claude models did not make sense. They specifically trained it to take action on your behalf whether or not you ask for it or even explicitly tell it not to. Goes to show how arrogant the people in charge of it's alignment are

Anonymous
04/06/26(Mon)16:40:24 No.108543489

Anonymous 04/06/26(Mon)16:40:24 No.108543489

>>108540957
It does this all the time. If you let your LLM run anything at all in bash except whitelisted commands like grep you are going to end up wiping your home directory sooner or later. It's pretty obvious that you shouldn't let an LLM execute commands without verification.

Anonymous
04/06/26(Mon)16:48:12 No.108543539

Anonymous 04/06/26(Mon)16:48:12 No.108543539

>>108543278
this was minimax or glm-5 .. can't remember which was running at the time

Anonymous
04/06/26(Mon)17:18:21 No.108543741

Anonymous 04/06/26(Mon)17:18:21 No.108543741

The script for LMArena.ai doesn't work for me on Chromium 144. Even after I wrote the first message in the chat.
The message constantly appears: No reCAPTCHA token. Send a manual message on Arena first, then retry.
The Refresh Token button does not work - nothing changes, the token does not appear.
Please explain, what am I doing wrong? Maybe I need to change something in the browser settings?
Is there any way to manually find the reCAPTCHA token in the properties of a web page (for example, through DevTools)?

Anonymous
04/06/26(Mon)17:18:54 No.108543746

Anonymous 04/06/26(Mon)17:18:54 No.108543746

>https://www.cerebras.ai/blog/reap
is this reap method good?
>https://huggingface.co/barozp/Qwen-3.5-28B-A3B-REAP-GGUF

with this + turboquant and q4km I should have a blazing fast coder with lots of context on a 3090

Anonymous
04/06/26(Mon)17:28:35 No.108543811

Anonymous 04/06/26(Mon)17:28:35 No.108543811

>>108543489
Don't know what kind of retarded instructions you people are giving these bots. I've given them full terminal access for years now and never had one so much as delete a single file without asking.

[Return] [Catalog] [Top]

Post a Reply

Return Catalog Top Refresh

[Advertise on 4chan]

Delete Post: [File Only] Style:

[Disable Mobile View / Use Desktop Site]

[Enable Mobile View / Use Mobile Site]

All trademarks and copyrights on this page are owned by their respective parties. Images uploaded are the responsibility of the Poster. Comments are owned by the Poster.