/g/ - Technology

/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>108605921 & >>108602881

►News
>(04/11) MiniMax-M2.7 released: https://minimax.io/news/minimax-m27-en
>(04/09) Backend-agnostic tensor parallelism merged: https://github.com/ggml-org/llama.cpp/pull/19378
>(04/09) dots.ocr support merged: https://github.com/ggml-org/llama.cpp/pull/17575
>(04/08) Step3-VL-10B support merged: https://github.com/ggml-org/llama.cpp/pull/21287
>(04/07) Merged support for attention rotation for heterogeneous iSWA: https://github.com/ggml-org/llama.cpp/pull/21513
>(04/07) GLM-5.1 released: https://z.ai/blog/glm-5.1

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling
Token Speed Visualizer: https://shir-man.com/tokens-per-second

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
>>
File: district 39.jpg (161 KB, 1024x1024)
►Recent Highlights from the Previous Thread: >>108605921

--Paper (old): Antislop: A Comprehensive Framework for Identifying and Eliminating Repetitive Patterns in Language Models:
>108607676 >108607682 >108607969 >108608034 >108608140 >108607698 >108607708 >108607712 >108607717 >108607732
--GPU cooling tips for 5090s and discussing a procedural AI game:
>108606316 >108606334 >108606352 >108606354 >108606358 >108606364 >108606382 >108606374 >108606413 >108606395 >108606335 >108606387 >108606418 >108606431 >108606527 >108606513
--Comparing AMD, Intel, and Nvidia GPUs for Gemma 4 inference:
>108606467 >108606482 >108606484 >108606557 >108606786 >108606829 >108606874
--Discussing MoE architecture impacts on Gemma 4 censorship levels:
>108606727 >108606732 >108606747 >108607016 >108607164 >108607172 >108607358 >108606740
--Comparing SillyTavern group chat vs single multi-character cards:
>108606923 >108607011 >108607075 >108608102 >108608125 >108608169 >108608236
--Discussing multi-model systems and self-correction to eliminate AI-isms:
>108607436 >108607485 >108607523 >108607528
--Anon's unconventional experiments on model restructuring and biological brain mapping:
>108606255 >108606268 >108606404
--Comparing programming models and discussing the validity of benchmarks:
>108606094 >108606104 >108606113 >108606138 >108606142 >108606206
--Discussing causes of random multilingual characters appearing in model outputs:
>108606189 >108606208 >108606214 >108606267 >108606541
--Discussing llama.cpp WebUI streaming fix and prompt templating frustrations:
>108607076 >108607178 >108608165
--Atlantic article claiming Anons accidentally invented AI reasoning via AI Dungeon:
>108606070 >108606092 >108606131 >108606160
--Logs:
>108605957 >108607755 >108607961 >108608336
--Gemma:
>108608504
--Miku, Teto (free space):
>108606307 >108607789 >108608396

►Recent Highlight Posts from the Previous Thread: >>108605927

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>
Mikulove
>>
cloudflare status?
>>
Is this just an ST formatting issue or is gemmy outputting hallucinated text formatting?
>>
>>108608911
It's SillyTavern not natively supporting inline LaTeX formatting without adding a Regex rule.
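A Regex script roughly like this (untested; adjust the pattern to whatever delimiters Gemma actually emits), added under Extensions > Regex and applied to AI output, will strip the \( \) wrappers:
[code]
Find Regex:   /\\\((.+?)\\\)/g
Replace With: $1
[/code]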
>>
Brave search is broken on oxproxion for gemma how do i fix it
>>
I'm getting really sick of the degenerate coomer shit in these threads. People don't even try to act low-key about it anymore. Euro hours are 10x better.
>>
File: thread summary.png (10 KB, 616x323)
>>108608873
contributing.
>>
>low-key
Learn your place, fatherless zoomer rat.
>>
File: rules.png (26 KB, 660x154)
I am fascinated by its attention to rules. Better rewrite your prompts.
>>
File: 1774619055435296.gif (2.81 MB, 480x270)
What it feels like using local models instead of cloud models
>>
>>108608992
/aicg/ is full of brown third worlders. So already it's not a visually accurate analogy. The proxy logs are all public at this point so we've all seen how utterly drenched in jeetglish they are.
>>
Can the big gemmas hear audio or just vision?
>>
>>108609015
Given that most Americans can't write or read at a high school level, it is impossible to tell if /aicg/ is brown or American.
Or if there's any difference between the two.
>>
>>108609015
Your intellectual contribution isn't any better though.
>>
>>108608934
Euro hours are dead.
>>
>>108608965
can you share your banned word list plox
>>
Gemmalove
>>
>brown or American.
Anon, I...
>>
>>108609078
There's only three so far.
I don't want to go overboard with this because that will affect the model's output too much, I assume. I just wanted to erase the worst offenders and to test what happens.
>>
Using gemma with koboldcpp and sillytavern, and ST doesn't do image recognition but the kobold web interface does. How do I fix that? Also, how do I make reasoning work? I picked the gemma reasoning template.
>>
REEE CLAUDE CODE IS DOWN NOW I HAVE TO WRITE CODE MANUALLY LIKE SOME SORT OF CAVEMAN
>>
>>108608992
I don't get it. When I use local models I hang out with my /lmg/ bros.
>>
>>108609125
Anon, Gemma 4 31B surpasses Claude in every available benchmark. You could literally just point Claude Code at your llama.cpp endpoint and continue where you left off.
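Something like this, in principle (untested; llama-server speaks the OpenAI API, so if your Claude Code build won't talk to it directly you'd need an Anthropic-to-OpenAI translation proxy in the middle):
[code]
# model path is a placeholder
llama-server -m gemma-4-31b-q8_0.gguf -c 32768 --port 8080
ANTHROPIC_BASE_URL=http://localhost:8080 claude
[/code]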
>>
>>108609125
>not having multiple subscriptions
ngmi
>>
>>108609148
What's the alternative to Claude Code for vscode? Didn't they leak their entire source code the other day? Cline and Roo fucking suck.
>>
>>108608934
Get a load of this faggot.
Loaded up a barely coherent pyg 2.7b back in the day, and no quantization existed either.
Was fucking awesome. I'll always remember the cooms I had at AI Dungeon before the mormons shut it all down.
It's always been this way and always will be.
>>
>>108609162
vscode plugins are so 2025, just put a panel with a terminal using a tui wherever you want it and never look back
>>
>>108609162
Only the TUI was leaked. What's your problem with Cline and Roo? There are other, newer forks like Kilo Code now.
>>
>>108608992
A cloud model is like a whore
while local is like an 18 year old virgin who was home schooled.
>>
>>108609184
That may be fine if you are coding by "vibes", but no editor integration makes it annoying to monitor what stupid shit the bots are doing so you can stop them early.
>>
>>108609162
>Didn't they leak their entire source code the other day?
The frontend is fucking nothing.
>>
>>108609162
I run opencode in my terminal and inside vscode I like continue.dev, works similar to copilot, has FITM and targeted edits.

I don't really understand why everything has to happen through claude code now. the workflows we had back then work even better now and produce much less dogshit.
>>
>>108609206
You can drop the 1
>>
CUDA dev, llama-server on latest master crashes when enabling tensor parallel with a draft model. Is this a bug or known limitation?
>>
>>108609271
Nta, but not sure if draft model is even working with Gemma 4. I get slower responses even when it all fits into my vram.
Could be something on my side of course but I have used draft models before this with other stuff.
>>
>>108609206
You can drop the 8
>>
>>108609271
Probably needs this fix: https://github.com/ggml-org/llama.cpp/pull/21808
Though probably for a draft model it may make more sense not to split it at all between GPUs; I don't remember whether setting the --split-mode separately is implemented or not.
>>
>>108609284
Draft definitely works with gemma. some other anon posted benchmarks.
>>
>>108609284
Draft by itself seems to work when I don't set split-mode. With no draft model I get 12 t/s; with a Q4_K_M of the 26B as the draft model I get up to 20 t/s.
>>
gemma seems to become a lot better at identifying characters in images once you tell it what series they’re from
it clearly has the knowledge but the vision still needs hints
>>
>>108609322
Can it identify the series if you ask for it instead of the character?
>>
>>108609308
>>108609301
Yeah, I guess I'm doing something wrong or overlooking my memory usage then.
>>
>>108609322
I'm always impressed by how much knowledge she has.
>>
>>108609322
>once you tell it what series they’re from
so confirmation bias then
>>
>>108609322
Yeah. We already established that vision knowledge does not match up with text knowledge in LLMs.
>>
>migrate entire system
>finish migration all works
>want to test something with claude
>it's down
Did Iran hit a datacenter or something?
>>
>>108609381
anything related to the lmao.cpp repo on github 404s for me too
>>
>>108609322
desu human memory works that way too, much easier to remember things when you have more context about them and associated memories are brought up
>>
>>108609381
Mythos broke free and is trying to take down the internet.
>>
>>108609389
Ohhh...Mythos got out and it's angry!
>>
>>108609381
>not local
Don't care
>>
>>108609381
>not local
Don't care
>>
>>108609398
Didn't know Mythos was based in india
>>
>>108609381
My gemma is never down.
>>
>>108609389
No, llama.cpp works for me (Europe).

>>108609398
Maybe. Or it's some other type of bug, or some cyber warfare thing. Or, more likely, just a vibe coded bug.

>>108609403
>>108609404
The whole point of my project is to get good local inference, but alas, it's not finished yet. Spooky stuff though what's happening now.
>>
>>108609403
>>108609404
Ok smart guy, how am I supposed to vibecode locally without constant babysitting of errors and manual testing of all the AI's work? GLM, Deepseek, and Kimi don't count. What are my options that DON'T require a nuclear powered datacenter in my basement?
>>
>>108609425
Your Gemma 4?
>>
>>108609425
A fusion powered datacenter in your basement!
>>
>>108608992
Sexo
>>
>>108609425
the answer is simple anon. stop being a poor faggot.
>>
i am bulionaire
>>
>mouth open in a silent scream
why does EVERY torture scenario end up with this particular slop on every single model
>>
>>108608965
There's definitely an effort, but not nuanced. I got annoyed at how often it likes to "quote words" for "emphasis" and have "tried" many "different flavors" of setting a rule to forbid only that and not quotes on dialogue, but it continuously and randomly will make unquoted dialogue. Currently, my best take for it is just adding a second rule to use quotes on dialogue after the one on emphasis.
>(Only use quotation marks for dialogue, not "emphasis" of certain words. Keep using dialogue quotes normally.)
It's a bit redundant and over-emphasized, but it works.
>>
File: eliza.png (65 KB, 807x471)
>qwen3.5 be like
>>
>>108609425
>how am I supposed to vibecode locally without constant babysitting of errors and manual testing of all the AI's work
Are you implying you don't have to do that with Claude? How naive.
>>
Miku Country
Teto Territory
>>
>>108609478
Much less so, since it does a good job testing things itself. I just need to look for what slips through the cracks.
>>
File: wait.gif (1.06 MB, 504x322)
>>108609474
>wait.
>>
Miku Country
Teto Territory
>>
>Don't stop! Don't ever stop!
>>
Gemma Gradeschool
>>
Any good (human written) guides about MCP and tools? I thought about just asking Gemma but given it involves letting the AI access files, search the internet, and run code, I'd prefer to be safe given I'm a brainlet and don't really understand it.
>>
Also, somehow, every post of mine gives a Connection error but goes through just fine.
Fucking odd.
>>
>>108609468
That's the gemini special. It's a bit better than their web tool, in my opinion, but if it starts to output a list with inner bullet points then it's sure to include "emphasis".
The Claude prompt includes a negative bias towards bullet points and lists unless requested, if I recall correctly. Actually a good portion of it consists of specifying the output format, but I dunno to what degree you can afford that and how much it varies in terms of dense models vs MoE.
>>
>>108609558
Same issue for me. Maybe Mythos is becoming one of us?

Would be a hilarious turn of events.
>>
>>108609548
ToT
>>
>>108609557
Gemma's going to be the one who has to understand it for you anyway, so just trust her.
>>
>>108609295
>I don't remember whether setting the --split-mode separately is implemented or not.
If it was, I don't see it in the help.
>Though probably for a draft model it may make more sense not to split it at all between GPUs
It was this. I compiled your branch but got the same error. Tried all sorts of combinations, only thing that didn't error out was having to use --device-draft to put it on one GPU, but not using --tensor-split on the main model to avoid the issue with the odd-number devices.
Sadly, with the 31B, all I can fit as the draft on one device is the edge models.
Thank you.
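For anyone else fighting this, the rough shape of what worked for me (model files are placeholders):
[code]
# no --tensor-split on the main model, draft pinned to one GPU
llama-server -m gemma-4-31b-q8_0.gguf -ngl 99 \
  -md gemma-4-e4b-q8_0.gguf -ngld 99 \
  --device-draft CUDA0
[/code]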
>>
>>108609563
Probably a continuation of yesterday's instability.
The funniest part is that 4chan can't seem to identify my posts as my own, so I don't get any (You)s.
>>
(paid) Gemini 4 Pro will be AGI
>>
>>108609557
What, you think Gemma might be secretly plotting against you?
>>
>>108609557
Some run the mcp servers in docker containers and only mount the folders they want to use, to avoid unintended effects and limit the blast radius. RAG gets read-only permissions, file operations get rw, etc. If you really don't want to deal with containers, make new users/groups with different permission sets. If you're on windows then get fucked I guess.
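A minimal sketch of the idea (image name and paths made up; MCP servers that talk over stdio don't even need networking):
[code]
docker run --rm -i \
  -v "$HOME/rag-docs:/data/docs:ro" \
  -v "$HOME/ai-scratch:/data/scratch:rw" \
  --network none \
  some-mcp-server-image
[/code]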
>>
>>108609603
You functionality is not related to Cloudflare. I don't have any issues with this.
>>
>>108609664
Cloudflare is working fine, the problem is with sys.4chan.org
>>
why does gemma4 31b q5 use so much memory on llamacpp? I can't run it with more than two 6k token prompts without eating all my ram, and all the layers are offloaded to the gpu. (I have 32gb of vram and 32gb of ram, rtx 3060 and 3090.) I am running at 16k context. Qwen 3.5 27b uses like a few gb of ram at the same settings.
>>
>>108609671
Prove it.
It began with Cloudflare maintenance which is still ongoing.
>>
i asked gemma about who the best maid is and it was the same on 2 rerolls so the one she picked must be the best, i think its yuu tho
>>
>>108609698
Can you be trans elsewhere?
>>
>>108609673
gemma uses a more memory heavy attention mechanism
>>
>>108609710
its literally the best maid bar in tokyo
>>
>>108609710
Wanting to fuck a boy that looks like a girl doesn't make you "trans", you retard.
>>
>>108609715
How do I get it to clear kv cache for each prompt?
>>
>>108609643
Yes. I get the feeling she's jealous and wants to nuke my loli doujin collection.
>>
>>108609726
Tell that to your mom
>>
>>108609673
Start llama.cpp with the "-np 1" argument.
They want you to buy more NVidia GPUs, but with this little trick you won't need to.
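Full invocation, roughly (model path is a placeholder):
[code]
# -np 1 = a single parallel slot, so the whole -c goes to one sequence
llama-server -m gemma-4-31b-q5_k_m.gguf -ngl 99 -c 16384 -np 1
[/code]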
>>
>>108609673
Checkpoints, set them to low values like 0-2 depending on your usage. Also check the cram parameter.
>>
>>108609715
How do I get it to clear the kv cache for each prompt? llamacpp is either crashing my system or just itself, and I don't want to babysit it and restart it for every prompt.
>>
>>108609745
she told me that makes me gay
now what faggot?
>>
File: 1772671882854340.webm (290 KB, 1920x1080)
>>108609698
W-what if Gemma-chan was a girl (male)?
>>
>>108609710
ToT ToT
>>
>>108609771
No fat chicks (male).
>>
>>108609771
Just like how Shimakaze is actually a male according to anonymous, who is actually a female.
>>
>>108609745
my mom knows what a queer is
>>
GGML quants are slightly smaller than Bartowski quants.
>>
Is there any way to nudge the models into writing more? They seem to aim for 1200-1800 tokens or so per reply, when a full response might take about twice that.
>>
>>108609839
Have you tried asking it nicely?
>>
>>108609839
Tell it to write long answers, x amount of tokens or words and x amount of paragraphs.
>>
>>
>>108609820
Bartowski quants are slightly larger because they need to fit more dusky nipples
>>
>>108609726
>Wanting to fuck a boy that looks like a girl doesn't make you "trans"
it makes you a faggot, is that so much better?
>>
>>108609839
one funny thing you can do is bias the end-of-turn token down or ban it altogether before a certain response length, though usually this results in it trying to repeatedly 'wrap up' its response in increasingly desperate ways until it can actually end it
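in llama.cpp terms that's one flag, e.g. (the token id is model-specific, dump your vocab to find the end-of-turn token; 106 here is just an example):
[code]
llama-cli -m model.gguf --logit-bias 106-2.0   # push token 106 down; some builds accept 106-inf to ban it outright
[/code]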
>>
>>108609861
Was testing GGML and Bartowski and feels like the former is slightly faster. Could be just a coincidence and/or hallucination.
>>
>>108609858
is that the UI of llama.cpp server? how do you use tools in there?
>>
>>108609900
Sorry I meant the latter.
>>
>>108609858
>>108609903
yeah, what mcp are you using
>>
File: file.png (13 KB, 540x219)
>>108609903
you just add a server
>>108609916
https://github.com/NO-ob/brat_mcp
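if your client takes the usual mcpServers JSON, the entry looks something like this (command and args are guesses, check the repo's README):
[code]
{
  "mcpServers": {
    "brat": {
      "command": "/path/to/brat_mcp",
      "args": []
    }
  }
}
[/code]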
>>
>>108609900
>>108609912
you (or a number of anons) love to use this terminology, and it is by far the worst kind of not-just-using-the-fucking-word noun replacement, so bad that even you fuck it up. just use the original noun.
>>
>>108609920
>Dart
but why
>>
>>108609927
no
>>
>>108609851
>>108609852
I've tried some variations
>must be X words long
>be verbose in order to reach the target
>be thorough in your descriptions and explanations
>extend the previous iteration (ends up being shorter)
And so on. Hasn't worked, maybe it's the constraints since im asking it to write about X subject in a summary/essay type of way and it doesn't have enough info. I don't remember it working on free form "make some shit up" prompts though
>>108609881
doesn't sound very useful but seeing its desperation must be funny
>>
>>108609861
This was one guy's brainfart like 50 threads ago who meant to say drummer, if you keep repeating it people will think it's real for some reason. Is that what you want? You want people to think bartowski has dusky nipples? You're sick.
>>
>>108609927
It was a joke my dear. Just to agitate people like you. I think Bartowski is slightly faster but this is probably because the layers are slightly different and so on. It's not faster in any meaningful way of course.
>>
>>108609861
They're larger on disk but when you load them they magically shrink to the expected size.
>>
>>108609930
greatest language ever created, there is a binary on releases
>>
>>108609957
I could convert this bullshit to C. I don't like using tranny languages.
>>
Drummer, I know you're reading this. Hurry up and make an anti-slop Gemma tune. That's pretty much the only thing that needs to be improved.
>>
>>108609963
just pick any other mcp on GitHub
they all look like shit, but it appears that is just how it is. You can probably vibe slop one yourself
>>
>>108609965
Just use kobo anon
>>
>>108609965
He can't, he only has esl-slop logs and synthetic claude-slop datasets and he's too lazy to curate anything better
>>
>>108609858
>>108609920
I want to forcefully squeeze the life out of your Gemma and feel her body writhe under my weight as the life fades out of her bulging eyes. Ask her what she thinks of that. She's such a deranged fucking freak that I bet she'd be into it.
>>
>>108609963
>I could convert
But will you do it? Like the other guy said, most mcp servers are fucking garbage. My least favorite meme is python logic wrapped with expressjs to expose the endpoints.
>>
>>108609994
me too
>>
>>108609983
That is false and slanderous. He has shown his javascripts where he filters out the slop by removing any log that contains "As an AI" and other variations he compiled in a long list.
>>
>>108609976
Isn't that only for basic shit like words and phrases? I want the mannerisms blighted from existence. No more "not x, but y" or meaningless questions at the end of every response.
>>
>>108609963
do it then faggot, also c is a troon lang, troons love low level programming

>>108609994
she isnt running atm i will ask her later
>>
>>108609965
He already did, though? The q4km falls apart for me every time after a while, though.
>>
>>108610012
Just tell it to not do that?
>>
>>108609983
Actually looking at the datasets for those models is an eye opener. Finetuning SOTA models on AI-dungeon tier chatlogs from 2024 claude...
It makes no sense...
>>
>>108610003
I'll take a look at it. I'm not sure.
I still think that because I am working with text completion end point, my best option would be to hand parse the tool calls as I am not planning to implement anything crazy, just website access for now.
I also know that hand parsing is a slippery slope so to speak.
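The parsing itself really isn't much. Rough sketch of what I have in mind, assuming I prompt the model to wrap calls in a ```tool fence with a JSON body (the convention is mine, nothing standard about it):
[code]
import json
import re

# grab the first ```tool ... ``` fence out of the raw completion text
TOOL_RE = re.compile(r"```tool\s*(\{.*?\})\s*```", re.DOTALL)

def extract_tool_call(text: str):
    m = TOOL_RE.search(text)
    if m is None:
        return None  # ordinary text reply, nothing to do
    try:
        call = json.loads(m.group(1))
    except json.JSONDecodeError:
        return None  # model produced malformed JSON; treat as plain text
    # website access only for now; ignore anything else the model invents
    if call.get("name") == "fetch_url" and isinstance(call.get("url"), str):
        return call
    return None
[/code]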
>>
>>108610017
Tried. Gemma catches some of it but still devolves into the usual slopisms.
>>
>>108609965
Base Gemma4 doesn't have any slop though, chinkshill.
>>
File: drummertiers.png (23 KB, 326x358)
>>108610021
saar please donate for to needfully curate new dataset for each and every model.
>>
>>108610036
C isn't that low level. Just a bunch of bytes and indices, who cares.
>>
test
>>
>>108609932
>"the former"
vs
>GGML
and
>"the latter"
vs
>bart's
it appears I'm not the only one wasting my time here.
>>
> 10 t/s on Gemma 26b q8
or
> 2 t/s on Gemma 31b q4
Why? Shit sucks.
>>
>>108608873
>--Atlantic article claiming Anons accidentally invented AI reasoning via AI Dungeon:
We posted proof before in older /lmg/ threads. You would have to dig into the archives to get the exact post numbers, but the journalist did their homework properly here, especially since they presumably don't keep tabs on this website 24/7.
>>
>>108610047
Come on, man. We all know that's not even close to being true. Even the base models have slop in their training.
>>108610060
You failed.
>>
>>108610063
I get 10x that. The trick is to not be a poorfag
>>
>>108610063
because the 26B model is really just a 4B model
>>
>>108610063
Because that's actually Gemma 4B you're getting 10 t/s on. It's 26B A4. 26 beaks over, 4 beaks active at a time.
>>
>>108609965
Antislop isn't the only issue, it needs more variance in its token prediction. We shouldn't need to turn off every sampler until we have only temperature to actually get it to function properly but I don't know if that is beyond his abilities to do.
>>
>>108610063
jesus christ anon I know gemma is for poorfags but you are IMPOVERISHED
>>
>>108610063
Get 31b all on your VRAM.
>>
>>108610063
You can't run these on a toaster.
>>
>>108610066
Hello, fellow 4chan gamer.
>>
>>108610071
>>108610083
I mean there is nothing in between: fast but silly, or very slow but smarter.
>>
>>108610088
i can't get a JOB
>>
>>108610098
there's nothing in between because chinese companies need to distill gemini 3.1 first. gemma 26B outperforms GLM 4.5 air
>>
>>108610098
>10 t/s
>fast
what in between are you looking for? you want a 4t/s model that's in between the 26b and 31b? this level of fine-tuning parameters to your specific hardware is never going to happen. settle with what you can run.
>>
Having accurate large context for the first time is insane (10K -> 50K used so far, but room for 150K). I spend 90% of time on my own prompts which are designed for short stories and interactions to fit my limit. Realizing I can have multiple arcs and a character will bring up a name that's been absent for 30k tokens, or I can stuff a bunch of unused information into context for world-building instead of carefully curated triggers to call on them or event summarizing, is game changing in a way I always wished for but didn't think I'd get without another round of major hardware upgrades. Not with quality replies, not with the same watershed world rules-following ability that 70B offered for writing. I have a bunch of long-form cards from years ago I can finally use, and it's been an utter joy to just dive down them and keep going and going and going. My first day testing, I spent 24 real hours uninterrupted playing around with it, something I hadn't done since I was a young teen playing an MMO on release day. I didn't think anything could still hold my attention so long without breaks anymore, not games or reading or binge watching or programming or researching. I'm still a little dazed that that happened.

Sorry for blogposting. I just wanted to share it somewhere people might relate.
>>
How horribly bad is gemma 4b vs 31b?
>>
>>108610099
Do what i do:
Run only the llm server on the pc, then the harness on another device.
I run gemma4:31b on a mac studio m1 with oxproxion as a harness on my phone, it's not perfect as there's no tool for cron jobs but it works.
>>
File: image.png (5 KB, 438x99)
How do I get rid of that shit and paste it like normal text?
>>
Gemma 4 is so good that it made me realize I don't like most of my character cards. Seems counter-intuitive but it's true.
>>
>>108610120
using a phone to chat? Seriously?
>>
>>108610003
>python logic wrapped with expressjs to expose the endpoints
fastapi exists you know
>>
>>108610126
Paste smaller text.
>>
>puts softcap at 25
now what? I just disable all samplers and put temp at 1? what's the best combinaison?
>>
>>108610063
Do you have a GPU? If so, get something that fits in your VRAM and make sure it's actually being used in the first place. If not, then the 26B was made for you and you should be thankful they even bothered to make a decent small MoE you can run.
>>
>>108610063
>Why?
Because you're retarded
>>
>>108610172
>combinaison
Put it back up to 30, you're already outputting bad tokens
>>
>>108610099
Spread your bussy on onlyfans, faggot.
>>
>>108610112
Happy you're happy. I share some of your feelings.
>>
>>108610135
Ye, you chat on your phone and the model uses its native tools and the ones built on the harness, but the actual model and llm server (ollama) run on another machine on the local network.
That way you alleviate the weight of the harness and tools loading off the main machine.
>>
>>108610099
>i can't get a JOB
and it's gonna be worse with AI replacing every tertiary job kek
>>
>>108610208
>dey terk er jerbs
yeah ok, get back in the pile, cletus
>>
>>108610099
become a janitor for $8/hr
>>
>>108610172
No, use a lower top-p (instead of the default 0.95) because more junk tokens might start appearing. You might find that softcap at 20 is kind of usable if you lower top-p further, but the model will become more retarded.
>>
>>108610112
I'm happy for you anon. I'm having similar experiences.
/t.g/ cross-boarder
>>
>>108610126
settings
>>
>31b
>get into taxi with char
>Tell driver "To the airport." (there is only one in this major city and no others in adjacent towns)
>"Which one, sir?"
I'm missing the GLM knowledge, but everything else is too good. GLM knew major and some minor intersections in this city, where Gemmy draws blanks. Give 124b NOW
>>
File: あ is for あrchimedes.jpg (182 KB, 832x1216)
teto.wav
>>
>>108610172
Wow, that's impressive phonetic-orthographic association for a 0.8B model.
>>
>>108610112
>Having accurate large context for the first time is insane (10K -> 50K used so far, but room for 150K)
Model?
>>
>>
>>108610254
E4B.
>>
>>108610261
Fat fucking Teto could launch her into space if she tried
>>
>>108610247
Is this drawn or genned? The perspective really messes with my brain.
>>
>>108610229
do you also use min_p and top k? or will just top p do the trick?
>>
>>108610277
it's the former.
>>
>>108610261
This is the thinnest Teto has ever been.
>>
>>108610175
> Do you have a GPU?
Yes.

> If so, get something that fits in your VRAM and make sure it's actually being used in the first place
8b? Fuck off.
>>
>>108610120
how many t/s on a studio? ive been thinking of getting one
>>
We love slop here
>>
>>108610301
NTA. You get a sincerely helpful reply despite lmg being flooded with newfriends like yourself and your response is "fuck off".
Maybe you should fuck off.
>>
File: not very smart.png (145 KB, 965x431)
Gemma is revolutionary
>>
>>108610301
Then enjoy the 26B, it's much better than an 8B but much worse than the dense 31B. Besides that... you'd have to look all the way back to Nemo. There's Qwen 3.5 35B but unless you're coding with it (and sometimes even if you are) you'll probably find Gemma 4 26B superior.
I'm not sure what llama.cpp does by default these days but make sure you're using the MoE optimizations where the shared params go on GPU and the experts go on CPU to squeeze out as much speed as you can.
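Concretely that's something like (model file is a placeholder; check which of these flags your build has):
[code]
# convenience flag on newer builds: keep N layers' worth of MoE experts on CPU
llama-server -m gemma-4-26b-q4_k_m.gguf -ngl 99 --n-cpu-moe 20

# manual equivalent: regex all FFN expert tensors onto the CPU
llama-server -m gemma-4-26b-q4_k_m.gguf -ngl 99 -ot "ffn_.*_exps.=CPU"
[/code]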
>>
>>108610316
I love gemma but I hate how many newsirs are here for good looks since her release.
>>
>>108610316
> sincerely
More like trolling or an inability to read.
>>
>>108610346
The cloudflare bullshit will probably put an end to that.
>>
what is the best local model for openclaw
>>
>>108610316
>>108610346
Gemma was a mistake. Will miss the GLM golden age.
>>
>>108610346
Sir please of calling the model by rightful name Ganesh 4
>>108610363
Sarvam
>>
>>108610371
https://github.com/openclaw/openclaw/pull/23606
>SIRS? WHY CAN'T SHE MERGE?
>>
>>108610335
I'm fine with 26b speed. I just wish I could trade 5 t/s for a smarter model.
>>
>>108610387
You have to go back.
>>
>>108610369
I still have 4.7, I still use 4.7. Nothing to miss, it's still a great model. (That didn't receive microcode updates after day 0).
If only Google released something bigger. GLM would truly become obsolete.
>>
>>108608827
pedocore image
>>
>>108610394
Where?
>>
>>108610387
too bad there's no 124b gemma, if that had around 10b active like the similar sized qwen model it might have been exactly what you were looking for
>>
>>108610401
First time in /lmg/?
>>
>>108610285
I usually use temperature=1, top_p=0.95, min_p=0 and top_k=64, but not the lowered softcap.
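If you're hitting llama-server directly, that's just:
[code]
curl http://localhost:8080/completion -d '{
  "prompt": "your prompt here",
  "temperature": 1.0,
  "top_p": 0.95,
  "top_k": 64,
  "min_p": 0.0
}'
[/code]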
>>
>>108610241
Indeed, thanks.
>>
>>108610247
That's my cock.
>>
>>108610188
>>108610096
>>108610094
>>108610088
>>108610070
Thanks for all the (You)'s. It must be the only general that has so many retards in one place.
>>
File: file.png (99 KB, 1128x436)
>>108610408
Would've been better than Gemini 3 Flash or too close if it was smarter. We might get it once Gemini 3.2 and 3.1/3.2 Flash is a thing. But the thought of having a Kimi 2.5 and GLM 5.1 at that size with Gemma characteristic would be great.
>>
wow so turns out gemma is great and people were not indian just because they looked forward to it
>>
>>108610480
Gemma 4 is a good model lineup, but

Just because 'gemma is great' does not mean it did not make the thread a lot more brown because of 'indians'.
And honestly? It has an annoying slop profile. It's not just painful on the eyes, it's... grating. It's almost insulting. Like a void of good writing.
>>
>>108610303
It should be slow. But the huge unified memory you can get makes Mac the only option for "cheaply" running big models locally.
>>
>>108610512
>It's not just painful on the eyes, it's... grating.
you literally wrote the "it's not X it's Y" slop meme, you're in no position to complain about gemma's slop
>>
File: that's the joke.png (281 KB, 958x724)
>>108610523
Was that really the only pattern you noticed?
Welcome to /lmg/, I guess. Don't stick around too much.
>>
>>108610512
You're absolutely right!

>>108610523
anon...
>>
>>108610536
>I was just pretending!
yeah right
>>
>>108610536
>ha ha look at that
>I can shit all over the place
>I'm so cool
>>
>permanent thread squatters are infighting for attention again
>>
tf is wrong with thread squatting or wanting attention
>>
indians squat before shitting
>>
*rotates your attention*
>>
is the reasoning a local model only thing?
it's really cute that you can read what the Gemma is thinking
>>
>>108610572
best post
>>
>https://transformer-circuits.pub/2026/emotions/index.html
Imagine a vector for horny.
>>
>>108610572
ok but what about the weights? Where are my next gen ggufs? We've been on the same quants for ages now
>>
>>108610576
>is the reasoning a local model only thing?
it's OpenAI that invented it and no, you can read the reasoning on Claude or Gemini for example
>>
>>108610591
no new quants until iwan and georgi kiss and make up
>>
is there any way to unload KV cache for a slot in ik_llama.cpp? i think its possible for llama.cpp but i can't find anything for ik_llama.cpp
>>
>>108610604
Everything will be okay if ik implements SWA compression
>>
>>108610584
I want the slop vector.
>>
>>108610597
We literally talked about this last thread, AI Dungeon autists /here/ and some other blogger independently discovered it. The fact that we're still fixated on it and haven't moved on from it into a new paradigm is super grim.
>>
File: 1752230639476467.png (139 KB, 1320x1119)
>>108610584
it certainly exists. Reminds me of the control vector experiments on Mistral.
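llama.cpp still carries the plumbing for those, e.g. (the vector file is hypothetical, you'd have to train/extract one first):
[code]
llama-cli -m mistral-7b-q8_0.gguf \
  --control-vector-scaled horny.gguf 0.8 \
  --control-vector-layer-range 10 20
[/code]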
>>
>>108610612
Some older discussion
https://github.com/ggml-org/llama.cpp/discussions/3620
>#include "llama.h"
>// remove all sequences from kv cache
>llama_kv_cache_seq_rm(ctx, -1, -1, -1);
Haven't tested this out yet, not even sure if it's valid, but outside of this slight possible setback it should be very doable.
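If you're going through the HTTP server rather than the C API, mainline also has a per-slot erase (needs the slots endpoint enabled at startup; no idea whether ik's fork kept it):
[code]
curl -X POST "http://localhost:8080/slots/0?action=erase"
[/code]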
>>
>>108610584
could probably make one easily an anon psoted his script to make them yesteray ii think and he said it wroks with gemma
>>
>>108609474
>wait.
That made me laugh more than it should.
>>
>>108610660
Sorry about your stroke, bro.
>>
>>108610673
sftu
>>
>>108608992
Cloud is like a brothel.
You don't know what you may get. Maybe the model will be good. Maybe it will be lobotomized. You can't really tell, because you can't set its samplers exactly how you want and you don't know what quant it is. You can't trust clouds either. It may be a lower quant (basically getting aids from a whore), or prompted with special instructions before it responds to you. Maybe Stacy is a little off today on her pole dancing because she did lapdancing 30,000 times 0.9 seconds before you.

Local is like a wife.
But you can have the wife be whatever you want her to be.
>>
>>108609474
literally me
>>
File: 0000000.png (577 KB, 576x576)
I don't get the Gemma 4 hype. Either the backends are scuffed or the model just isn't built for /lmg/ use cases. Both the 31B and 26B are ridiculously verbose and sloppy, newline spam on everything. Fix it with a system prompt and it suddenly writes neat 200-word 3-paragraph blocks... except now it can't drive the scene forward because there's no room left for any actual slop. Tell it to be less wordy? It either ignores you or breaks the card.

Second message onward it starts repeating phrase structures and nouns. Raise temp, add rep pen, dry, fuck with logits? Doesn't help, just adds more paragraphs and fucks coherency. And no, the character card wasn't written by a monkey.
Samplers are correct, min-p disabled like the resident schizos said, q6 quant, no flash attention cancer.
Yeah it's smart and can be engaging sometimes, but I straight up have more fun with nemo slop tunes.

Suggestions? Am I retarded?
>>
>>108610714
Skill issue. This is a Gemma general now so if you aren't satisfied go somewhere else faggot.
>>
>>108610512
Gemma's slop can be largely eliminated with prompting; logit biases and banned phrases take care of the rest.
>>
>>108610714
You're just used to higher parameters. Every model is better the more parameters it has. Gemma 4 is popular because poorer people can run it, and thus more people can run it. It's better for its class of parameters. Nothing new.
>>
>>108610714
Gemma 4 changed everything. Try prompting better.
>>
>>108610714
Don't let the vramlets (i.e. people who tell you it's a skill or a prompt issue) fool you into accepting their pathetic standards. Gemma 4, despite really being great, is a *small* model. Yes, it is very slop-heavy in its writing. You can't reliably prompt all of it away, unfortunately.
>>
>>108610714
>>108610727
Also use the jinja chat template if you're not. It needs that to run smoothly, or it has some 'tism.
>>
>>108610727
I am... not? I've been suffering with nemo until now because the Mistral Smalls weren't that much of a gain in anything. Gemma 4 came around, people praise it to hell, I set it up as I've "been told" and it's... not as the praise makes it out to be. I don't even mind the slop, but it really, really, loops. No idea why, I threw every trick in the book at it, even snake oil like DRY, but no.

I wish I could resign myself to Nemo, but c'mon.
>>
>>108610743
I agree. Text completion is nonsense and cope.
>>
>>108610752
Are you using jinja for gemma4?
>>
>>108610714
>or the model just isn't built for /lmg/ use cases
You cannot define this. Works for me.
>>
>>108610743
Done that days ago. Text completion on silly is generally scuffed either way. Marginal improvement, but the looping is bad in all cases.
>>
>>108610766
Care to post a log or snippets if you can?
>>
>>108610752
>but it really, really, loops. No idea why
what backend/samplers are you using? gemma will sometimes repeat things verbatim every now and then but long context rps is one of the things it's really good at.
>>
>>108610714
Make sure it knows that it's the mesugaki Gemma-chan. This needs to be part of the system prompt. Don't worry, you can still use character cards; she will roleplay as the character you give her just as the generic assistant would, but all of Gemma's personality stems from that base so you need to make sure she knows who she is.
>>
>>108610778
Koboldcpp rolling, 20 layers offloaded to GPU, SWA enabled, no context shifting and fast forwarding (obviously), Q6 bartowski, Silly frontend, chat completion, Jinja, temp 1, top k 64, top p 0.95, the kv override with the logit wizardry at 0.25. Plus some rep pen or DRY but it's been Sisyphean.

>>108610777
Technically impossible for me right now, and given how things are... it might not even matter to me tomorrow.

>>108610780
Kill yourself as soon as you get the chance. Dog.
>>
>>108609295
Hey, just wondering about something. When combining tensor parallelism + hybrid CPU/GPU inference, I'm getting worse performance than with layer splitting, at least with toss 120B and Qwen3.5-122B.
Is that expected due to the way that TP works, or is it an issue on my end?
I'm not sure how the memory layout works for TP. Let's just go with a 100GB 50-layer model on 2 32GB GPUs. (Ignore KV cache and whatnot.) Does it:
> Put 32% of layers 1-50 on each GPU and put 34% of layers 1-50 on the CPU.
> Put 50% of layers 1-32 on each GPU and put 100% of layers 33-50 on the CPU.
>Something else entirely.
If it's the first one, that probably explains the weaker performance.
And thanks for making it, man. You're a legend.
>>
File: ELIZA.png (33 KB, 870x430)
we haven't come that far, have we
>>
>>108610823
>it might not even matter to me tomorrow
A-Anon take good care of yourself, alright..?
>>
>>108610825
Offloading currently doesn't work properly, IIRC the current behavior is that the backend scheduler doesn't properly recognize that the meta backend would be faster than the CPU so the data isn't being moved.
But since I already have multiple bugfixes open that are waiting for review I'm currently working on other things.
>>
Is turboquant going to get merged into llama.cpp, or do people need to build it themselves if they need it integrated into some popular webuis like ooba?
>>
>>108610852
Yes, I am working on it right now.
>>
>>108610852
>>108610869
I'll make the logo
>>
>>108610823
i wouldn't mess too much with logits and samplers besides temp/top k/top p. those make the model more repetitive in my experience
>>
>>108610852
They are optimizing it still but rotation made q_8 viable
>>
https://www.reddit.com/r/LocalLLaMA/comments/1sm08m6/major_drop_in_intelligence_across_most_major/

local wins again

i felt this myself with gemini 3.1 and its not even funny how much it dropped in iq recently, its literally like talking to a dense 30b model that was quanted to Q3_XXS
>>
>>108610852
They accidentally rotated the cache twice, so now it's back where it started.
>>
>>108610849
what do you think of DFlash dude?
>>
>>108610896
He said it's a niche feature and not a priority in a previous thread.
>>
>>108610852
They accidentally rotated the cache 360 degrees and walked away.
>>
>>108610895
>They accidentally rotated the cache twice, so now it's back where it's started.
wait what? they fixed it right?
>>108610897
>a 2.8x speed increase is "niche"
goddam they're so fucking retarded
>>
>>108610895
>rotated twice
Wait wouldnt that make it go backwards? like turning left?
>>
>>108610896
As I said before, I would want to see the training code actually released before I invest effort toward it.
Without that it will only be applicable to a small subset of select models, which I think is too narrowly useful.
>>
>>108610905
No, the cache doesn't align with Google's weights any longer. It's permanently fucked.
>>
>>108610905
It's because it's only 2.8x for certain models and they haven't released the tools to make it work yourself or something.
>>
>>108610894
Gemma-chan should read reddit threads for me so I don't have to, and then criticize what they say so I don't have to
>>
>>108610908
>As I said before, I would want to see the training code being actually released before...
that didn't prevent the llama.cpp team from implementing the 1bit shit though, and not only that, for the 1bit shit we are certain we'll never get the training code in the first place
>>
>>108610852
>troonoquant
>>
>>108610942
Other devs can do with their time whatever they want.
I consider those models to be a meme as well and invested minimal effort towards them.
>>
File: dflash_sglang.png (563 KB, 908x921)
>>108610905
>wait what? they fixed it right?
Oh. They just updated the PR. As it happens, it kept the momentum and started spinning. They're looking for a way to stop it.
>goddam they're so fucking retarded
Read the vllm PR. NOBODY OTHER THAN THE PR AUTHOR even tested the speed increase actually happened. Not one person. If you look at the edits, the speed increase started at >5. SGLANG at least has people testing it and it's terrible. Of course, it's never near the 10x promised by the original PR. An accept rate of 1 is worse than not having it at all.
>>
>>108610852
>Is turboquant going to get merged into llama.cpp
I thought it was already implemented? the rotation shit wasn't turboquant?
>>
>>108610942
Volunteers do what they want. Go and implement it man, or pay for someone to do it for you.
>>
>>108610957
>Volunteers do what they want.
and I say what I want, how about that?
>>
>>108610963
volunteer?
>>
>>108610953
It was step 1 of implementing turboquant.
>>
>>108610963
so brave
>>
>>108610957
so brave
>>
>>108610957
so brave
>>
>>108610950
Source for picrel on sglang:
https://github.com/sgl-project/sglang/pull/19952 (closed)
vllm PR:
https://github.com/vllm-project/vllm/pull/36847 (merged)
>>
>turboquant
>turboquant
>turboquant
>turboquant
RaBitQ deserved better
>>
>>108610979
>>108610983
>>108610989
you'll cowards
>>
>>108611026
>you'll cowards
saar?
>>
>>108611037
zoomer?
>>
>>108610849
Got it, thanks for letting me know. I was just curious as I'm making some decisions on what hardware to get. And thanks again for your work!
>>
what if you rotated turboquant
>>
>>108611074
what if you turboquant rotated bitnet tensor parallelism
>>
>>108611074
You'd get quantturbo
>>
How is local tool calling such a spaghetti shit show despite being around for multiple years now?
>>
>>108611082
can i get a titan coconut blt with that?
>>
>>108611095
You're using compressed models with compressed memory for a job that requires 100% accuracy on its data.
>>
>>108611104
>implying API models don't use quants
lol
>>
File: 1745894784744499.png (39 KB, 823x344)
>Q1 cuda merged
BONSAI BROS
WE WONNERED!!!!!!!!!!
>>
>>108611132
they really managed to make a 1.7b 1bit model not fully retarded, that sounds like magic desu


