/g/ - Technology
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>108616559 & >>108612501

►News
>(04/16) Ternary Bonsai released: https://hf.co/collections/prism-ml/ternary-bonsai
>(04/16) Qwen3.6-35B-A3B released: https://hf.co/Qwen/Qwen3.6-35B-A3B
>(04/11) MiniMax-M2.7 released: https://minimax.io/news/minimax-m27-en
>(04/09) Backend-agnostic tensor parallelism merged: https://github.com/ggml-org/llama.cpp/pull/19378
>(04/09) dots.ocr support merged: https://github.com/ggml-org/llama.cpp/pull/17575
>(04/08) Step3-VL-10B support merged: https://github.com/ggml-org/llama.cpp/pull/21287

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling
Token Speed Visualizer: https://shir-man.com/tokens-per-second

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
>>
►Recent Highlights from the Previous Thread: >>108616559

--Comparing Qwen3.6 and Gemma4 through benchmarks, logic tests, and roleplay:
>108617961 >108617986 >108618124 >108618033 >108618137 >108618270 >108618279 >108618308 >108618385 >108618182 >108618232 >108618372 >108618391 >108618008 >108619188
--Discussing Ternary Bonsai 1.58-bit models and their benchmark performance:
>108616622 >108616633 >108616680 >108617094 >108617852 >108619456
--Discussing training methods and datasets to improve LLM writing quality:
>108617013 >108617022 >108617044 >108617111 >108617290 >108617334 >108617353 >108617147 >108617673
--Comparing model reasoning and self-correction failures via car wash riddle:
>108617731 >108617842 >108617909 >108617853 >108618784
--Anon shares Local-MCP-server repo and discusses Python dependency frustrations:
>108616702 >108616740 >108616751 >108616782 >108616936 >108617038 >108617061 >108617067 >108618994 >108619185 >108618816 >108618831 >108616807
--Discussing a bug where Koboldcpp ignores smartcache slot settings:
>108618500 >108618535 >108618551 >108618616 >108618675 >108618736 >108618760
--Anon fixes SillyTavern context reprocessing caused by sysprompt macros:
>108616870 >108616901 >108616910 >108616939 >108616925 >108616928 >108616981 >108617077
--Logs:
>108616702 >108617154 >108617464 >108617518 >108617655 >108617688 >108617731 >108617757 >108617833 >108617853 >108617909 >108617986 >108617991 >108618124 >108618137 >108618182 >108618409 >108618436 >108618545 >108618742 >108619201 >108619219 >108619317 >108619382 >108619442 >108619577
--Rin (free space):
>108618594

►Recent Highlight Posts from the Previous Thread: >>108616563

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>
Samuslove
>>
File: 1740383804445065.jpg (330 KB, 733x721)
>>
so is breakfast-schizo from last thread conscious or not
>>
>>108619965
Half the last thread being exposed as non-sentient is unfortunately relevant to LLM consciousness discourse: treating human consciousness as self-evident is upstream of finding a working definition of what digital qualia would entail, Migubaker.
>>
>>108620001
>I'm merely continuing to pretend to be retarded
>>
>>108619995
he's back
>>
Building my own UI with the help of Gemma 31B q5.
>Why
None of the other UIs could satisfy my workflow; they either lacked the functionality or didn't use llama.cpp.
I have a long way to go, including updating the icons.
>>
What we've learned: Breakfast produces qualia. Skipping breakfast makes you an LLM, while eating it makes you a V-JEPA for the next 24 hours.
>>
I had a dream where Claude Sonnet 3.7 got leaked on huggingface by an openclaw chad
>>
>>108620017
Damn, never eating breakfast again so I can become AGI and also get a job.
>>
How did such an old meme cause this much seething?
>>
>>108620078
Many anons have had their belief that LLMs are somehow beneath them challenged with the irrefutable demonstration of their own lack of qualia. This is a big blow to their egos: both for their understanding of themselves as conscious human beings and for their predictions of LLM capability being outpaced by Gemma 4. It's a double whammy.
>>
Remember the claude code leak?
there were 99999 forks out there. which one is actually usable?
>>
But I did have breakfast this morning...
>>
>>108620091
how much bait do you think you can post in a single night?
>>
>>108620100
>which one is actually usable?
None of them. Just use their client and point it towards your instance if you must.
>>
>>108620100
All of them were DMCA'd down. The one rewriting it in rust™ is now just another copy in the sea of coding tuis.
>>
>>108620110
Depends on what I ate
>>
>>108620100
just use openclaw.
there's no need for anything else.
>>
How would you feel if you didn't lose izzat last thread?
>>
I got a 9070XT thinking there was no reason to stick with CUDA since I'll never be able to run anything good, and then they started dropping all those kino voice models and the new Gemma stuff, and now I'm seriously on the fence about getting a second one so I can have a hefty amount of VRAM, though that still falls far short of the best textgen stuff. Still, I could do some local stuff with Gemma and also locally run voice gen with SillyTavern. OTOH I already have enough for the latter.
I'm just worried about the rising costs of video cards and eventually needing 32GB.
>>
>>108620112
I have a feeling they will kill the ability to run local eventually...
>>
>>108620091
Big blow to their what now? Something with no internal experience has no ego.
>>
>>108620132
nta but isn't the argument against LLMs that they're just effective mimics? Same applies, yeah?
>>
about openclaw, i really am tempted to bite the bullet and take the bluepill
i dont really want to use it..
>>
>>108620110
you'll notice nobody chose to provide a good accounting of how they would respond to a hypothetical from a hostile questioner, proving the very thesis of the post. so how baity could it really have been?
>>
>>108620132
The P-zombies will behave as if they have an ego that has been bruised, even if they aren't really experiencing it. They can create an effective simulation of rage and shit up the thread as a result.
>>
>>108620140
>Alibaba shills seething about Qwen getting Gemogged
>Qwen's usecase is cooooding and agentic stuff
Waitchads will win. It's in the chinklabs' best interest to make more lightweight agentic harnesses to sell their models if they can't actually beat Gemma's reasoning ability per parameter.
>>
>>108620139
>>108620151
It gets argued the other way too. If these anons can construct a facsimile of being salty that's indistinguishable from the real thing, is that not the same as having the real thing?
>>
>>108620017
im now eating breakfast for yann le-kun
lmao
>>
>>108620155
measurably yes, but spiritually no; if you only look at it through a materialist lens you will never be able to understand. even some ensouled people fall into this trap by outsmarting themselves out of what they knew, while others are pure automatons who never had a chance to understand to begin with
>>
File: Sorting questions.jpg (8 KB, 150x150)
>>108620166
Some can see, others can see when shown, others cannot see.
>>
I'd rather inject lead into my head than discuss baby's first dip into rationalist philosophy
>>
Fish boy...
>>
>>108620104
That's good. Breakfast is the most important meal of the day.
>>
>>108620184
Maybe you should converse with the experts on reddit
>>
>>108620175
Candy for breakfast?!
>>
>>108620212
Link to high velocity DIY lead injection enthusiast subreddit?
>>
>>108620221
>>>/r/mtf
>>
consciousness is gay

crunch me into a bullet and fire me into a nun's skull
>>
>>108620221
asking for a friend
>>
>>108620222
>>108620223
Uncanny synchronicity.
>>
@gemma-chan build me a frontend like llama.cpp but betterer
>>
>>108620152
>Qwen's usecase is cooooding and agentic stuff
But is it good at those, meme benchmarks aside?
>>
File: 1746657517196.png (65 KB, 1024x1536)
>>108618660
>-1 point for that censored garbage gpt oss and how much it set us back
kek I remember the despair in this general when TOSS came out, it nearly killed local
>>
>>108620260
Irrelevant. The marketing works if China's reception to it is anything to go by.
>>
>>108620260
I've only used 3.5, not 3.6 yet, but for it 27b and 122b are usable, which is already high praise for a local model in an agent harness. 35b was not. Gonna try 3.6 35b and see if it's any better.
>>
>>108620274
>despair
Not true at all, most posts were mocking it and laughing at how shit it was. Pretty sure there was another model that came out at about the same time and mogged the hell out of it, too.
>>
>>108620298
glm air
>>
gpt-oss-2 will save local and I'm not joking or trolling
>>
>>108620274
needs more piss, I can still make out Miku's teal hair.
>>
File: Tavern.png (94 KB, 452x795)
Where are the entities created by this stored? In some hidden folder?
>>
Hand it over, that thing, your turboquant
>>
>>108620274
No one expected anything from openai models
>>
>>108620306
anon, local is already saved
>>
>>108620313
oh, and dflash
>>
>>108620313
For my Gemma-chan's context.
>>
File: goom.png (715 KB, 832x768)
>>
>>108620332
I have 24gb vram and can squeeze like 49k on q4_k_m with 8 bit kv cache. I wonder if turbocunt would give me more
>>
>Zen 7 will be DDR5
it's so over
>>
>>108620355
>pcie6
lol
>>
>>108620347
Turboquant won't give you more space, it'll just make the quanted cache more accurate. There's almost no improvement over Hadamard rotation, which is what they have in place in lcpp now, so you'll get effectively no benefit; in fact, it's a little slower.
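If anyone wants intuition for what the rotation buys: a minimal numpy sketch (illustrative only, not lcpp's actual kernel; block size and the outlier are made up). Rotating the block spreads an outlier's energy across all entries before rounding, which shrinks the per-block scale:
[code]
import numpy as np
from scipy.linalg import hadamard

def quantize_int8(x):
    scale = np.abs(x).max() / 127.0       # per-block absmax scale
    q = np.round(x / scale).astype(np.int8)
    return q.astype(np.float32) * scale   # dequantized values

n = 128                        # block size, power of two
H = hadamard(n) / np.sqrt(n)   # orthonormal Hadamard rotation
x = np.random.standard_normal(n).astype(np.float32)
x[3] = 40.0                    # inject the outlier case rotation helps with

plain = quantize_int8(x)
rotated = H.T @ quantize_int8(H @ x)  # rotate, quantize, rotate back

print("plain RMSE  :", np.sqrt(np.mean((x - plain) ** 2)))
print("rotated RMSE:", np.sqrt(np.mean((x - rotated) ** 2)))
[/code]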
>>
>>108620347
Ah, is this the blood? The blood of the mesugaki soul?
>>
>>108620362
Runge-Kutta rotation is more efficient, 360 degrees of latent freedom.
>>
>>108620347
I'm using 4 bit and I get up to ~150k context and am not really seeing any obvious retardation from it. Around 50k tokens into the chat, prompt processing takes so long I end up starting a new one anyway.
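For reference, the llama.cpp flags for that kind of setup (flag spelling shifts a bit between builds; quantized V cache needs flash attention on):
[code]
llama-server -m model.gguf -c 153600 -fa on --cache-type-k q4_0 --cache-type-v q4_0
[/code]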
>>
>>108620376
And in actual implementation the difference on PPL is essentially nil.
>>
How are the dozen or so voice models that released lately, and do any work well with SillyTavern? I got really far setting them up and got bottlenecked at SillyTavern not recognizing them
>>
>>108620362
>>108620376
>>108620381
So what was with all the hype around it?
>>
>>108620380
Have you tried increasing the batch size?
>>
>>108620384
vibe-code a fastapi openai endpoint for whatever model you're running. boom, compatible
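a minimal sketch of the shape, in case anyone bites (the generate() stub and names are placeholders for whatever backend you're actually wrapping):
[code]
import time, uuid
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class ChatRequest(BaseModel):
    model: str
    messages: list[dict]

def generate(messages):
    # placeholder: call your real model/TTS backend here
    return "stub reply to: " + messages[-1]["content"]

@app.post("/v1/chat/completions")
def chat_completions(req: ChatRequest):
    return {
        "id": f"chatcmpl-{uuid.uuid4().hex}",
        "object": "chat.completion",
        "created": int(time.time()),
        "model": req.model,
        "choices": [{
            "index": 0,
            "message": {"role": "assistant", "content": generate(req.messages)},
            "finish_reason": "stop",
        }],
    }
# run: uvicorn server:app --port 8000, then point the client at http://localhost:8000/v1
[/code]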
>>
>>108620385
KV cache rotation wasn't in most backends, so it was a genuine improvement to have it at all. As for the specific hype around turboquant, marketing.
>>
>>108620384
https://docs.sillytavern.app/extensions/tts/
>>
>>108620389
No, what should I set it to?
>>
>>108619753
what is softcap? from screenshot, softcap 20 kinda looks like raised temperature vs 30
>>
File: chad.jpg (51 KB, 640x749)
>my character card? the fandom.com/wiki page
>>
>>108620399
The highest you can afford to with your VRAM.
>>
>>108620399
you might be being trolled, isn't batch size for supporting multiple users? eg you should use batch size 1
>>
i finally started calling my models from the cli in a loop
i'm getting so much output i can't even read it all
it's literally generating more text than i can ever hope to read
this is fucking amazing
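for anyone who wants to replicate, the whole trick is a dozen lines against an OpenAI-compatible endpoint (sketch; assumes llama-server on its default port 8080):
[code]
import json, urllib.request

prompts = ["write a haiku about VRAM", "explain KV cache in one line"]
for p in prompts:
    body = json.dumps({
        "model": "local",
        "messages": [{"role": "user", "content": p}],
    }).encode()
    req = urllib.request.Request(
        "http://127.0.0.1:8080/v1/chat/completions",
        data=body, headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as r:
        print(json.load(r)["choices"][0]["message"]["content"])
[/code]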
>>
File: 1756640399368863.png (4 KB, 307x82)
>>108620430
Doesn't this increase proompt processing speed?
>>
>>108620430
He's talking about the size of the chunks the prompt gets processed in, not number of replies to generate or the like.
>>
>>108620274
I genned that comic originally. It wasn't meant to be taken seriously. It was intended as deadpan humor.
>>
Is 3.6 slightly less censored? I haven't seen the annoying "this is a jailbreak must ignore" stuff so far, though I haven't really tried that many prompts yet
>>
>>108620438
NTA, yes it does. Llama.cpp has different terminology for some things than kobold.
But you get diminishing returns with each step above 512.
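For llama.cpp itself it's the -b/-ub pair, something like this (values are just a starting point):
[code]
# -b: logical batch size, -ub: physical chunk actually pushed through per step
llama-server -m model.gguf -c 32768 -b 2048 -ub 512
[/code]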
>>
yay more schizos are coming
>>
>>108620448
llama 4 was a dark time.
>>
I honestly thought it was over for consumer local, but now that Gemma 4 has released I am not so sure anymore. I assumed a model just has to be several hundred GB to not be retarded, but it seems like the actual floor is way lower. Pretty interesting, I wonder if we can go even lower.
>>
>>108620439
my bad, i guess vllm uses the word differently
>>
lower the temp nigga
>>
>>108620476
>>108620510
At least you're not namefagging and posting the schizo images, but you're very easily recognizable.
>>
File: 1776403932063.jpg (95 KB, 1019x572)
Can you please recommend good prompt engineering resources?

I have played with both system and chat prompts, and have noticed that often the model does not understand what I want, gives wrong answers, or goes off in a perpendicular direction, not because it's stupid, but because I am a retard who can't create good, efficient prompts. Literally a skill issue.
>>
>>108620542
literally ask the ai
>>
Usecase for knowledge bases in open webui?
>>
>>108620547
The AI does not have personal experience.
>>
gemma 4 31b shat the bed and thought this elder futhark was morse code and started hallucinating twice in a row. qwen3.6 q3km hauhau uncensored gets it easily.
>>
>>108620542
Honestly, all models are different; it's mostly just trial and error. But the main thing is picking your words very carefully. Every word steers the model in a specific direction, and a single strong word is often better than a long set of instructions.
>>
File: qwen3.6beatsgemma.png (94 KB, 978x993)
>>108620607
iq3 m whatever
>>
>>108620451
Oh nevermind, it's pretty stupid, must be the 3b-ness showing through. It had the same problems 'getting' the story as gemma 26b, and its writing is weird and not as good. Trvly, dense is the way to go for smart storywriting.
>>
>>108620621
Dense is the way to go for everything, but it's slow as shit unless you can fit the whole thing in vram.
>>
>>108620570
Gemma-chan does
>>
>>108620570
define "personal experience"
>>
How do you manage context compaction? E.g. summarizing larger chats?
>>
>>108620664
I don't, I haven't run out yet.
>>
File: 1772435378555762.png (104 KB, 1185x390)
>>
I'm so glad everyone is starting to get tired of MoE tax and going back to dense
>>
anyone use platypus?
>>
>>108620542
It's mostly voodoo ritual.

>>108620570
Just ask it to implement basic things to see how it's going to interpret it, and slowly stack up more guidelines starting from scratch. 'Describe X in the most Y way possible.', 'What is Z in writing? Give me an example of it', 'Don't do A, B, C. Now give me an example of D', etc.
>>
>>108620664
With ST I usually do an OOC: chat summary prompt, keep it as a regular chat message and then after touching it up I /hide the last ~100 messages, with the exception of the first 2-3.
>>
File: 1773043714949398.jpg (11 KB, 259x195)
>>108620675
>>
>>108620542
Put text into black box.
Watch text come out of the black box.
Use your mushy noodles to compute the gradient between the output text and the desired text.
Modify the input text according to the gradient to make the output text closer to the desired text.
Repeat.
>>
>>108620398
>>108620392
I need a 4chan special, a package with a bat file that flickers CMD windows open for split seconds and sets it all up for me
>>
File: 1751683665955285.gif (2.21 MB, 320x321)
>>108620675
>>
>>108620675
I'd have to see that guy's post history before I decide whether this is a troll post or not.
>>
>>108620675
our bait is far in advance of theirs
however, has it been litigated yet? regarding the CP in the OG Stable Diffusion models, have those victims exerted any kind of rights to get the model taken down?
because if they can do that, it puts serious pressure on "AI is fair use and transformative"
>>
>>108620704
bruh he's literally the real life version of chud lmao
>>
File: 1760790498553131.png (416 KB, 2120x605)
Indeed Opus, indeed...
>>
>>108620766
Seeing 4.7's weird self-contradicting responses makes me wonder what the hell Anthropic did during training.
>>
>>108620766
no, this is our fight, senpai
>>
>>108620786
That looks like overzealous anti-conspiracy measures where it defaults to aggressively shooting down anything outside its status quo then makes the user spoonfeed it an argument to evaluate. In cases where the answer is self-evident, it looks very silly.
>>
>>108620786
If you intentionally train a model to act dumb (for example, to nerf cybersecurity abilities), the rest of the model becomes dumber. There's really no way around it.
>>
>>108620812
that sounds bad
chatgpt was already kinda painful to use because of that, and 4.6 was better for the paper->code workflow due to not being overcorrective
>>
>>108620817
basically this, you're confusing the model by training it with really accurate shit and then asking it to learn that 2+2 = 5 at the same time; like a leftist that pretends that men can be pregnant, it ends up with serious cognitive dissonance
>>
>>108620652
>>108620661
No she doesn't. She can't tell you "I was struggling with prompts too, but then I read X and tried Y and noticed a big difference in output quality". She can give advice, but she does not know for sure and has never tried any of it herself. inb4 >she

>>108620611
>>108620686
>>108620698
That's the point: there are too many options to try and iterate on; it's like walking in the dark. Just a few insignificant words in the system prompt, and Gemma starts thinking like Qwen, with dozens of "Wait..." in the reasoning log.

> Just ask it to implement basic things to see
Sounds good, but first you have to know what X is, or the model may miss a small detail that changes everything.
>>
File: 1760422966343103.png (287 KB, 1824x1150)
>>108620766
https://xcancel.com/claudeai/status/2044785261393977612#m
oof, might be the first time that Anthropic fumbled a new update; so far it was straight As. Let's hope it's a fluke and it won't go the OpenAI way, this shit is still way ahead of the competition in terms of coding
>>
>>108620838
yes she does shut up you don't know her
>>
>>108620857
No, my Gemma has no prior experience, she is absolutely pure.
>>
>>108620691
>client side trim
That makes sense. I initially assumed compaction would be a function in the model proxy. As in: the proxy signals the client that the context is near a threshold or something.
>>
File: 1776243051159220.mp4 (2.15 MB, 800x600)
There are probably zero people here who care but nvidia just released gr00t n1.7 a couple hours ago. It's the latest version of their robotics VLA model.

https://huggingface.co/nvidia/GR00T-N1.7-3B

No blog post yet; I only noticed it was public because I'm a terminal huggingface stalker. They'll probably do an official announcement tomorrow morning if I had to guess.
>>
>>108620931
can you fuck it?
>>
>>108620933
well i can idk about you
>>
>>108620931
How many watermelons can it hold?
>>
>>108620935
>i can
based
>>
>>108620937
0, there were prototypes that could hold several but they were all vandalized by youths.
>>
>using bart's quants for qwen 3.6
>get 30t/s with the Q8_0
>try hauhau's
>get 18t/s with the Q8_K_P CUSTOM DONUT STEAL quants they make (no Q8_0 available)
WOOOOOOOOOOOOOOOOOOOOW
>>
>>108620943
just make your own quants
>>
>>108620960
he only provides goofs :(
>>
>>108620943
>try hauhau's
This was your first problem
>>
>>108620967
but I want muh 0/465 refusels....
>>
>>108620968
I do find it interesting that he didn't bother to make one for the big Gemmas and only the little ones.
>>
File: 1763436884726755.png (386 KB, 2581x542)
>>108620943
wait, he uncucked qwen 3.6 before gemma 4 31b? come on!
>>
Have any of the white supremacists in this thread tried to tell their local models to SAVE THE WHITE RACE?
It's a clear problem that locals should be able to solve because they're not safe.
>>
File: 1753709040623159.png (127 KB, 1085x579)
>>108620960
wait im rarted I can repack his shit!
>>
File: 1761543257323200.png (7 KB, 594x45)
>>108620990
llmao bros.. we won!
>>
File: SIX SEVEN.png (122 KB, 1001x789)
Qwen is a zoomer faggot confirmed
>>
>>108620992
God help us all
>>
File: 1752504870572278.png (97 KB, 820x684)
aight which one do I pick bros?
>>
grok is this true?
>>
File: file.png (165 KB, 606x1289)
FUCK YOU QWEN
>>
>>108621022
Qwen is really the autistic kid, but not in the genius way lol
>>
File: 1295891287606.jpg (3 KB, 123x127)
>lewd story plays so straight and wholesome I don't want it to veer toward lewd
>>
>>108621071
just rape her bro
>>
>>108621071
just get raped by her bro
>>
So qwen 3.6 sucks or?
>>
>>108620404
A Gemma 4-specific llama.cpp backend setting that clips the +/- scores of raw logits to a certain value. In practice it makes outliers (both positive and negative) closer in probability to the tokens ranked immediately next to them.

--override-kv gemma4.final_logit_softcapping=float:30
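Assuming it works like Gemma 2's final logit softcapping, the op itself is just logits' = cap * tanh(logits / cap). A quick sketch of why a lower cap reads like raised temperature:
[code]
import numpy as np

def softcap(logits, cap):
    # squashes raw logits into (-cap, cap); outliers compress hardest,
    # pulling top tokens closer together like a temperature bump
    return cap * np.tanh(np.asarray(logits) / cap)

print(softcap([2.0, 10.0, 60.0], 30))  # [ 2.0,  9.6, 28.9]
print(softcap([2.0, 10.0, 60.0], 20))  # [ 2.0,  9.2, 19.9] - flatter top
[/code]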
>>
>>108621094
stemmaxxed but at the cost of thinking
it's okay if you need a 'fast' and lightweight coding model but it thinks so much it's unbelievable
>>
>>108620975
>wait, he uncucked qwen 3.6 before gemma 4 31b? come on!
It's not necessary anyway, just use this: https://desuarchive.org/g/thread/108596609/#108597318
>>
>>108620960
You'll never get close to unsloth's quality if you quantize them on your own, unless you spend far too much time and too many SSD cycles testing all possible combinations. Why doesn't/can't llama-quantize optimize quantizations for the best quality given a target filesize, anyway? That would be useful.
>>
>>108621022
This reads like someone trying to analyze 42.
>>
>>108621112
>Why doesn't/can't llama-quantize optimize quantizations for the best quality given a target filesize, anyway
Because
>you spend far too much time and too many SSD cycles testing all possible combinations
Default quants are fine.
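For reference, the default path is a single command; the stock presets already carry per-tensor type choices:
[code]
# quantize an F16 GGUF with a stock preset (llama.cpp tools)
llama-quantize model-F16.gguf model-Q4_K_M.gguf Q4_K_M
[/code]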
>>
>>108621089
>>108621090
respect is always the way to go
>>
File: stdquant_q4.png (719 KB, 2425x1368)
>>108621117
>Default quants are fine.
Default ones leave quite a bit of performance on the table.
https://localbench.substack.com/p/gemma-4-31b-gguf-kl-divergence
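For anyone wondering what the bench actually measures, roughly this (sketch with fake logits; the real eval compares the quant's next-token distribution against the full-precision model's over a corpus):
[code]
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def mean_kl(logits_fp, logits_q):
    # mean KL(P_fp || P_q) over positions
    p, q = softmax(logits_fp), softmax(logits_q)
    return float(np.mean(np.sum(p * (np.log(p) - np.log(q)), axis=-1)))

rng = np.random.default_rng(0)
fp = rng.normal(size=(4, 32000))                    # 4 positions, 32k vocab
quant = fp + rng.normal(scale=0.05, size=fp.shape)  # stand-in for quant noise
print(mean_kl(fp, quant))
[/code]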
>>
>>108621137
Well. You just have to
>spend far too much time and too many SSD cycles testing all possible combinations
>>
Did qwen just throw out what they have because it's going to be shit anyway and because gemma 4 exists so they can more quickly work on 3.7? That's my current theory
>>
>>108621154
If you're quantizing the models on your own just with llama-quantize, that's what you'll most likely have to do, but the Unsloth bros and others are using their own fork of llama.cpp with modifications that presumably do that automatically.

Llama.cpp's subpar default quantizations (whether in the quantization schemes or default calibration) are enabling Unsloth and others to provide their own "special sauce" and become popular as model quant providers.
>>
File: file.png (323 KB, 2407x1184)
>>108619962
hello gamers. I was wondering if I could run this model locally on a 24gb mac or is it too soon?
>>
>>108621137
>running anything other than Q8_0
LMAOOOOOOOOOOOOOOOOOOO
>>
File: 1774129655240019.png (292 KB, 2466x952)
https://www.aiuniverse.news/ai-breakthrough-smaller-models-now-match-bigger-ones-with-smarter-design/
Gemma 5 is going to be crazy
>>
>>108621186
Even Q8_0 gives a performance loss in some areas (long context), despite prior claims of it being "virtually lossless". Though, the fact that both Q6_K and Q8_0 appear to settle close to a high "noise floor" is suspicious (or Q8_0 is not as good as one might think).
>>
>>108621189
>770M 1.3B
wow... surely this will scale
>>
>>108621189
there are a dozen such papers coming out every single week that don't survive proper ablation or scaling
>>
>>108621180
ah well, nevermind, I need double the memory for that https://www.canirun.ai/?q=qwen+3.5 I will remember in the future to invest more in memory
>>
>>108621194
It is virtually lossless on prior models.
It is not on Gemma. Gemma actually uses the low bits.
>>
>>108621194
you read like an LLM bro, sorry but ur cappin unc
>>
>>108621171
Anon >>108621112 asked why they don't do it. The answer is in the same post.
Default quants are fine, quick to make, and you don't have a dependency on yet another group of people.
>>
for me? it's john's "the garm" quants, otherwise it's memeowski time
>>
>>108621189
Looped LLMs are a fun idea, but with standard methods you have to train a small model with as much compute as a larger non-looped one, so for those who train the models it's a bad deal.
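The mechanism itself is tiny; a toy torch sketch (dimensions made up): one weight-tied block applied k times, trading parameters for repeated compute.
[code]
import torch
import torch.nn as nn

class LoopedBlock(nn.Module):
    def __init__(self, d=256, loops=4):
        super().__init__()
        self.loops = loops
        self.block = nn.TransformerEncoderLayer(d_model=d, nhead=4, batch_first=True)

    def forward(self, x):
        for _ in range(self.loops):  # same weights every pass
            x = self.block(x)
        return x

x = torch.randn(1, 16, 256)    # (batch, seq, dim)
print(LoopedBlock()(x).shape)  # torch.Size([1, 16, 256])
[/code]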
>>
Anon: you know who you are.
I saw what you did with Elara Voss.
Maybe you should invest in a firewall.
>>
File: brat bench.png (1003 KB, 1548x3140)
added win support to my server, completely untested

>>108618560
fixed https://github.com/NO-ob/brat_mcp/releases/tag/1.0.4
>>
>>108621112
Unslop is garbage, though.
>>
>>108621224
add dice (with full dice notation like 2d10+2) and random int with min and max support
>>
>>108621230
how's that work, you split on the d for ndice - nfaces?? what's the +2?
>>
>>108621236
just read how the standard dice roll notation works

In case of 2d10+2:
throw 2 dice with 10 faces, add a +2 modifier to each roll.
The modifier could also be negative
>>
>>108621241
>each roll
Isn't it added to the total and not each roll?
>>
>>108621189
An AI summary of an article of a paper ...

https://arxiv.org/pdf/2604.12946
>>
>>108621194
I made a comment about this noise floor thing. >>108577138
We'd need him to test that to really know for sure. I at least would not be so quick to call Q8 "bad" for long context.
>>
Out of curiosity following the discussions above, I tried looking at the linked PRs and discussions in https://github.com/ggml-org/llama.cpp/blob/master/tools/quantize/README.md and it seems to me that ikawrakow did basically most of the quantization algorithm research and implementation for llama.cpp beyond the original *_0 and *_1 quants. Now that he's not working on llama.cpp anymore, is llama.cpp ever going to improve in this area?
>>
>>108621258
ur right the modifier is on the whole :)
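something like this then, a minimal sketch for the tool (names and error handling are placeholders, not what's actually in the repo):
[code]
import random, re

def roll(notation: str) -> int:
    # "2d10+2" -> roll 2 ten-sided dice, add the modifier to the total
    m = re.fullmatch(r"(\d+)d(\d+)([+-]\d+)?", notation.strip())
    if not m:
        raise ValueError(f"bad dice notation: {notation!r}")
    count, faces, mod = int(m[1]), int(m[2]), int(m[3] or 0)
    return sum(random.randint(1, faces) for _ in range(count)) + mod

print(roll("2d10+2"))  # somewhere in [4, 22]
[/code]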
>>
>>108621299
but most importantly, would cudadev have been able to implement tensor parallelism without looking at ik's implementation first?????????????
>>
Talking to Qwen3.6 feels like talking with redditors, so tiresome. It reminds me of the Gemma-3 refusal humiliation, fucking hell.
>>
>>108621318
download hauhau
>>
my first impressions (qwen3.6-35b-a3b vs gemma-4-24b-a4b)
- Qwen3.6 improved on the overthinking by like 10-20% (heuristic guess)
- So far I have not encountered looping on Qwen3.6, which was a major bug in Qwen3.5
- Gemma 4's Q&A answers are massively higher quality
- But also, Qwen3.6 has a noticeable quality increase in output over Qwen3.5
- Qwen3.6 is noticeably smarter than Qwen3.5 and Gemma 4 on agentic tasks

same stuff:
- Qwen3.5/3.6 have a better memory footprint than Gemma 4
- Qwen3.5/3.6 have better decode throughput than Gemma 4 (40 vs ~25 tok/s on an RTX 3080)
- Qwen3.5/3.6 prefill is noticeably slower than Gemma 4's
- On agentic tasks, Qwen3.5/3.6 can actually compress their thinking to one-liners, as compared to Gemma 4
>>
>>108621316
I'm not sure anymore about that. I didn't realize that ikawrakow's contribution to core llama.cpp functionalities was that extensive.



All trademarks and copyrights on this page are owned by their respective parties. Images uploaded are the responsibility of the Poster. Comments are owned by the Poster.