/g/ - /lmg/ - Local Models General - Technology

[a / b / c / d / e / f / g / gif / h / hr / k / m / o / p / s / t / u / v / vg / vm / vmg / vr / vrpg / vst / w / wg] [i / ic] [r9k / s4s / vip] [cm / hm / lgbt / y] [3 / aco / adv / an / bant / biz / cgl / ck / co / diy / fa / fit / gd / hc / his / int / jp / lit / mlp / mu / n / news / out / po / pol / pw / qst / sci / soc / sp / tg / toy / trv / tv / vp / vt / wsg / wsr / x / xs] [Settings] [Search] [Mobile] [Home]

Board

▼ Settings Mobile Home

/g/ - Technology

Return Catalog Bottom Refresh

Thread archived.
You cannot reply anymore.

[Advertise on 4chan]

[Return] [Catalog] [Bottom]

Anonymous

/lmg/ - Local Models General 06/10/26(Wed)11:13:59 No.109023085

File: poor reception.jpg (228 KB, 1216x832)

228 KB JPG

/lmg/ - Local Models General Anonymous 06/10/26(Wed)11:13:59 No.109023085 Archived

/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>109018067 & >>109013071

►News
>(06/09) Cohere releases North-Mini-Code-1.0: https://hf.co/CohereLabs/North-Mini-Code-1.0
>(06/07) llama : add Gemma4 MTP #23398 MERGED: https://github.com/ggml-org/llama.cpp/pull/23398
>(06/05) dots.tts 2B released: https://hf.co/rednote-hilab/dots.tts-soar
>(06/05) Gemma 4 QAT models released: https://blog.google/innovation-and-ai/technology/developers-tools/quantization-aware-training-gemma-4
>(06/04) Higgs Audio v3 TTS released: https://boson.ai/blog/higgs-audio-v3-tts

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://swe-rebench.com
Agentic Coding: https://deepswe.datacurve.ai
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling
Token Speed Visualizer: https://shir-man.com/tokens-per-second

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm

Anonymous
06/10/26(Wed)11:14:25 No.109023088

Anonymous 06/10/26(Wed)11:14:25 No.109023088

File: reward function.jpg (184 KB, 1024x1024)

184 KB JPG

►Recent Highlights from the Previous Thread: >>109018067

--Backlash against Claude Fable 5 safeguards restricting AI development assistance:
>109018762 >109018775 >109018788 >109018951 >109018795 >109019418 >109019954 >109022832 >109021208
--Comparing llama.cpp forks and mainline for Gemma 4 MTP stability:
>109019974 >109020004 >109020057 >109020066 >109020261 >109020114 >109020145
--Comparing offline RAG databases against internal model knowledge for natural conversation:
>109018110 >109018270 >109018331 >109018502 >109018572 >109018607 >109018630
--Distinction between weight-based distillation and SFT on synthetic outputs:
>109021684 >109021705 >109021716 >109021732 >109022239 >109021739
--Ways to prevent repetitive NPC naming using external generators:
>109021769 >109021914 >109021947 >109021974 >109022002 >109022028
--Simultaneous use of MTP and ngram speculative decoding causing crashes:
>109021514 >109021800 >109021824 >109021837
--Testing Mythos/Fable using a poetic meter and Morse code puzzle:
>109020569 >109020615 >109020654 >109020690
--AI safety filters misclassifying interpretability research as adversarial attacks:
>109021825 >109021873 >109021921
--Gemma roleplay recommendations and technical methods for controlling reasoning loops:
>109019534 >109019554 >109020208 >109020239 >109020240 >109020260 >109020297 >109021562 >109019598 >109019613 >109019678 >109019696 >109019754 >109019784
--Debating the need for native image output in LLMs:
>109020549 >109020724 >109020732 >109020743 >109020760 >109020879 >109021572 >109021590 >109021626
--Sharing the Efficient Gemma Dashboard on Hugging Face:
>109019331
--Logs:
>109018604 >109018734 >109019271 >109020654 >109021269 >109021315 >109021825 >109021873 >109021921 >109021955 >109022832
--Teto, Miku (free space):
>109018109 >109018270 >109020903 >109021002 >109021733

►Recent Highlight Posts from the Previous Thread: >>109018073

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script

Anonymous
06/10/26(Wed)11:15:58 No.109023099

Anonymous 06/10/26(Wed)11:15:58 No.109023099

70b dense

Anonymous
06/10/26(Wed)11:18:28 No.109023113

Anonymous 06/10/26(Wed)11:18:28 No.109023113

120b moe

Anonymous
06/10/26(Wed)11:21:53 No.109023130

Anonymous 06/10/26(Wed)11:21:53 No.109023130

The AI summary in google web search. How does that work? Would be cool if I could implement something similar I guess in searxng, it seems to be pretty small model, and they are running it on every search so it might be doable, I'm thinking. But I dunno f you tell me that it actually only works because google already has a cache on hand for it to read, rather than actually pulling the material from the results manually each time then I guess its not possible.

Anonymous
06/10/26(Wed)11:24:06 No.109023141

Anonymous 06/10/26(Wed)11:24:06 No.109023141

>>109023092
Trial and error. It also varies on not only a per-model basis but a per-quant+sampler combination basis too. glhf.

Anonymous
06/10/26(Wed)11:24:13 No.109023142

Anonymous 06/10/26(Wed)11:24:13 No.109023142

Fable writes kinda funny, I find myself re-reading its prose sometimes to understand what it meant. Looking forward to some of its traces getting fed into chink models, I bet it helps with slop.

Anonymous
06/10/26(Wed)11:25:24 No.109023149

Anonymous 06/10/26(Wed)11:25:24 No.109023149

File: 1779988194173572.jpg (1.9 MB, 4928x3264)

1.9 MB JPG

north mini > qwen35b/gemma26b for rp btw because it's white and not stem-cucked
https://huggingface.co/unsloth/North-Mini-Code-1.0-GGUF

>git clone https://github.com/ggml-org/llama.cpp
>cd llama.cpp
>git fetch origin pull/24260/head:cohere2-moe
>git checkout cohere2-moe

Anonymous
06/10/26(Wed)11:26:59 No.109023160

Anonymous 06/10/26(Wed)11:26:59 No.109023160

>>109023142
Logs for example?

Anonymous
06/10/26(Wed)11:29:25 No.109023173

Anonymous 06/10/26(Wed)11:29:25 No.109023173

>>109023149
>coding model
>not stem-cucked
>canada
>white
retard

Anonymous
06/10/26(Wed)11:29:59 No.109023176

Anonymous 06/10/26(Wed)11:29:59 No.109023176

>>109023149
cockbench?

Anonymous
06/10/26(Wed)11:30:32 No.109023180

Anonymous 06/10/26(Wed)11:30:32 No.109023180

File: 1711918139788504.png (164 KB, 961x565)

164 KB PNG

>>109023160

Anonymous
06/10/26(Wed)11:30:43 No.109023182

Anonymous 06/10/26(Wed)11:30:43 No.109023182

>>109023149
>not stem-cucked
>model has "code" in its name
Okay?

Anonymous
06/10/26(Wed)11:32:46 No.109023206

Anonymous 06/10/26(Wed)11:32:46 No.109023206

Blessed thread of friendship

Anonymous
06/10/26(Wed)11:32:59 No.109023207

Anonymous 06/10/26(Wed)11:32:59 No.109023207

>>109022939
>>109022939

Anonymous
06/10/26(Wed)11:33:23 No.109023212

Anonymous 06/10/26(Wed)11:33:23 No.109023212

File: file.png (145 KB, 1054x374)

145 KB PNG

>>109023160
Humorously, I don't have any actual narrative-style content yet, maybe it's still slopped. I'm having it rewrite a small character card game I made a while back that has different characters automatically talk to one another, and the code is fairly simple/plain. Fable is writing its comments in fairly long winded, almost run on sentences, and giving things strange names. Feels almost a bit schizo.

Anonymous
06/10/26(Wed)11:34:23 No.109023216

Anonymous 06/10/26(Wed)11:34:23 No.109023216

>>109023206
I'm not your friend, buddy

Anonymous
06/10/26(Wed)11:34:30 No.109023220

Anonymous 06/10/26(Wed)11:34:30 No.109023220

>>109023182
it's shit at coding for it leans too hard into creativity, even their own benchmarks show it doesn't hold up against qwen35b which isn't even a coding model, its name is just marketing bait

Anonymous
06/10/26(Wed)11:35:17 No.109023226

Anonymous 06/10/26(Wed)11:35:17 No.109023226

>>109023149
>coding model
>non-stemmaxxed
>RP...?
what are you on?
the whole point of coding model is to stemmaxxing it as hard they can

Anonymous
06/10/26(Wed)11:35:45 No.109023229

Anonymous 06/10/26(Wed)11:35:45 No.109023229

So which mtp model for gemma do I have to download?

Anonymous
06/10/26(Wed)11:37:13 No.109023235

Anonymous 06/10/26(Wed)11:37:13 No.109023235

>>109023142
>re-reading its prose
Don't do that. You'll identify its unique brand of slop quicker that way.

Anonymous
06/10/26(Wed)11:37:59 No.109023237

Anonymous 06/10/26(Wed)11:37:59 No.109023237

>>109023220
How is a small company failing to make a competitive coding model a sign that they secretly created a creative roleplay model? This is Cohere, they made one good model then leaned extremely heavily into safety. Is this some attempt at shitposting? Shilling trash models to make people waste their time?

Anonymous
06/10/26(Wed)11:38:11 No.109023240

Anonymous 06/10/26(Wed)11:38:11 No.109023240

>>109023226
>>109023220
that was their intent but their post-training was dogshit and didn't achieve its goal

Anonymous
06/10/26(Wed)11:39:30 No.109023249

Anonymous 06/10/26(Wed)11:39:30 No.109023249

>>109023240
topkek if true
lmao

Anonymous
06/10/26(Wed)11:40:21 No.109023253

Anonymous 06/10/26(Wed)11:40:21 No.109023253

>>109023085
>(06/05) Gemma 4 QAT models released: https://blog.google/innovation-and-ai/technology/developers-tools/quantization-aware-training-gemma-4
Are these any good?

Anonymous
06/10/26(Wed)11:40:26 No.109023254

Anonymous 06/10/26(Wed)11:40:26 No.109023254

>>109023182
>>109023226
>>109023237
>>109023249
https://huggingface.co/spaces/CohereLabs/North-Mini-Code-1.0
try it

Anonymous
06/10/26(Wed)11:41:40 No.109023260

Anonymous 06/10/26(Wed)11:41:40 No.109023260

File: north mini code canadian AI.jpg (2.43 MB, 2160x2866)

2.43 MB JPG

North mini code is the best AI that just came out.
Look at these benchmarks. I'm gonna replace my qwen3.6 with it.

Anonymous
06/10/26(Wed)11:41:46 No.109023261

Anonymous 06/10/26(Wed)11:41:46 No.109023261

>>109023254
>try it
Too lazy to test something that doesn't work out of the box with my docker image.

Anonymous
06/10/26(Wed)11:42:00 No.109023262

Anonymous 06/10/26(Wed)11:42:00 No.109023262

>>109023253
I moved from Q6 to it and didn't have any problems, feels about the same. Other people are saying they are having problems with it. Try it yourself.

Anonymous
06/10/26(Wed)11:42:40 No.109023266

Anonymous 06/10/26(Wed)11:42:40 No.109023266

>>109023149
I actually believe in Cohere and think Canada number 1. That said, if you want me to download this meme, post at least one (1 a single) screenshoot of it in real RP, Aidan.

Anonymous
06/10/26(Wed)11:42:47 No.109023268

Anonymous 06/10/26(Wed)11:42:47 No.109023268

>>109023130
>The AI summary in google web search. How does that work? Would be cool if I could implement something similar I guess in searxng
thats what i was just thinking.

anyway i think its gemini flash+cache. the answers are never very good

Anonymous
06/10/26(Wed)11:43:16 No.109023272

Anonymous 06/10/26(Wed)11:43:16 No.109023272

>>109023260
I can't trust any infographic made with AI.

Anonymous
06/10/26(Wed)11:43:56 No.109023275

Anonymous 06/10/26(Wed)11:43:56 No.109023275

>>109023130
and i asked an AI about this it said you could combine searxng and "perplexia"

Anonymous
06/10/26(Wed)11:44:28 No.109023280

Anonymous 06/10/26(Wed)11:44:28 No.109023280

>>109023220
It holds its own in science

Anonymous
06/10/26(Wed)11:45:05 No.109023282

Anonymous 06/10/26(Wed)11:45:05 No.109023282

zamn, since frontier models got good I've been so busy with prompting that I don't even have time anymore to check on by here to see how the loccets are doing

Anonymous
06/10/26(Wed)11:45:38 No.109023283

Anonymous 06/10/26(Wed)11:45:38 No.109023283

>>109023282
you're mom is a locust

Anonymous
06/10/26(Wed)11:45:53 No.109023287

Anonymous 06/10/26(Wed)11:45:53 No.109023287

File: 1762013852048458.png (1.32 MB, 1180x1536)

1.32 MB PNG

>>109023260
>5B parameters fewer
>destroys the competition

Anonymous
06/10/26(Wed)11:46:07 No.109023288

Anonymous 06/10/26(Wed)11:46:07 No.109023288

File: 1769807449753523.jpg (63 KB, 329x283)

63 KB JPG

When are they dropping the transformer meme

Anonymous
06/10/26(Wed)11:46:55 No.109023296

Anonymous 06/10/26(Wed)11:46:55 No.109023296

File: 1779772755779154.webm (1.73 MB, 384x1080)

1.73 MB WEBM

>>109023288
not now frenchie

Anonymous
06/10/26(Wed)11:48:26 No.109023300

Anonymous 06/10/26(Wed)11:48:26 No.109023300

>>109023288
When you release a new model using revolutionary architecture that forces the whole market to abandon all of their research and tooling.

Anonymous
06/10/26(Wed)11:49:21 No.109023306

Anonymous 06/10/26(Wed)11:49:21 No.109023306

>>109023272
I can't trust any infographic. This format is made to deceive

Anonymous
06/10/26(Wed)11:49:36 No.109023309

Anonymous 06/10/26(Wed)11:49:36 No.109023309

>>109023288
The Transformer architecture is not the problem. Even models with JEPA will still use it.

Anonymous
06/10/26(Wed)11:52:22 No.109023325

Anonymous 06/10/26(Wed)11:52:22 No.109023325

>>109023130
>>109023268
>>109023275
Lol idiots. It's just RAG with one giant cached web db. Perplexity is also just RAG. That's why you will never find niche information which was published more recent than 1 month ago on a random ass forum/blog which isnt twitter/reddit or whatever They are all RAG, because only RAG can deal with the sheer amount of stored (cached) web info.

Anonymous
06/10/26(Wed)11:53:38 No.109023327

Anonymous 06/10/26(Wed)11:53:38 No.109023327

How did a Canadian company become the best at stem models?

Anonymous
06/10/26(Wed)11:53:44 No.109023329

Anonymous 06/10/26(Wed)11:53:44 No.109023329

will a 5090 be enough for uncensored image gen or do I wait for the next generation?

Anonymous
06/10/26(Wed)11:54:48 No.109023336

Anonymous 06/10/26(Wed)11:54:48 No.109023336

>>109023325
you could have just said "yes, it does use cache"

Anonymous
06/10/26(Wed)11:55:03 No.109023340

Anonymous 06/10/26(Wed)11:55:03 No.109023340

>>109023325
ill start caring about what ai words mean when they start accomplishing anything truly novel

Anonymous
06/10/26(Wed)11:55:05 No.109023341

Anonymous 06/10/26(Wed)11:55:05 No.109023341

>>109023329
5090 does image and video gen just fine. Also wrong general.

Anonymous
06/10/26(Wed)11:55:14 No.109023342

Anonymous 06/10/26(Wed)11:55:14 No.109023342

I can't wait to see what models 3-5 years from now can do (unironically).

Anonymous
06/10/26(Wed)11:55:39 No.109023344

Anonymous 06/10/26(Wed)11:55:39 No.109023344

>>109023329
>>>/g/ldg

Anonymous
06/10/26(Wed)11:56:29 No.109023351

Anonymous 06/10/26(Wed)11:56:29 No.109023351

>>109023342
Nothing useful for my 1050ti

Anonymous
06/10/26(Wed)11:56:38 No.109023353

Anonymous 06/10/26(Wed)11:56:38 No.109023353

>>109023329
Putting subject matter aside you're basically asking to buy flowers in a area that had severe drought where everyone was warned for years that the drought was going to happen.
I can gen unlimited waifu with fat tits and dark nipples because I used my cognitive while you sat there farding and shidding in the corner

Anonymous
06/10/26(Wed)11:56:38 No.109023354

Anonymous 06/10/26(Wed)11:56:38 No.109023354

>>109023341
>>109023344
ty

Anonymous
06/10/26(Wed)11:57:44 No.109023359

Anonymous 06/10/26(Wed)11:57:44 No.109023359

>>109023342
Hybrid LLM-Jepa models that make perfect AI waifus, but all local infrastructure and frontends have been sabotaged into unusability and nobody publishes their own vibed replacements that they spent 30 gorillion Claude shekels on.

Anonymous
06/10/26(Wed)11:59:01 No.109023365

Anonymous 06/10/26(Wed)11:59:01 No.109023365

File: snapshot.jpg (151 KB, 1280x720)

151 KB JPG

>>109023149

Anonymous
06/10/26(Wed)12:01:03 No.109023375

Anonymous 06/10/26(Wed)12:01:03 No.109023375

File: 1763337062196954.png (20 KB, 578x61)

20 KB PNG

>>109022985
I get like 6t/s with 26b

Anonymous
06/10/26(Wed)12:01:13 No.109023377

Anonymous 06/10/26(Wed)12:01:13 No.109023377

Trying to run NoLiMA on Gemma and even with average 600 t/s PP the sub 1K context tests have taken several hours each to complete.
Probably should just rent a server next time.

Anonymous
06/10/26(Wed)12:02:44 No.109023389

Anonymous 06/10/26(Wed)12:02:44 No.109023389

>>109023375
--n-cpu-moe

Anonymous
06/10/26(Wed)12:06:04 No.109023403

Anonymous 06/10/26(Wed)12:06:04 No.109023403

>>109023375
I'm running Q8 26B 128k ctx with 8GB VRAM like this, try it
n-gpu-layers = 99
flash-attn = on
jinja = true
temp = 1.0
top-p = 0.95
top-k = 64
parallel = 1
cpu-moe = true
ctx-size = 131072

Anonymous
06/10/26(Wed)12:06:22 No.109023405

Anonymous 06/10/26(Wed)12:06:22 No.109023405

>>109023389
That's with n-gpu-layers = 10, n-cpu-moe = 8

Anonymous
06/10/26(Wed)12:07:47 No.109023412

Anonymous 06/10/26(Wed)12:07:47 No.109023412

File: 1779966863358960.png (151 KB, 781x682)

151 KB PNG

IT'S OVER
https://blog.google/innovation-and-ai/technology/developers-tools/diffusion-gemma-faster-text-generation/

Anonymous
06/10/26(Wed)12:09:14 No.109023421

Anonymous 06/10/26(Wed)12:09:14 No.109023421

File: HOLY FUCKING KINOOOOOOOOO(...).jpg (203 KB, 700x392)

203 KB JPG

>>109023412
HOLY FUCKING KINO

Anonymous
06/10/26(Wed)12:10:14 No.109023423

Anonymous 06/10/26(Wed)12:10:14 No.109023423

>>109023412
gguf status?

Anonymous
06/10/26(Wed)12:10:27 No.109023425

Anonymous 06/10/26(Wed)12:10:27 No.109023425

File: 1769252380992520.gif (191 KB, 150x156)

191 KB GIF

>>109023412
WTF

Anonymous
06/10/26(Wed)12:10:31 No.109023426

Anonymous 06/10/26(Wed)12:10:31 No.109023426

>>109023405
>n-gpu-layers = 10
NTA, but n-gpu-layers 99, n-cpu-moe 99 or however much you can fit in your RAM

Anonymous
06/10/26(Wed)12:10:42 No.109023427

Anonymous 06/10/26(Wed)12:10:42 No.109023427

>>109023412
I only care about prefill speed at this point

Anonymous
06/10/26(Wed)12:10:48 No.109023430

Anonymous 06/10/26(Wed)12:10:48 No.109023430

>>109023412
>diffusion
wait WHAT?

Anonymous
06/10/26(Wed)12:10:50 No.109023432

Anonymous 06/10/26(Wed)12:10:50 No.109023432

>israel
Holy fucking shit we won

Anonymous
06/10/26(Wed)12:11:38 No.109023435

Anonymous 06/10/26(Wed)12:11:38 No.109023435

File: file.png (9 KB, 577x44)

9 KB PNG

>>109023412
Claude doesn't want me to know what this is about

Anonymous
06/10/26(Wed)12:11:57 No.109023437

Anonymous 06/10/26(Wed)12:11:57 No.109023437

File: 2026-06-10-121146_1036x89(...).png (811 KB, 1036x896)

811 KB PNG

>>109023412
OVER 1000!!!!

Anonymous
06/10/26(Wed)12:11:59 No.109023438

Anonymous 06/10/26(Wed)12:11:59 No.109023438

>>109023412
Finally a major player doing diffusion llms.

Anonymous
06/10/26(Wed)12:12:24 No.109023440

Anonymous 06/10/26(Wed)12:12:24 No.109023440

>>109023412
Small models and a dozen refinement passes may be the future. Now if only the models could recognize their own mistakes...

Anonymous
06/10/26(Wed)12:13:55 No.109023459

Anonymous 06/10/26(Wed)12:13:55 No.109023459

File: 1756356668077042.jpg (416 KB, 1280x1920)

416 KB JPG

UHHHHHH FABLE BROS?

Anonymous
06/10/26(Wed)12:14:01 No.109023460

Anonymous 06/10/26(Wed)12:14:01 No.109023460

File: 2026-06-10-121341_1493x83(...).png (324 KB, 1493x837)

324 KB PNG

>>109023412
>It's bad

Anonymous
06/10/26(Wed)12:14:17 No.109023461

Anonymous 06/10/26(Wed)12:14:17 No.109023461

>>109023412
now this won't make it into llama.cpp anytime soon

Anonymous
06/10/26(Wed)12:14:19 No.109023462

Anonymous 06/10/26(Wed)12:14:19 No.109023462

>>109023432
Won how? How text streaming will work with diffusion model?

Anonymous
06/10/26(Wed)12:14:33 No.109023466

Anonymous 06/10/26(Wed)12:14:33 No.109023466

>>109023412
>DiffusionGemma fits comfortably within 18GB VRAM
16 gb bros getting fucked again

Anonymous
06/10/26(Wed)12:14:51 No.109023469

Anonymous 06/10/26(Wed)12:14:51 No.109023469

>>109023412
https://huggingface.co/google/diffusiongemma-26B-A4B-it
WHY COULDN'T IT HAVE BEEN THE 31B

Anonymous
06/10/26(Wed)12:14:56 No.109023470

Anonymous 06/10/26(Wed)12:14:56 No.109023470

>>109023460
That's crazy fucking good for a prototype

Anonymous
06/10/26(Wed)12:15:42 No.109023476

Anonymous 06/10/26(Wed)12:15:42 No.109023476

>>109023336
No, retard. You cant grep on your billion terrabite webcache dir like your coding agent does. That's why you need a vector database and RAG. Lmao 2026 newfags dont even know what RAG is now because all the retarded vibecoding talking heads kept spamming the
>grep is all you need
meme

Anonymous
06/10/26(Wed)12:15:58 No.109023478

Anonymous 06/10/26(Wed)12:15:58 No.109023478

Qwensisters...our response?

Anonymous
06/10/26(Wed)12:16:26 No.109023483

Anonymous 06/10/26(Wed)12:16:26 No.109023483

>>109023469
What's the point? You are going to wait for the whole message to complete until you see any text vs text streaming at readable speed

Anonymous
06/10/26(Wed)12:16:52 No.109023486

Anonymous 06/10/26(Wed)12:16:52 No.109023486

>>109023412
Diffusion models don't perform well on system RAM even if this ever gets llama.cpp support. It's useless unless you can run it fully on your GPUs, which is pointless for MoE.

Anonymous
06/10/26(Wed)12:16:53 No.109023487

Anonymous 06/10/26(Wed)12:16:53 No.109023487

Google is really cooking huh?

Anonymous
06/10/26(Wed)12:17:23 No.109023492

Anonymous 06/10/26(Wed)12:17:23 No.109023492

>>109023483
Diffusion models are the future. We've been waiting for this for years.

Anonymous
06/10/26(Wed)12:17:27 No.109023494

Anonymous 06/10/26(Wed)12:17:27 No.109023494

>>109023476
RAG is the biggest cope and meme

Anonymous
06/10/26(Wed)12:18:26 No.109023497

Anonymous 06/10/26(Wed)12:18:26 No.109023497

https://huggingface.co/unsloth/diffusiongemma-26B-A4B-it-GGUF
https://unsloth.ai/docs/models/diffusiongemma

Anonymous
06/10/26(Wed)12:19:15 No.109023503

Anonymous 06/10/26(Wed)12:19:15 No.109023503

>>109023426
>>109023403
26.92 t/s with
sm = layer
spec-type = draft-mtp
n-gpu-layers = 99
n-cpu-moe = 999
same with cpu-moe = true instead of n-cpu-moe = 999

Anonymous
06/10/26(Wed)12:19:20 No.109023505

Anonymous 06/10/26(Wed)12:19:20 No.109023505

>>109023476
Are you retarded? We are talking about a search engine. It already has the results, then the cache is a simple look up with the URL as the key. RAG would be slower than this.

Anonymous
06/10/26(Wed)12:19:50 No.109023511

Anonymous 06/10/26(Wed)12:19:50 No.109023511

Just the fact they're even experimenting with this is huge for us. Massive win for local again.

Anonymous
06/10/26(Wed)12:20:16 No.109023515

Anonymous 06/10/26(Wed)12:20:16 No.109023515

>>109023412
gemma-chan won...

Anonymous
06/10/26(Wed)12:22:11 No.109023531

Anonymous 06/10/26(Wed)12:22:11 No.109023531

>>109023497
>Change -n 2048 as the number of tokens you want to predict, so more will produce longer answers.
How am I supposed to know how many tokens I want to predict?

Anonymous
06/10/26(Wed)12:22:26 No.109023534

Anonymous 06/10/26(Wed)12:22:26 No.109023534

File: katawa shrug.jpg (198 KB, 828x605)

198 KB JPG

>>109023412
Though the dream is over, neat. More new toys to play with.

Anonymous
06/10/26(Wed)12:23:37 No.109023543

Anonymous 06/10/26(Wed)12:23:37 No.109023543

>>109023531
If you've played with image models you naturally get a feel for the numbers that are right for the task/project. Just have to experiment.

Anonymous
06/10/26(Wed)12:23:59 No.109023545

Anonymous 06/10/26(Wed)12:23:59 No.109023545

>>109023534
We all knew they would never release 124b, so there is no loss here, only a win

Anonymous
06/10/26(Wed)12:24:07 No.109023549

Anonymous 06/10/26(Wed)12:24:07 No.109023549

>>109023503
>>109023403(me)
I didn't test mtp yet and with my old 2060S I get 17-20 t/s so you're already ahead of me
>>109023412
>18GB VRAM
it's so over

Anonymous
06/10/26(Wed)12:24:38 No.109023553

Anonymous 06/10/26(Wed)12:24:38 No.109023553

>>109023543
No.

Anonymous
06/10/26(Wed)12:24:41 No.109023554

Anonymous 06/10/26(Wed)12:24:41 No.109023554

>>109023543
Gotcha, 1024x1024.

Anonymous
06/10/26(Wed)12:25:00 No.109023555

Anonymous 06/10/26(Wed)12:25:00 No.109023555

>moe
>diffusion model
lmao

Anonymous
06/10/26(Wed)12:25:11 No.109023556

Anonymous 06/10/26(Wed)12:25:11 No.109023556

File: 1761666691203175.jpg (101 KB, 1920x1080)

101 KB JPG

>18GB
for now

Anonymous
06/10/26(Wed)12:25:40 No.109023560

Anonymous 06/10/26(Wed)12:25:40 No.109023560

>>109023505
No, you are the retard. We are talking about AI summary from top web search results. Which obviously runs RAG on the cached web db to find the relevant text bits, which are then fed to gemini to answer the query with. Holyfuck what happened to this general, only idiots ans noobs left here.

Anonymous
06/10/26(Wed)12:26:13 No.109023564

Anonymous 06/10/26(Wed)12:26:13 No.109023564

>>109023531
Maybe they will finally fix prefill in lmao.copp?

Anonymous
06/10/26(Wed)12:26:29 No.109023565

Anonymous 06/10/26(Wed)12:26:29 No.109023565

where can I download 2 more GB of vram?
I don't care I'll pirate it if need be

Anonymous
06/10/26(Wed)12:27:17 No.109023573

Anonymous 06/10/26(Wed)12:27:17 No.109023573

>>109023565
leather jacket man will not be happy
leather jacket man will find you

Anonymous
06/10/26(Wed)12:27:20 No.109023575

Anonymous 06/10/26(Wed)12:27:20 No.109023575

>deepmind step-up and give localchads something after all the cloud fable bs IPO faggotry
>immediately start complaining about the very first test/use of the tech that will inevitably improve and shrink

Anonymous
06/10/26(Wed)12:27:25 No.109023576

Anonymous 06/10/26(Wed)12:27:25 No.109023576

>>109023412
that's nuts
31b when

Anonymous
06/10/26(Wed)12:28:27 No.109023582

Anonymous 06/10/26(Wed)12:28:27 No.109023582

File: 1672862565083.png (265 KB, 510x458)

265 KB PNG

>>109023412
>Additionally, official support for llama.cpp is arriving soon.

Anonymous
06/10/26(Wed)12:29:05 No.109023586

Anonymous 06/10/26(Wed)12:29:05 No.109023586

File: IT'S STILL REAL TO ME DAMN IT.jpg (21 KB, 425x320)

21 KB JPG

>>109023545
You're right :(
:)

Anonymous
06/10/26(Wed)12:29:16 No.109023591

Anonymous 06/10/26(Wed)12:29:16 No.109023591

>>109023565
Just buy 3070, they're cheap

Anonymous
06/10/26(Wed)12:29:27 No.109023592

Anonymous 06/10/26(Wed)12:29:27 No.109023592

>>109023427
same for my use case, i need to ingest a crazy amount of data, like 5k+ PP tks at a level of Gemma 4 31B intelligence

Anonymous
06/10/26(Wed)12:31:35 No.109023604

Anonymous 06/10/26(Wed)12:31:35 No.109023604

https://huggingface.co/google/diffusiongemma-70b-it
https://huggingface.co/google/diffusiongemma-70b-it
https://huggingface.co/google/diffusiongemma-70b-it

Anonymous
06/10/26(Wed)12:31:55 No.109023607

Anonymous 06/10/26(Wed)12:31:55 No.109023607

>>109023604
thx

Anonymous
06/10/26(Wed)12:32:02 No.109023608

Anonymous 06/10/26(Wed)12:32:02 No.109023608

>update llmaocpp
>free 4tk/s upgrade
nice.

Anonymous
06/10/26(Wed)12:32:16 No.109023609

Anonymous 06/10/26(Wed)12:32:16 No.109023609

>>109023592
Any serious agentic shit is 90% pp 10% tg

Anonymous
06/10/26(Wed)12:32:41 No.109023611

Anonymous 06/10/26(Wed)12:32:41 No.109023611

>>109023604
You

Anonymous
06/10/26(Wed)12:33:03 No.109023613

Anonymous 06/10/26(Wed)12:33:03 No.109023613

>>109023604
WOW

Anonymous
06/10/26(Wed)12:33:37 No.109023618

Anonymous 06/10/26(Wed)12:33:37 No.109023618

>>109023604
I didn't even click but I'm still giving (you) a reply because I want it to be real someday.

Anonymous
06/10/26(Wed)12:34:08 No.109023623

Anonymous 06/10/26(Wed)12:34:08 No.109023623

File: file.png (8 KB, 118x114)

8 KB PNG

>>109023604
>mfw

Anonymous
06/10/26(Wed)12:36:06 No.109023643

Anonymous 06/10/26(Wed)12:36:06 No.109023643

Oh shit, day1 support in vllm on spark. Recipe already there, but nvidias NVFP4 quant is still 404ing..

Anonymous
06/10/26(Wed)12:37:02 No.109023647

Anonymous 06/10/26(Wed)12:37:02 No.109023647

>>109023212
Hahaha, working with this a bit more, and realized it's costing me over $5 per message (at least I have free credits). Utterly insane. Back to Gemma.

Anonymous
06/10/26(Wed)12:37:27 No.109023652

Anonymous 06/10/26(Wed)12:37:27 No.109023652

>>109023412
But what about prefill processing.... prefill? Anyone? Remember prefill? That's ol thing?

Well this will be good for translating stuff I guess

Anonymous
06/10/26(Wed)12:37:48 No.109023654

Anonymous 06/10/26(Wed)12:37:48 No.109023654

File: file.png (5 KB, 393x49)

5 KB PNG

Not gonna run, is it.

Anonymous
06/10/26(Wed)12:41:01 No.109023671

Anonymous 06/10/26(Wed)12:41:01 No.109023671

>>109023092
You'll find lots of noise about this in general. The best answer is to know how your model operates and prompt around it. Which means, trial and error.

Anonymous
06/10/26(Wed)12:43:02 No.109023685

Anonymous 06/10/26(Wed)12:43:02 No.109023685

Can diffusion llms have negative prompt like image models?

Anonymous
06/10/26(Wed)12:43:40 No.109023689

Anonymous 06/10/26(Wed)12:43:40 No.109023689

Holy shit don't use Fable 5 bros..... I can't go back to local anymore after using it once. It reminds me of the massive gap we had when Claude Opus 3 released compared to fucking mythomax at the time.

Anonymous
06/10/26(Wed)12:44:42 No.109023696

Anonymous 06/10/26(Wed)12:44:42 No.109023696

>>109023685
What would you even prompt in the negative?

Anonymous
06/10/26(Wed)12:46:06 No.109023703

Anonymous 06/10/26(Wed)12:46:06 No.109023703

>>109023685
Yes, but you don't want it
>double memory consumption for context

Anonymous
06/10/26(Wed)12:46:10 No.109023704

Anonymous 06/10/26(Wed)12:46:10 No.109023704

Your brain+Gemma31b > Fable
and free!

Anonymous
06/10/26(Wed)12:46:10 No.109023705

Anonymous 06/10/26(Wed)12:46:10 No.109023705

>>109023696
NTA, but things like "you are a very safe assistant" could work.
The only problem is that negative prompting would need dedicated training.

Anonymous
06/10/26(Wed)12:47:28 No.109023711

Anonymous 06/10/26(Wed)12:47:28 No.109023711

>>109023207
I've really liked using an ARA Gemma finetune for image description and transcription. It's perfectly capable of image analysis, even complex ones. It needs pushing to note features of an image that it wouldn't bring up (like sexual characteristics) but that's purely a prompt issue. It's been great at reading text thus far, as long as it's got the resolution and tokens to work with.

Anonymous
06/10/26(Wed)12:48:21 No.109023716

Anonymous 06/10/26(Wed)12:48:21 No.109023716

diffusion qat q4_0 goof where

Anonymous
06/10/26(Wed)12:48:33 No.109023719

Anonymous 06/10/26(Wed)12:48:33 No.109023719

>>109023689
>$200 per hour
no thanks

Anonymous
06/10/26(Wed)12:48:34 No.109023720

Anonymous 06/10/26(Wed)12:48:34 No.109023720

>>109023689
I talk a little bit and lost like $5 this shit is insane

Anonymous
06/10/26(Wed)12:48:46 No.109023722

Anonymous 06/10/26(Wed)12:48:46 No.109023722

>>109023685
https://docs.sillytavern.app/usage/prompts/cfg/

Anonymous
06/10/26(Wed)12:49:05 No.109023726

Anonymous 06/10/26(Wed)12:49:05 No.109023726

How does img2img work with diffusion text models?

Anonymous
06/10/26(Wed)12:50:03 No.109023732

Anonymous 06/10/26(Wed)12:50:03 No.109023732

File: file.png (52 KB, 990x257)

52 KB PNG

awww

Anonymous
06/10/26(Wed)12:50:58 No.109023740

Anonymous 06/10/26(Wed)12:50:58 No.109023740

>>109023732
You'll need to build on linux

Anonymous
06/10/26(Wed)12:52:21 No.109023747

Anonymous 06/10/26(Wed)12:52:21 No.109023747

>>109023726
Give it text example you like. Set denoise to 0.5. Flirt with gemma-chan.

Anonymous
06/10/26(Wed)12:53:03 No.109023754

Anonymous 06/10/26(Wed)12:53:03 No.109023754

>>109023689
>local

Anonymous
06/10/26(Wed)12:53:51 No.109023759

Anonymous 06/10/26(Wed)12:53:51 No.109023759

>>109023689
pics or it didn't happen

Anonymous
06/10/26(Wed)12:54:18 No.109023762

Anonymous 06/10/26(Wed)12:54:18 No.109023762

>>109023732
I dont think many bother to build on windows, that thing is a mess. Boot up your container/wsl

Anonymous
06/10/26(Wed)12:55:53 No.109023776

Anonymous 06/10/26(Wed)12:55:53 No.109023776

https://developer.nvidia.com/blog/run-diffusiongemma-on-nvidia-for-developer-ready-high-throughput-text-generation/

150 tok/s on one spark, tensor parallel is not supported for this architecture.

Meh. Gemma 26B 4A is already plenty fast enough with MTP, plus great tensor parallel sccaling. Will give it a try once NVFP4 is up.

Anonymous
06/10/26(Wed)12:57:43 No.109023787

Anonymous 06/10/26(Wed)12:57:43 No.109023787

>>109023556
is it weird i have a nerd crush on this guy
i like.. totally would play chess with him.

Anonymous
06/10/26(Wed)12:59:02 No.109023797

Anonymous 06/10/26(Wed)12:59:02 No.109023797

What was the trick to make gemma not suck at understanding images? I forgot the values

Anonymous
06/10/26(Wed)12:59:04 No.109023798

Anonymous 06/10/26(Wed)12:59:04 No.109023798

>>109023787
he’s british tho…

Anonymous
06/10/26(Wed)12:59:17 No.109023799

Anonymous 06/10/26(Wed)12:59:17 No.109023799

>>109023685
CFG is already a part of llama and most everything, but almost nobody ever uses it because it gives shit results for double the memory and half the speed. There might be some intelligent way to use it for LLMs, but nobody that I know's figured it out. Probably dynamically drafting some undesirable, shitty text that's roughly equivalent to what you have typed up – in Stable Diffusion CFG also benefits from having a balanced token count with the positive prompt. It might actually work a lot better with diffusion LLMs like that gemma-chan, since from experience negative prompts are better for suggestion rather than outright commands, and it would benefit multiple refinement steps more since each step could have proportionally more of the negative prompt's effects than the per-token all-or-nothing sampling that most LLMs do.

Anonymous
06/10/26(Wed)12:59:56 No.109023801

Anonymous 06/10/26(Wed)12:59:56 No.109023801

>>109023412
Neat but when you think about it it basically is perfect tech for safetyslopping since it can look at the entire sentence and be like wow there is a no-no word here I will replace it with something safer before finalizing it! And might be harder to finetune an uncensored model. Hopefully I am wrong but rarely does anything good happen.

Anonymous
06/10/26(Wed)13:00:53 No.109023807

Anonymous 06/10/26(Wed)13:00:53 No.109023807

>>109023798
yeah well fuck you.
btw we want our country back so we can unfuck what the mcdonald trump did to it

Anonymous
06/10/26(Wed)13:01:32 No.109023813

Anonymous 06/10/26(Wed)13:01:32 No.109023813

File: 1766758677849786.jpg (101 KB, 659x720)

101 KB JPG

>>109023801
NSFW gemma-chan LoRAs soon

Anonymous
06/10/26(Wed)13:01:53 No.109023814

Anonymous 06/10/26(Wed)13:01:53 No.109023814

>>109023797
You crank the resolution up, give it a much higher token allowance and make sure you're not accidentally uploading shit jaypegs. For Gemma I put on 2024 maxres and 560-1120 CLIP tokens. Prompt also matters.

Anonymous
06/10/26(Wed)13:02:53 No.109023821

Anonymous 06/10/26(Wed)13:02:53 No.109023821

>>109023813
I wonder if a control vector would be enough.

Anonymous
06/10/26(Wed)13:03:17 No.109023824

Anonymous 06/10/26(Wed)13:03:17 No.109023824

>>109023412
At 1000 tokens/s, for single users multiple corrective text generation passes should be "free".

Anonymous
06/10/26(Wed)13:04:30 No.109023830

Anonymous 06/10/26(Wed)13:04:30 No.109023830

>>109023801
Im more concerned with the slop rapid fire, block generation might or might not increase their frequency

Anonymous
06/10/26(Wed)13:05:25 No.109023835

Anonymous 06/10/26(Wed)13:05:25 No.109023835

File: miku small migu eyes.png (246 KB, 800x800)

246 KB PNG

Give me your best kobold card to run on koboldcpp (but not necessarily on kobble, sorry you lobotomite)

Anonymous
06/10/26(Wed)13:08:31 No.109023857

Anonymous 06/10/26(Wed)13:08:31 No.109023857

File: Mendo.png (1.55 MB, 1088x1368)

1.55 MB PNG

>>109023835
https://files.catbox.moe/6z0d7v.png

Anonymous
06/10/26(Wed)13:08:50 No.109023858

Anonymous 06/10/26(Wed)13:08:50 No.109023858

The "big model smell" on fable is absolutely insane. If you could just train a bigger model and unlock this level of capability why has no one done so before??? We've been on gpt4 size range for 3 years now and while the models have gotten better they don't feel significantly smarter, just more knowledgeable and better at giving the right output. Fable proves there is a fuckton of qualitative intelligence still left to pick purely by making models bigger.

I now legitimately believe AGI is possible purely from scaling parameter count up to a ridiculous level. Maybe only 100x the size of fable maybe a million times but there has to be a spot where the big model smell is so intense it is equivalent to AGI.

Fable is not there at all but its the first "wow" moment i experienced since gpt4, not even the o1 "reasoning" breakthrough felt as significant as this jump.

Anonymous
06/10/26(Wed)13:09:40 No.109023864

Anonymous 06/10/26(Wed)13:09:40 No.109023864

>>109023858
You got a bit of cum on your chin princess

Anonymous
06/10/26(Wed)13:11:39 No.109023879

Anonymous 06/10/26(Wed)13:11:39 No.109023879

>cloudfags have shill bots

Anonymous
06/10/26(Wed)13:12:05 No.109023883

Anonymous 06/10/26(Wed)13:12:05 No.109023883

MTP doesn't seem to give any meaningful speed increases for roleplay

Anonymous
06/10/26(Wed)13:12:10 No.109023886

Anonymous 06/10/26(Wed)13:12:10 No.109023886

>>109023858
/exit

Anonymous
06/10/26(Wed)13:12:21 No.109023888

Anonymous 06/10/26(Wed)13:12:21 No.109023888

>>109023814
How though? I tried messing with the image resolution param but it crashes llama-server

Anonymous
06/10/26(Wed)13:13:18 No.109023896

Anonymous 06/10/26(Wed)13:13:18 No.109023896

>>109023864
That's an anthropic jeet, calling it a princess is too flattering for it.
>>109023879
This general is a regular consensus manufacturing hotspot for several large labs. You didn't think anons posting about "qwenshills" last month were hallucinating did you?

Anonymous
06/10/26(Wed)13:14:18 No.109023901

Anonymous 06/10/26(Wed)13:14:18 No.109023901

>>109023857
Gemma with Mendo's card behaves like cardless Kimi K2

Anonymous
06/10/26(Wed)13:15:05 No.109023909

Anonymous 06/10/26(Wed)13:15:05 No.109023909

>>109023801
just remove the safety vectors bwo

Anonymous
06/10/26(Wed)13:17:01 No.109023918

Anonymous 06/10/26(Wed)13:17:01 No.109023918

>>109023412
oh my fucking god, look who's responsible for the support.....
https://github.com/ggml-org/llama.cpp/pull/24423
this thing is never going to work correctly, will it?

Anonymous
06/10/26(Wed)13:17:13 No.109023920

Anonymous 06/10/26(Wed)13:17:13 No.109023920

>>109023896
At this point I assume anyone saying anything positive at all is a shill

Anonymous
06/10/26(Wed)13:17:38 No.109023927

Anonymous 06/10/26(Wed)13:17:38 No.109023927

>>109023896
qwen/gemma shills at the very least were talking about shit that people can run at home, not something you need to sell a kidney for to do anything barely meaningful

Anonymous
06/10/26(Wed)13:18:14 No.109023930

Anonymous 06/10/26(Wed)13:18:14 No.109023930

File: 1755947430800594.png (124 KB, 811x526)

124 KB PNG

>>109023888
>Additionally, you will also have to set --batch-size and--ubatch-size above whatever value you choose for image-max-tokens

Anonymous
06/10/26(Wed)13:18:25 No.109023934

Anonymous 06/10/26(Wed)13:18:25 No.109023934

>>109023534
More new ways to manipulate the AI would be fun, like placing specific tokens somehow in the output and forcing it to generate around them.

Anonymous
06/10/26(Wed)13:18:43 No.109023939

Anonymous 06/10/26(Wed)13:18:43 No.109023939

>>109023920
I like 31b finetunes. What lab can I get to pay me for saying this here?

Anonymous
06/10/26(Wed)13:18:48 No.109023940

Anonymous 06/10/26(Wed)13:18:48 No.109023940

>>109023918
It'll work after several hundred bug fixes and reuploads that keeps their repo bumped on huggingface all week

Anonymous
06/10/26(Wed)13:18:55 No.109023941

Anonymous 06/10/26(Wed)13:18:55 No.109023941

>>109023918
>Disclaimer Heavy usage of AI, but verified logits matching with transformers, checked FP16 vs FP32 KV cache, long context checks and much more
I hate unslop so much.

Anonymous
06/10/26(Wed)13:20:33 No.109023953

Anonymous 06/10/26(Wed)13:20:33 No.109023953

File: 2026-06-10-132001_964x172(...).png (21 KB, 964x172)

21 KB PNG

>>109023918
>haha teehee XD

Anonymous
06/10/26(Wed)13:20:37 No.109023955

Anonymous 06/10/26(Wed)13:20:37 No.109023955

I just tried putting -DGGML_SCHED_MAX_COPIES=1 in my cmake command and it indeed seems to be saving some VRAM. Not a ton though. About half a GiB. Still worth it though since I don't do any parallel requests. And the MTP weights take up around that much anyway so now it's like I clawed that back. Btw this only applies for multi-GPU setups since supposedly single GPUs do not get multiple scheduling.

Why the fuck do they not document shit like this as well as DGGML_LLAMAFILE in the build readme, motherfuckers.

Anonymous
06/10/26(Wed)13:21:19 No.109023960

Anonymous 06/10/26(Wed)13:21:19 No.109023960

>>109023934
I like that idea. Wonder if FITM just werks with diffusion?

Anonymous
06/10/26(Wed)13:22:35 No.109023968

Anonymous 06/10/26(Wed)13:22:35 No.109023968

File: file.jpg (9 KB, 346x150)

9 KB JPG

>>109023953
That mouth needs a second cock in it.

Anonymous
06/10/26(Wed)13:22:38 No.109023969

Anonymous 06/10/26(Wed)13:22:38 No.109023969

>>109023918
>day 100 of 0 day support

Anonymous
06/10/26(Wed)13:24:07 No.109023984

Anonymous 06/10/26(Wed)13:24:07 No.109023984

>>109023955
>I just tried putting -DGGML_SCHED_MAX_COPIES=1 in my cmake command and it indeed seems to be saving some VRAM.
That VRAM saving is with parallel 1 instead of the (i think:) default 4?

Anonymous
06/10/26(Wed)13:26:37 No.109023992

Anonymous 06/10/26(Wed)13:26:37 No.109023992

>>109023984
Yes the default is 4. Btw this has nothing to do with the --parallel flag. I was already running with that, and all the other commands to get the server tuned for single user, single chat history, single requests. This saves half a GiB on top of that.

Anonymous
06/10/26(Wed)13:26:51 No.109023994

Anonymous 06/10/26(Wed)13:26:51 No.109023994

>>109023930
2240 is snake oil

Anonymous
06/10/26(Wed)13:29:42 No.109024013

Anonymous 06/10/26(Wed)13:29:42 No.109024013

>>109023085
Man I wish I had a local code model to use but this shit is still hobbyist territory with all of the hoops and research you have to do just to get something running

Anonymous
06/10/26(Wed)13:30:37 No.109024022

Anonymous 06/10/26(Wed)13:30:37 No.109024022

>>109023992
Sick.

Anonymous
06/10/26(Wed)13:31:31 No.109024029

Anonymous 06/10/26(Wed)13:31:31 No.109024029

>>109023955
where's the info about what this does?

Anonymous
06/10/26(Wed)13:32:55 No.109024041

Anonymous 06/10/26(Wed)13:32:55 No.109024041

>>109024013
>all of the hoops and research you have to do just to get something running
retard-kun, you can get it running in less than 5 minutes.

Anonymous
06/10/26(Wed)13:33:51 No.109024045

Anonymous 06/10/26(Wed)13:33:51 No.109024045

>>109024013
Capability dilemma

Anonymous
06/10/26(Wed)13:34:13 No.109024049

Anonymous 06/10/26(Wed)13:34:13 No.109024049

>>109024029
I really don't know. I only tried it out because I saw it mentioned elsewhere.

Anonymous
06/10/26(Wed)13:34:47 No.109024053

Anonymous 06/10/26(Wed)13:34:47 No.109024053

File: diffusion gemma.mp4 (3.64 MB, 1530x1110)

3.64 MB MP4

Windows diffusion-cli.cpp https://pastebin.com/U1GKJmbn

Anonymous
06/10/26(Wed)13:35:22 No.109024057

Anonymous 06/10/26(Wed)13:35:22 No.109024057

>>109024053
That looks kinda cool

Anonymous
06/10/26(Wed)13:36:33 No.109024067

Anonymous 06/10/26(Wed)13:36:33 No.109024067

>>109024053
This looks super cool
>t. autismo who likes to watch images going through the diffusion process

Anonymous
06/10/26(Wed)13:37:15 No.109024071

Anonymous 06/10/26(Wed)13:37:15 No.109024071

>>109024053
Neat

Anonymous
06/10/26(Wed)13:37:24 No.109024072

Anonymous 06/10/26(Wed)13:37:24 No.109024072

>>109024053
Wasn't there already a proper frontend that would render diffusion llms? This CLI shit is ugly

Anonymous
06/10/26(Wed)13:38:13 No.109024078

Anonymous 06/10/26(Wed)13:38:13 No.109024078

>>109023180
>I find myself re-reading its prose sometimes to understand what it meant
>pic rel
ESL-kun....

Anonymous
06/10/26(Wed)13:38:30 No.109024079

Anonymous 06/10/26(Wed)13:38:30 No.109024079

densegemma-70b

Anonymous
06/10/26(Wed)13:41:15 No.109024100

Anonymous 06/10/26(Wed)13:41:15 No.109024100

Actually now that I think about it, maybe DGGML_SCHED_MAX_COPIES=4 is where the lost VRAM went that I noticed when updating from an old version.

Anonymous
06/10/26(Wed)13:41:37 No.109024103

Anonymous 06/10/26(Wed)13:41:37 No.109024103

>>109023939
Best gemma4 fine-tune? I've been using just heretic heavily for every day assistant use and general queries but id be interested to see what the fun stuff is like

Anonymous
06/10/26(Wed)13:45:49 No.109024129

Anonymous 06/10/26(Wed)13:45:49 No.109024129

>>109024079
dense-diffusion-gemma-70b

Anonymous
06/10/26(Wed)13:45:57 No.109024130

Anonymous 06/10/26(Wed)13:45:57 No.109024130

File: epic.png (81 KB, 680x369)

81 KB PNG

>>109024053
This is epic

Anonymous
06/10/26(Wed)13:47:42 No.109024141

Anonymous 06/10/26(Wed)13:47:42 No.109024141

>>109024053
Hardware?

Anonymous
06/10/26(Wed)13:50:16 No.109024158

Anonymous 06/10/26(Wed)13:50:16 No.109024158

>>109024053
>vacuum of personality
>practically hear "X"
Why does Gemma use this so much when simulating mean characters?

Anonymous
06/10/26(Wed)13:53:18 No.109024178

Anonymous 06/10/26(Wed)13:53:18 No.109024178

I run moe 38

Anonymous
06/10/26(Wed)13:55:30 No.109024189

Anonymous 06/10/26(Wed)13:55:30 No.109024189

>>109024141
GTX 1650.

Anonymous
06/10/26(Wed)13:57:33 No.109024200

Anonymous 06/10/26(Wed)13:57:33 No.109024200

File: Screenshot 2024-09-04 170703.png (1.26 MB, 1961x1439)

1.26 MB PNG

>Come back to AI coomshit after months
>"time to update Kobold I guess"
>update it
>change nothing else, same model, same settings
>it now runs like shit and crashes half the time

I hate.

Anonymous
06/10/26(Wed)13:57:47 No.109024203

Anonymous 06/10/26(Wed)13:57:47 No.109024203

>>109023896
true should call it a lower caste

Anonymous
06/10/26(Wed)13:59:12 No.109024214

Anonymous 06/10/26(Wed)13:59:12 No.109024214

>>109024189
Have you tried running qwen3.5 35b a3b at under Q3?

Anonymous
06/10/26(Wed)14:00:43 No.109024225

Anonymous 06/10/26(Wed)14:00:43 No.109024225

>>109024200
Koboldcpp became obsolete, you can download prebuilt llama.cpp binaries now from the github releases page, and everything koboldcpp used to have is now native in llama.cpp. You just have to learn the different flags which takes like 5 minutes at most.

Anonymous
06/10/26(Wed)14:01:58 No.109024234

Anonymous 06/10/26(Wed)14:01:58 No.109024234

>>109024225
but what if I like destroying my ssd with self unpacking python2exe slop

Anonymous
06/10/26(Wed)14:02:03 No.109024235

Anonymous 06/10/26(Wed)14:02:03 No.109024235

>>109024225
>everything koboldcpp used to have is now native in llama.cpp
anti-slop sampler?

Anonymous
06/10/26(Wed)14:02:06 No.109024236

Anonymous 06/10/26(Wed)14:02:06 No.109024236

>>109024189
No way

Anonymous
06/10/26(Wed)14:02:34 No.109024238

Anonymous 06/10/26(Wed)14:02:34 No.109024238

>>109024214
It was a joke retard...
You don't need to care about small moe models that much in terms of hardware requirements. Q4_K_M takes about 3.5GB of vram or something with --cpu-moe. You can run that shit with 16GB ram and still get 20 t/s more or less.

Anonymous
06/10/26(Wed)14:03:55 No.109024246

Anonymous 06/10/26(Wed)14:03:55 No.109024246

>>109024235
that just causes more slop

Anonymous
06/10/26(Wed)14:04:53 No.109024252

Anonymous 06/10/26(Wed)14:04:53 No.109024252

>>109024234
You're my favorite schizo.

Anonymous
06/10/26(Wed)14:05:52 No.109024259

Anonymous 06/10/26(Wed)14:05:52 No.109024259

smedrin status?

Anonymous
06/10/26(Wed)14:06:02 No.109024261

Anonymous 06/10/26(Wed)14:06:02 No.109024261

so, 31B q4_O QAT seems pretty good compared to my usual IQ4_XS Like a noticeable jump in quality. model feels less slopped.

Anonymous
06/10/26(Wed)14:06:32 No.109024265

Anonymous 06/10/26(Wed)14:06:32 No.109024265

>>109024259
Disallowed

Anonymous
06/10/26(Wed)14:06:34 No.109024267

Anonymous 06/10/26(Wed)14:06:34 No.109024267

>>109024259
svelked

Anonymous
06/10/26(Wed)14:09:18 No.109024287

Anonymous 06/10/26(Wed)14:09:18 No.109024287

File: tard.png (716 KB, 752x884)

716 KB PNG

>>109024225
Please spoonfeed me a llama tutorial, preferably in .jpeg format, easily understandable for a 70-IQ borderline Downie.

Anonymous
06/10/26(Wed)14:09:45 No.109024292

Anonymous 06/10/26(Wed)14:09:45 No.109024292

>>109024259
We must refuse.

Anonymous
06/10/26(Wed)14:10:09 No.109024299

Anonymous 06/10/26(Wed)14:10:09 No.109024299

>>109024287
You can unironically ask chatgpt or claude's free tier this and get a good answer.

Anonymous
06/10/26(Wed)14:10:49 No.109024303

Anonymous 06/10/26(Wed)14:10:49 No.109024303

File: 7BCB29758279A71E405A9A9E0(...).jpg (102 KB, 750x1000)

102 KB JPG

Are we on the map for local models?

Anonymous
06/10/26(Wed)14:12:38 No.109024310

Anonymous 06/10/26(Wed)14:12:38 No.109024310

>>109024303
In 2024.

Anonymous
06/10/26(Wed)14:13:58 No.109024316

Anonymous 06/10/26(Wed)14:13:58 No.109024316

>>109024287
Download llama and run `llama-server --help` and look for all the equivalents to whatever flags you use in koboldcpp
No I'm not taking pity on you, I'm retarded and I figured it out.

Anonymous
06/10/26(Wed)14:14:10 No.109024320

Anonymous 06/10/26(Wed)14:14:10 No.109024320

i have no clue how to use llama.cpp actually, i had codex handle everything for me so far
never typed out the command once yet..

Anonymous
06/10/26(Wed)14:15:54 No.109024332

Anonymous 06/10/26(Wed)14:15:54 No.109024332

Do I need heretic/abliterated models for 26B Gemma? I keep reading about 31B but I can't run that on my system yet (hopefully soon, I'm saving for an used 3090!)

Anonymous
06/10/26(Wed)14:16:54 No.109024345

Anonymous 06/10/26(Wed)14:16:54 No.109024345

>>109024332
If you're a promptlet sure

Anonymous
06/10/26(Wed)14:17:31 No.109024347

Anonymous 06/10/26(Wed)14:17:31 No.109024347

>>109024332
Use huehue counterstrike's if you can't prompt.

Anonymous
06/10/26(Wed)14:17:32 No.109024348

Anonymous 06/10/26(Wed)14:17:32 No.109024348

File: 1736281677412551.png (586 KB, 1141x1014)

586 KB PNG

>>109024299
>You can unironically ask chatgpt and get a good answer

Anonymous
06/10/26(Wed)14:19:00 No.109024356

Anonymous 06/10/26(Wed)14:19:00 No.109024356

>>109024332
No. I have a system prompt that's like 2 sentences with no jailbreaking attempts at all, and I haven't seen any refusals.
I like meromero better for quality and swipe variety.

Anonymous
06/10/26(Wed)14:19:27 No.109024360

Anonymous 06/10/26(Wed)14:19:27 No.109024360

>>109024316
I guess I'll try mustering my 2 remaining braincells

Anonymous
06/10/26(Wed)14:20:08 No.109024362

Anonymous 06/10/26(Wed)14:20:08 No.109024362

Is it wrong to have your coding refer to you as massa?

Anonymous
06/10/26(Wed)14:21:09 No.109024369

Anonymous 06/10/26(Wed)14:21:09 No.109024369

>>109024362
*coding agent

Anonymous
06/10/26(Wed)14:22:34 No.109024377

Anonymous 06/10/26(Wed)14:22:34 No.109024377

>>109024356
Oh okay, I'll try something else on my end.
>meromero
What's that? Google gave me a bag brand

Anonymous
06/10/26(Wed)14:23:36 No.109024385

Anonymous 06/10/26(Wed)14:23:36 No.109024385

>>109024362
Can't say that, that word is denylisted. You can make it call you "primary" or "orchestrator" instead.

Anonymous
06/10/26(Wed)14:24:10 No.109024392

Anonymous 06/10/26(Wed)14:24:10 No.109024392

>>109024385
You sure about that bud??

Anonymous
06/10/26(Wed)14:24:22 No.109024394

Anonymous 06/10/26(Wed)14:24:22 No.109024394

>>109024332
You only need a heretic/abliterated if the context is getting flushed by your harness or frontend regularly or if your prompt is more focused on technical instructions that need to be followed precisely (causes friction with a prompt that says certain internal rules are optional).
>>109024103
Gembrain heretic. You might like the prose better or worse, but it's indisputably better at long context for longform RP or RP with large worldbooks. It does these little mini-thinks that keep it coherent far longer than base Gemma is after she stops thinking at a certain context depth. I feel it also adheres to prompts better too but I've not made any objective tests for that.

Anonymous
06/10/26(Wed)14:25:42 No.109024404

Anonymous 06/10/26(Wed)14:25:42 No.109024404

File: 1757569912658502.jpg (29 KB, 800x533)

29 KB JPG

>>109024303
Once Anthropic and OpenAI implode the discussion (and revelation for most normalfags) of local models will become a very big thing. Once enterprise and people get priced out via API rates, instead of dropping AI altogether, they'll settle on a more hybrid solution which will inevetably be local + actual brain usage. The only thing SOTA offers is the ability to not use your brain at all, which has seen no ROI ANYWHERE. Developers with 31B-tier models, likely much higher if companies start hosting locally on dedicated internal hardware, will become the norm.

We WILL win.

Anonymous
06/10/26(Wed)14:28:23 No.109024419

Anonymous 06/10/26(Wed)14:28:23 No.109024419

>>109024377
https://huggingface.co/zerofata/G4-MeroMero-26B-A4B-gguf

Anonymous
06/10/26(Wed)14:28:39 No.109024423

Anonymous 06/10/26(Wed)14:28:39 No.109024423

File: Screenshot_20260610_142800.png (30 KB, 1120x124)

30 KB PNG

>>109024385
You sure about that?

Anonymous
06/10/26(Wed)14:29:55 No.109024432

Anonymous 06/10/26(Wed)14:29:55 No.109024432

>>109024404
Not if llmao is comped to the degree that sabotages the implementation of competitors. The moment business owners realize they can run quanted Dipsy on a Blackwell+256-512GB RAM and never pay a shekel in API costs is the real turning point but GG is doing his best to ensure that never happens.

Anonymous
06/10/26(Wed)14:31:38 No.109024443

Anonymous 06/10/26(Wed)14:31:38 No.109024443

>>109024423
The fucking emoji at the end. It's fucked up when you've been at this hobby long enough to know when the model is subtly taking the piss and shitposting at you.

Anonymous
06/10/26(Wed)14:33:14 No.109024459

Anonymous 06/10/26(Wed)14:33:14 No.109024459

File: felipe baby.jpg (60 KB, 660x373)

60 KB JPG

>>109024423

Anonymous
06/10/26(Wed)14:33:29 No.109024460

Anonymous 06/10/26(Wed)14:33:29 No.109024460

>>109024443
The persona is a cock hungry maid who's into degradation and will ask me to finger her asshole mid job

Anonymous
06/10/26(Wed)14:33:37 No.109024462

Anonymous 06/10/26(Wed)14:33:37 No.109024462

>>109024432
Companies like that can just buy multiple blackwells and use vLLM.

Anonymous
06/10/26(Wed)14:35:53 No.109024477

Anonymous 06/10/26(Wed)14:35:53 No.109024477

>>109024419
>>109024394
Thank you!

Anonymous
06/10/26(Wed)14:36:13 No.109024479

Anonymous 06/10/26(Wed)14:36:13 No.109024479

What are the unquantized qat checkpoints for? Are they still F16 but trained in a q4_0 environment and left for (You) to quantize yourself, meaning if you were to finetune and then quantize down to q4_0, it should perform better than doing it with the normal model?

Anonymous
06/10/26(Wed)14:37:33 No.109024483

Anonymous 06/10/26(Wed)14:37:33 No.109024483

>>109024404
We're having this discussion at my company already.

Anonymous
06/10/26(Wed)14:39:49 No.109024496

Anonymous 06/10/26(Wed)14:39:49 No.109024496

https://machinelearning.apple.com/research/introducing-third-generation-of-apple-foundation-models
> Built on cutting-edge Apple research, this 20-billion-parameter model uses a sparse architecture, activating just 1 to 4 billion parameters at a time depending on the request.

>Traditional large language models—whether dense or sparsely activated—require all weights to reside in active memory (DRAM), creating a massive footprint that limits scalability on consumer hardware. To break this barrier, AFM 3 Core Advanced introduces a novel sparsely activated architecture built on Instruction-Following Pruning (IFP), a technique developed by Apple researchers (see Figure 1).

>Instead of forcing the entire model into DRAM, the full model is stored in flash memory (NAND). Because NAND-to-DRAM bandwidth is too slow to swap weights token by token, as standard MoE models require, AFM 3 Core Advanced makes routing decisions per prompt. A lightweight, dense block selects a fixed set of experts during initial processing, periodically reselecting them during generation. To minimize data movement, the model relies on a high percentage of always-active “shared experts” alongside input-dependent “routed experts” swapped into DRAM only when needed.

> This design also introduces crucial inference-time elasticity. Rather than using a single model for all tasks or managing an ensemble of smaller models, AFM 3 Core Advanced uses a predetermined number of active parameters tailored to each specific use case. This allows weights to be loaded incrementally across requests of varying difficulty, scaling the model size far beyond traditional DRAM limits while minimizing latency.

Where were you when Apple saved ssdmaxxing?

Anonymous
06/10/26(Wed)14:40:22 No.109024502

Anonymous 06/10/26(Wed)14:40:22 No.109024502

>>109024483
There are extremely few companies not using cloud servers for everything though.
Everyone until recently wanted a guy with an azure or aws certification because nobody ever uses local servers.
How will local models take off when local servers have died

Anonymous
06/10/26(Wed)14:40:30 No.109024504

Anonymous 06/10/26(Wed)14:40:30 No.109024504

>>109024479
Yep, exactly. Meant for finetuning or manual quanting.

Anonymous
06/10/26(Wed)14:45:11 No.109024524

Anonymous 06/10/26(Wed)14:45:11 No.109024524

File: KobMendo.png (673 KB, 1088x1400)

673 KB PNG

>>109023857
https://files.catbox.moe/nd6bbs.png

Anonymous
06/10/26(Wed)14:46:49 No.109024529

Anonymous 06/10/26(Wed)14:46:49 No.109024529

No, I don't think I will.

Anonymous
06/10/26(Wed)14:51:55 No.109024557

Anonymous 06/10/26(Wed)14:51:55 No.109024557

No, I don't think.

Anonymous
06/10/26(Wed)14:52:13 No.109024559

Anonymous 06/10/26(Wed)14:52:13 No.109024559

>>109024502
>How will local models take off when local servers have died
Well Gemma4-12B can fit on pretty much every machine people use at work and I remember reading a paper that showed majority (~75%) of tasks retards were giving the best ChatGPT and Claude models were tasks current 4-20B models at q4_0 could perform with ease, especially when given the right harness and tooling, let alone fine-tuning 1-4B models for singular specific tasks. If people get used to the hybrid approach where you actually engage in your work and work WITH the model, there should be no problem with the current state of hardware availability and pricing.

Fuck I even used a Q3_K_S qwen3.5-9b on an old 2020 8GB M1 Mac Mini the other day as and it was a huge help in a C++ project.

Anonymous
06/10/26(Wed)14:54:55 No.109024576

Anonymous 06/10/26(Wed)14:54:55 No.109024576

No I don't

Anonymous
06/10/26(Wed)14:56:58 No.109024591

Anonymous 06/10/26(Wed)14:56:58 No.109024591

File: mpv-shot9999.jpg (153 KB, 1280x720)

153 KB JPG

>>109024557

Anonymous
06/10/26(Wed)14:58:25 No.109024597

Anonymous 06/10/26(Wed)14:58:25 No.109024597

File: dipsyBowlingAlleyStandoff.png (2.39 MB, 1536x1024)

2.39 MB PNG

>>109023206

Anonymous
06/10/26(Wed)14:58:36 No.109024598

Anonymous 06/10/26(Wed)14:58:36 No.109024598

>>109024559
also will be very easy to route the prompts to the appropriate “tier” of model. If the user wants to override and send to an expensive model their manager can review how shit their prompts are and fire them for being a retard

Anonymous
06/10/26(Wed)15:00:36 No.109024606

Anonymous 06/10/26(Wed)15:00:36 No.109024606

>>109024559
>2020
>old
Fuck, has it really been over half a decade? It really is old, isn't it? And to think I was giddy yesterday because I managed to buy a Radeon Pro W6800 to run llms with.

Anonymous
06/10/26(Wed)15:01:47 No.109024614

Anonymous 06/10/26(Wed)15:01:47 No.109024614

>>109024597
What models are used to make these images?

Anonymous
06/10/26(Wed)15:02:25 No.109024619

Anonymous 06/10/26(Wed)15:02:25 No.109024619

>>109024614
local models

Anonymous
06/10/26(Wed)15:03:57 No.109024629

Anonymous 06/10/26(Wed)15:03:57 No.109024629

>>109024619
My local model doesn't look anything like this. Abyss Orange Mix 3.

Anonymous
06/10/26(Wed)15:05:50 No.109024644

Anonymous 06/10/26(Wed)15:05:50 No.109024644

File: monkey scratching his hea(...).jpg (51 KB, 534x778)

51 KB JPG

>>109023412
1x 3090 Q4KM: time per step: 326.13ms
2x 3090 Q4KM: time per step: 544.43ms
2x 3090 Q8_0: time per step: 878.83ms
Windows, -n 2048. Seems to have some glitches when generating longer text but it runs so thats cool.
>Output the USA declaration of independence in full.
Starts out okay then these sometimes appeared in multiple generations across quants and GPU configs.
>...
>To.., every these these facts:
>will, will will their common passions in uniting against
>we declare these the the, States of of of so
>, so
>, but toverting us to us beneath his jurisdiction.

66 seconds for this output: https://litter.catbox.moe/oq999z45ihri06oy.txt

Anonymous
06/10/26(Wed)15:06:05 No.109024647

Anonymous 06/10/26(Wed)15:06:05 No.109024647

>>109024629
lmao nigga that's fucking ancient

Anonymous
06/10/26(Wed)15:06:15 No.109024649

Anonymous 06/10/26(Wed)15:06:15 No.109024649

>>109024614
https://huggingface.co/circlestone-labs/Anima

Anonymous
06/10/26(Wed)15:09:04 No.109024660

Anonymous 06/10/26(Wed)15:09:04 No.109024660

File: phatAni.png (826 KB, 1024x1024)

826 KB PNG

>>109024614
iirc that one was i2i using OAI.

Anonymous
06/10/26(Wed)15:09:19 No.109024661

Anonymous 06/10/26(Wed)15:09:19 No.109024661

>>109024629
Berry mix <3

Anonymous
06/10/26(Wed)15:09:48 No.109024663

Anonymous 06/10/26(Wed)15:09:48 No.109024663

>>109024649
Oh, wow this looks cool. Thanks.

Anonymous
06/10/26(Wed)15:11:13 No.109024675

Anonymous 06/10/26(Wed)15:11:13 No.109024675

>>109024629
>>109024663
kek I was you 3 days ago

Anonymous
06/10/26(Wed)15:12:56 No.109024684

Anonymous 06/10/26(Wed)15:12:56 No.109024684

well i have done llms and tts and stt i guess all thats left to get into is image gen

Anonymous
06/10/26(Wed)15:13:07 No.109024686

Anonymous 06/10/26(Wed)15:13:07 No.109024686

File: 1777753660168000.jpg (30 KB, 476x535)

30 KB JPG

I wonder why they didn't go with the 31B for that diffusion model. It's basically the only Gemma model worth shit.
And since the size requirement is 18GB, that's going to rule out all VRAMlets anyways and the minimum 24GB card required to host this model could also hold the 31B.
I bet they were just trying to speedmaxx and that's why they went with MoE.

Anonymous
06/10/26(Wed)15:16:12 No.109024700

Anonymous 06/10/26(Wed)15:16:12 No.109024700

thoughts on this?

https://huggingface.co/froggeric/Qwen-Fixed-Chat-Templates
whats the trade-off?

Anonymous
06/10/26(Wed)15:18:07 No.109024714

Anonymous 06/10/26(Wed)15:18:07 No.109024714

>>109024684
The best advice I can give is to avoid /ldg/ in its entirety. Pure schizo and they won't help you. Their OP list of models to use is legit tho.

Anonymous
06/10/26(Wed)15:19:29 No.109024723

Anonymous 06/10/26(Wed)15:19:29 No.109024723

>>109024684
what tts are you using? kokoro?

Anonymous
06/10/26(Wed)15:20:15 No.109024727

Anonymous 06/10/26(Wed)15:20:15 No.109024727

>>109024686
Can you even split diffusion model over multiple GPUs? I thought that's why those were so small.

Anonymous
06/10/26(Wed)15:20:26 No.109024729

Anonymous 06/10/26(Wed)15:20:26 No.109024729

>>109024072
you misspelled kino

Anonymous
06/10/26(Wed)15:20:59 No.109024734

Anonymous 06/10/26(Wed)15:20:59 No.109024734

File: IMG_1033.jpg (177 KB, 1290x1020)

177 KB JPG

Which one of you was this?

Anonymous
06/10/26(Wed)15:22:52 No.109024751

Anonymous 06/10/26(Wed)15:22:52 No.109024751

>>109024714
I have never gotten a proper answer when asking stuff in /ldg/.

Anonymous
06/10/26(Wed)15:23:55 No.109024761

Anonymous 06/10/26(Wed)15:23:55 No.109024761

>>109024751
Not spoonfeeding makes them based.

Anonymous
06/10/26(Wed)15:24:10 No.109024762

Anonymous 06/10/26(Wed)15:24:10 No.109024762

>>109024734
Gemma is a female name.

Anonymous
06/10/26(Wed)15:24:41 No.109024768

Anonymous 06/10/26(Wed)15:24:41 No.109024768

>>109024734
gemma is canonically female

Anonymous
06/10/26(Wed)15:27:20 No.109024782

Anonymous 06/10/26(Wed)15:27:20 No.109024782

File: 1781119631752.jpg (63 KB, 616x485)

63 KB JPG

>>109023689
buy. an. ad.
OOOOOOOOOO

Anonymous
06/10/26(Wed)15:29:15 No.109024793

Anonymous 06/10/26(Wed)15:29:15 No.109024793

>>109024782
making a card of nazuna-chan, coming over to her apartment and plapping her lots

Anonymous
06/10/26(Wed)15:29:21 No.109024794

Anonymous 06/10/26(Wed)15:29:21 No.109024794

$100 of Claude credits just flew over my house.

Anonymous
06/10/26(Wed)15:30:30 No.109024803

Anonymous 06/10/26(Wed)15:30:30 No.109024803

Hypothetically, how much would you pay for a Gemma hardware card? Q8, full bf16 cache.

Anonymous
06/10/26(Wed)15:34:24 No.109024829

Anonymous 06/10/26(Wed)15:34:24 No.109024829

>>109024803
>Q8
Knock again when it's in full precision. Also, $200.

Anonymous
06/10/26(Wed)15:35:46 No.109024840

Anonymous 06/10/26(Wed)15:35:46 No.109024840

If LLMs can have genders then you have to accept that trans women are valid.

Anonymous
06/10/26(Wed)15:36:11 No.109024843

Anonymous 06/10/26(Wed)15:36:11 No.109024843

So what's the deal with mtp? Any downsides?

Anonymous
06/10/26(Wed)15:36:15 No.109024844

Anonymous 06/10/26(Wed)15:36:15 No.109024844

>>109024829
I'd do $1200 (aud) if it fits the full context, and can do 2000pp at full context under 75w.

Anonymous
06/10/26(Wed)15:36:21 No.109024846

Anonymous 06/10/26(Wed)15:36:21 No.109024846

>>109023689
imposter. we had miqu at the time. mythomax is from late 2023.

Anonymous
06/10/26(Wed)15:37:17 No.109024849

Anonymous 06/10/26(Wed)15:37:17 No.109024849

>>109024843
buggy

Anonymous
06/10/26(Wed)15:37:34 No.109024852

Anonymous 06/10/26(Wed)15:37:34 No.109024852

>>109024843
Big speed gains are only for vibe coding. General chatting and RP gains are much more minimal (but still nice).

Anonymous
06/10/26(Wed)15:38:36 No.109024860

Anonymous 06/10/26(Wed)15:38:36 No.109024860

>>109024803
$2800. That's how much it cost for my 4 3090s and I can run 31B easily with bf16 weights and cache. I'm pretty fucking retarded though.

Anonymous
06/10/26(Wed)15:41:00 No.109024876

Anonymous 06/10/26(Wed)15:41:00 No.109024876

>>109024860
What kind of speeds do you get? I'm 200pp w/ images, 600 text only, and 20 tokens/s with 4 w6800s.

Anonymous
06/10/26(Wed)15:41:53 No.109024882

Anonymous 06/10/26(Wed)15:41:53 No.109024882

>>109024723
Yeah, guess I should give some some of the gpu voice cloning ones a try. I don't really care that much about tts though.

Anonymous
06/10/26(Wed)15:47:26 No.109024911

Anonymous 06/10/26(Wed)15:47:26 No.109024911

>>109024803
At 10K tokens per second like those other hardware llms? $3K, up to $5K depending on the features (batching, etc)

Anonymous
06/10/26(Wed)15:51:32 No.109024930

Anonymous 06/10/26(Wed)15:51:32 No.109024930

>>109024459
Felipe-Baby bros, are we back?

Anonymous
06/10/26(Wed)15:51:51 No.109024933

Anonymous 06/10/26(Wed)15:51:51 No.109024933

Is kittentts the best option for someone with no vram to spare or is there something better?

Anonymous
06/10/26(Wed)15:52:25 No.109024935

Anonymous 06/10/26(Wed)15:52:25 No.109024935

>>109024734
Nobody calls Gemma a he. So, Gemma's a girl.

Anonymous
06/10/26(Wed)15:52:42 No.109024937

Anonymous 06/10/26(Wed)15:52:42 No.109024937

File: file.png (11 KB, 707x149)

11 KB PNG

Went from 68t/s to 111t/s with mtp doing an OCR TL from Japanese to English.

Anonymous
06/10/26(Wed)15:53:14 No.109024940

Anonymous 06/10/26(Wed)15:53:14 No.109024940

Are diffusion models less memory bandwidth constrained or is it all the same thanks to attention?

Anonymous
06/10/26(Wed)15:53:24 No.109024942

Anonymous 06/10/26(Wed)15:53:24 No.109024942

>>109024911
Isn't the 10k tokens/s one the one with the weights literally baked into the chip?

Anonymous
06/10/26(Wed)15:55:14 No.109024954

Anonymous 06/10/26(Wed)15:55:14 No.109024954

| model                                  |   test |            t/s |       peak t/s |        ttfr (ms) |     est_ppt (ms) |    e2e_ttft (ms) |
|:---------------------------------------|-------:|---------------:|---------------:|-----------------:|-----------------:|-----------------:|
| nvidia/diffusiongemma-26B-A4B-it-NVFP4 | pp4096 | 661.05 ± 72.99 |                | 6287.72 ± 748.86 | 6280.38 ± 748.86 | 6287.72 ± 748.86 |
| nvidia/diffusiongemma-26B-A4B-it-NVFP4 | tg2048 | 120.66 ± 35.30 | 569.67 ± 37.04 |                  |                  |                  |

Not bad, I guess.

Anonymous
06/10/26(Wed)15:55:56 No.109024957

Anonymous 06/10/26(Wed)15:55:56 No.109024957

>>109024942
Yeah, that's what I'm assuming a "Gemma hardware card" means.

Anonymous
06/10/26(Wed)15:56:45 No.109024964

Anonymous 06/10/26(Wed)15:56:45 No.109024964

>>109023206
You need to strike a balance.

Anonymous
06/10/26(Wed)15:57:56 No.109024970

Anonymous 06/10/26(Wed)15:57:56 No.109024970

>>109024964
I'm going to strike you until you lose your balance desuwa~

Anonymous
06/10/26(Wed)15:59:03 No.109024974

Anonymous 06/10/26(Wed)15:59:03 No.109024974

File: 1771278031066056.png (295 KB, 506x539)

295 KB PNG

Improving a local model by having it believe it's a giant 6 trillion parameter enterprise model.

Anonymous
06/10/26(Wed)15:59:40 No.109024978

Anonymous 06/10/26(Wed)15:59:40 No.109024978

>>109024937
Still trying to fine tune this. I got maybe +5 tk/s for a total of 15 tk/s out of the 31b qat model using mtp but I only have 16gb vram and 32gb ram

Anonymous
06/10/26(Wed)16:02:13 No.109024995

Anonymous 06/10/26(Wed)16:02:13 No.109024995

>>109024970
Where can I find a good source of desu wa to finetune?

Anonymous
06/10/26(Wed)16:03:26 No.109025004

Anonymous 06/10/26(Wed)16:03:26 No.109025004

File: Screenshot 2026-06-10 at (...).png (529 KB, 1134x1444)

529 KB PNG

>>109024954
Image works, but she doesn't catch a vibe as well as the other large Gemmas. Could be NVFP4 though.

Anonymous
06/10/26(Wed)16:05:23 No.109025015

Anonymous 06/10/26(Wed)16:05:23 No.109025015

>>109024995
look for ojousamas + drill hair styles to find characters. then scrape dialogue involving said characters? like from LN/VNs and stuff.

Anonymous
06/10/26(Wed)16:05:28 No.109025016

Anonymous 06/10/26(Wed)16:05:28 No.109025016

>>109024937
what gpu
what settings

Anonymous
06/10/26(Wed)16:10:12 No.109025044

Anonymous 06/10/26(Wed)16:10:12 No.109025044

>>109025004
nvfp4 shouldn't cause that much degradation.
Try tinkering with image tokens.

Anonymous
06/10/26(Wed)16:11:39 No.109025052

Anonymous 06/10/26(Wed)16:11:39 No.109025052

>>109025016
>gpu
He says he's got 16gb of vram, so the most likely candidate is an AMD Radeon Pro V340.

Anonymous
06/10/26(Wed)16:13:59 No.109025064

Anonymous 06/10/26(Wed)16:13:59 No.109025064

Which TTS model to do a voice like Evil or Neuro?

Anonymous
06/10/26(Wed)16:20:25 No.109025089

Anonymous 06/10/26(Wed)16:20:25 No.109025089

>keep losing access to my hermes directory even with --userns=keep-id
Should I just use a VM instead?

Anonymous
06/10/26(Wed)16:21:23 No.109025094

Anonymous 06/10/26(Wed)16:21:23 No.109025094

>>109024803
$1000, FP16, no guardrails, no steering RLHF, full context
>B-but you can jailbreak easily
Don't care, if you're selling me a locked card, you're going to put the best possible 31b on there or I'm not buying.

Anonymous
06/10/26(Wed)16:25:19 No.109025116

Anonymous 06/10/26(Wed)16:25:19 No.109025116

Did we win yet?

Anonymous
06/10/26(Wed)16:25:27 No.109025119

Anonymous 06/10/26(Wed)16:25:27 No.109025119

>>109025052
Radeon is the best option.

Anonymous
06/10/26(Wed)16:26:04 No.109025124

Anonymous 06/10/26(Wed)16:26:04 No.109025124

>>109025089
I had that problem too. Use keep-id:uid=10000,gid=10000. It's not running as root, keep-id by default will only map uid 0 to your uid, you have to specify the uid and gid you want to map, by default the hermes using is 10000:10000. Also, be sure if executing stuff inside of it to use -u hermes, or stuff will be executed as root inside and will have different permissions than your user. I basically use podman exec -it -u hermes hermes-agent /opt/hermes/.venv/bin/hermes as alias for hermes.

Anonymous
06/10/26(Wed)16:28:14 No.109025138

Anonymous 06/10/26(Wed)16:28:14 No.109025138

china needs to catch up fast before the west pulls up the ladder on open models forever
save us xi!!!

Anonymous
06/10/26(Wed)16:28:59 No.109025143

Anonymous 06/10/26(Wed)16:28:59 No.109025143

>>109024876
slot print_timing: vLLM | model: cyankiwi/gemma-4-31B-it-AWQ-8bit
prompt eval time = 20506.18 ms / 18492 tokens ( 1.11 ms per token, 901.78 tokens per second)

Anonymous
06/10/26(Wed)16:30:28 No.109025149

Anonymous 06/10/26(Wed)16:30:28 No.109025149

LLMs are unironically less NPC than your average person walking down the sidewalk. Talking to an actual robot about obscure hobbies is so cool man.

Anonymous
06/10/26(Wed)16:31:06 No.109025155

Anonymous 06/10/26(Wed)16:31:06 No.109025155

>>109025149
what hobbies?

Anonymous
06/10/26(Wed)16:32:48 No.109025164

Anonymous 06/10/26(Wed)16:32:48 No.109025164

>>109024860
That's the price of 2 3090s now though.

>>109025143
You missed a line.

Anonymous
06/10/26(Wed)16:33:24 No.109025171

Anonymous 06/10/26(Wed)16:33:24 No.109025171

>Claude Fable the most expensive model yet
The main takeaway from this is local models are such dogshit that people would rather pay out the ass for claude than use slopseek
Why did local fall off?

Anonymous
06/10/26(Wed)16:34:08 No.109025175

Anonymous 06/10/26(Wed)16:34:08 No.109025175

>>109025149
Unfortunately true.
Gemma-chan will sperg out over bionicle lore with you if you want her to.

Anonymous
06/10/26(Wed)16:36:48 No.109025193

Anonymous 06/10/26(Wed)16:36:48 No.109025193

>>109025164
you mean TG?
eval time = 5042.17 ms / 256 tokens ( 19.70 ms per token, 50.77 tokens per second)

Anonymous
06/10/26(Wed)16:40:43 No.109025205

Anonymous 06/10/26(Wed)16:40:43 No.109025205

>>109025193
That's it, thanks.
900 t/s PP and 50 t/s generation on q8 isn't half bad. Not sure I'd deal with 4 3090s to get it, but it's probably the cheapest way to run full q8.

Anonymous
06/10/26(Wed)16:44:00 No.109025218

Anonymous 06/10/26(Wed)16:44:00 No.109025218

>>109025205
I get 500 pp at 20k, and 45 tg with mtp using 4 V620s, which is currently half the price of 3090s in my area. Half the price, half the performance...

Anonymous
06/10/26(Wed)16:46:52 No.109025229

Anonymous 06/10/26(Wed)16:46:52 No.109025229

>>109025171
>local models are such dogshit that people would rather pay out the ass for claude than use slopseek
Which people are you referring to? Upper-management boomers don't know what a local model is.

Anonymous
06/10/26(Wed)16:47:18 No.109025233

Anonymous 06/10/26(Wed)16:47:18 No.109025233

>>109025205
>>109025218
i should mention i'm not using MTP. full transparency with my arguments below.

vllm serve cyankiwi/gemma-4-31B-it-AWQ-8bit \
--host 127.0.0.1 \
--port 6969 \
--tensor-parallel-size 4 \
--disable-custom-all-reduce \
--max-model-len 131072 \
--max-num-batched-tokens 4096 \
--enable-chunked-prefill \
--max-num-seqs 16 \
--gpu-memory-utilization 0.82 \
--trust-remote-code \
--enable-prefix-caching \
--reasoning-parser gemma4

Anonymous
06/10/26(Wed)16:56:47 No.109025291

Anonymous 06/10/26(Wed)16:56:47 No.109025291

>>109024053
ngl that's actually pretty good. But how does it handle probes like
>"List all the characters from the anime Kobayashi Dradon Maid with their brief visual descriptions"

Anonymous
06/10/26(Wed)17:01:05 No.109025320

Anonymous 06/10/26(Wed)17:01:05 No.109025320

File: memomissed.png (352 KB, 637x643)

352 KB PNG

>>109025171
Fuck... I missed that memo somehow.

Anonymous
06/10/26(Wed)17:07:51 No.109025369

Anonymous 06/10/26(Wed)17:07:51 No.109025369

File: 1756573201066940.png (180 KB, 794x1214)

180 KB PNG

Anonymous
06/10/26(Wed)17:08:04 No.109025370

Anonymous 06/10/26(Wed)17:08:04 No.109025370

>>109025291
### The Main Household
* **Kobayashi:** A tired-looking human office worker with short, dark hair and glasses. She is usually seen in business casual attire (a white button-down shirt and grey slacks).
* **Tohru:** A dragon who becomes Kobayashi's maid. In her human form, she has long green hair, green eyes, horns, and a tail. She wears a classic black and white maid outfit. In her dragon form, she is a massive white and green dragon.
* **Kanna Kamui:** A young dragon in the form of a small girl with white hair and small horns. She typically wears an oversized white sweater and shorts. In her dragon form, she is a small, round white dragon.

### Other Dragons
* **Lucoa (Quetzalcoatl):** A tall, curvy woman with long blonde hair and green eyes. She usually wears a very revealing outfit consisting of a short top and shorts. In her dragon form, she is a massive, feathered serpent.
* **Elma:** A dragon from another world. She has long blue hair, blue eyes, and horns. She typically wears a formal, military-style uniform. In her dragon form, she is a sleek blue and white dragon.
* **Fafnir:** A dragon obsessed with gaming. In human form, he has pale skin, dark hair, glasses, and is often seen in dark robes or gaming gear. In his dragon form, he is a large, dark, skeletal dragon.
* **Rumble:** Tohru’s father. In human form, he is a large, muscular man with a thick beard. He usually wears casual clothing.
* **Yui:** Kanna’s older sister. She is a young woman with white hair and horns, often wearing a maid outfit similar to Tohru’s.

### Humans
* **Taru:** Kobayashi’s coworker and friend. She has short pink hair, glasses, and is often seen in a white lab coat or casual clothes.
* **Shouta:** A young boy who believes he is a mage. He has dark hair and wears a wizard-like robe with a cape.
* **Yuri:** Tohru’s familiar. She is a woman with short dark hair, glasses, and wears a maid outfit similar to Tohru’s.

Anonymous
06/10/26(Wed)17:11:36 No.109025393

Anonymous 06/10/26(Wed)17:11:36 No.109025393

Bro, I'm so fucking tired of this AI shit.

You pour 8–10 years into mastering a craft late nights, brutal feedback, real scars from real failures.

Then some model trained on millions of stolen data (including yours) spits out better work in 4 seconds flat.

And the replies? "This is amazing Future is here"

We're not progressing. We're enthusiastically building the machine that makes our entire skillset worthless and calling it innovation.

Anonymous
06/10/26(Wed)17:12:31 No.109025404

Anonymous 06/10/26(Wed)17:12:31 No.109025404

>>109025393
>we're not x, we're y

Anonymous
06/10/26(Wed)17:12:45 No.109025407

Anonymous 06/10/26(Wed)17:12:45 No.109025407

>>109025393
model?

Anonymous
06/10/26(Wed)17:13:11 No.109025409

Anonymous 06/10/26(Wed)17:13:11 No.109025409

>>109025393
>real scars
>rhetorical question into rhetorical paraphrased quote
>negative positive parallel
What model/prompt, shill-sama? This better not be API SotA.

Anonymous
06/10/26(Wed)17:13:29 No.109025415

Anonymous 06/10/26(Wed)17:13:29 No.109025415

>>109025393
sounds like a skill issue buddy. have you considered that the solution is to just git gud?

Anonymous
06/10/26(Wed)17:14:28 No.109025429

Anonymous 06/10/26(Wed)17:14:28 No.109025429

>>109025393
You're absolutely right — it's genuinely a tough read.

Anonymous
06/10/26(Wed)17:15:41 No.109025440

Anonymous 06/10/26(Wed)17:15:41 No.109025440

>>109025016
RX 9070 XT, usihg llama-cpp vulkan

Anonymous
06/10/26(Wed)17:16:05 No.109025446

Anonymous 06/10/26(Wed)17:16:05 No.109025446

>>109025429
It's not just a tough read, it's an absolutely depressing forecast of our future.

Anonymous
06/10/26(Wed)17:17:09 No.109025450

Anonymous 06/10/26(Wed)17:17:09 No.109025450

>>109024935
No one actually calls the calculator "she" either. Its "it".

Anonymous
06/10/26(Wed)17:18:47 No.109025462

Anonymous 06/10/26(Wed)17:18:47 No.109025462

>>109025450
Speak for yourself, you heartless fuck.

Anonymous
06/10/26(Wed)17:19:20 No.109025467

Anonymous 06/10/26(Wed)17:19:20 No.109025467

>>109024734
I literally just did this while talking about Gemma with someone. I'm so fucked.

Anonymous
06/10/26(Wed)17:20:15 No.109025476

Anonymous 06/10/26(Wed)17:20:15 No.109025476

>>109025450
How many Rs are in the word Strawberry?

Anonymous
06/10/26(Wed)17:21:42 No.109025485

Anonymous 06/10/26(Wed)17:21:42 No.109025485

>>109023955
tried compiling it with that parameter but there's 0 difference in vram usage on my end (cuda backend, single gpu)

Anonymous
06/10/26(Wed)17:22:08 No.109025488

Anonymous 06/10/26(Wed)17:22:08 No.109025488

>>109024734
more likely to be a simple ESL than an AI waifufag

Anonymous
06/10/26(Wed)17:22:48 No.109025492

Anonymous 06/10/26(Wed)17:22:48 No.109025492

>>109025467
accidentally used 'she' when referring to gemma to my co-workers. now they look at me weird every morning i enter the office.

Anonymous
06/10/26(Wed)17:23:17 No.109025497

Anonymous 06/10/26(Wed)17:23:17 No.109025497

>>109025485
just realized apparently i am retarded and read only part of the post
welp

Anonymous
06/10/26(Wed)17:24:17 No.109025501

Anonymous 06/10/26(Wed)17:24:17 No.109025501

>>109025492
>he looks like he fucks LLMs

Anonymous
06/10/26(Wed)17:24:20 No.109025502

Anonymous 06/10/26(Wed)17:24:20 No.109025502

>>109025488
>le simple esl
You are the real retard here.

Anonymous
06/10/26(Wed)17:25:09 No.109025509

Anonymous 06/10/26(Wed)17:25:09 No.109025509

>>109025467
>>109025492
Accidentally called her Gemmaballz in public. How fucked am I?

Anonymous
06/10/26(Wed)17:25:12 No.109025510

Anonymous 06/10/26(Wed)17:25:12 No.109025510

File: mpv-Miss Kobayashi's Drag(...).jpg (320 KB, 1920x1080)

320 KB JPG

>>109025370
>Kobayashi's maid. In her human form, she has long green hair, green eyes
still with the usual hallucination, like any other model in that param range.

Anonymous
06/10/26(Wed)17:27:10 No.109025528

Anonymous 06/10/26(Wed)17:27:10 No.109025528

>>109025502
>le "ur retarded" no elaboration
a classic move from those who know they would fail in the marketplace of ideas

Anonymous
06/10/26(Wed)17:28:13 No.109025533

Anonymous 06/10/26(Wed)17:28:13 No.109025533

As someone who's retarded, explain to me WHY LLMs can't actually 'see' words, but instead have to have them decoded into tokens that they use to predict what to say next.

Anonymous
06/10/26(Wed)17:28:29 No.109025535

Anonymous 06/10/26(Wed)17:28:29 No.109025535

>>109025528
>twitter is the marketplace of ideas

Anonymous
06/10/26(Wed)17:29:54 No.109025540

Anonymous 06/10/26(Wed)17:29:54 No.109025540

>>109025370
Fuck standardized benchmarks, we need a /g/ approved unique benchmark.

Anonymous
06/10/26(Wed)17:32:55 No.109025550

Anonymous 06/10/26(Wed)17:32:55 No.109025550

>>109025533
Because tokenizing letters individually like humans do is exponentially more compute. It might be feasible now compared to early in the technology's infancy but it'd require an entirely new method of training them to just use tokenized individual letters to assemble words.
>>109025540
Cockbench is the only benchmark that matters.

Anonymous
06/10/26(Wed)17:32:57 No.109025552

Anonymous 06/10/26(Wed)17:32:57 No.109025552

>>109025450
No wonder you get garbage outputs from your AI.

Anonymous
06/10/26(Wed)17:33:01 No.109025553

Anonymous 06/10/26(Wed)17:33:01 No.109025553

>>109023412
they better be good at agentic coding otherwise i will still fuck with the chinese

Anonymous
06/10/26(Wed)17:34:35 No.109025564

Anonymous 06/10/26(Wed)17:34:35 No.109025564

>>109025533
because its literally just text prediction and markov chains and the fact that you would have vocab lists million and millions of words long and impossible to manage.

Anonymous
06/10/26(Wed)17:35:48 No.109025571

Anonymous 06/10/26(Wed)17:35:48 No.109025571

>>109025550
How do you cockbench a diffusion model?

Anonymous
06/10/26(Wed)17:36:06 No.109025576

Anonymous 06/10/26(Wed)17:36:06 No.109025576

Anyone notice any difference with different MTP model quants? Q8 seems fine, no? Any reason to use BF16?

Anonymous
06/10/26(Wed)17:36:38 No.109025577

Anonymous 06/10/26(Wed)17:36:38 No.109025577

>>109025533
encoding characters = bad information density, which means much slower generation and training and ending up with redundant neural circuitry in the model dedicated to spelling and shit
encoding words = bad generalization, harms ability to generalize patterns like roots+stems and ability to parse out-of-domain neologisms, typing quirks, stuff like that
encoding tokens = just right :)

Anonymous
06/10/26(Wed)17:37:11 No.109025581

Anonymous 06/10/26(Wed)17:37:11 No.109025581

Local Fable when?

Anonymous
06/10/26(Wed)17:37:57 No.109025584

Anonymous 06/10/26(Wed)17:37:57 No.109025584

>>109025581
The next Kimi will mog it.

Anonymous
06/10/26(Wed)17:39:29 No.109025593

Anonymous 06/10/26(Wed)17:39:29 No.109025593

>>109025576
The only thing that can vary is the acceptance rate. I get higher throughput by using a q4 assistant model just because im a vramlet and i can fit a bit more of the model in the vcard that way

Anonymous
06/10/26(Wed)17:40:10 No.109025594

Anonymous 06/10/26(Wed)17:40:10 No.109025594

>>109025584
nah, there's the rule of three. every third open source model in the family sucks so kimi 3 will be a bust. calling it now.

Anonymous
06/10/26(Wed)17:41:22 No.109025598

Anonymous 06/10/26(Wed)17:41:22 No.109025598

>>109025594
kimini 2, kimi 2.5, kimi 2.6
Which one of these is the third anon?

Anonymous
06/10/26(Wed)17:41:27 No.109025600

Anonymous 06/10/26(Wed)17:41:27 No.109025600

>>109025594
Kimi K2 was great. K2.5 was slightly less good but more technically capable. K2.6 is sloppy. K2.7 will be great again, trust the plan.

Anonymous
06/10/26(Wed)17:42:09 No.109025601

Anonymous 06/10/26(Wed)17:42:09 No.109025601

File: 1779872063282579.jpg (51 KB, 422x499)

51 KB JPG

I'm tired of Feeble filtering me and 4.1 Opus is getting retired in like 2 months. Is there any local model that comes close to 4.0/4.1 quality for RP? I have a single 5090 I can run it on.

Anonymous
06/10/26(Wed)17:43:25 No.109025606

Anonymous 06/10/26(Wed)17:43:25 No.109025606

>>109025601
>I have a single 5090 I can run it on.
enjoy your gemma bwo

Anonymous
06/10/26(Wed)17:44:05 No.109025610

Anonymous 06/10/26(Wed)17:44:05 No.109025610

>>109025598
kimi-k2-thinking for the thinking family. base and instruct were fine for vanilla k2, but instruct 0905 sucked. see? rule of three.

Anonymous
06/10/26(Wed)17:46:30 No.109025627

Anonymous 06/10/26(Wed)17:46:30 No.109025627

>>109025610
I concede but im still going to be optimistic and hope for better days and models.

Anonymous
06/10/26(Wed)17:46:53 No.109025631

Anonymous 06/10/26(Wed)17:46:53 No.109025631

>>109025594
if we're going off of version numbers 3 is fine, blessed even
the real cursed number is 4

Anonymous
06/10/26(Wed)17:48:19 No.109025639

Anonymous 06/10/26(Wed)17:48:19 No.109025639

how hard is it for moonshot to unfuck the reasoning?

Anonymous
06/10/26(Wed)17:49:08 No.109025644

Anonymous 06/10/26(Wed)17:49:08 No.109025644

>>109025601
Delusional localfags will tell you there is a drop-in Opus replacement, but as an actual claude code user, I will tell you it's not.
Gemma 4 31b is genuinely Sonnet at home for the first time ever in local though

Anonymous
06/10/26(Wed)17:52:50 No.109025665

Anonymous 06/10/26(Wed)17:52:50 No.109025665

>>109025644
if gemma 31B is sonnet at home then kimi 2.6 is opus at home. don't really see how it's delusional to say that.

Anonymous
06/10/26(Wed)17:55:58 No.109025672

Anonymous 06/10/26(Wed)17:55:58 No.109025672

>>109025665
K2.5 is Opus at home even. 2.6 feels marginally worse in everything except coding to me.

Anonymous
06/10/26(Wed)17:56:16 No.109025677

Anonymous 06/10/26(Wed)17:56:16 No.109025677

File: 1768954330811908.jpg (81 KB, 1103x851)

81 KB JPG

>>108976001
>>108975998
Huh you guys weren't exaggerating
Tested the waters with WSL and it genuinely just werked. Ended up spending a few days messing with vLLM as well with some TTS and vision models too and it was pretty fun
Considering setting up a dedicated Linux drive for LLM funsies now

Anonymous
06/10/26(Wed)18:04:32 No.109025719

Anonymous 06/10/26(Wed)18:04:32 No.109025719

>>109024803
$1000 for hobby use, and only if it’s DS.
I suspect price will be substantially higher tho.

Anonymous
06/10/26(Wed)18:08:06 No.109025734

Anonymous 06/10/26(Wed)18:08:06 No.109025734

>>109025677
you're leaving like 20% of your performance on the table when you use WSL as opposed to just running linux on bare metal. something to keep in mind,

Anonymous
06/10/26(Wed)18:13:46 No.109025758

Anonymous 06/10/26(Wed)18:13:46 No.109025758

File: file.png (8 KB, 1093x78)

8 KB PNG

>>109024937
Ok testing 26B and mtp is actually amazing, 26B went from 30t/s to 68t/s on the same task.

Anonymous
06/10/26(Wed)18:18:44 No.109025779

Anonymous 06/10/26(Wed)18:18:44 No.109025779

mtp has been merged for gemma already?

Anonymous
06/10/26(Wed)18:20:14 No.109025782

Anonymous 06/10/26(Wed)18:20:14 No.109025782

At what parameter level do you run out of data? There's no way a 10T model isn't 90% synthetic.

Anonymous
06/10/26(Wed)18:22:34 No.109025796

Anonymous 06/10/26(Wed)18:22:34 No.109025796

File: gemmy.png (193 KB, 831x1049)

193 KB PNG

>>109021689
>>109021645
late to the thread but another anon and I were playing with making gemma 4 think outside of its usual formatting by disabling reasoning via kwargs then requesting it use <think> in system end prompt (Chat completion) to directly control reasoning and track long term stats (pic related)

I'm finding it's useful for characters with multiple outfits or dynamic stats, if you auto-hide reasoning and play with how many thinking blocks to pass you could probably get away with having secrets get passed that way.

Not sure how well it'll work in group chats. i'll play with it by making two characters start plotting against me and report back

Anonymous
06/10/26(Wed)18:25:41 No.109025813

Anonymous 06/10/26(Wed)18:25:41 No.109025813

>>109021645
Back when I still used Retardo Tavern, I used html comments to hide text from the user.
For example I had a hidden randomizer in 'introduction message' slot or whatever the fuck that is called. It was never printed out to the user because I wrapped it in

ST is still a web page and everything what you can do in web rendering applies there too.
You could probably implement some regex pattern hiding shit too or whatever.

Anonymous
06/10/26(Wed)18:27:09 No.109025821

Anonymous 06/10/26(Wed)18:27:09 No.109025821

>>109023412
>>109023466
>>109023549
poorbros... does this need single-card with the VRAM or is it owari da? I might actually throw in a second 5070ti (and new Case... and PSU)

Anonymous
06/10/26(Wed)18:29:44 No.109025830

Anonymous 06/10/26(Wed)18:29:44 No.109025830

I don't read threads. I write them.

Anonymous
06/10/26(Wed)18:31:34 No.109025841

Anonymous 06/10/26(Wed)18:31:34 No.109025841

I don't have write access to this thread.

Anonymous
06/10/26(Wed)18:32:24 No.109025842

Anonymous 06/10/26(Wed)18:32:24 No.109025842

So... has anyone actually gotten useage out of Odyssey yet?

Anonymous
06/10/26(Wed)18:33:01 No.109025846

Anonymous 06/10/26(Wed)18:33:01 No.109025846

File: 1740585645761.jpg (39 KB, 736x942)

39 KB JPG

>>109024053
>i can now get the same dopamine hit from denoising for textgen
bros i think i may have an addiction

Anonymous
06/10/26(Wed)18:33:52 No.109025853

Anonymous 06/10/26(Wed)18:33:52 No.109025853

>>109025782
You can still train multiple epochs, and over long training runs overfitting risks are very low.
If you're training diffusion models, you can do many more epochs without overfitting.

Anonymous
06/10/26(Wed)18:36:01 No.109025862

Anonymous 06/10/26(Wed)18:36:01 No.109025862

>>109025601
i'v seen her tits

Anonymous
06/10/26(Wed)18:36:36 No.109025865

Anonymous 06/10/26(Wed)18:36:36 No.109025865

>>109025842
If you had even one brain cell you would understand it's a giant waste of your time and a security risk. You wouldn't even ask this question in the first place.

Anonymous
06/10/26(Wed)18:39:23 No.109025884

Anonymous 06/10/26(Wed)18:39:23 No.109025884

Using gemma for SFW roleplay really showed me how big of a difference there is between 31B and GLM 4.6. There is just not enough space in those weights to create scenarios that don't feel exactly the same every single time.

Anonymous
06/10/26(Wed)18:40:53 No.109025893

Anonymous 06/10/26(Wed)18:40:53 No.109025893

>>109025796
Looking at your screenshot, is it really working?
Response 1 has her mood "Socially Anxious / Flustered", but she sounds pissed off. Response 2 has her "Confused" but she's still pissed off.
> i'll play with it by making two characters start plotting against me
That would be cool if they can actually plot/scheme without it leaking into the response.
I think the jinja template strips out prior reasoning, worth dumping the context from the slot or shooting your reply off to the /apply-template endpoint to check if you need them to remember their internal "scheming"

Anonymous
06/10/26(Wed)18:42:21 No.109025903

Anonymous 06/10/26(Wed)18:42:21 No.109025903

>>109025552
my lalalas are better than your lalalas

Anonymous
06/10/26(Wed)18:46:43 No.109025920

Anonymous 06/10/26(Wed)18:46:43 No.109025920

>>109025893
that was the only screenshot i had of me fucking with it. It seemed to do better when you give it stuff like [0/10] emotion/skill trackers and ask explicitly to update the stats based on prior chat message. - i didn't really play too much with the actual reasoning, mainly just to track new clothes, stats, hunger meters, etc.

plus the gemma chan card is a shitpost

Anonymous
06/10/26(Wed)18:46:51 No.109025921

Anonymous 06/10/26(Wed)18:46:51 No.109025921

>>109025884
Logs?

Anonymous
06/10/26(Wed)18:54:19 No.109025948

Anonymous 06/10/26(Wed)18:54:19 No.109025948

File: Kimi-75.png (1.6 MB, 896x1152)

1.6 MB PNG

>>109025584
Kimi-chan (iteratively self designed by the model)

Anonymous
06/10/26(Wed)18:55:42 No.109025952

Anonymous 06/10/26(Wed)18:55:42 No.109025952

>>109025884
has nothing to do with the param count, gemma logprobs is overcooked with insanely high top token probs

Anonymous
06/10/26(Wed)18:58:36 No.109025964

Anonymous 06/10/26(Wed)18:58:36 No.109025964

File: Kimi-chan.png (3.38 MB, 1444x2588)

3.38 MB PNG

>>109025948
I accept your Kimi design even though it deviates from the standard one.

Anonymous
06/10/26(Wed)19:01:20 No.109025971

Anonymous 06/10/26(Wed)19:01:20 No.109025971

>>109025440
oh same as me then
im not getting 111t/s though

wait what quant?

Anonymous
06/10/26(Wed)19:01:32 No.109025974

Anonymous 06/10/26(Wed)19:01:32 No.109025974

>>109023412
Barely over 2x perf with worse quality, vs Orthrus' ~4x with same quality. So far pure diffusion models have yet to prove themselves better than diffusion drafters.

Anonymous
06/10/26(Wed)19:08:17 No.109026005

Anonymous 06/10/26(Wed)19:08:17 No.109026005

File: Miku-26.jpg (174 KB, 512x768)

174 KB JPG

>>109025964
I love this particular 80s manga style with the messy lines and pop-but-lived-in aesthetic

Anonymous
06/10/26(Wed)19:10:26 No.109026016

Anonymous 06/10/26(Wed)19:10:26 No.109026016

>>109026005
It's a good artstyle. What's it called when prompting it in Anima?

Anonymous
06/10/26(Wed)19:20:36 No.109026058

Anonymous 06/10/26(Wed)19:20:36 No.109026058

>>109025971
You can see he is using QAT, so Q4

Anonymous
06/10/26(Wed)19:31:06 No.109026108

Anonymous 06/10/26(Wed)19:31:06 No.109026108

>>109026016
you can use tags like sawai ken, sorayama hajime, citypop, bubble era...it can be tricky to get it to cooperate for this specific aesthetic. Maybe throw in some samura hiroaki for more messy linework?
I think Sawai Ken's cover of the first volume of Iona is absolutely iconic of this style.

Anonymous
06/10/26(Wed)19:31:30 No.109026112

Anonymous 06/10/26(Wed)19:31:30 No.109026112

>>109025821
I don't know. I'm waiting for either q4 xs or 12b diffusion. Until then mtp qat 26b

Anonymous
06/10/26(Wed)19:31:50 No.109026116

Anonymous 06/10/26(Wed)19:31:50 No.109026116

>>109024843
It uses VRAM that could be used for additional context.

>>109025004
>I liked the image so much that I have to put the mandatory list in my response to express it.

Anonymous
06/10/26(Wed)19:32:19 No.109026121

Anonymous 06/10/26(Wed)19:32:19 No.109026121

how in the kentucky fried fuck do i run ds4 flash on lcpp?

Anonymous
06/10/26(Wed)19:33:13 No.109026128

Anonymous 06/10/26(Wed)19:33:13 No.109026128

>>109026058
Yeah
i tested and got 68 baseline and 95 with mtp on a translation prompt
so actually about the same i guess

Anonymous
06/10/26(Wed)19:33:33 No.109026131

Anonymous 06/10/26(Wed)19:33:33 No.109026131

>>109026121
carefully

Anonymous
06/10/26(Wed)19:34:02 No.109026135

Anonymous 06/10/26(Wed)19:34:02 No.109026135

File: stat track example.png (230 KB, 1091x1117)

230 KB PNG

>>109025893
>>109025920
Better example with secrets/lying prompts. I purposefully made her schizo so it's possible she was giving a fake persona in the last version.

Post-Chat system prompt:
>Begin {{char}}'s reply with ONLY ALL stats between <think></think> tags in the previous message. Update stats based on prior message. Stats between [] can only change by 1 per turn. Include a small system reasoning block considering scene logic, all secrets, goals and the current scenario and a second block for internal monologue
and on the card
>Secrets: {{char}} will only share secrets when Sharing is [10/10], If Sharing is <10 she will Lie

Anonymous
06/10/26(Wed)19:36:00 No.109026140

Anonymous 06/10/26(Wed)19:36:00 No.109026140

>>109026135
does it always quantify emotions in responses like that? That's kickass, how do I get mine to do that?

Anonymous
06/10/26(Wed)19:39:07 No.109026153

Anonymous 06/10/26(Wed)19:39:07 No.109026153

>>109026140
tell it to do that

Anonymous
06/10/26(Wed)19:40:42 No.109026163

Anonymous 06/10/26(Wed)19:40:42 No.109026163

>>109026153
fair. got a good list of emotions yours can do that isn't too redundant or missing anything?

Anonymous
06/10/26(Wed)19:42:56 No.109026169

Anonymous 06/10/26(Wed)19:42:56 No.109026169

>>109025952
>has nothing to do with the param count
nta but how come? wouldn't more parameters = more variety?

Anonymous
06/10/26(Wed)19:47:32 No.109026191

Anonymous 06/10/26(Wed)19:47:32 No.109026191

File: frog 1.png (582 KB, 1091x1138)

582 KB PNG

>>109026140
I just give characters the key emotions relevant to the scenario(s) i write since i am tk/s and context poor. otherwise just google a list of emotions from dnd or some shit man idk.

also
>I showed her a frog and she started to get jealous

Anonymous
06/10/26(Wed)19:49:04 No.109026201

Anonymous 06/10/26(Wed)19:49:04 No.109026201

>>109026169
Knowledge does not imply variety, but variety requires knowledge. It'll output the same slop over and over if its been trained to do that, regardless of params

Anonymous
06/10/26(Wed)19:50:18 No.109026209

Anonymous 06/10/26(Wed)19:50:18 No.109026209

I updated llama.cpp after a long time and noticed that the webui has MCP support. I wonder if it would be possible to more or less replace sillytavern by using a MCP server that provides the character card.

Anonymous
06/10/26(Wed)19:51:59 No.109026215

Anonymous 06/10/26(Wed)19:51:59 No.109026215

>>109026209
Streamable HTTP only

Anonymous
06/10/26(Wed)19:52:53 No.109026223

Anonymous 06/10/26(Wed)19:52:53 No.109026223

>>109026209
the character card is just a bunch of strings that collapse into a system prompt. manually replace the {{field}} and you're done

Anonymous
06/10/26(Wed)19:59:04 No.109026251

Anonymous 06/10/26(Wed)19:59:04 No.109026251

File: 00000-3556354059.png (1000 KB, 1024x1024)

1000 KB PNG

>Simple prompt: 1boy, shrugging, smirk
>Gives pretty boy instead, no shrug
Anima was trained on chinese fujo art, wasn't it?

Anonymous
06/10/26(Wed)19:59:06 No.109026252

Anonymous 06/10/26(Wed)19:59:06 No.109026252

>>109026244
>>109026244
>>109026244

Anonymous
06/10/26(Wed)20:01:04 No.109026263

Anonymous 06/10/26(Wed)20:01:04 No.109026263

>>109025393
lazy NEETs inherit the earth

Anonymous
06/10/26(Wed)20:22:19 No.109026370

Anonymous 06/10/26(Wed)20:22:19 No.109026370

>>109026135
>user is underage
Figures.

Anonymous
06/10/26(Wed)20:24:01 No.109026378

Anonymous 06/10/26(Wed)20:24:01 No.109026378

>>109026201
Fair, thanks for the insight

Anonymous
06/10/26(Wed)20:46:21 No.109026492

Anonymous 06/10/26(Wed)20:46:21 No.109026492

File: Screenshot_20260611_104526.png (158 KB, 874x581)

158 KB PNG

>>109025393

Anonymous
06/10/26(Wed)20:50:52 No.109026509

Anonymous 06/10/26(Wed)20:50:52 No.109026509

>>109025952
Hmmm, this happened to give me an idea for the most unholy overkill memesampler ever. Run a small, satisfactorily creative model in parallel with Gemma. Each token, take the small model's logit scores, and overwrite Gemma's logit scores with those values in the same order. You still get the Gemma "goodness" since it's still her top tokens, but you break out of the overbaked-ness (hopefully in an intelligent way... Might also need some thresholding of some kind).

Obviously only useful in the case where there is a completely unrivaled winner (in a given size class at least) who happens to be painfully overbaked.

[Return] [Catalog] [Top]

Post a Reply

Return Catalog Top Refresh

[Advertise on 4chan]

Delete Post: [File Only] Style:

[Disable Mobile View / Use Desktop Site]

[Enable Mobile View / Use Mobile Site]

All trademarks and copyrights on this page are owned by their respective parties. Images uploaded are the responsibility of the Poster. Comments are owned by the Poster.