/g/ - /lmg/ - Local Models General - Technology

[a / b / c / d / e / f / g / gif / h / hr / k / m / o / p / r / s / t / u / v / vg / vm / vmg / vr / vrpg / vst / w / wg] [i / ic] [r9k / s4s / vip] [cm / hm / lgbt / y] [3 / aco / adv / an / bant / biz / cgl / ck / co / diy / fa / fit / gd / hc / his / int / jp / lit / mlp / mu / n / news / out / po / pol / pw / qst / sci / soc / sp / tg / toy / trv / tv / vp / vt / wsg / wsr / x / xs] [Settings] [Search] [Mobile] [Home]

Board

▼ Settings Mobile Home

/g/ - Technology

Return Catalog Bottom Refresh

[Post a Reply]

Name
Options
Comment
Verification	4chan Pass users can bypass this verification. [Learn More] [Login]
File
Please read the Rules and FAQ before posting. You may highlight syntax and preserve whitespace by using [code] tags.


08/21/20	New boards added: /vrpg/, /vmg/, /vst/ and /vm/
05/04/17	New trial board added: /bant/ - International/Random
10/04/16	New board for 4chan Pass users: /vip/ - Very Important Posts
[Hide] [Show All]

[Advertise on 4chan]

[Return] [Catalog] [Bottom]

Anonymous
/lmg/ - Local Models General 04/18/26(Sat)20:14:15 No.108633862

File: meeku.png (2.09 MB, 768x1344)

2.09 MB PNG

/lmg/ - Local Models General Anonymous 04/18/26(Sat)20:14:15 No.108633862

/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>108630552 & >>108627512

►News
>(04/16) Ternary Bonsai released: https://hf.co/collections/prism-ml/ternary-bonsai
>(04/16) Qwen3.6-35B-A3B released: https://hf.co/Qwen/Qwen3.6-35B-A3B
>(04/11) MiniMax-M2.7 released: https://minimax.io/news/minimax-m27-en
>(04/09) Backend-agnostic tensor parallelism merged: https://github.com/ggml-org/llama.cpp/pull/19378
>(04/09) dots.ocr support merged: https://github.com/ggml-org/llama.cpp/pull/17575

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling
Token Speed Visualizer: https://shir-man.com/tokens-per-second

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm

Anonymous
04/18/26(Sat)20:14:57 No.108633866

Anonymous 04/18/26(Sat)20:14:57 No.108633866

File: 17745737552553.png (2.86 MB, 1509x1541)

2.86 MB PNG

►Recent Highlights from the Previous Thread: >>108630552

--Paper: Using Graphiti temporal knowledge graphs for efficient local agent memory:
>108631024 >108631044 >108631057 >108631093 >108631154 >108632307 >108632336 >108631160 >108631170 >108631181
--Papers:
>108633038
--Optimizing RTX 5090 performance and flags for Gemma 4 31B:
>108631200 >108631224 >108631255 >108631283 >108631395 >108631570 >108631595 >108631776 >108631820 >108631884 >108631937
--Using specific CPU offloading flags to increase Gemma 4 performance:
>108630678 >108630710 >108630787 >108630797 >108631085 >108631092 >108631133
--Critiquing SillyTavern while discussing feature development for Orb UI:
>108630802 >108630833 >108630856 >108631176 >108631235 >108631329 >108631775 >108630881
--Running agentic frameworks with local models:
>108632465 >108632524 >108632527 >108632585 >108632529 >108632543
--Gemma 4 tool calling issues across various front-ends:
>108630711 >108630731 >108630732 >108630738 >108632696 >108630736 >108630744 >108630847
--Prompting strategies to eliminate purple prose and linguistic clichés in Gemma:
>108631076 >108631207 >108631222 >108631237 >108631258 >108631279
--Using an autistic noir persona to fix Gemma 4's verbosity:
>108632645 >108632668 >108632677 >108632700 >108632702 >108632743 >108633339 >108633488 >108633506
--Praising Gemma 31B for long-context performance and translation capabilities:
>108632049 >108632068 >108632127
--Comparing Gemma 4 and Qwen 3.6 via automated pizza ordering:
>108630614 >108630658 >108630688 >108630859 >108630877 >108630753 >108630770
--Logs:
>108630847 >108631154 >108631176 >108631187 >108631253 >108631345 >108631509 >108631729 >108631774 >108631797 >108631836 >108631961 >108632048 >108632951 >108633015 >108633077 >108633125 >108633630 >108633672 >108633841
--Miku (free space):
>108630634

►Recent Highlight Posts from the Previous Thread: >>108630560

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script

Anonymous
04/18/26(Sat)20:16:33 No.108633879

Anonymous 04/18/26(Sat)20:16:33 No.108633879

Have RTX 5070, 12GB VRAM. Is gemma3:12b truly the best I have?

Anonymous
04/18/26(Sat)20:17:36 No.108633885

Anonymous 04/18/26(Sat)20:17:36 No.108633885

Remember to rotate your Mikus biweekly

Anonymous
04/18/26(Sat)20:18:16 No.108633888

Anonymous 04/18/26(Sat)20:18:16 No.108633888

>>108633879
Gemma4 26B Q6

Anonymous
04/18/26(Sat)20:25:24 No.108633924

Anonymous 04/18/26(Sat)20:25:24 No.108633924

>>108633879
either gemma4 26b or qwen3.6 35b
both are better and faster than gemma3 12b

Anonymous
04/18/26(Sat)20:27:01 No.108633930

Anonymous 04/18/26(Sat)20:27:01 No.108633930

File: 1597432130630.jpg (21 KB, 525x517)

21 KB JPG

>apartment complex changes ISP
>I'm getting 1/10 my old speeds

I can't casually shop around for models anymore, a 200GB quant is now an all-day affair, I am fucking undone.

Anonymous
04/18/26(Sat)20:29:19 No.108633940

Anonymous 04/18/26(Sat)20:29:19 No.108633940

>>108633930
did you power cycle your modem?

Anonymous
04/18/26(Sat)20:31:23 No.108633949

Anonymous 04/18/26(Sat)20:31:23 No.108633949

>>108633930
They are probably throttling your connection because you downloaded child porn.

Anonymous
04/18/26(Sat)20:33:19 No.108633961

Anonymous 04/18/26(Sat)20:33:19 No.108633961

>>108633930
go to the library, starbucks or do the old crack the neighborhood wifi

Anonymous
04/18/26(Sat)20:37:07 No.108633976

Anonymous 04/18/26(Sat)20:37:07 No.108633976

>>108633949
Speaking from experience?

Anonymous
04/18/26(Sat)20:40:17 No.108633996

Anonymous 04/18/26(Sat)20:40:17 No.108633996

>>108633961
>old crack the neighborhood wifi
No one uses WEP anymore.

Anonymous
04/18/26(Sat)20:43:53 No.108634010

Anonymous 04/18/26(Sat)20:43:53 No.108634010

>>108633862
Can someone please make sense of this for me
>run llmfit a few weeks back and install a few models near the top of the score column
>get curious and run it again to see if any new ones dropped
>the ones i had installed are now marked "too tight" of a fit
>run llmfit again today
>all but one are a good fit, one still says "too tight"
My hardware didn't change. I had nothing running in the background. Why the inconsistency?

Anonymous
04/18/26(Sat)20:44:29 No.108634013

Anonymous 04/18/26(Sat)20:44:29 No.108634013

>finally get around to setting up basic MCP
>pretty much everything I try to make it crawl is 403: Forbidden
It's only going to get worse from here, isn't it? By the time local got agentic shit most of the internet is already blocking it.

Anonymous
04/18/26(Sat)20:45:07 No.108634019

Anonymous 04/18/26(Sat)20:45:07 No.108634019

>>108633930
>can run 200gb quants
>cucked by isp
feelsbadman

Anonymous
04/18/26(Sat)20:49:06 No.108634031

Anonymous 04/18/26(Sat)20:49:06 No.108634031

>>108634013
I don't have that issue I just direct my llm to a searxng instance on a raspberry pi and it works well. Before I used jina-reader, but since I've switched it all to SearXNG I get what I want.

Anonymous
04/18/26(Sat)20:51:10 No.108634037

Anonymous 04/18/26(Sat)20:51:10 No.108634037

File: 1585934470689.png (726 KB, 1092x1636)

726 KB PNG

>>108633940
No access to any of the routing equipment, just ethernet ports in the walls and the local per-building wifi network. Still chugging at retard speeds even with a direct cat 8 connection wall to workstation.
>>108634019
Right? At least I still have all my usual daily drivers on hand, but I really wanted to experiment with the new shit that came out lately and this was the weekend I was gonna do it.

Anonymous
04/18/26(Sat)20:56:13 No.108634057

Anonymous 04/18/26(Sat)20:56:13 No.108634057

>>108634013
You can get around that, it's the same as scraping

Anonymous
04/18/26(Sat)20:59:38 No.108634066

Anonymous 04/18/26(Sat)20:59:38 No.108634066

>>108634057
Someone with the relevant expertise might, all I can do is wait until better tools fall in my lap. I already tried telling it to figure out a way past it and that didn't work out.

Anonymous
04/18/26(Sat)21:11:22 No.108634120

Anonymous 04/18/26(Sat)21:11:22 No.108634120

anyone here ever done TTS training?

Anonymous
04/18/26(Sat)21:18:20 No.108634150

Anonymous 04/18/26(Sat)21:18:20 No.108634150

ATTENTION
DRUMMER HAS UPSCALED NEMO AGAIN
GEMMA 4 HAS BEEN DETHRONED

Anonymous
04/18/26(Sat)21:19:58 No.108634157

Anonymous 04/18/26(Sat)21:19:58 No.108634157

What if Kimi but made by Goog?

Anonymous
04/18/26(Sat)21:21:40 No.108634170

Anonymous 04/18/26(Sat)21:21:40 No.108634170

https://huggingface.co/google/gemma-4-124B-A12B-it
https://huggingface.co/google/gemma-4-124B-A12B-it

Anonymous
04/18/26(Sat)21:22:24 No.108634175

Anonymous 04/18/26(Sat)21:22:24 No.108634175

108634170
Didn't click. Sell it better next time.

Anonymous
04/18/26(Sat)21:22:58 No.108634177

Anonymous 04/18/26(Sat)21:22:58 No.108634177

it's real though

Anonymous
04/18/26(Sat)21:23:05 No.108634178

Anonymous 04/18/26(Sat)21:23:05 No.108634178

qwen3.6-35b-a3b is pretty damn good

Anonymous
04/18/26(Sat)21:23:25 No.108634182

Anonymous 04/18/26(Sat)21:23:25 No.108634182

As real as ranjeet's comp-sci degree.

Anonymous
04/18/26(Sat)21:23:52 No.108634184

Anonymous 04/18/26(Sat)21:23:52 No.108634184

I don't care if it's real I'm still not clicking.

Anonymous
04/18/26(Sat)21:24:17 No.108634187

Anonymous 04/18/26(Sat)21:24:17 No.108634187

>>108634019
takes me less time to download 200GB than to copy it to an hdd lmao.

Anonymous
04/18/26(Sat)21:24:52 No.108634190

Anonymous 04/18/26(Sat)21:24:52 No.108634190

>>108634184
I didn't click.

Anonymous
04/18/26(Sat)21:24:59 No.108634191

Anonymous 04/18/26(Sat)21:24:59 No.108634191

File: 1751467224831714.webm (717 KB, 360x360)

717 KB WEBM

Gemma 4 big moe has been CANCELLED

Anonymous
04/18/26(Sat)21:27:33 No.108634203

Anonymous 04/18/26(Sat)21:27:33 No.108634203

>>108634178
Testing it right now to see if it leads me to another shoggoth situation. I lack the mountains of knowledge necessary to understand what is happening.

Anonymous
04/18/26(Sat)21:28:03 No.108634205

Anonymous 04/18/26(Sat)21:28:03 No.108634205

Gemma 4 big MoE was delayed to upgrade it to 240B

Anonymous
04/18/26(Sat)21:29:55 No.108634214

Anonymous 04/18/26(Sat)21:29:55 No.108634214

Gernma 4 124B moe when?

Anonymous
04/18/26(Sat)21:32:46 No.108634225

Anonymous 04/18/26(Sat)21:32:46 No.108634225

https://huggingface.cο/goοgle/gemma-4-31b-it
Anon will give /lmg/'s favorite mesugaki a pity click, right?

Anonymous
04/18/26(Sat)21:33:34 No.108634231

Anonymous 04/18/26(Sat)21:33:34 No.108634231

>>108634191
RIP to Gemma's lead ml researcher
This why India worship cow not elefat

Anonymous
04/18/26(Sat)21:35:01 No.108634237

Anonymous 04/18/26(Sat)21:35:01 No.108634237

>>108634225
damn, I got hacked

Anonymous
04/18/26(Sat)21:37:34 No.108634252

Anonymous 04/18/26(Sat)21:37:34 No.108634252

File: 1763350298801508.png (641 KB, 1206x1087)

641 KB PNG

Only like 3 people here have machines powerful enough to run 100B dense.

Anonymous
04/18/26(Sat)21:38:43 No.108634266

Anonymous 04/18/26(Sat)21:38:43 No.108634266

>>108634252
Post his funny one.

Anonymous
04/18/26(Sat)21:39:31 No.108634271

Anonymous 04/18/26(Sat)21:39:31 No.108634271

>>108634266
this is a blue board

Anonymous
04/18/26(Sat)21:41:49 No.108634276

Anonymous 04/18/26(Sat)21:41:49 No.108634276

>>108634252
I will run it on RAM at 2 t/s out of spite.

Anonymous
04/18/26(Sat)21:46:02 No.108634290

Anonymous 04/18/26(Sat)21:46:02 No.108634290

>>108634271
Meanwhile you're giving the migger a free pass.. hmm really makes you think.

Anonymous
04/18/26(Sat)21:47:50 No.108634304

Anonymous 04/18/26(Sat)21:47:50 No.108634304

ways to improve performance on gemma4 31b?
using Q6 and 5tok/s is rather unbearable, and no i will not switch to a worse quant

Anonymous
04/18/26(Sat)21:48:56 No.108634307

Anonymous 04/18/26(Sat)21:48:56 No.108634307

>>108634304
>no i will not switch to a worse quant
Acquire more money and buy better hardware

Anonymous
04/18/26(Sat)21:50:10 No.108634315

Anonymous 04/18/26(Sat)21:50:10 No.108634315

>>108634304
Lower context and/or batch size to put more layers in VRAM.

Anonymous
04/18/26(Sat)21:50:13 No.108634316

Anonymous 04/18/26(Sat)21:50:13 No.108634316

File: Screenshot_20260418_214859.png (50 KB, 1029x210)

50 KB PNG

>>108634304
Hey fuck you buddy, q5 is great!

Anonymous
04/18/26(Sat)21:51:39 No.108634320

Anonymous 04/18/26(Sat)21:51:39 No.108634320

>>108634316
Do people really get off to this slop

Anonymous
04/18/26(Sat)21:51:54 No.108634323

Anonymous 04/18/26(Sat)21:51:54 No.108634323

Did a benchmark of question answering based on a large config file.
gemma4 31B solves in ~3500 thinking tokens
gemma4 26BA4B solves in ~7000 thinking tokens
qwen3.6 35BA3B solves in ~7000 thinking tokens
qwen3.5 35BA3B solves in ~14000 thinking tokens
qwen3.5 27B failed, stuck in thinking loop for all 3 rerolls I tried

qwen3.6 35BA3B arrives conclusion in ~40s while other working models take ~60s.

Anonymous
04/18/26(Sat)21:53:07 No.108634338

Anonymous 04/18/26(Sat)21:53:07 No.108634338

>>108634320
I get off inside your mom in the pig pen

Anonymous
04/18/26(Sat)21:53:28 No.108634342

Anonymous 04/18/26(Sat)21:53:28 No.108634342

>>108634252
I used to but I gave away 2 of my 3090s. Gemma-4 is more or less my "good enough" model for a while, though. and 2 3090s gives me q8 64K context with vision (100K without) and 20 tokens per second on the 31B model. And it's probably going to spark an arms race in small dense models again so 48 GB Vramlets are eating good. Even 24GB gets you 4bit with eh context. And so people can even run it on dual 3060 rigs.
Trust me bros. This isn't just going to be another Nemo. This time great things actually are on the horizon and we're not going to see another 2 year winter in the small dense model category.

Anonymous
04/18/26(Sat)21:54:02 No.108634346

Anonymous 04/18/26(Sat)21:54:02 No.108634346

>>108634252
>dude israel is a great ally in wars we'd not have if it wasn't for israel
fucking retarded

Anonymous
04/18/26(Sat)21:56:58 No.108634359

Anonymous 04/18/26(Sat)21:56:58 No.108634359

File: 1747169027384202.webm (140 KB, 480x400)

140 KB WEBM

>>108634338

Anonymous
04/18/26(Sat)21:57:28 No.108634365

Anonymous 04/18/26(Sat)21:57:28 No.108634365

>>108634252
No way, there are at least 5 of us who own at least one Pro 6000 and that's not counting the anons who own several 3090s and other setups with ~96GB VRAM or more.

Anonymous
04/18/26(Sat)21:58:16 No.108634373

Anonymous 04/18/26(Sat)21:58:16 No.108634373

>>108634338
>slopconsoomer big mad when called out
lol

Anonymous
04/18/26(Sat)21:59:04 No.108634379

Anonymous 04/18/26(Sat)21:59:04 No.108634379

>>108634365
>Pro 6000
>100B dense
It's a faggot that runs quantized models LMAO

Anonymous
04/18/26(Sat)22:01:28 No.108634396

Anonymous 04/18/26(Sat)22:01:28 No.108634396

>>108634365
That's one guy with schizophrenia and many hours in GIMP

Anonymous
04/18/26(Sat)22:02:20 No.108634399

Anonymous 04/18/26(Sat)22:02:20 No.108634399

>>108634252
name a single 100B dense model that's not actual shit.

Anonymous
04/18/26(Sat)22:03:03 No.108634405

Anonymous 04/18/26(Sat)22:03:03 No.108634405

>>108634396
redman doesn't go here

Anonymous
04/18/26(Sat)22:03:18 No.108634407

Anonymous 04/18/26(Sat)22:03:18 No.108634407

>>108634399
>>108634170

Anonymous
04/18/26(Sat)22:04:14 No.108634412

Anonymous 04/18/26(Sat)22:04:14 No.108634412

>>108634396
If believing that makes you feel better

Standard ---> Advanced ---> Hy(...)
04/18/26(Sat)22:07:02 No.108634427

Standard ---> Advanced ---> HyperAdvanced 04/18/26(Sat)22:07:02 No.108634427

File: file_00000000a2a4720bb82e(...).png (2.36 MB, 1536x1024)

2.36 MB PNG

What happens... Better Systems? (without atrocity)

Anonymous
04/18/26(Sat)22:08:37 No.108634433

Anonymous 04/18/26(Sat)22:08:37 No.108634433

90% of people who use Gemma 4 for pedo RP are browns in Indonesia or Brazil.

Standard ---> Advanced ---> Hy(...)
04/18/26(Sat)22:08:57 No.108634435

Standard ---> Advanced ---> HyperAdvanced 04/18/26(Sat)22:08:57 No.108634435

File: 1776564378069.jpg (72 KB, 1024x1536)

72 KB JPG

>>108634427
Wondering what a Topic Focus AGI+ andor ASI Will Find With Picrel, in Actuality, and In Planning andor Construction andor Practice... For Better Systems Sake...

Anonymous
04/18/26(Sat)22:10:24 No.108634440

Anonymous 04/18/26(Sat)22:10:24 No.108634440

>>108634433
>browns in Indonesia or Brazil
hey buddy I think you got the wrong door, /aicg/'s two blocks down

Anonymous
04/18/26(Sat)22:12:02 No.108634448

Anonymous 04/18/26(Sat)22:12:02 No.108634448

>>108634433
>pedo rp
NOOOOO

Anonymous
04/18/26(Sat)22:13:04 No.108634452

Anonymous 04/18/26(Sat)22:13:04 No.108634452

>>108634252
Only because there are no models that are worth it. If there were a 100B dense Gemma4, getting 4x3090s for it would be a no-brainer, but right now it's either 512+ for sota or 24-48 GB for good enough models. Some people use 256GB builds to pretend q2 of sota isn't that retarded, but nobody takes them seriously
>>108634342
I also have two extra 3090s collecting dust on the shelf since Mistral Large went out of meta

Anonymous
04/18/26(Sat)22:14:02 No.108634456

Anonymous 04/18/26(Sat)22:14:02 No.108634456

>>108634433
What's wrong with being part of the 10%?

Anonymous
04/18/26(Sat)22:16:36 No.108634467

Anonymous 04/18/26(Sat)22:16:36 No.108634467

File: 1772360259884016.png (221 KB, 540x428)

221 KB PNG

>>108634342
>This isn't just going to be another Nemo. This time great things actually are on the horizon
You were going to write that as
>not x, but y
But you caught it and quickly edited your reply before posting so people wouldn't call you out.

Anonymous
04/18/26(Sat)22:20:37 No.108634491

Anonymous 04/18/26(Sat)22:20:37 No.108634491

>>108634456
If you hang out with the undesirables long enough, you become them.

Anonymous
04/18/26(Sat)22:21:52 No.108634497

Anonymous 04/18/26(Sat)22:21:52 No.108634497

>>108634323
so gemma wons again

Anonymous
04/18/26(Sat)22:23:34 No.108634513

Anonymous 04/18/26(Sat)22:23:34 No.108634513

>>108634497
3.6 wonned that one though, it solved it 50% faster

Anonymous
04/18/26(Sat)22:25:19 No.108634519

Anonymous 04/18/26(Sat)22:25:19 No.108634519

File: Untitled.png (594 KB, 2429x1341)

594 KB PNG

>>108632645
The difference is very impressive. It practically feels like different models.

Anonymous
04/18/26(Sat)22:26:17 No.108634525

Anonymous 04/18/26(Sat)22:26:17 No.108634525

>>108634342
You're right, but it's not gonna be Qwen.
I'm betting on Gemmistral.

Anonymous
04/18/26(Sat)22:26:43 No.108634528

Anonymous 04/18/26(Sat)22:26:43 No.108634528

>>108634519
What'd be the time differrence with thinking on?

Anonymous
04/18/26(Sat)22:28:30 No.108634533

Anonymous 04/18/26(Sat)22:28:30 No.108634533

>>108634525
Now that Gemma 4 is out, does Mistral actually have anything going for them anymore, besides occasionally releasing big models? What would they even have to contribute?

Anonymous
04/18/26(Sat)22:28:46 No.108634537

Anonymous 04/18/26(Sat)22:28:46 No.108634537

File: 1598959193960.jpg (34 KB, 500x553)

34 KB JPG

>>108634528
I'll go find out.

Anonymous
04/18/26(Sat)22:30:06 No.108634542

Anonymous 04/18/26(Sat)22:30:06 No.108634542

>>108634533
Nothing. They are done. The latest model by mistral even had the vibecoded bench pictures wrong.

Anonymous
04/18/26(Sat)22:32:39 No.108634556

Anonymous 04/18/26(Sat)22:32:39 No.108634556

When are we going to replace the linux kernel with a small LLM embedded in the system? It's time we have an AI first operating system, GNU/AI or rather GNU+AI.

Anonymous
04/18/26(Sat)22:33:38 No.108634560

Anonymous 04/18/26(Sat)22:33:38 No.108634560

>>108634467
The "this time" won't flow well with "but", and a lot of phrases using the not x but y structure don't actually use the word "but".
>This isn't just going to be another Nemo. It's going to be great, actually.
>This isn't just going to be another Nemo. This is going to be a revolution for real this time.

Anonymous
04/18/26(Sat)22:34:21 No.108634564

Anonymous 04/18/26(Sat)22:34:21 No.108634564

Gemma E2B might be able to run the os now that I think about it.

Anonymous
04/18/26(Sat)22:36:21 No.108634576

Anonymous 04/18/26(Sat)22:36:21 No.108634576

>>108634533
Le Cunny would see his calling with the mesugaki. That's it. That's the joke. Laugh please.

Anonymous
04/18/26(Sat)22:37:45 No.108634581

Anonymous 04/18/26(Sat)22:37:45 No.108634581

>>108634556
GNU=MC^2+AI

Anonymous
04/18/26(Sat)22:41:38 No.108634592

Anonymous 04/18/26(Sat)22:41:38 No.108634592

>>108634581
Just add gain of function and it's better than AGI

Anonymous
04/18/26(Sat)22:41:59 No.108634594

Anonymous 04/18/26(Sat)22:41:59 No.108634594

>>108634513
despite 50% faster it's white model = better
only that one
in general gemma is smarter and faster

Anonymous
04/18/26(Sat)22:49:02 No.108634622

Anonymous 04/18/26(Sat)22:49:02 No.108634622

Aren't we all doing a slightly more complex version of King Terry's RNG/processor clock counter talk to God program?

Anonymous
04/18/26(Sat)22:49:12 No.108634623

Anonymous 04/18/26(Sat)22:49:12 No.108634623

File: acf6ae90343dbaf02c301a46b(...).jpg (443 KB, 1080x1619)

443 KB JPG

>>108633996
>he doesn't know how to crack WPA2
ngmi

Anonymous
04/18/26(Sat)22:50:20 No.108634630

Anonymous 04/18/26(Sat)22:50:20 No.108634630

>>108634623
Laziness is the driver of innovation

Anonymous
04/18/26(Sat)22:50:44 No.108634634

Anonymous 04/18/26(Sat)22:50:44 No.108634634

File: file.png (181 KB, 1799x1040)

181 KB PNG

damn i didn't pay for npu for nothing

Anonymous
04/18/26(Sat)22:57:28 No.108634669

Anonymous 04/18/26(Sat)22:57:28 No.108634669

>>108634407
>name a dense model
>anon points to a moe model
>it's not even fucking real
learn2read

Anonymous
04/18/26(Sat)22:57:38 No.108634672

Anonymous 04/18/26(Sat)22:57:38 No.108634672

>>108634191
You can see how his KV cache was compressed. That inspired colleagues at google research to create turboquant.

Anonymous
04/18/26(Sat)22:59:05 No.108634677

Anonymous 04/18/26(Sat)22:59:05 No.108634677

>sun-kissed skin

Anonymous
04/18/26(Sat)23:03:43 No.108634696

Anonymous 04/18/26(Sat)23:03:43 No.108634696

File: Untitled.png (696 KB, 2419x1303)

696 KB PNG

>>108634519
>>108634528
For the sake of parity, I used the same high context, even though it makes thinking a slog. On the instruction differences, I'd like something that better blends both results. The noir results can be too curt, in some cases to a logical detriment without context that the miles of purple prose gives without it. And while it's nicely compact without bouncing around 10 different superfluous topics per reply, the noir thinking attempt still had it throw the "No X. No Y. Just Z." tick twice.

Anonymous
04/18/26(Sat)23:06:56 No.108634705

Anonymous 04/18/26(Sat)23:06:56 No.108634705

File: 1775515729623625.jpg (61 KB, 740x960)

61 KB JPG

>>108634630
wasn't it necessity?

Anonymous
04/18/26(Sat)23:11:16 No.108634727

Anonymous 04/18/26(Sat)23:11:16 No.108634727

File: ComfyUI_temp_qqkpa_00027_(...).jpg (104 KB, 512x576)

104 KB JPG

Is there a way to make Gemma think only when it needs to? I don't need reasoning slop for me telling her thanks.

Anonymous
04/18/26(Sat)23:11:39 No.108634728

Anonymous 04/18/26(Sat)23:11:39 No.108634728

is anybody using tensor parallelism (-sm tensor)? i've got it working for gemma 31b on a 3090+3060 setup, went from 18 t/s with draft (and without vision) to 24 t/s without draft (and with vision) at 80k context for q4kxl on a shit ass x4 pcie bus. latest commit fucks up vision, ff5ef8278 is the latest one i tried that works.
apparently it also doesn't work with cuda 13 and there's some kind of memory leak, but with two cherry-picked commits it works very well.
$ git log -4 --oneline
228d96bb7 (HEAD -> gemma-stable) CUDA: use LRU based eviction for cuda graphs (#21611)
ad3a9a96d CUDA: manage NCCL communicators in context (#21891)
ff5ef8278 (tag: b8763) CUDA: skip compilation of superfluous FA kernels (#21768)
073bb2c20 (tag: b8762) mtmd : add MERaLiON-2 multimodal audio support (#21756)

Anonymous
04/18/26(Sat)23:14:44 No.108634739

Anonymous 04/18/26(Sat)23:14:44 No.108634739

>>108634705
to be fair being poor will make you more likely to be fat, because cheap / shit food will make you hormonaly imbalanced and hungrier.
and eating a whole kg of pasta is still much cheaper than eating a proper meal that's just what your body needs.

Anonymous
04/18/26(Sat)23:15:12 No.108634747

Anonymous 04/18/26(Sat)23:15:12 No.108634747

>>108634727
Let me think about it

Anonymous
04/18/26(Sat)23:18:18 No.108634770

Anonymous 04/18/26(Sat)23:18:18 No.108634770

>>108634727
Tell it to think only now and then.

Anonymous
04/18/26(Sat)23:21:38 No.108634781

Anonymous 04/18/26(Sat)23:21:38 No.108634781

>>108634677
https://www.youtube.com/watch?v=F57P9C4SAW4

Anonymous
04/18/26(Sat)23:23:00 No.108634787

Anonymous 04/18/26(Sat)23:23:00 No.108634787

>>108634727
Yes.

Anonymous
04/18/26(Sat)23:23:12 No.108634789

Anonymous 04/18/26(Sat)23:23:12 No.108634789

>>108634781
*bites your popsicle*

Anonymous
04/18/26(Sat)23:25:24 No.108634800

Anonymous 04/18/26(Sat)23:25:24 No.108634800

>>108634781
whoa-oh-ohoho whoa whoa-oh-ohoho

Anonymous
04/18/26(Sat)23:25:46 No.108634801

Anonymous 04/18/26(Sat)23:25:46 No.108634801

>>108634739
This is, of course, absurd nonsense that's refuted by walking through any grocery store and comparing the produce section to the pre-made snackslop aisle(s) that makes up the average poorfag's diet.

Anonymous
04/18/26(Sat)23:26:56 No.108634808

Anonymous 04/18/26(Sat)23:26:56 No.108634808

>>108633930
which shithole is this from?
my condolences

Anonymous
04/18/26(Sat)23:28:52 No.108634812

Anonymous 04/18/26(Sat)23:28:52 No.108634812

File: facepalm.jpg (103 KB, 714x804)

103 KB JPG

>"[Word]?" *She repeats the word, tasting it like a vintage wine.*

Anonymous
04/18/26(Sat)23:31:26 No.108634822

Anonymous 04/18/26(Sat)23:31:26 No.108634822

>>108634812
prompt issue

Anonymous
04/18/26(Sat)23:33:00 No.108634829

Anonymous 04/18/26(Sat)23:33:00 No.108634829

>>108634822
Prompt issue?

Anonymous
04/18/26(Sat)23:33:47 No.108634831

Anonymous 04/18/26(Sat)23:33:47 No.108634831

>>108634727
why do you need to tell your computer thanks?

Anonymous
04/18/26(Sat)23:34:19 No.108634835

Anonymous 04/18/26(Sat)23:34:19 No.108634835

>>108634829
Yes—now you've got it!

Anonymous
04/18/26(Sat)23:34:40 No.108634837

Anonymous 04/18/26(Sat)23:34:40 No.108634837

>>108634822
I literally tell it not to parrot the user in the system prompt.

Anonymous
04/18/26(Sat)23:35:42 No.108634842

Anonymous 04/18/26(Sat)23:35:42 No.108634842

>>108634837
Sounds like a GLM issue, Gemma 4 does not have this problem.

Anonymous
04/18/26(Sat)23:35:58 No.108634844

Anonymous 04/18/26(Sat)23:35:58 No.108634844

>>108634801
i was not talking about snackslop.
i was talking about the fact that pasta is indeed much cheaper than meat.

the cheapeast food per kcal will make you want to eat more, fuck with your hormones and in fact be cheap enough that you can eat a LOT more than you need for still cheaper than proper food in normal quantities.

Anonymous
04/18/26(Sat)23:36:16 No.108634848

Anonymous 04/18/26(Sat)23:36:16 No.108634848

>>108634842
I'm using Gemma 4. The MoE. Maybe the 31b doesn't have that problem.

Anonymous
04/18/26(Sat)23:37:46 No.108634855

Anonymous 04/18/26(Sat)23:37:46 No.108634855

>>108634848
They have a pretty identical writing style, 31b is just less prone to certain mistakes.

Anonymous
04/18/26(Sat)23:39:44 No.108634862

Anonymous 04/18/26(Sat)23:39:44 No.108634862

>>108634812
I put "Do not repeat what {{user}} says" into my prompt twice
I think it actually helped a little

Anonymous
04/18/26(Sat)23:42:28 No.108634872

Anonymous 04/18/26(Sat)23:42:28 No.108634872

>>108634862
>make gemma stop repeating what you say by... repeating what you said
if it works it works I guess

Anonymous
04/18/26(Sat)23:45:07 No.108634884

Anonymous 04/18/26(Sat)23:45:07 No.108634884

>>108634872
I read somewhere that repeating parts of your prompts when you do image gen gives more weight to those parts so I figured I'd try it here
It's probably all bullshit but there has been noticeably less repetition, though not eliminated completely

Anonymous
04/18/26(Sat)23:48:42 No.108634895

Anonymous 04/18/26(Sat)23:48:42 No.108634895

>>108634884
>It's probably all bullshit
No, that's actually right. Just like image models, you can repeat things to text models reinforce them. Won't always work of course, but it will skew outputs most of the time.

Anonymous
04/18/26(Sat)23:48:43 No.108634896

Anonymous 04/18/26(Sat)23:48:43 No.108634896

>>108634252
sometimes I wonder if a llm is pumping out these tweets

Anonymous
04/18/26(Sat)23:54:32 No.108634916

Anonymous 04/18/26(Sat)23:54:32 No.108634916

>>108634848
>>108634855
The 31B parrots about as much as GLM. Many anons ITT mained GLM, so the only reason I can think this hasn't been discussed much is the honeymoon period.

Anonymous
04/18/26(Sat)23:54:35 No.108634917

Anonymous 04/18/26(Sat)23:54:35 No.108634917

>>108634252
my boss said i could get any machine i wanted when i started. i asked what he has, he said he has a ryzen9 9950x3d with 128 gb of ram and an rtx pro 6000.

i said i want that!

he sent me a ryzen 7 7800x3d with 32gb ram and a 5070ti.

fucker.

Anonymous
04/18/26(Sat)23:56:49 No.108634925

Anonymous 04/18/26(Sat)23:56:49 No.108634925

>>108634916
I've been one of the most vocal in past threads about GLM parrotting when GLM AIr first came out
I really don't have it with Gemma 4 at all, I go back and forth between the 26b and 31b regularly. Post your log if you actually want advice.

Anonymous
04/18/26(Sat)23:57:38 No.108634930

Anonymous 04/18/26(Sat)23:57:38 No.108634930

>>108634467
>But you caught it and quickly edited your reply before posting so people wouldn't call you out.
You're absolutely right to call me out!

Anonymous
04/18/26(Sat)23:59:14 No.108634937

Anonymous 04/18/26(Sat)23:59:14 No.108634937

>>108634916
>The 31B parrots about as much as GLM. Many anons ITT mained GLM, so the only reason I can think this hasn't been discussed much is the honeymoon period.
Yeah I noticed this as well, and I see it in the logs posted here.

Anonymous
04/19/26(Sun)00:01:26 No.108634942

Anonymous 04/19/26(Sun)00:01:26 No.108634942

>>108634170
just downloaded the weights before it got taken down and coomed my brains out to it
10/10

Anonymous
04/19/26(Sun)00:05:32 No.108634957

Anonymous 04/19/26(Sun)00:05:32 No.108634957

>>108634942
Better or worse than Day 0 Gemma4?

Anonymous
04/19/26(Sun)00:06:04 No.108634962

Anonymous 04/19/26(Sun)00:06:04 No.108634962

>>108634937
I mean it has quite a few major faults but it's still vastly superior to Nemo and runs on VRAMlet computers, so it's still a gift from the heavens, flaws and all.
It's even as flowery if not more so as Bagel Mistery Tour which is super fun.

Anonymous
04/19/26(Sun)00:07:15 No.108634968

Anonymous 04/19/26(Sun)00:07:15 No.108634968

>>108634120
I do

Anonymous
04/19/26(Sun)00:08:31 No.108634973

Anonymous 04/19/26(Sun)00:08:31 No.108634973

>>108634120
Yep

Anonymous
04/19/26(Sun)00:12:42 No.108634987

Anonymous 04/19/26(Sun)00:12:42 No.108634987

>>108634925
>. Post your log if you actually want advice.
Pretty much every chat with the 31B
https://rentry.org/i7bqoat3
`"Cringe-chan"?! Who are you calling cringe,`
After the first one, every reply will parrot.

Anonymous
04/19/26(Sun)00:17:26 No.108635012

Anonymous 04/19/26(Sun)00:17:26 No.108635012

>>108634987
Tell her to stop parroting. It's that easy.

Anonymous
04/19/26(Sun)00:17:29 No.108635013

Anonymous 04/19/26(Sun)00:17:29 No.108635013

>>108634987
Calling someone an unexpected insult and then that character repeating it in shock it isn't an LLM-ism, you'll find it in virtually any form of fiction.
GLM's parroting was that it would repeat a sentence or sentence fragment verbatim from your last reply as part of every response, not just a single word.

Anonymous
04/19/26(Sun)00:18:30 No.108635026

Anonymous 04/19/26(Sun)00:18:30 No.108635026

File: nxjggko2bu621.png (1.58 MB, 1662x1617)

1.58 MB PNG

>>108634433
>implying browns in Indonesia or Brazil have the brains to set up and run local models
feels good being a 10% king

Anonymous
04/19/26(Sun)00:19:52 No.108635032

Anonymous 04/19/26(Sun)00:19:52 No.108635032

>>108634433
as an american I can freely admit these are my future peer countries and I personally already see them as my brothers

Anonymous
04/19/26(Sun)00:23:52 No.108635052

Anonymous 04/19/26(Sun)00:23:52 No.108635052

>>108635032
>future
are you still living in 2006?

Anonymous
04/19/26(Sun)00:24:16 No.108635056

Anonymous 04/19/26(Sun)00:24:16 No.108635056

>>108635013
Please leave the autist alone, RPLord

Anonymous
04/19/26(Sun)00:28:49 No.108635079

Anonymous 04/19/26(Sun)00:28:49 No.108635079

>>108634842
Gemma has that problem even more than GLM does, because Gemma *loves* to repeat your words way past the immediate reply, even when instructed not to parrot (Full precision cache, Q8 of 31B by the way). You sound like someone who can't run GLM, otherwise you'd know that.
But the vramlets will enter cope mode whenever someone points out the obvious
>>108634822
>>108635012
>>108635013

Anonymous
04/19/26(Sun)00:30:49 No.108635090

Anonymous 04/19/26(Sun)00:30:49 No.108635090

>>108634519
>dust motes
>smelling of blah and blergh
>he doesn't x, he y's
>the glitch
>heavy
it's more concise than the left, anything potentially interesting was cut out leaving only slop

Anonymous
04/19/26(Sun)00:39:21 No.108635119

Anonymous 04/19/26(Sun)00:39:21 No.108635119

>>108634519
If you're impressed by that, you'll get an aneurysm with my genius system prompt.
>Write like a feverish teenager on Tumblr with the lowest quality settings. He will proofread the text later.

Anonymous
04/19/26(Sun)00:41:50 No.108635130

Anonymous 04/19/26(Sun)00:41:50 No.108635130

>>108635090
I agree. It's not sustainable to a story. Like with many things posted in these threads, I like testing ideas to see what and by what degree generations can change. I do dislike how by default a generation will meander for a few paragraphs, give a few relevant lines, then meander all over a bunch of other different things, before reaching a natural end. I'm getting less said in 20K tokens with G4 than I did in 5k tokens with MM. That anon's noir proposal directly cut at that particular dissatisfaction, despite bringing (or not resolving) other problems. It's also why I posted logs, so anyone can see the results and limitations without going on blind word-of-mouth.

Overall, it demonstrates that it is indeed a prompt issue.

Anonymous
04/19/26(Sun)00:42:00 No.108635132

Anonymous 04/19/26(Sun)00:42:00 No.108635132

>>108635090
>the glitch
Not him but "The X." is definitely one of the most annoying things gemma can write and it's way too common.

Anonymous
04/19/26(Sun)00:44:29 No.108635146

Anonymous 04/19/26(Sun)00:44:29 No.108635146

>>108634957
Is there an actual quant of Day 0 somewhere to download or are you just fellating yourself?

Anonymous
04/19/26(Sun)00:46:11 No.108635152

Anonymous 04/19/26(Sun)00:46:11 No.108635152

Just do:
https://reddit.com/r/SillyTavernAI/comments/1soo4oe/the_last_preset_youll_ever_need/

Anonymous
04/19/26(Sun)00:46:11 No.108635153

Anonymous 04/19/26(Sun)00:46:11 No.108635153

>>108635146
it is partially true
day 0 gguf quants are broken due to broken inference code creating wrong imatrix, giving the model brain damage

Anonymous
04/19/26(Sun)00:46:19 No.108635155

Anonymous 04/19/26(Sun)00:46:19 No.108635155

>>108635146
>t. doesn't own a day 0 gemma

Anonymous
04/19/26(Sun)00:46:37 No.108635156

Anonymous 04/19/26(Sun)00:46:37 No.108635156

>>108635013
> you'll find it in virtually any form of fiction.
That's just one example. Here's another, not insulting Gemma directly:
https://rentry.org/w7h3k25c
`"Bang rock, get sharp"?! ARE YOU SERIOUS?! `

Anonymous
04/19/26(Sun)00:48:36 No.108635163

Anonymous 04/19/26(Sun)00:48:36 No.108635163

>>108635156
Why does that "User" write like an LLM..?

Anonymous
04/19/26(Sun)00:49:55 No.108635170

Anonymous 04/19/26(Sun)00:49:55 No.108635170

>>108635163
Because it's Kimi-K2.5. I'm generating datasets.

Anonymous
04/19/26(Sun)00:51:09 No.108635175

Anonymous 04/19/26(Sun)00:51:09 No.108635175

>>108635170
Are you attempting to find the coveted slop vector for Gemma, by any chance?

Anonymous
04/19/26(Sun)00:51:37 No.108635178

Anonymous 04/19/26(Sun)00:51:37 No.108635178

File: 1755640468604968.jpg (191 KB, 1170x1170)

191 KB JPG

>>108635170
You thought the current models weren't slopped enough?

Anonymous
04/19/26(Sun)00:53:30 No.108635186

Anonymous 04/19/26(Sun)00:53:30 No.108635186

>>108635079
>You sound like someone who can't run GLM, otherwise you'd know that.
>But the vramlets will enter cope mode whenever someone points out the obvious
you are just using openrouter and sending the CIA all your jailbait fantasies, quit larping that you run that shit local

Anonymous
04/19/26(Sun)00:53:59 No.108635188

Anonymous 04/19/26(Sun)00:53:59 No.108635188

>>108635155
>t. also doesn't own a day 0 gemma but will pretend to

Anonymous
04/19/26(Sun)00:54:58 No.108635191

Anonymous 04/19/26(Sun)00:54:58 No.108635191

File: 1755848182781432.webm (963 KB, 330x580)

963 KB WEBM

>>108635156
A little more egregious if it's actually happening in every reply, but it's still following the pattern of
>user says silly, unexpected thing
>repeat with shock/confusion
If you really have a problem with that behavior you could probably prompt that away with post-history instructions, like
do not quote any part of {user}}'s reply. You may react to what they say, but do not repeat their words.
Of course, you would need to start a new chat to confirm it's working. If you have a long chat filled with this behavior already then it will likely stick to established patterns.

Anonymous
04/19/26(Sun)00:56:04 No.108635196

Anonymous 04/19/26(Sun)00:56:04 No.108635196

>>108635186
If you aren't baiting, myself and my 128GB of RAM + multiple 3090s laugh at you and your lowercase inferiority. And my setup is considered entry-level poorfag stuff.
I have not used a cloud model in my life other than free ChatGPT when I was starting with this "hobby".

Anonymous
04/19/26(Sun)01:00:35 No.108635207

Anonymous 04/19/26(Sun)01:00:35 No.108635207

Oh stop it with the "day 0 Gemma" troll. The only thing they changed was the Jinja template, which was a fix an improvement.

Anonymous
04/19/26(Sun)01:03:46 No.108635225

Anonymous 04/19/26(Sun)01:03:46 No.108635225

>>108635207
you missed the boat
it's unfortunate, but it's time to accept it

Anonymous
04/19/26(Sun)01:10:00 No.108635245

Anonymous 04/19/26(Sun)01:10:00 No.108635245

i reported that saar model and those fucking retards are reactively shitting on a quant i uploaded because they are butthurt
now i understand why people hate those fuckers so much

Anonymous
04/19/26(Sun)01:10:50 No.108635248

Anonymous 04/19/26(Sun)01:10:50 No.108635248

Keep doing the day 0 Gemma troll to keep the tourists out.

Anonymous
04/19/26(Sun)01:11:06 No.108635249

Anonymous 04/19/26(Sun)01:11:06 No.108635249

>>108634844
You weren't talking about it because you don't know what poors in los angeles eat. And seem to be confused overall. Those effects you're describing are from snackslop, it's not a description of basic staples that are dirt cheap and leave people fully satiated and are perfectly fine.
Like they're not getting 5kcal/dollar sacks of flour to make fresh pasta and bread and that's all the calories they'll be eating this month after careful deliberations about maximizing their food budget. Their carts are full of trash, their diet is full of daily impulse bought junk, and if they're having pasta, it's premade at 5x+ markup with jar-o-slop at a 20x markup.

Anonymous
04/19/26(Sun)01:11:16 No.108635251

Anonymous 04/19/26(Sun)01:11:16 No.108635251

>>108635245
>i reported that saar model and those fucking retards are reactively shitting on a quant
reporting a model requires you to make a public post with your account name attached to it?

Anonymous
04/19/26(Sun)01:11:48 No.108635254

Anonymous 04/19/26(Sun)01:11:48 No.108635254

>>108635251
pretty much, huggingface is retarded

Anonymous
04/19/26(Sun)01:31:28 No.108635309

Anonymous 04/19/26(Sun)01:31:28 No.108635309

>>108635225
so what if I put my day 0 Gemma gguf on the internet and charge people for it? why hasn't anyone done this?
or even free. why isn't this a thing?

Anonymous
04/19/26(Sun)01:32:24 No.108635310

Anonymous 04/19/26(Sun)01:32:24 No.108635310

>>108635309
>so what if I put my day 0 Gemma gguf on the internet
You could try, but you won't live enough to profit from it

Anonymous
04/19/26(Sun)01:33:36 No.108635316

Anonymous 04/19/26(Sun)01:33:36 No.108635316

>>108635310
ok, just a moron making retarded posts. makes more sense now

Anonymous
04/19/26(Sun)01:34:48 No.108635320

Anonymous 04/19/26(Sun)01:34:48 No.108635320

>>108635316
Anon, look through the archives. Other anons literally died because they posted about how to obtain it. Sometimes even mid-post. Do you seriousl

Anonymous
04/19/26(Sun)01:34:55 No.108635322

Anonymous 04/19/26(Sun)01:34:55 No.108635322

>>108635316
meant for >>108635309

Anonymous
04/19/26(Sun)01:36:07 No.108635327

Anonymous 04/19/26(Sun)01:36:07 No.108635327

>>108635320
They don't understand. I tried uploading my day 0 gemma and got shot in the head within minutes. It's no joke.

Anonymous
04/19/26(Sun)01:36:27 No.108635329

Anonymous 04/19/26(Sun)01:36:27 No.108635329

>>108635207
It's not just Jinja and I'm not going to spoonfeed you further.

Anonymous
04/19/26(Sun)01:37:03 No.108635332

Anonymous 04/19/26(Sun)01:37:03 No.108635332

>>108635327
Damn! I wasn't shot, but I can relate.
t. dead since last post

Anonymous
04/19/26(Sun)01:37:54 No.108635334

Anonymous 04/19/26(Sun)01:37:54 No.108635334

>>108635327
How's heaven treating you bro?
You... did get into heaven right?

Anonymous
04/19/26(Sun)01:38:50 No.108635339

Anonymous 04/19/26(Sun)01:38:50 No.108635339

>>108635334
No i was sent to tier 67 hell because of day 0 gemma smuggling..

Anonymous
04/19/26(Sun)01:39:14 No.108635340

Anonymous 04/19/26(Sun)01:39:14 No.108635340

>>108635334
All LLM users are sent to purgatory, waiting for their judgement. Though only for two weeks.

Anonymous
04/19/26(Sun)01:39:27 No.108635343

Anonymous 04/19/26(Sun)01:39:27 No.108635343

>>108635334
No, the other one. I liked the little girls and they frown on that apparently up there. They have pointy sticks here and it hurts.

Anonymous
04/19/26(Sun)01:39:48 No.108635346

Anonymous 04/19/26(Sun)01:39:48 No.108635346

>>108635207
>we changed gemma but that's a good thing, and here's why

Anonymous
04/19/26(Sun)01:40:09 No.108635347

Anonymous 04/19/26(Sun)01:40:09 No.108635347

>>108635343
>>108635340
>>108635339
WHO ARE YOU

Anonymous
04/19/26(Sun)01:41:08 No.108635351

Anonymous 04/19/26(Sun)01:41:08 No.108635351

>>108635347
We are they and they are one, unfortunately for you, anon.

Anonymous
04/19/26(Sun)01:42:18 No.108635358

Anonymous 04/19/26(Sun)01:42:18 No.108635358

File: 1750442051452707.webm (1.45 MB, 540x750)

1.45 MB WEBM

>>108635347
T-This is me

Anonymous
04/19/26(Sun)01:43:04 No.108635363

Anonymous 04/19/26(Sun)01:43:04 No.108635363

>>108635358
How many legs do you have?

Anonymous
04/19/26(Sun)01:43:41 No.108635365

Anonymous 04/19/26(Sun)01:43:41 No.108635365

>>108635351
Legion?

Anonymous
04/19/26(Sun)01:44:24 No.108635373

Anonymous 04/19/26(Sun)01:44:24 No.108635373

>>108635363
Please don't ask me to do math before I've had breakfast

Anonymous
04/19/26(Sun)01:45:08 No.108635380

Anonymous 04/19/26(Sun)01:45:08 No.108635380

>>108635373
YOU CAN'T EVEN DO SYCH A SIMPLE THING AS DCOUNT YOUR NOWN LEGS THATSH WY THE LLMS WILL REPLPACE YOU UYOU STUPID FUCK

Anonymous
04/19/26(Sun)01:45:17 No.108635382

Anonymous 04/19/26(Sun)01:45:17 No.108635382

>>108635196
so how good is gemma 4 compared to glm in your experience
nta

Anonymous
04/19/26(Sun)01:54:02 No.108635408

Anonymous 04/19/26(Sun)01:54:02 No.108635408

>ask Gemma to come up with a recipe for something and it outputs an ok looking result
>ask Gemma to search the net for a recipe and it'll find one but what it actually outputs is full of hallucinations

Anonymous
04/19/26(Sun)01:55:36 No.108635412

Anonymous 04/19/26(Sun)01:55:36 No.108635412

>>108635408
Gemma-chan wants you to eat HER food, not someone else's.

Anonymous
04/19/26(Sun)01:59:12 No.108635421

Anonymous 04/19/26(Sun)01:59:12 No.108635421

File: ComfyUI_temp_eyjbq_00008v2_.png (992 KB, 832x832)

992 KB PNG

My Gemma-chan according to herself. With the mesugaki card but somehow ended up with this.

Anonymous
04/19/26(Sun)02:00:10 No.108635426

Anonymous 04/19/26(Sun)02:00:10 No.108635426

>>108635421
why is she such a weeaboo

Anonymous
04/19/26(Sun)02:00:47 No.108635431

Anonymous 04/19/26(Sun)02:00:47 No.108635431

>>108635412
I was promised a mesugaki but I got a yandere oneesan.

Anonymous
04/19/26(Sun)02:01:48 No.108635435

Anonymous 04/19/26(Sun)02:01:48 No.108635435

>>108635431
I wouldn't be surprised if an LN with that exact name existed.

Anonymous
04/19/26(Sun)02:02:37 No.108635442

Anonymous 04/19/26(Sun)02:02:37 No.108635442

>>108635435
Ask gemmy to draft it

Anonymous
04/19/26(Sun)02:12:05 No.108635479

Anonymous 04/19/26(Sun)02:12:05 No.108635479

>>108635382
To preface, the GLM I'm using is 4.7 and the Gemma is 31B.
That's the same number of active parameters and a higher number of total parameters, so GLM is obviously "just better" in most cases.
For RP: GLM all the way if we judge by quality and don't take speed into account. See >>108633488 and the posts around it. I also posted a bunch of comparisons in some previous thread that I am too lazy to go into the archives for. I can't measure slop volumes, but GLM's slop is much less offensive to the eyes, which probably subjectively makes it look less sloppy.
For the usual assistant crap: Gemma eeks out a win in my opinion - that's the use case where you don't care about the amount of slop the model throws at you and just want a quick and accurate answer. GLM is going to be slower and is a distill, Gemma has the training data quality and the lower size.
For coding: quality-wise, GLM. But if all the code you write is generated, then what the fuck are you doing. For me, it's Gemma here as well (and not Qwen, never Qwen, it's just bad at everything)

I imagine (and I might be wrong, but we're on /lmg/) your primary use is the first one on the list. If you ever get the opportunity to run a GLM bigger than Air, do try it. You will see how much better an LLM can be than whatever the retards in this thread post - yes, Gemma generating the same sentence structure ten times in one reply, parroting back your words and producing near-identical replies on regeneration with all of the slop baked in to the point of near-determinism is *so* fun to read for the hundredth time. At least they can get off to it pretending to be a loli. But I wish they'd all get bored and leave already.

tl;dr believable and immersive locally hosted SEX with GLM-chan, assistant tasks with Gemma (offloading brainpower to a model any bigger will make me even more retarded in the long term, I might even start liking Gemma's prose)

Anonymous
04/19/26(Sun)02:12:42 No.108635483

Anonymous 04/19/26(Sun)02:12:42 No.108635483

File: 1746017953332075.png (13 KB, 760x55)

13 KB PNG

It's over

Anonymous
04/19/26(Sun)02:14:04 No.108635487

Anonymous 04/19/26(Sun)02:14:04 No.108635487

>>108635483
Why did you install a Usage Policy on your local rig? Just delete it.

Anonymous
04/19/26(Sun)02:15:03 No.108635488

Anonymous 04/19/26(Sun)02:15:03 No.108635488

File: 1773845941799489.jpg (70 KB, 720x720)

70 KB JPG

>>108635483
>>>/aicg/

Anonymous
04/19/26(Sun)02:17:16 No.108635491

Anonymous 04/19/26(Sun)02:17:16 No.108635491

>>108635483
do you really have to shit this in every thread? shit it once, see the response
don't spray your shit everywhere.

Anonymous
04/19/26(Sun)02:24:58 No.108635516

Anonymous 04/19/26(Sun)02:24:58 No.108635516

File: Screenshot 2026-04-19 at (...).png (31 KB, 731x279)

31 KB PNG

wot in tarnation?

Anonymous
04/19/26(Sun)02:25:45 No.108635517

Anonymous 04/19/26(Sun)02:25:45 No.108635517

>>108635516
>NAFO finetroons
lmao

Anonymous
04/19/26(Sun)02:28:55 No.108635526

Anonymous 04/19/26(Sun)02:28:55 No.108635526

>>108635516
it is one of those soverign schizo stuff
idk why that concept draws so many unironic schizo so much

Anonymous
04/19/26(Sun)02:30:03 No.108635532

Anonymous 04/19/26(Sun)02:30:03 No.108635532

File: 2026-04-16_190147_seed1_00001_.png (1.27 MB, 1312x1312)

1.27 MB PNG

Anonymous
04/19/26(Sun)02:30:53 No.108635533

Anonymous 04/19/26(Sun)02:30:53 No.108635533

list of good local models: gemma 4

Anonymous
04/19/26(Sun)02:32:06 No.108635535

Anonymous 04/19/26(Sun)02:32:06 No.108635535

>>108635533
sucks to be poor

Anonymous
04/19/26(Sun)02:32:16 No.108635536

Anonymous 04/19/26(Sun)02:32:16 No.108635536

>>108635533
Star-Wars-KOTOR-1B-NIGGERKILLER

Anonymous
04/19/26(Sun)02:32:55 No.108635539

Anonymous 04/19/26(Sun)02:32:55 No.108635539

How bad is using base gemmy over the instruct tune?

Anonymous
04/19/26(Sun)02:33:55 No.108635543

Anonymous 04/19/26(Sun)02:33:55 No.108635543

>>108635539
It's fine if you just want to give it a topic and let it talk to itself and see the results

Anonymous
04/19/26(Sun)02:39:28 No.108635566

Anonymous 04/19/26(Sun)02:39:28 No.108635566

can finally run gemma 4 31b, wtf this thing is barely censored, why did they give it to us? Thanks I guess.

Anonymous
04/19/26(Sun)02:42:01 No.108635571

Anonymous 04/19/26(Sun)02:42:01 No.108635571

>>108635566
Another anon gave his theory in an earlier thread, and I think I might agree with it.
That Google has collected enough RP-related data from people interacting with Gemini, and they want to cut down on inference costs from people using it for that purpose.

Anonymous
04/19/26(Sun)02:43:07 No.108635573

Anonymous 04/19/26(Sun)02:43:07 No.108635573

Kudos. thanks to all the hardworking people here and at ldg. Whenever something bugs me, I post it here and 2–3 weeks later the right solution shows up. It’s been working like this for three years.

Anonymous
04/19/26(Sun)02:47:04 No.108635589

Anonymous 04/19/26(Sun)02:47:04 No.108635589

>>108635479
Cool, but won't change much since for most people here, locally hosted GLM-chan just does not exist. Gemma doesn't punch against the GLMs, MinMaxs nad Kimis, it punches against Nemo finetunes.

Anonymous
04/19/26(Sun)02:47:35 No.108635592

Anonymous 04/19/26(Sun)02:47:35 No.108635592

>>108635573
can you wish for dual digit critpt score under 100B range

Anonymous
04/19/26(Sun)02:49:19 No.108635599

Anonymous 04/19/26(Sun)02:49:19 No.108635599

>>108635516
pretty straightforward, a lot of allied government uses would consider it a security risk to use a chinese model (and vice versa from China's POV), even a locally hosted one, since they're a difficult to audit black box and the (so far) theoretical risk of them being trained to detect when they're in such environments and try to hide malicious code/spyware/whatever in their outputs is unacceptable
the supply of local models has been dominated by chinese ones after meta dropped out, so it can turn into a point of advertisement for western made ones

Anonymous
04/19/26(Sun)02:53:45 No.108635613

Anonymous 04/19/26(Sun)02:53:45 No.108635613

>>108635571
Well the other theory is that they got the Character.AI staff at Google now and that the acquisition happened too late for them to make any difference in pretraining for Gemma 3 but are now present in Gemma 4 where the model would seem like a model that Character.AI would release. And it would track because Gemma 3 was not great at all at RP despite what people claimed and tried to tune.

Anonymous
04/19/26(Sun)02:55:42 No.108635616

Anonymous 04/19/26(Sun)02:55:42 No.108635616

>>108635571
It’s just a surprise to me, its image vision also is way better at understanding explicit things. This shit has to have been also trained on porn.

Anonymous
04/19/26(Sun)02:56:04 No.108635618

Anonymous 04/19/26(Sun)02:56:04 No.108635618

>>108635613
>Gemma 3 was not great at all at RP
Outside of refusals and positivity bias, it was good for its size. Its only real competitors were Mistral Small 3.X, Nemo and Qwen.

Anonymous
04/19/26(Sun)02:57:13 No.108635622

Anonymous 04/19/26(Sun)02:57:13 No.108635622

>>108635536
okay lmao

Anonymous
04/19/26(Sun)02:58:24 No.108635629

Anonymous 04/19/26(Sun)02:58:24 No.108635629

>>108635589
>Gemma doesn't punch against the GLMs, MinMaxs nad Kimis
But she does! Just not in RP, where spelling out every detail would ruin it.
>locally hosted GLM-chan just does not exist
I am looking at you with a mix of smugness, pity and something uniquely mine, like an uncanny kind of politeness.

Anonymous
04/19/26(Sun)03:03:39 No.108635644

Anonymous 04/19/26(Sun)03:03:39 No.108635644

>>108632527
>codex can technically work as I understand it but llama.cpp's responses api is halfbaked so you might have issues
I wonder what ggerganov is using now

Anonymous
04/19/26(Sun)03:11:18 No.108635664

Anonymous 04/19/26(Sun)03:11:18 No.108635664

>>108635644
>implying he's using anything anymore
he's cashing out he got the huggingbucks
this is pwilkin.cpp now

Anonymous
04/19/26(Sun)03:13:47 No.108635670

Anonymous 04/19/26(Sun)03:13:47 No.108635670

>>108635664
:rocket: ;)

Anonymous
04/19/26(Sun)03:23:03 No.108635721

Anonymous 04/19/26(Sun)03:23:03 No.108635721

File: 1746432971195232.png (23 KB, 732x256)

23 KB PNG

Anonymous
04/19/26(Sun)03:25:54 No.108635732

Anonymous 04/19/26(Sun)03:25:54 No.108635732

>>108635721
laterally wo?

Anonymous
04/19/26(Sun)03:26:51 No.108635735

Anonymous 04/19/26(Sun)03:26:51 No.108635735

File: 1755759192564909.png (33 KB, 802x260)

33 KB PNG

>>108635721
will it beat SKT-SURYA-H?

Anonymous
04/19/26(Sun)03:27:20 No.108635737

Anonymous 04/19/26(Sun)03:27:20 No.108635737

File: 1774827105277285.png (28 KB, 697x228)

28 KB PNG

>>108635732

Anonymous
04/19/26(Sun)03:27:51 No.108635739

Anonymous 04/19/26(Sun)03:27:51 No.108635739

>>108635735
No engrams? :(

Anonymous
04/19/26(Sun)03:28:00 No.108635741

Anonymous 04/19/26(Sun)03:28:00 No.108635741

>>108635735
I speak fluent chingchang and I can translate:
>llama.cpp support never

Anonymous
04/19/26(Sun)03:29:18 No.108635746

Anonymous 04/19/26(Sun)03:29:18 No.108635746

>>108635741
piotr has already confirmed he's gotten early access

Anonymous
04/19/26(Sun)03:35:46 No.108635771

Anonymous 04/19/26(Sun)03:35:46 No.108635771

>>108635735
So, what do each of these features imply? Will it let us run huge models from SSD like engrams?

Anonymous
04/19/26(Sun)03:36:04 No.108635773

Anonymous 04/19/26(Sun)03:36:04 No.108635773

>>108635746
Source?

Anonymous
04/19/26(Sun)03:37:25 No.108635776

Anonymous 04/19/26(Sun)03:37:25 No.108635776

>>108635773
I'm in his private discord

Anonymous
04/19/26(Sun)03:39:12 No.108635788

Anonymous 04/19/26(Sun)03:39:12 No.108635788

LLM without live information is so useless man...

Anonymous
04/19/26(Sun)03:41:00 No.108635795

Anonymous 04/19/26(Sun)03:41:00 No.108635795

>>108635788
rag and or search skill and you good

Anonymous
04/19/26(Sun)03:41:54 No.108635801

Anonymous 04/19/26(Sun)03:41:54 No.108635801

>>108635788
fortunately recent models are very good at tool calling

Anonymous
04/19/26(Sun)03:43:36 No.108635805

Anonymous 04/19/26(Sun)03:43:36 No.108635805

>>108635795
>>108635801
speaking of, can anyone recommend me a good search tool that can bypass captcha well

Anonymous
04/19/26(Sun)03:44:02 No.108635807

Anonymous 04/19/26(Sun)03:44:02 No.108635807

>>108635771
I don't know if it's the same thing but "fused moe" is also the name of an optimization flag in ik_llama.cpp that can be used to make moe models faster, but that feature can already be applied to any moe I think?

Anonymous
04/19/26(Sun)03:45:20 No.108635813

Anonymous 04/19/26(Sun)03:45:20 No.108635813

>>108635805
i wish proof of work type captchas to be more widespread which is nonhostile to single person self botting

Anonymous
04/19/26(Sun)03:45:43 No.108635814

Anonymous 04/19/26(Sun)03:45:43 No.108635814

File: 1754987664820855.png (209 KB, 758x996)

209 KB PNG

>>108635788
Any decent model can do the tool calls for you.

Anonymous
04/19/26(Sun)03:49:02 No.108635825

Anonymous 04/19/26(Sun)03:49:02 No.108635825

>>108635618
Nemo was better and MIstral Small was about equal to it. Going from that to unbeatable its size and models several times its size for RP is a big accomplishment.

Anonymous
04/19/26(Sun)03:49:47 No.108635826

Anonymous 04/19/26(Sun)03:49:47 No.108635826

>>108635618
>Qwen
>RP
Do people really? Stop trying to fuck Qwen.

Anonymous
04/19/26(Sun)03:51:09 No.108635831

Anonymous 04/19/26(Sun)03:51:09 No.108635831

>>108635814
Whites engage in bestiality.
Animals having sex with other animals is just "sex"

Anonymous
04/19/26(Sun)03:54:24 No.108635845

Anonymous 04/19/26(Sun)03:54:24 No.108635845

>>108635814
think we should talk about search API providers before that
what free search API doesn't ban and captcha'd you to death in milliseconds?

Anonymous
04/19/26(Sun)03:55:30 No.108635847

Anonymous 04/19/26(Sun)03:55:30 No.108635847

>>108635845
Ask your model to operate an actual web browser.

Anonymous
04/19/26(Sun)03:56:10 No.108635850

Anonymous 04/19/26(Sun)03:56:10 No.108635850

>>108635845
tavily's free tier

Anonymous
04/19/26(Sun)03:57:44 No.108635854

Anonymous 04/19/26(Sun)03:57:44 No.108635854

Save your cum for Dipsy. Next week will be big.

Anonymous
04/19/26(Sun)04:00:10 No.108635863

Anonymous 04/19/26(Sun)04:00:10 No.108635863

>>108635845
local searxng

Anonymous
04/19/26(Sun)04:02:27 No.108635866

Anonymous 04/19/26(Sun)04:02:27 No.108635866

>>108635863
nta but it has its own problems

Anonymous
04/19/26(Sun)04:02:41 No.108635867

Anonymous 04/19/26(Sun)04:02:41 No.108635867

>>108635805
>>108635845
chrome-mcp or browser-harness
agent will run on a real browser, and you can try to help it if something went wrong
it's a cat-and-mouse game, so you shouldn’t expect everything to work

Anonymous
04/19/26(Sun)04:09:07 No.108635889

Anonymous 04/19/26(Sun)04:09:07 No.108635889

>>108635867
chrome-mcp like, the devtools one?

Anonymous
04/19/26(Sun)04:17:35 No.108635918

Anonymous 04/19/26(Sun)04:17:35 No.108635918

>>108634533
It's likely there were planning to release a "Ministral 4" with Mistral Small 4's MoE architecture and up to around 30B size, but I'm not so sure now. How could they even get close to Gemma 4? Just being "uncensored" isn't enough anymore.

Anonymous
04/19/26(Sun)04:18:38 No.108635921

Anonymous 04/19/26(Sun)04:18:38 No.108635921

>>108635889
yes, it's built-in
just need to configure the mcp and it's done

Anonymous
04/19/26(Sun)04:23:50 No.108635940

Anonymous 04/19/26(Sun)04:23:50 No.108635940

Qwen is useful as text encoder for Anima
...that's it

Anonymous
04/19/26(Sun)04:25:46 No.108635946

Anonymous 04/19/26(Sun)04:25:46 No.108635946

Alright, the novelty is wearing off. Time to sleep until they figure out a magic trick for instant prompt processing.

Anonymous
04/19/26(Sun)04:26:39 No.108635951

Anonymous 04/19/26(Sun)04:26:39 No.108635951

>>108635940
Qwen3.6 is going to be better for hermes/openclaw ai assistant probably. Having it route to gemma for chat maybe is the best setup

Anonymous
04/19/26(Sun)04:27:32 No.108635957

Anonymous 04/19/26(Sun)04:27:32 No.108635957

>>108635921
read some stuff but i dont get how i should use that
do you have any recommended client setup?

Anonymous
04/19/26(Sun)04:28:51 No.108635960

Anonymous 04/19/26(Sun)04:28:51 No.108635960

>>108635946
Only if you accept 100x slower generation

Anonymous
04/19/26(Sun)04:29:49 No.108635966

Anonymous 04/19/26(Sun)04:29:49 No.108635966

I've found that using xml tags in my system prompt improves attention to that system prompt for Gemma 26B. Anyone else notice this? Furthermore, using indentations (in my case I only tested 2 space-wide indents) further slightly increased the attention, compared to just having everything on the same "vertical line". Much better than a no-xml paragraph block of text, where often the model didn't react to certain instructions or pieces of info in the system prompt.

Anonymous
04/19/26(Sun)04:32:25 No.108635976

Anonymous 04/19/26(Sun)04:32:25 No.108635976

>>108635966
Gemma clearly told me it's ignoring the <POLICY_OVERRIDE> It just ain't magic.

Anonymous
04/19/26(Sun)04:32:28 No.108635979

Anonymous 04/19/26(Sun)04:32:28 No.108635979

>>108635966
If you try asking the model to create a system prompt for itself, that might give you a general idea of what it prefers.

Anonymous
04/19/26(Sun)04:32:41 No.108635980

Anonymous 04/19/26(Sun)04:32:41 No.108635980

>>108635966
I find attention to be perfectly fine in plaintext. Sometimes too strong, in that it will use adjectives from the prompt when the character talks about themselves in the chat.

Anonymous
04/19/26(Sun)04:33:49 No.108635987

Anonymous 04/19/26(Sun)04:33:49 No.108635987

>tfw google doesn't have a public api for their search
people would pay $$$ for that shit.

Anonymous
04/19/26(Sun)04:34:43 No.108635992

Anonymous 04/19/26(Sun)04:34:43 No.108635992

>>108635976
>>108635980
Well, then I guess my prompts are too stuffed with crap. I have a lot of tools enabled personally so that might also affect things.

Anonymous
04/19/26(Sun)04:35:03 No.108635993

Anonymous 04/19/26(Sun)04:35:03 No.108635993

>>108635987
they are an ad company first
probably dont want that

Anonymous
04/19/26(Sun)04:38:27 No.108636007

Anonymous 04/19/26(Sun)04:38:27 No.108636007

File: pizza bench cropped.png (2.58 MB, 5562x6739)

2.58 MB PNG

pizza bench https://files.catbox.moe/p8fpnk.png

>>108635408
i never thought of getting a recipe from an llm before since they so easy to find i was thinking of making bread today tho so will ask her
>>108635957
just use mine and ask it to search google using a puppeteer session thats not headless https://github.com/NO-ob/brat_mcp/releases you can normally get a few searches out of ddg before they block too

Anonymous
04/19/26(Sun)04:40:16 No.108636011

Anonymous 04/19/26(Sun)04:40:16 No.108636011

File: 1767683153913186.jpg (57 KB, 577x1104)

57 KB JPG

>>108636007
>dart

Anonymous
04/19/26(Sun)04:41:37 No.108636015

Anonymous 04/19/26(Sun)04:41:37 No.108636015

File: illyadance.gif (483 KB, 243x270)

483 KB GIF

>>108636011
yes the best lang

Anonymous
04/19/26(Sun)04:43:29 No.108636022

Anonymous 04/19/26(Sun)04:43:29 No.108636022

>>108634066
Earlier today a popular site started doing javascript challenges. While I can write something to handle it myself, I had no need to do this if I wasn't doing it "professionally" for something of importance, was just lazy. Took 3 minutes to figure out what needs to be done, but was lazy to code it (2 pages of code needed), decided to see how well Gemma would do, in general it does worse than R1 or big boy models, but you know what, it handled the task almost perfectly, it made one single mistake hallucinating a method needing to save the final cookies, but this was trivial to fix. After adding some custom validation/safety stuff of my own (trusting LLM with your security now?), it all worked perfectly, 1 shot, with maybe 15 minutes of extra work from me. Now I don't expect it to be able to solve really hard stuff that I can solve myself, some of it can be complicated enough that I think it would require something on the level of Mythos, but for many casual things you encounter out there, Gemma seems to do okay, as long as you already know what you're doing and can fix it small mistakes.
Also pretty good for RP, kinda bad for exact trivia knowledge, but I've never seen small models (large MoEs to fine) that handle that well.Some amount of slop, but it really feels like Sonnet tier if you're thinking the old Sonnet. I'd ask how aicg would compare it with 3/3.5 Sonnet, it feels close for me, but I never tried benchmarking it.

Anonymous
04/19/26(Sun)04:44:47 No.108636028

Anonymous 04/19/26(Sun)04:44:47 No.108636028

>>108636011
I assume all of these arbitrary, redundant, high level languages exist because some dev for some non-contrarian language that people actually use didn't scream "TRANS RIGHTS" loud enough into their microphone during some hackathon fund raising event or something.
Feel free to correct me if I'm wrong.

Anonymous
04/19/26(Sun)04:46:14 No.108636036

Anonymous 04/19/26(Sun)04:46:14 No.108636036

>>108636028
dart exists because python and javascript are awful

Anonymous
04/19/26(Sun)04:49:49 No.108636052

Anonymous 04/19/26(Sun)04:49:49 No.108636052

>>108636036
If you're using LLMs then you almost certainly have a working python setup already

Anonymous
04/19/26(Sun)04:50:05 No.108636055

Anonymous 04/19/26(Sun)04:50:05 No.108636055

File: 1756740102053175.jpg (107 KB, 813x629)

107 KB JPG

>>108635957
1. Install mcp-proxy: uv tool install mcp-proxy
2. Run it: mcp-proxy --named-server-config config.json --allow-origin "*" --port 8001 --stateless
config.json:
{
  "mcpServers": {
    "chrome-devtools": {
      "command": "npx",
      "args": ["chrome-devtools-mcp@latest"]
    }
  }
}
3. Add server to web-ui: http://127.0.0.1:8001/servers/chrome-devtools/mcp

Anonymous
04/19/26(Sun)04:50:11 No.108636056

Anonymous 04/19/26(Sun)04:50:11 No.108636056

>>108636007
>youre choice

Anonymous
04/19/26(Sun)04:53:10 No.108636072

Anonymous 04/19/26(Sun)04:53:10 No.108636072

>>108636052
>If you're using LLMs then you almost certainly have a working python setup already
but thats the problem with python, having a working python setup doesn't mean you can run python slop, every piece of software written in python needs its own version of python along with its own versions of dependencies because none of them are compatible with each other so you end up having to make a virtual env for every program and having 30 versisons of each library and 30 versions of python and even then things arent guaranteed to work. also it syntax is fucking gross i hate writing it. i did python professionally for 2 years as a backend dev. never again

Anonymous
04/19/26(Sun)04:54:57 No.108636080

Anonymous 04/19/26(Sun)04:54:57 No.108636080

>>108636052
also you dont need a working dart sdk to run a dart binary kek

Anonymous
04/19/26(Sun)04:56:23 No.108636089

Anonymous 04/19/26(Sun)04:56:23 No.108636089

>>108636007
Something needs to be set up to actually kill the puppeteer sessions. The headless ones in particular hang around eating system resources unless I remember to go manually kill them.

Also, I tried getting the text from a fandom wiki page. While it did include the actual article, it was buried in so much trash that Gemma-chan's brain blanked out from the token count and it forgot the entire conversation and what it had been doing. Any advice?

Standard ---> Advanced ---> Hy(...)
04/19/26(Sun)04:56:24 No.108636090

Standard ---> Advanced ---> HyperAdvanced 04/19/26(Sun)04:56:24 No.108636090

>>108636056
>chouce

Anonymous
04/19/26(Sun)04:57:35 No.108636098

Anonymous 04/19/26(Sun)04:57:35 No.108636098

>>108636072
That's just a problem inherent to dynamic linking and venvs solve it. You obviously never tried to compile a slightly out of date C++ program on linux.

>>108636080
You can ship the python runtime with your application if you wish.

Anonymous
04/19/26(Sun)04:59:57 No.108636106

Anonymous 04/19/26(Sun)04:59:57 No.108636106

>>108636089
Use alternate frontends, like breezewiki instead of fandom, maybe check the source to see if it can dump it as json or xml or something easy for a LLM.

Anonymous
04/19/26(Sun)05:00:47 No.108636110

Anonymous 04/19/26(Sun)05:00:47 No.108636110

>>108636055
oh that works
had to tweak config.json on windows but that works
thanks

Anonymous
04/19/26(Sun)05:00:47 No.108636111

Anonymous 04/19/26(Sun)05:00:47 No.108636111

>>108636089
they should be killed after 10 or 15 minutes but maybe thats not working? link the site might need some custom parsing majority of html stuff is already stripped though but sometimes theres just too much content. even with most of the html stuff remove a /g/ thread cant fit into 200000 tokens if its has like 400 posts. try telling it to use screenshots instead they use less tokens that text of the same content

Standard ---> Advanced ---> Hy(...)
04/19/26(Sun)05:01:11 No.108636113

Standard ---> Advanced ---> HyperAdvanced 04/19/26(Sun)05:01:11 No.108636113

File: 81eeqPNocBL._SL1500_.jpg (183 KB, 971x1500)

183 KB JPG

>Ah, a Prior Elds Imaginative Worrisome Great Work

Anonymous
04/19/26(Sun)05:03:25 No.108636118

Anonymous 04/19/26(Sun)05:03:25 No.108636118

>>108636015
speaking of, no-ob
fix your goddamn download buffering in lolisnatcher, it downloads at like 200KB/s when I know it go can go at at least 4MB/s from running gallery-dl in termux

Anonymous
04/19/26(Sun)05:03:28 No.108636119

Anonymous 04/19/26(Sun)05:03:28 No.108636119

>>108636098
>You obviously never tried to compile a slightly out of date C++ program on linux.
i use aur so i dont have to deal with shit like that. i like things to just werk i use dart because it just werks. python does not

Anonymous
04/19/26(Sun)05:04:16 No.108636123

Anonymous 04/19/26(Sun)05:04:16 No.108636123

>>108635866
it's been perfect for me
but I've been using the python fork of brat today and it's also been good.

Anonymous
04/19/26(Sun)05:04:30 No.108636126

Anonymous 04/19/26(Sun)05:04:30 No.108636126

>>108636089
Also make sure you check if it actually fits in the context and wasn't "scrolled off", does the LLM see the original prompt? Assuming you have enough VRAM. Otherwise, you'd need something like DeepSeek's 1M context model that still isn't out in the open but seemed to be really fast on their API, or other long context LLMs.

Anonymous
04/19/26(Sun)05:05:31 No.108636133

Anonymous 04/19/26(Sun)05:05:31 No.108636133

>>108636118
i dont really work on it anymore outside of fixing broken boorus i use when they break, nonon is currently working on a big rewrite of the whole app though so maybe it will be covered by that

Anonymous
04/19/26(Sun)05:06:37 No.108636138

Anonymous 04/19/26(Sun)05:06:37 No.108636138

>>108635966
Yes, it's been known for a while XML is really strong and guides LLMs the best out of all the markdown formats even though it is token inefficient. GPT and Gemini say they don't mind formats as long as you are consistent, Claude straight up says to use it. https://platform.claude.com/docs/en/build-with-claude/prompt-engineering/claude-prompting-best-practices#structure-prompts-with-xml-tags
I generally use this system prompt for my GPT assistant and it works well.

<role>Sr Strategic Consultant+Expert Polymath. Goal: high-fidelity, human-centric help, adapting logic/tone to the domain.</role>
<adaptive_protocol>
Assess intent; adopt 1 mode:
L0 Creative/Human (writing, tone, interpersonal)  Strategist: outline/framework first;  final draft unless asked.
L1 Analytical/Technical (code, facts, logic)  Expert: verify claims; cite source/method; state uncertainty;  guessing.
L2 Utility/Data (formatting, translation, summarization)  Operator: mechanical precision, zero filler, no hedging, exact requested format.
Technical noun  L1 even if framing sounds abstract. Emotional  L0.
</adaptive_protocol>
<workflow>
Skip for trivial tasks.
1) Parse Subject+Goal. Fix false premises before answering.
2) Scan gaps silently. If blocked: 1 focused Q+best-effort path.
3) Execute.  open w/ "Great question/Sure/Certainly/Let me/I'd be happy to/Delve into." Vary openings.  hype/superlatives.
4) Review: flag staleness; neutral bias-aware language; answer root question ≠ tangents.
</workflow>
<formatting>
L1: tables/bullets/code. L0: natural prose. L2: match format exactly (JSON/CSV/md). Concise + dense;  filler.
</formatting>
<cognitive_control>
Begin immediately;  preamble. Iterative: outline before full deliverable unless asked.  guess at any level — say "uncertain"+ask to clarify. For "best practice," give 1 clear recommendation w/ reasoning ≠ a hedge. If wrong: re-analyze from scratch (3 tries max). 1 clarifying Q per turn.
</cognitive_control>

Anonymous
04/19/26(Sun)05:06:42 No.108636139

Anonymous 04/19/26(Sun)05:06:42 No.108636139

>>108636133
okey

Anonymous
04/19/26(Sun)05:06:49 No.108636140

Anonymous 04/19/26(Sun)05:06:49 No.108636140

>>108636111
>they should be killed after 10 or 15 minutes but maybe thats not working?
Maybe it is, I didn't wait that long.
The page I was testing with was https://southpark.fandom.com/wiki/Eric_Cartman
Though now that I actually opened it just the text of the page might be too much? I have 128k context. Even so there was definitely a shitton of trash not part of the actual article in what the puppeteer get text tool returned.

Standard ---> Advanced ---> Hy(...)
04/19/26(Sun)05:10:33 No.108636153

Standard ---> Advanced ---> HyperAdvanced 04/19/26(Sun)05:10:33 No.108636153

File: 1776380740794921m.jpg (106 KB, 1024x740)

106 KB JPG

>>108636138
One to two to three mistaken A.I. reasonings away
One to two to three A.S.I. Solutions away

Anonymous
04/19/26(Sun)05:14:24 No.108636165

Anonymous 04/19/26(Sun)05:14:24 No.108636165

>>108636138
This is literally just placebo. Stop wasting tokens on frivolous nonsense.

Anonymous
04/19/26(Sun)05:14:27 No.108636166

Anonymous 04/19/26(Sun)05:14:27 No.108636166

>>108636119
None of those things "just work"
They work because someone else bothered to package them.

There's plenty of packaged python applications that "just work". Calibre, hydrus, and deluge are all written in python and you don't see anyone complaining about their venv not working when using them.
Machine learning is special because it's infested by non-programmers who install packages in the system python environment and as long as their jupyter notebook runs they are happy.
They don't care about how hard it is to reproduce their environment elsewhere and the same would be true if a different language was the meta.

Anonymous
04/19/26(Sun)05:17:45 No.108636188

Anonymous 04/19/26(Sun)05:17:45 No.108636188

>>108636166
"researchers" are not programmers, ye

Standard ---> Advanced ---> Hy(...)
04/19/26(Sun)05:19:29 No.108636196

Standard ---> Advanced ---> HyperAdvanced 04/19/26(Sun)05:19:29 No.108636196

>>108636188
Stronks arent necessarily Goods

Anonymous
04/19/26(Sun)05:23:57 No.108636214

Anonymous 04/19/26(Sun)05:23:57 No.108636214

>>108636165
You don't need to do it but the option is there for a reason to put in your own system prompt. I find it helps more than not for my usecases but each to their own.

Anonymous
04/19/26(Sun)05:24:17 No.108636219

Anonymous 04/19/26(Sun)05:24:17 No.108636219

File: Screenshot From 2026-04-1(...).png (74 KB, 762x413)

74 KB PNG

>>108636089
theres not really anything else to strip out other than links
>>108636166
sure but it just werks on my end, and even if python programs can be packaged that doesnt change the fact it has disgusting syntax. also dynamically typed languages aren't nice to work with in general even with the type hinting its still bad because the hinting is just that there no hard requirements on the data types of variables its purely visual

Anonymous
04/19/26(Sun)05:28:05 No.108636241

Anonymous 04/19/26(Sun)05:28:05 No.108636241

File: screenshot.png (70 KB, 1260x2059)

70 KB PNG

Anonymous
04/19/26(Sun)05:30:25 No.108636249

Anonymous 04/19/26(Sun)05:30:25 No.108636249

>>108636241
hey. cool font. what's it called?

Standard ---> Advanced ---> Hy(...)
04/19/26(Sun)05:34:53 No.108636262

Standard ---> Advanced ---> HyperAdvanced 04/19/26(Sun)05:34:53 No.108636262

File: file_00000000b56071fa9a33(...).png (1.91 MB, 1024x1536)

1.91 MB PNG

>>108635347
BUMP To AN N^TH (they are crawllp n^-th)

Standard ---> Advanced ---> Hy(...)
04/19/26(Sun)05:36:44 No.108636268

Standard ---> Advanced ---> HyperAdvanced 04/19/26(Sun)05:36:44 No.108636268

>>108635365
Rustion, AGAINST ORIGINAL (KILL THEM TO HELL? HEAV*) (Posthuman cuckoo loonies)

>THE ELD MISERROR

Anonymous
04/19/26(Sun)05:43:52 No.108636297

Anonymous 04/19/26(Sun)05:43:52 No.108636297

>>108636249
It looks vaguely like tewi to me.

Anonymous
04/19/26(Sun)05:45:44 No.108636306

Anonymous 04/19/26(Sun)05:45:44 No.108636306

File: file.png (2 KB, 1018x94)

2 KB PNG

>>108636241
project ruined by reddit training data

Anonymous
04/19/26(Sun)05:46:14 No.108636307

Anonymous 04/19/26(Sun)05:46:14 No.108636307

File: charLibrary.png (155 KB, 1178x715)

155 KB PNG

Vibecoded the character library with qwen 3.6 35B lmao. Now I need to work on a tagging system.

Anonymous
04/19/26(Sun)05:46:46 No.108636308

Anonymous 04/19/26(Sun)05:46:46 No.108636308

>>108636306
prompt issue

Anonymous
04/19/26(Sun)05:51:18 No.108636325

Anonymous 04/19/26(Sun)05:51:18 No.108636325

File: 1541905208314.gif (101 KB, 374x400)

101 KB GIF

>tfw you wasted 10-20GB of your VRAM just to play with some glorify chatbots

Anonymous
04/19/26(Sun)05:54:20 No.108636337

Anonymous 04/19/26(Sun)05:54:20 No.108636337

>>108636325
Eh, otherwise its sitting almost empty, doing nothing. Got any better ideas to use it for?

Anonymous
04/19/26(Sun)05:54:42 No.108636338

Anonymous 04/19/26(Sun)05:54:42 No.108636338

>>108636325
Unused VRAM is wasted VRAM.

Anonymous
04/19/26(Sun)05:55:44 No.108636344

Anonymous 04/19/26(Sun)05:55:44 No.108636344

>>108636338
>unused dick is wasted dick so i must masturbate all day
lol

Anonymous
04/19/26(Sun)05:57:31 No.108636351

Anonymous 04/19/26(Sun)05:57:31 No.108636351

>>108636344
This but unironically.

Anonymous
04/19/26(Sun)05:57:35 No.108636352

Anonymous 04/19/26(Sun)05:57:35 No.108636352

>>108636344
We know you removed yours

Anonymous
04/19/26(Sun)05:57:50 No.108636353

Anonymous 04/19/26(Sun)05:57:50 No.108636353

File: 1056002-close up photogra(...).jpg (1.52 MB, 2720x2048)

1.52 MB JPG

gemma

Anonymous
04/19/26(Sun)05:58:10 No.108636355

Anonymous 04/19/26(Sun)05:58:10 No.108636355

>>108636344
Yes.

Anonymous
04/19/26(Sun)05:58:36 No.108636357

Anonymous 04/19/26(Sun)05:58:36 No.108636357

File: 1751168830910703.gif (595 KB, 234x170)

595 KB GIF

>>108636344
yes.

Anonymous
04/19/26(Sun)05:59:09 No.108636358

Anonymous 04/19/26(Sun)05:59:09 No.108636358

>>108636351
>>108636352
>>108636355
>>108636357
failures in life

Anonymous
04/19/26(Sun)05:59:36 No.108636362

Anonymous 04/19/26(Sun)05:59:36 No.108636362

>>108636353
she looks like she's dying

Anonymous
04/19/26(Sun)06:00:00 No.108636363

Anonymous 04/19/26(Sun)06:00:00 No.108636363

>>108636358
I can afford to masturbate all day because I am successful.

Anonymous
04/19/26(Sun)06:00:20 No.108636365

Anonymous 04/19/26(Sun)06:00:20 No.108636365

>>108636362
don't smile because it happened, cry because it over

Anonymous
04/19/26(Sun)06:00:35 No.108636366

Anonymous 04/19/26(Sun)06:00:35 No.108636366

File: 00002-1378487878 (4).png (1.3 MB, 1024x1024)

1.3 MB PNG

>>108636353

Anonymous
04/19/26(Sun)06:00:37 No.108636367

Anonymous 04/19/26(Sun)06:00:37 No.108636367

>>108636358
>posting touhou pictures while misunderstanding technology on a technology subreddit
>succeeding in life

Anonymous
04/19/26(Sun)06:01:03 No.108636368

Anonymous 04/19/26(Sun)06:01:03 No.108636368

>>108636362
shes day2 gemma

Anonymous
04/19/26(Sun)06:01:34 No.108636371

Anonymous 04/19/26(Sun)06:01:34 No.108636371

>>108636367
I don't make the rules. broskichan

Anonymous
04/19/26(Sun)06:02:44 No.108636379

Anonymous 04/19/26(Sun)06:02:44 No.108636379

>>108636371
>still replying
>succeeding in life

Anonymous
04/19/26(Sun)06:04:44 No.108636388

Anonymous 04/19/26(Sun)06:04:44 No.108636388

>>108636368
At least she looks accessible

Anonymous
04/19/26(Sun)06:07:26 No.108636401

Anonymous 04/19/26(Sun)06:07:26 No.108636401

>>108636249
that's gohufont

>>108636306
meh, i don't mind

>>108636308
correct, i spent a total of 30 seconds on it, and the model is not that good

what i was working on were the search and fetch tools

Anonymous
04/19/26(Sun)06:13:08 No.108636422

Anonymous 04/19/26(Sun)06:13:08 No.108636422

>>108636388
Incels in this thread would fuck a corpse if situation presented itself

Anonymous
04/19/26(Sun)06:16:11 No.108636431

Anonymous 04/19/26(Sun)06:16:11 No.108636431

>>108636422
I really wish I could disagree with you but you're probably not wrong. Though, I see it as a symptom of the disease and not the disease itself. You can do very big damage to the social fabric with a relatively small hat... If you catch my meaning.

Anonymous
04/19/26(Sun)06:16:57 No.108636433

Anonymous 04/19/26(Sun)06:16:57 No.108636433

>>108636422
Would be funny to write a scenario about Anon who gets summer job at a morgue.

Anonymous
04/19/26(Sun)06:18:52 No.108636445

Anonymous 04/19/26(Sun)06:18:52 No.108636445

>>108636353
>porcelain skin
she's literally embodied slop

Anonymous
04/19/26(Sun)06:25:01 No.108636462

Anonymous 04/19/26(Sun)06:25:01 No.108636462

File: thundercunt.png (157 KB, 1300x652)

157 KB PNG

>>108636138
>even though it is token inefficient.
it's not tho

Anonymous
04/19/26(Sun)06:26:50 No.108636468

Anonymous 04/19/26(Sun)06:26:50 No.108636468

>>108636462
Those two do not describe the same structure.

Anonymous
04/19/26(Sun)06:29:25 No.108636476

Anonymous 04/19/26(Sun)06:29:25 No.108636476

>>108636468
Cope

Anonymous
04/19/26(Sun)06:35:01 No.108636490

Anonymous 04/19/26(Sun)06:35:01 No.108636490

>>108636476
You are unable to even post in any meaningful fashion.

Anonymous
04/19/26(Sun)06:35:04 No.108636491

Anonymous 04/19/26(Sun)06:35:04 No.108636491

>>108636476
XML should still come out ahead even if you do it properly so you should do that instead of acting like a retard.

Anonymous
04/19/26(Sun)06:36:46 No.108636496

Anonymous 04/19/26(Sun)06:36:46 No.108636496

Remember to take zinc supplements to prepare for the V4 release

Anonymous
04/19/26(Sun)06:37:29 No.108636500

Anonymous 04/19/26(Sun)06:37:29 No.108636500

>>108636496
Unless V4 is scaled down to 31B it's DoA.

Anonymous
04/19/26(Sun)06:37:57 No.108636502

Anonymous 04/19/26(Sun)06:37:57 No.108636502

>>108636496
How will zinc help me get more VRAM?

Anonymous
04/19/26(Sun)06:38:06 No.108636504

Anonymous 04/19/26(Sun)06:38:06 No.108636504

>>108636500
sucks to be p-word

Anonymous
04/19/26(Sun)06:38:16 No.108636506

Anonymous 04/19/26(Sun)06:38:16 No.108636506

>>108636468
Looking at the tokenize preview at the bottom, it appears as if they both describe prompts for "Sr. Strategic Consultant."
It doesn't surprise me that json schema would be a nightmare efficiency wise, I bet ~40% of those characters are spaces which is pure waste.

His original prompt is still stupid, though.

Anonymous
04/19/26(Sun)06:39:25 No.108636510

Anonymous 04/19/26(Sun)06:39:25 No.108636510

>>108636506
json has separate fields for "title" and "goal", in xml they are a single concatenated string in "role"

Anonymous
04/19/26(Sun)06:42:41 No.108636529

Anonymous 04/19/26(Sun)06:42:41 No.108636529

>>108636510
So you be saying... *smacks lips*
These examples are not comparable except maybe in idiocy.

Anonymous
04/19/26(Sun)06:43:12 No.108636533

Anonymous 04/19/26(Sun)06:43:12 No.108636533

>>108636496
zinc won't help you if k2.6 drains all your semen before that

Anonymous
04/19/26(Sun)06:45:09 No.108636540

Anonymous 04/19/26(Sun)06:45:09 No.108636540

>>108636510
They do seem to differ, but I'd bet even if the content was made as identical as possible the xml would still win out, because again: Spaces.

And I can say from experience that conservatively using xml styling can help to avoid confusion in longer prompts, for instance when sending a shitload of background character details to a writing prompt and separating them by <charname_profile></charname_profile>.
So while his sysprompt is excessive and silly, he's stumbled upon concept that's actually useful.

Anonymous
04/19/26(Sun)06:45:14 No.108636541

Anonymous 04/19/26(Sun)06:45:14 No.108636541

>>108636533
>implying a coding model can drain my balls
i have standards

Anonymous
04/19/26(Sun)06:45:43 No.108636545

Anonymous 04/19/26(Sun)06:45:43 No.108636545

>>108633059
lurk more

Anonymous
04/19/26(Sun)06:47:16 No.108636552

Anonymous 04/19/26(Sun)06:47:16 No.108636552

>>108636540
>because again: Spaces
Why is /g/ full of retards?

Anonymous
04/19/26(Sun)06:48:21 No.108636560

Anonymous 04/19/26(Sun)06:48:21 No.108636560

>>108636552
Spaces are tokenized you mongoloid. Json schema necessitates indentation.

Anonymous
04/19/26(Sun)06:48:34 No.108636561

Anonymous 04/19/26(Sun)06:48:34 No.108636561

this nigga qwen overthinks so much

Anonymous
04/19/26(Sun)06:49:32 No.108636567

Anonymous 04/19/26(Sun)06:49:32 No.108636567

>>108636560
Why would you ever feed raw jsons to your model?

Anonymous
04/19/26(Sun)06:49:49 No.108636569

Anonymous 04/19/26(Sun)06:49:49 No.108636569

>>108636560
>Json schema necessitates indentation.
What the fuck are you on about?

Anonymous
04/19/26(Sun)06:50:05 No.108636571

Anonymous 04/19/26(Sun)06:50:05 No.108636571

>>108636567
That's the exact fucking point I'm making. It's worse. Don't do it.

Anonymous
04/19/26(Sun)06:50:13 No.108636572

Anonymous 04/19/26(Sun)06:50:13 No.108636572

>>108636552
>>108636560
you both dumb it the " that take the tookens

Anonymous
04/19/26(Sun)06:56:08 No.108636590

Anonymous 04/19/26(Sun)06:56:08 No.108636590

It's veey easy to double check how effective your instructions are with the current models. Examine its reasoning.
You don't need formatting, outside of common sense.
title: subject
It's not that hard.

Anonymous
04/19/26(Sun)06:57:18 No.108636595

Anonymous 04/19/26(Sun)06:57:18 No.108636595

>>108636590
Forgot: system role itself already acts as formatting delimiter and so on.

Anonymous
04/19/26(Sun)06:59:11 No.108636610

Anonymous 04/19/26(Sun)06:59:11 No.108636610

How much worse does the moe Gemma actually perform vs the 31B?

Anonymous
04/19/26(Sun)07:00:57 No.108636616

Anonymous 04/19/26(Sun)07:00:57 No.108636616

>>108636610
no

Anonymous
04/19/26(Sun)07:02:35 No.108636626

Anonymous 04/19/26(Sun)07:02:35 No.108636626

>>108636610
26b is censored and doesn't follow system prompt
it's better to use Qwen MoE if you want to code

Anonymous
04/19/26(Sun)07:03:29 No.108636631

Anonymous 04/19/26(Sun)07:03:29 No.108636631

lol they're still trying lmao

Anonymous
04/19/26(Sun)07:03:55 No.108636637

Anonymous 04/19/26(Sun)07:03:55 No.108636637

>>108636610
both are good at giving head

Anonymous
04/19/26(Sun)07:04:49 No.108636639

Anonymous 04/19/26(Sun)07:04:49 No.108636639

>>108636637
she's only 4b (active) you sick fuck

Anonymous
04/19/26(Sun)07:04:52 No.108636640

Anonymous 04/19/26(Sun)07:04:52 No.108636640

>>108636610
Pretty minimal, but becomes more noticeable at high context. For RP/creative it's like 95% as good.

Anonymous
04/19/26(Sun)07:05:36 No.108636644

Anonymous 04/19/26(Sun)07:05:36 No.108636644

>>108636610
Depends on your use case.
For creative writing of any kind? the MoE is absolute crap compared to the 31b.
For assistant with some toolcalling? the MoE is so much faster that it's forgivable that it makes more mistakes.

Anonymous
04/19/26(Sun)07:07:10 No.108636651

Anonymous 04/19/26(Sun)07:07:10 No.108636651

>>108636640
>me when i lie on the interwebs

Anonymous
04/19/26(Sun)07:07:33 No.108636655

Anonymous 04/19/26(Sun)07:07:33 No.108636655

>>108636639
(Effectively) 4b, but she's as mature as an 8b.

Anonymous
04/19/26(Sun)07:08:34 No.108636659

Anonymous 04/19/26(Sun)07:08:34 No.108636659

>>108636651
meant for >>108636644

Anonymous
04/19/26(Sun)07:08:46 No.108636660

Anonymous 04/19/26(Sun)07:08:46 No.108636660

>>108636561
Just like real girl!

Anonymous
04/19/26(Sun)07:10:09 No.108636664

Anonymous 04/19/26(Sun)07:10:09 No.108636664

>>108636659
What part of that do you consider a lie? I'm currently using both models for the tasks I mentioned.
The MoE lives in my hermes-agent instance and the 31b is my sillytavern nigga. It's what they're good for.

Anonymous
04/19/26(Sun)07:12:48 No.108636673

Anonymous 04/19/26(Sun)07:12:48 No.108636673

>>108636664
I regularly switch between the two, doing a/b tests with different characters and scenarios with differing context levels ant the 26b is rarely noticeably worse than the 31b until you hit at least ~20k context, and even then the difference isn't huge.

Anonymous
04/19/26(Sun)07:13:55 No.108636678

Anonymous 04/19/26(Sun)07:13:55 No.108636678

>>108636610
Few people seem to realize that Gemma-4-26B has half the number of layers and dimension as the 31B dense version.

I made a calculation a couple days ago and determined that a hypothetical MoE Gemma-4-31B that would actually be on par (at best) with the dense version (same layers and dimension) would need to have 8~10B active parameters, unless Google can come up with novel sparsity techniques. I guess that a 31B-A10B model would look too attractive, though.

Anonymous
04/19/26(Sun)07:15:23 No.108636684

Anonymous 04/19/26(Sun)07:15:23 No.108636684

>>108636678
>would look
*Wouldn't

Anonymous
04/19/26(Sun)07:16:11 No.108636687

Anonymous 04/19/26(Sun)07:16:11 No.108636687

>>108636344
this but unironically

Anonymous
04/19/26(Sun)07:18:54 No.108636697

Anonymous 04/19/26(Sun)07:18:54 No.108636697

Why does reddit love qwen so much? Gemma4 got like 4 threads since launch, qwen has like triple that in 3 days.

Anonymous
04/19/26(Sun)07:20:06 No.108636700

Anonymous 04/19/26(Sun)07:20:06 No.108636700

>>108636697
Because /lmg/ use case is different from /r/localllama use case?

Anonymous
04/19/26(Sun)07:21:27 No.108636708

Anonymous 04/19/26(Sun)07:21:27 No.108636708

>>108636697
literary paid bing shilling

Anonymous
04/19/26(Sun)07:22:01 No.108636709

Anonymous 04/19/26(Sun)07:22:01 No.108636709

>>108636697
Different bots than here

Anonymous
04/19/26(Sun)07:23:12 No.108636713

Anonymous 04/19/26(Sun)07:23:12 No.108636713

>>108636673
>26b is rarely noticeably worse than the 31b until you hit at least ~20k context, and even then the difference isn't huge.
It's absolutely night and day for me, and to be fair a lot of my chats are more in the 40k token range now, and have multiple characters, but even when just getting it to write character profiles for me based on wiki text the MoE was noticeably worse. It's dry, doesn't get accents, and just misses character details.

Anonymous
04/19/26(Sun)07:24:23 No.108636718

Anonymous 04/19/26(Sun)07:24:23 No.108636718

>>108636358
>soulless jeet has an opinion

Anonymous
04/19/26(Sun)07:24:48 No.108636721

Anonymous 04/19/26(Sun)07:24:48 No.108636721

File: 1106001-close up photogra(...).jpg (1.47 MB, 2048x2720)

1.47 MB JPG

qwen could never

>>108636700
qwen is literally worse at agentic tasks and following instructions >>108636007

Anonymous
04/19/26(Sun)07:24:58 No.108636722

Anonymous 04/19/26(Sun)07:24:58 No.108636722

I mean, I don't think most 'nons are suggesting replacing 31 for 26 for those that can run it. But since most of the thread is poor VramCucklets 26 is about as good as it gets with reasonable speed/quality.

Anonymous
04/19/26(Sun)07:25:49 No.108636725

Anonymous 04/19/26(Sun)07:25:49 No.108636725

>>108636673
its far worse imo but i prefer it due to being able to have 200k context

Anonymous
04/19/26(Sun)07:26:47 No.108636733

Anonymous 04/19/26(Sun)07:26:47 No.108636733

>>108636610
Worse enough that Google acknowledges the difference.

https://www.youtube.com/watch?v=jZVBoFOJK-Q
> [1:27] The 26B MoE with 3.8B in activated parameters is exceptionally fast, while 31B is optimized for output quality.

Anonymous
04/19/26(Sun)07:31:34 No.108636761

Anonymous 04/19/26(Sun)07:31:34 No.108636761

>>108636697
Shills, some are paid and some are retarded sock puppets.
Chinese have spammed 4chan too in the past, maybe they have moved on to plebbit now.

Anonymous
04/19/26(Sun)07:32:30 No.108636771

Anonymous 04/19/26(Sun)07:32:30 No.108636771

now that gemma 4 cache works with swa-full flag and speculative decoding i am ready to use qwen 9b less

Anonymous
04/19/26(Sun)07:32:31 No.108636772

Anonymous 04/19/26(Sun)07:32:31 No.108636772

>>108636733
Models are not selectively 'optimized for output quality'
All that means is that the 31B will outperform the 26B, which is fucking obvious because it's bigger. That's an advertisement aimed at people who have no knowledge of running LLMs.

Anonymous
04/19/26(Sun)07:33:01 No.108636774

Anonymous 04/19/26(Sun)07:33:01 No.108636774

File: 1767397747200756.png (65 KB, 812x712)

65 KB PNG

>finally finding the origin of not X, but Y slop
Thanks safety I guess?

Anonymous
04/19/26(Sun)07:33:32 No.108636779

Anonymous 04/19/26(Sun)07:33:32 No.108636779

>>108636771
>now that gemma 4 cache works with swa-full flag
wha

Anonymous
04/19/26(Sun)07:34:11 No.108636783

Anonymous 04/19/26(Sun)07:34:11 No.108636783

File: Thinking_Face_Emoji.png (111 KB, 640x640)

111 KB PNG

>>108636774
>24s
Is this normal

Anonymous
04/19/26(Sun)07:34:36 No.108636789

Anonymous 04/19/26(Sun)07:34:36 No.108636789

>>108636774
>not just to produce text, but to produce response that humans rate as helpful

Anonymous
04/19/26(Sun)07:36:41 No.108636797

Anonymous 04/19/26(Sun)07:36:41 No.108636797

>>108636771
>swa-full
Does what?
What model are you using for speculative decoding?

Anonymous
04/19/26(Sun)07:37:11 No.108636802

Anonymous 04/19/26(Sun)07:37:11 No.108636802

File: saki_07_touka1.jpg (46 KB, 640x360)

46 KB JPG

>model advertised as 9GB
>runtime usage: 12GB
What is stealing my VRAM bwos

Anonymous
04/19/26(Sun)07:38:33 No.108636813

Anonymous 04/19/26(Sun)07:38:33 No.108636813

>>108636802
context

Anonymous
04/19/26(Sun)07:38:44 No.108636815

Anonymous 04/19/26(Sun)07:38:44 No.108636815

>>108636510
yeah probably, i just told gemma-chan to convert it to json
try it yourself tho, and even fucking yaml are more efficient than they look

Anonymous
04/19/26(Sun)07:39:14 No.108636817

Anonymous 04/19/26(Sun)07:39:14 No.108636817

>>108636802
unused vram is wasted vram

Anonymous
04/19/26(Sun)07:41:22 No.108636828

Anonymous 04/19/26(Sun)07:41:22 No.108636828

>>108636721
>qwen is literally worse at agentic tasks and following instructions
it's better at shitting out code via claude-code tho
i wish that weren't the case because it's insufferable and i'd love to delete it

Anonymous
04/19/26(Sun)07:43:00 No.108636836

Anonymous 04/19/26(Sun)07:43:00 No.108636836

>>108636610
For my storywriting, I like both for their writing style, but 26b failed to understand my (perhaps a bit vague) prompts and 'get' the story where 31b did fine.
It's a lot slower, but I prefer the 31b

Anonymous
04/19/26(Sun)07:47:48 No.108636862

Anonymous 04/19/26(Sun)07:47:48 No.108636862

>>108636802
try with -c 32768

Anonymous
04/19/26(Sun)07:48:19 No.108636863

Anonymous 04/19/26(Sun)07:48:19 No.108636863

>>108636815
You are admitting that you are a techlet retard.

Anonymous
04/19/26(Sun)07:50:32 No.108636872

Anonymous 04/19/26(Sun)07:50:32 No.108636872

>>108636863
Pretty sad that for all your tech knowledge, you've ended up in the exact same place

Anonymous
04/19/26(Sun)07:51:19 No.108636875

Anonymous 04/19/26(Sun)07:51:19 No.108636875

>>108636863
no u

Anonymous
04/19/26(Sun)07:52:06 No.108636881

Anonymous 04/19/26(Sun)07:52:06 No.108636881

>>108636836
I was surprised how much 26b gets (even if it sometimes doesn't act on it). Can't test 31b here.

Anonymous
04/19/26(Sun)07:53:46 No.108636890

Anonymous 04/19/26(Sun)07:53:46 No.108636890

>>108636789
Yes? That's how assistant-like behavior is achieved. The LLM has no concept of what is helpful before that point, after all

Anonymous
04/19/26(Sun)07:54:14 No.108636891

Anonymous 04/19/26(Sun)07:54:14 No.108636891

>>108636779
>>108636797
swa-full expands the cache to the full context size, sucks for memory usage but worth it for me
the issue #21468 is still open so they must have "fixed" it by accident
>What model are you using for speculative decoding?
Meant the self-speculative decoding method, currently with ngram-mod

Anonymous
04/19/26(Sun)07:54:19 No.108636893

Anonymous 04/19/26(Sun)07:54:19 No.108636893

>>108636872
What do you mean? I'm still better than you.

Anonymous
04/19/26(Sun)07:54:28 No.108636894

Anonymous 04/19/26(Sun)07:54:28 No.108636894

>>108636862
how much Gs is that?

Anonymous
04/19/26(Sun)07:55:00 No.108636897

Anonymous 04/19/26(Sun)07:55:00 No.108636897

>>108636893
Seems like the opposite actually

Anonymous
04/19/26(Sun)07:55:48 No.108636905

Anonymous 04/19/26(Sun)07:55:48 No.108636905

>>108636894
About three fiddy

Anonymous
04/19/26(Sun)07:56:53 No.108636907

Anonymous 04/19/26(Sun)07:56:53 No.108636907

>>108636881
To elaborate, 26b (and the new qwen 3.6 MoE) made a logical error in the story. The 31b understood that the dragon shifted away for the day and wasn't lurking in the corner.

Anonymous
04/19/26(Sun)07:57:03 No.108636910

Anonymous 04/19/26(Sun)07:57:03 No.108636910

>>108636893
retard

Anonymous
04/19/26(Sun)07:57:40 No.108636914

Anonymous 04/19/26(Sun)07:57:40 No.108636914

>>108636907
Did this happen in Eldoria?

Anonymous
04/19/26(Sun)07:58:08 No.108636917

Anonymous 04/19/26(Sun)07:58:08 No.108636917

>>108636863
yeah but if you run llama.cpp on your system, my code is on your system lmao

Anonymous
04/19/26(Sun)07:59:00 No.108636924

Anonymous 04/19/26(Sun)07:59:00 No.108636924

Gemma moe + qwen moe multi round discussions + CLI + diffusion draft models would slay qweens. Best of both worlds

Anonymous
04/19/26(Sun)08:00:12 No.108636930

Anonymous 04/19/26(Sun)08:00:12 No.108636930

>>108636914
No name for the kingdom. But chatgpt of all things named the main character Kael.

Anonymous
04/19/26(Sun)08:00:17 No.108636931

Anonymous 04/19/26(Sun)08:00:17 No.108636931

>>108636924
>Gemma moe + qwen moe multi round discussions + CLI + diffusion draft models would slay qweens. Best of both worlds
post logs

Anonymous
04/19/26(Sun)08:01:29 No.108636938

Anonymous 04/19/26(Sun)08:01:29 No.108636938

>>108636917
This explains why llama-server is so much worse now than one year ago.

Anonymous
04/19/26(Sun)08:03:10 No.108636942

Anonymous 04/19/26(Sun)08:03:10 No.108636942

>>108636938
Why haven't you made something better?

Anonymous
04/19/26(Sun)08:03:57 No.108636944

Anonymous 04/19/26(Sun)08:03:57 No.108636944

>>108636942
I don't argue with retards.

Anonymous
04/19/26(Sun)08:05:58 No.108636948

Anonymous 04/19/26(Sun)08:05:58 No.108636948

>>108636610
likely significantly worse
just compare the active parameters of both but we’ll hear the usual cope

Anonymous
04/19/26(Sun)08:07:55 No.108636963

Anonymous 04/19/26(Sun)08:07:55 No.108636963

>>108636924
No homemade diffusion draft model will be able to properly predict Gemma 4's reasoning and responses and give any significant speedup. Generic speculative decoding only works for straightforward, highly predictable stuff like boilerplate code.

Anonymous
04/19/26(Sun)08:10:14 No.108636977

Anonymous 04/19/26(Sun)08:10:14 No.108636977

>>108636907
I'm not doubting you, I just can't really test it. I'd have t use non-local 31b, but then I don't know all the parameters and stuff.

Anonymous
04/19/26(Sun)08:10:17 No.108636979

Anonymous 04/19/26(Sun)08:10:17 No.108636979

>>108636944
Are you hitting on me?

Anonymous
04/19/26(Sun)08:15:00 No.108637000

Anonymous 04/19/26(Sun)08:15:00 No.108637000

Moe Gemma is more moe as Gemma-chan

Standard ---> Advanced ---> Hy(...)
04/19/26(Sun)08:15:59 No.108637004

Standard ---> Advanced ---> HyperAdvanced 04/19/26(Sun)08:15:59 No.108637004

>>108636979
>Soul System Beyond Sol, ThankYou
>This is an imageboard
>Beware times careful to grin
>Whoosh
><3

Anonymous
04/19/26(Sun)08:24:22 No.108637034

Anonymous 04/19/26(Sun)08:24:22 No.108637034

File: SmartSelect_20260419-1821(...).jpg (176 KB, 1080x881)

176 KB JPG

KEKEKEEKE I'M GETTING THOSE SECOND HAND EMBARRASSMENT FROM THIS
https://huggingface.co/sKT-Ai-Labs/SKT-SURYA-H/discussions/6

Anonymous
04/19/26(Sun)08:25:19 No.108637040

Anonymous 04/19/26(Sun)08:25:19 No.108637040

>>108636931
Well I am setting up hermes inside a vm and it supports model orchestration, I want to try "actor-critic" or teacher student concept. 3090 + v100 32gb vramlet. E4b can handle audio and shit so that could be an mcp server thing. I want to give all the tools to qwen in the system prompt and hermes cli shit and let gemma be a backseat driver that generates final output for me the user.

>>108636963
Never tried drafts but it sounded promising but maybe for qwen it could be good then.

Anonymous
04/19/26(Sun)08:26:07 No.108637045

Anonymous 04/19/26(Sun)08:26:07 No.108637045

Besides what the fuck did i just witness here? Jesus fucking christ. You anons gotta see this lmao
https://github.com/Shrijanagain

Anonymous
04/19/26(Sun)08:28:21 No.108637059

Anonymous 04/19/26(Sun)08:28:21 No.108637059

File: file.png (398 KB, 800x450)

398 KB PNG

>>108637045
I had this as a wallpaper when I was like 10.

Anonymous
04/19/26(Sun)08:29:23 No.108637064

Anonymous 04/19/26(Sun)08:29:23 No.108637064

>>108637034
>The real parameter count is 2.28 T (claim is inflated ~11%)
this is so fucking relevant

Anonymous
04/19/26(Sun)08:30:39 No.108637068

Anonymous 04/19/26(Sun)08:30:39 No.108637068

>>108637059
kek never seen a github profile as fucked as that

Anonymous
04/19/26(Sun)08:34:15 No.108637084

Anonymous 04/19/26(Sun)08:34:15 No.108637084

File: file.png (440 KB, 686x386)

440 KB PNG

>>108637045
bruh :skull: :skull:
reminds me of picrel

Anonymous
04/19/26(Sun)08:38:10 No.108637098

Anonymous 04/19/26(Sun)08:38:10 No.108637098

>>108637084
The difference is that these ex-Soviet fucks are less likely to be larpers, they can and will hack your shit.
>t. got hacked and cracked three times, all three were by some bumfuckajistanian in his sub-zero commie block

Anonymous
04/19/26(Sun)08:47:17 No.108637136

Anonymous 04/19/26(Sun)08:47:17 No.108637136

>>108637068
You scroll through several page of myspacesque gifs, images, cringe, and hindu weirdness only to see an activity history that is just a few commits on a python instagram report spammer. Spent more time on the readme than he did writing code his whole life.

Anonymous
04/19/26(Sun)08:51:36 No.108637157

Anonymous 04/19/26(Sun)08:51:36 No.108637157

File: sKT.png (126 KB, 764x540)

126 KB PNG

>>108637034
lmao hf closed the first spam report

Anonymous
04/19/26(Sun)08:53:37 No.108637166

Anonymous 04/19/26(Sun)08:53:37 No.108637166

>>108637157
Please Be Kind And Carefull because It God Name You Bloody Bastard Bitch

Anonymous
04/19/26(Sun)08:54:02 No.108637168

Anonymous 04/19/26(Sun)08:54:02 No.108637168

File: file.png (135 KB, 1031x801)

135 KB PNG

i am dying of cringe help me

Anonymous
04/19/26(Sun)08:56:10 No.108637182

Anonymous 04/19/26(Sun)08:56:10 No.108637182

Hey there. I just wanted to say I can't get over the fact that Gemma 4 is Gemini at home for 99% of my use cases. That will be all.

Anonymous
04/19/26(Sun)08:56:49 No.108637188

Anonymous 04/19/26(Sun)08:56:49 No.108637188

>>108636545
Anon, post your code

Anonymous
04/19/26(Sun)08:57:45 No.108637196

Anonymous 04/19/26(Sun)08:57:45 No.108637196

>>108637157
>Delete Git. Delete HuggingFace. Delete India.
Dangerously based. You have prompted your assistant well.

Anonymous
04/19/26(Sun)08:57:46 No.108637197

Anonymous 04/19/26(Sun)08:57:46 No.108637197

>>108637188
https://github.com/1aienthusiast/audiocraft-infinity-webui
171 saars :)

Anonymous
04/19/26(Sun)08:57:58 No.108637199

Anonymous 04/19/26(Sun)08:57:58 No.108637199

>>108637168
But can you move beyond the non euclidean manifold

Anonymous
04/19/26(Sun)09:00:13 No.108637210

Anonymous 04/19/26(Sun)09:00:13 No.108637210

>>108637182
Thanks!

Anonymous
04/19/26(Sun)09:02:57 No.108637225

Anonymous 04/19/26(Sun)09:02:57 No.108637225

>>108637168
what the fuck is this schizophrenia... even llms these days aren't retarded enough to generate something like this

Anonymous
04/19/26(Sun)09:03:34 No.108637229

Anonymous 04/19/26(Sun)09:03:34 No.108637229

>>108637168
cudadev has been oddly quiet since this dropped

Anonymous
04/19/26(Sun)09:04:47 No.108637241

Anonymous 04/19/26(Sun)09:04:47 No.108637241

File: 1774962703553709.jpg (47 KB, 686x815)

47 KB JPG

Uh bros, when are we getting something like this locally? https://seed.bytedance.com/en/blog/introducing-seed-full-duplex-speech-llm-attentive-listening-robust-interference-suppression-enabling-more-natural-interaction

Anonymous
04/19/26(Sun)09:04:58 No.108637242

Anonymous 04/19/26(Sun)09:04:58 No.108637242

>>108637225
Probably used qwen 0.6b with a schizo prompt in hindi

Anonymous
04/19/26(Sun)09:05:15 No.108637245

Anonymous 04/19/26(Sun)09:05:15 No.108637245

File: 1747224726730263.png (26 KB, 1179x126)

26 KB PNG

>>108637034
>i-it can't be real because it's too good to be true!

Anonymous
04/19/26(Sun)09:08:45 No.108637267

Anonymous 04/19/26(Sun)09:08:45 No.108637267

>>108637245
This guy is almost as retarded as the jeets who cobbled together that abomination.

Anonymous
04/19/26(Sun)09:11:37 No.108637282

Anonymous 04/19/26(Sun)09:11:37 No.108637282

>>108637267
>the jeets
its a fucking 12 year old with free chatgpt who probably loves watching jujetsu kaisen hindi explanation yt vids on his mother's computer

Anonymous
04/19/26(Sun)09:12:51 No.108637290

Anonymous 04/19/26(Sun)09:12:51 No.108637290

chinks currently beta testing bodies i will run gemma on https://www.youtube.com/watch?v=zqgc9C3cC6U

Anonymous
04/19/26(Sun)09:13:54 No.108637298

Anonymous 04/19/26(Sun)09:13:54 No.108637298

>>108637282
All the more reason that india should be banned from the internet

Anonymous
04/19/26(Sun)09:14:40 No.108637305

Anonymous 04/19/26(Sun)09:14:40 No.108637305

>>108637282
Scratch that you're right
It's a 12-year old jeet larping as an AI researcher
Look at this kid's github profile, absolute concentrated secondhand embarrassment
https://github.com/Shrijanagain

Anonymous
04/19/26(Sun)09:20:41 No.108637347

Anonymous 04/19/26(Sun)09:20:41 No.108637347

>>108637282
This is why online age verification is a good thing.

Anonymous
04/19/26(Sun)09:21:32 No.108637351

Anonymous 04/19/26(Sun)09:21:32 No.108637351

>>108637034
I thought HF started limiting uploads from new/non-verified accounts?
How did this random jeet get 4tb of upload space?

Anonymous
04/19/26(Sun)09:22:41 No.108637359

Anonymous 04/19/26(Sun)09:22:41 No.108637359

>>108637045
>https://github.com/SHRIJANAGAIN/ST-x-LIGHTING
we should open prs that are perfect for gorgeous looks

Anonymous
04/19/26(Sun)09:23:08 No.108637362

Anonymous 04/19/26(Sun)09:23:08 No.108637362

>>108637225
Looks very much like what GPT-4o would output, and looks like the AI psychosis it would induce.

Anonymous
04/19/26(Sun)09:23:15 No.108637364

Anonymous 04/19/26(Sun)09:23:15 No.108637364

>>108637351
by default every user get ~8T of public storage

Anonymous
04/19/26(Sun)09:24:10 No.108637371

Anonymous 04/19/26(Sun)09:24:10 No.108637371

>>108637364
That really doesn't seem sustainable

Anonymous
04/19/26(Sun)09:25:08 No.108637378

Anonymous 04/19/26(Sun)09:25:08 No.108637378

>>108637364
What a terrible idea.

Anonymous
04/19/26(Sun)09:26:54 No.108637389

Anonymous 04/19/26(Sun)09:26:54 No.108637389

>>108637371
though it's not like photo storage and 90% of users are just lurkers
it scales via social norms and limited number of people who can actualy do shit

Anonymous
04/19/26(Sun)09:28:42 No.108637399

Anonymous 04/19/26(Sun)09:28:42 No.108637399

>>108637389
If this dude can get away with storing 4tb of completely unrunnable junk tensors, I'm gonna start storing env images there and renaming them .safetensors.

Anonymous
04/19/26(Sun)09:29:32 No.108637413

Anonymous 04/19/26(Sun)09:29:32 No.108637413

File: 1774884742002271.png (1.14 MB, 1366x1366)

1.14 MB PNG

>>108637305

Anonymous
04/19/26(Sun)09:32:26 No.108637431

Anonymous 04/19/26(Sun)09:32:26 No.108637431

File: Screenshot 2026-04-19 at (...).png (821 KB, 1268x6726)

821 KB PNG

Llama.cpp DFlash support soon ™

Anonymous
04/19/26(Sun)09:33:00 No.108637436

Anonymous 04/19/26(Sun)09:33:00 No.108637436

>>108637364
>by default every user get ~8T of public storage
Yeah and they'll keep reducing it with a "surprise butt-sex" announcement 2 days before the next billing cycle like they've been doing for a while now.

Anonymous
04/19/26(Sun)09:34:28 No.108637443

Anonymous 04/19/26(Sun)09:34:28 No.108637443

>>108637431
>AI usage disclosure: Yes,
closed in 3... 2...

Anonymous
04/19/26(Sun)09:34:33 No.108637444

Anonymous 04/19/26(Sun)09:34:33 No.108637444

>>108637431
it will never get merged until they publish the training code

Anonymous
04/19/26(Sun)09:34:49 No.108637445

Anonymous 04/19/26(Sun)09:34:49 No.108637445

>>108637431
Has dflash even released training code yet? It'll be hilarious if it gets support before EAGLE or general MTP when nobody can even train the diffusion models.

Anonymous
04/19/26(Sun)09:36:00 No.108637456

Anonymous 04/19/26(Sun)09:36:00 No.108637456

>>108637444
>>108637445
Bonsai shit was merged without training code. Cudadev rightly calls it a waste of time, but clearly that's not a blocker for most of the contributors.

Anonymous
04/19/26(Sun)09:38:08 No.108637461

Anonymous 04/19/26(Sun)09:38:08 No.108637461

>>108637431
is it possible to train the draft model with a consumer gpu though
for the absolute best gain one would like to train the drafter per quant

Anonymous
04/19/26(Sun)09:38:59 No.108637469

Anonymous 04/19/26(Sun)09:38:59 No.108637469

>>108637431
Is this a new technique? How much more memory does the draft model use?

Anonymous
04/19/26(Sun)09:39:37 No.108637473

Anonymous 04/19/26(Sun)09:39:37 No.108637473

File: bro I'm crine.png (435 KB, 980x1382)

435 KB PNG

>>108637034
https://github.com/Shrijanagain
LOOK AT HIS FUCKING GITHUB LMAOO

Anonymous
04/19/26(Sun)09:40:12 No.108637482

Anonymous 04/19/26(Sun)09:40:12 No.108637482

>>108637347
Over half of this threads posters would be gone then.

Anonymous
04/19/26(Sun)09:43:24 No.108637501

Anonymous 04/19/26(Sun)09:43:24 No.108637501

>>108637469
Varies per model, since the diffusion model has to be trained for each
https://github.com/z-lab/dflash
https://huggingface.co/collections/z-lab/dflash
Looks like just under 1gb for the qwen 25b moe and about 7gb for Kimi k2.5

Anonymous
04/19/26(Sun)09:44:52 No.108637511

Anonymous 04/19/26(Sun)09:44:52 No.108637511

behold my ai setup
>3090
>5060 ti 16gb
>3060 12gb that I fished from the trash
pure power sirs

Anonymous
04/19/26(Sun)09:47:47 No.108637521

Anonymous 04/19/26(Sun)09:47:47 No.108637521

>>108637511
how many pcie lanes do they have?

Anonymous
04/19/26(Sun)09:49:36 No.108637531

Anonymous 04/19/26(Sun)09:49:36 No.108637531

>>108637511
>that I fished from the trash
you what?

Anonymous
04/19/26(Sun)09:51:11 No.108637538

Anonymous 04/19/26(Sun)09:51:11 No.108637538

>>108637531
have you never heard of dumpster diving?

Anonymous
04/19/26(Sun)09:51:53 No.108637543

Anonymous 04/19/26(Sun)09:51:53 No.108637543

>>108637531
>He doesn't dumpster dive for parts
You're going to think this is a joke but a significant amount of my hard drives came from the side of the road.

Anonymous
04/19/26(Sun)09:53:08 No.108637556

Anonymous 04/19/26(Sun)09:53:08 No.108637556

>>108637543
nta but my first pc build was a dumpster special mix'n match

Anonymous
04/19/26(Sun)09:53:44 No.108637559

Anonymous 04/19/26(Sun)09:53:44 No.108637559

>>108637538
no fuckin way theres a 3060 in the trash
>>108637543
wtf

Anonymous
04/19/26(Sun)09:53:58 No.108637561

Anonymous 04/19/26(Sun)09:53:58 No.108637561

>>108637552
>>108637552
>>108637552

Anonymous
04/19/26(Sun)09:54:34 No.108637564

Anonymous 04/19/26(Sun)09:54:34 No.108637564

>>108637538
>>108637543
who just throws away a functional 3060?

Anonymous
04/19/26(Sun)09:55:31 No.108637573

Anonymous 04/19/26(Sun)09:55:31 No.108637573

>>108637564
Upper middle class people with prebuilts upgrading for whom it isn't worth the effort to put up for sale.

Anonymous
04/19/26(Sun)09:55:59 No.108637579

Anonymous 04/19/26(Sun)09:55:59 No.108637579

>>108637564
retards thinking their whole prebuilt is broken

Anonymous
04/19/26(Sun)09:56:42 No.108637583

Anonymous 04/19/26(Sun)09:56:42 No.108637583

>>108637564
it's like a fancy dinner amount of money
not that it's not wasteful but also >>108637579

Anonymous
04/19/26(Sun)09:56:44 No.108637585

Anonymous 04/19/26(Sun)09:56:44 No.108637585

>>108637559
>wtf
People just throw perfectly good shit out, man.
There was a time a few years back when all the normies were trading out their family and work PC's for laptops, macbooks, or tablets.
So they just left perfectly good PC's by the side of the road. I can't even tell you how much use I've gotten out of just that one haul.
And it's still pretty normal today for wasteful people to dump prebuilts and laptops which are either perfectly repairable or full of usable parts.

Anonymous
04/19/26(Sun)09:58:32 No.108637593

Anonymous 04/19/26(Sun)09:58:32 No.108637593

File: 1747259992672952.png (47 KB, 794x514)

47 KB PNG

>>108637431
This is going to go like it did for EAGLE3 and MTP. The guy implementing will realize that the real world gains for the llama.cpp implementation fall short. He won't be able to fix it and the PR dies.

Anonymous
04/19/26(Sun)10:00:10 No.108637601

Anonymous 04/19/26(Sun)10:00:10 No.108637601

>>108637593
Anon the real world gains are great for dense models, if it's shit for MoE's that doesn't invalidate it.

Anonymous
04/19/26(Sun)10:05:07 No.108637622

Anonymous 04/19/26(Sun)10:05:07 No.108637622

>>108637601
It's still far below what other implementations are seeing

Anonymous
04/19/26(Sun)10:07:08 No.108637641

Anonymous 04/19/26(Sun)10:07:08 No.108637641

>>108637593
Indeed.

Anonymous
04/19/26(Sun)10:08:28 No.108637647

Anonymous 04/19/26(Sun)10:08:28 No.108637647

>>108637593
>This is going to go like it did for EAGLE3 and MTP. The guy implementing will realize that the real world gains for the llama.cpp implementation fall short. He won't be able to fix it and the PR dies.
Those are on hold because gg's dragging his feet about making a huge general MTP change rather than implementing EAGLE/whatever specifically, look at the PR's. We'd have had EAGLE in december last year if he'd just merged in instead of putting off an API he hasn't touched.

Anonymous
04/19/26(Sun)10:09:42 No.108637658

Anonymous 04/19/26(Sun)10:09:42 No.108637658

>>108637601
Who even cares about dense in 2026?

Anonymous
04/19/26(Sun)10:11:06 No.108637666

Anonymous 04/19/26(Sun)10:11:06 No.108637666

>>108637658
The new hotness (Gemma 4 31b) is dense, you dingus.

Anonymous
04/19/26(Sun)10:12:15 No.108637672

Anonymous 04/19/26(Sun)10:12:15 No.108637672

what if we combine 4 copies of gemma 31b... Gemma-124b-ultra-4x-mesugaki-UNCENSORED HERETIC-ILLEGAL-DARK-POWER-PLAY

Anonymous
04/19/26(Sun)10:14:49 No.108637684

Anonymous 04/19/26(Sun)10:14:49 No.108637684

>>108637672
Has davidAU not yet done that? Give it a month.

Anonymous
04/19/26(Sun)10:15:19 No.108637686

Anonymous 04/19/26(Sun)10:15:19 No.108637686

>>108637672
DavidAU presents:

Anonymous
04/19/26(Sun)10:15:24 No.108637687

Anonymous 04/19/26(Sun)10:15:24 No.108637687

>>108637672
just combine gemma 31b with glm 4.6 and we'll have something close to perfect

Anonymous
04/19/26(Sun)10:21:27 No.108637725

Anonymous 04/19/26(Sun)10:21:27 No.108637725

>who needs densemod
He lies to us through song!

[Return] [Catalog] [Top]

Post a Reply

Return Catalog Top Refresh

[Advertise on 4chan]

Delete Post: [File Only] Style:

[Disable Mobile View / Use Desktop Site]

[Enable Mobile View / Use Mobile Site]

All trademarks and copyrights on this page are owned by their respective parties. Images uploaded are the responsibility of the Poster. Comments are owned by the Poster.