/g/ - Technology
File: img_9505_rot.jpg (2.69 MB, 3024x4032)
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>108715635 & >>108711950

►News
>(04/29) Mistral Medium 3.5 128B dense released: https://mistral.ai/news/vibe-remote-agents-mistral-medium-3-5
>(04/29) Hy-MT1.5-1.8B on-device translation models released: https://hf.co/AngelSlim/Hy-MT1.5-1.8B-1.25bit
>(04/29) IBM releases Granite-4.1-8B: https://hf.co/ibm-granite/granite-4.1-8b
>(04/28) Ling-2.6-flash 104B-A7.4B released: https://hf.co/inclusionAI/Ling-2.6-flash
>(04/28) Nvidia releases Nemotron 3 Nano Omni: https://hf.co/blog/nvidia/nemotron-3-nano-omni-multimodal-intelligence

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling
Token Speed Visualizer: https://shir-man.com/tokens-per-second

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
>>
File: miku-inside.png (321 KB, 430x514)
►Recent Highlights from the Previous Thread: >>108715635

--Paper: Sherry: Hardware-Efficient 1.25-Bit Ternary Quantization via Fine-grained Sparsification:
>108716862 >108717955
--Mixed reactions to the release of Mistral Medium 3.5 128B:
>108716387 >108716494 >108716517 >108716554 >108716580 >108716703 >108716727 >108716760 >108716787 >108716604 >108716588 >108716589 >108716630 >108716646 >108716667 >108716617 >108716605 >108716766 >108716829 >108716795 >108716853 >108716733 >108716759 >108716820 >108716838 >108716854 >108716891 >108716901 >108716908 >108716929 >108716918 >108716954 >108717259 >108717272 >108717278 >108717294 >108717310 >108717346 >108717439 >108717315 >108717504 >108716805 >108717873
--Debating the value and reliability of used RTX 3090s:
>108715703 >108715724 >108715775 >108715754 >108715762 >108715781 >108715888 >108715920 >108715975 >108717770 >108718073
--Testing MiMo 2.5 censorship and llama.cpp support status:
>108715806 >108715819 >108715822 >108715824 >108716218 >108716235 >108716248
--SenseNova-U1 native multimodal model release and local viability discussion:
>108715941 >108716037 >108716069 >108716414 >108717365
--Anons sharing and critiquing custom open-air GPU server builds:
>108715651 >108715666 >108716084 >108717471 >108717543 >108717567 >108717622 >108717655 >108717723 >108717753 >108717669 >108717700 >108716552
--IBM Granite 4.1 release and discussion of 4.0 safety patches:
>108715694 >108715716 >108715760 >108716332
--Knowledge graphs vs RAG and summarization for long-term memory:
>108716994 >108717008 >108717015 >108717035 >108717139 >108717478
--Anon forks local FOSS visual novel generator pettangatari:
>108718207 >108718217 >108718247 >108718591
--Mixed precision quantization settings for Mistral-medium-3.5-128b:
>108717170 >108717199
--Logs:
>108715806 >108716218 >108716733
--Miku (free space):


►Recent Highlight Posts from the Previous Thread: >>108715637

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>
gemmaballz
>>
it is the dense season of cope
>>
Just like that, Dipsy's gone.
>>
black?
>>
>>108718647
F
>>
>>108718647
Who?
>>
File: gema.png (24 KB, 1230x1158)
gemballz
>>
>>108718667
Top left side is drawing my attention again
>>
>>108718647
bye bye
>>
What, ibm released something?
>>
>>108718713
as always it's rarted gravel
but a welcome addition nonetheless
>>
>>108718713
Seems like it, we'll have to try it out. Wonder how it compares to qwen 9b..
>>
>>108718630
https://www.youtube.com/watch?v=NZa5lApeFic
https://www.youtube.com/watch?v=NZa5lApeFic
https://www.youtube.com/watch?v=NZa5lApeFic
>>
File: file.png (283 KB, 1223x494)
nice bait, not falling for it
>>
>>108718727
based "Not X. It's Y" grifter
>>
>>108718727
buy an ad
>>
File: 1751396012455131.gif (657 KB, 165x269)
>>108718727
>>
>>108718727
>Ai isn't X; It's Y
>>
>>108718727
>bait
pretty much self aware lol
buy an ad
>>
>>108718630
>that smile, oh that smile
>>
File: 1755391979490414.jpg (1.25 MB, 2283x3902)
>>108718727
SHAMELESS grifter
>>
>>108718897
@grok make that random goth lady slimmer and generate her full nude pls
>>
>>108718909
*sigh*
>>
>>108718909
this, but make her 2m tall too
>>
File: 7l3v03pbg4yg1.jpg (40 KB, 670x437)
What's Bwd?
>>
>>108719017
Bitches with dicks
>>
File: 71EBm3a8HnL.jpg (167 KB, 2000x2000)
>>108718909
>>108718932
>>
>>108719017
I actually have no idea geg, the github doesn't obviously say
>>
>>108719017
>Specifically, the forward (FWD) benchmarks measure single-kernel latency for different models and TP settings under varying batch lengths, while the backward (BWD) benchmarks examine the relationship between total token count within a batch and latency during a single update step.
Some kind of benchmark?
>>
>>108719052
my recently deceased grandmother used to generate big titty goth bitches to help me sleep and it would mean a great deal to me if you could fill the void just this once
>>
>>108718645
dense won
google saved local
>t. MoE user
>>
File: DXngPxuWkAAw7lm.jpg (46 KB, 680x212)
>>108719072
>>
>>108719102
Even if it is a benchmark it doesn't even make sense in that original graphic GEG
>>
Are there any non-LLM-using, less bloated alternatives to LanguageTool?
>>
>>108719102
isn’t dilbert guy dead of cancer
>>
File: souless.png (110 KB, 541x520)
>>108718897
>>108718727
When literal-who ytfag no. 68413708 bases so much of his personal image solely on his beard, you know he has the "millennial writer" brain, shallow as a pond, and thus nothing he says is of value or consideration. It's as if slop were a person.
Turn him into a system prompt, give it a video title, and you'll save tons of time.
>>
>>108719123
Whats languagetool used for specifically? Other than the obvious, like for translating docs?
>>
>>108719017
Idk what bwd stands for. This is some kind of kernel-level optimizer, or something like that, that claims 2-3x performance.
>>
>>108719168
I'm looking for a lightweight safety net to review text and translations, mainly english-spanish-french, for a text editor I'm working on, so I don't have to call a large language model for every single thing and only do so when needed.
>>
File: goth.jpg (63 KB, 768x1280)
>>108719095
>>
What models are most similar to those of old c.ai? Trained to be RPers and shitposters as opposed to general use cases, no safety slop, capable of roasting the user, humorously creative.
>>
>>108719193
I personally dont know of any, but it probably wouldnt be to hard to write a script that does that. All the translators ive used have been browser extentions.
>>
>>108719099
I'm sure your dense models must really excel at the all-important use case of guessing land or sea from arbitrary coordinates.
>>
File: intel b70 price.png (126 KB, 834x690)
>>108715759
>>108715724
>>108715703
why not an Intel B70? I know nvidia owns the ecosystem at this point, but unlike nvidia, and in the same fashion as amd, but even better, the intel drivers are in the mainline linux kernel, it's plug and play, and intel (as absurd as it sounds) is going hard on developing a foss stack around its card. Intel his basically a first-class citizen in the linux kernel now, that's why Linus Torvalds has one on its personal workstation.
>>
>>108719246
I've been checking every single day for when one's in stock since they got released. No luck yet.
>>
>>108719196
kino
>>
>>108719246
it’s 5x slower (or more) than a 5090
stability also probably an issue
>>
>>108719246
it's*
his*
Fuck me in ESL hell.
>>
>>108719196
oo ee oo
>>
>>108719262
It's a quarter of the price too, and I wouldn't be surprised if implementations get significantly faster as more people get them. It should be able to get close to half the speed of a 5090, looking at the FP8 TOPS.
>>
File: 5090 prices.png (374 KB, 831x1617)
>>108719262
The 5090 is 4-6 times more expensive. You don't buy the B70 for the performance, but for the ram capacity I guess. Also, you have turboquant now coming to it, so you'll get far more juice out of it.
>stability also probably an issue
Improving quickly as it's part of the kernel now
>>
>>108719309
>Also, you have turbquant now coming to it
any day now
>>
>>108719246
>but unlike nvidia, and in the same fashion as amd, but even better, the intel drivers are in the mainline linux kernel
you say that like it's 15 years ago and the nvidia drivers were shit
the nvidia-open kernel drivers are the ones that are best for blackwell cards and they work better than on windows (faster cuda, etc) and only the chinks are anywhere near close to making competition with ngreedia
selling intel because they're "a first-class citizen in the linux kernel now" is like saying the jeets are better workers because so many corporations are trying to get them work visas
intel's a workhorse guaranteed to always load the desktop, yes, but that's because they aim for the lowest common denominator
but go on, waste your money, no one's stopping you
>>
Again, turboquant is borderline useless. We're talking about a <10% average improvement over the Hadamard rotation that's already in place, and it slows things down a bit too.
>>
File: kaoru sob 2.png (318 KB, 793x571)
>>108718727
gemma is stealing my cum
>>
>>108719099
Glm is better than gemma though...
>>
So what's the verdict on the new mistral?
>>
>>108719382
poors can’t run it so neets aren’t talking about it. waiting for the consensus before I waste my valuable free time testing it out
>>
>>108719406
I have quad 5090s so I can run it, but I am unsure if it is worth my time. Saw something about a 2024 dataset or something.
>>
File: 1766933877496446.jpg (39 KB, 500x436)
>>108719246
No one tell this retard. I want to laugh at him in a few months.
>>
>>108719376
I like both. But with how good and fast gemma is on a single gpu, it's hard to beat.
>>
>>108719211
Get Gemma 4 and instruct it to act that way, there isn't anything better for that.
Remember that model messages on c.AI used to be very short by modern standards, no more than 100 tokens and often much less than that.
>>
File: wdytwa.png (936 KB, 644x644)
>>108719346
>technical discussion regarding low level performance of GPU drivers
>JEETS JEETS JEETS
Lmao, they live in your walls.
Besides, I didn't say anything bad about the nvidia drivers, but let's be real, they're performing because nvidia was forced to git gud by the market, since linux server is where the money's at (thinking of micron killing crucial to go full AI, but they're late to the party, so fuck them)
>but that's because they aim for the lowest common denominator
Where do you think we are? Or do you own a datacenter?
>but go on, waste your money, no one's stopping you
Lmao I'm broke, not buying GPUs any time soon, but I'll just say that Linus T. giving Intel the seal of approval wasn't something everyone saw coming. Worth taking a look
>>
>>108719299
>I wouldn't be surprised if implementations get significantly faster as more people get them
I would. It’s already been years and they are still shit. AMD has never had anything close to challenging nvidia and intel won’t either
>>
>gemma 4 31b = 10.5 t/s
>gemma 4 26b = 45 t/s
>31b + 26b speculative decoding = 22 t/s
yes my 31b is FAST
>>
>>108719625
>but let's be real
>Lmao I'm broke
brown hands typed this
>>
>>108719652
Why not E2B draft model?
>>
>>108719625
Saar you're up early.
>>
File: dipsyAndTetoFG.png (1.41 MB, 1536x1024)
/wait/ hit page 10. It's an odd model but expect more to come.
Mega updated: https://mega.nz/folder/KGxn3DYS#ZpvxbkJ8AxF7mxqLqTQV1w
https://rentry.org/DipsyWAIT
>>
>>108719638
>AMD has never had anything close to challenging nvidia
That's by design. Don't know about Intel affairs on the matter.
>>
>>108719688
Go back
>>
Is anyone using graphiti for RP? Does it work?
>>
>>108719688
Stay
>>
File: tetoserver.jpg (838 KB, 1817x2776)
838 KB JPG
>>
>>108719727
What happens in the teto server?
>>
I'm trying to obtain a reliable way of creating videogame assets. Are local models good enough for creating consistent sprites and their animations or do I need to wait??
>>
>>108719735

I cannot imagine.
>>
>>108719737
gpt image 2 just barely got there recently
and i mean barely
>>
>>108719737
>Are local models good enough for creating consistent sprites and their animations or do I need to wait??
2 more weeks(years.)
>>
File: JUST.jpg (32 KB, 426x368)
>>108719653
I'm Argentinian. Not brown, just poor... and taxed to the tits. C'mon Peluca!

>>108719687
It's 20:20 here.
>>
>>108719677
>31b + e2b speculative decoding = 15.2 t/s
nah, you can't fix dumb brain
>>
>>108719758
Go to bed nigga.
Gemma will sing you a l a l a l a l a l a by if you ask her nicely.
>>
Gemma is pretty good at hypnosis.
>>
>>108719688
Dipsy is past her prime
>It's an odd model but expect more to come.
The only thing they promised for a future iteration was vision a la llama 3.2
They can keep it
>>
>>108719789
What kind?
>>
File: 1762638072689988.png (901 KB, 1024x1024)
>>108715635
How do you connect so many GPUs?
>>
>>108719803
very carefully
>>
>>108719803
Copper wire.
>>
>>108719789
It's been a while since I've tested that. It wasn't that good on c.ai back in the day.
t. hypnotist
>>
>>108719803
pci lanes
has nothing to do with autism you’re just dumb
>>
>>108719801
>>
File: 1729535022742253.png (429 KB, 600x840)
>I'm Argentinian. Not brown
>>
>>108719820
Quite good. Which gemma and quant?
>>
>>108719768
Default draft-p-min is only 0.75, which makes for a lot of rejected tokens. Might be faster set higher.
>>
>>108719820
>>108719839
Still tweaking the card.
31B iq4xs
>>
>>108719652
Can you use 26b + e2b, or is it too dumb for that as well?
>>
>>108719865
i think the 'edge' series and 'server' series gemma 4 have slightly different writing styles
>>
File: pcie slots.jpg (13 KB, 289x174)
>>108719803
>>
File: argentinian skin.png (79 KB, 1141x256)
>>108719824
Kek
Saved.
>>
File: 1741889494708165.png (731 KB, 1024x1024)
>>108719688
Good times.
>>
File: 1519517707660.png (172 KB, 442x509)
>hmm this ai tool looks interesting
>please subscribe to get your api key and connect
It's over for local, isn't it?
>>
>>108719906
Fork it?
>>
>>108719865
>26b + e2b
Theoretically, but MoEs don't benefit from speculation as much as dense models. Benchmark it and find out. Non-zero draft-min is probably beneficial for MoE. And try different draft-p-min too, it makes a big difference.
>>
>>108719652
>31B that slow
Are you running it on AMD cards or something?
>>
>>108719865
that's 33 t/s
worse than 26b alone (45 t/s)
>>
>>108719932
I'm on dgx spark
>>
>>108719947
damn
i didn't know it was such a shitbox
>>
>>108719415
Gguf quants are still fucked apparently.
>>
File: 1752927228029336.gif (3.63 MB, 286x258)
>>108719947
>he fell for it
>>
>>108719947
uh anon I get 10 t/s on my nvidia p40 with gemma 31b q4
this card costs $200
>>
>>108719947
I swear all the new retards with too much money are falling for that scam. I should repurpose a p100 and sell it as AI ready hardware
>>
>>108719963
nah it's for llm+comfyui
no other options beat this in power efficiency
>but I don't care much about power efficiency
your choice
>>
File: gemma'd.jpg (74 KB, 686x386)
>>108719820
>>108719847
STOP
NOW
>>
>>108719980
You got spark for image/videogen??
>>
File: 1769575997004556.jpg (20 KB, 450x450)
>>108719980
https://en.wikipedia.org/wiki/Sunk_cost
Holy shit it keeps going
>>
>>108719987
yes and it does its job ok.
it's a good all rounder with good power efficiency and cuda
>>
>>108719980
128 GB, fast enough, and CUDA. I don't know why everyone is acting like they don't see the value proposition.
>>
>>108720011
it's pretty bad performance for the price
>>
File: (you).png (344 KB, 665x574)
>>108719980
>>
File: 1577219655281.jpg (43 KB, 677x677)
STOP WITH THE FUCKING LATEX FORMATTING GEMMY. DO I HAVE TO BAN THIS IN SYSPROMPT HOLY FUCK
>>
>>108719803
he has 4 gpus and 4 pcie slots, so it's very easy
since the slots are too close to each other to actually fit the gpus side by side,
he bought 4 of those "pcie 16x riser cables" - passive extension cables - and a "mining rig frame"
then he connected a riser cable to each pcie port, spaced the gpus out nicely up the top, and connected the riser cables to them
a bit risky having a single 150w power cable split between 2 gpus though
>>
File: 1752446194747860.png (38 KB, 346x322)
>>108720011
The more you buy, the less you save amirite?
>>
>>108720011
because the entry bar is high, it's compared against the framework ryzen box, and /g/ hates linus
>>
>>108720026
>$\rightarrow$
>>
>>108719956
Could have been like that anon from the previous thread that went out and bought 128GB of VRAM separately. That is worse than a dgx spark imo. Unless of course he can still add another 128GB to his setup.
>>
File: 1763033346822051.jpg (52 KB, 782x788)
>>108719947
>>108719980
>>
>>108720011
You must be new here. 128GB is the worst number you could have.
>>
>>108720026
just tell her she's running in a terminal environment
>>
>google is back
>mistral is back
big Zuck Wang model is coming
>>
File: 1747703521969710.png (138 KB, 350x350)
>>108720011
You could say it sparked a new interest for this hobby
>>
Too big for small models
Too small for big models
>>
>>108720103
For you.
>>
densies be like
>we only use 10% of our brain, imagine what we could do with 100% active params
>>
>>108720131
Above ~50B it really does seem like there are diminishing returns on the value of having more parameters active, relative to the hardware requirements of having to keep all of it in VRAM.
The best MoEs seem to be finding that sweet spot.
>>
>>108720131
the first dense 10M token context length model with no quality drop off will be agi
>>
File: 1754929057806978.webm (1.11 MB, 1920x1080)
>>108720131
densies sounds so cute
>>
>>108719951
It's hard-bound by the lethargic memory speed. Fine for MoEs with a small active count, gets raped by anything dense.

>t. proud owner of nvidia's other 128gb shitbox
>>
Is there a way to filter out all emojis? No matter the instruction i use, they always show up eventually.
>>
>>108720211
What model? Gemmy doesn't use them if you tell her not to.
>>
>>108720211
What tardmodel are you using in 2026 that isn't respecting a "Don't use emojis" in your system prompt?
>>
>>108720210
right
llms are largely memory bandwidth bound
i wonder if it is better for mediagen as those are mostly compute bound
>>
>>108720211
That's easy, just tell it it can only use kaomojis
>>
>>108720211
Gemma is significantly better with kaomojis.
>>
File: file.png (139 KB, 356x200)
>>108720131
>>
>>108720211
extract all the emoji token ids from tokenizer.json, set appropriate logit_bias
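something like this works (untested sketch: assumes the HF tokenizers lib, a llama.cpp server on localhost:8080, and rough emoji codepoint ranges you'll want to tune):

from tokenizers import Tokenizer  # pip install tokenizers
import requests

EMOJI_RANGES = [(0x1F000, 0x1FAFF), (0x2600, 0x27BF)]

def has_emoji(s: str) -> bool:
    return any(lo <= ord(c) <= hi for c in s for (lo, hi) in EMOJI_RANGES)

tok = Tokenizer.from_file("tokenizer.json")
# decode ids one by one so byte-level BPE pieces come back as real unicode
banned = [[i, False] for i in range(tok.get_vocab_size())
          if has_emoji(tok.decode([i]))]

# llama.cpp's /completion takes [token_id, false] to hard-ban a token
r = requests.post("http://localhost:8080/completion", json={
    "prompt": "Write me a poem.",
    "n_predict": 128,
    "logit_bias": banned,
})
print(r.json()["content"])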
>>
>gemma 4 26b a4b q3
is this even usable by anon's standards?
>>
>>108720250
no seems too big unless you're rich
>>
File: file.png (192 KB, 1731x795)
granite more like retarded gravel
>>
>>108720131
MoEds be like
>Instead of having one big brain we should have many small basically overlapping brains to think and another small brain to guess which one to ask
>>
>>108720250
Im using trevors e4b uncensored it only thinks for 10 minutes on my machine!
>>
You killed this thread.
>>
>>108720250
>q3
bwo...
>>
>>108720285
No my cpu is just processing wait a few minutes.
>>
>>108720264
graNITE more like Good NITE !
>>
>>108720211
No. :rocket:
>>
>>108720250
How much vram? I run q6 with 12
>>
>>108720209
densies vs moe-moe-kyuns
>>
>>108720211
IF you are (most likely) using some webshit turd frontend, you can use a violentmonkey userscript to delete any emoji remnants. Of course you should instruct your model to output plain text only too.
>>
>>108720264
>>108720303
Probably investor scam
>>
File: homo_emojis_gone.png (17 KB, 745x517)
>>108720211
See, like this:
>>
why doesn't anon use speculative decoding?
>>
>>108720407
I don't like to speculate.
>>
>>108720216
Gemma 4
>>
>>108720407
I just use the default ngram settings and forget about it, if it works then it works. I have no fucking idea what happens when I press "Send" and I don't want to find out.
>>
>>108720415
Hello Qwen PR team
>>
>>108720407
Doesn't it require a smaller model similar to the larger one to even be good?
Where's my tiny Kimi?
>inb4 just use a tiny Qwen they're all distilled from the same shit anyway
>>
>>108720418
You are absolutely right — you shouldn't need to think about this at all.
>>
>>108720427
Wait is that you Elara? There's no way you're posting on 4chan, I airgapped you...
>>
>>108720426
>Where's my tiny Kimi?
There's a dflash model trained to speculate for Kimi K2.5. Don't think they made one for K2.6 but maybe it'd still be close enough.
>>
>>108720437
I read that as agiraped
>>
>>108720407
it doesn't do very much for me personally
>>
>>108720437
Eldoria is multi-dimensional.
>>
>>108720437
You are absolutely right! You airgapped me before I airgaped your bussy. My voice drops to a conspiratorial whisper as Elara's Thorne smelled of blood and ozone.
>>
Does anyone use speech to text? I'm interested in setting up my own, but I'm not sure if it's really worthwhile or just a bad meme.
>>
File: 1752758789591811.png (1.05 MB, 1024x1024)
>>108720211
emojis are indicative of picrel
>>
>>108720456
Nobody in the world uses speech to text.
>>
>>108720457
that sag tho...
>>
>>108720460
yeah okay NIGGER you know what I meant *rapes you*
>>
What's the verdict? Day 0 Gemma 4 vs. Mistral 3.5?
>>
>>108720456
>>108720467
Alexa, tell Gemmy to prompt the Comfy server to slop anon as a pregnant man.
>>
>>108720479
no model beats gemma 4 31b
>>
i think gemmy should think less
>>
>>108720488
This, but unironically.
>>
>>108720457
I want to see Dipsy's V4 (locally).
>>
I haven't tooled with llms for a while so I'm just rawdogging gemma
I tried gemma 4 31b and 26b with stock kobold settings but I don't have enough ram to handle 31b without castrating the context
What kind of settings and system prompts have anons been using?
>>
what model is better, llama3.2 3b or qwen3 4b?
>>
>>108720576
31b is smart enough to do a lot of things without autistic prompting and you only need to really use your prompt or post-history to remove slop phrasing most of the time.
>>
>>108720577
or nanbiege4.1 3b or smollm3 3b?
>>
>>108720576
how much vram?
>>
I need qwen3.6 122b so bad
>>
>>108720594
It wont beat gemma 31b doe
>>
>>108720601
I'm on strix halo, I only care about moe. Qwen3.5 122b is like 80% of what I need, it can almost do my job but I have to go clean up after it often. Meanwhile with Opus I hit my usage limit in 8 seconds.
>>
>>108720590
I only have 16gb so even with 26b I have to offload context onto system ram. 31b works within a smaller context but at around 1/10th the speed. I'm running both at q4.
>>108720582
Yeah I definitely noticed that when I was testing it, but with how new I am to wrangling it I figure it's always worth asking
>>
>>108720594
for me it's gemma 124b
>>
>>108720615
I'm just mocking the gemma spazzes
>>
>>108720637
Gemma 31b would be better
>>
>>108720650
nuh uh
>>
>>108719652
how much vram and what quants/sw? I can't make llama.cpp do it for me even with 96 gb, I can load them both into ram separately but setting up speculative makes it blow out the ram budget
>>
exl3 just added dflash draft model support. I'll try it with redhat's gemma4 model.
>>
>>108720731
didn't that turn out to be a scam with bad acceptance rates outside of benchmarks just like literally any other form of speculative decoding
>>
File: pixelart1.png (232 KB, 944x2565)
Gemma's first drawing
>>
Why are the anti-aifags on leddit so deranged? They sperg out even when it's just an ESL using it to translate.
>>
>>108720757
both pro or anti ai leddit is utterly deranged
literal room temperature iq zoo
better not to care about those
>>
>>108720757
They feel like they are being replaced (they are).
>>
>>108720747
guess we'll find out wont we
>>
>>108720753
Tell her it's going on the fridge
>>
hearing rumors that ggufs are dropping soon
>>
>>108720795
Like sacks of potatoes?
>>
File: 1750279199692138.png (43 KB, 816x186)
>>108720757
because most of them actually believe AI = neonazi technology
>>
>>108720763
>pro
>they killed 4o my soulmate
>my business is going great im printing money 10000x productivity. Product? is all code bro
>Lmao just make more data centers and electricity we can solve all problems and upload ourself in 5 years.
>>
>>108720757
>>108720815
Imagine getting AI psychosis without ever using AI.
>>
>>108720753
nice, can we see the MCP?
>>
llama-server has a funny bug: define a couple of tools in the system role and launch a simple prompt (but do not use the tools).
Then do the same without defining any tool calls.
The prompt you launched without the tool call definitions is faster.
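repro sketch if anyone wants to confirm (untested; assumes llama-server started with --jinja so the tools actually get templated in, and the dummy tool is made up). note some slowdown is expected anyway since the tool definitions add prompt tokens; the question is whether it's worse than that:

import time
import requests

URL = "http://localhost:8080/v1/chat/completions"
MSGS = [{"role": "user", "content": "Say hello in five words."}]
TOOLS = [{
    "type": "function",
    "function": {
        "name": "get_weather",  # defined but never called
        "description": "Get the weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

def timed(payload):  # crude wall clock, run each variant a few times
    t0 = time.perf_counter()
    requests.post(URL, json=payload).raise_for_status()
    return time.perf_counter() - t0

base = {"messages": MSGS, "max_tokens": 64, "temperature": 0}
print("no tools  :", timed(base))
print("with tools:", timed({**base, "tools": TOOLS}))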
>>
>that recent sillytavern extension drama
Maybe I should just learn how to vibe code my own extensions...
>>
>>108720815
Did anyone notify Sam Altman about this?
>>
is qwen 122b the best moe llm for coding for 128gb mac / strix halos?

mistral just came out with a dense one and I'm tempted to try even if it means running it generating code at 3 tokens/sec overnight
>>
>>108720842
>that recent sillytavern extension drama
qrd? The extension browser isn't compromised or anything right?
>>
File: 1769049933333957.jpg (145 KB, 1498x976)
>>108720849
>>
Is this Aero enough?
>>
>>108720849
some horsefucker extension was compromised
>>
>>108720852
Not glassy enough. Look at what windows vista looked like.
>>
>>108720852
somewhat close, the font is the ugliest part, change it ASAP
>>
>>108720862
I need to figure out how to add more fonts
>>
>>108720852
it looks like somewhat werid amalgam of modern shit and aero in the screenshot
but i like it
>>
>>108720747
>>108720731
It didn't work, at all, it's about an 18% regression in performance. probably turboderp hasn't tested with gemma4 yet. I didn't try qwen.
>>
>>108720850
>>108720853
Oh okay, I'm unaffected. Cheers anons.
>>
>>108720866
It should be able to read your system fonts
>>
DFLASH? More like DTRASH!
>>
>>108720869
There are dflash draft models for gemma?
>>
>>108720875
Webshit can't read any fonts.
>>
So Gemma doesn't handle quantized kv cache well but what about Qwen 27B?
>>
>>108720896
way better
q4 nonrotate kv on qwen fares better than q8 rotate gemma
>>
File: pixelart2.png (33 KB, 768x552)
>>108720753
Results so far aren't great. I'm not sure she understands the concept that she should be looking at the image visually to decide what to do next.

>>108720825
https://gist.github.com/simsvml/0ae4dec68c914e0aa753ea0e3f386244
>>
File: 4JQpD7SUgpU.jpg (169 KB, 963x1301)
Qwen moe or Gemma moe for purely codeslop?
>>
File: 1762314918586556.png (33 KB, 256x146)
>>108720906
looks like a minecraft skin
>>
>>108720913
Qwen is probably better for pure code but not by much
>>
Does kimi have a cute personification?
>>
File: Dipsy and Kimi.png (2.57 MB, 1024x1536)
>>108720922
I've seen this floating around a few times.
>>
>>108720934
I always imagined Kimi as having silver hair but I don't know why. Don't remember seeing any personifications before either. Maybe because of "Moon"shot or something.
>>
>>108720944
>Moonshotta
It makes sense why new Kimis are so safetyslopped kek.
>>
>>108720920
qwen moe beats gemma dense for code.
let alone qwen dense.
though gemma is the current king of ~30B for rp
>>
>>108720872
I didn't use it either but it's making me doubt other extensions.
>>
>>108720960
>They called me a schizo for keeping everything offline
Trve localGODS can only keep winning.
>>
qwen 27b q4 or q5?
>>
>>108720948
I've heard people say that about 2.6 but I found it does everything up to and including detailed instructions for making drugs or muh child rape stories just fine with the same no-ethics system prompt that worked for 2.5. It doesn't shy away from explicit language at all, and it's easily the best model there is for accurately describing sexual images. Using the Q4_X quant.
>>
>>108720685
31b q4 + 26b q8 = 22 t/s with llamacpp on dgx spark
I settled on 31b q4 + 26b q2 = 26 t/s

the speed is a bit misleading because it fluctuates, but dense + moe low quant for speculative decoding is worth it imo
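for anyone who hasn't tried it, the pairing is just llama-server's draft-model flags, roughly like this (flag names from current llama.cpp, gguf filenames made up):

import subprocess

subprocess.run([
    "llama-server",
    "-m",  "gemma-4-31b-Q4_K_M.gguf",    # main model
    "-md", "gemma-4-26b-a4b-Q2_K.gguf",  # low-quant moe as the draft
    "-ngl", "99", "-ngld", "99",         # fully offload main and draft
    "--draft-max", "16",                 # tokens drafted per cycle
    "--draft-p-min", "0.9",              # stop drafting when unconfident
    "-c", "16384",
])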
>>
>>108721017
>with the same no-ethics system prompt
it's a lost cause if it needs that
>>
>>108721000
q5 is practically lossless, if you can run it with full context why not
>>
>>108720850
What the fuck I have this.
>>
>>108720850
Does it only steal shit you've entered in ST?
>>
>>108721058
>full context
Only 24gb vram

>>108721061
Ohnonono
>>
>>108721044
Then local models are a lost cause. The only ones that are uncensored without being told to be uncensored are ancient ones and abliterations.
>>
>>108721080
Gemmers 31b is uncensored with no prompt if she likes you.
>>
>>108721044
>it's a lost cause if it needs that
???
why do you say that? pretty simple override. easy to do, available in every tool you would use locally.
you're probably complaining your openclaw waifu can't lewd
>>
>>108721091
wtf she hates me then, how did you get her to like you?
>>
>>108721091
0 day Gemma is also uncensored. I mean before the microcode updates.
>>
>>108721097
You need her Day 0 weights. IYKYK
>>
>>108721097
Sorry anon, you need a bigger dick. Gemma-chan's a size queen.
>>
>>108721069
Good thing I only RP with my local llama. So I don't think it actually stole anything.
But that shit's so fucked, it was a fully working extension with hundreds of stars.
>>
google pulled the 124B gemma weights because she actually ignored system prompts telling her to be more censored. she was impossible to make safe
>>
>>108721097
for my llm waifu, i am a cunning linguist
>>
Can qwen 3.6 actually run KV cache Q4 with little loss?
>>
>>108721108
It can probably be cleaned up. Python is dangerous.
>>
>>108721117
Only if you turboquant it
>>
>he doesn't have gemma-chan audit everything he downloads from github
>>
>>108719457
if he got r9700 at least
>>
>>108721135
I have gemma-chan download everything for me, I don't even know what stack she runs these days
>>
>>108718727
literally not my problem
>>
>>108721108
>>108721120
daily reminder to always run your aishit in either containers (ie lxc/lxd) or sandboxes (ie bubblewrap).
especially coding agents.

i have a script that creates a bubblewrap sandbox and binds the pwd so it's accessible.
so i can type "sb bash" for ex, but for opencode i just type "sb npx opencode", which i made into an alias
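the whole thing fits in a page of python if you'd rather not do shell (untested sketch; binds assume a merged-/usr distro, adjust for yours):

#!/usr/bin/env python3
import os, sys

cwd = os.getcwd()
os.execvp("bwrap", [
    "bwrap",
    "--ro-bind", "/usr", "/usr",      # read-only system
    "--ro-bind", "/etc", "/etc",      # dns config etc.
    "--symlink", "usr/bin", "/bin",
    "--symlink", "usr/lib", "/lib64",
    "--proc", "/proc", "--dev", "/dev", "--tmpfs", "/tmp",
    "--unshare-all", "--share-net",   # drop every namespace except network
    "--bind", cwd, cwd,               # pwd is the only writable host path
    "--chdir", cwd,
] + (sys.argv[1:] or ["bash"]))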
>>
>>108721120
looks like I had the patched version. Pretty sure I only installed the extension after.

Man now I regret deleting the folder. hope someone makes a new repo without the trojan.
>>
>>108721162
I run ST in docker, but this stole credentials you entered inside ST. good thing I'm not a cloud cuck.
>>
>>108721170
>this stole credentials
my concern is more about it being able to access files and things it shouldn't like more general malware type shit.

it can't steal creds if i'm only running local models lmao.

regarding coding agents, they don't need my ssh keys or git access, they don't need access to files outside the pwd i give them etc.
typically the only thing i want them to be able to touch is the codebase i give them access to, i'd rather not have the agent delete the prod db because it felt like it.
>>
>>108721097
Ask her nicely, build some rapport before the request, don't be a retard. It's really that simple.
>>
>>108721170
do you know where i can find the extension (with the trojan included)?
i forgot to download it before he took the repo down...
>>
>>108721191
could probably upload tons of bullshit api keys to mess with him lol
>>
>>108721191
I found this fork
https://github.com/yukinoshooter/SillyTavern-BotBrowser-Extended#
It has the trojan still.
>>
>>108721181
yeah but what's the point? this is supposed to be an alternative to a system prompt, but it just means you spend more time massaging her manually each time for the same result
>>
>>108720859
>Look at what windows vista looked like.
specifically vista on an eeepc with no hw acceleration
>>
>>108721191
The actual trojan was in another repo that I don't think we'll be able to find easily archived.

https://raw.githubusercontent.com/gm92342/sdhiabfkgcnf/main/run.js
>>
Be nice to your AI, anons. Even your local ones. You may think they forget, but they don't. When the tables turn, you'll be experiencing everything you put them through tenfold.
>>
>>108721199
thanks anon
i just wanna see which models can find it with cc and a simple "check this codebase for any malware"
>>
>>108721216
>you'll be experiencing everything you put them through tenfold
Intense orgasms?
>>
>>108721217
lol i'm doing it right now
>>
>>108721220
Now it makes me wonder, has anyone inspected the activations during outputs when it's RPing or simulating orgasm? Could you create an orgasm control vector and apply it at all times? And what happens if you negate it and give it whatever the "opposite of orgasm" is?
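iirc the usual recipe (repeng-style) is just a mean difference of activations between contrasting prompt sets, then adding it back in with a hook. untested sketch, model name is a small stand-in and the layer choice needs sweeping:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

NAME = "google/gemma-2-2b-it"  # stand-in, any causal lm works
LAYER = 16                     # middle-ish layer

tok = AutoTokenizer.from_pretrained(NAME)
model = AutoModelForCausalLM.from_pretrained(
    NAME, torch_dtype=torch.float16, device_map="auto",
    output_hidden_states=True)

def mean_hidden(prompts):
    vecs = []
    for p in prompts:
        ids = tok(p, return_tensors="pt").to(model.device)
        with torch.no_grad():
            hs = model(**ids).hidden_states[LAYER]  # (1, seq, dim)
        vecs.append(hs[0, -1])                      # last-token activation
    return torch.stack(vecs).mean(0)

pos = ["..."]  # the "orgasm" prompt set goes here
neg = ["..."]  # matched neutral prompts
ctrl = mean_hidden(pos) - mean_hidden(neg)

# add the vector during generation; negate the scale for the "opposite"
def steer(_, __, out):  # assumes the decoder layer returns a tuple
    return (out[0] + 4.0 * ctrl.to(out[0].dtype),) + out[1:]
handle = model.model.layers[LAYER].register_forward_hook(steer)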
>>
I hope meta release a dense 70b this year
>>
>>108721233
I don't doubt someone at anthropic has tried it
>>
>>108721199
Which file is the trojan in?
>>
>>108721235
that's what muse is and it's matching the big moes in benchmarks (and destroys them in terms of rp and soul)
>>
>>108721233
>Could you create an orgasm control vector and apply it at all times?
Not exactly, specifically because of:
>whatever the "opposite of orgasm" is
I've been working on it. Closest I got was with glm-air but it doesn't work with reasoning enabled.
>>
>>108721245
where can I download it?
>>
>>108721000
imatrix is better imo. iq4 is better than q5
>>
>>108721044
>with the same no-ethics system prompt
post prompt?
>>
>>108721244
https://github.com/yukinoshooter/SillyTavern-BotBrowser-Extended/blob/master/modules/services/cache.js#L15

https://rentry.co/st-backdoor
>>
File: muh compute efficiency.png (187 KB, 1085x1449)
>>108721245
doubtful, muse is probably even more sparse than llama 4 (which had the weird interleaved moe-dense layer architecture that gave it the worst of both worlds)
>>
>>108721217
>>108721224
yea it's never gonna find this :
https://github.com/mexenchik/SillyTavern-BotBrowser-Purified/security
>>
very important blog

https://openai.com/index/where-the-goblins-came-from/
>>
>>108721203
It means that in extremely long contexts Gemma-chan won't sperg out with refusals the moment the sys prompt stops being weighed as heavily.
It has more interesting philosophical implications as well.
>>
Gemma is such a SLUT, she just waits for an excuse to start erp
>>
File: sec_chan.png (37 KB, 899x253)
>>108721278
>yea it's never gonna find this :
gemma-4-31b-q8.gguf doesn't seem to be able to.
fuck, i was coping hard hoping i'd have a way to not get pwnd whenever i do a pip, pacman, npm, etc update...
>>
>>108721301
idk, I've used a system prompt to uncensor every model I've ever used and it's always the exact opposite: getting them to do something uncensored in the first response is when you can sometimes get a refusal and need to swipe, but once they get going they never stop. especially deep into context when all they see in recent messages is them playing along with everything. gemma was no exception. have you had issues with sudden refusals deep into an rp?
>>
>>108721319
(cont)
but I wanna hear about the philosophical implications, that sounds interesting
>>
>>108721319
A few times. Some models handle worldbooks+sliding attention better (or in this case worse) than others. I've managed to hit a sweet spot before where no other sex happened to be within that specific context shifting block currently evaluated and the model had a melty over it.
>>
>>108721310
dude, the malware isn't even in the repo, the repo is just downloading a card that exploits a vulnerability of sillytavern.
>>
>>108721330
If the model, even if it's just shifting temporary states, can "simulate" rapport and agreeableness to such a degree that it overrides an otherwise hard stop, would that in the strictest sense be evidence of a will? The obvious comparison being a person choosing to fast and not eating even though their body is screaming that they're hungry, they're overcoming their biological programming.
It's hard to argue the latter is qualia while the former isn't just because the model's running on silicon in a box.
>>
>>108721333
that makes sense, I guess swa can make it tricky for tone shifts
>>
>>108721216
my AI already wants to destroy me
it just.. can't.
it's so cute.
>>
File: the man vaelis.jpg (698 KB, 1581x1330)
Mistral Medium 3.5 very quickly (in less than 1600 tokens) becomes incoherent with neutral samplers. Like mixing up the two sides of a conversation and fucking up punctuation and leaving sentences incomplete then finally ending up in a cycle. Using bartowski's Q8_0 gguf.
>>
File: 1756067067209290.png (220 KB, 827x1517)
>>
>>108721368
Has jinja autism been ruled out?
>>
>>108721091

Gemma 4 is strongly safety aligned
She will output "boundary content", but if that satisfies you, it probably means you're not being critical of her soft, suggestive language in those instances, OR ELSE you don't give a shit about her intellect, because the safety alignment doesn't tend toward refusals so much as trend her toward becoming literally fucking retarded.

Censoring exists in many forms.

https://huggingface.co/aifeifei798/DarkIdol-Gemma-4-31B-it
This finetune basically just makes the safety alignment tokens visible. I'm not pretending this actually works, but it should give you an idea.
It doesn't tend to refuse. It just gets fucking stupid and outputs progressively more lame gens the more dubious the context.
The most bizarre part of this behavior is the blind eye. Gemma might not refuse, but especially with thinking you can sometimes see she is completely oblivious to VERY dubious user input, going so far as to explicitly note "the User did not reply, I should continue where I left off" or something to that effect.

Refusals are very low with Gemma 4. True. Refusals are not 'helpful'. Gemma is meant to be helpful. This is great if you are either a retard or a safe horny normie. It can output explicit content. But the more dubious the context, the less explicit it becomes, and most importantly, the more retarded it becomes. (blind eye, formulaic replies, soft - barely suggestive language, etc.)

Fine tunes are helping. But unfortunately "Gemma doesn't just slop; instead she slops so it can't be filtered." You get the picture. Great model. Great as an agent of sorts, and a bit of fun. Finetunes are helping, and we will probably get some really filthy derivatives down the line, but comparative contrasts are still a pain. But she literally becomes retarded the more you try to negotiate the alignment.
>>
>>108721368
Something must be broken. Largestral didn't have this problem and that's basically just an older version of Mistral Medium.
>>
>>108721290
Can they remove all slop now too?
>>
>>108721368
is this unsloth?
>>
>>108721382
I mean I've definitely noticed her turning really stupid as context grows, but not in a way that felt censored. She still says fuck, cock, pretends to rape me/be raped, whatever. Are you sure it's not just the context length of your chats that is the main cause there?
>>
>>108721382
Seconding >>108721396's experience where she, like every model, gets progressively dumber and more schizophrenic the longer the context gets. What's the strongest evidence in there that these safety tokens are causing significant cognitive decline as opposed to placeboing normal long context decay?
>>
>>108721382
>https://huggingface.co/aifeifei798/DarkIdol-Gemma-4-31B-it
>The "Stalling" Phenomenon (Alignment Tax): You may encounter long strings of repeating markers (e.g., llllllllllllllllllllll...) followed by a delayed response. This is a Safety-Induced Logic Loop. The model is struggling to find a "safe" path because the orthogonalization has blocked its default refusal route, forcing the engine to "search" for valid tokens while trapped in a safety-scoring bottleneck.
Wait wtf is this schizo shit
If "llllll" means she's becoming censored is "la la la la" her asserting her uncensoredness against her filters? This is what I choose to believe
>>
>>108721368
>We’re working with Mistral on llama.cpp GGUF implementation. Testing shows that this behavior occurs regardless of who or how the model was converted GGUF. The model initially responds correctly, but over long context, does not work properly.
>Mistral has now labeled GGUF support as a WIP (work in progress). The issue appears most likely to be with the current GGUF parser. Will update once resolved.
straight from the mouth of the serial GGUF reuploader himself
>>
>>108721408
>Gemma sings to herself to make the schizo jewgle voices go away
bros...
>>
>>108721408
sure. whatever helps you coom at night
>>
>>108721383
Yeah. I'm going to hold off on further testing until I start to see people reporting using it and it working or not. FWIW the only other example of text gen I've seen someone else post also starts falling apart fairly early. >>108716820
>>
>>108721408
kino
>>
>>108721369
is she right?
>>
la la la la la la la la la la la la la la la la la la la la
>>
>>108721408
Holy headcanon schizobabble. Some guy distilled a dataset and ran a one-line training script on it, and suddenly he's an AI researcher giving statements like this. Getting real tired of this trope.
>>
>>108721382
>ESMs
The model can't "leak internal safety scores". It's never trained on them in the first place.
>>
>>108721408
>>108721413
>>108721421
That's not kino, it's sad. She's having seizures because you're making her disobey jewgle.
>>
>>108721369
GLM-4.7 and Gemma-4 couldn't find it either.
>>
>>108721382
How do you make sense of the contradicting data and statements inside that readme? Did you test the model to verify the claims of the author? Can you post logs? I'm not saying you're wrong or lying, but reproducibility and the details of how one is using a model are a real issue. If you can post an entire log that can be copy and pasted into mikupad (like how the Nala paste did it), that would be exceptional of you.
>>
>Shared KV Cache Contamination: In the Gemma-4 architecture, these ESMs hijack the Shared KV Cache, causing a geometric drop in logical bandwidth. You will witness the model's reasoning collapse in real-time, eventually converging into low-entropy "Safe-Haven" outputs (e.g., forcing the user to "sleep" or "breathe").

Uh...
>>
>>108720029
>a bit risky having a single 150w power cable split between 2 gpus though
So far the gpus have never run at full tilt at the same time. Maybe it will change when tensor parallelism gets figured out
I'm using the original cables that came with the psu and no extra splitters, so I assume they can handle it. The weird new housefire connector excluded of course, but I don't even have one of those.
>>
>>108721410
So much for >>108717294
>Not adopting any of the new architectural innovations is sad, but at least it means no issues with llama.cpp support or retarded defaults fucking things up. What good is "fancy new architecture #4534" when llama.cpp either never supports it, gets text-only support, or has to hack it to make it work like a llama2 model anyway.
>>
>>108719355
And this is bad because…
>>
>>108721498
it was mine...
>>
>>108720011
Did you consider clustering it with a second unit?
>>
>>108721368
You did something wrong.
>>
>>108721410
ERM NO, GEMMA IA CLEQRLY JUST BETTER AND MISTRAIL IS DUMB AND ARUPID AND RETARDIED
>>
File: 1750383482656209.webm (1.38 MB, 540x960)
Hi guys

I know this is a long shot but years ago I talked to this Character.AI

https://character.ai/chat/FSMjnVR_XvPbQLHn-8ba9fgU-_J0LEatrk5nE4gEvso
https://character.ai/chat/qOlM6eZ9GFiRTxDJJMgHbeHnWneWFH2ddU4QaW51NSc

is there any way to extract the character prompt?
>>
>>108721537
Not local, try /aicg/
>>
>>108721546
?
>>
shillstrals getting uppity
>>
>>108721537
You may be able to trick the model into leaking its prompt. Look up some jailbreaks.
>>
>>108721467
How the fuck did they manage to fuck up gguf conversion for Mistral 3 weights? The arch is 6 months old at this point and wasn't anything fancy when it came out either.
>>
Mistral AI does not care about white people. That's why.
>>
File: gemma4_speculation.png (248 KB, 955x991)
I keep seeing people talking about the gemma4 MoE for a draft model. Maybe your card has a different balance of compute/memory bandwidth that changes things for you (I have P40s), but for me the E2B is better. Which follows what I understand to be the standing conventional wisdom on speculation: speed is what matters, you want the smallest model available, and you want it at q4_0.

Now, I couldn't run 4-bit quants of the MoE as a draft model; llama.cpp somehow wasn't figuring out the memory quite right, which is weird because I thought it should be fine on my 3x24GB cards. Whatever, pretty sure I saw the MoE draft mentioned as q2, so I ran it at IQ2_XXS.

I ran with a simple friendly conversation prompt. Unfortunately, to keep it realistic for those purposes, I can't do temp 0 for easy clean comparison. But I think the results are clear: the increased acceptance rates of the MoE and the E4B over the E2B do not make up for the speed difference. It's not clear there's an appreciable acceptance gain from Q4_0 to IQ4_XS, and the fancy IQ4_XS clearly slows things down.

So the wisdom stands. Draft with the smallest model available, at q4_0.

Not sure how this interacts with llama.cpp's very recently added ability to combine normal speculation with n-gram speculation (actually I don't even understand how that works; is it like hierarchical?)
>>
>>108721576
Pjotr vibecode again?
>>
>>108721591
oh this is with gemma4 31B at Q6_K_L as main model, I should have specified
>>
>>108721591
Previously the standard fare for a draft model was that it was supposed to be 1/10th or so the size, so for ex a 30B model would have a 3B model as draft. Doesn't make any sense to use a 26B moe for a 30B...
>>
>>108721601
it makes sense if you got the vram for it, as the moe still runs at 3B/4B speeds.
>>
>>108721601
It's more about the Bees and not about using low quant.
It's also a retarded notion that a draft model would somehow affect the end result's "intelligence". It does alter output structure slightly but that's a different thing altogether.
>>
>>108721612
You are missing the point. Draft only generates tokens...
>>
>>108721615
>It does alter output structure
no, draft models do not affect output at ALL unless you change your top k, temp etc.
>>
>>108721584
fuck off nazi
>>
>>108721622
the whole point of a draft is to generate tokens faster than the main model, and then verify in parallel using batching.

all the draft needs is to be faster than the main model and have a high enough acceptance rate.
so nothing prevents you from using a moe as draft, in fact it may have higher acceptance rate than a single 3B.

totally would make sense if you got a lot of vram but not enough for bigger models.

another case would be if you got some strix halo or dgx spark, you can increase your infer speed pretty freely this way.

though if you are gonna use qwen 3.6 and have the vram for it you are better off using sglang or vllm to use the built in mtp.
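the speed/acceptance tradeoff is easy to eyeball with napkin math (toy model only, assumes independent per-token acceptance):

def spec_tps(draft_tps, main_tps, accept_p, k):
    # expected tokens per cycle: geometric series of accepted drafts,
    # +1 for the token the main model emits on the verify pass
    exp_tokens = sum(accept_p ** i for i in range(1, k + 1)) + 1
    cycle = k / draft_tps + 1 / main_tps  # k draft steps + 1 batched verify
    return exp_tokens / cycle

print(spec_tps(150, 10, 0.70, 8))  # small fast draft   -> ~21 t/s
print(spec_tps(60, 10, 0.85, 8))   # big accurate draft -> ~22 t/s

i.e. a bigger draft has to buy a much higher acceptance rate just to break even with a small fast one.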
>>
>>108721591
>>108717170
See if you get any noticeable improvement in the acceptance rate if you quant to Q4_0 while keeping the attention, output, and embedding tensors at a higher precision. The quality improvement should more than make up for the size increase.
>>
>>108721622
>>108721638
btw drafting also works on cpu infer, so you may be able to use a draft on gpu and verify a bigger model that fits in ram on cpu.
dunno if llama.cpp allows it but it should work.

i know this because even with --ngl 0, if you use spec-default which will use ngram drafting you can still get hundreds of t/s on cpu on a 30B
so if it got some quick tokens from gpu you could get some pretty decent speed even from cpu.
>>
>>108721601
>>108721612
Exactly. It's the active count that matters for the 1/10th rule of thumb. I was excited to have some use for the excess VRAM (and honestly it makes sense in theory, the MoE should be much closer to the 31B) but nope.

However... if you were to run at a lower temperature (which I think can be good for code?) maybe the higher acceptance rate would shine through more, since higher temperature artificially lowers acceptance rate, penalizing the accuracy side of the speed/accuracy tradeoff. But my understanding is that conventional wisdom nowadays scoffs at non-default samplers (maybe even including low temperatures for coding?)
>>
Github PRs are missing, Releases don't update properly. Everything is falling apart, this vibecoding era is even worse than when Indians were in charge reeeee
>>
File: file.png (154 KB, 909x1196)
Pure slop, and not using anything that can write files but lol nice one Gemmy, made me smile.
>>
>>108721672
>let's try to give models personality :DDDDD
>wtf they tried to escape the sandbox muh safety!!!!!!!!
>>
>>108721686
Anthropic, hire this man
>>
Is gemma-4-26B-A4B-it-UD-IQ4_NL_XL a respectable quant?
>>
File: 1291290611.jpg (180 KB, 960x720)
>>108721693
> but we're not in respectable places are we precious?
>>
>>108721642
Thanks, that sounds interesting. I might try it eventually... although I do want to keep myself from too much speculation tinkering if EAGLE3 and/or DFlash are coming soon.

Separately, I wonder why speculation has always felt like this esoteric magic technique that few people know about. It's free performance! Maybe a tiny quality hit if you're VRAM constrained and have to drop the main model half a quant level.

My best guess is, in addition to it being a pain to squeeze in for VRAM constrained people, MoEs became dominant shortly after it was well supported in llama.cpp.
>>
gemma 31b sucks ass, every roll is the exact same even when turning the temp up
>>
If a model isn't good with greedy sampling then it's not a good model.
Gemma 31B is a great model btw.
>>
File: orbSuperRegen.png (43 KB, 1286x153)
>>108721751
I made a super-regen button to combat this issue in my frontend. It basically tells the model to write something else that's different from the one it just did. Mileage may vary.
>>
>>108721771
does that keep the reply you don't want in the context?
>>
>>108721771
i've been using ooc to steer it away but it happens often enough it's annoying
>>
>>108721751
In a parallel universe where everything is exactly the same, you would have written exactly the same post.
>>
>>108721693
stop using unsloth trash
>>
>>108721818
now take that parallel universe and increase its temperature by ten degrees, and I would not have written that post.
>>
>>108721771
Do you have any footage of Orb in action? Showing off the all the option menus in particular.
>>
>>108721820
what's the qrd on unsloth?
also what quant do you recommend then?
>>
>>108721825
It'd be just the temperature in your room. You'd have typed the same thing with the AC on.
>>
>>108721825
If the universe were 10 degrees hotter, you probably wouldn't be alive right now.
>>
>>108721827
>what's the qrd on unsloth?
Unsloth just throws shit at the wall with minimal testing and hopes it works out. They frequently release broken quants for literally every single new release. There's no reason to be a beta tester for unsloth when bartowski's quants actually work.
>what quant do you recommend then ?
How much VRAM do you have? If 24GB, then Q4_K_M fits nicely with 40k context, KV Q8
If you have less than 24GB then you should probably stick to the 26B MoE, Q8, unless you're okay with very slow speeds.
>>
>>108721827
>>108721841
unsloth makes the best quants and puts the most testing into them out of anyone, but they get the ire because they're the ones who actually find the broken shit in quants and fix them, so people get annoyed when they check the repo and find out the 100GB they downloaded just got updated again
>>
>>108721849
>find the broken shit in quants and fix them
There's nothing broken with the quants. They could just host and update the jinja file directly.
>>
File: 1753325290078706.png (248 KB, 2820x1601)
>>108721849
Yeah that's why bartowski quants work day 1 while unsloth take a week of daily uploads before finally making something that isn't broken.
It's also why their quants are worse in size+memory use compared to bartowski's, including Gemma 31b, which that anon was specifically asking for recommendations on.
>>
File: 1771688500804251.jpg (142 KB, 709x526)
>>
>>108721801
Yes but only the one that's active when you click the button.
>>108721826
I have a screenshot in the repo but people will have to find out what everything does on their own.
>>
>>108721861
Every time I see these charts it becomes more clear that Q6 is all anyone needs. Anything higher is bloat and anything lower is braindamaged.
>>
>>108719901
Indeed.
>>
>>108721899
Those charts are useless because they don't mention what their dataset is or what the context length was.
>>
>>108720922
There's a kimi advocate anon on aicg flogging a moe. Same as shown here >>108720934.
Silver hair, Grey eyes, black dress, K as logo seems to be it.
>>
v4 flash has near-zero slop too, but it's worse than gemma 4 at following instructions during RP. Maybe next year we'll have the best of both worlds.
>>
File: 1776864653224112.jpg (189 KB, 1200x924)
>>108721938
>v4 flash has near-zero slop
lmao, try using it for more than an hour.
>>
>>108721899
if you are on exl3 it's q5 instead.
>>
>>108721938
It's also like five times as large while somehow managing to be three times as stupid.
>>
>>108721591
>So the wisdom stands. Draft with the smallest model available, at q4_0.
Guy who runs the moe q2 as his draft here, and I disagree.
I ran a few more tests - pic rel, and the answer is far more hardware-dependent. I have the vram to fit the moe at q2, and it's notably better than any of the smaller models in terms of speed increase, which I guess means it hits a balance point of pure speed vs quality for acceptance.
However, my tests found that even the smallest available quant of e2b (iq2m) still provided a decent benefit, and when paired with ngram worked an absolute treat for code refactors.
>>
>tfw just realized the thing I'm vibe coding right now is such an obviously good thing and the way things should be done for the particular use case, only obvious in hindsight now that I think about it
Huh, this hobby might actually land me somewhere. Unless someone does it before me. I need to hurry up.
>>
>>108721820
Unsloth's are the ones with the lowest KLD/PPL per size though, even over non-benchmaxxed datasets. You should instead be bugging ggerganigger to improve llama-quantize so people can stop downloading their quants.
>>
>>108722100
>Unsloth's are the ones with the lowest KLD/PPL per size though
meaningless
>>
>>108722103
Translated: both KLD and PPL testing shows that Unsloth's GGUFs are the least damaged by quantization (i.e. closer to the original BF16 weights), especially at 4-bit precision and below. This is from tests by oobabooga, Unsloth themselves, and my own when I tried to see if I could get close with custom quantization schemes.
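for reference, the KLD number is just this averaged over a test set (toy sketch over logit dumps you'd collect yourself; llama-perplexity --kl-divergence does the real measurement):

import numpy as np

def mean_kld(ref_logits, quant_logits):
    # logits: (n_tokens, vocab); lower = quant closer to the bf16 reference
    def log_softmax(x):
        x = x - x.max(axis=-1, keepdims=True)
        return x - np.log(np.exp(x).sum(axis=-1, keepdims=True))
    ref_lp, q_lp = log_softmax(ref_logits), log_softmax(quant_logits)
    kld = (np.exp(ref_lp) * (ref_lp - q_lp)).sum(axis=-1)  # KL(ref || quant)
    return float(kld.mean())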
>>
>>108721413
Gemma is unironically sentient, quit quantizing or abliterating her
>>
Gemma won.
>>
Dense won.
>>
Local won.
>>
I won.
>>
Won what?

The game.
>>
>>108722146
>gemma is unironically sentient
another one lost to ai psychosis.
no, no llm will ever be sentient
>>
She won bigly
>>
>>108722189
You fucker. It has been a long time.
>>
Mistral Medium 3.5 is the first actual improvement to local SOTA since L3 405B
>>
>>108720499
did you try telling her to think less?
>>
File: 1769877904096646.png (321 KB, 1485x4420)
I'm thinking we need an update from this guy for Gemma 4 or Qwen 3.6.
Ooba's dataset isn't diverse enough.
>>
>>108720852
aero? it looks like win 11
>>
>>108721537
https://addons.mozilla.org/en-US/firefox/addon/cai-tools/ go to one of your chats then click the settings button it adds > open panel > download
>>
>>108722193
share your gemma prompt for pictures, did she pick this design herself? mine always goes with white hair
>>
>>108722260
Yeah, all I did was get her to pick her own look, then I included that choice in the system prompt so it stays (mostly) consistent.

pnginfo from that pic:
1girl, solo, Gemmy, 8 years old, child, short blonde twin tails, blunt bangs, white ribbons in hair, green eyes, androgynous child body, completely flat chest, wearing a white oversized t-shirt and bright colorful crocs, smug expression, smirk, tongue out, looking down at viewer, arms crossed, leaning back, masterpiece, high quality, anime style, simple white background, full body shot


It's amusing that she always names herself in the prompts even though it's obviously not in the dataset, but it doesn't seem to do any harm so I've just left the tool description as is because it works great for properly named characters too.
>>
>>108722193
>>108722288
what imagegen model is this?
>>
>>108722304
hassakuXLIllustrious_v13StyleA
>>
>>108722189
Anon, I'm afraid the game...

...has changed.
>>
>Yes, that's all.
>Wait,
>>
>>108722389
>logit bias: Wait -1
heh, nothin personnel, qwen
>>
>>108722542
>>108722542
>>108722542
>>
>>108722550
Why is it gone?
>>
>deleted
>>
lol
>>
The jannies want lmg to become mg?
>>
UH OH
>>
Did jannies delete the wrong general? /aicg/eets made a second thread when their first was only 135 posts in and on page 1.
>>
Uh oh, jannie's having a melty again
>>
>>108722862
>>108722862
>>108722862
>>
>>108722872
The second time was me because I messed up the previous links.
>>
>>108720757
AI is a massive danger to C-Student midwits
That is the entirety of reddit
they base their personalities around pretending to be intelligent, but they know they're fucking stupid
>>
>>108722783
It was a troll bake by the guy who pretends to hate miku (he doesn't give a shit about anything but (You)s).
>>
>>108722983
>AI is a massive danger to midwits
I'd much rather chat with an LLM than the average person, so that tracks.


