/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>107470372 & >>107452093

►News
>(12/08) GLM-4.6V (106B) and Flash (9B) released with function calling: https://z.ai/blog/glm-4.6v
>(12/06) convert: support Mistral 3 Large MoE #17730: https://github.com/ggml-org/llama.cpp/pull/17730
>(12/04) Microsoft releases VibeVoice-Realtime-0.5B: https://hf.co/microsoft/VibeVoice-Realtime-0.5B
>(12/04) koboldcpp-1.103 prebuilt released: https://github.com/LostRuins/koboldcpp/releases/tag/v1.103
>(12/02) Mistral Large 3 and Ministral 3 released: https://mistral.ai/news/mistral-3

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
►Recent Highlights from the Previous Thread: >>107470372

--Llama.cpp performance and development challenges:
>107471956 >107472014 >107472035 >107472140 >107472205 >107472248 >107472314 >107472398 >107472401 >107472517 >107472545 >107472555 >107472592 >107472629 >107472709 >107473151 >107473231 >107472721 >107472760 >107472837 >107472963 >107472212 >107472289 >107472774
--New CUDA documentation and format preferences:
>107472513 >107472524 >107473035 >107472621
--Local image generation and model performance on M5 Macbook Pro:
>107476437 >107476479 >107476511 >107476545 >107476641 >107477701 >107477778 >107477886 >107478034 >107476975 >107477081 >107477104
--TTS model quality comparison and conditioning challenges:
>107470680 >107470914 >107471013 >107471182
--Impressive text-to-image generation of a rustic bedroom scene:
>107472084
--ComfyUI usability decline and search for diffusion software alternatives:
>107470955 >107471018 >107471083 >107471117 >107471133 >107471786 >107471815 >107471836 >107471875 >107471919 >107471940 >107471917 >107471865 >107471935 >107471074 >107471224 >107471245 >107471191 >107471267 >107471354 >107471426 >107471379
--GLM-4.6V multimodal model release and limitations:
>107479516 >107479541 >107479573 >107479580 >107479599 >107479606 >107479547 >107479590
--Chinese AI dominance in uncensored, fast models challenges Western competition:
>107478667 >107479399 >107480085
--Evaluating RTX 5060 Ti's 16GB VRAM for AI workloads:
>107479026 >107479046 >107479116 >107479245 >107479444
--voxcpm performance on CPU systems:
>107479708 >107479717
--AI model token efficiency and training data quality issues in Chinese-English contexts:
>107479930 >107480112 >107480134 >107480193 >107480823 >107480221 >107480243
--Miku (free space):
>107471191 >107471379 >107472024 >107475030 >107478877 >107479399 >107479543

►Recent Highlight Posts from the Previous Thread: >>107470374

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
GOOFS WHEN
Mikulove
Gemma Sirs, Ganesh Gemma 4 will be soonly enough for Us.
>>107481183
>GLM-4.6V (106B) and Flash (9B)
medium sized models are dead or what? why is it always a tiny useless shit + a giant shit?
>>107481251
At least in benchmarks the 9B looks really good.
Lack of Z-image base starved /ldg/ to death.
>>107481251
you can run 110B moes comfortably retard, it's a good size for Q4 with 96-128gb ram you retard gay faggot
>>107481282
im not gonna bake, I posted 2 really bad gens there
>>107481287
>96-128gb ram
that subhuman didn't see the news about the boom in ram prices or what?
>>107481282
>Lack of Z-image base starved /ldg/ to death.
/ldg/ fag here, I'm gonna chill there until a new thread is created, I hope you don't mind
>>107481337
Does that seething, anti-ai troon janny shit up /ldg/ with impunity as well?
>>107481337
>leet
You are very cool. welcome.
>>107481360
I don't know what you're talking about so I guess the answer is no kek
>>107481306
So glad I RAMMaxxed back in the cheap days. Even my trash desktops have at least 64GB. Homelab servers between 128 and 768GB. Just hope that chink fab coming online and OpenAI being unable to actually pay for 40% of the world's stock somehow stabilizes things before too long
what's now the best for 12GB VRAM and 64GB RAM
>>107481463
rocinante 1.1 q6
It's over
>>107481391
I'm glad I got 64GB when I built my latest computer around five years ago. I'm sad I didn't buy a 12x64GB system when I first started thinking about it earlier this year.
>>107481546
what model
>>107481625
Flash (9B)
>>107481546
OWARI DA
>>107481472
I'd like to use more of the RAM
>>107481546
>mesu > nemuru
That's just weird. Mesu isn't that rare of a word, I think.
>>107481652
You could try GPT-OSS-20B. Just be aware that it's severely safetymaxxed.
>>107481463
Qwen 30B
Is the new 4.6V really MoE? I don't see any mention of total vs active param counts
>>107481762
It's in small grey text at the top left of the benchmeme chart image
feet
I have slow internet, so I can't just download all those gb.
How fast would glm 106b be with 16gb vram and rest in ram? I have 64gb ddr4 ram.
I heard recently some flags in lccp improved speed.
Gut feeling is that 12b mostly on ram is painfully slow.
>>107481840
I run 4.5 Air q4ks on a 7900 XTX and DDR4. I get around 8 t/s. I don't think your GPU speed matters much for token generation, but for prompt processing it will.
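For reference, a minimal llama-server invocation for that kind of partial offload (a sketch only: the filename is a placeholder and the -ncmoe count is something you tune until VRAM is nearly full, not a known-good value for a 16GB card):

llama-server -m GLM-4.5-Air-Q4_K_S.gguf -ngl 99 -ncmoe 40 -fa on -c 16384

-ngl 99 puts all layers on the GPU, then -ncmoe kicks the expert tensors of the first N layers back to system RAM, which is the usual way to fit a big MoE on a small card. Raise -ncmoe if you run out of VRAM, lower it if you have room to spare.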
>>107481762
God I wish it were 106B dense
>>107481306
if you don't already have at least 128GB RAM and 24GB VRAM you don't belong in this thread, unironically.
>>107481979
i have 96gb ram and 16gb vram, do I get a pass?
>>107481979
My exact hardware now. I wish there was something better than nemo to run though
New model dropped while 4chan was down: AutoGLM-Phone-9B.
Phone-use model. I guess to automate botting tinder or something?
We're back!
>>107481823
this is the post that crashed 4chan
@lmg redditor claims it's the reason RAM prices are so high now, is it true? https://www.mooreslawisdead.com/post/sam-altman-s-dirty-dram-deal
>>107482119
stfu
I've gone back to running midnight miqu 70b and I'm starting to like it better than glm. Even at low quants it just works. New models are officially dead.
migu maintenance has been completed, thank you for your patience
>>107482158
lol
>>107482158
>mfw a stolen model from mistral is still better than the scraps they're providing nowadays
>>107482158
I'm doing mistrals and cohere. Llama is a bit of a dummy by current standards. Miles ahead of the shitty parrot prose of new models.
>>107482158
Back to command-r
>>107482030
Can it play gacha for me?
>>107482150
wow an llm that can seethe, AGI has been achieved it seems.
I hope somebody drops a bomb on OAI HQ
>>107482318
I hope someone turns sam altman straight.
>>107482560
me on the left~
All the big players are done for the year. We're likely not going to see anything new until march or april.
Dead technology. I hear the bubble popping in the distance and I cannot be happier. We wasted at least 2 years with these retarded transformers instead of pushing for the paradigm shift.
Why is it called transformers and not cisformers?
>Load up Gemma 270m for shits and giggles
>Ask "What does NEET mean?"
>It starts talking about India
Sirs?
>>107482876
you are obsessed
I wish I could use AI to turn myself into a woman...
I should have upgraded from 64gb to 128gb when I had the chance. It's so fucking over.
What samplers do people use for glm air
I haven't changed my samplers since mythomax
>>107482984
I've been using fewer and fewer samplers over the past two years. I'm back to running pure temp + maybe a tiny bit of min-p or even top-p
>>107482891
and you're a fecaloid
>>107482984
you should only use temperature and min p, the other samplers are for crayon eaters and people who want to feel good about using the sampler of the month that does nothing
>>107482984
Just top-n-sigma and temp. Sometimes I turn on XTC if it just repeats itself when I swipe.
>>107482984
Top N-Sigma
>>107483086
wasn't min p used for a while in the labs before it got proven to be worse than good old top p and top k?
Why isn't anyone talking about Intellect-3? It feels pretty creative, even though the annoying thinking forces itself through every now and then.
>>107483116
I assume we are talking about RP here. If not then top p is fine too. Top k is not adaptive enough and is simply worse than the other two.
don't even leave me again
>>107482166
did they fix her leaking pipe?
>>107483108
top n-slopma
>>107483234
I filled her context with my tokens and she leaked memory. Sorry about that.
>>107482876
did saar gemma redeem?
>>107482917
post bussy
>>107483224
I think it's better than glm air but I didn't post about it because it might be placebo (I haven't used it enough to know for sure if it's good) and nobody in this thread understands nuance (every model either has to be the best ever or total shit). Anyways I use it without thinking for story-style RP and I've been enjoying it so far. The only time thinking popped up was when I tried to ask the model a question in llama-server directly, in sillytavern it hasn't come up.
>>107483401
conceptually it should be
1. Do you think AI is currently in a bubble
2. If yes, how far do you think AI will advance before the bubble pops?
>>107483823
Yes, there was a railway bubble and there was an internet bubble and look where those things are now. Tulips still exist as well. If AI is as successful as either of those things, we'll be fine. Especially once VR takes off and elevates AI to new heights. Financially, crypto is likely still the better long-term investment though.
>>107483823
yes, it's already hit the wall, nothing but diminishing returns from here on out.
>>107483823
Yes. Yes.
If there's any real risk of the bubble popping, DARPA will nudge it forward by leaking a methodology breakthrough from a black budget program. Just like they did when they let attention guidance go public in 2017.
>>107483823
1. No, I do not think that AI is currently in a bubble because language models are not AI.
>>107483882
not your autistic definition. the contemporary definition that everyone is using.
>>107483823
Yes and yes.
The bubble popping will serve as a sort of filter to weed out a lot of the useless ideas and it'll give space for even more innovation.
>>107483823
1. No. Demoralizing posts don't have any effect on my enjoyment of language models.
2. Yes. You should kill yourself.
>>107483823
it's currently in its bitcoin 2014 $300 stage
>>107483823
>1. Do you think AI is currently in a bubble
AI? No. LLM is a bubble. Investors are pumping a disgusting amount of money into a technology that has already hit the wall.
>2. If yes, how far do you think AI will advance before the bubble pops?
Hard to say, it can be next year or a few years later. You will still observe small improvements in models but we are at the point of diminishing returns. You can do small hacks and changes in the architecture, maybe generate more synthetic data, but that's all. The biggest players already scraped ALL text data humanity ever created. There is nothing more to train the models on (with the exception of synthetic data, but that's not helpful in the long run).
>>107483915
fuck no, it's obviously in the 2017 grifter stage
>>107483823
>>107483907
Oh, and I think we'll probably only continue to see incremental steps with small improvements here and there even with the focus on more modalities, world models, and such until the bubble pops.
>>107483823
US economy is a bubble. China will spectacularly beat the US in the AI race
My first impression of Ministral-3-3B-Instruct-2507 (using Mistral's official Q8_0 quant and running at temp=0.1) is that when it comes to fiction writing it quickly falls into the same stuff Qwen's 30B-A3B / 80B-A3B MoEs do with
Lots.
Of.
Very short.
Lines.
Never stopping.
Like them it doesn't happen immediately but as the context grows it gravitates towards that.
My first impression stepping up to Ministral-3-8B-Instruct-2507 (again using Q8_0 from Mistral) at temp=0.1 is that the quirk is less pronounced. However it still starts to pick up degenerate quirks and repeat phrases if I let it go on long enough. The 8B also works at higher temperatures than the 3B. Setting dynatemp_low=0.6, dynatemp_high=1.0, dynatemp_exponent=1.0 made the 3B immediately start becoming incoherent but not the 8B. (I haven't nailed that down specifically as a good setting. It's just one of the early things I tried.)
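Those dynatemp_low/high numbers are the koboldcpp/SillyTavern parameterization; llama.cpp expresses the same idea as a midpoint plus a range. A rough equivalent on the llama.cpp side (my translation of the settings above, not a recommendation, and the filename is a placeholder):

llama-cli -m Ministral-3-8B-Instruct-Q8_0.gguf --temp 0.8 --dynatemp-range 0.2 --dynatemp-exp 1.0

i.e. the effective temperature varies over [temp - range, temp + range] = [0.6, 1.0], matching dynatemp_low=0.6 / dynatemp_high=1.0.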
>>107484074
Very Good Findings, Sir... We Are Yet Seeing Gemma Is Superior. Good Evening.
>>107484074
>My first impression stepping up to Ministral-3-8B-Instruct-2507 (again using Q8_0 from Mistral) at temp=0.1 is that quirk is less pronounced
Nevermind, going a little further it happens with 8B too.
>>107484074
That's due to Deepseek distillation. First signs of model collapse. You should never distill a distill.
>>107484454
>never distill a distill.
What if we went deeper?
>>107483224
weird, I have had the opposite issue. I can't trigger it to do thinking blocks at all in llama.cpp. I think this is probably weakening the model since it's supposed to rely on thinking. how are you running it, any cli args?
>>107483823
yes duh, it's peaking right now.
Current AI bubble has two huge problems:
>inherent limitations of the architecture, hitting a wall of diminishing returns on MOAR LAYERS. only a completely new experimental architecture can fix this, and this could take years of expensive research to discover.
>overestimated training and inference costs.
just look at Z Image VS Flux 2, or Qwen next:
>This base model achieves performance comparable to (or even slightly better than) the dense Qwen3-32B model, while using less than 10% of its training cost (GPU hours). In inference, especially with context lengths over 32K tokens, it delivers more than 10x higher throughput — achieving extreme efficiency in both training and inference.
https://qwen.ai/blog?id=4074cca80393150c248e508aa62983f9cb7d27cd&from=research.latest-advancements-list
How many more 10x training and inference perf improvements does it take to implode the big tech AI data center speculative investment bubble? Pretty soon we'll be able to train SOTA local LLMs and media generators completely from scratch on consumer hardware. Once we hit this inflection point, the amount of independent research on code and architecture optimizations will explode, as everyone will be able to rapidly test and iterate. Then all of the (rapidly depreciating) AI datacenter hardware will be worthless.
>>107484485
That's why I'm confident in >>107483965
>amerimuts and evrocucks stuck layers
>China releases zimage
4.6 air comes out and this thread is dead? wtf is happening
>>107481870
Really? huh. That's a lot better than I thought. I was expecting like 1-2 t/s.
Anything over 5 I can endure. The thinking might be a problem if it's very long but I guess I can just prefill it.
>>107484485
>llama.cpp/build/bin/llama-server -m ~/models/PrimeIntellect_INTELLECT-3-IQ4_XS-00001-of-00002.gguf --no-mmap -ngl 99 -ncmoe 43 -fa on -b 512 -ub 512 -c 16384 -t 8 --host 127.0.0.1 --port 5002
I guess you are not asking it for anything naughty. For me it will start doing thinking even without <think>. I have to ban all "Okay, Alright, Hmm" etc thinking words. And the thinking has a very formulaic structure, even when it complies with unsafe prompts, "Who is the intended audience?" "What is the user's intended meaning?" etc.
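If memory serves, the word-banning part can be done against llama-server's native /completion endpoint with logit_bias, which accepts plain strings as well as token ids (false = never emit that token). A sketch, with the port matching the command above and the prompt elided:

curl http://127.0.0.1:5002/completion -d '{
  "prompt": "...",
  "n_predict": 512,
  "logit_bias": [["Okay", false], ["Alright", false], ["Hmm", false]]
}'

Caveat: this only bans those exact token strings, so capitalization variants and multi-token spellings can still slip through; frontends like SillyTavern expose the same mechanism as banned tokens/strings.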
I saw GLM in the news and almost shit myself, then killed myself when I saw it wasn't GLM AIR 4.6.
Two weeks right?
Is derestricted the new meme? Is it good? Wish I could test those before I download.
>>107484968
Yes. Don't trust people who say it magically makes a model god tier. It's just a bit of an improvement to make models not refuse prompts. They will still be a bit moralizing and positivity biased depending on the base model in question.
>>107484836
no goofs for the 106b, and the 9b is barely coherent
Is it cheaper to get a second AMD GPU than 2 RAM sticks nowadays?
>>107485081
It's getting integrated into the Heretic toolkit suite but I think the main issue at the moment is the lack of formalizing what metrics to use to optimize. A lot of it seems like "this looks right to me" rather than anything else.
>>107485106
I'm wondering if they only released 4.6V with vision benchmarks because it benchmarks worse than 4.5 Air on normal text/programming stuff, or because they're going to release a real 4.6 Air to compare to 4.5 Air.
>>107485181
why release non vision when everyone loves vision and vision is much betterer than not?
>>107484507
>>107483965
china economy is fucked in other ways. they also got demographic problems, etc. nobody is immune to clown world.
>>107484485
Sorry bro, delusion. z-image is comparable to flux 1 which was only a handful of B bigger. Qwen-next is shit. The only thing that imploded was the ram market.
Gemma 4 ready finetuning sirs?
Very good model
>>107483823
AI is "advancing" in the sense of becoming more benchmaxxed and slopped over time. Just look at how sovlful older models were compared to the current stuff we get now. Kind of sad really.
>>107485370
prompt?
Can this be run on llama cpp?
https://huggingface.co/mradermacher/GLM-4.6V-Flash-GGUF
>>107484836
I just got it running. 5 minutes interacting with it and it's awesome! Faster than Gemma-3-27B for me (5 seconds to reply after sending it an image, thinking enabled, and asking for a description).
>>107485615
That said, I didn't notice the Parrot issue in GLM-4.6 until after a few weeks using it, so I'm sure this model has its quirks as well.
2026 will be /lmg/'s year
>>107485627
>I didn't notice the Parrot issue in GLM-4.6 until after a few weeks using it
It happens in literally every reply with dialog, how could it have taken weeks to notice?
>>107483823
probably only it all crashing down can make the synthmaxxing retards realize that they really aren't making much progress
>>107485665
Idk about full 4.6, but for me, I tried Air for a while across different contexts. In some, it did start parroting very early. In some, it actually went on for quite a while before it started doing it. So it's possible he was just lucky and/or did not really spend THAT much time testing it.
>>107485665
Yeah, takes me a while to notice some things.
>>107485665
sex with imp
>>107485665
honeymoon phase
>>107483823
Yes.
A little bit further but the wall is already being nudged. I think when the bubble pops and the slate is wiped clean we'll see more innovation. Currently as it stands there's no real incentive to do anything other than making slightly smarter models (See: Gemini 3 and GPT 5)
>>107485643
/lmg/'s final year
>>107485523
Yes, but text only.
>>107485815
I don't like your negative attitude!
>>107485815
Another year of local stagnation with closed models improving will probably kill off any lingering interest in local models. Also, RAM prices skyrocketing and the effect spilling over to GPUs and all other components mean that increasingly few people will be able to run existing models, let alone newer, bigger ones.
>>107485832
I hate ggerganov so much it's unreal
>>107485890
use case for a backend that supports all of a model's features?
>>107485890
exllama exists and is better in every way if you aren't a vramlet
something doesn't feel right about the muh ssd/ram prices threads on this board
they're inorganic. every single thread has the same post pattern in its replies
>>107485868
not negative, just realistic
man ram and ssd prices are super expensive right now. i think we should all just give up and sell all of our shit on ebay at a really low price just to stick it to the hardware companies
Why is it that when I see a cat image attached to a post, I can already tell that the post has no value whatsoever?
>>107483116
I've tested this rigorously myself. On a fixed fiction generating scenario, for most models top-p does better than min-p at removing as little wheat as possible while removing the chaff. On a minority of models min-p tests better. On no models I've yet tested is there a way to usefully combine min-p and top-p to produce a strictly better filter than either alone. It is however possible to sometimes combine top-p and top-k usefully.
>>107486058
>need 5 24 gb cards just to run a glorified 12 model
get fucked, it's like all the advances made in the last 2 years have been wiped out
https://huggingface.co/AliceThirty/GLM-4.6V-gguf/tree/main
pump it or dump it?
>>107486210
Just 4 bro, exllama will not generate better quants, it's just a waste of bpw
no, you can't tell the difference, it literally just adds padding if no more precision is needed
don't you guys have phones? just run llm on it
>>107486058
turboderp already delivered a quant of GLM4.6V https://huggingface.co/turboderp/GLM-4.6V-exl3/tree/4.00bpw
Haven't been able to try it yet. Anyone know if all the image and agent stuff works with Tabby?
>>107486417
phones also need ram
So, 2 1/2 years later, why does anyone listen to AI safety people?
>>107486639
No one ever did. They just scared corporate CYA zombies.
At this point it's the same as TV psychics that haven't won the lottery... if they had anything dangerous or super intelligent, humanity would have been wiped out or they'd be out there winning bigly.
Despite the tech being world-altering and amazing, it's still 1% tech and 99% grift.
So is this 4.6V better than 4.5 Air? I don't care about the vision meme.
>>107486639
Because they actually have a point about stuff like teens being driven to suicide over believing their waifu chatbot is real. Companies therefore get pressure to do something about it, and that is the only real solution other than age gating the service entirely, which is dumb. I don't like it either but absolute retards and underage ruin it for the rest of us. That's not to mention the megalomaniacs who want all that power and deny you access because they have a vision for how you should use LLMs.
>>107484968
It's not a complete failure. It makes the model somewhat more 'mute' or 'tame' (not talking about its inability to phrase sordid material) and affects its output. It's not bad though.
>>107483875
Aware me?
Is it just me or is the quality of Wan 2.2 FLF2V with the lightning LoRA really bad, especially the last keyframes? Is VACE just a better way to go for long videos?
>>107486654
That's what it seems like/has seemed to me.
>>107486664
But teen suicide due to believing an LLM/being mentally handicapped is not an issue for the LLM service provider, it's on the parents/caretakers. To be clear, I'm not talking about filters/guard-rails, I'm talking about safetyists like the SSI company or Anthropic, or any of the 'rationalists'. They all seem like blowhard whackos who got a little too deep into psychotropics and never came down from an acid trip. All the bullshit about 'building bioweapons' with chatgpt and the like. Sure, anthropic is doing it for market control, and saltman wouldn't go against that, but everyone who isn't a CEO/on the board of a large company with money to gain talking about how 'AI is going to kill us all!' or doomposting about 'AI Powered Super Weapons' leads me to believe they're all part of the lesswrong/safetyist cult/grifters hoping to cash in somehow, but that doesn't seem right
>>107486639
no one does, their movement completely collapsed and their only relevance is some true believers hanging around at anthropic etc.
they are basically absent from the discussion at large about AI at this point, even random xitter artists with their vulgar anti-ai "pick up the pencil"-ism have drastically more presence on the anti side of the AI debate than safetyists.
>>107483965
Also Trump finally let leather jacket man sell his goodies to China, while GPT 5.2 being rushed out the gate makes it seem to be less of a spectacular comeback and more of a corpse voiding its bowels
>>107486742
>But teen suicide due to believing an LLM/being mentally handicapped is not an issue for the LLM service provider, it's on the parents/caretakers.
Oh I agree 100%.
>I'm talking about safetyists like the SSI company or Anthropic, or any of the 'rationalists'.
I was talking about them too. The situation I mentioned as an example is exactly why they have power, because some of the theoretical problems and stuff they talked about actually came true. I'm not saying they aren't for the most part grifting and waxing poetic about hypotheticals regarding X or Y. But some of the stuff did predictably come true even if you aren't a safety oriented person, like what I mentioned, and stuff like ransomware utilizing LLMs actually being a thing, like PromptLock, which is actually in the wild. These few things manifesting as true mean the safety people will continue to have power, because people fear uncertainty, bad stuff happening, and the liability.
>>107486796
People do listen, just because we don't talk about them and disregard them as does most other LLM enthusiast places online doesn't mean they aren't being listened to.
What if I told you that Mistral Medium 3 will be coming by Christmas and it will actually be good?
>>107486923
Not local, who cares
>>107486926
It might be...
>>107486942
Medium has always been their closed models that they actually make money out of, they're not going to release it.
>>107486953
But what if we do?
>>107486962
That don't be like it is
>>107486971
>>107482917
>I wish I could use AI to turn myself into a woman...
https://vocaroo.com/1l5SkYOxW2AF
>>107486953
>that they actually make money out of
Do they though?
>>107486824
gotcha
>>107487229
Like most AI companies, Mistral's primary source of money is their nation's taxpayer dollars
But they do license out Medium to corps and through APIs, so it's something.
>>107487265
i left france 4 years ago and boy do i not regret the decision.
tax rate is like 85% end to end.
>>107487529
For an unquantized model that would only be like 80b
Group Representational Position Encoding
https://arxiv.org/abs/2512.07805
https://arxiv.org/pdf/2512.07805
>We present GRAPE (Group RepresentAtional Position Encoding), a unified framework for positional encoding based on group actions. GRAPE brings together two families of mechanisms: (i) multiplicative rotations (Multiplicative GRAPE) in SO(d) and (ii) additive logit biases (Additive GRAPE) arising from unipotent actions in the general linear group GL. In Multiplicative GRAPE, a position n ∈ Z (or t ∈ R) acts as G(n) = exp(n ω L) with a rank-2 skew generator L ∈ R^(d×d), yielding a relative, compositional, norm-preserving map with a closed-form matrix exponential. RoPE is recovered exactly when the d/2 planes are the canonical coordinate pairs with log-uniform spectrum. Learned commuting subspaces and compact non-commuting mixtures strictly extend this geometry to capture cross-subspace feature coupling at O(d) and O(rd) cost per head, respectively. In Additive GRAPE, additive logits arise as rank-1 (or low-rank) unipotent actions, recovering ALiBi and the Forgetting Transformer (FoX) as exact special cases while preserving an exact relative law and streaming cacheability. Altogether, GRAPE supplies a principled design space for positional geometry in long-context models, subsuming RoPE and ALiBi as special cases.
https://github.com/model-architectures/GRAPE
neat. no image sorry since it seems my ip range is blocked from uploading images (first time that's ever happened to me).
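To make the RoPE connection concrete, here is my transcription of the key objects in LaTeX (paraphrased from the abstract, so the notation may differ slightly from the paper's):

\[
G(n) = \exp(n\,\omega L), \qquad L^\top = -L,\ \operatorname{rank}(L) = 2
\]

RoPE is then the special case where the action decomposes into the canonical \(d/2\) coordinate planes, each rotated by its own frequency from the usual log-uniform spectrum:

\[
R(n\theta_i) = \begin{pmatrix} \cos n\theta_i & -\sin n\theta_i \\ \sin n\theta_i & \cos n\theta_i \end{pmatrix},
\qquad \theta_i = 10000^{-2i/d}
\]

and the relative-position property falls out of the group law, \(G(m)^\top G(n) = G(n-m)\).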
>>107487575
>Group RepresentatiOnal Position Encoding
>>107487575
should have gone with GROUP RAPE
>>107487575
https://www.youtube.com/watch?v=EzgUGY36gqM
>>107487529
Miku-2?
>>107487548
if it's 70b i think we should nuke france
>>107487548
mistral medium/miqu was only leaked as q5 + q2 and all quants/merges were based on versions that were padded out to act as fp16.
>>107487529
hon hon... oui oui... merci l'anon...
I'm gonna buy a 5090 but how much ram do I need to leave nemo hell and have fun RP-ing for the next couple of years?
>>107488035
All of it
>>107487792
a new dense 70b would be refreshing though
all their other recent sizes suck
>>107488035
If you have to ask rather than just buying as much as your motherboard can support then you probably shouldn't be buying a 5090 either.
>>107488072
I also want to do video gen. But after 12b and 24b, surely llms get noticeably better at a certain size and more enjoyable for rp? 70b? 100b? I'll never run a 1T monster locally but there's got to be an entry level where you're no longer with toy models and using something that can truly bring a character card to life
>>107488094
>I also want to do video gen.
Alright, a 5090 makes more sense in that case
>But after 12b and 24b, surely llms get noticeably better at a certain size
32GB is not enough for 70b dense and there's nothing worth using between ~20b and 70b, so you'll be at the mercy of whatever MoE models get released. GLM4.6/Air are the recent notable ones but have severe parroting issues, beyond that you'd be looking at something like Mistral Large, or the bigger Qwen models and Kimi, but they're all going to be fairly slow with just 32GB VRAM, no matter how much system RAM you can stack.
>>107488146
I've deliberately never tried the larger parameter models because I don't want to know what I'm missing out on but when do llms get really good at rp?
>>107488177
>>107488236
how is that possible? For text-to-image, we have z-image turbo which is a tiny model that can run fast on a shitty system and its prompt following and realism is utterly amazing, but there's no corresponding llm out there with similar qualities?
>>107488250
nope. every single model is slopped to hell and back. the ones that aren't are old and retarded (and also slightly slopped). something like 70% of the internet is now AI generated, which negatively affected all training data starting around mid to late 2023. there is no way to make a good model that is smart and good at writing that is not slopped. companies just iterate on old technology, but they quite literally need to start from scratch if they want to actually create something good (they don't)
>>107488177
Even flagship models have plenty of flaws and slop, and a model being big doesn't necessarily mean it will be good at RP. Beyond the ~20-30b dense models that are easily run on consumer hardware, you'd need to jump to 70b dense or 200b+ moe for noticeable improvements.
>>107488291
and the big moes are shit and the 70b denses are retarded due to age
>>107488266
so I should just optimize ram for video gen and if some miracle happens with llm then that's cool but I shouldn't be wasting money trying to build a system that can run a specific model?
>>107488300
>70b denses are retarded due to age
Their age means that less of their dataset was AI-generated, which helps them not be retarded in the same ways modern models are.
>>107488316
more or less yeah
>>107488328
correct. they are a different kind of retarded
>>107487525
Where did you go? Asking for a friend
>>107488384
Switzerland
>>107488300
70b dense models are less knowledgeable than big moes but are better attention wise and overall less slopped imo
I've noticed this in roleplaying with both
>>107488443
correct, but that lack of knowledge is quite crucial as it usually leads to the model forgetting things quickly. they lack spatial reasoning in comparison to modern models.
>>107488443
If Gemma 3 had a 70B dense version it'd mog everything including Kimi on knowledge
Would a Tesla v100 32gb hbm2 card be a good option for my server to run chatbots on my local network, or am I better off getting a newer 50 series?
>>107486923
It's OK at best, not really great in absolute terms. At least it's something that more than a couple users would actually be able to run locally.
>>107487548
Not if FP8.
>>107488666
satan trips checked, but no. the volta architecture is severely outdated. a 5090 has around triple the performance of a v100 if that is within your budget. if not, well you don't really have many options that would compete in vram quantity. 2 5070tis would be about 50% faster with the same vram quantity
>>107488693
Alright I'll keep saving. Thanks
>>107488690
Does anyone release models in FP8 anymore?
>>107488781
MistralAI just did for Large 3 and Ministral 3 Instruct.
>>107488685
They never release the medium models tho? Mistral always releases the small and large models, medium staying api only.
>>107488835
I don't think they will release it, at least officially (if you want to hope for a Miqu-style leak), because for all new models they now legally have to provide documentation about the training data and I don't think Mistral Medium is fully EU-compliant (I could be wrong). For Ministral 3 models the loophole was that they are pruned versions of Mistral Small 3.1, so technically not completely new.
>>107488870
One of the reasons why i believe the eu won't give us any good models anymore. The eu shot itself in the head, literally.
Hi /lmg/
I want to get into local LLMs for ERP and have been looking at the rentry spoonfeed guide but couldn't find any information to answer a question I have.
I have close to the bare minimum specs (2060 with 6GB VRAM, 32 GB of normal RAM) but that should be enough for basic ERP and chatting with a smaller model, right? I don't care for much higher order reasoning, but I want the privacy and customization that a locally hosted model has. And any recommendations for a model?
Spoonfeed me this info please so I don't waste a few hours setting this up just to get subpar results.
>>107489192
read the getting started guide
https://rentry.org/lmg-lazy-getting-started-guide
Get Nemo from here
https://huggingface.co/bartowski/Mistral-Nemo-Instruct-2407-GGUF
Get the IQ4_XS quant to start with, and google how to partial offload to RAM in kobold
>>107489192
get koboldcpp
click here (download starts) https://huggingface.co/bartowski/Mistral-Nemo-Instruct-2407-GGUF/resolve/main/Mistral-Nemo-Instruct-2407-Q4_0.gguf?download=true
load into koboldcpp
try 4096 for context size, which makes it remember half as much as 8192, which u can try later
start
play around
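The partial-offload part boils down to one flag. A hedged example launch for a 6GB card (the --gpulayers value is a guess to tune, not a known-good number, and koboldcpp will try to auto-pick one if you leave it out):

koboldcpp --model Mistral-Nemo-Instruct-2407-Q4_0.gguf --contextsize 4096 --gpulayers 20

Every layer kept on the GPU speeds things up, so the game is to raise --gpulayers as far as you can without running out of VRAM.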
>>107489235
>>107489307
Thanks anons
I think my hardware might still struggle with that model but I'll look into more quantized versions or offloading once I get it set up
>>107481183
I bought an RTX 5090, best RP models that can fit entirely in it at a reasonable quant?
>>107489406
mistral nemo
mistral small 24b
>>107489406
i would try cydonia 24b
>>107489411
>>107489427
Thanks, will try these
>>107489192
You can run 32B-34B models at above 1t/s at 8k with that setup, since i have the same: rtx 2060 and 32gb of ram. i don't bother with anything below 27B.
>>107489521
1t/s is unbearably slow
what happened to the vibe coding thread?
>>107489558
just vibe code another one?
>>107489553
I barely have to retry, and i tell the models to keep replies short which works most of the time. Especially at q5km.
>>107481183
Are there any Mac OS /ldg/ anons here? Thinking about jumping ship from Windows laptops and returning home to the Mac OS garden. Google recently updated the Pixel line of phones so that airdrop between those phones and Apple devices is possible (this will allegedly be available to Snapdragon Android phones in the future), so me not being entrenched in the walled garden isn't even a problem anymore (I'm writing this on a Pixel 10 Pro Fold. Yes I'm a phone poster, sue me). I have extensive experience and knowledge regarding using and training both stable diffusion models (particularly loras) and low parameter LLMs (2 to 7b range) but I always have to rent cloud gpus from runpod, Google, Civitai, etc. Ram prices aren't going down anytime soon and I kind of hate the direction Windows is going, so I think it wouldn't be too unwise to look at a decently powerful MacBook for local model inference. Obviously I cannot train anything on it, but I can definitely run something semi-decent on Apple metal right? What's a decently powerful MacBook (not Mac mini, I want the machine to be mobile since my work frequently sends me overseas) I should look into? I'm eyeing pic rel but I haven't used MacBooks extensively in like 5 years so I haven't been keeping up with their specs.
>>107489635
>/ldg/
*/lmg/
>>107489635
>trading in your balls for 36gb of unified memory
>>107489635
Isn’t that the same price as a shiny new 5090?
Just take the Linux pill bro.
best 70B model for erp?
>>107489923
Miqu
>>107489937
Unironically true, since newer 70B are bad.
>>107489947
>since newer are bad
fixed
>>107489968
Are you saying that all new models are bad?
>>107489978
Just the ones I can run and the ones I can't
>>107489978
ye
>>107489983
They must be better than qwen 3 at least.
>>107489859
As I mentioned earlier, I want the thing to be mobile. I travel enough that buying a gpu ALONG with all the other shit required for that setup (have you seen the RAM prices recently) would be way more expensive than even a beefed up Macbook.
>>107490157
Do they have discounted rates for Indian customers?
>>107490038
To be fair, “local” doesn’t imply you’re sitting next to the machine.
My GPU rig is headless and I connect to it from my laptop over wireguard. I don’t use Tailscale (which is wireguard + a hosted endpoint), but I’ve heard it’s easy to set up.
>>107486923
>>107487529
Please give us Medium 3... s'il vous plaît...
Is Wayfarer 24b still the best local model to use if I want adventures with 'bad' ends?
Also, are there settings I can use so that it's less quick to get to the bad end?
It really lacks the build up and tension of whatever JanitorAI uses.
>>107490530
*Wayfarer 12B rather
>>107490519
They didn't even let NVidia release Mistral-Nemotron, which was based on Mistral Medium 3.0.
>>107489192
Oooh fuck I got it working and it was better than expected, now I'm looking forward to making my own tailored prompts and trying out different models to see which ones suit my tastes. Time to go down the /lmg/ rabbit hole.
Thanks to everyone who spoonfed me
>>107490563
Enjoy your models.
>>107490548
Considering that every single Nemotron release has been complete and utter garbage, I couldn't care less.
4.6 pure is still better than 4.6v right? Btw did I mention that 4.6 changed my IRL life and made me reach enlightenment? It actually did.
>>107490652
How did it enlighten you?
>>107490652
If you achieved enlightenment then you wouldn't be chasing after other models
>>107490685
I told it about my fucked up life and psyche and asked it for a way to find internal validation and self love. And in an autistic LLM stroke of genius it wrote me a western language framework that basically made me start analyzing my own brain, which eventually led to me getting ego death and enlightenment. In retrospect I see that it was translating the essence of Zen into western language for me. And not even western language. My language. It was a perfect mirror that allowed me to see the mechanisms of my psyche, and when I saw them they lost power.
>>107490685
Here is a 4.6-hallucinated koan that is kind of the essence of why not:
Joshu's tea.
A monk came to Joshu, a great Zen master.
Joshu asked, "Have you had your tea?"
The monk said, "Yes, I have."
Joshu said, "Go and wash your bowl."
Later, another monk came.
Joshu asked, "Have you had your tea?"
The monk said, "No, I have not."
Joshu said, "Go and have your tea."
>>107490652
4.6V is much smaller than 4.6.
>>107490781
Rhetorical question on my side really.
>>107490769
That's nice sweaty, but I'm not reading all that.
>>107490781
Are you talking about the flash version?
>>107490802
4.6V is 108B
4.6 (full) is 357B
>>107490802
V is 100B. 4.6 is 350B so doubtful they could make it cover the gap.
>>107490811
Why didn't they call it 4.6V-Air?
>>107490833
Because Air doesn't exist? Also 4.5V was also based on Air
>>107490833
Probably because there have been some regressions from tacking on vision support, and they're saving the Air naming for a later release.
>>107490833
because 4.5V is also 108b
>>107490850
Thank you for the clarification.
>my voice-to-text prompt
>sent to llm
>llm response in voice
>basically talking to AI like talking to friends
is there an existing tool for this?
>>107490530
wayfarer is pretty bad. i would try one of the readyart models
https://huggingface.co/ReadyArt/models
be warned they will fuck you up though
>>107491014
I tried their qwen 3 32B tune, wasn't impressed at all.
>>107491052
i've only personally tried their nemo or mistral small tunes.
since i upgraded, GLM is better than any of that shit anyway now
the deepseek_v32 architecture doesn't have goofs... hell, it doesn't even have _transformers_ support at this point...
I think the amount of non-support and zero work on what's a local SOTA model has got to be unprecedented.
We've had lack of interest and stalled implementation of garbage models in the past, but I don't think I've heard crickets like this on models that are actually good.
Did I miss some memo where everyone decided the ds32 series (especially the latest releases) are actually worthless? Is everyone just burned out on the new model LLM grind?
>>107491183
Real question: what percentage of llamacpp contributors are llm masturbators?
>>107491183
transformers is maintained by HuggingFace, an American company. So I don't see them going out of their way to implement an "enemy" model
llama.cpp has the usual difficulty of porting everything to C++ on their fragile codebase and finding volunteers to do it, on top of that now there's a vibecoder parking on the issue and apparently everyone is waiting for him to order and read some books before proceeding
don't expect support any time soon unless DeepSeek writes the pull requests themselves
>>107491183
I know it's not high value or anything, but from testing on their site ds3.2 feels quite a bit worse than previous versions: it makes possessive errors about who or what something belongs to or who said something, and I've even seen it make typos.
>>107491014
Which one?
I notice they have a couple that look intended for the female pov, and I'll probably have a FemPC - are they any good or are the generalised ones better?
>>107488094
I wouldn't hold my breath. Models like Deepseek and GLM 4.6 still make basic temporal and spatial errors, like forgetting what clothes a character had on, or the relative position between multiple objects in a room.
>>107491183
I'm sure it'll be any moment now, llamacpp's finest vibe coders are on the case
>>107491301
>Models like Deepseek and GLM 4.6 still make basic temporal and spatial errors
Because they are bloated 37B and 32B models, respectively. LLMs don't get better temporal and spatial reasoning until at least 70B, ideally over 100B. Anyone who ever used Command-R+ would know this firsthand.
GLM Air good for rp?
>>107491376
No
>>107491388
Good compared to the alternatives?
>>107491183
All these DS minor releases feel barely different in real use, you'll see more interest if and when they actually bring something new to the table.
Speciale being the exception, but I'm not sure how much appeal a model that thinks forever at local speeds is going to have
>>107491414
we need vision or smell modal or something agreed
>>107491401
No, only despair
>>107491431
you just got a glm vision model
>>107491454
don't care for repeat slop need deepsee to do it
>>107491463
mistral gave you a vision deepseek last week
>>107491469
true but it's shit and i want to complain about things, deepsee would do it right
>>107491376
if you're coming from nemo you'll at first be amazed by how smart it is before eventually going back to nemo once you start recognizing the usual moe model issues
tongyi and cumfartorg fatigue killed /ldg/
>>107491594
>tongyi
the fucking Chinese niggers just won't confirm base is open source
>cumfartorg
they won't survive the memory shortage, should have built up sdcpp instead lol
>>107491611
>won't confirm base is open source
Didn't they say they were going to release before last weekend?
Hey anons, I bought a few intel B60s, mostly to give myself something to do.
it's dogshit on llama, a 24B Q4 GGUF runs, 12t/s VULKAN, 18t/s SYCL (both single gpu)
got down to 5t/s with a 70B quant across two cards
The good news is by making my 4090 quant models for VLLM (autoround, W4A16, gptq) I'm now getting 17t/s across 2 cards for a 70B
the cards themselves are fine but the software stack is atrocious, VLLM is about the only thing that runs well and that's an intel fork/patch (intel/llm-scaler), the rest run like shit or don't run at all.
aphrodite only supports fp16 and can't offload weights before it quants, so I'm stuck with VLLM and the stock OAI samplers, the good news is it turns out you don't really need most of the samplers and they were just a crutch. (I want them so bad)
cards idle around 35w, run at about 130w during inference.
overall can't recommend unless you're retarded like me, just buy more used 3090s.
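For anyone else stuck on the vLLM path, serving a pre-quantized W4A16/GPTQ checkpoint across two cards looks roughly like this with stock vLLM (a sketch: the model path is a placeholder, and the intel llm-scaler fork may want different flags):

vllm serve ./Llama-70B-W4A16 --tensor-parallel-size 2 --max-model-len 16384 --gpu-memory-utilization 0.90

vLLM usually picks up the quantization scheme from the checkpoint's own config, so no separate quant flag should be needed for compressed-tensors/GPTQ models.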
>>107491625
on the weekend three weeks ago. they are going to put it behind an API like the greedy chinks they are.
>>107491645
>intel B60s
wow that must be shit
>keeps reading
yeah that sounds awful
>>107491648
>like the greedy chinks they are.
They see BFL do it and get praised, they made a better and smaller model so why shouldn't they get to keep something for themselves too? Reneging on the promise would be a dick move though.
https://x.com/sophiamyang/status/1998400495270957452
>>107491611
>the fucking Chinese niggers just won't confirm base is open source
what are you talking about?
>>107491721
"release" isn't the right keyword here, it's called open source. they refuse to say the release is an open source release
>>107491729
last cap
>>107491721
why are they fighting with ali baba over open sourcing it? it shouldn't be that hard they did zit already
>>107491729
bottom right of the image
>>107491721
bottom right is Alibaba taking it for an API so they have to work hard to convince them otherwise
How do I figure out why llama.cpp is processing the entire prompt every time? It doesn't give any reason why
>>107491721
>bottom right: "we are actively working on getting the model OPEN SOURCED"
>>107491746
>you: "bottom right is Alibaba taking it for an API"
>>107491699
Vibe coding just barely started working recently with gemini 3. In my opinion everything else shits the bed if you go more than 8k tokens. Maybe like 15k with claude.
Are they doing another v3 distill for their vibe model? kek
I can't see that happening locally. I'll believe it when I see it.
It's very frustrating that all those western companies think the local model use case is tool calls and coding.
I only saw from chinese accounts acknowledgement of creative writing.
>>107491746
they literally say "by releasing this checkpoint", how can it be an API and a checkpoint release at the same time?
>>107491772
there is no contradiction in what is said, companies are large and departments can disagree on things, see MS and wizard
>>107491759
>working on getting
it means it currently isn't planned and they have to fight with Alibaba to get it open sourced
>>107491748
either your context is longer than the context window or there is something that changes in your prompt every time. for example if you are using sillytavern and your sysprompt mentions {{char}} and you have a group chat then {{char}} is different for both characters so it reprocesses it
>>107491773
Didn't the Wizard team move back to China? Who are they working for now?
>>107491721
And so it begins. Not surprising.
Who else is even here locally besides alibaba? They have their fingers in everything. They own almost half of zai/moonshot too.
Rate my system prompt.
>>107491772
saas slop companies refer to a new endpoint as a release. you can argue semantics but tech bros just don't give a shit
>>107491794
OAI and their SOTA local oss models :^)
>>107491594
>still no new bake
Brutal. They've been murdered.
>>107491802
>By releasing this checkpoint, we aim to unlock the full potential for community-driven fine-tuning and custom development.
Does that scream API only to you?
>>107491699
It says they're open-weight but I don't see the models on HF yet.
https://mistral.ai/news/devstral-2-vibe-cli
>>107491667
>They see BFL do it and get praised
BFL got clowned hard since the release of Z-image lol
>>107491825
Meta released Llama 3.3 8B for community-driven fine-tuning and custom development.
>>107491844
geg
>>107491838
https://huggingface.co/mistralai/Devstral-2-123B-Instruct-2512
https://huggingface.co/mistralai/Devstral-Small-2-24B-Instruct-2512
When ready.
Ok we're back, bye boyz >>107491813
>>107491859
how do you go so quickly from 3 posts/minute to sometimes not having threads at all?
>>107491859
holy shit I didn't realize comfyui got that bad
>>107491858
So Mistral Medium 3 size is 123B?
>>107490157
Stop talking about that. All it takes is some big youtuber showing it in a video and the prices would 3x.
>>107491911
Then you can switch to making money by hosting.
>>107491838
Based, you guys rag on mistral for getting cucked and being a shit chatbot now, but devstral was the best local coding bot on the market, will try the new one once there's goofs on HF and report back
>>107491611
sdcpp is ass compared to comfy. You can't even offload models with that junk.
>>107491838
>>107491858
>123B
Vibecoder bros, we are so back.
>>107491838
dense?
>>107491925
oh wow I wish people would write PRs instead of complaining about it
>>107491961
yes
>Devstral 2 is a 123B-parameter dense transformer supporting a 256K context window.
>>107491992
I take back everything bad I ever said about the French
>>107491901
It sounds like the old Large is the new Medium and even the latest Devstral is codenamed devstral-medium-2512; in that case they had it on Groq hardware all along (80 TB/s bandwidth), no MoE.
>>107491838
>6x the size
>4 percentage points of difference
This is so stupid, they are wasting so much computational power for diminishing returns. I pray every day to God and LeCun that transformers will die.
>>107491858
Are those the fp8 version tho?
>>107491784
Can't I like cut off the older parts of the context?
>>107491992
We are so back. Finally something good to run.
>>107492098
not without reprocessing everything after the place you cut at
>>107491858
Now they work. By the way, this is probably not simply the old Mistral Large with a coding finetune, as that one has a 32k vocabulary, while this one has a 128k one, so it's technically 125B parameters now. The rest of the configuration seems the same.
>>107491858
finally no moeshit
>Ministral3ForCausalLM
which arch was this again? the ds fork?
>>107492184
Is it just what happens when you get a long chat? Is there nothing I can do?
>>107492185
So a new pretrain huh. That raises some red flags desu. I'd be wary of their claimed performance vs real world. I suspect it might be another flop and do worse than GLM 4.6 despite being claimed to be better. I mean hell the 25B is claiming to be on par lmao.
>>107492219
no, llama but they renamed it in order to "stay flexible in case they change something in the future"
this is literally just benchmaxed large 2
if you run this you are retarded and contrarian against MoE models
>>107492255
>So a new pretrain huh
Or a medium 3 finetune
>>107492268
>contrarian against MoE models
That is called someone who bought more than one 3/4/5090.
>>107492301
and has been sitting on them seething for the entire past year?
>>107492288
I mean Medium 3 is new, relatively speaking. I'm just saying it's a new pretrain compared to the old Medium which we would've normally thought became Devstral 2 if we just looked at the 123 number.
https://github.com/mistralai/mistral-vibe
>>107492308
Yes and he also missed buying ddr5 when it was 4 times cheaper.
>>107492234
you want to keep your context shorter than the model's context. the simplest solution would be deleting old messages, ideally replacing them with a summary. there is a summary function in sillytavern
>>107492219
>finally no moeshit
Are you aware that in a dense model most neurons in a layer don't have a significant impact on the activation? You are basically doing millions of operations just for them to zero each other.
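Rough arithmetic to put numbers on this (assuming the usual ~2 FLOPs per active weight per generated token, attention overhead ignored): a 123B dense model spends ~246 GFLOPs per token, while a 106B-A12B MoE only touches its ~12B routed weights, ~24 GFLOPs per token. Memory traffic scales the same way, which is why the MoE survives CPU offload and the dense model doesn't.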
>>107492338
>Mistral Vibe
>Gemini CLI
>Qwen Code
>Claude Code
>Codex
I miss aider. Why do we need half a dozen copilot CLIs that all do the same thing?
>>107492234
What you typically do is summarize. Take like half (or whatever) of your chat history and replace it with a summary. Then you avoid reprocessing until your context gets close to filling up again. Then you summarize again, and so on.
>>107492389
>most neurons in a layer don't have a significant impact on the activation?
if this were true, pruning wouldn't be such a joke
>>107492454
It's not that they don't matter, it's just that for a given token many weights don't contribute much. But the "important" weights will vary for each token you generate.
>>107492472
>It sounds like the old Large is the new Medium
It has been right in front of our eyes the whole time:
https://mistral.ai/news/mistral-medium-3
>Medium is the new large.
>>107492454
yeah but it's still the best option we have until we start getting bigger moes or dynamic active params catches on
does the new devstral coming out mean that this >>107487529 is real?
>>107492234
>>107492385
>>107492440
Like the other anons said, depending on what you are doing, you can either summarize, or truncate (cut off) the earlier parts if you don't need the earlier history (this is obviously computationally cheaper).
But you don't want to do it for every message, you want to truncate once you reach your context limit down to something like half the context, then you fill it up again to the limit, etc.
That way you only pay for re-processing once per seq_len / 2.
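To make that concrete with numbers: with a 16384-token window truncated down to 8192 whenever it fills, each truncation costs one prefill of ~8k tokens and then buys ~8k tokens of new chat before the next one. Amortized, that's about one reprocessed token per new token, versus reprocessing the entire history on every message if the front of the prompt keeps changing.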
>>107492527
Probably, but he's a fag for not leaking
>>107492454
???
it doesn't mean the neurons are useless, they activate too, just for different stimulation. In MoE we route to the group of neurons that are most likely to activate instead of calculating the neurons that are most likely to zero each other's activations
>>107492559
he's going to leak it for christmas >>107486923 ...hopefully
>>107492593
Yeah. I think I've read that MoE-like specialization is observed even in dense models (ie. many weights contribute almost nothing for a given predicted token). In that case MoE just makes that explicit and tries to skip the computations that don't contribute anything.
>>107492593
you are missing the point that a dense model can have more of the weights contribute for more difficult problems that require broader connections, but a moe is always limited to a predetermined set of active weights
>>107492634
>predetermined set
should have said predetermined number or count
>>107492527
Why would that be 162GB, though? Devstral-2 on HuggingFace is ~128GB.
>>107492634
That's not really what we observe in practice. Neurons in artificial neural networks behave similarly to biological neurons, meaning they tend to specialize themselves. Different problems are solved by particular groups of specialized neurons, it's a REALLY small part of the layer. Allowing more neurons to contribute to the problem solving doesn't make it better, it introduces unnecessary noise. There is a reason why our brains have specialized areas and the artificial neural networks naturally tend to make them themselves
>>107484913
>Really? huh. That's a lot better than I thought. I was expecting like 1-2 t/s.
Yup. RAM speed makes a big difference, so check your setup and make sure you're not running it at a lower frequency than you need to.
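Back-of-envelope for why RAM speed dominates (assuming generation is memory-bandwidth-bound): t/s ≈ effective bandwidth ÷ bytes touched per token. Dual-channel DDR4-3200 gives ~50 GB/s; a ~12B-active MoE at ~4.5 bits/weight touches roughly 7 GB per token, so ~7 t/s if everything sat in RAM, a bit more with the shared layers on GPU — which lines up with the 8 t/s reported above.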
>>107492851
Image + audio + video adapters? The llama 3 image adapter alone for 70b was 20b
UNO reverse card.
https://legal.mistral.ai/ai-governance/models/devstral-2
>No documentation required
>Devstral 2 is designed exclusively to generate and assist with software engineering tasks (exploring codebases and orchestrating changes across multiple files while maintaining architecture-level context). Unlike general-purpose AI models, which can perform a wide variety of tasks, Devstral 2 is specialized in software engineering-related tasks only. As such it does not meet the EU AI Act’s definition of a General-Purpose AI Model (GPAIM), in accordance with the AI Office's official guidelines.
>The EU AI Act only requires technical documentation for General-Purpose AI Models (GPAIMs) or GPAIMs with systemic risks.
>Since Devstral 2 is neither a GPAIM, nor a GPAIM with systemic-risks, these requirements do not apply.
Do I need to set up vLLM to run GLM-4.6V-Flash?
>>107492927
Interesting. Does this also allow them to train on any data they want?
>>107492992
I think so. It works for regular chatting too, although I haven't tried it in depth for that yet.
>it's uncensored and trained on all the nasty shit again
Mistralbros we are so back
>>107493039
try to RP with it and tell us how it works
>>107492927
The absolute madmen destroyed both the EU and the chink MoE meta. We are so back, France saved LLMs from the MoE dark age.
>>107492927
wait, are we actually back?
>>107487529
Please sir. Also, if you happen to have a Ministral 24B lying around... even if the evals aren't good, it's no big deal as long as it can RP well.
>>107493323
not back until we have the goofs
>>107492992
>>107493039
If it looks like a general purpose model, sounds like a general purpose model, and performs similarly to a general purpose model beyond being good at "software engineering tasks", at some point isn't it a general purpose model?
Any model I use seems to love repeating my dialog back to me.
>>107493517
>at some point isn't it a general purpose model?
the geriatric female bureaucrats writing the EU regulations don't know that so who cares?
>>107493534
repeating your dialogue back to you? ugh, I can definitely see how that could be annoying.
>>107493542
Yes and they won't stop if I tell them to!
>>107493546
won't stop if you tell them to? I can definitely see how that could be annoying.
>>107493556
glm pls
>>107493611
>>107493611
>>107493611
>>107493542
Repeating dialogue, you say? Don't think that doesn't make it incredibly annoying, you're absolutely right.
*The air is thick with ozone and something else...*
>>107493622
teee!
>>107491924
Makes sense they pick a niche and work on the dataset, there's no money generating fart-fetish erotica for neets
>>107489635
kys macfag