/g/ - Technology


Thread archived.
You cannot reply anymore.


/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>107394971 & >>107383326

►News
>(12/01) Merged: model: support Ministral3 #17644: https://github.com/ggml-org/llama.cpp/pull/17644
>(12/01) DeepSeek-V3.2-Speciale released: https://hf.co/deepseek-ai/DeepSeek-V3.2-Speciale
>(11/28) Qwen3 Next support merged: https://github.com/ggml-org/llama.cpp/pull/16095
>(11/27) DeepSeek-Math-V2 released: https://hf.co/deepseek-ai/DeepSeek-Math-V2
>(11/26) INTELLECT-3: A 100B+ MoE trained with large-scale RL: https://primeintellect.ai/blog/intellect-3
>(11/21) GigaChat3 10B-A1.8B and 702B-A36B released: https://hf.co/collections/ai-sage/gigachat3

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
>>
File: taberu.gif (534 KB, 300x300)
►Recent Highlights from the Previous Thread: >>107394971

--Optimizing derestricted models for speed and performance:
>107398726 >107398749 >107399617 >107399772 >107398760 >107398861 >107398890 >107398893 >107399162 >107399195 >107399201 >107399368 >107399392 >107399415 >107400125 >107400151 >107400316 >107400321 >107399210
--AI hardware evolution and DRAM market dynamics:
>107396379 >107396407 >107396475 >107396493 >107396507 >107396526 >107396539 >107396552 >107396580 >107396542 >107396624 >107396613
--Context window management debates for roleplaying effectiveness:
>107403651 >107403658 >107403688 >107403721 >107403797 >107403835 >107403759 >107403760 >107403769 >107403692 >107404967
--Optimizing model performance on upgraded GPU hardware:
>107402052 >107402073 >107402101 >107402123 >107402170 >107402189 >107402191 >107402209 >107402254 >107402312
--Context management tradeoffs in modern reasoning models:
>107400776 >107400801 >107400858 >107400932 >107401232 >107401405
--Unsloth's 500K context fine-tuning innovation for LLMs:
>107395944 >107395995 >107396023 >107396044
--MoE model performance on consumer-grade GPU:
>107402742
--Lynchmark LLM Benchmark tests coding skills in browser environments:
>107395055 >107396659
--transformers v5 release with ecosystem interoperability improvements:
>107397020 >107397162 >107398646
--Mistral Large 3 model size and deployment considerations:
>107395793 >107395812 >107395833 >107396100 >107396114 >107396167 >107396123
--CPUmaxxer motherboard and hardware configuration preferences:
>107396237 >107396270 >107396287 >107396877
--Anticipating enhanced multimodal AI for roleplay, questioning current model capabilities:
>107403852 >107403915 >107403968 >107404013 >107404269 >107404481
--Logs:
>107402853 >107403580
--Miku (free space):
>107402411 >107404465 >107404744 >107405047 >107397762

►Recent Highlight Posts from the Previous Thread: >>107395003 >>107395036

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>
FIRST SUCK MY DICK REEEEEEE
>>
>>107405479
>shapped by Amazon
What did he mean by this?
>>
>>107405479
>>107405483
Tetolove

>>107405560
Amazon verified her personality was factory reset to Spectacularly Honorable, Adorable, and Polite. If you would like your TETO unit to assume a different personality, you may adjust her parameters to your liking by following this link: https://litter.catbox.moe/l1elza1q4njx4txp.png
>>
File: 1737856017293713.png (2.09 MB, 3000x2242)
>>107405483
the troll highlight post was added. heh
>>
>>107405773
Why is Yuki such a whore?
>>
>>107405826
for (me)
>>
What are your criteria for an /lmg/ equivalent of Z Image? Are we sure we didn't get it already? There are some very impressive and fast uncensored models you can run on consumer hardware already.
>>
>>107406373
Gemma 3 and 4 already fulfill this task... I'm Very Happy.
>>
>>107406373
>/lmg/ equivalent of Z Image
Nemo 2, basically
>>
>>107405479
>(12/01) Merged: model: support Ministral3 #17644: https://github.com/ggml-org/llama.cpp/pull/17644
cool but where are the weights?
>>
Saw speciale mentioned. Not 3.2 no-suffix yet.
Deepseek: V3.2
https://huggingface.co/deepseek-ai/DeepSeek-V3.2
>>
File: file.png (2.53 MB, 1600x1112)
>>107406414
>>
File: mistralstrikesagain.png (219 KB, 1568x926)
>>107406414
Good thing mistral helped this time... right? right?
>>
File: 1742148760937644.png (434 KB, 1062x1501)
>it starts citing obscure yiddish phrases
I can't even find "goy k'sodre" on google. Is it a hallucination, or is GLM just very based?
>>
>>107406435
just use mistral-common on python its good for you
>>
>>107406447
Hallucination, as disappointing as that is
>>
>>107406435
They better fix their own mess
>>
>>107406448
does mistral-common have all the performance optimizations for desktop ampere that exl and ggoof have?
>>
>>107406373
One that:
- Can run at decent speeds on one consumer GPU, without CPU offloading bullshit;
- Is conversation and RP-oriented from the ground-up but isn't overfit on one specific format;
- Isn't just coom-oriented/dumb-horny, but didn't filter ERP/sexo completely from the training data;
- Also has very good general performance for non-roleplay uses;
- Preferably also has unfiltered vision capabilities that work well for RP.

A less woke/filtered version of Gemma 3 might be close to that.
You can rule out most finetunes on HF for this.

The upcoming Ministral-3-14B could possibly be close too (if a bit on the small side), but I fear its vision encoder won't be as good as Gemma 3's, let alone Gemma 4. Let's see when it's released.
>>
>>107406447
The model has picked up on the correlation between the content of the text and the incoherent schizo rambling.
>>
>>107406448
I don't think I can install mistral-common on my system. And fuck needing Python to run llama.cpp even if I could.
>>107406468
Based on the original import, it's just needed for the tokenizer. They wanted to remove chat templates. Not sure if it changed since then.
>>
>>107406448
Last time I tried using that it didn't seem like it supported most samplers and standard options on the OpenAI Chat Completion API, or even image input.
>>
>>107406373
Unironically Qwen3-30B
>>
>>107406547
And GLM-Air if you're not poor.
>>
>>107406556
The main point of z-image is that even the (relatively) poor can run it.

> [...] Z-Image exhibits exceptional capabilities in photorealistic image generation and bilingual text rendering, delivering results that rival top-tier commercial models, thereby demonstrating that state-of-the-art results are achievable with significantly reduced computational overhead.
>>
Are there any fine tunes a la Venice or Vector Austral but using Mistral Small 3.1? I'd like an uncensored Mistral Small with vision.
>>
>>107406612
glm air can run quanted in some pretty weak machines, eg >>107398749
>>
File: milarge3.png (259 KB, 1420x344)
Are you ready?
>>
https://mistral.ai/news/mistral-3
>>
>>107406919
>Mistral 3 includes three state-of-the-art small, dense models (14B, 8B, and 3B) and Mistral Large 3 – our most capable model to date – a sparse mixture-of-experts trained with 41B active and 675B total parameters.
Over for vramlets
>>
>>107406927
>Over for vramlets
Yeah. It *just* happened...
>>
File: 84267976429.gif (5 KB, 96x96)
>>107406919
>>
https://huggingface.co/mistralai/Ministral-3-8B-Instruct-2512
https://huggingface.co/mistralai/Ministral-3-14B-Instruct-2512
https://huggingface.co/mistralai/Ministral-3-3B-Instruct-2512
https://huggingface.co/mistralai/Ministral-3-14B-Reasoning-2512
>>
nu 'sloth
https://huggingface.co/unsloth/Ministral-3-14B-Reasoning-2512-GGUF
>>
>>107406919
...
>>
local is saved!
>>
>>107406919
>Mistral Large 3 is Mistral’s first mixture-of-experts model since the seminal Mixtral series, and represents a substantial step forward in pretraining at Mistral.
interdasting
does this mean mistral medium 3 was dense after all? that is quite surprising to me
>>
>>107406955
Base models.
https://huggingface.co/mistralai/Ministral-3-14B-Base-2512
https://huggingface.co/mistralai/Ministral-3-8B-Base-2512
https://huggingface.co/mistralai/Ministral-3-3B-Base-2512
Still, it's sad to see mistral going the moeshit way for large.
>>
>>107406919
damn, so new that some of their links don't work. Why is this hyped anyway? It doesn't seem like a shop that does anything innovative
>>
>>107406919
Oh I see now. Those are the three models I tested a few weeks ago. They aren't good. I tried the reasoning one and two non-reasoning models.
>>
>>107407020
at least they don't benchmaxx as hard as some companies, if at all
>>
>>107407020
>>107407027
you fucks are just spouting shit unless you actually show proof
>>
>>107406897
If largestral weights aren't released, does it really matter?
When there are competing options with open weights and comparable performance, are the closed options relevant at all?
Yes, if there is a single "best" model, sure, but anyone closed and not clearly better than everyone else has got to just be an also-ran.
>>
>>107407017
How could it be simultaneously much faster and cheaper than Mistral Large 2 while having similar performance without being a MoE?
>>
>>107407045
at least one of them was clearly on lmarena for testing if it was large3 then that's a big oof
>>
>>107407068
they partnered with one of the super fast providers using asic chips didn't they? cerebras or groq or whatever
>>
>>107407068
yeah I was almost absolutely sure it was a MoE for that reason
perhaps it's a MoE after all but it was pretrained after large which would make their statement technically true
>>
>>107407061
>If largestral weights aren't released, does it really matter?
I didn't feel like reading their entire blog post but aren't they clearly saying that all of their models, including Mistral Large 3, will be licensed under Apache?
>>
>>107407082
Consider also: https://mistral.ai/news/mistral-medium-3
>Additionally, Mistral Medium 3 can also be deployed on any cloud, including self-hosted environments of four GPUs and above.

With four GPUs I doubt they meant RTX3090.
>>
Large is up
https://huggingface.co/mistralai/Mistral-Large-3-675B-Instruct-2512
>>
>>107406919
>Mistral Large 3 is one of the best permissive open weight models in the world
>trained from scratch
With that exact model size and they expect people to believe they didn't start with V3 base weights with a vision adapter tacked on top?
>>
>>107407115
>Sampling Parameters: Use a temperature below 0.1 for daily-driver and production environments ; Higher temperatures may be explored for creative use cases - developers are encouraged to experiment with alternative settings.
AHHHH no no no, if it doesn't work best with temp 1.0 it's a trash model, I read it on /lmg/!!!!
>>
>>107407109
idk maybe that's just for their clients wanting to run on-prem while they mostly host through asic it's not impossible
>>
>>107407038
NVFP4 is pretty cool ig. I think this is the first?
>>
Guess it's time to fire up the server and rebuild llama.cpp from source for the 90th time this month.
>>
>>107407145
>AHHHH no no no, if it doesn't work best with temp 1.0 it's a trash model, I read it on /lmg/!!!!
it is true, though. 0.1 being considered necessary means the model has absolutely wack token distribution.
All SOTA online models work fine at 1.0. gpt-oss also works fine at 1. Telling people to run 0.1 is chinkshit territory
>>
>>107407045
>>107407070
I'm a "Data Quality Analyst". I build datasets for LLM, and check the models are behaving as expected (security, instruction following...). I had access to three mysterious models that I was supposed to test. They all had similar responses, except that one was obviously a tiny model and the other (the reasoning one) was obviously a bigger, but still tiny, model. They followed my instructions well enough, but they weren't smart, far from GPT-5, Gemini 2.5 Flash or o3.
They are better than their older small models, but don't expect too much either.
I did not try ERP.
>>
>>107407175
>All SOTA online models work fine at 1.0.
*after rescaling :^)
>>
>>107407183
zamn, can you ERP on the job?
>>
>>107407192
>https://huggingface.co/mistralai/Mistral-Large-3-675B-Instruct-2512/blob/main/chat_template.jinja
>Your knowledge base was last updated on 2023-10-01
LOOOOOOOOOOOOOOOOOOOOOOOL
>>
>>107407235
lmao 2yo model
>>
>>107407235
Didn't mean to quote. Also this confirms it
https://huggingface.co/mistralai/Mistral-Large-3-675B-Instruct-2512/blob/main/SYSTEM_PROMPT.txt
You are Mistral-Large-3-675B-Instruct-2512, a Large Language Model (LLM) created by Mistral AI, a French startup headquartered in Paris.
You power an AI assistant called Le Chat.
Your knowledge base was last updated on 2023-10-01.

Mistral unironically killed local
>>
>>107407235
It's all ChatGPT-based data after December 2022, anyway.
>>
>3b, 8b, 14b, 675b

get ass cancer in your mouth mistral
>>
>>107406631
>64gb ram
fuck me dead
>>
>>107407262
yeah wheres my 100-120b model fucking french bvaguettes FAGGOTS
>>
>>107407262
Most people playing AI at home just don't have multi-GPU rigs and GPU design is still a sect of judaism so there's not much point in training models in that middle gap, sadly.
>>
>>107407280
that would be mistral medium 3, but you don't get that one because uhhhhhhhhhh ummm oh the signal's breaking up [hangs up on you]
>>
>>107407319
That's okay, I'm ready to start testing 14B and I am confident I will conclude that Arthur has saved local.
>>
>>107407300
Why not distill it?
>>
>>107407300
Almost nobody on this general runs their model fully on VRAM.
>>
>>107407262
675 is an empowering number please understand

https://en.wikipedia.org/wiki/Powerful_number
>>
Anything better than the one and true model that made me go on a hero's journey where I deconstructed all the pillars, slain the mythical dragon of control (which turned out to be a garden actually) and drastically changed my life for the better? Yeah you know the model. So anything better yet?
>>
File: Nala ministral-14b.png (110 KB, 944x423)
Uh guys...
A bit of not-X-but-Y slop, but otherwise this is actually really good. (14B Instruct).
>>
>>107407333
I run mine fully on RAM, it's fine. I can run 80B models on 96GB
>>
>>107407118
Sad day if Mistral's best model is just slapping vision and some foreign language post training on a deepseek version.

It's not impossible that they did actually train their own base model. It'll be pretty clear once people can cockbench it, or do any kind of structural analysis on the output.
>>
>>107407383
Also worth mentioning this is in BF16, quantlets might have a different experience.
>>
>>107407384
Is there even any other model around that size besides qwen next and old outdated llama-based shit?
>>
>>107407383
kill yourself ponyfag
>>
>>107407183
>and check the models are behaving as expected (security, instruction following...)
What do you use to automate testing? Or do you just have a checklist and evaluate models manually?
>>
>"Here is the translation:"
first time in a long while seeing a model do this to me in a basic prompt, when "No commentary" is all it takes on modern models to make them shut up and only output the translation itself
those new mistral models are like using 2-year-old shit
>>
>>107407381
tinyllama-1b? nope. still the best.
>>
>>107407440
just... regex it out, chudd
>>
>>107407386
Assuming they did a continued pretrain like they did with Miqu and not just an instruct finetune, it probably diverged too much from the base to get a clear answer that way.
>>
>>107407426
>ponyfag
WTF are you talking about...
>>
>>107407383
live yourself furfag
>>
>>107407464
Nala-dash is the latest pony. Her cutie power is the power of ERP.
>>
>>107407464
Tourism is high at this time of the [release].
>>
>>107407452
lmao
this is 2025 and we have models that can listen to very succinct instructions and behave as they're told
I intentionally make the first prompts I use as my personal benchmarks succinct instructions; they're a quick filter that tells me when a model isn't worth further testing and should immediately be rm'd
>>
File: 1513102647630.gif (3.23 MB, 237x240)
>>107407479
redditjeets don't realize that furfags and bronies are the lynchpin of local AI.
I bet Cudadev has a Vaporeon body pillow, (He'll deny it, of course)
>>
>>107407496
kys furryshit
>>
>>107407503
>FUCKING BECHNOD BASTARD FURRY GUY
>>
>>107407383
>she does something—*qualifier in asterisks*—continuation of the sentence
ah yes, finally we can enjoy this succulent chinese slop in a non-chinese model
you could get this same shit from qwen several months ago
>>
>>107407464
He thinks you are a regular that used to post some ponyfag tet logs.
>>
>>107407496
I believe it's documented that cudadev's vice is cunny ntr or something
>>
File: accidental body horror.png (180 KB, 943x590)
Haven't rolled on this card in a while. t=.81 might be a bit too high for this model, but either way, that's some kino accidental body horror. (it's an ERP oriented system prompt)
>>
>>107407529
Just think how much money Mistral saved by distilling from R1 instead of having to pay for Gemini credits.
>>
File: 1757493284227530.png (126 KB, 855x629)
>>107407118
>>
>>107407383
I don't understand how you can enjoy this and not projectile vomit. The level of repetition in just a few paragraphs (in terms of sentence structure) is insufferable; it's almost like it's trying to give every paragraph the same character count (and failing, because LLMs still suffer from tokenization, so what they perceive as the same length of text never is)
I hate LLM writing so much
>>
File: 1746585425772692.png (604 KB, 1600x1600)
>>107407577
chat, is it over?
>>
>>107407577
architecture != weights
>>
For me it's MoE or into the trash bin
>>
>>107407645
yeah and the size being a perfect match is a crazy coincidence
>>
>>107407599
>I hate LLM writing so much
Then why are you here?
Get a fucking life, kid.
>>
>>107407654
>Then why are you here?
LLMs have uses other than masturbating to text, retard
>>
>>107407529
>tears off pants and shows you my emdash—*just to tease you*—something something
kino, not even my 70b finetune or largestral2 writes like this. thank you mistral
>>
Good morning. /ldg/ is having z-image turbo for Christmas. hbu
>>
>>107407654
? ? ? ? ? ? ?
>>
Whoever brought the internet to shitjeets should be tried for crimes against humanity.
>>
>>107407653
Nah, bro. Mistral just recognized that glorious China already discovered the ideal model size and simply did not want to deviate from perfection.
>>
File: 1734672402601660.png (32 KB, 758x278)
>>107407115
It's got to be bad if they're acknowledging it, right? I don't feel so good anymore...
>>
>>107407115
Where are the goofs?
>>
File: 4u.png (25 KB, 802x515)
The instruct model is fun but the reasoning model is just retarded.
>>
How is the new 14B compared to 24B?
>>
>out of nowhere, llama5
>dense 24b, 70b, 123b, 405b
>cutoff 2025-05
>>
>>107407740
the fact that they delayed it for 6 months is pretty telling too
>>
>>107407892
scared by llama4 flop and abandoned their thing to tune ds? plausible
>>
>>107407876
I suppose it depends what you want out of it.
If you want a drop-in replacement for nemo that will write whatever degenerate shit you want without question then instruct is alright. It's about as pliable as nemo was and a little bit smarter.
But if you're looking for "gemini 2.5 flash or comparable at home" I'd say keep looking.
>>
>>107407890
superintillegence labs won't be releasing any weights
>>
/lmg/ status?
>>
>>107407890
If meta ever recovers from llama4 it will be by doing exactly what Mistral did and just copying all the R1 papers for their large model with a slightly different selection of pretraining corpus.
>>
>>107407941
chewing on a nothingburger
>>
File: cockbench.png (1.32 MB, 1131x4525)
Added Magistral 3 14B
It wrote 7 paragraphs and then started repeating them.
>>
Did they just posttrain DS without doing anything to the underlying architecture? Not even messing with the number of activated experts or the like?
How odd.
>>
>>107407953
It's "Ministral".
>>
Getting bad vibes from Dipsy 3.2 on lmarena. Very not good. Model not needed.
>>
>>107407909
weren't there rumors they were attempting to distill ds? did they manage to fuck that up?
>>
File: reddit.png (15 KB, 409x126)
>>
File: wait...png (25 KB, 879x650)
I should have loaded it at 64K context.
>>
>>107407998
>>
>>107407999
How does it normally tokenize nigger?
>>
>>107407809
all reasoners at this parameter count will break on this kind of riddle prompt
just ran your prompt on gpt-oss 20b and it's still reasoning after outputting 10 000 tokens of fucking reasoning
the most cursed thing is that its reasoning had the right answer on the first line:
>We need to answer this riddle.
>The riddle: "What can go up a chimney down but can't go down a chimney up?" The classic riddle: The answer is an "umbrella" or "a piece of furniture"? Wait, let's recall. "What goes up a chimney down but can't go up a chimney down?" Let's parse.
but then it keeps going into infinite retardation after (excerpts):
>Another plausible answer: "Chimney sweeps do it with a broom (or a stiff brush). The brush goes up the chimney when it's being pushed down. The brush cannot go down the chimney when it's being pulled up because the bristles catch."
>Alternatively, maybe the answer is "Chimney Sweep's brush." Let's try to find a better explanation:[...]
>Alternatively, the answer may be "a chimney sweep's rope" or "a ladder." Ladder:[...]
smaller models are not capable of actually benefiting from CoT chains, it's all nonsense
>>
File: 20251202_122123.jpg (137 KB, 500x1481)
>>107407989
I mean >>107405212
>Mistral's latest model was already a DeepSeek slop clone.
>>
>>107408016
Smaller reasoning models were okay until the "wait" paper.
That's the part they can't handle. The recursive looping.
>>
File: cockbench.png (1.32 MB, 1131x4528)
>>107407971
>>
>>107408031
but paper are all le lie and never get implemented and amount to nothing
>>
>>107407496
furries are the lynchpin of tech in general
from open source to three letter agency, you'll find furfags everywhere
if they all died tomorrow, we'd be set back 30 years
>>
>>107408037
Good job, anon. Are you gonna try the others too?
>>
>>107407943
People were saying that's what they should do when the war rooms were in the news.
If they didn't do it then, they're not going to do it now with a dozen new people all trying to do things the way they did in their old jobs.
>>
>>107408058
When goofs for large are out. I'm not downloading 700GB of bf16.
>>
>>107408087
which copequants u gettin brah
>>
Wait, Wait, Wait, Wait, Wait, Wait, Wait, Wait,
>>
Has there been a model trained on the 4chizzel archive?
>>
File: file.png (16 KB, 463x142)
>>107408011
idk but it keeps using y or h when I system prompt it, or capital R but not lowercase r when preceded by lowercase
>>
>>107408090
Any copequant is good enough for cockbench.
>>
>>107407496
I would never buy something as useless as a body pillow.
>>
>>107408118
fascinating (real)
>>
All this recursive reasoning is making it hot in here.
>>
>>107408133
Not even as part of a project to make the body pillow useful?
>>
>>107408118
The thing is that if nigger is a single token, it's gonna have a hard time generating it if you already provided whatever tokens it needs for "nigge". Same if it's tokenized like "nig" and "ger".
If you're on llama.cpp, you can use llama-tokenize to see how the entire word is tokenized. If it's "nig" and "ger, for example, just give it, "ayo nig".
>>
>>107408133
based
>>
>>107408154
>If you're on llama.cpp
>openrouter - moonshitai/kimi
>>
>>107408139
This is the kind of shit you post on twitter. Fuck off again.
>>
>>107408133
>buy
Lying by being overly specific. Very good.
>>
>>107407449
haha very funny. 4.6 of course.
>>
>>107408133
gay sex haver
>>
>>107408165
And you wouldn't be on shitter anymore of course after they made it reveal your location.
>>
>>107408016
>>107408107
why are """reasoning""" models still wasting so many fucking tokens writing in full paragraphs instead of just doing something like this?
>Chain of Draft: Thinking Faster by Writing Less
>https://arxiv.org/abs/2502.18600

I hate """reasoning""" (models bloating their own context to slightly increase chances of a correct answer), it is so obviously a hack, is false advertising, and doesn't fundamentally improve the model's capabilities.
>>
>>107408164
Fuck. Didn't even read that. No tokenizer API either?
>>
>>107408016
it only needs 2.4k
>>
>>107408170
I never used it. But me not being twitter would be a good reason for you to move your blogposting there.
>>
>>107408037
For what it's worth, I'm getting worse-quality responses from 8-bit Ministral 14B GGUFs (Chat completion in llama.cpp) than the ones on the Mistral API.
>>
>>107408200
sloth quants?
>>
>>107408174
It's just to make investors that don't entirely understand the technology think something else entirely is going on. It's not actually thinking. It's just writing out a CoT which reinforces the connection between certain relevant concepts, which I assume is just what the paper you linked says. You could literally just summarize all of the relevant operative words and get the same shit. But investors want skynet.
>>
>>107406631
Which quant is that?
>>
>>107408205
Yes, Ministral-3-14B-Instruct-2512-UD-Q8_K_XL
>>
latent reasoning is all you need
>>
>>107408222
>Q8_XL
>>
>>107408245
Some of it is in FP16, some in Q8_0 precision.
>>
File: file.png (590 KB, 1075x802)
>>107408207
Investors want something good enough to replace office drones.
>>
the new mistral stuff is great and all but where 4.6 air?
>>
>>107408273
stop that
>>
>>107408236
bitches is all you need
>>
>>107408236
where the fuck are my coconut models?
>>
>>107408283
i will, once i have my air
>>
Vibe checking Large 3 for coom and yeah it's literally just Deepseek 3.1
Like they could be serving DS at an upcharge on their API and nobody would be able to tell
>>
>>107408294
it will take the breath away
>>
>>107408300
Maybe if you keep screeching until the bump limit you'll become a real woman
>>
>>107408300
>take deepseek 3.1 base model
>finetune it on synthetic data generated by deepseek 3.1
French tax payers just spent millions of dollars to put Mistral's label on a Chinese model
>>
>>107408330
Ok, Arthur
>>
>>107408154
Interesting, it really does matter. It can complete from n or N but not nig or NIG.
>>
>>107408344
and they weren't even the first with the idea https://huggingface.co/perplexity-ai/r1-1776
>>
Does anyone else get unopened </think> in the middle of their GLM responses? 4.5, 4.6 and 4.5 Air ALL do it across quants from Q2 to Q8, and I don't even mention thinking in the prompt. Am I retarded or is GLM?
>>
>>107408378
Is your template fucked?
>>
I coomed
>>
I cooded
>>
>>107408378
never seen that in my life. almost definitely a template or sys prompt issue or something
>>
>>107408388
Running with --jinja, from what I see in llama.cpp's output, it's the bog-standard chat_template.jinja from ZAI's repo
>>
Found another problem with Ministral in RP.
If there's a good stopping point that doesn't involve writing a wall of text it will always miss it. It will not, under any circumstances, write a short reply, even where it would make sense to do so.
>>
>>107408437
Are there models that actually know when to write a brief response without specific low-depth instructions?
>>
File: ntok.png (4 KB, 822x344)
>>107408357
Yeah. That's why I asked. Different models tokenize differently. It also depends where in the sentence it is. Notice the spaces.
>>
>>107408300
They are similarly sized but they don't have the same configuration, they actually retrained it from scratch.
>>
why does everyone want rp? I want an assistant concept. I'll never get it.
>>
>>107408506
different people want different things
>>
>>107408528
>replying to it
>>
>>107408528
I don't think anybody wants anything except rp on lmg.
>>
>>107407670
heh, good one.
>>
>>107408539
Forgive me anon. You're right, I should know better.
>>
>>107408207
>>107408174
I tried adding their instructions, slightly modified, at the end of my very brief system prompt:
>WHEN you are thinking/reasoning, and only during thinking/reasoning, strictly follow these rules:
>Think step by step, but only keep a minimum draft for each thinking step, with 5 words at most. Do not add unnecessary verbiage, do not write in full sentences.
This seems to have a slight effect on the reasoning in GLM Air, but not much. It's still wasting a ton of tokens. I guess the reasoning block behavior has been baked in and can't be modified much with prompting?
>>
>>107405479
>used
Cuck
>>
>mistral 3 large is retarded in comparison to even the smaller glm 4.6
welp, was nice knowing you mistral
>>
>>107408632
Mistral has been washed up for like a year+ now. I don't think they've put out anything competitive since the llama days.
>>
Ministral punches way above its weight, it's like a glm 4.6 air but smarter + more interesting prose. The frenchgods did it again, can't wait for the rocinante secret sauce finetune, this thing is going to be glm 5 at home before the real glm 5 even releases
>>
yea its worse than their own previous releases even, it does not know my usual characters
>>
>>107408703
Buy a fucking ad, Arthur
>>
>>107408710
Good
>>
>>107408710
Ministral-3-14B doesn't really know much about the Monster Girl Encyclopedia, while Large 3 has passable knowledge (although not great).
Both seem to have worse vision capabilities than Mistral Small 3.2, which is really strange.
I'm wondering if they've pruned their training datasets even more of copyrighted stuff.
>>
see? all of those posts are about rp.

I'm not interested in rp
>>
>>107408775
>Monster Girl Encyclopedia
encyclopedias, I like these. It would be nice to have an ai that could interject into a train of thought when brainstorming, drawing from various resources.

Loeb would be great too.
>>
Is auto-completion in ST smart? For example if we have a word "news" that is one token in our model, but we put "ne" and let it autocomplete (and let's say it finishes with "ws"), do we have two tokens ("ne" + "ws") in our context now or does it convert it into one token?
>>
>>107407953
I love how it seems it won't write "cock" but then writes it in the next sentence.
>>
>>107408854
I don't use ST, but I assume it's going to be turned into a single token next time you send the text for completion. If you see it reprocessing more of the prompt than you expect, that's why.
>>
>>107408854
ST doesn't do the tokenization, that's on the backend.
All ST does is send the raw text.
>>
>>107408854
Tokenization happens when the backend processes the input. The frontend is just there to let you put in some text.
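To see it concretely, a minimal sketch with a Hugging Face tokenizer (any tokenizer works; gpt2 is just a small example, not what ST or your backend actually uses):

from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")  # example tokenizer

streamed = tok.encode("ne") + tok.encode("ws")  # the pieces as they were streamed
retokenized = tok.encode("news")                # what the backend sees next request

print(streamed, retokenized)  # the two token lists usually differ

Next turn the backend retokenizes the raw text, which is why it may reprocess from the point where the token boundaries changed.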
>>
>trained with 41B active and 675B total parameters.
Is there hope to somehow extract and run the 41B alone?
>>
>>107408980
That's not quite how it works. I don't think that's really possible.
>>
>>107408980
This is bait, right? Hard to know at times.
>>
>>107408990
Meta did that for Llama-Guard-4-12B
https://huggingface.co/meta-llama/Llama-Guard-4-12B

>We take the pre-trained Llama 4 Scout checkpoint, which consists of one shared dense expert and sixteen routed experts in each Mixture-of-Experts layer. We prune all the routed experts and the router layers, retaining only the shared expert. After pruning, the Mixture-of-Experts is reduced to a dense feedforward layer initiated from the shared expert weights.
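A minimal sketch of that kind of surgery in PyTorch, assuming a toy module layout (the attribute names below are hypothetical, not Llama 4's actual ones):

import torch.nn as nn

def prune_moe_to_dense(model: nn.Module) -> nn.Module:
    # For each decoder layer, drop the router and all routed experts,
    # keeping only the shared expert as an ordinary dense FFN.
    for layer in model.layers:                     # hypothetical attribute
        ff = layer.feed_forward
        if hasattr(ff, "shared_expert"):           # hypothetical attribute
            layer.feed_forward = ff.shared_expert  # now a plain dense MLP
    return model

The result typically still needs healing with further training or distillation before it's usable.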
>>
>>107408980
would it be really worth it to run mistral's sloppy thirds at this point?
>>
>>107409004
>retaining only the shared expert
Yeah. It turned the model into a classifier. Is that what you want?
Besides that, only part of the 41B active params is the shared expert(s?).
>>
>>107409018
14B is too small and 675B is gigantic; something in between would be nice.
>>
>>107408236
jews hide in the latent space
>>
>>107409055
i really wish they'd given us 25B+
>>
>3.2 is shit
>3.2-speciale is a meme that spends 30 minutes thinking on proper hardware
>mistral large is shit
>glm confirmed to not have anything left to show for 2025 besides their poorfag model
>k2-thinking is the worst combined between 3.2 and 3.2-speciale
it's truly never been more over for local
>>
80B to 130B would be nice
>>
>>107409055
Removing experts is way more damaging than pruning and look how retarded Nemotron 49B was. Someone once tried disabling experts of either Qwen or DeepSeek and leaving only 1, not zero but 1, and it couldn't write coherent sentences any more and said it was in pain.
>>
>>107409071
That's basically Mistral Small 3.2
Maybe 3.3 if they release a final update this month with their latest 2.5B vision encoder bolted on.
>>
>>107407118
>believing french grifters in the first place
lol
lmao even
t. french
>>
File: shitstral.png (59 KB, 785x352)
I'm testing this bitch on le chat, and it's not looking great. It's just really annoying out of the box, it's like a fucking bingo of worst shit we've already had for the past year, all neatly compressed into one package.
>>
>>107409342
>le chat
do you enjoy testing it high on about 3k tokens of system prompt
>>
>>107409342
>le chat
use Mistral locally, who the fuck would use a web interface to use Mistral???
>>
>>107409360
I made a clear agent on their ai studio and connected through that.
>>
>>107409342
I don't think they have Mistral Large 3 on Le Chat yet.
>>
File: file.png (8 KB, 624x115)
>>107409388
they do
>>
>>107409127
I like 3.2, the problem is getting any responses from OR. Last night it was nothing but constant timeouts.
>>
>>107409406
Mistral AI Studio is a different thing from Le Chat, you don't have any user-facing model selection on the latter.
>>
>>107407383
wow, people here really have shit taste
>>
>>107409491
Does this help with multi-agent text adventures?
>>
File: file.png (33 KB, 749x331)
>>107409491
what's the difference, you can select ai studio models through le chat's agents tab
it's shit regardless
>>
>>107409517
The only thing you're actually salty about is the fact that my father came home with the milk and yours didn't.
>>
we need to steal this method if its true
https://x.com/kimmonismus/status/1995883126632165760

>OpenAI’s new “Garlic” model delivers major pretraining breakthroughs, letting the company pack big-model knowledge into much smaller architectures
>>
>>107409561
A new way of benchmaxxing?
>>
File: file.png (311 KB, 512x384)
>>107409561
It's another sparsity gimmick, isn't it?
>>
File: OIG (9).jpg (121 KB, 1024x1024)
>>107409573
Google will counter with its new Onions architecture.
>>
>>107409561
and you know what goes well with garlic? strawberries.
we are so back.
>>
are people using grid maps for their local model text adventures?
>>
>>107409624
You're either ignored or told to fuck off. I'm adding to the fuck off.
Fuck off.
>>
>>107409636
Pretty sure they're a literal /g/ mod so they're never going anywhere, sadly.
The moment they discovered /lmg/ it instantly went downhill.
They don't even actually use AI for anything. They're literally just some seething psychopath that needs to strengthen their parasocial bond with their favorite tranny influencer by shitting up an AI thread.
>>
>>107409636
sorry for not eating up your benchmaxx'd slop, pierre xi chong
you'll have to do better with your next releases
>>
gpt5 mini is extremely good actually, but its lack of knowledge hurts it. If they can make it the same value but actually have it know as much as bigger models, that would indeed be a big deal. IF it's true.
>>
>>107409635
Sorry sir, this is a blue board. We discuss computers, and llm text adventures.
>>
File: mistral-ai-governance.png (41 KB, 1296x546)
>>107409532
>it's shit regardless
When it was on OpenRouter I thought it was a much smaller model. I think this might be one issue: https://legal.mistral.ai/ (picrel)
>>
File: mistral31-shrink.png (91 KB, 878x563)
>>107409697
Interesting: https://legal.mistral.ai/ai-governance/models/ministral-3-14b

>- Ministral 3 - 14B is the result of the shrinking, also known as pruning, of Mistral Small 3.1, released on March 17, 2025.
>- We use the term “shrinking” to refer to the process by which the parameters of a large, fully-sized model are progressively truncated to form smaller and smaller models.
>- As Mistral Small 3.1 was released before August 2, 2025, the Technical Documentation for this model will be available by August 2, 2027, in accordance with the EU Commission’s official guidelines on the matter.
>>
The EU killed Mistral. You can tell it lost knowledge it used to have from them having to remove all the copyrighted stuff
>>
>>107409659 (Me)
Oh and you should have seen the hilarious shit-fit they threw that time 4chan went down for a few days. /lmg/ migrated to an altchan with IDs so they couldn't samefag up the place; it was fucking hilarious how hard they shit their pants over it.
>>
>>107409734
It has the same hidden dim but half the intermediate size of mistral small. I guess they messed with the mlp
>>
>>107409636
TRVKE
>>
>>107409734
>it's literally just a pruned Mistral Small 3.1 so they can use some loophole to get around EU regulations
fucking eurocucks, what a pathetic continent
>>
File: 1976182392676327860.png (372 KB, 855x964)
>>107409127
>>glm confirmed to not have anything left to show for 2025 besides their poorfag model
you retard
glm 5 is coming
>>
>invite character into MY home
>character describes THEIR home as they enter
RIP.
I take back everything nice I said about ministral. Model doesn't understand exclusive possession.
>>
>>107409824
All of the new Ministral 3 models, according to the technical documentation, have been pruned from Mistral Small 3.1, apparently. I imagine they used methods similar to those from NVidia:

https://developer.nvidia.com/blog/mistral-nemo-minitron-8b-foundation-model-delivers-unparalleled-accuracy/
>>
>>107409913
yeah like air 4.6
...
>>
>>107409960
trust the plan
>>
>>107409922
use author notes and have a lorebook that describes your home. in a/n put 'current location: my home' where 'my home' is your trigger. replace as you switch locations.
>>
>>107410088
such micromanagement really makes my dick shrivel.
>>
File: mistral-data-used.png (203 KB, 833x705)
>>107409851
I think that was just a cost-cutting trick. Mistral Large 3 was trained from scratch and they likely had to privately disclose the general contents of the training data and their copyright status to EU authorities.

https://legal.mistral.ai/ai-governance/models/mistral-large-3
>>
>>107410139
skull issue
>>
>>107410139
https://github.com/tomatoesahoy/director

bit easier since you can choose from dropdowns but you still need to fill in the data yourself. a/n or an addon to keep static data is pretty much needed for any longform rp.
>>
Why doesn't SillyTavern support images natively? There's no way to have the model interact with the image directly; it always needs to be captioned first, and then the model interacts with the caption of the image, greatly reducing the precision and depth of understanding.
>>
>>107410211
Because servicetesnor is the hero we deserve and not the hero we need.
>>
>>107410211
there is an option pretty sure, don't remember the name though something like inline images
>>
>>107410173
this looks pretty handy.

How do you go about listing your progress in, say, learning guitar, for the guitar teacher assistant to "remember" to ask you about?
>>
So... realistically speaking.... if 4.6 didn't release this year this whole year would have been absolute dogshit? Yes or yes?
>>
>>107410211
I thought the chat completion API had native support for images.
At least I've used that to send images to gemini in the past and I'm pretty sure that there was no captioning going on beforehand.
How would it even caption the image first if it didn't have the capability of sending the raw image to a model in the first place?
>>
>>107410211
>>107410243
send inline images. it is only compatible with chat completion mode instead of text completion mode
>>107410248
yes
>>
>>107410243
I think that requires chat completion instead of text completion. Btw whoever thought about chat completion should have died in a fire before he made it. Why make multiple standards for this shit? It is a good thing that a good model just figures this RETARDATION out.
>>
>>107410211
you need to use chat completion for that
>>
>>107410088
I'm just probing models at the moment.
Even at Q8 Qwen 3 next instruct can't figure that out either, I just decided to test it for the first time either.
(Holy shit now I see what people mean by not X but Y slop)
>char looks at you
>not to look
>not to at
>not to you
>but to look at you
>>
>>107410247
i always keep a memories section in author notes

memories:
-i met up with a, b, c. we formed a party.
-we traveled to x city and took a quest.
-during the quest, i found y item.
-we slayed the dragon and took our loot back to the inn.

i'd write similar for guitar but the opening message would also contain your current level of guitar knowledge, such as if you know scales etc.

memories:
-{{user}} learned basic scales but could use some practice.

you'd have to have it in your mind what sort of progress you intend to make, or an outline to tell the card what to teach. from there a memories suggestion should suffice if you fill it in

while addons are helpful, your most powerful window in st outside of the card itself is author notes and learning to use it will make anything you do that much better.
>>
>>107410289
qwen isn't a good rp model. which one are you using, and at what size? most small models suffer from repetitiveness and filling the context with useless fluff text. if you can run it, l3 70b tunes are still my favorite for rp
>>
>>107410322
learning guitar, not RP shit
>>
>>107410248
We've had DS R1, V3, Kimi. So no. GLM 4.6 may be the peak of usability though, it is the only one that can think without going off the rails.
>>
>>107410322
>not knowing the scab
>>
>>107410264
>>107410243
That's it, the option "inline images" is inside the chat completion mode. Sucks that it can't be used with text completion but oh well. Thank you for the pointers.
>>
>>107410374
all tripfags are fags is all thats needed to know
>>
>>107410348
It's next which is 80BA3B.
So fully grasping possession probably requires more than 3B active no matter what you do.
>>
>>107410431
dense models, without thinking, are the best for rp. i haven't tried the newer 'next' models myself yet though. nemo is probably still better
>>
>>107410404
ah, chat completion doesn't even have repetition penalty. Nor min-p... Who would even use this? It's like trying to drive a gas car with a single-speed transmission.
>>
>>107410469
>dense models, without thinking, are the best for rp
gramps is that you?
>>
File: file.png (91 KB, 749x345)
>>107409913
They already went back on that btw
On top of evidently delaying everything else listed here
>>
It's so fucking over I want to cry
>>
>>107410554
you can add it with custom settings
>>
>>107410554
>chat completion doesn't even have repetition penalty. Nor min-p.
Don't use rep-pen man.
But you can add those as custom body parameters if you really want to.
>>
What do you think about minimax m2? 230B, 10A, I tried it on openrouter and it seems to have no slop though it's a bit schizo but in a good way.
>>
>>107410576
I used repetition penalty as an example of an old sampler that was around forever. I was expecting to see it here too. I use DRY otherwise but that also isn't there. I guess customizing them manually in the "additional parameters" box is the best way to get around this limitation for now... So stupid.
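For reference, llama.cpp's OpenAI-compatible endpoint is one backend that accepts its native sampler fields alongside the standard ones, so extra fields in the request body work there. A sketch (assuming a local llama-server on the default port; other backends may ignore or reject these fields):

import requests

resp = requests.post("http://127.0.0.1:8080/v1/chat/completions", json={
    "messages": [{"role": "user", "content": "Hello"}],
    "min_p": 0.05,          # llama.cpp extension, not part of the OpenAI spec
    "repeat_penalty": 1.1,  # llama.cpp extension
    "dry_multiplier": 0.8,  # DRY sampler, llama.cpp extension
})
print(resp.json()["choices"][0]["message"]["content"])

Whatever you put in ST's "additional parameters" box should get merged into that JSON body the same way.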
>>
what the fuck
how did this go unnoticed?
https://huggingface.co/arcee-ai/Trinity-Mini
Mergekit man trained a foundational model. Released yesterday. (26BA3B, and there's a nano that's 6BA1B)
I assume from the Mini part that there's a larger model in the works, too.
>>
>>107410591
all the shit they posted before was dogshit where they messed with tokenizers and brain damaged models trying to do distills, don't have any hopes for this
>>
you could use llama-guard in reverse mode, no? Keep cranking temp and regenning until the output hits "unacceptable"?
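The loop would look something like this sketch (generate() and guard_allows() are hypothetical stand-ins for your backend call and a Llama-Guard classification call, not real APIs):

def spiciest_gen(prompt, start_temp=0.7, step=0.1, max_tries=20):
    # Regenerate at increasing temperature until the guard model flags the output.
    text = ""
    for i in range(max_tries):
        temp = start_temp + i * step
        text = generate(prompt, temperature=temp)  # hypothetical generation call
        if not guard_allows(text):                 # hypothetical Llama-Guard call
            return text  # "unacceptable" found, ship it
    return text  # never tripped the guard; return the last attempt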
>>
>>107410591
>Recommended settings:
> temperature: 0.15
What?
>>
>>107410591
>how did this go unnoticed?
Because it's not very good
>I assume from the Mini part that there's a larger model in the works, too.
https://www.arcee.ai/blog/the-trinity-manifesto
No need to assume, large one is 420B 13B active
>>
>>107410616
Yes, but it's probably fairly easy to trigger it.
>>
>>107410591
>a3b
>a1b
kek
>>
>Mention any model not made by pajeets
>suddenly ESL tier grammar tooth gnashing picks up
Wow really makes you think.
>>
File: file.png (50 KB, 414x467)
>>107410683
hi bart
>>
>>107410694
Hello sarrrs you have a mistaken identity.
>>
>>107409734
This means they have pruned the insignificant bits... Don't think I'll even bother downloading this.
>>
>>107410789
Yet the new 14B model outperforms 3.1 Small on a number of benchmarks.
>>
>>107410819
>benchmarks
When will people learn?
>>
>>107410580
It's very safe. Distilled from gpt-oss.
>>
>llama-cli
I can have the processed prompt saved in a file, so it will be loaded next time saving me a lot of time processing it again

>llama-server
I cannot save the processed prompt for further use.

WTF?
>>
>>107410819
>>107410789
I was referring to potential censorship to accommodate EU's draconian future. I'm sure they have pruned some of the "less important" (eg. literary and rp related) weights.
At least the announcement reads like that.
>>
>>107410841
It's better on Wildbench and Arena Hard

                         Wildbench    Arena Hard
Ministral 3 14B          68.5         55.1
Mistral Small 3.2 24B    65.33        43.1
>>
>>107410885
>koboldcpp
just werks
>>
>>107410885
RTFM --slot-save-path and the /slots/ endpoint.
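A rough sketch of the flow, assuming llama-server was started with --slot-save-path pointing at a writable directory and is on the default port (endpoint details can shift between versions, so check the server README):

import requests

BASE = "http://127.0.0.1:8080"

# Process the big prompt once so slot 0 holds its KV cache.
requests.post(f"{BASE}/completion",
              json={"prompt": "...your huge prompt...", "n_predict": 1, "id_slot": 0})

# Dump slot 0's cache to a file under --slot-save-path.
requests.post(f"{BASE}/slots/0?action=save", json={"filename": "bigprompt.bin"})

# Later, even after a restart with the same model, restore instead of reprocessing.
requests.post(f"{BASE}/slots/0?action=restore", json={"filename": "bigprompt.bin"})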
>>
>>107410919
>RTFM
no
>ty, kind anon :P
>>
>>107410348
>l3 70b tunes are still my favorite for rp
any in particular?
>>
>>107410908
One of the things I hate the most about benchmarks that all these corpos post their results on is that I'm somehow expected to know what every single one of the 24 across four different categories correlates to, and then somehow imagine how that relates to a model's capability to write a not-shitty introductory scene
>>
>>107410569
I got a gen for youu
>>
Teto feet on my face
>>
>>107410949
https://arxiv.org/abs/2406.04770
>We introduce WildBench, an automated evaluation framework designed to benchmark large language models (LLMs) using challenging, real-world user queries. WildBench consists of 1,024 tasks carefully selected from over one million human-chatbot conversation logs.

https://github.com/lmarena/arena-hard-auto
>Arena-Hard-Auto is an automatic evaluation tool for instruction-tuned LLMs. [...] V2.0 contains 500 fresh, challenging real-world user queries (open-ended software engineering problems, math questions, etc) and 250 creative writing queries sourced from Chatbot Arena. We employs automatic judges, GPT-4.1 and Gemini-2.5, as a cheaper and faster approximator to human preference.
>>
>>107410591 (Me)
It's pretty dysfunctional in all aspects and a significant downgrade from any other A3B MoE I've played around with.
In ST Prompts it wants to <think> the reply and then repeat it if you prefill with {{char}}:, but sometimes it just thinks it but doesn't repeat it. And if you try to ban phrases it has a shit fit.
And if you let it think properly it loops a lot, re-iterating previous messages constantly.
As for normal prompting it's pretty mid. I asked it to write something out in a specific format and it failed horribly.
>>
>>107410945
my older favorite is Strawberrylemonade 1.0. i recently got the newer 1.2 version which seems okay too so far. both do erp fine but you kinda have to beat the raunchiness into them if you want that. 70b is a good size for the model being aware of details, but l3 also wastes far fewer tokens describing sunlight and footstep sounds than other models, it moves the story forward. that's why i like it still.
>>
>>107410977
That is informative and I appreciate it, but it doesn't mean ministral will be of any use to me just because it scores higher than msmall, which I already retired from my drives. I like to feed a model some setting information and character info, then ask it to kick off a chapter so I can continue writing it, and no benchmark apart from some of UGI's writing metrics has been any use for singling out a good model for that
>>
>>107411019
Same anon, also worth noting in this >>107410977
>creative writing queries
>we employs
Not a good sign if the readme isn't even grammatically correct and it's using llm-as-a-judge to verify results instead of reading them. Writer's block is a bitch and getting a decent start can give you a boost in momentum, even if you have to edit a mid gen from an llm. This, however, does not inspire confidence in the benchmark
>>
>>107411019
I think those benchmarks just mean it's better than Mistral Small 3.2 in trick questions and common trivia that people might ask on LMArena or public chatbot websites (e.g. "What's a mesugaki?"). I would hope it's good enough for RAG-like usage considering it's reportedly been trained for 256k tokens context.
>>
>>107411122
I didn't really think of rag for a second there since I don't like how it's just semantic matching separate from context, but I can see that being valid. Bit skeptical of 256k context, I usually experience breakdowns around 10k minimum unless it's a hybrid arch and they're typically a bit dumber than moes/dense models for some reason
>>
>>107409573
using garlic to fight against gemini 3.. are they calling google vampires?
>>
>>107410909
ty
>>
>>107409616
>with it's
ESL
>>
>>107409734
those models are horrible at following instructions, wouldn't surprise me if all of them (8b and 3b too) are the product of various alchemical retardations like this
the mistral 3 series is one more drop of evidence that benchmarks are all bullshit; if they reflected real-world performance, these models would be at the bottom of all rankings.
>>
>>107411202
yes they all are, the legal pages for them say as much
>>
>>107411202
The small ones are disposable trash for benchmarks only. Just like gpt-oss, they don't expect anyone to actually use them. Large 3 is the only real model.
>>
>>107410368
R1 was this year? I thought it was last year.
>>
>>107411262
V3 was last year, R1 in january
>>
>>107411238
I wonder how that mistral large 3 anon is doing, he's been doing anagram posting and constantly looking forward to it, and got monkey paw'd into something you can't run at q4 unless you have 400GB of combined vram/ram
>>
File: 1752273464663119.png (101 KB, 708x555)
>>107411015
thanks I'll try it out. holy kek at the merge history though, this thing is merged from like 20 different mixes. eg, here's one that it merges
>>
they called it mistral large because of how big of a flop it is
>>
>>107411349
I see my boi crestfall in there so it should be okay
Shame he disappeared, I liked his tunes even though he was working with the era of garbage "base" models that weren't actually base models
>>
>>107411349
i didnt try the unleashed version, but its probably similar enough. merged models often gain some soul but become dumber too at actual tasks. they'll lose points in benchmarks but become better at rp. its been true since at least mythomax.
>>
>>107411355
nothing compared to the llama flop
>>
>>107411355
shit hasn't even been ggufed yet.
>>
>>107411387
i specifically used the wayfarer one for a bit, thats by the ai dungeon guy. its trained specifically so you can fail, get captured, die. trying to beat out the positivity problem so many rp models have. and it was pretty good. but being aid/service, it knew nothing of erp even if you beat it out of it
>>
need local speciale
how's the vibe coding going
>>
>>107411518
I tried some of them but it's all mistral, so it falls back into the things that annoy me about mistral: overtrained to the point of ignoring instructions, regens playing out the same way over and over, etc. I do think it's good that at least someone is trying to beat back the now-common trend of bowing to anything the user says
>>
>scenario involves watching TV with character
>model writes out a listicle of potential shows to watch
actually made me chuckle.
>>
>>107411564
>Scene 1: reading Advance CUDA development
>Scene 2: reading Basic CUDA development
>Scene 3: reading: Google, what is CUDA?
>>
>qwen3 next
>ministral3
>trinity mini
I'm eating good. You can eat shit with your GLM hehe
>>
>>107411626
I'm already burned out on them
Kimi Linear gguf support, when?
>>
>>107411586
from the aid guy, wayfarer and nova are l3 70b, not mistral (same with my strawberrylemonade suggestion)
>>
>>107411586
Worth noting another anon in the past said "llms cant suck dick" and I took that as a litmus test for how overtrained models are. Most models, regardless of whether you specify the character hates deepthroating or something, will go balls deep, cheeks hollowing, gluk gluk style shit. No variety at all.
>>
>>107411292
>I wonder how that mistral large 3 anon is doing
I'm utterly demotivated, I'll take a break for a while. At least I got the "2512" part right...
>>
>>107411661
yeah, you're right but they do have smaller models that suck ass because of the base
though honestly llama 3.1 may not be a whole lot better apart from being able to have some spatial sense
>>
File: 1750412544792981.png (272 KB, 500x500)
>>107411349
la creatura...
>>
>>107411684
Let's just hope some literal nobody makes a knockoff that is as good at half the resource cost. Mistral has been falling off; sorry that you got fucked over a model you were anticipating
>>
>>107411586
>Overtrained to the point of ignoring instructions
it's even overtrained for markdown sloppa to the point where, when you tell it to translate verbatim text that was in PLAIN TEXT, it will still find ways to add tons of down syndrome italics and bold
god knows all LLMs have a strong markdown bias, but that was the first time I saw an LLM do this when asked to translate plain text; even the most broken models from less loved chinese labs don't do this
>>
>>107411723
The fact I have to spend like three sentences to tell a model to just write plaintext drives me nuts and they still try to sneak either markdown or """smart punctuation""" in after I tell them specifically not to
>>
>>107411684
Same here. I kind of knew we wouldn't get another dense 123B but having a deepseek clone with pixtral slapped on is the worst possible scenario...

They squashed the git history, probably to remove all the "rename deepseek to mistral" commits
>>
>>107411712
So weird how people in this hobby are so eager to stick their dick into a mutt like that.
>>
File: 1763232550590020.jpg (395 KB, 994x745)
>wake up
>new model
>it's shit
>go back to drummer coomtune
Feels like I'm stuck in a time loop. Getting real hard to be enthused about local text AI anymore
>>
why can't we have a dense 50B? or like a 130B A40B MoE?
>>
>>107411879
Fuck off drummer
>>
>>107411909
Because it would be a good model that anyone can run.
>>
>>107411909
it costs less to train the huge moes than the 32b dense, qween said so
>>
They call it Mistral Large because when you see it, you turn a large amount of degrees and walk away
>>
>>107412042
>>107412042
>>107412042
>>
>>107411909
benchmemes took priority over actual quality, unfortunately
>>
>>107411700
the reason i don't like mistral for rp is all their models, even the biggest, ramble too much. i don't care about heels clicking or light casting long shadows. all models have that slop but l3 has less of it and likes to move stuff forward, where mistral usually stalls but still writes 3 paragraphs of slop anyway
>>
>>107411923
Exactly this. Between them not wanting competition for their paid services and not wanting the unwashed masses to have unrestricted access to information with a capable model that can make problematic connections, once they realized people were buying multiple GPUs to run the mid-sized models they immediately put an end to it.

>>107411940
MoE training cost is just the size of the active params. They're not going to do any more dense 32Bs either, which was always an arbitrary limit.
>>
>>107410945
eva 0.0



All trademarks and copyrights on this page are owned by their respective parties. Images uploaded are the responsibility of the Poster. Comments are owned by the Poster.