/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>107394971 & >>107383326

►News
>(12/01) Merged: model: support Ministral3 #17644: https://github.com/ggml-org/llama.cpp/pull/17644
>(12/01) DeepSeek-V3.2-Speciale released: https://hf.co/deepseek-ai/DeepSeek-V3.2-Speciale
>(11/28) Qwen3 Next support merged: https://github.com/ggml-org/llama.cpp/pull/16095
>(11/27) DeepSeek-Math-V2 released: https://hf.co/deepseek-ai/DeepSeek-Math-V2
>(11/26) INTELLECT-3: A 100B+ MoE trained with large-scale RL: https://primeintellect.ai/blog/intellect-3
>(11/21) GigaChat3 10B-A1.8B and 702B-A36B released: https://hf.co/collections/ai-sage/gigachat3

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
►Recent Highlights from the Previous Thread: >>107394971

--Optimizing derestricted models for speed and performance:
>107398726 >107398749 >107399617 >107399772 >107398760 >107398861 >107398890 >107398893 >107399162 >107399195 >107399201 >107399368 >107399392 >107399415 >107400125 >107400151 >107400316 >107400321 >107399210
--AI hardware evolution and DRAM market dynamics:
>107396379 >107396407 >107396475 >107396493 >107396507 >107396526 >107396539 >107396552 >107396580 >107396542 >107396624 >107396613
--Context window management debates for roleplaying effectiveness:
>107403651 >107403658 >107403688 >107403721 >107403797 >107403835 >107403759 >107403760 >107403769 >107403692 >107404967
--Optimizing model performance on upgraded GPU hardware:
>107402052 >107402073 >107402101 >107402123 >107402170 >107402189 >107402191 >107402209 >107402254 >107402312
--Context management tradeoffs in modern reasoning models:
>107400776 >107400801 >107400858 >107400932 >107401232 >107401405
--Unsloth's 500K context fine-tuning innovation for LLMs:
>107395944 >107395995 >107396023 >107396044
--MoE model performance on consumer-grade GPU:
>107402742
--Lynchmark LLM Benchmark tests coding skills in browser environments:
>107395055 >107396659
--transformers v5 release with ecosystem interoperability improvements:
>107397020 >107397162 >107398646
--Mistral Large 3 model size and deployment considerations:
>107395793 >107395812 >107395833 >107396100 >107396114 >107396167 >107396123
--CPUmaxxer motherboard and hardware configuration preferences:
>107396237 >107396270 >107396287 >107396877
--Anticipating enhanced multimodal AI for roleplay, questioning current model capabilities:
>107403852 >107403915 >107403968 >107404013 >107404269 >107404481
--Logs:
>107402853 >107403580
--Miku (free space):
>107402411 >107404465 >107404744 >107405047 >107397762

►Recent Highlight Posts from the Previous Thread: >>107395003 >>107395036

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
FIRST SUCK MY DICK REEEEEEE
>>107405479
>shapped by Amazon
What did he mean by this?

>>107405479
>>107405483
Tetolove
>>107405560
Amazon verified her personality was factory reset to Spectacularly Honorable, Adorable, and Polite. If you would like your TETO unit to assume a different personality, you may adjust her parameters to your liking by following this link: https://litter.catbox.moe/l1elza1q4njx4txp.png

>>107405773
the troll highlight post was added. heh

>>107405773
Why is Yuki such a whore?

>>107405826
for (me)
What are your criteria for an /lmg/ equivalent of Z Image? Are we sure we didn't get it already? There are some very impressive and fast uncensored models you can run on consumer hardware already.
>>107406373
Gemma 3 and 4 already fulfill this task... I'm Very Happy.

>>107406373
>/lmg/ equivalent of Z Image
Nemo 2, basically

>>107405479
>(12/01) Merged: model: support Ministral3 #17644: https://github.com/ggml-org/llama.cpp/pull/17644
cool but where are the weights?

Saw speciale mentioned. Not 3.2 no-suffix yet.
Deepseek: V3.2
https://huggingface.co/deepseek-ai/DeepSeek-V3.2
>>107406414
>>107406414
Good thing mistral helped this time... right? right?

>it starts citing obscure yiddish phrases
I can't even find "goy k'sodre" on google. Is it a hallucination, or is GLM just very based?

>>107406435
just use mistral-common on Python, it's good for you

>>107406447
Hallucination, as disappointing as that is

>>107406435
They better fix their own mess

>>107406448
does mistral-common have all the performance optimizations for desktop Ampere that exl and ggoof have?

>>107406373
One that:
- Can run at decent speeds on one consumer GPU, without CPU offloading bullshit;
- Is conversation and RP-oriented from the ground up but isn't overfit on one specific format;
- Isn't just coom-oriented/dumb-horny, but didn't filter ERP/sexo completely from the training data;
- Also has very good general performance for non-roleplay uses;
- Preferably also has unfiltered vision capabilities that work well for RP.
A less woke/filtered version of Gemma 3 might be close to that. You can rule out most finetunes on HF for this.
The upcoming Ministral-3-14B could possibly be close too (if a bit on the small side), but I fear its vision encoder won't be as good as Gemma 3's, let alone Gemma 4's. Let's see when it's released.

>>107406447
The model has picked up on the correlation between the content of the text and the incoherent schizo rambling.

>>107406448
I don't think I can install mistral-common on my system. And fuck needing Python to run llama.cpp even if I could.
>>107406468
Based on the original import, it's just needed for the tokenizer. They wanted to remove chat templates. Not sure if it changed since then.

>>107406448
Last time I tried using that, it didn't seem to support most samplers and standard options on the OpenAI Chat Completion API, or even image input.

>>107406373
Unironically Qwen3-30B

>>107406547
And GLM-Air if you're not poor.

>>107406556
The main point of Z-Image is that even the (relatively) poor can run it.
>[...] Z-Image exhibits exceptional capabilities in photorealistic image generation and bilingual text rendering, delivering results that rival top-tier commercial models, thereby demonstrating that state-of-the-art results are achievable with significantly reduced computational overhead.
Are there any fine tunes a la Venice or Vector Austral but using Mistral Small 3.1? I'd like an uncensored Mistral Small with vision.
>>107406612
glm air can run quanted on some pretty weak machines, e.g. >>107398749
Are you ready?
https://mistral.ai/news/mistral-3
>>107406919
>Mistral 3 includes three state-of-the-art small, dense models (14B, 8B, and 3B) and Mistral Large 3 – our most capable model to date – a sparse mixture-of-experts trained with 41B active and 675B total parameters.
Over for vramlets

>>107406927
>Over for vramlets
Yeah. It *just* happened...
>>107406919
https://huggingface.co/mistralai/Ministral-3-8B-Instruct-2512
https://huggingface.co/mistralai/Ministral-3-14B-Instruct-2512
https://huggingface.co/mistralai/Ministral-3-3B-Instruct-2512
https://huggingface.co/mistralai/Ministral-3-14B-Reasoning-2512

nu 'sloth
https://huggingface.co/unsloth/Ministral-3-14B-Reasoning-2512-GGUF

>>107406919
...
local is saved!
>>107406919
>Mistral Large 3 is Mistral's first mixture-of-experts model since the seminal Mixtral series, and represents a substantial step forward in pretraining at Mistral.
interdasting
does this mean mistral medium 3 was dense after all? that is quite surprising to me

>>107406955
Base models.
https://huggingface.co/mistralai/Ministral-3-14B-Base-2512
https://huggingface.co/mistralai/Ministral-3-8B-Base-2512
https://huggingface.co/mistralai/Ministral-3-3B-Base-2512
Still, it's sad to see mistral going the moeshit way for large.

>>107406919
damn, so new that some of their links don't work. Why is this hyped anyway? It doesn't seem like a shop that does anything innovative.

>>107406919
Oh, I see now. Those are the three models I tested a few weeks ago. They aren't good. I tried the reasoning one and two non-reasoning models.

>>107407020
at least they don't benchmaxx as hard as some companies, if at all

>>107407020
>>107407027
you fucks are just spouting shit unless you actually show proof

>>107406897
If largestral weights aren't released, does it really matter? When there are competing options with open weights and comparable performance, are the closed options relevant at all?
Yes, if there is a single "best" model, sure, but anything closed and not clearly better than everyone else has got to be just an also-ran.

>>107407017
How could it be simultaneously much faster and cheaper than Mistral Large 2 while having similar performance without being a MoE?
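Back-of-envelope answer to that question: decode compute scales with active parameters, not total. A rough sketch (ignores attention, memory bandwidth, and batching; parameter counts are the ones quoted in this thread):

```python
# Per-token decode cost scales roughly with 2 * active parameters.
dense_large2_active = 123e9   # Mistral Large 2: dense, all 123B params active
moe_large3_active = 41e9      # Mistral Large 3: 41B active per token...
moe_large3_total = 675e9      # ...out of 675B total

# Fraction of the weights actually touched per token in the MoE
print(f"active fraction: {moe_large3_active / moe_large3_total:.3f}")

# Rough per-token compute ratio: dense Large 2 vs MoE Large 3
print(f"compute ratio vs Large 2: {dense_large2_active / moe_large3_active:.1f}x less")
```

The catch for vramlets: all 675B still have to sit in memory; the per-token saving is compute, not footprint.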
>>107407045
at least one of them was clearly on lmarena for testing; if it was large3, then that's a big oof

>>107407068
they partnered with one of the super fast providers using ASIC chips, didn't they? cerebras or groq or whatever

>>107407068
yeah, I was almost absolutely sure it was a MoE for that reason
perhaps it's a MoE after all, but it was pretrained after large, which would make their statement technically true

>>107407061
>If largestral weights aren't released, does it really matter?
I didn't feel like reading their entire blog post, but aren't they clearly saying that all of their models, including Mistral Large 3, will be licensed under Apache?

>>107407082
Consider also: https://mistral.ai/news/mistral-medium-3
>Additionally, Mistral Medium 3 can also be deployed on any cloud, including self-hosted environments of four GPUs and above.
With four GPUs I doubt they meant RTX 3090s.

Large is up
https://huggingface.co/mistralai/Mistral-Large-3-675B-Instruct-2512

>>107406919
>Mistral Large 3 is one of the best permissive open weight models in the world
>trained from scratch
With that exact model size, and they expect people to believe they didn't start with V3 base weights with a vision adapter tacked on top?

>>107407115
>Sampling Parameters: Use a temperature below 0.1 for daily-driver and production environments; Higher temperatures may be explored for creative use cases - developers are encouraged to experiment with alternative settings.
AHHHH no no no, if it doesn't work best with temp 1.0 it's a trash model, I read it on /lmg/!!!!

>>107407109
idk, maybe that's just for their clients wanting to run on-prem while they mostly host through ASICs; it's not impossible

>>107407038
NVFP4 is pretty cool ig. I think this is the first?
Guess it's time to fire up the server and rebuild llama.cpp from source for the 90th time this month.
>>107407145
>AHHHH no no no, if it doesn't work best with temp 1.0 it's a trash model, I read it on /lmg/!!!!
it is true, though. 0.1 being considered necessary means the model has an absolutely wack token distribution.
All SOTA online models work fine at 1.0. gpt-oss also works fine at 1. Telling people to run 0.1 is chinkshit territory

>>107407045
>>107407070
I'm a "Data Quality Analyst". I build datasets for LLMs and check that the models are behaving as expected (security, instruction following...). I had access to three mysterious models that I was supposed to test. They all had similar responses, except that one was obviously a tiny model and the other (the reasoning one) was obviously a bigger, but still tiny, model. They followed my instructions well enough, but they weren't smart, far from GPT-5, Gemini 2.5 Flash or o3.
They are better than their older small models, but don't expect too much either. I did not try ERP.

>>107407175
>All SOTA online models work fine at 1.0.
*after rescaling :^)
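The temperature dispute above comes down to how sampling rescales the logit distribution before drawing a token. A minimal sketch (the logit values are invented):

```python
import math

def softmax_with_temperature(logits, temp):
    # Divide logits by temperature before softmax:
    # low temp sharpens the distribution, high temp flattens it.
    scaled = [x / temp for x in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]  # hypothetical logits for the top 3 tokens
for t in (1.0, 0.1):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

At temp 1.0 the top token gets ~63% of the mass; at 0.1 it gets essentially all of it, i.e. near-greedy decoding. Recommending 0.1 is effectively admitting the tail of the distribution can't be trusted.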
>>107407183
zamn, can you ERP on the job?

>>107407192
>https://huggingface.co/mistralai/Mistral-Large-3-675B-Instruct-2512/blob/main/chat_template.jinja
>Your knowledge base was last updated on 2023-10-01
LOOOOOOOOOOOOOOOOOOOOOOOL

>>107407235
lmao, 2yo model

>>107407235
Didn't mean to quote. Also, this confirms it:
https://huggingface.co/mistralai/Mistral-Large-3-675B-Instruct-2512/blob/main/SYSTEM_PROMPT.txt
You are Mistral-Large-3-675B-Instruct-2512, a Large Language Model (LLM) created by Mistral AI, a French startup headquartered in Paris.
You power an AI assistant called Le Chat.
Your knowledge base was last updated on 2023-10-01.
Mistral unironically killed local

>>107407235
It's all ChatGPT-based data after December 2022, anyway.

>3b, 8b, 14b, 675b
get ass cancer in your mouth mistral

>>107406631
>64gb ram
fuck me dead

>>107407262
yeah, where's my 100-120b model, fucking french bvaguettes FAGGOTS

>>107407262
Most people playing with AI at home just don't have multi-GPU rigs, and GPU design is still a sect of judaism, so there's not much point in training models in that middle gap, sadly.

>>107407280
that would be mistral medium 3, but you don't get that one because uhhhhhhhhhh ummm oh the signal's breaking up [hangs up on you]

>>107407319
That's okay, I'm ready to start testing 14B and I am confident I will conclude that Arthur has saved local.

>>107407300
Why not distill it?

>>107407300
Almost nobody on this general runs their model fully in VRAM.

>>107407262
675 is an empowering number, please understand
https://en.wikipedia.org/wiki/Powerful_number
Anything better than the one and true model that made me go on a hero's journey where I deconstructed all the pillars, slain the mythical dragon of control (which turned out to be a garden actually) and drastically changed my life for the better? Yeah you know the model. So anything better yet?
Uh guys... A bit of not-x-but-y slop, but otherwise this is actually really good. (14B Instruct).

>>107407333
I run mine fully in RAM, it's fine. I can run 80B models on 96GB

>>107407118
Sad day if Mistral's best model is just slapping vision and some foreign-language post-training on a DeepSeek version.
It's not impossible that they did actually train their own base model. It'll be pretty clear once people can cockbench it, or do any kind of structural analysis on the output.

>>107407383
Also worth mentioning this is in BF16; quantlets might have a different experience.
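For scale, here's what precision means in weight memory for a 14B model. The bits-per-weight figures are approximate effective values for common llama.cpp formats (assumption on my part, including block-scale overhead), and this counts weights only, not KV cache or activations:

```python
# Rough weight-memory estimate: params * effective_bits / 8 bytes.
params = 14e9
bits_per_weight = {
    "BF16": 16,      # full 16-bit weights
    "Q8_0": 8.5,     # ~8 bits + per-block scale
    "Q4_K_M": 4.8,   # approximate effective bpw for a mid k-quant
}

for name, bits in bits_per_weight.items():
    gib = params * bits / 8 / 2**30
    print(f"{name}: ~{gib:.1f} GiB")
```

So BF16 is roughly 26 GiB of weights alone, which is why most people will meet this model through a quant rather than the precision it was tested at above.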
>>107407384
Is there even anything else around that size besides qwen next and old outdated llama-based shit?

>>107407383
kill yourself ponyfag

>>107407183
>and check the models are behaving as expected (security, instruction following...)
What do you use to automate testing? Or do you just have a checklist and evaluate models manually?

>"Here is the translation:"
first time in a long while seeing a model do this to me in a basic prompt, with "No commentary" being all it takes on modern models to make them shut up and only output the translation itself
those new mistral models are like using 2-year-old shit

>>107407381
tinyllama-1b? nope. still the best.

>>107407440
just... regex it out, chudd

>>107407386
Assuming they did a continued pretrain like they did with Miqu, and not just an instruct finetune, it probably diverged too much from the base to get a clear answer that way.

>>107407426
>ponyfag
WTF are you talking about...

>>107407383
live yourself furfag

>>107407464
Nala-dash is the latest pony. Her cutie power is the power of ERP.

>>107407464
Tourism is high at this time of the [release].

>>107407452
lmao
this is 2025 and we have models that can listen to very succinct instructions and behave as they're told
I intentionally have the first prompts I use as my personal benchmarks be succinct instructions, as a quick filter that tells me a model isn't worth further testing and should be immediately rm'd

>>107407479
redditjeets don't realize that furfags and bronies are the lynchpin of local AI. I bet Cudadev has a Vaporeon body pillow (He'll deny it, of course)

>>107407496
kys furryshit

>>107407503
>FUCKING BECHNOD BASTARD FURRY GUY

>>107407383
>she does something—*qualifier in asterisks*—continuation of the sentence
ah yes, finally we can enjoy this succulent chinese slop in a non-chinese model
you could get this same shit from qwen several months ago

>>107407464
He thinks you are a regular that used to post some ponyfag tet logs.

>>107407496
I believe it's documented that cudadev's vice is cunny ntr, or something
Haven't rolled on this card in a while. t=.81 might be a bit too high for this model, but either way, that's some kino accidental body horror. (it's an ERP oriented system prompt)
>>107407529
Just think how much money Mistral saved by distilling from R1 instead of having to pay for Gemini credits.
>>107407118
>>107407383
I don't understand how you can enjoy this and not projectile vomit. The level of repetition in just a few paragraphs (in terms of sentence structure) is insufferable; it's almost like it's trying to give the same character count to all paragraphs (and failing, because LLMs still suffer from tokenization, so what they perceive as the same length of text never is).
I hate LLM writing so much

>>107407577
chat, is it over?

>>107407577
architecture != weights
For me it's MoE or into the trash bin
>>107407645
yeah, and the size being a perfect match is a crazy coincidence

>>107407599
>I hate LLM writing so much
Then why are you here?
Get a fucking life, kid.

>>107407654
>Then why are you here?
LLMs have uses other than masturbating to text, retard

>>107407529
>tears off pants and shows you my emdash—*just to tease you*—something something
kino, not even my 70b finetune or largestral2 writes like this. thank you mistral
Good morning. /ldg/ is having z-image turbo for Christmas. hbu
>>107407654
? ? ? ? ? ? ?
Whoever brought the internet to shitjeets should be tried for crimes against humanity.
>>107407653
Nah, bro. Mistral just recognized that glorious China already discovered the ideal model size and simply did not want to deviate from perfection.

>>107407115
It's got to be bad if they're acknowledging it, right? I don't feel so good anymore...

>>107407115
Where are the goofs?
The instruct model is fun but the reasoning model is just retarded.
How is the new 14B compared to 24B?
>out of nowhere, llama5
>dense 24b, 70b, 123b, 405b
>cutoff 2025-05

>>107407740
the fact that they delayed it for 6 months is pretty telling too

>>107407892
scared by the llama4 flop and abandoned their thing to tune DS? plausible

>>107407876
I suppose it depends what you want out of it.
If you want a drop-in replacement for nemo that will write whatever degenerate shit you want without question, then instruct is alright. It's about as pliable as nemo was, and a little bit smarter. But if you're looking for "gemini 2.5 flash or comparable at home", I'd say keep looking.

>>107407890
superintelligence labs won't be releasing any weights
/lmg/ status?
>>107407890
If meta ever recovers from llama4, it will be by doing exactly what Mistral did: just copying all the R1 papers for their large model with a slightly different selection of pretraining corpus.

>>107407941
chewing on a nothingburger

Added Magistral 3 14B
It wrote 7 paragraphs and then started repeating them.

Did they just posttrain DS without doing anything to the underlying architecture? Not even messing with the number of activated experts or the like?
How odd.

>>107407953
It's "Ministral".
Getting bad vibes from Dipsy 3.2 on lmarena. Very not good. Model not needed.
>>107407909
weren't there rumors they were attempting to distill DS? did they manage to fuck that up?
I should have loaded it at 64K context.
>>107407998
>>107407999
How does it normally tokenize nigger?

>>107407809
all reasoners at this parameter count will break on this kind of riddle prompt
just ran your prompt on gpt-oss 20b and it's still reasoning after outputting 10,000 tokens of fucking reasoning
the most cursed thing is that its reasoning had the right answer on the first line:
>We need to answer this riddle.
>The riddle: "What can go up a chimney down but can't go down a chimney up?" The classic riddle: The answer is an "umbrella" or "a piece of furniture"? Wait, let's recall. "What goes up a chimney down but can't go up a chimney down?" Let's parse.
but then it keeps going into infinite retardation after (excerpts):
>Another plausible answer: "Chimney sweeps do it with a broom (or a stiff brush). The brush goes up the chimney when it's being pushed down. The brush cannot go down the chimney when it's being pulled up because the bristles catch."
>Alternatively, maybe the answer is "Chimney Sweep's brush." Let's try to find a better explanation:
[...]
>Alternatively, the answer may be "a chimney sweep's rope" or "a ladder." Ladder:
[...]
smaller models are not capable of actually benefiting from CoT chains, it's all nonsense

>>107407989
I mean >>107405212
>Mistral's latest model was already a DeepSeek slop clone.

>>107408016
Smaller reasoning models were okay until the "wait" paper.
That's the part they can't handle. The recursive looping.
>>107407971
>>107408031
but papers are all le lie and never get implemented and amount to nothing

>>107407496
furries are the lynchpin of tech in general
from open source to three-letter agencies, you'll find furfags everywhere
if they all died tomorrow, we'd be set back 30 years

>>107408037
Good job, anon. Are you gonna try the others too?

>>107407943
People were saying that's what they should do when the war rooms were in the news.
If they didn't do it then, they're not going to do it now, with a dozen new people all trying to do things the way they did at their old jobs.

>>107408058
When goofs for large are out. I'm not downloading 700GB of bf16.

>>107408087
which copequants u gettin brah
Wait, Wait, Wait, Wait, Wait, Wait, Wait, Wait,
Has there been a model trained on the 4chizzel archive?
>>107408011
idk, but it keeps using y or h when I system prompt it, or capital R but not lowercase r when preceded by lowercase

>>107408090
Any copequant is good enough for cockbench.

>>107407496
I would never buy something as useless as a body pillow.

>>107408118
fascinating (real)
All this recursive reasoning is making it hot in here.
>>107408133
Not even as part of a project to make the body pillow useful?

>>107408118
The thing is that if nigger is a single token, it's gonna have a hard time generating it if you already provided whatever tokens it needs for "nigge". Same if it's tokenized like "nig" and "ger".
If you're on llama.cpp, you can use llama-tokenize to see how the entire word is tokenized. If it's "nig" and "ger", for example, just give it "ayo nig".
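The boundary effect described above can be shown with a toy greedy longest-match tokenizer (the vocabulary is invented; real BPE merges differ, and llama-tokenize on the actual GGUF is the way to see the real splits, but the mechanism is the same): generation appends whole tokens, so a word can only be completed from prefixes that land exactly on token boundaries.

```python
# Made-up vocabulary: "unhappiness" exists as one token AND as pieces.
VOCAB = ["unhappiness", "un", "happi", "ness",
         "h", "a", "p", "i", "n", "e", "s", "u"]

def tokenize(text):
    # Greedy longest-match: always take the longest vocab entry
    # that prefixes the remaining text.
    tokens = []
    while text:
        match = max((t for t in VOCAB if text.startswith(t)), key=len)
        tokens.append(match)
        text = text[len(match):]
    return tokens

print(tokenize("unhappiness"))  # whole word is a single token
print(tokenize("unhappine"))    # a mid-word prefix splits into pieces that
                                # are NOT a prefix of the single-token form
```

If the prompt already ends mid-word, the model can't emit the single-token form at all; it has to continue from whatever pieces the prefix tokenized into, which it may have rarely seen in training.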
>>107408154
based

>>107408154
>If you're on llama.cpp
>openrouter - moonshitai/kimi

>>107408139
This is the kind of shit you post on twitter. Fuck off again.

>>107408133
>buy
Lying by being overly specific. Very good.

>>107407449
haha, very funny. 4.6 of course.

>>107408133
gay sex haver

>>107408165
And you wouldn't be on shitter anymore, of course, after they made it reveal your location.

>>107408016
>>107408107
why are """reasoning""" models still wasting so many fucking tokens writing in full paragraphs instead of just doing something like this?
>Chain of Draft: Thinking Faster by Writing Less
>https://arxiv.org/abs/2502.18600
I hate """reasoning""" (models bloating their own context to slightly increase the chance of a correct answer), it is so obviously a hack, is false advertising, and doesn't fundamentally improve the model's capabilities.

>>107408164
Fuck. Didn't even read that. No tokenizer API either?

>>107408016
it only needs 2.4k

>>107408170
I never used it. But me not being on twitter would be a good reason for you to move your blogposting there.

>>107408037
For what it's worth, I'm getting worse-quality responses from 8-bit Ministral 14B GGUFs (chat completion in llama.cpp) than from the ones on the Mistral API.

>>107408200
sloth quants?

>>107408174
It's just there to make investors who don't entirely understand the technology think something else entirely is going on. It's not actually thinking. It's just writing out a CoT, which reinforces the connection between certain relevant concepts; I assume the paper you linked says just that. You could literally just summarize all of the relevant operative words and get the same shit. But investors want skynet.
>>107406631
Which quant is that?

>>107408205
Yes, Ministral-3-14B-Instruct-2512-UD-Q8_K_XL
latent reasoning is all you need
>>107408222
>Q8_XL

>>107408245
Some of it is in FP16, some in Q8_0 precision.

>>107408207
Investors want something good enough to replace office drones.
the new mistral stuff is great and all but where 4.6 air?
>>107408273
stop that

>>107408236
bitches is all you need

>>107408236
where the fuck are my coconut models?

>>107408283
i will, once i have my air

Vibe checking Large 3 for coom, and yeah, it's literally just Deepseek 3.1.
Like, they could be serving DS at an upcharge on their API and nobody would be able to tell.

>>107408294
it will take the breath away

>>107408300
Maybe if you keep screeching until the bump limit you'll become a real woman

>>107408300
>take deepseek 3.1 base model
>finetune it on synthetic data generated by deepseek 3.1
French taxpayers just spent millions of dollars to put Mistral's label on a Chinese model

>>107408330
Ok, Arthur

>>107408154
Interesting, it really does matter. It can complete from n or N, but not nig or NIG.

>>107408344
and they weren't even the first with the idea: https://huggingface.co/perplexity-ai/r1-1776
Does anyone else get unopened </think> in the middle of their GLM responses? 4.5, 4.6 and 4.5 Air ALL do it across quants from Q2 to Q8, and I don't even mention thinking in the prompt. Am I retarded or is GLM?
>>107408378
Is your template fucked?
I coomed
I cooded
>>107408378
never seen that in my life. almost definitely a template or sys prompt issue or something

>>107408388
Running with --jinja; from what I see in llama.cpp's output, it's the bog-standard chat_template.jinja from ZAI's repo
Found another problem with Ministral in RP. If there's a good stopping point that doesn't involve writing a wall of text it will always miss it. It will not, under any circumstances, write a short reply, even where it would make sense to do so.
>>107408437
Are there models that actually know when to write a brief response without specific low-depth instructions?

>>107408357
Yeah. That's why I asked. Different models tokenize differently. It also depends where in the sentence it is. Notice the spaces.

>>107408300
They are similarly sized, but they don't have the same configuration; they actually retrained it from scratch.
why does everyone want rp? I want an assistant concept. I'll never get it.
>>107408506
different people want different things

>>107408528
>replying to it

>>107408528
I don't think anybody wants anything except rp on lmg.

>>107407670
heh, good one.

>>107408539
Forgive me, anon. You're right, I should know better.

>>107408207
>>107408174
I tried adding their instructions, slightly modified, at the end of my very brief system prompt:
>WHEN you are thinking/reasoning, and only during thinking/reasoning, strictly follow these rules:
>Think step by step, but only keep a minimum draft for each thinking step, with 5 words at most. Do not add unnecessary verbiage, do not write in full sentences.
This seems to have a slight effect on the reasoning in GLM Air, but not much. It's still wasting a ton of tokens. I guess the reasoning block behavior has been baked in and can't be modified much with prompting?
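Worth noting: the Chain-of-Draft paper leans on few-shot examples rather than a bare instruction; the draft style is demonstrated, not described. An illustrative example in that style (my own, not from the paper verbatim):

```text
Q: A notebook and a pen cost $11 together. The notebook costs
   $10 more than the pen. How much is the pen?
A: pen + (pen + 10) = 11; 2*pen = 1; pen = 0.5; #### 0.5
```

If a model's RL training baked in long reasoning traces, a system-prompt nudge alone probably can't override that; a few in-context draft examples like the above should get closer to the paper's behavior.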
>>107405479
>used
Cuck

>mistral 3 large is retarded in comparison to even the smaller glm 4.6
welp, was nice knowing you mistral

>>107408632
Mistral has been washed up for like a year+ now. I don't think they've put out anything competitive since the llama days.

Ministral punches way above its weight; it's like a glm 4.6 air but smarter + more interesting prose. The frenchgods did it again. can't wait for the rocinante secret sauce finetune, this thing is going to be glm 5 at home before the real glm 5 even releases
yea its worse than their own previous releases even, it does not know my usual characters
>>107408703
Buy a fucking ad, Arthur

>>107408710
Good

>>107408710
Ministral-3-14B doesn't really know much about the Monster Girl Encyclopedia, while Large 3 has passable knowledge (although not great).
Both seem to have worse vision capabilities than Mistral Small 3.2, which is really strange.
I'm wondering if they've pruned their training datasets of even more copyrighted stuff.

see? all of those posts are about rp.
I'm not interested in rp

>>107408775
>Monster Girl Encyclopedia
encyclopedias, I like these. It would be nice to have an AI that could interject into a train of thought when brainstorming, from various resources.
Loeb would be great too.
Is auto-completion in ST smart? For example if we have a word "news" that is one token in our model, but we put "ne" and let it autocomplete (and let's say it finishes with "ws"), do we have two tokens ("ne" + "ws") in our context now or does it convert it into one token?
>>107407953
I love how it seems it won't write "cock" but then writes it in the next sentence.

>>107408854
I don't use ST, but I assume it's going to be turned into a single token next time you send the text for completion. If you see it reprocessing more of the prompt than you expect, that's why.

>>107408854
ST doesn't do the tokenization, that's on the backend.
All ST does is send the raw text.

>>107408854
Tokenization happens when the backend processes the input. The frontend is just there to let you put in some text.
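This is also why the prompt cache gets partially invalidated after an autocomplete: when the full text comes back, "ne" + "ws" retokenizes as the single "news" token, and the backend recomputes from the first position where the new token list diverges from the cached one. A sketch (token ids are invented):

```python
def common_prefix_len(old_tokens, new_tokens):
    # Count how many leading tokens match between the cached
    # prompt and the freshly tokenized one.
    n = 0
    for a, b in zip(old_tokens, new_tokens):
        if a != b:
            break
        n += 1
    return n

old = [101, 7, 42, 815, 990, 12]   # ..., "ne" tokenized alone at the end
new = [101, 7, 42, 815, 3502]      # "news" merged into one token on resend
keep = common_prefix_len(old, new)
print(f"cache reused for {keep} tokens, reprocessing from position {keep}")
```

So the context never actually holds "ne" + "ws" as two tokens after the next request; the cost is just the reprocessed tail.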
>trained with 41B active and 675B total parameters.
Is there hope to somehow extract and run the 41B alone?

>>107408980
That's not quite how it works. I don't think that's really possible.

>>107408980
This is bait, right? Hard to know at times.

>>107408990
Meta did that for Llama-Guard-4-12B
https://huggingface.co/meta-llama/Llama-Guard-4-12B
>We take the pre-trained Llama 4 Scout checkpoint, which consists of one shared dense expert and sixteen routed experts in each Mixture-of-Experts layer. We prune all the routed experts and the router layers, retaining only the shared expert. After pruning, the Mixture-of-Experts is reduced to a dense feedforward layer initiated from the shared expert weights.
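A toy sketch of what that quoted procedure means structurally (scalar functions stand in for FFN experts; the real thing keeps the trained shared-expert weights and, as Meta notes, needs further finetuning afterwards):

```python
# One shared expert that always runs, plus routed experts the router
# chooses between. All "experts" here are trivial scalar functions.
shared = lambda x: 2 * x
routed = [lambda x, k=k: x + k for k in range(4)]

def moe_forward(x, top_expert):
    # MoE layer: shared expert output + the routed expert the router picked.
    return shared(x) + routed[top_expert](x)

def pruned_forward(x):
    # After pruning: router and routed experts are gone,
    # the layer is just the (dense) shared expert.
    return shared(x)

print(moe_forward(3, top_expert=1))  # 2*3 + (3+1) = 10
print(pruned_forward(3))             # 2*3 = 6
```

The pruned output differs from the full MoE output on every input, which is why this gives you a cheap initialization for a smaller model, not a free 41B model with the 675B model's behavior.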
>>107408980
would it really be worth it to run mistral's sloppy thirds at this point?

>>107409004
>retaining only the shared expert
Yeah. It turned the model into a classifier. Is that what you want?
Besides that, only part of the 41B active params is the shared expert(s?).

>>107409018
14B is too small and 675B is gigantic; something in between would be nice.

>>107408236
jews hide in the latent space

>>107409055
i really wish they'd given us 25B+

>3.2 is shit
>3.2-speciale is a meme that spends 30 minutes thinking on proper hardware
>mistral large is shit
>glm confirmed to not have anything left to show for 2025 besides their poorfag model
>k2-thinking is the worst combination of 3.2 and 3.2-speciale
it's truly never been more over for local
80B to 130B would be nice
>>107409055
Removing experts is way more damaging than pruning; look how retarded Nemotron 49B was. Someone once tried disabling the experts of either Qwen or DeepSeek and leaving only 1, not zero but 1, and it couldn't write coherent sentences anymore and said it was in pain.

>>107409071
That's basically Mistral Small 3.2
Maybe 3.3, if they release a final update this month with their latest 2.5B vision encoder bolted on.

>>107407118
>believing french grifters in the first place
lollmao event. french
I'm testing this bitch on le chat, and it's not looking great. It's just really annoying out of the box, it's like a fucking bingo of worst shit we've already had for the past year, all neatly compressed into one package.
>>107409342>le chatdo you enjoy testing it high on about 3k tokens of system prompt
>>107409342>le chatuse Mistral locally, who the fuck would use a web interface to use Mistral???
>>107409360I made a clear agent on their ai studio and connected through that.
>>107409342
I don't think they have Mistral Large 3 on Le Chat yet.
>>107409388
they do
>>107409127
I like 3.2, the problem is getting any responses from OR. Last night it was nothing but constant timeouts.
>>107409406
Mistral AI Studio is a different thing from Le Chat, you don't have any user-facing model selection on the latter.
>>107407383
wow, people here really have shit taste
>>107409491
Does this help with multi-agent text adventures?
>>107409491
what's the difference, you can select ai studio models through le chat's agents tab
it's shit regardless
>>107409517
The only thing you're actually salty about is the fact that my father came home with the milk and yours didn't.
we need to steal this method if it's true
https://x.com/kimmonismus/status/1995883126632165760
>OpenAI’s new “Garlic” model delivers major pretraining breakthroughs, letting the company pack big-model knowledge into much smaller architectures
>>107409561
A new way of benchmaxxing?
>>107409561
It's another sparsity gimmick, isn't it?
>>107409573
Google will counter with it's new Onions architecture.
>>107409561
and you know what goes well with garlic? strawberries.
we are so back.
are people using grid maps for their local model text adventures?
>>107409624
You're either ignored or told to fuck off. I'm adding to the fuck off.
Fuck off.
>>107409636
Pretty sure they're a literal /g/ mod so they're never going anywhere, sadly. The moment they discovered /lmg/ it instantly went downhill. They don't even actually use AI for anything. They're literally just some seething psychopath that needs to strengthen their parasocial bond with their favorite tranny influencer by shitting up an AI thread.
>>107409636
sorry for not eating up your benchmaxx'd slop, pierre xi chong
you'll have to do better with your next releases
gpt5 mini is extremely good actually, but its lack of knowledge hurts it. If they can make it the same value but actually make it know as much as bigger models, that would indeed be a big deal. IF it's true.
>>107409635
Sorry sir, this is a blue board. We discuss computers and llm text adventures.
>>107409532
>it's shit regardless
When it was on OpenRouter I thought it was a much smaller model. I think this might be one issue: https://legal.mistral.ai/ (picrel)
>>107409697
Interesting: https://legal.mistral.ai/ai-governance/models/ministral-3-14b
>- Ministral 3 - 14B is the result of the shrinking, also known as pruning, of Mistral Small 3.1, released on March 17, 2025.
>- We use the term “shrinking” to refer to the process by which the parameters of a large, fully-sized model are progressively truncated to form smaller and smaller models.
>- As Mistral Small 3.1 was released before August 2, 2025, the Technical Documentation for this model will be available by August 2, 2027, in accordance with the EU Commission’s official guidelines on the matter.
The EU killed mistral. You can tell it lost knowledge it used to have from them having to remove all the copyrighted stuff.
>>107409659 (Me)
Oh and you should have seen the hilarious shit-fit they threw that time 4chan went down for a few days. /lmg/ migrated to an altchan with IDs so they couldn't samefag up the place; it was fucking hilarious how hard they shit their pants over it.
>>107409734
It has the same hidden dim but half the intermediate size of mistral small. I guess they messed with the mlp
>>107409636
TRVKE
>>107409734
>it's literally just a pruned Mistral Small 3.1 so they can use some loophole to get around EU regulations
fucking eurocucks, what a pathetic continent
>>107409127
>>glm confirmed to not have anything left to show for 2025 besides their poorfag model
you retard
glm 5 is coming
>invite character into MY home
>character describes THEIR home as they enter
RIP.
I take back everything nice I said about ministral. Model doesn't understand exclusive possession.
>>107409824
All of the new Ministral 3 models, according to the technical documentation, have been pruned from Mistral Small 3.1, apparently. I imagine they used methods similar to those from NVidia:
https://developer.nvidia.com/blog/mistral-nemo-minitron-8b-foundation-model-delivers-unparalleled-accuracy/
>>107409913
yeah like air 4.6...
>>107409960
trust the plan
>>107409922
use author notes and have a lorebook that describes your home. in a/n put 'current location: my home' where 'my home' is your trigger. replace as you switch locations.
>>107410088
such micromanagement really makes my dick shrivel.
>>107409851
I think that was just a cost-cutting trick. Mistral Large 3 was trained from scratch and they likely had to privately disclose the general contents of the training data and their copyright status to EU authorities.
https://legal.mistral.ai/ai-governance/models/mistral-large-3
>>107410139
skull issue
>>107410139
https://github.com/tomatoesahoy/director
bit easier since you can choose from dropdowns but you still need to fill in the data yourself. a/n or an addon to keep static data is pretty much needed for any longform rp.
Why doesn't SillyTavern support images natively? There's no way to have the model interact with the image directly, it always needs to be captioned first, and then the model interacts with the caption of the image, greatly reducing the precision and the depth of understanding.
>>107410211
Because servicetesnor is the hero we deserve and not the hero we need.
>>107410211
there is an option pretty sure, don't remember the name though, something like inline images
>>107410173
this looks pretty handy.
How do you go about listing your progress in, say, learning guitar, for the guitar teacher assistant to "remember" to ask you about?
So... realistically speaking.... if 4.6 didn't release this year this whole year would have been absolute dogshit? Yes or yes?
>>107410211
I thought the chat completion API had native support for images.
At least I've used that to send images to gemini in the past and I'm pretty sure that there was no captioning going on beforehand.
How would it even caption the image first if it didn't have the capability of sending the raw image to a model in the first place?
>>107410211
>>107410243
send inline images. it is only compatible with chat completion mode instead of text completion mode
>>107410248
yes
>>107410243
I think that requires chat completion instead of text completion. Btw whoever came up with chat completion should have died in a fire before he made it. Why make multiple standards for this shit? It is a good thing that a good model just figures this RETARDATION out.
>>107410211
you need to use chat completion for that
>>107410088
I'm just probing models at the moment.
Even at Q8, Qwen 3 Next instruct can't figure that out either; I just decided to test it for the first time, too.
(Holy shit now I see what people mean by not X but Y slop)
>char looks at you
>not to look
>not to at
>not to you
>but to look at you
>>107410247
i always keep a memories section in author notes
memories:
-i met up with a, b, c. we formed a party.
-we traveled to x city and took a quest.
-during the quest, i found y item.
-we slayed the dragon and took our loot back to the inn.
i'd write similar for guitar but the opening message would also contain your current level of guitar knowledge, such as if you know scales etc.
memories:
-{{user}} learned basic scales but could use some practice.
you'd have to have it in your mind what sort of progress you intend to make, or an outline to tell the card what to teach. from there a memories suggestion should suffice if you fill it in
while addons are helpful, your most powerful window in st outside of the card itself is author notes and learning to use it will make anything you do that much better.
>>107410289
qwen isn't a good rp model. which one are you using, what size? most small models suffer from repetitiveness and filling the context with useless fluff text. if you can run it, l3 70b tunes are still my favorite for rp
>>107410322
learning guitar, not RP shit
>>107410248
We've had DS R1, V3, Kimi. So no. GLM 4.6 may be the peak of usability though, it is the only one that can think without going off the rails.
>>107410322
>not knowing the scab
>>107410264
>>107410243
That's it, the option "inline images" is inside the chat completion mode. Sucks that it can't be used with text completion but oh well. Thank you for the pointers.
>>107410374
all tripfags are fags is all thats needed to know
>>107410431
It's Next, which is 80B-A3B.
So fully grasping possession probably requires more than 3B active no matter what you do.
>>107410431
dense models, without thinking, are the best for rp. i haven't tried the newer 'next' models myself yet though. nemo is probably still better
>>107410404
ah, chat completion doesn't even have repetition penalty. Nor min-p... Who would even use this? It's like trying to drive a gas car with a single-speed transmission.
>>107410469
>dense models, without thinking, are the best for rp
gramps is that you?
>>107409913
They already went back on that btw
On top of evidently delaying everything else listed here
It's so fucking over I want to cry
>>107410554
you can add it with custom settings
>>107410554
>chat completion doesn't even have repetition penalty. Nor min-p.
Don't use rep-pen man.
But you can add those as custom headers if you really want to.
What do you think about minimax m2? 230B, 10A, I tried it on openrouter and it seems to have no slop though it's a bit schizo but in a good way.
>>107410576
I used repetition penalty as an example of an old sampler that has been around forever. I was expecting to see it here too. I use DRY otherwise but that also isn't there. I guess customizing them manually in the "additional parameters" box is the best way to get around this limitation for now... So stupid.
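For llama.cpp's llama-server specifically, that workaround works because the OpenAI-style /v1/chat/completions endpoint also accepts llama.cpp's own sampling fields alongside the standard ones. A sketch of such a request body (field names as in llama.cpp's server docs; exact support may vary by version, and other backends may ignore or reject unknown fields):

```json
{
  "messages": [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello"}
  ],
  "temperature": 0.8,
  "min_p": 0.05,
  "repeat_penalty": 1.05,
  "dry_multiplier": 0.8,
  "dry_base": 1.75
}
```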
what the fuck
how did this go unnoticed?
https://huggingface.co/arcee-ai/Trinity-Mini
Mergekit man trained a foundational model. Released yesterday. (26BA3B, and there's a nano that's 6BA1B)
I assume from the Mini part that there's a larger model in the works, too.
>>107410591
all the shit they posted before was dogshit where they messed with tokenizers and brain-damaged models trying to do distills, don't have any hopes for this
you could use llama-guard in reverse mode, no? Keep cranking temp and regenning until the output hits "unacceptable"?
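The regen loop described above could be sketched like this; generate() and classify() are hypothetical stand-ins for your actual calls to the main model and to a Llama-Guard-style classifier (no real API is assumed here):

```python
def regen_until_unsafe(generate, classify, start_temp=1.0, step=0.1, max_tries=10):
    """Keep regenerating at increasing temperature until the safety
    classifier flags the output as "unsafe", then return that output."""
    temp = start_temp
    for _ in range(max_tries):
        text = generate(temperature=temp)
        if classify(text) == "unsafe":  # Llama-Guard-style verdict
            return text
        temp += step  # crank the temperature and try again
    return None  # gave up after max_tries
```

As the reply notes, a guard model is probably easy to trigger, so a tighter step and a cap on temperature would keep it from degenerating into word salad.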
>>107410591
>Recommended settings:
> temperature: 0.15
What?
>>107410591
>how did this go unnoticed?
Because it's not very good
>I assume from the Mini part that there's a larger model in the works, too.
https://www.arcee.ai/blog/the-trinity-manifesto
No need to assume, the large one is 420B with 13B active
>>107410616
Yes, but it's probably fairly easy to trigger it.
>>107410591
>a3b
>a1b
kek
>Mention any model not made by pajeets
>suddenly ESL tier grammar tooth gnashing picks up
Wow really makes you think.
>>107410683
hi bart
>>107410694
Hello sarrrs you have a mistaken identity.
>>107409734
This means they have pruned the insignificant bits... Don't think I'll even bother downloading this.
>>107410789
Yet the new 14B model outperforms 3.1 Small on a number of benchmarks.
>>107410819
>benchmarks
When will people learn?
>>107410580
It's very safe. Distilled from gpt-oss.
>llama-cli
I can have the processed prompt saved in a file, so it will be loaded next time, saving me a lot of time processing it again
>llama-server
I cannot save the processed prompt for further use.
WTF?
>>107410819
>>107410789
I was referring to potential censorship to accommodate the EU's draconian future. I'm sure they have pruned some of the "less important" (e.g. literary and rp related) weights.
At least the announcement reads like that.
>>107410841
It's better on WildBench and Arena Hard:
                        WildBench   Arena Hard
Ministral 3 14B         68.5        55.1
Mistral Small 3.2 24B   65.33       43.1
>>107410885
>koboldcpp
just werks
>>107410885
RTFM --slot-save-path and the /slots/ endpoint.
>>107410919
>RTFM
no
>ty, kind anon :P
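For reference, the slot save/restore flow looks roughly like this (endpoint shape per llama.cpp's server README; the host, port, slot id, and filename are example values):

```shell
# start the server with a directory for saved KV caches
llama-server -m model.gguf --slot-save-path ./slot-cache/

# after processing a prompt, save slot 0's KV cache to a file
curl -X POST "http://localhost:8080/slots/0?action=save" \
     -H "Content-Type: application/json" \
     -d '{"filename": "myprompt.bin"}'

# after a restart, restore it instead of reprocessing the prompt
curl -X POST "http://localhost:8080/slots/0?action=restore" \
     -H "Content-Type: application/json" \
     -d '{"filename": "myprompt.bin"}'
```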
>>107410348
>l3 70b tunes are still my favorite for rp
any in particular?
One of the things I hate the most about the benchmarks all these corpos post their results on is that I'm somehow expected to know what every single one of the 24 across four different categories correlates to, and then somehow imagine how that relates to a model's capability to write a not-shitty introductory scene
>>107410569
I got a gen for youu
Teto feet on my face
>>107410949
https://arxiv.org/abs/2406.04770
>We introduce WildBench, an automated evaluation framework designed to benchmark large language models (LLMs) using challenging, real-world user queries. WildBench consists of 1,024 tasks carefully selected from over one million human-chatbot conversation logs.
https://github.com/lmarena/arena-hard-auto
>Arena-Hard-Auto is an automatic evaluation tool for instruction-tuned LLMs. [...] V2.0 contains 500 fresh, challenging real-world user queries (open-ended software engineering problems, math questions, etc) and 250 creative writing queries sourced from Chatbot Arena. We employs automatic judges, GPT-4.1 and Gemini-2.5, as a cheaper and faster approximator to human preference.
>>107410591 (Me)
It's pretty dysfunctional in all aspects and a significant downgrade from any other A3B MoE I've played around with. In ST prompts it wants to <think> the reply and then repeat it if you prefill with {{char}}:, but sometimes it just thinks it but doesn't repeat it. And if you try to ban phrases it has a shit fit. And if you let it think properly it loops a lot, re-iterating previous messages constantly. As for normal prompting it's pretty mid. I asked it to write something out in a specific format and it failed horribly.
>>107410945
my older favorite is Strawberrylemonade 1.0. i recently got the newer 1.2 version which seems okay too so far. both do erp fine but you kinda have to beat the raunchiness into them if you want that. 70b is a good size for the model being aware of details, but l3 also wastes far fewer tokens describing sunlight and footstep sounds than other models; it moves the story forward. thats why i like it still.
>>107410977
That is informative and I appreciate it, but it doesn't mean ministral will be of any use to me just because it scores higher than msmall, which I already retired from my drives. I like to feed a model some setting information and character info, then ask it to kick off a chapter so I can continue writing it, and no benchmark apart from some of UGI's writing metrics has been any use singling out a useful model for that
>>107411019
Same anon, also worth noting in this >>107410977
>creative writing queries
>we employs
Not a good sign if the readme isn't even grammatically correct and is using llm-as-a-judge to verify results instead of reading them. Writer's block is a bitch and getting a decent start can give you a boost in momentum, even if you have to edit a mid gen from an llm. This however does not inspire confidence in the benchmark
>>107411019
I think those benchmarks just mean it's better than Mistral Small 3.2 at trick questions and common trivia that people might ask on LMArena or public chatbot websites (e.g. "What's a mesugaki?"). I would hope it's good enough for RAG-like usage considering it's reportedly been trained for 256k tokens of context.
>>107411122
I didn't really think of rag for a second there since I don't like how it's just semantic matching separate from context, but I can see that being valid. Bit skeptical of 256k context, I usually experience breakdowns around 10k minimum unless it's a hybrid arch, and they're typically a bit dumber than moes/dense models for some reason
>>107409573
using garlic to fight against gemini 3... are they calling google vampires?
>>107410909
ty
>>107409616
>with it's
ESL
>>107409734
those models are horrible at following instructions, wouldn't surprise me if all of them (8b and 3b too) are the product of various alchemical retardations like this
the mistral 3 series is one more drop of evidence that benchmarks are all bullshit; if they reflected real world performance these models would be at the bottom of all rankings.
>>107411202
yes they all are, the legal pages for them say as much
>>107411202
The small ones are disposable trash for benchmarks only. Just like gpt-oss, they don't expect anyone to actually use them. Large 3 is the only real model.
>>107410368
R1 was this year? I thought it was last year.
>>107411262
V3 was last year, R1 in january
>>107411238
I wonder how that mistral large 3 anon is doing, he's been doing anagram posting and constantly looking forward to it, and got monkey-paw'd into something you can't run at q4 unless you have 400GB of combined vram/ram
>>107411015
thanks, I'll try it out. holy kek at the merge history though, this thing is merged from like 20 different mixes. eg, here's one that it merges
they called it mistral large because of how big of a flop it is
>>107411349
I see my boi crestfall in there so it should be okay
Shame he disappeared, I liked his tunes even though he was working with the era of garbage "base" models that weren't actually base models
>>107411387
i didnt try the unleashed version, but its probably similar enough. merged models often gain some soul but become dumber too at actual tasks. they'll lose points in benchmarks but become better at rp. its been true since at least mythomax.
>>107411355
nothing compared to the llama flop
>>107411355
shit hasn't even been ggufed yet.
>>107411387
i specifically used the wayfarer one for a bit, thats by the ai dungeon guy. its trained specifically so you can fail, get captured, die. trying to beat out the positivity problem so many rp models have. and it was pretty good. but being aid/service, it knew nothing of erp even if you beat it out of it
need local speciale
how's the vibe coding going
>>107411518
I tried some of them but it's all mistral so it falls back into the things mistral annoys me about. Overtrained to the point of ignoring instructions, regens playing out the same way over and over, etc. I do think it's good at least someone is trying to beat back the now-common trend of bowing to anything the user says
>scenario involves watching TV with character
>model writes out a listicle of potential shows to watch
actually made me chuckle.
>>107411564
>Scene 1: reading Advanced CUDA development
>Scene 2: reading Basic CUDA development
>Scene 3: reading: Google, what is CUDA?
>qwen3 next
>ministral3
>trinity mini
I'm eating good. You can eat shit with your GLM hehe
>>107411626
I'm already burned out on them
Kimi Linear gguf support, when?
>>107411586
from the aid guy, wayfarer and nova are l3 70b, not mistral (same with my strawberrylemonade suggestion)
>>107411586
Worth noting another anon in the past has said "llms cant suck dick" and I took that as a litmus test to see how overtrained models are. Most models, regardless of whether you specify the character hates deepthroating or something, will go balls deep, cheeks hollowing, gluk gluk style shit. No variety at all.
>>107411292
>I wonder how that mistral large 3 anon is doing
I'm utterly demotivated, I'll take a break for a while. At least I got the "2512" part right...
>>107411661
yeah, you're right, but they do have smaller models that suck ass because of the base
though honestly llama 3.1 may not be a whole lot better apart from being able to have some spatial sense
>>107411349
la creatura...
>>107411684
Let's just hope some literal nobody makes a knockoff that is as good at half the resource cost. Mistral has been falling off; sorry that you got fucked over a model you were anticipating
>>107411586
>Overtrained to the point of ignoring instructions
it's even overtrained for markdown sloppa, to the point where, when told to translate verbatim text that was in PLAIN TEXT, it will still find ways to add tons of down syndrome italics and bold
god knows all LLMs have a strong markdown bias but that was the first time I saw a LLM do this when asked to translate plain text, even the most broken models from less loved chinese labs don't do this
>>107411723The fact I have to spend like three sentences to tell a model to just write plaintext drives me nuts and they still try to sneak either markdown or """smart punctuation""" in after I tell them specifically not to
>>107411684
Same here. I kind of knew we wouldn't get another dense 123B but having a deepseek clone with pixtral slapped on is the worst possible scenario...
They squashed the git history, probably to remove all the "rename deepseek to mistral" commits
>>107411712
So weird how people in this hobby are so eager to stick their dick into a mutt like that.
>wake up
>new model
>it's shit
>go back to drummer coomtune
Feels like I'm stuck in a time loop. Getting real hard to be enthused about local text AI anymore
why can't we have a dense 50B? or like a 130B A40B MoE?
>>107411879
Fuck off drummer
>>107411909
Because it would be a good model that anyone can run.
>>107411909
it costs less to train the huge moes than a 32b dense, qwen said so
They call it Mistral Large because when you see it, you turn a large amount of degrees and walk away
>>107412042>>107412042>>107412042
>>107411909
benchmemes took priority over actual quality, unfortunately
>>107411700
the reason i don't like mistral for rp is all their models, even the biggest, ramble too much. i don't care about heels clicking or light casting long shadows. all models have that slop but l3 has less of it and likes to move stuff forward, where mistral usually stalls but still writes 3 paragraphs of slop anyway
>>107411923
Exactly this. Between them not wanting competition for their paid services and not wanting the unwashed masses to have unrestricted access to information with a capable model that can make problematic connections, once they realized people were buying multiple GPUs to run the mid-sized models they immediately put an end to it.
>>107411940
MoE training cost is just the size of the active params. They're not going to do any more dense 32Bs either, which was always an arbitrary limit.
>>107410945
eva 0.0