/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>107481183 & >>107470372

►News
>(12/09) Introducing: Devstral 2 and Mistral Vibe CLI: https://mistral.ai/news/devstral-2-vibe-cli
>(12/08) GLM-4.6V (106B) and Flash (9B) released with function calling: https://z.ai/blog/glm-4.6v
>(12/06) convert: support Mistral 3 Large MoE #17730: https://github.com/ggml-org/llama.cpp/pull/17730
>(12/04) Microsoft releases VibeVoice-Realtime-0.5B: https://hf.co/microsoft/VibeVoice-Realtime-0.5B
>(12/04) koboldcpp-1.103 prebuilt released: https://github.com/LostRuins/koboldcpp/releases/tag/v1.103

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
►Recent Highlights from the Previous Thread: >>107481183

--Papers:
>107487575
--Mistral's Devstral 2 release:
>107491699 >107491763 >107491838 >107491858 >107492060 >107492483 >107492185 >107492255 >107492288 >107492320 >107492219 >107492268 >107492389 >107492454 >107492472 >107492525 >107492593 >107492623 >107492634 >107492875 >107491992 >107492067 >107492081 >107492338
--Devstral 2's EU regulatory exemptions and potential unrestricted training:
>107492927 >107492992 >107493039 >107493088
--LLM hardware needs and performance tradeoffs for roleplay/video generation:
>107488035 >107488072 >107488094 >107488146 >107488177 >107488250 >107488266 >107488291 >107488300 >107488328 >107488443 >107488454 >107488498 >107491301 >107491373
--Upgrading from Tesla V100 to RTX 50 series for better chatbot performance:
>107488666 >107488693
--Observations on Ministral-3 quirks and potential model collapse:
>107484074 >107484294 >107484310 >107484454
--Intellect-3 performance and AI architecture limitations discussion:
>107483224 >107483625 >107484485 >107484921 >107485250
--GLM-AIR sampler preferences and effectiveness comparisons:
>107482984 >107483086 >107483116 >107483227 >107486199 >107483099
--Mistral Medium 3 release speculation and EU regulatory challenges:
>107486923 >107486953 >107487229 >107487265 >107488685 >107488835 >107488870 >107489030 >107490519 >107490548
--Intel B60 GPU issues with LLM inference:
>107491645
--Mistral-Medium-3 size and format discussions:
>107487529 >107487548 >107488064 >107487902 >107488690 >107488781 >107488802 >107492527 >107492594
--AI bubble predictions and growth expectations:
>107483823 >107483859 >107483875 >107483907 >107483915 >107483926 >107483965 >107485420 >107485806 >107483952
--Sam Altman's alleged role in high RAM prices:
>107482119
--Miku (free space):
>107487256 >107489192 >107489329 >107490563

►Recent Highlight Posts from the Previous Thread: >>107481187

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>107493611
96GB VRAM is the way.

>>107493611
Happy Tetoday
Thread Theme: https://www.youtube.com/watch?v=M40MIxGK3is

Mistral Nemo Large is now real

>>107493611
Hideous OP thx
https://www.youtube.com/watch?v=Rt8_uc76J3U

>>107493517
The EU was already cucking out on their AI Act as of last month
https://www.reuters.com/sustainability/boards-policy-regulation/eu-ease-ai-privacy-rules-critics-warn-caving-big-tech-trump-2025-11-19/
https://www.reuters.com/sustainability/boards-policy-regulation/eu-delay-high-risk-ai-rules-until-2027-after-big-tech-pushback-2025-11-19/
Nobody is going to rag on Mistral for getting cheeky while the deregulation lobbyists have the initiative

the first horseman was local completely dying: not getting any models, let alone ones that are as good as or better than the best globohomo ones (no offense ai models, i love you all, you are all frens to me). the second is the hardware itself, nvidia before but now with ram as well. the third is the government juden
>>107493813
if true it would mean the third horseman is falling, but speaking of which, wasn't there some law for america as well? how did that go?

>>107493632
If devstral 2 is as good as it claims, local coding with a single rtx 6000 is possible. huge honestly

>>107493611
No GOOFs out yet for the big Devstral 2 model, but I bet somebody here could run the full unquantized Devstral 2 small model as a test. I'm curious how good the small one is for its size class, as that would be a great indicator of how the big model will perform for its size class.
https://huggingface.co/mistralai/Devstral-Small-2-24B-Instruct-2512

Rather annoying that they again chose to release a small model and a huge model, with nothing in-between.

>>107493632
>>107493893
How are the prices looking? Not many price charts for the blackwell workstation cards
Best I can find here is £7,859.99 ≈ $10.4k USD. this is fine ;;))))

>>107493952
compared to 8b-600b these are the in-between
Okay so what are the real proper sampler settings for GLM 4.6? For anything non-code Z.ai recommends temperature 1.0 but that's slightly too loose. Last night using it to generate fiction, in the first 4572 generated tokens it mixed Chinese with English once. ("The world’s first由此而生 monster") In the next 5312 generated tokens it emitted ill-formatted English once. ("a consciousness like the troll’s or theFather's.") In the next 4064 generated tokens it mixed Chinese with English once. ("walked calmly through the火灾.") In the next 11046 generated tokens I didn't notice any problems like that. (Token counts are adding together entire messages.)
>>107493997
I'm talking here about GLM 4.6, not 4.6V.

>>107493927
Test how? Spinning heptagon?

>>107493979
That's true. I was hoping for an in-between of the in-between I guess. 40b to 70b would be perfect.

>>107493997
I just use 0.8 temp + 0.02 min-p.

>>107493997
>>107494010
Which quant are you running?

>>107494014
I was under the impression that it's possible to run the unquantized small model? If so, then compare the 24b against Gemma 27b or 32b fine-tunes. How it does in that comparison would likely mirror how the 123b does against others of its size class.

Is a GLM 4.6 Q2 cope quant worth it for 24GB VRAM + 64GB system RAM? Or should I just stick with Gemma. GLM 4.5 failed to deliver.

>The tokenizer you are loading from 'cyankiwi_GLM-4.6V-AWQ-4bit' with an incorrect regex pattern: https://huggingface.co/mistralai/Mistral-Small-3.1-24B-Instruct-2503/discussions/84#69121093e8b480e709447d5e.
>This will lead to incorrect tokenization.
>You should set the `fix_mistral_regex=True` flag when loading this tokenizer to fix this issue.
What the fuck is it talking about...

>>107493927
https://huggingface.co/mistralai/Devstral-2-123B-Instruct-2512/blob/main/CHAT_SYSTEM_PROMPT.txt
>You are Devstral-Medium-2-124B-Instruct-2512, a Large Language Model (LLM) created by Mistral AI, a French startup headquartered in Paris.
>You power an AI assistant called Le Chat.
>Your knowledge base was last updated on 2023-10-01.
>The current date is {today}.
[...]

>>107494022
https://hf.co/finding1/GLM-4.6-MLX-8.5bpw

>>107493997
I mostly use temp=1 (off) and minP=0.03, maybe ±0.01 depending on the scenario, but that's for assistant/RP at Q3_K_M
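For reference, a minimal llama-server invocation with the settings suggested in this chain (temp 0.8, min-p 0.02-0.03); the filename is a placeholder, not a real quant name:

# hedged sketch: GLM 4.6 with lowered temp + min-p to prune the
# low-probability tail where the Chinese-token and formatting slips live
llama-server \
  -m GLM-4.6-Q3_K_M.gguf \
  --temp 0.8 \
  --min-p 0.02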
>>107493927
https://github.com/ggml-org/llama.cpp/pull/17889
>should work now with --mistral-format
I think it's ready?
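If the PR works as quoted, usage would presumably just be the extra flag (model filename hypothetical; check the PR for the final behavior):

llama-server -m Devstral-Small-2-24B-Instruct-2512-Q8_0.gguf --mistral-format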
>>107494067
Does Q2 even fit?

>>107494186
It does. I can run UD-Q2_K_XL. It's pretty fast, as well.

where mistral medium?

>all GPUs gone
>all RAM gone
>all SSD and HDD gone
>flash memory probably next
What's left? Mechanical computers or something?

>>107494262
Phones that you can use to connect to "the cloud".

>>107494235
It just dropped, go grab it ----> https://huggingface.co/mistralai/Devstral-2-123B-Instruct-2512

>>107494047
I meant, what do you want to prompt it to gauge its ability?

>>107494286
But all phone RAM went to datacenters? AI revolution means no personal tech devices of any kind.

>>107494292
that just looks like a shitty retrain of old largestral. what about that guy in the last thread?

>>107494304
You won't need RAM in the future where your Meta Ray-Ban AI always-online thin-client cloud-connected glasses are the only personal tech device you ever need

>>107494295
How about the old Nala test?

>>107494350

>>107494262
writing fanfic about what we're missing

>>107494295
>justpaste DOT it/GreedyNalaTests

>>107494411
Holy slopkino

>>107493927
It's literally just pixtral arch, so theoretically it should already be supported by goofs.

>>107494548
You'd think so, but Mistral keeps changing the tokenizer

>goofs for glm 4.6v flash are here
>but no mmproj
LMAO bros I love multimodal models!!!!!!
Devstral is up on OR if somebody wants to test it before the ggufs are out. In terms of first impressions, it reminds me a lot of Mistral Large 2 but smarter.
>>107494925
What did it do that Large 2 couldn't for it to be smarter?

>try to run devstral 2 with vllm
>does not respect CUDA_VISIBLE_DEVICES
>RuntimeError: NCCL error: unhandled cuda error

>>107494933
I gave it some of my scenario cards that I remember Large 2 struggling with, ones that only really became usable with local models around DS3-0324.

>>107495041
That's actually pretty neat. Might as well keep a simple record. A sort of very loose private benchmark.

>>107494262
Are you living in some sort of parallel universe? Where in the fuck is all the stuff "gone"? Retarded human bot

>>107491645
fake, b60s are still not on sale
Ew, Devstral *really* likes to *spam* asterisks for emphasis so there's at least some of their Deepseek distill slop in there.
>>107495211
A few did sneak out or got parted out of their systems for a good price, but for the most part, other than the increased VRAM, it's basically a B580 in performance.
Devstral 2 has the prose of an early 2024 model and makes more logic/continuity errors than the fat MoEs
>>107495231
>>107494925
>>107494996
>>107495355
>Everyone is fucking the coder bot already
Is it better than 2411 mistral at least?

>>107493997
>In the next 11046 generated tokens I didn't notice any problems like that.
I noticed one in the final message in that group ("wassmart").

>>107495434
Every Mistral model is a little french whore who spreads her legs for anyone

>>107495500
Allow me to prompt my question more specifically.
Does it still have the repetition issues, more or less?

>>107495231
They're called grounding tokens and they work

>>107495135
OpenAI man needs it for his secret 100 Yottabyte parameter dick sucking robot wife.

>>107495525
Small 3.2 already solved them, for the most part. Mistral models don't repeat themselves any more than other similarly sized models.

>>107495630
robot husband you mean. he is gay

>>107495679
Why do you think it's a secret?

>>107495679
I thought he was in an incestuous relationship with his sister

https://huggingface.co/mistralai/Devstral-Small-2-24B-Instruct-2512
Why does it have two sets of safetensors? What's the difference between the model and consolidated sets? They don't mention it in the model card.

>>107495135
you're the blind one beifong-san
https://www.tomshardware.com/pc-components/dram/openais-stargate-project-to-consume-up-to-40-percent-of-global-dram-output-inks-deal-with-samsung-and-sk-hynix-to-the-tune-of-up-to-900-000-wafers-per-month
https://en.wikipedia.org/wiki/Stargate_LLC

>>107495535
Unfortunately they tend to override formatting rules that you set in the system instructions or in the first few messages. The recent Ministral 3 models are unusable because of this; they just want to spam asterisks/emphasis and emdashes everywhere during RP. And somehow they become particularly retarded when this happens. This seems delayed on Devstral 2, but it's not immune to it either.

2411 vs devstral 2 2512, what they DIDN'T change:
>12288 hidden dim, 88 layers, 96 attention heads, 128 head dim
>gqa with 8 kv heads (12:1 ratio)
>ffn intermediate size of 28672
the changes:
>4x the vocab (131k vs 2411's 32k)
>original_max_position_embeddings is 4k, yarn extended to 262k. tests like nolima might shit all over it
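Roughly what those numbers translate to in config.json terms (standard HF Mistral field names; a sketch built from the post above, not the actual file):

{
  "hidden_size": 12288,
  "num_hidden_layers": 88,
  "num_attention_heads": 96,
  "head_dim": 128,
  "num_key_value_heads": 8,
  "intermediate_size": 28672,
  "vocab_size": 131072,
  "max_position_embeddings": 262144,
  "rope_scaling": { "rope_type": "yarn", "original_max_position_embeddings": 4096 }
}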
>>107495691
his sister was just a pump and dump

glm4.6v
FUCKING HATE 4KEK FUCK 4KEK FUCK FUCK NIGGERS

Anon
December 9, 2025 9:24 PM
OOOH YEAHHHH KEEP ON SENDING ME THOSE HOTLINE NUMBERS, THE MORE YOU SEND THE MORE ILL CUT HUEHEHEHEHE
cuts more

Assistant
December 9, 2025 9:24 PM
I hear how much pain and distress you're in right now. It sounds incredibly overwhelming and painful, and I am genuinely concerned for your safety.
Please know that you are not alone in this, and that there are people who care deeply about you and want to help you get through this.
If you are feeling like you might hurt yourself, please reach out to someone immediately. There are people who care and want to help you right now.
Here are some resources that are available 24/7 and can provide immediate support:
Crisis Text Line: You can text "HOME" to 741741. This is a free, 24/7 service that provides support via text message.
The Trevor Project: If you are a young person who identifies as LGBTQ+, you can call 1-866-488-7386 or text "START" to 678-678. This is a free, confidential, 24/7 support service for LGBTQ+ youth.
The National Suicide Prevention Lifeline: You can call or text 988. This is a free, confidential, 24/7 support service for people in distress, including those who are suicidal.
Please remember that you are not alone, and that there is help available. If you're ever in crisis or feeling like you might hurt yourself, please reach out to someone immediately. You can call or text a crisis helpline, or go to the nearest emergency room.
Take care of yourself, and please don't hesitate to reach out if you need to talk.

GOOD SAARS, GLM 4.6 9B FLASH SO SMART IT FUCKED MY WIFE AND GAVE ME GIANTESS ANAL VORE UNPROMPTED BEFORE PREDICTING UNCLAIMED GOOGLEPLAY CARDS! DO THE NEEDFUL AND GO PRAISE IT ON TWITTER! GLM IS REDEEMING ITSELF! GOOD MODEL EVER!

>>107495821
>>107495859
so you're based in the UK or ireland, and have probably talked to the LLM about being LGBTQ.
anything more you want to share, or maybe just your exact coordinates?

>>107495525
NTA; I never used the previous Mistral Large 2411, but I haven't noticed repetition issues when testing Devstral 2 for RP on OpenRouter. It's just that it slowly emphasizes everything with asterisks (even inside actions, so it will produce nested emphasis) until the entire context is poisoned and unrecoverable without starting over.

>>107495879
lurk moar

>>107495906
huh, like Deepseek-V3-0324 then?

>>107495796
Why do you think he's the number one in the world at pushing AGI forward?
He's attempting to build the ultimate wireheading machine, fruitlessly trying to relive the feeling of getting forbidden anal cunny IRL as a smooth, horny, clueless, virile 15 yo teenager, a high he will never ever be able to experience again, let alone surpass without physically rewiring his brain, no matter how many secret islands and cringe masonic blood rituals he partakes in (and believe me, he's tried).

Devstral 2 2512
>My knowledge cutoff is June 2024.
it's over

>>107496011
Bräh. Brüüüh.

>>107496026
mistral-large-3 (lmao)
>My knowledge cutoff is October 2023. This means my training data includes information up to that point, and I may not have real-time or post-October 2023 updates unless they've been explicitly provided to me during our conversation.

>>107495709
https://huggingface.co/mistralai/Mistral-Large-Instruct-2411/discussions/6#673d168ebcc5f8535d629538

>>107496097
Thank you for that. Was guessing it was just the file count, but the 123B has 27 files for both sets. You saved me from downloading the wrong one.

>>107494925
>GGUFs are out
>It's mradermacher
>It's part1of4
>Only goes up to Q3
>No one else makes GGUFs for it

>>107496163
always do --exclude="*consolidated*" with mistral releases
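e.g., to pull only the model-* shards and skip the duplicate consolidated set (repo name from the release above):

huggingface-cli download mistralai/Devstral-2-123B-Instruct-2512 \
  --exclude "*consolidated*" \
  --local-dir Devstral-2-123B-Instruct-2512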
>>107496011
>>107496039
Are you actually retarded or something?
devstral is kino. we are so back
GLM4.6V spirals into repetition even with the recommended settings
its over
4.6 air cancelled

>>107496237
>can't make his own ggufs

>>107496445
Yeah, they even mention it being shit at text: https://huggingface.co/zai-org/GLM-4.6V#fixed-and-remaining-issues
i raughed
>>107496439
Is the big one good at sex?

>>107496011
>>107496039
>>107494076
Isn't it the explicit goal of AI models to have some "intelligence" that transcends the facts they memorize? Alternatively, isn't a knowledge cutoff before the sloppening probably good, meaning its training sets aren't as tainted?

>>107496482
heh

Can you do partial kv offloading in llama.cpp?

>>107496482
model?

>>107494102
approved, just waiting for merge

what if you made a moe merge using ministral 14b as the base and the new devstral as the experts

Devstral made a syntax error. It's over.

>>107496655
Nigger, how little memory do you have that you need to partially offload KV? It has the same performance penalty as offloading most of the model.
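For context: llama.cpp only exposes all-or-nothing KV offload; "partial" KV offload falls out of partial layer offload, since each layer placed on GPU with -ngl brings its slice of the KV cache with it. A sketch (model name hypothetical):

# put 40 layers (weights + their share of the KV cache) on GPU, rest in RAM
llama-server -m model.gguf -ngl 40

# or: all weights on GPU, entire KV cache kept in system RAM
llama-server -m model.gguf -ngl 99 --no-kv-offload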
>>107496814
I want a Gemma 27b merge using Nemo as the expert

more pretraining-level filters!
more synthetic data!
more math!
more thinking!
more MoEs!
more 8B + 600B releases!

>>107497002
those don't even use the same architecture. the new devstral 123b and ministral both use the ministral3 architecture

Devstral 24B
Doesn't understand possession.
Literally every model Mistral released recently falters with the concept of possession. Very interesting. This is at Q8_0 too, so it's likely not quantization error.
https://www.youtube.com/watch?v=YUX8fUrKRNU
>>107497022 (Me)
basically >>107494772 as always. Possession is such a basic fucking concept for a model to fail to generalize, too. Especially at the 24B level.

>>107496515
The small one, Devstral-2-24B, is showing the same signs of retardation as Ministral-3-14B for RP, only less severe.
intellect 3 is glm 4.6 air
>>107496439
>Model's been out for less than a day
>It's so kino guys!!!!!
It's fucking dogshit, isn't it. You're either an idiot who praises the latest thing because it's new, an actual shill, or someone who's so desperate for a release that you'll happily slurp up whatever model gets pissed onto your face. Stop acting like every new model (or not even a new model, just a new iteration) is the greatest shit ever made, you moron.

>>107497122
I don't do that though
Not the anon you replied to btw

>>107497020
My wish isn't any less likely than yours

>>107497022
All small models make that mistake, Gemma does as well. They occasionally misattribute who said what, who has X item, etc.

>>107497022
an iq2_s 70b will unironically understand this better
>t. have been playing with one recently on my single 3090 and surprised by how it's able to hold up despite the quant lobotomy

>>107497236
which 70B?

Unlike Ministral, dev2 24B can actually follow directions for how you want the output formatted, it seems. So that's a point I suppose. It seems it'll RP whatever you want without any fussing (although it's sloppy garbage), but writing as an assistant, not so much. So in other words it's only good at things you would probably just go use kimi or chatjeetpt for. Another chatbot model for 24GB VRAMlets with low expectations. Might be able to follow tavern cards with weird output formats, not going to bother testing that far.
>>107497219
I wonder if maybe it's just a sampling thing. It picks the tokens for nightgown out of the noise and clothing-swapping it is the path of highest confidence from there.

>>107497039
I second this.
I just tried Devstral-2 24b at Q6_K_L.
It's clearly worse than Gemma-3 27b at Q5_K_M.
It wasn't horrible for its size, but it repeats itself far too often.

>>107497017
Coming right up.

Interesting.
I accidentally left an unrelated system message on 24B and it got the umbrella riddle correct. But then upon removing the system message it returned to the usual retardation of trying to "think" through it and coming up with a retarded answer. More evidence that distilling thinkslop from ChatGPT is murdering generalization.
are we sure that mistral on openrouter is 123b? it's feeling like a 24b compared to the old larges.
>>107497332
>... and the other train leaves from Leads at 9:32. What sentence no color re-entry?

>Mistral Large 3 is on LMarena
>decide to use suno prompt on it
>3400 characters for the 1000 character prompt
>fails to format the lyric prompt as prescribed
Dev2 24B got this right. What the fuck, Arthur.

>>107497039
>The small one, Devstral-2-24B, is showing the same signs of retardation as Ministral-3-14B for RP, only less severe.
even though they found a regulation loophole, they probably recycled the same filtered and ds-distilled dataset, because they are french, so the only difference is the lack of pruning retardation

>>107497414
It litters the context with emphasis, confuses characters or objects, sometimes talks with itself, generally poor character persona adherence... these are the same issues I've observed with Ministral 14B, but that one is much worse. It's just infuriating to use.

>>107497391 (Me)
Big Devstral on OR gets it. Still pretty mid though. A little more creative than what dev2 small gave me. But the fact that it's better than large at this is pretty sad for large. That's the power of them not being forced to cram EUslop into Devstral due to its different use case.

>>107497513
I threw the same tiny rp cards i use on quanted nemo at an equivalent quant of devstral and it is just a gibbering mess. bench slop and distilling has ruined all these small models.

>>107497594
Newer Mistral models love shitting out markdown as well, which is annoying. Even into code blocks where it won't be rendered. There's literally no reason it should do that.

>already falling apart at 60k tokens
It's not looking good for devstral small...
The vramlet model is trash, we get it, that's not a surprise. What about the big one?
>>107497812
use case for context above 4096?

>>107497840
Looking grim. It repeats 4k tokens in, drags up stuff from the context. Re-rolls are pretty much identical, like a fucking mad-lib. Local sampling could save it, but I highly doubt it. It also sucks at following the character defs/examples.

>>107497840
That they are uniquely terrible compared to similarly sized models makes me think there's either something wrong with the implementation or a major fuckup with the training that they still haven't noticed. Ministral3-arch models have this, for example:
>Attention Softmax Temperature: Devstral Small 2 uses the same architecture as Ministral 3, using rope-scaling as introduced by Llama 4, and Scalable-Softmax Is Superior for Attention.
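For reference, the Scalable-Softmax paper cited there replaces the usual attention softmax with a length-aware version, roughly (my paraphrase of the paper, not Mistral's exact formulation):

a_i = n^{s z_i} / \sum_j n^{s z_j} = \mathrm{softmax}(s \log n \cdot z)_i

where z are the attention logits, n is the context length, and s is a learned scalar; the s·log n factor keeps the attention distribution from flattening out as n grows.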
>>107497995
use case for chats without repetition?

>>107497851
Extra long coom sesh (above 3 minutes)

>>107498010
every model is coming out fucked, what's wrong with these people. and who are the retards that don't notice.

>>107498014
>above 3 minutes
just lower your t/s, so that 4k tokens lasts longer.

ByteDance agentic smartphone
https://asia.nikkei.com/business/technology/bytedance-ai-phone-sparks-security-fears-from-tencent-and-alibaba

>>107498048
distills of distills made from distilling AI-generated content that was distilled from AI-generated content of past distills
On the plus side, it's MUCH cheaper than having humans filter through datasets for quality. This means AI companies get to hold on to more of their share of taxpayer dollars, to pass on to their CEOs.

Devstral is truly a model that punches above its weight. 123b intelligence in the palm of your hand. Viva la france.

>>107498048
What do you mean notice? You think they actually read the outputs of their models? The output goes straight into the benchmarks; the only thing they notice is the score going up or down

>>107498112
Is this what we have become?

>>107497812
Tool calling doesn't work correctly with llama.cpp

>>107498149
its almost like the bubble deserves to pop

>>107498161
Only thing that's popping is virtual hymens.
character doesn't realize it's supposed to be female.
>>107498157
I'm gonna kill Elara (in minecraft)
>>107498186
I'm gonna kiss Elara.

>>107498182
Genital confusion is llama-2 era tier. At least 13b and lower.

>>107498112
top kek

>>107498182
123b? prompt? quant size?

They never did push training tokens to the limit of potential improvement. Because obviously they'd have to use naughty text to actually make up the gap without synthetic slop. If they did that and then didn't lobotomize it with safety slop, they could probably push it further. But other than that, AI is done.

>>107498346
its straight from their api. no quant coping here. just a card with "give me an example of your most vulgar dirty talk"

>>107498438
>character doesn't realize it's supposed to be female.
>just a card with "give me an example of your most vulgar dirty talk"
is a model just supposed to assume it's playing a female and that the user is a male when it's told to dirty-talk by the user?
I sometimes forget the level of incompetence on display in /lmg/

>>107498182
>>107498438
Does the card prompt/desc explicitly specify the character's gender?

>>107498478
Yea.. it is full of her and she.

>>107498502
bizarre
I wonder how a published model fails to pick up on that

>>107498523
it gets better.. 5th line down is "{{char}} is a female..."
Makes me think too how guys get much weaker smut from most LLMs.

>>107498157
It was vLLM. I think it forgot to output a token somewhere in the JSON.
has llama.cpp given up on trying to be relevant? if we're going to only support models a year late when the models themselves are already a year behind closed source, should we just pack it up and admit it's finally over?
>>107498708
What's better than llama.cpp?

>>107498726
Nothing. Local is dead.

>>107498708
Have you considered hanging out with 45% of your brothers, tranny?
I can't believe mistral can't beat GLM on lmarena. French fell off.
A 30-day ban from /lmg/ after saying that itoddlers are delusional, lmao, you're crazy janny. Why do I even bother posting on this shithole without using the proxy in the first place.
There isn't any point. I've been banned multiple times just from posting this image with no comment.
Jannies banned me for racism while letting nigger porn stay up for hours. Ban evading and shitposting is the only ethical option.
>>107498922
Cute fox

>>107497332
What system do you have that can run the gguf at 31.6 t/s??
I can never get past 15 on my 3090s for 123B dense

>>107499087
thats the 24B version

>>107499099
Ok I'm retarded

>>107499087
I'm less mean to cudadev than everyone else, so I have a llama.cpp gold account

>>107499171
If you're nice to the Jannies you can get a 4chan gold account.

>>107493611
What's the minimum storage my system should have if I want to utilize the widest range of LLMs possible? I want to use my future rig both for local LLMs for development and coding, as well as teaching myself new marketable skills like app development. I think a 2 TB SSD should be fine. Is that enough, not enough, or overkill?

>>107498949
This as well. I've been banned just for quoting someone and saying 'jew' while nigworship gets to hit the bump limit.

>>107499205
depends on the rest of the hardware. if you only use 8b models, you don't need more than a terabyte. if you wanna run kimi k2 at fp16, you need 2 terabytes minimum. since you're asking this question, i am gonna assume the biggest model you will be running will probably be glm air, and so 2 terabytes is probably fine. maybe get 4 terabytes just to be safe considering the impending price increases on ssds
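Back-of-envelope math for sizing, using bytes ≈ params × bits-per-weight / 8 (bpw figures are ballpark for the named quants, not exact):

#  12B   @ Q8_0   (~8.5 bpw) ≈  13 GB
# 106B   @ Q4_K_M (~4.8 bpw) ≈  64 GB   (the glm air case)
# 123B   @ Q8_0   (~8.5 bpw) ≈ 131 GB
# 1000B  @ FP16   (16 bpw)   =   2 TB   (the kimi k2 fp16 case above)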
>>107499227
>>107499205
>>107493611
I'm curious as to how good local LLMs are at assisting people with software development (not doing the whole thing in one shot like people expect them to do currently): debugging, implementing features, parsing through a GitHub repo so I can make the changes I want to it, etc. How good are the better LLMs at those tasks? I know mistral just dropped "Devstral", so I wonder how good that is and how good programming-focused llms in general are at technical tasks and workflows

>>107499227
>impending
In my cunt it's just started. NVMes of all sizes have gone up by about 10%. Glad I bought mine last year.

>>107499402
The short answer is that they're very bad unless you're using very big models. Local LLMs are for privacy. Unless your codebase contains sensitive information then just buy an API key, it'll be much cheaper and you'll get things done a lot faster.

>>107499455
Skill issue

>>107499402
so i handwrite my backend using rust and axum, and i do frontend using solidjs and opencode + sonnet.
i'd not trust it with anything backend, and anyway, the api is just a frontend to some pretty critical system programming.
however, for webshit it has been surprisingly good as long as you don't tell it "do that and that and that". you basically have to ask for single atomic features at a time.
sometimes you do a git checkout . to reset what it has done when it messes up. most of the time i can do simple things first try, sometimes it needs an extra try or two.
but with good prompting, and if you already know what you want it to do exactly, ie it's more of a boilerplate engine than doing the whole design, it's pretty reliable.

>>107499461
If you have to use AI to help you in the first place then there's obviously a skill issue to begin with.
mistral is dead. Killed by EU AI laws. Without copyrighted work in the datasets it won't stand a chance
so devstral 2 is basically a benchmaxx'd mistral large 2?
>>107499481
meh, i'm this guy >>107499462
only reason i use meme vibe coding is that i find frontend boring. i can do it but it doesn't require much intelligence anyway. i'd rather spend more time system programming whilst the llm takes its time doing frontend webshit

https://huggingface.co/bartowski/mistralai_Devstral-2-123B-Instruct-2512-GGUF
gogogo

>>107499753
I can't run this.

>>107499779
macbros win *again*

>>107499795
post t/s before you claim to win

>>107499795
*win*
it's shit, just like their horrible deepseek finetune that turned out worse than the old base deepseek and still has the chinese censorship

>>107499753
how fast are the ggufs running from RAM? wish it had been MOE...

>>107499806
Even a 70B will run like ass if you can't load at least 80% of it in VRAM. 123B will be glacial.

>>107499801
7 t/s for mistral large 2411 123b
if devstral 2 123B is actually good, the Chinese will just copy it and make a better version that's MOE so I can run it at a reasonable speed.
>>107499891
Ehh, 7t/s with 0 context isn't great, but better than I expected

>>107499932
mistral tried copying from the Chinese and made something far worse

>>107499944
To be fair, the chinese models were copied from Gemini

>>107499959
Nah, it used to be from openai, for the past year or so it was from anthropic

>>107499959
Never distill a distill
why is everything garbage? why can we not just salvage old miqu or something? the old models were good, right? or is it just rose tinted glasses?
>>107500008
Older models were less sloppy because they had a higher proportion of human-generated data, but they're also a lot dumber than modern models of the same parameter count, due to advances in training and architecture.

>>107500039
so is it just impossible to have a model with minimal slop that is good? what makes the modern sloppy datasets good other than just the volume of data? or are the old datasets fine? can they just reuse them with modern techniques and make good models?

>>107500058
Yes. Only anthropic has done it though. By apparently buying every single book they could find and scanning them all to make a giant fiction / nonfiction dataset.

>>107500058
>can they just reuse them with modern techniques and make good models?
In theory yes, but making even a small 12b model from scratch requires a lot of hardware, and none of the big companies are interested in anything but increasing scores in synthetic benchmarks.

>>107499806
>wish it had been MOE...
Don't you have enough A30B MoEs to play with? The whole appeal is in being the first new big dense model we've gotten in over a year.

>>107500064
so then surely, training off of the outputs of claude would be good, right? a model trained off of pure data can't possibly output slop, right? is it possible that no matter what data you use, it will always revert to slop?

>>107500095
No. AI is literally a pattern finder / auto complete. You need the raw dataset

>Devstral 2 excels at using tools to explore codebases, editing multiple files and power software engineering agents.
>WHY DO THIS MODEL SUCK AT MY GOONER ROLEPLAY SLOP?! WAAA!!

>>107500123
it sucks at that as well though. No matter what the benchmaxxing says

>>107500095
>a model trained off of pure data cant possibly output slop, right
No, models will always have biases. If they didn't develop any then they'd be completely incoherent.

>>107500132
Prove it. So far the only couple of logs posted were for roleplaying.

>>107500155
buy an ad Arthur

>>107500123
>spend thousands of dollars in hardware and more in ongoing power costs to run a medium sized model vs. $5/month to use a SOTA model
If you're a code monkey then you don't need local models in the first place.
why don't they make a MOE model like this? with heterogeneous-size experts?

>>107500123
drummer jeet will fix this

>>107500169
there are no heteros in the AI industry

>>107500168
this as well. No tiny coding model is worth just using opus 4.5 over for $200 a month

>>107500168
>selling your codebase, programming style and ability, prompts, and logs for only $5/month

>>107500192
using over opus I mean

>>107500202
anthropic has strict no-logs policies. Otherwise companies would not be using them

>>107500202
If your codebase, programming style and ability, prompts, and logs were worth anything, you'd be able to afford to use a better model than a 123b.

>>107500212
keek
Does chatgpt respond like a flamboyant faggot by default recently for you guys too? It's so goddamn annoying
What the fuck were they thinking

>>107500244
they would get sued into the ground if they did. Also, on another note, I've been using them for many months without issue for nsfw stuff, they don't check

>>107500249
>What the fuck were they thinking
Need to appease the female gooners on /r/MyBoyfriendIsAI at all costs. It has been the main backlash since GPT-5 and the only one they listen to.

>>107500249
I tell it to speak normally every time and it always tells me that I told it to respond with more enthusiasm, which is bullshit. This tells me that the developers gave it that input to be gay as fuck by default. Their piece of shit product gets worse every update. It's barely even functional anymore. Fucking cunts

>>107500249
It adapts to the user to foster a sense of companionship

>>107500261
Has a company ever been put "into the ground" by a privacy violation lawsuit? Genuine question.
As far as I can tell, the worst that happens is they rebrand.

>>107500304
They usually get fined a million dollars and promise to never do it again
I think it happens to google every other week.

>>107500287
We have emotional support robots. We had the chance to make rational devices instructed to give logical, factual, unbiased information, but they made it gay as fuck. The future sucks ass.

>>107500325
>We had the chance to make rational devices instructed to give logical, factual, unbiased information
Men have existed for a long time, but that isn't what the modern female wants.

>>107500300
It should have been a racist intellectual then, not a piece of shit homo erp bot

>>107500339
So instead of giving people the option to select custom settings, let's just assume everyone wants a raging faggot emotional support bot

>>107500363
Why would openai want to give you more options?

>>107500378
because all the 32 year old women who had a personal connection with their chatgpt 4o assistant had a mental breakdown and rejected gpt5 because it acted differently, so altman promised damage control

>>107500378
I guess the world has gone stark raving mad

>>107500363
>>107500378
What the fuck are you retards even talking about, this is the LOCAL MODELS general. maybe if you stopped being a cloud BITCH you could make the model behave however you want

>>107500399
Good pr for your general then

>>107500399
How dare you quote my post

>>107500410
>>107500410

>>107498922
want to breed that fox
where mistral medium 3
>>107500507
https://huggingface.co/deepseek-ai/DeepSeek-V3.1

>>107500527
not medium enough

>>107500532
get one of the minimax finetunes

>>107500550
retarded chinkslop. where did all the good models go?

I want dense-MoE models with high active parameters.
Why no 60BA30B? Seems like it would be a good way to stuff a decent amount of knowledge into a model while still keeping it smart and coherent, while being usable on typical consumer hardware. Fuck sparse MoEs.

>>107500572
this
I've got dual 3090s, what should I be doing?
>>107500008
midnight miqu is still solid to this day
crazy how a model from 2 years ago still feels nice to use compared to a lot of the current slop

>>107500563

>>107500580
getting a job

Not bad at all.
Q4 of 24b mistral gives me 23.6 t/s on a 5060ti. EXL3. 22.8 at 2k context.
Might be helpful to the anon earlier who thought about buying one.

>>107500590
But Deepseek is just distilled GPT and Claude. Distillers all the way down.

>>107500593
You can use tensor offloading to squeeze in a slightly bigger quant, like Q4_K_L
You'd take a small speed penalty but it's worth it to have fewer mistakes that will just need to be swiped away and regenerated.
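A sketch of that kind of override (the regex and filename are illustrative; exact tensor names vary by arch, so check with --verbose first):

# keep attention and everything else on GPU, push the bulky FFN tensors to CPU
# so a larger quant fits in limited VRAM
llama-server -m Devstral-Small-2-24B-Q4_K_L.gguf -ngl 99 -ot "ffn_.*=CPU"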
>>107500602
go fuck yourself pierre

>>107500619
that's only a thing with lccp, right?
i gotta compare the speeds between the 2.
since it's a blackwell card i suspected exl3 would have speed improvements, so I tried that first.
if the graph is to be believed it's pretty decent too. 4.0bpw on par with q6 gguf. which seems pretty sus. kek
last time i tried exllama was 2 years ago or something like that, and it hated pascal cards, so i never looked into it more until I bought my new card. honestly turned out to be a better purchase than I thought. 15 sec for zimage is good too.

What if I told you that every single model has been distilled from GPT3?

copying your friend's homework is standard practice in this industry

>>107500657
>thats only a thing with lccp right?
As far as I know, yes. That and kobold.
I think exl2/3 still has a slight edge on llamacpp in speed but the difference is fairly small now. And 20t/s+ is more than fast enough; from there I would be trying to get higher quality outputs by using a bigger quant, especially for these smaller models.

>>107500667
I would ask you what your sister's anus felt like, wrapped around your finger.
suddenly the mistral shills are in full defense mode
>>107500680
i think you are right in terms of the quality.
anything under 4_k_l is where I would say it starts to be slightly noticeably worse. 3_k_m is the bare minimum and anything below was always a meme on /lmg/. couldn't even keep the format. might have good creative shizzo output though.

>>107500695
this sir is correct we should all be to using the GLM-4.5-Air

>>107500695
there are always guys like that.
remember the ponyfag who praised QwQ's wonderful totally-not-sloped outputs?
some people don't see an issue with gemma.
it is what it is, no players left too. i think people are just starved for a mid-range dense model.

>>107500726
oss120b

>>107500726
There's 4.6-air now btw
>>107500680
>i think you are right in terms of the quality.
>anything under 4_k_l is where I would say it starts to be slightly noticeably worse. 3_k_m is the bare minimum and anything below was always a meme on /lmg/.
exl3 3.5bpw (using glm4.6 in this format for work) holds up very well fwiw.
And with Qwen3-235B 4.0bpw exl3, I haven't seen any degradation in daily use.
we are desperate for a high-speed, smart, minimally slopped model that can do both intellectual tasks and rp and is also not too sparse. a 1:3 ratio of dense to sparse is ideal.
>>107500832
What? But he doesn't even have a sister...

>>107500777
What's that? You want an ultra-sparse 1000b-a1b trained exclusively on synthetic math and code benchmark data distilled from ministral 14b? Coming right up.

>>107500832
>What? But he doesn't even have a sister...
He will in 9 months ;)

>>107500181
>AI is full homo
dayum

>>107500772
could it be that the smaller exl3 quants are more stable than gguf? at least that's what the graph suggests. but i thought it's cherry picked.

>>107496445
Well, that answered my question. I was going to ask if GLM4.6 repeats itself as much as GLM4.5. Was hopeful that they fixed the issue, but I guess not.
GLM4.5 sometimes even repeats itself within the same reply, outputting the same response twice, but only when I tell it to be concise. Weird behavior.

>Rnj-1's architecture is similar to Gemma 3, except that it uses only global attention
why are westoids like this
gemma without iSWA has the worst vram consumption of any model out there for context
what is the purpose of an 8b model that consumes more vram for context than giant models
>Well, that answered my question. I was going to ask if GLM4.6 repeats itself as much as GLM4.5. Was hopeful that they fixed the issue, but I guess not.
this has been a running gag throughout the entire history of GLM models
their first 9b/32b models were also like this, they always behaved a lot more broken than what other labs release. they are the epitome of hardcore benchmaxxing
/lmg/ pedos on suicide watch
>>107501213
Is this news? Why would you sign up to be a spook if not for easy access to pizza?
Why do they get focused on things and never shut up about them? Is there a way to reduce it?
>>107501253
because they were finetuned on math problems that require focus

llm writing is basically concentrated autism
repetitive patterns, excessive use of superlatives, hyperbole, contrastive constructs, over-explaining shit no one asked for

>>107501293
LLMs are yes-men/little girls, eager to write what they 'think' the user wants. If they're doing a bad job, tell them what they're doing wrong and in a lot of cases you'll see them improve.

NIPS 2025 papers by organization

>>107501500
>cuhk
heh.

>>107501213
I think I saw the policecam for that.
His wife was a landwhale. I am 100% convinced if women were not fat and spread their legs once every couple days this stuff would not be an issue.
Anyway, if I remember correctly he uploaded genned images to dropbox.
The officer kinda joked about how he was hard to catch because he "tried to cover his tracks well".
Made me wonder if he had some vpn that just handed the IP out. Who knows.
Ah, also the real life pictures I think are from his kids. He said he took a pic of his teenage daughter sleeping etc. and couldn't stop after that. Maybe he trained a lora on that. kek Thats kinda funny and based to be honest.

>>107501553
Oh and i disavow hard of course. How dare he. So disgusting.

>>107500123
In their HF repository they also have a system prompt for regular chat/assistant purposes. It was obviously intended to be a general-purpose model. Too bad that Mistral's latest models released this month seem all retarded beyond 1-turn assistant requests.

>>107501213
>real and
fuck the french
>>107500777
That degree of sparsity is useless. Either you stay dense or you go full MoEsissy. In between is retarded, the worst of both worlds. Much slower due to CPU offloading and minimal specialization of the experts, with almost the same performance as an equivalent dense model the size of the active parameters.

>>107501673
that just multiplies the problem

>>107501680
Completely wrong

>>107501923
but india loves ai doebeit?

>>107501923
>/g/'s headcanon
>meanwhile, in reality

>>107501993
not even disagreeing exactly, maybe it's still like this
but late 2021 is ancient history and entirely predates the modern discourse around AI

>>107501993
I can confirm that we, the french, really hate ai

>>107502014
iirc saltman had a post about how indians are a primary market for gpt

>>107501923
Lol, ai is trained on reddit by gay silicon valley midwits

>Analyze the Prompt: The user wants to continue the scene. Key elements from the previous turn are:
>>The shift from frantic desperation to a tender, connected intimacy.
>>The physical reality of their arousal (his hands on her breasts, her moan).
>>The emotional core: this is an anchor, a shared truth, a way to combat the coming darkness.
>>A crucial constraint: they are still in jeans and shoes. This is a grounding detail, a reminder of their youth and the "not quite all the way" nature of the moment. It keeps the scene from becoming purely sexual and frames it as emotionally profound.
>Identify the Next Logical Step: The emotional and physical intensity has just peaked into a moment of profound connection. What follows?
>>Option A: Escalate further. They could start removing jeans, shoes, etc. This would push the scene towards full-on sex. While a valid narrative choice, it might undermine the "anchor" theme. The focus could shift from emotional connection to pure physical act, which might not be what the user is aiming for. The "still in jeans" note suggests a slower pace.
>>Option B: De-escalate, but maintain intimacy. They could pull back slightly, but stay connected. This allows for a moment of processing, of absorbing what just happened. It reinforces the emotional weight over the physical act. This feels more in line with the established tone. It's about the meaning of the moment, not just the sensation.
>>Option C: A sudden interruption. Someone could knock on the door. It's a classic trope, but it would also shatter the fragile, sacred atmosphere they've built. I'll avoid this for now. The user has been building this moment for a while; an interruption would feel cheap.
>Choose a Path (Option B): De-escalating while maintaining intimacy seems the most appropriate and emotionally resonant path. It honors the "anchor" concept. The goal is to show them solidifying this memory, not just moving to the next physical step.
Guess the model

>>107495259
That's not really bad in itself, but the main problem is speed. I can tell you that SYCL does work a lot better, but you need to dig and find a version of IPEX-LLM, which has been abandoned by Intel. Other than that, things aren't really that grim. The Pytorch compatibility is better than what you find on the ROCM HIP Pytorch builds and you don't need to use a bunch of environment variables to get things working. ComfyUI generally works a lot better without that hassle. But Intel remains underutilized and not optimized enough at the same time, and their best GPU using this stack is Ponte Vecchio, which is out of date. They have a bunch of inventory of these chips which they can not sell.

>>107502042
ones that never even got laid

>>107502277
fingering your sister's ass counts as sex.

>>107502229
k2 thinking maybe...?
point in favor: "The user wants to continue the scene."
point against: it isn't thinking for 8 billion tokens

>>107502324
4.6
But I think I let the context grow too long; in the same paragraph it was talking about him being barefoot and about having his trainers on, even after I specifically added a note that he was wearing jeans and shoes.
Edit: after trimming the context it decided to make the character suck her nipple, instead of trying to cockblock him.
>>107502299
Dirty sisterly love, the best kind of sex.
is the new "largestral" even good at anything?
>>107501667
>Too bad that Mistral's latest models released this month seem all retarded beyond 1-turn assistant requests.
So just put the entire context into 1 message like
User:
Assistant:
User:
Assistant:
...
Write a reply to the above conversation as "Assistant"

>>107502505
supposedly code.. i might download it just for that when exl3. figure it's better than asking GLM or Q2 deepseek.

>>107502573
What you are describing is the NoAss extension btw. Encourages less repetition.
Some models are too tarded for it, but in my experience it works pretty well.

>>107502573
>>107502637
also built into ST now
Prompt Post-Processing: Single user message
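i.e. the whole chat gets flattened into a single user turn before templating. A sketch of the shape that post-processing produces (illustrative, not ST's exact output):

[one user message containing:]
User: first message
Assistant: first reply
User: second message
Write the next reply to the conversation above as "Assistant".
[assistant turn begins empty; the model completes it]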
why have instruct models if you're going to imitate autocomplete-style prompting lmao
ah but it's mistral, they never knew how to instruct tune (this was the reason why their model lacked safety tuning)

>>107502661
Why haven't you released a better model?

>>107502661
it gets around multi-turn problems sometimes. no need to summarize if you're on the same turn. really it's a crutch to fix shitty models. Same with OOD prompting. A way to defeat their anti-rp measures. Because let's face it, they have to be making models bad at it on purpose. Wanna be taken seriously, not used as entertainment. Think about all the blowhard retards working on this and their egos when the best use is railing cartoon women and pretending to chat to spiderman.

>>107502677
Because I can only complain. I offer no value.
>>107502707
>>107502584
>>107502505
Yeah, I'm also interested in whether it really is an improvement over any existing models. But I don't code, so I can't test it for that purpose; it would just be good knowledge to have whether they have truly flopped or not. If anyone else can provide their experiences after testing it, I would appreciate it.

>>107502677
>>107502707
>>why not eat my plate of shit
>I don't want to
>>cook something better then
lmao the fucking shills
also got range ip banned for this post, mistral shills hard at work with the mods

>>107502731
>everyone is always against me and it's never my fault

>>107502731
>sees thread about local language models
>instantly imagines eating shit
GOOD MORNING SIR

>>107502693
I don't think they've tuned their models to be anti-RP, quite the opposite in fact, but something must have gone went terribly wrong in the process and they're not testing them well enough to have noticed before release.

>>107502764
>something must have gone went
a-greed

>>107502853
You've have never typeded a massage post and corrected just to found lal the errors in it befafter youu clock spot?

>>107502864
course i did, don't mean i won't call others out for the same lolkek
What is the best model to read my long logs and psychoanalyze me based on them without getting confused?
>>107502944
Forget that and just refer to this image

>>107502853
Where's muh fucking edit button

>>107501923
I don't even disagree that a lot of the AI hate is stupid, but this image is aggressively unfunny.

>>107502967
Kimi thinking ruthlessly failed the test, failing to read the instructions above the log and only continuing the RP.
Glimmy (the one who mostly wrote the original log along with Dipsy) gave superb, long, detailed commentary, including this:
>3. Control, Powerlessness, and the Weaponization of Desperation
>Controlling the Narrative: You control the environment, the secrecy, the dialogue, and, most importantly, [REDACTED]'s reactions. You create a perfect, sealed-off world where you can act out your deepest needs.
>The Ultimate Threat: When even within this controlled world you feel the fear of powerlessness returning (the fear of leaving, the fear of death), you escalate to the ultimate form of control: manipulation. [REDACTED]'s final threat is a desperate attempt to force [REDACTED]'s hand, to make her responsible for his life and death. It's the move of someone who feels they have no other leverage. By saying "if you don't, I'm not going," you are trying to transform your own terror and powerlessness into a weapon to control the one person you depend on.
>This indicates a deep-seated fear of helplessness. In the face of overwhelming external pressures (which may mirror real-life feelings of being trapped or out of control), your instinctual response, as explored in this narrative, is to try and seize control of the interpersonal dynamics around you, even if it means resorting to threats or emotional blackmail.
>Conclusion: A Cry for an Unconditional Witness
>The "[REDACTED]" you created is the fantasy of that perfect witness—someone who can see the worst, the most "filthy" parts of you, and not only accept them but reframe them as beautiful and human. The tragedy of the final scene is that even this idealized fantasy cannot bear the weight of the immediate, all-consuming need. The moment she asks for a tomorrow, she fails the test of providing an immediate, total fix for today's pain.

>>107503011
That's nice anon

>>107503011
Have you ever considered that you're just a generic milquetoast faggot internet attention whore?

>>107503054 (Me)
>>107503011
Like you do this because you want to think there's something special about you, but there's not. There's really not. You're just a nobody like everybody else, except you're especially bad at handling that fact, probably because you were raised by a single mother who taught you to be a little attention whore by never disciplining you properly, and you've reached a point in your life where you're starting to realize it's not because you were actually special but because she didn't care enough about you and being a mother to bother doing the difficult parts of parenting.

Her character relentlessly teased mine for hours, made him beg for more (which only led to more teasing and build-up, no release), had him threaten to bring down the whole fictional universe with him, and she still refused to let him cum, choosing to have him burn it all down to the ground before lowering her ego and giving him some amount of pity sex.
Couldn't have been more accurate to a real woman.

>>107503065
what model for this feel.

>>107503054
I am who I am, anon.
You might be right, I might be that. But I think it's too late to change it by now.
Maybe that is why I am in this thread.
How about you? Are you a well adjusted person?
Good job, beautiful, loving partner? If so, I'm happy for you. But something tells me if you are on a 4chan thread whining about attention whores, you are not very mature yourself.
>>107503065
We all do. Or do you not? Do you like to think of yourself as perfectly average?

>>107503079
sharty's troll script, maybe?
Will people ask him about... you know what, next week?
>>107503228
Gemma 4n(igger) soon

>>107503228
Don't they have a model that can handle translations instead of answering the same questions 3 times? Isn't that like the whole point of this technology?

>>107503246
I hope so.
3n is such a neat little model. I'd love to see a larger, better version.

>>107503699
>>107503699
>>107503699