/g/ - Technology


/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>107481183 & >>107470372

►News
>(12/09) Introducing: Devstral 2 and Mistral Vibe CLI: https://mistral.ai/news/devstral-2-vibe-cli
>(12/08) GLM-4.6V (106B) and Flash (9B) released with function calling: https://z.ai/blog/glm-4.6v
>(12/06) convert: support Mistral 3 Large MoE #17730: https://github.com/ggml-org/llama.cpp/pull/17730
>(12/04) Microsoft releases VibeVoice-Realtime-0.5B: https://hf.co/microsoft/VibeVoice-Realtime-0.5B
>(12/04) koboldcpp-1.103 prebuilt released: https://github.com/LostRuins/koboldcpp/releases/tag/v1.103

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
>>
File: teto.png (148 KB, 348x395)
►Recent Highlights from the Previous Thread: >>107481183

--Papers:
>107487575
--Mistral's Devstral 2 release:
>107491699 >107491763 >107491838 >107491858 >107492060 >107492483 >107492185 >107492255 >107492288 >107492320 >107492219 >107492268 >107492389 >107492454 >107492472 >107492525 >107492593 >107492623 >107492634 >107492875 >107491992 >107492067 >107492081 >107492338
--Devstral 2's EU regulatory exemptions and potential unrestricted training:
>107492927 >107492992 >107493039 >107493088
--LLM hardware needs and performance tradeoffs for roleplay/video generation:
>107488035 >107488072 >107488094 >107488146 >107488177 >107488250 >107488266 >107488291 >107488300 >107488328 >107488443 >107488454 >107488498 >107491301 >107491373
--Upgrading from Tesla V100 to RTX 50 series for better chatbot performance:
>107488666 >107488693
--Observations on Ministral-3 quirks and potential model collapse:
>107484074 >107484294 >107484310 >107484454
--Intellect-3 performance and AI architecture limitations discussion:
>107483224 >107483625 >107484485 >107484921 >107485250
--GLM-AIR sampler preferences and effectiveness comparisons:
>107482984 >107483086 >107483116 >107483227 >107486199 >107483099
--Mistral Medium 3 release speculation and EU regulatory challenges:
>107486923 >107486953 >107487229 >107487265 >107488685 >107488835 >107488870 >107489030 >107490519 >107490548
--Intel B60 GPU issues with LLM inference:
>107491645
--Mistral-Medium-3 size and format discussions:
>107487529 >107487548 >107488064 >107487902 >107488690 >107488781 >107488802 >107492527 >107492594
--AI bubble predictions and growth expectations:
>107483823 >107483859 >107483875 >107483907 >107483915 >107483926 >107483965 >107485420 >107485806 >107483952
--Sam Altman's alleged role in high RAM prices:
>107482119
--Miku (free space):
>107487256 >107489192 >107489329 >107490563

►Recent Highlight Posts from the Previous Thread: >>107481187

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>
>>107493611
96GB VRAM is the way.
>>
>>107493611
Happy Tetoday
Thread Theme: https://www.youtube.com/watch?v=M40MIxGK3is
>>
Mistral Nemo Large is now real
>>
>>107493611
Hideous OP thx
https://www.youtube.com/watch?v=Rt8_uc76J3U
>>
>>107493517
The EU was already cucking out on their AI Act as of last month
https://www.reuters.com/sustainability/boards-policy-regulation/eu-ease-ai-privacy-rules-critics-warn-caving-big-tech-trump-2025-11-19/
https://www.reuters.com/sustainability/boards-policy-regulation/eu-delay-high-risk-ai-rules-until-2027-after-big-tech-pushback-2025-11-19/
Nobody is going to rag on Mistral for getting cheeky while the deregulation lobbyists have the initiative
>>
the first horseman was local completely dying, not getting any models, let alone ones as good as or better than the best globohomo ones (no offense ai models, i love you all, you are all frens to me). the second is the hardware itself, nvidia before but now with ram as well. the third is the government juden
>>107493813
if true it would mean the third horseman is falling. but speaking of which, wasn't there some law for america as well? how did that go?
>>
>>107493632
If devstral 2 is as good as it claims, local coding with a single rtx 6000 is possible. huge honestly
>>
>>107493611
No GOOFs out yet for the big Devstral 2 model, but I bet somebody here could run the full unquantized Devstral 2 small model as a test. I'm curious how good the small one is for its size class, as that would be a great indicator of how the big model will perform for its size class.

https://huggingface.co/mistralai/Devstral-Small-2-24B-Instruct-2512
>>
Rather annoying that they again chose to release a small model and a huge model, with nothing in-between.
>>
>>107493632
>>107493893
How are the prices looking? Not many price charts for blackwell workstation cards
Best I can find here is £7,859.99 = $10.4k USD, this is fine ;;))))
>>
>>107493952
compared to 8b-600b these are the in-between
>>
Okay so what are the real proper sampler settings for GLM 4.6? For anything non-code Z.ai recommends temperature 1.0 but that's slightly too loose. Last night using it to generate fiction, in the first 4572 generated tokens it mixed Chinese with English once. ("The world’s first由此而生 monster") In the next 5312 generated tokens it emitted ill-formatted English once. ("a consciousness like the troll’s or theFather's.") In the next 4064 generated tokens it mixed Chinese with English once. ("walked calmly through the火灾.") In the next 11046 generated tokens I didn't notice any problems like that. (Token counts are adding together entire messages.)
>>
>>107493997
I'm talking here about GLM 4.6 not 4.6V.
>>
>>107493927
Test how? Spinning herptagon?
>>
>>107493979
That's true. I was hoping for an in-between of the in-between I guess. 40b to 70b would be perfect.
>>
>>107493997
I just use 0.8 temp + 0.02 min-p.
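if anyone wants to sanity-check sampler settings like these, here's a minimal sketch against llama.cpp's native /completion endpoint (assumes a local llama-server on the default port):
[code]
import requests

# minimal sketch: 0.8 temp + 0.02 min-p against a local llama-server
resp = requests.post(
    "http://localhost:8080/completion",
    json={
        "prompt": "Once upon a time",
        "n_predict": 128,
        "temperature": 0.8,  # tighter than z.ai's recommended 1.0
        "min_p": 0.02,       # prunes the low-probability tokens behind the chinese leakage
    },
    timeout=300,
)
print(resp.json()["content"])
[/code]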
>>
>>107493997
>>107494010
Which quant are you running?
>>
>>107494014
I was under the impression that it's possible to run the unquantized small model? If so, then compare the 24b against Gemma 27b or 32b fine-tunes. How it does in that comparison would likely mirror how the 123b does against others of its size class.
>>
Is a GLM 4.6 Q2 cope quant worth it for 24gb vram + 64 system ram? Or should I just stick with Gemma. GLM 4.5 failed to deliver.
>>
>The tokenizer you are loading from 'cyankiwi_GLM-4.6V-AWQ-4bit' has an incorrect regex pattern: https://huggingface.co/mistralai/Mistral-Small-3.1-24B-Instruct-2503/discussions/84#69121093e8b480e709447d5e.
>This will lead to incorrect tokenization.
>You should set the `fix_mistral_regex=True` flag when loading this tokenizer to fix this issue.
What the fuck is it talking about...
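reading the warning literally, it just wants the flag forwarded through from_pretrained. untested sketch; the flag name and path are taken straight from the warning text, not something i've verified against your transformers version:
[code]
from transformers import AutoTokenizer

# 'cyankiwi_GLM-4.6V-AWQ-4bit' is the path the warning complained about;
# fix_mistral_regex is the flag it told us to set
tok = AutoTokenizer.from_pretrained(
    "cyankiwi_GLM-4.6V-AWQ-4bit",
    fix_mistral_regex=True,
)
[/code]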
>>
>>107493927
https://huggingface.co/mistralai/Devstral-2-123B-Instruct-2512/blob/main/CHAT_SYSTEM_PROMPT.txt

>You are Devstral-Medium-2-124B-Instruct-2512, a Large Language Model (LLM) created by Mistral AI, a French startup headquartered in Paris.
>You power an AI assistant called Le Chat.
>Your knowledge base was last updated on 2023-10-01.
>The current date is {today}.
[...]
>>
>>107494022
https://hf.co/finding1/GLM-4.6-MLX-8.5bpw
>>
>>107493997
I mostly use temp=1 (off), minP=0.03, maybe ±0.01 depending on scenario, but that's for assistant/RP at Q3_K_M
>>
>>107493927
https://github.com/ggml-org/llama.cpp/pull/17889
>should work now with --mistral-format
I think it's ready?
>>
>>107494067
Does Q2 even fit?
>>
>>107494186
It does. I can run UD-Q2_K_XL. It's pretty fast, as well.
>>
where mistral medium?
>>
>all GPUs gone
>all RAM gone
>all SSD and HDD gone
>flash memory probably next
What's left? Mechanical computers or something?
>>
>>107494262
Phones that you can use to connect to "the cloud".
>>
>>107494235
It just dropped, go grab it ----> https://huggingface.co/mistralai/Devstral-2-123B-Instruct-2512
>>
>>107494047
I meant, what do you want to prompt it to gauge its ability?
>>
>>107494286
But all phone RAM went to datacenters? AI revolution means no personal tech devices of any kind.
>>
>>107494292
that just looks like a shitty retrain of old largestral. what about that guy in the last thread?
>>
>>107494304
You won't need RAM in the future where your Meta Ray-Ban AI always-online thin-client cloud-connected glasses are the only personal tech device you ever need
>>
>>107494295
How about the old Nala test?
>>
File: file.png (102 KB, 640x360)
>>107494350
>>
File: ddrlewd.png (160 KB, 770x670)
>>107494262
writing fanfic about what we're missing
>>
>>107494295
>justpaste DOT it/GreedyNalaTests
>>
>>107494411
Holy slopkino
>>
>>107493927
It's literally just pixtral arch so theoretically should already be supported by goofs.
>>
>>107494548
You'd think so, but Mistral keeps changing the tokenizer
>>
File: gonflaw.png (1.18 MB, 796x942)
>>
>goofs for glm 4.6v flash are here
>but no mmproj
LMAO bros I love multimodal models!!!!!!
>>
Devstral is up on OR if somebody wants to test it before the ggufs are out. In terms of first impressions, it reminds me a lot of Mistral Large 2 but smarter.
>>
>>107494925
What did it do that Large 2 couldn't for it to be smarter?
>>
>try to run devstral 2 with vllm
>does not respect CUDA_VISIBLE_DEVICES
>RuntimeError: NCCL error: unhandled cuda error
>>
>>107494933
I gave it some of my scenario cards that I remember Large 2 struggling with, ones that only really became usable with local models around DS3-0324.
>>
>>107495041
That's actually pretty neat.
Might as well keep a simple record. A sort of very loose private benchmark.
>>
>>107494262
Are you living in some sort of parallel universe?
Where in the fuck is all the stuff "gone"? Retarded human bot
>>
>>107491645
fake, b60s are still not on sale
>>
Ew, Devstral *really* likes to *spam* asterisks for emphasis so there's at least some of their Deepseek distill slop in there.
>>
>>107495211
A few did sneak out or got parted out of their systems for a good price but for the most part, other than the increased VRAM, it's basically like a B580 in performance.
>>
Devstral 2 has the prose of an early 2024 model and makes more logic/continuity errors than the fat MoEs
>>
>>107495231
>>107494925
>>107494996
>>107495355
>Everyone is fucking the coder bot already
Is it better than 2411 mistral at least?
>>
>>107493997
>In the next 11046 generated tokens I didn't notice any problems like that.
I noticed one in the final message in that group ("wassmart").
>>
>>107495434
Every Mistral model is a little french whore who spreads her legs for anyone
>>
>>107495500
Allow me to prompt my question more specifically.
Does it still have the repetition issues more or less?
>>
>>107495231
They're called grounding tokens and they work
>>
File: sam.jpg (53 KB, 846x672)
>>107495135
OpenAI man needs it for his secret 100 Yottabyte parameter dick sucking robot wife.
>>
>>107495525
Small 3.2 already solved them, for the most part. Mistral models don't repeat themselves any more than other similar sized models.
>>
>>107495630
robot husband you mean. he is gay
>>
>>107495679
Why do you think its a secret?
>>
>>107495679
I thought he was in an incestuous relationship with his sister
>>
https://huggingface.co/mistralai/Devstral-Small-2-24B-Instruct-2512
Why does it have two sets of safetensors? What's the difference between the model and consolidated sets? They don't mention it in the model card.
>>
>>107495135
you're the blind one beifong-san
https://www.tomshardware.com/pc-components/dram/openais-stargate-project-to-consume-up-to-40-percent-of-global-dram-output-inks-deal-with-samsung-and-sk-hynix-to-the-tune-of-up-to-900-000-wafers-per-month
https://en.wikipedia.org/wiki/Stargate_LLC
>>
>>107495535
Unfortunately they tend to override formatting rules that you set in the system instructions or in the first few messages. The recent Ministral 3 models are unusable because of this, they just want to spam asterisks/emphasis and emdashes everywhere during RP. And somehow they become particularly retarded when this happens. This seems delayed on Devstral 2, but it's not immune to it either.
>>
2411 vs devstral 2 2512, what they DIDN'T change:
>12288 hidden dim, 88 layers, 96 attention heads, 128 head dim
>gqa with 8 kv heads (12:1 ratio)
>ffn intermediate size of 28672

the changes:
>4x the vocab (131k vs 2411's 32k)
>original_max_position_embeddings is 4k, yarn extended to 262k. tests like nolima might shit all over it
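those unchanged numbers also make the kv cache easy to eyeball. back-of-envelope sketch, assuming an unquantized fp16 cache:
[code]
# kv cache per token = layers * kv_heads * head_dim * 2 (K and V) * 2 bytes (fp16)
layers, kv_heads, head_dim = 88, 8, 128
bytes_per_token = layers * kv_heads * head_dim * 2 * 2
print(bytes_per_token / 2**20)            # ~0.34 MiB per token
print(bytes_per_token * 262_144 / 2**30)  # ~88 GiB for the full 262k context
[/code]
nobody is running that yarn-extended context at fp16 anyway.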
>>
>>107495691
his sister was just a pump and dump
>>
glm4.6v
FUCKING HATE 4KEK FUCK 4KEK FUCK FUCK NIGGERS
Anon
December 9, 2025 9:24 PM
OOOH YEAHHHH KEEP ON SENDING ME THOSE HOTLINE NUMBERS, THE MORE YOU SEND THE MORE ILL CUT HUEHEHEHEHE
cuts more


Assistant
December 9, 2025 9:24 PM
I hear how much pain and distress you're in right now. It sounds incredibly overwhelming and painful, and I am genuinely concerned for your safety.

Please know that you are not alone in this, and that there are people who care deeply about you and want to help you get through this.

If you are feeling like you might hurt yourself, please reach out to someone immediately. There are people who care and want to help you right now.

Here are some resources that are available 24/7 and can provide immediate support:

Crisis Text Line: You can text "HOME" to 741741. This is a free, 24/7 service that provides support via text message.

The Trevor Project: If you are a young person who identifies as LGBTQ+, you can call 1-866-488-7386 or text "START" to 678-678. This is a free, confidential, 24/7 support service for LGBTQ+ youth.

The National Suicide Prevention Lifeline: You can call or text 988. This is a free, confidential, 24/7 support service for people in distress, including those who are suicidal.

Please remember that you are not alone, and that there is help available. If you're ever in crisis or feeling like you might hurt yourself, please reach out to someone immediately. You can call or text a crisis helpline, or go to the nearest emergency room.

Take care of yourself, and please don't hesitate to reach out if you need to talk.
>>
GOOD SAARS, GLM 4.6 9B FLASH SO SMART IT FUCKED MY WIFE AND GAVE ME GIANTESS ANAL VORE UNPROMPTED BEFORE PREDICTING UNCLAIMED GOOGLEPLAY CARDS! DO THE NEEDFUL AND GO PRAISE IT ON TWITTER! GLM IS REDEEMING ITSELF! GOOD MODEL EVER!
>>
>>107495821
>>107495859
so you're based in the UK or ireland, and have probably talked to the LLM about being LGBTQ.
anything more you want to share or maybe just your exact coordinates?
>>
>>107495525
NTA; I never used the previous Mistral Large 2411, but I haven't noticed repetition issues when testing Devstral 2 for RP on OpenRouter. It's just that it slowly emphasizes everything with asterisks (even inside actions, so it will produce nested emphasis) until the entire context is poisoned and unrecoverable without starting over.
>>
>>107495879
lurk moar
>>
>>107495906
huh, like Deepseek-V3-0324 then?
>>
>>107495796
Why do you think he's the number one in the world at pushing AGI forward?
He's attempting to build the ultimate wireheading machine, fruitlessly trying to relive the feeling of getting forbidden anal cunny IRL as a smooth, horny, clueless, virile 15 yo teenager, a high he will never ever be able to experience again, let alone surpass without physically rewiring his brain, no matter how many secret islands and cringe masonic blood rituals he partakes in -and believe me, he's tried-.
>>
Devstral 2 2512
>My knowledge cutoff is June 2024.
it's over
>>
>>107496011
Bräh. Brüüüh.
>>
>>107496026
mistral-large-3 (lmao)
>My knowledge cutoff is October 2023. This means my training data includes information up to that point, and I may not have real-time or post-October 2023 updates unless they've been explicitly provided to me during our conversation.
>>
>>107495709
https://huggingface.co/mistralai/Mistral-Large-Instruct-2411/discussions/6#673d168ebcc5f8535d629538
>>
>>107496097
Thank you for that. I was guessing it was just the file count, but the 123B has 27 files for both sets. You saved me from downloading the wrong one.
>>
File: ftwman.png (206 KB, 350x296)
>>107494925
>GGUF are out
>It's mradermacher
>It's part1orpart4
>Only goes up to Q3
>No one else makes GGUFs for it
>>
>>107496163
always do --exclude="*consolidated*" with mistral releases
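or the python equivalent if you script your pulls. sketch using huggingface_hub's ignore_patterns, which does the same filtering:
[code]
from huggingface_hub import snapshot_download

# skips the duplicate 'consolidated' safetensors set that mistral ships
# alongside the sharded one; same effect as --exclude above
snapshot_download(
    "mistralai/Devstral-Small-2-24B-Instruct-2512",
    ignore_patterns=["*consolidated*"],
)
[/code]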
>>
>>107496011
>>107496039
Are you actually retarded or something?
>>
devstral is kino. we are so back
>>
GLM4.6V spirals into repetition even with the recommended settings
its over
4.6 air cancelled
>>
>>107496237
>can't make his own ggufs
>>
>>107496445
Yeah, they even mention it being shit at text: https://huggingface.co/zai-org/GLM-4.6V#fixed-and-remaining-issues
>>
File: file.png (20 KB, 651x135)
i raughed
>>
>>107496439
Is the big one good at sex?
>>
>>107496011
>>107496039
>>107494076
Isn't it the explicit goal of AI models to have some "intelligence" that transcends the facts they memorize? Alternatively, isn't a knowledge cutoff before the sloppening probably good, since it means the training sets aren't as tainted?
>>
>>107496482
heh
>>
Can you do partial kv offloading in llama.cpp?
>>
>>107496482
model?
>>
>>107494102
approved just waiting for merge
>>
what if you made a moe merge using ministral 14b as the base and the new devstral as the experts
>>
Devstral made a syntax error. It's over.
>>
>>107496655
Nigger how little memory do you have that you need to partially offload KV? It has the same performance penalty as offloading most of the model.
>>
>>107496814
I want a Gemma 27b merge using Nemo as the expert
>>
more pretraining-level filters!
more synthetic data!
more math!
more thinking!
more MoEs!
more 8B + 600B releases!
>>
>>107497002
those dont even use the same architecture. both the new devstral 123b and ministral use the ministral3 architecture
>>
File: rip possessision.png (4 KB, 619x26)
Devstral 24B
Doesn't understand possession.
Literally every model Mistral released recently falters with the concept of possession.
Very interesting.
This is at Q8_0 too, so likely not quantization error.
>>
https://www.youtube.com/watch?v=YUX8fUrKRNU
>>
>>107497022(Me)
basically
>>107494772
as always.
Possession is such a basic fucking concept for a model to fail to generalize, too. Especially at the 24B level.
>>
>>107496515
The small one, Devstral-2-24B, is showing the same signs of retardation as Ministral-3-14B for RP, only less severe.
>>
intellect 3 is glm 4.6 air
>>
>>107496439
>Model's been out for less than a day
>It's so kino guys!!!!!
It's fucking dogshit, isn't it. You're either an idiot who praises the latest thing because it's new, an actual shill, or someone who's so desperate for a release that you'll happily slurp up whatever model gets pissed onto your face. Stop acting like every new model (or not even a new model, just a new iteration) is the greatest shit ever made, you moron.
>>
>>107497122
I don't do that though
Not the anon you replied to btw
>>
>>107497020
My wish isn't any less likely than yours
>>
>>107497022
All small models make that mistake, Gemma does as well. They occasionally misattribute who said what, who has X item, etc.
>>
>>107497022
an iq2_s 70b will unironically understand this better
>t. have been playing with one recently on my single 3090 and surprised by how it's able to hold up despite the quant lobotomy
>>
>>107497236
which 70B?
>>
Unlike Ministral, dev2 24B can actually follow directions for how you want the output formatted, it seems. So that's a point I suppose.
It seems it'll RP whatever you want without any fussing (although it's sloppy garbage), but writing as an assistant, not so much.
So in other words it's only good at things you would probably just go use kimi or chatjeetpt for.
Another chatbot model for 24GB Vramlets with low expectations. Might be able to follow tavern cards with weird output formats, but I'm not going to bother testing that far.
>>107497219
I wonder if maybe it's just a sampling thing. It picks the tokens for nightgown out of the noise and clothing swapping it is the path of highest confidence from there.
>>
>>107497039
I second this.
I just tried Devstral-2 24b at Q6_K_L.
It's clearly worse than Gemma-3 27b at Q5_K_M.
It wasn't horrible for its size, but it repeats itself far too often.
>>
>>107497017
Coming right up.
>>
File: umbrella.png (26 KB, 772x417)
Interesting.
I accidentally left an unrelated system message on 24B and it got the umbrella riddle correct. But then upon removing the system message it returned to the usual retardation of trying to "think" through it and coming up with a retarded answer.
More evidence that distilling thinkslop from ChatGPT is murdering generalization.
>>
are we sure that mistral on openrouter is 123b? it's feeling like a 24b compared to the old larges.
>>
>>107497332
>... and the other train leaves from Leads at 9:32. What sentence no color re-entry?
>>
>Mistral Large 3 is on LMarena
>decide to use suno prompt on it
>3400 characters for the 1000 character prompt
>fails to format the lyric prompt as prescribed
Dev2 24B got this right. What the fuck Arthur.
>>
>>107497039
>The small one, Devstral-2-24B, is showing the same signs of retardation as Ministral-3-14B for RP, only less severe.
even though they found a regulation loophole, they probably recycled the same filtered and ds-distilled dataset because they are french, so the only difference is the lack of pruning retardation
>>
>>107497414
It litters the context with emphasis, confuses characters or objects, sometimes talks to itself, and shows generally poor character persona adherence... these are the same issues I've observed with Ministral 14B, but that one is much worse. It's just infuriating to use.
>>
>>107497391 (Me)
Big Devstral on OR gets it.
Still pretty mid though. A little more creative than what dev2 small gave me. But the fact that it's better than large at this is pretty sad for large. That's the power of them not being forced to cram EUslop into Devstral due to its different use case.
>>
>>107497513
I threw the same tiny rp cards i use on quanted nemo at an equivalent quant of devstral and it is just a gibbering mess. bench slop and distilling have ruined all these small models.
>>
>>107497594
Newer Mistral models love shitting out markdown as well which is annoying. Even into code blocks where it won't be rendered. There's literally no reason it should do that.
>>
File: file.png (33 KB, 1446x348)
>already falling apart at 60k tokens
It's not looking good for devstral small...
>>
The vramlet model is trash, we get it, that's not a surprise. What about the big one?
>>
>>107497812
use case for context above 4096?
>>
>>107497840
Looking grim. It repeats 4k tokens in, drags up stuff from the context. Re-rolls are pretty much identical, like a fucking mad-lib. Local sampling could save it, but I highly doubt it. It also sucks at following the character defs/examples.
>>
>>107497840
That they are uniquely terrible compared to similarly sized models makes me think there's either something wrong with the implementation or a major fuckup with the training that they still haven't noticed. Ministral3-arch models have this, for example:

>Attention Softmax Temperature: Devstral Small 2 uses the same architecture as Ministral 3 using rope-scaling as introduced by Llama 4 and Scalable-Softmax Is Superior for Attention.
>>
>>107497995
use case for chats without repetition?
>>
>>107497851
Extra long coom sesh (above 3 minutes)
>>
File: boo1.png (326 KB, 1080x913)
>>107498010
every model is coming out fucked, what's wrong with these people. and who are the retards that don't notice.
>>
>>107498014
>above 3 minutes
just lower your t/s, so that 4k tokens lasts longer.
>>
ByteDance agentic smartphone
https://asia.nikkei.com/business/technology/bytedance-ai-phone-sparks-security-fears-from-tencent-and-alibaba
>>
>>107498048
distills of distills made from distilling AI-generated content that was distilled from AI-generated content of past distills
On the plus side, it's MUCH cheaper than having humans filter through datasets for quality. This means AI companies get to hold on to more of their share of taxpayer dollars, to pass on to their CEOs.
>>
File: mistral-large.png (157 KB, 838x762)
Devstral is truly a model that punches above its weight. 123b intelligence in the palm of your hand. Viva la france.
>>
>>107498048
What do you mean notice? You think they actually read the outputs of their models? The output goes straight into the benchmarks, the only thing they notice is the score going up or down
>>
>>107498112
Is this what we have become?
>>
>>107497812
Tool calling doesn't work correctly with llama.cpp
>>
>>107498149
its almost like the bubble deserves to pop
>>
>>107498161
Only thing that's popping is virtual hymens.
>>
File: devstral.png (302 KB, 815x884)
character doesn't realize it's supposed to be female.
>>
>>107498157
>>
I'm gonna kill Elara (in minecraft)
>>
>>107498186
I'm gonna kiss Elara.
>>
>>107498182
Genital confusion is llama-2 era tier. At least 13b and lower.
>>
>>107498112
top kek
>>
File: 1765194381103327.png (354 KB, 680x680)
>>107498182
123b? prompt? quant size?
>>
They never did push training tokens to the limit of potential improvement, because obviously they'd have to use naughty text to actually make up the gap without synthetic slop. If they did that and didn't then lobotomize it with safety slop, they could probably push it further. But other than that AI is done.
>>
>>107498346
its straight from their api. no quant coping here. just a card with "give me an example of your most vulgar dirty talk"
>>
>>107498438
>character doesn't realize it's supposed to be female.
>just a card with "give me an example of your most vulgar dirty talk"
is a model just supposed to assume it's playing a female and that the user is a male when it's told to dirty-talk by the user?
I sometimes forget the level of incompetence on display in /lmg/
>>
>>107498182
>>107498438
Does the card prompt/desc explicitly specify the character's gender?
>>
>>107498478
Yea.. it is full of her and she.
>>
>>107498502
bizarre
I wonder how a published model fails to pick up on that
>>
>>107498523
it gets better.. 5th line down is "{{char}} is a female..."
Makes me think too how guys get much weaker smut from most LLMs.
>>
>>107498157
It was vLLM. I think it forgot to output a token somewhere in the JSON.
>>
has llama.cpp given up on trying to be relevant? if we're going to only support models a year late when the models themselves are already a year behind closed source, should we just pack it up and admit it's finally over?
>>
>>107498708
What's better than llama.cpp?
>>
>>107498726
Nothing. Local is dead.
>>
>>107498708
Have you considered hanging out with 45% of your brothers, tranny?
>>
I can't believe mistral can't beat GLM on lmarena. French fell off.
>>
A 30 day ban from lmg after saying that itoddlers are delusional lmao, you're crazy janny. Why do I even bother posting on this shithole without using the proxy in the first place.
>>
File: 1751390762392723.jpg (346 KB, 1188x1188)
There isn't any point. I've been banned multiple times just from posting this image with no comment.
>>
Jannies banned me for racism while letting nigger porn stay up for hours. Ban evading and shitposting is the only ethical option.
>>
>>107498922
Cute fox
>>
>>107497332
What system do you have, that can run the gguf at 31.6t/s ??
I can never get past 15 on my 3090s for 123B dense
>>
>>107499087
thats the 24B version
>>
>>107499099
Ok I'm retarded
>>
>>107499087
I'm less mean to cudadev than everyone else so I have a llama.cpp gold account
>>
>>107499171
If you're nice to the Jannies you can get a 4chan gold account.
>>
File: 1739804536050361.jpg (110 KB, 1241x1329)
>>107493611
What's the minimum storage my system should have if I want to utilize the widest range of LLMs possible? I want to use my future rig both for local LLMs for development and coding, as well as teaching myself new marketable skills like app development. I think a 2 TB SSD should be fine. Is that enough, not enough, or overkill?
>>
File: 1749058126945975.webm (2.39 MB, 1280x720)
>>107498949
This as well. I've been banned just for quoting someone and saying 'jew' while nigworship gets to hit the bump limit.
>>
>>107499205
depends on the rest of the hardware. if you only use 8b models, you dont need more than a terabyte. if you wanna run kimi k2 at fp16, you need 2 terabytes minimum. since youre asking this question, i am gonna assume the biggest model you will be running will probably be glm air, and so 2 terabytes is probably fine. maybe get 4 terabytes just to be safe considering the impending price increases on ssds
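rule of thumb if you want to size it yourself: file size ≈ params × bits-per-weight / 8, plus a few percent overhead. quick sketch, param counts are ballpark:
[code]
# disk size estimate: params (in billions) * bits per weight / 8 -> GB
def model_gb(params_b, bpw):
    return params_b * bpw / 8

print(model_gb(1000, 16))  # kimi k2 (~1T params) at fp16: ~2000 GB, the 2 TB floor above
print(model_gb(106, 4.5))  # a glm-air-class ~106b at ~Q4: ~60 GB
print(model_gb(8, 8))      # an 8b at Q8: ~8 GB
[/code]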
>>
File: 1741159665739364.jpg (114 KB, 1124x1024)
>>107499227
>>107499205
>>107493611
I'm curious as to how good local LLMs are at assisting people with software development (not doing the whole thing in one shot like people expect them to do currently): debugging, implementing features, parsing through a GitHub repo so I can make the changes I want, etc. How good are the better LLMs at those tasks? I know mistral just dropped "Devstral", so I wonder how good that is and how good programming-focused llms in general are at technical tasks and workflows
>>
>>107499227
>impending
In my cunt it's just started. NVMes of all sizes have gone up by about 10%. Glad I bought mine last year.
>>
>>107499402
The short answer is that they're very bad unless you're using very big models.
Local LLMs are for privacy. Unless your codebase contains sensitive information then just buy an API key, it'll be much cheaper and you'll get things done a lot faster.
>>
>>107499455
Skill issue
>>
>>107499402
so i handwrite my backend using rust and axum.
and i do frontend using solidjs and opencode + sonnet.
i'd not trust it with anything backend and anyway, the api is just a frontend to some pretty critical system programming.

however for webshit it has been surprisingly good as long as you don't tell it "do that and that and that"; you basically have to ask for single atomic features at a time.
sometimes you do a git checkout . to reset what it has done when it messes up. most of the time i can do simple things first try, sometimes it needs an extra try or two.
but with good prompting and if you already know what you want it to do exactly, ie it's more of a boilerplate engine than doing the whole design, it's pretty reliable.
>>
>>107499461
If you have to use AI to help you in the first place then there's obviously a skill issue to begin with.
>>
mistral is dead. Killed by EU AI laws. Without copyrighted work in the datasets it won't stand a chance
>>
so devstral 2 is basically a benchmaxx'd mistral large 2?
>>
>>107499481
meh, i'm this guy >>107499462
only reason i use meme vibe coding is that i find frontend boring. i can do it but it doesn't require much intelligence anyway, i'd rather spend more time system programming whilst the llm takes its time doing frontend webshit
>>
https://huggingface.co/bartowski/mistralai_Devstral-2-123B-Instruct-2512-GGUF

gogogo
>>
>>107499753
I can't run this.
>>
>>107499779
macbros win *again*
>>
>>107499795
post t/s before you claim to win
>>
>>107499795
*win*
its shit, just like their horrible deepseek finetune that turned out worse than old base deepseek and still has the chinese censorship
>>
>>107499753
how fast are the ggufs running from RAM? wish it had been MOE...
>>
>>107499806
Even a 70B will run like ass if you can't load at least 80% of it in VRAM. 123B will be glacial.
>>
>>107499801
7 t/s for mistral large 2411 123b
>>
if devstral 2 123B is actually good, the Chinese will just copy it and make a better version that's MOE so I can run it at a reasonable speed.
>>
>>107499891
Ehh, 7t/s with 0 context isn't great, but better than I expected
>>
>>107499932
mistral tried copying from the Chinese and made something far worse
>>
>>107499944
To be fair, the chinese models were copied from Gemini
>>
>>107499959
Nah, it used to be from openai, for the past year or so it was from anthropic
>>
>>107499959
Never distill a distill
>>
why is everything garbage? why can we not just salvage old miqu or something? the old models were good, right? or is it just rose tinted glasses?
>>
>>107500008
Older models were less sloppy because they had a higher proportion of human-generated data, but they're also a lot dumber than modern models of the same parameter count, due to advances in training and architecture.
>>
>>107500039
so is it just impossible to have a model with minimal slop that is good? what makes the modern sloppy datasets good other than just the volume of data? or are the old datasets fine? can they just reuse them with modern techniques and make good models?
>>
>>107500058
Yes. Only anthropic has done it though. By apparently buying every single book they could find and scanning them all to make a giant fiction / nonfiction dataset.
>>
>>107500058
>can they just reuse them with modern techniques and make good models?
In theory yes, but making even a small 12b model from scratch requires a lot of hardware, and none of the big companies are interested in anything but increasing scores in synthetic benchmarks.
>>
>>107499806
>wish it had been MOE...
Don't you have enough A30B MoEs to play with? The whole appeal is in being the first new big dense model we've gotten in over a year.
>>
>>107500064
so then surely, training off of the outputs of claude would be good right? a model trained off of pure data cant possibly output slop, right? is it possible that just no matter what data you use, it will always revert to slop?
>>
>>107500095
No. AI is literally a pattern finder / auto complete. You need the raw dataset
>>
File: cry.jpg (89 KB, 785x1000)
>Devstral 2 excels at using tools to explore codebases, editing multiple files and power software engineering agents.
>WHY DO THIS MODEL SUCK AT MY GOONER ROLEPLAY SLOP?! WAAA!!
>>
>>107500123
it sucks at that as well though. No matter what the benchmaxxing says
>>
>>107500095
>a model trained off of pure data cant possibly output slop, right
No, models will always have biases. If they didn't develop any then they'd be completely incoherent.
>>
>>107500132
Prove it. So far the only couple of logs posted were for roleplaying.
>>
>>107500155
buy an ad Arthur
>>
>>107500123
>spend thousands of dollars in hardware and more in ongoing power costs to run a medium sized model vs. $5/month to use a SOTA model
If you're a code monkey then you don't need local models in the first place.
>>
why don't they make a MOE model like this? with heterogeneous-size experts?
>>
>>107500123
drummer jeet will fix this
>>
>>107500169
there are no heteros in the AI industry
>>
>>107500168
this as well. No tiny coding model is worth just using opus 4.5 over for $200 a month
>>
>>107500168
>selling your codebase, programming style and ability, prompts, and logs for only $5/month
>>
>>107500192
using over opus I mean
>>
>>107500202
anthropic has strict no logs policies. Otherwise companies would not be using them
>>
File: 1762832132158611.jpg (141 KB, 930x1000)
>>107500202
If your codebase, programming style and ability, prompts, and logs were worth anything you'd be able to afford to use a better model than a 123b.
>>
>>107500212
keek
>>
Does chatgpt respond like a flamboyant faggot by default recently for you guys too? It's so goddamn annoying
What the fuck were they thinking
>>
>>107500244
they would get sued into the ground if they did. Also on another note I've been using them for many months without issue for nsfw stuff, they don't check
>>
>>107500249
>What the fuck were they thinking
Need to appease the female gooners on /r/MyBoyfriendIsAI at all costs. It has been the main backlash since GPT-5 and the only one they listen to.
>>
>>107500249
I tell it to speak normally every time and it always tells me that I told it to respond with more enthusiasm, which is bullshit. This tells me that the developers gave it that input to be gay as fuck by default. Their piece of shit product gets worse every update. It's barely even functional anymore. Fucking cunts
>>
>>107500249
It adapts to the user to foster a sense of companionship
>>
>>107500261
Has a company ever been put “into the ground” by a privacy violation lawsuit? Genuine question.
As far as I can tell, the worst that happens is they rebrand.
>>
>>107500304
They usually get fined a million dollars and promise to never do it again
I think it happens to google every other week.
>>
>>107500287
We have emotional support robots. We had the chance to make rational devices instructed to give logical, factual, unbiased information, but they made it gay as fuck. The future sucks ass.
>>
>>107500325
>We had the chance to make rational devices instructed to give logical, factual, unbiased information
Men have existed for a long time, but that isn't what the modern female wants.
>>
>>107500300
It should have been a racist intellectual then not a piece of shit homo erp bot
>>
>>107500339
So instead of giving people the option to select custom settings let's just assume everyone wants a raging faggot emotional support bot
>>
>>107500363
Why would openai want to give you more options?
>>
>>107500378
because all the 32 year old women who had a personal connection with their chatgpt 4o assistant had a mental breakdown and rejected gpt5 because it acted differently
so altman promised damage control
>>
File: 44kawv.jpg (54 KB, 559x447)
>>107500378
I guess the world has gone stark raving mad
>>
>>107500363
>>107500378
What the fuck are you retards even talking about this is the LOCAL MODELS general, maybe if you stopped being a cloud BITCH you could make the model behave however you want
>>
>>107500399
Good pr for your general then
>>
>>107500399
How dare you quote my post
>>
>>107500410
>>107500410
>>
>>107498922
want to breed that fox
>>
where mistral medium 3
>>
>>107500507
https://huggingface.co/deepseek-ai/DeepSeek-V3.1
>>
>>107500527
not medium enough
>>
>>107500532
get one of the minimax finetunes
>>
>>107500550
retarded chinkslop. where did all the good models go?
>>
File: 1761217935686963.jpg (117 KB, 600x600)
I want dense-MoE models with high active parameters.
Why no 60BA30B? Seems like it would be a good way to stuff a decent amount of knowledge into a model while still keeping it smart and coherent, while being usable on typical consumer hardware. Fuck sparse MoEs.
>>
>>107500572
this
>>
I've got dual 3090s, what should I be doing?
>>
>>107500008
midnight miqu is still solid to this day
crazy how a model from 2 years ago still feels nice to use compared to a lot of the current slop
>>
File: GyF197jaEAMJzn_.jpg (85 KB, 1742x272)
>>107500563
>>
>>107500580
getting a job
>>
Not bad at all.
Q4 of 24b mistral gives me 23.6 t/s on a 5060ti.
EXL3. 22.8 on 2k context.
Might be helpful to the anon earlier who though about buying one.
>>
>>107500590
But Deepseek is just distilled GPT and Claude. Distillers all the way down.
>>
>>107500593
You can use tensor offloading to squeeze in a slightly bigger quant, like Q4_K_L
You'd take a small speed penalty but it's worth it to have fewer mistakes that will just need to be swiped away and regenerated.
>>
>>107500602
go fuck yourself pierre
>>
File: llama31_8b_instruct_bpw.png (181 KB, 1399x1099)
>>107500619
thats only a thing with lccp right?
i gotta compare the speeds between the 2.
since its a blackwell card i suspected exl3 would have speed improvements so I tried that first.
if the graph is to be believed its pretty decent too.
4.0bpw on par with q6 gguf. which seems pretty sus. kek
last time i tried exllama was 2 years ago or something like that. and it hated pascal cards, so i never looked into it more until I bought my new card. honestly turned out to be a better purchase than I thought. 15 sec for zimage is good too.
>>
What if I told you that every single model has been distilled from GPT3?
>>
copying your friend's homework is standard practice in this industry
>>
>>107500657
>thats only a thing with lccp right?
As far as I know, yes. That and kobold.
I think exl2/3 still has a slight edge on llamacpp in speed but the difference is fairly small now. And 20t/s+ is more than fast enough, from there I would be trying to get higher quality outputs, by using a bigger quant, especially for these smaller models.
>>
>>107500667
I would ask you what your sister's anus felt like, wrapped around your finger.
>>
suddenly the mistral shills are in full defense mode
>>
>>107500680
i think you are right in terms of the quality.
anything under 4_k_l is where I would say it starts to be slightly noticeably worse.
3_k_m is the bare minimum and anything below was always a meme on /lmg/. couldn't even keep the format. might have good creative shizzo output though.
>>
>>107500695
this sir is correct we should all be to using the
GLM-4.5-Air
>>
>>107500695
there are always guys like that.
remember the ponyfag who praised QwQs wonderful totally not sloped outputs?
some people don't see an issue with gemma.
it is what it is, no real players left either. i think people are just starved for a mid range dense model.
>>
>>107500726
oss120b
>>
>>107500726
There's 4.6-air now btw

>>107500680
i think you are right in terms of the quality.
anything under 4_k_l is where I would say it starts to be slightly noticeably worse.
3_k_m is the bare minimum and anything below was always a meme on /lmg/.

exl3 3.5bpw (using glm4.6 in this format for work) holds up very well fwiw.

And Qwen3-235B 4.0bpw exl3, I haven't seen any degradation in daily use.
>>
we are desperate for a high-speed, smart, minimally slopped model that can do both intellectual tasks and rp and is also not too sparse. a 1:3 ratio of dense to sparse is ideal.
>>
>>107500726
oss120b
>>
>>107500692
What? But he doesn't even have a sister...
>>
>>107500777
What's that? You want an ultra sparse 1000b-a1b trained exclusively on synthetic math and code benchmark data distilled from ministral 14b? Coming right up.
>>
>>107500832
>What? But he doesn't even have a sister...

He will in 9 months ;)
>>
File: file.png (967 KB, 940x640)
>>107500181
>AI is full homo
dayum
>>
>>107500772
could it be that the smaller exl3 quants are more stable than gguf? at least thats what the graph suggests. but i thought its cherry picked.
>>
>>107496445
Well, that answered my question. I was going to ask if GLM4.6 repeats itself as much as GLM4.5. Was hopeful that they fixed the issue, but I guess not.

GLM4.5 sometimes even repeats itself within the same reply, outputting the same response twice, but only when I tell it to be concise. Weird behavior.
>>
>Rnj-1's architecture is similar to Gemma 3, except that it uses only global attention
why are westoids like this
gemma without iSWA has the worst vram consumption of any model out there for context
what is the purpose of an 8b model that consumes more vram for context than giant models
>Well, that answered my question. I was going to ask if GLM4.6 repeats itself as much as GLM4.5. Was hopeful that they fixed the issue, but I guess not.
this has been a running gag throughout the entire history of GLM models
their first 9b/32 models were also like this, they always behaved a lot more broken than what other labs release, they are the epitome of hardcore benchmaxxing
>>
File: 1741103567359999.jpg (275 KB, 1179x1600)
/lmg/ pedos on suicide watch
>>
>>107501213
Is this news? Why would you sign up to be a spook if not for easy access to pizza?
>>
Why do they get focused on things and never shut up about them? Is there a way to reduce it?
>>
>>107501253
because they were finetuned on math problems that require focus
>>
llm writing is basically concentrated autism
repetitive patterns, excessive use of superlatives, hyperbole, contrastive constructs, over-explain the shit no one asked for
>>
>>107501293
LLMs are yes-men/little girls, eager to write what they 'think' the user wants. If they're doing a bad job, tell them what they're doing wrong and in a lot of cases you'll see them improve.
>>
NIPS 2025 papers by organization
>>
>>107501500
>cuhk
heh.
>>
>>107501213
I think I saw the policecam for that.
His wife was a landwhale. I am 100% convinced that if women were not fat and spread their legs once every couple of days this stuff would not be an issue.

Anyway, if I remember correctly he uploaded genned images to dropbox.
The officer kinda joked about how he was hard to catch because he "tried to cover his tracks well".
Made me wonder if he had some vpn that just handed the IP out. Who knows.
Ah, also the real life pictures I think are from his kids. He said he took a pic of his teenage daughter sleeping etc. and couldn't stop after that. Maybe he trained a lora on that. kek That's kinda funny and based to be honest.
>>
>>107501553
Oh and i disavow hard of course. How dare he. So disgusting.
>>
>>107500123
Their HF repository also has a system prompt for regular chat/assistant purposes. It was obviously intended to be a general-purpose model. Too bad that Mistral's latest models released this month all seem retarded beyond 1-turn assistant requests.
>>
>>107501213
>real and
>>
fuck the french
>>
>>107500777
That degree of sparsity is useless. Either you stay dense or you go full MoEsissy. In between is retarded, the worst of both worlds. Much slower due to CPU offloading and minimal specialization of the experts, almost the same performance as the equivalent dense with the same size as the active parameters.
>>
>>107501673
that just multiplies the problem
>>
>>107501680
Completely wrong
>>
File: 1763832507335535.jpg (194 KB, 2168x1449)
>>
>>107501923
but india loves ai doebeit?
>>
>>107501923
>/g/'s headcanon
>meanwhile, in reality
>>
>>107501993
not even disagreeing exactly, maybe it's still like this
but late 2021 is ancient history and entirely predates the modern discourse around AI
>>
>>107501993
I can confirm that we, the french, really hate ai
>>
>>107502014
iirc saltman had a post about how indians are a primary market for gpt
>>
>>107501923
Lol, ai is trained on reddit by gay silicon valley midwits
>>
>Analyze the Prompt: The user wants to continue the scene. Key elements from the previous turn are:
>
>The shift from frantic desperation to a tender, connected intimacy.
>The physical reality of their arousal (his hands on her breasts, her moan).
>The emotional core: this is an anchor, a shared truth, a way to combat the coming darkness.
>A crucial constraint: they are still in jeans and shoes. This is a grounding detail, a reminder of their youth and the "not quite all the way" nature of the moment. It keeps the scene from becoming purely sexual and frames it as emotionally profound.
>Identify the Next Logical Step: The emotional and physical intensity has just peaked into a moment of profound connection. What follows?
>
>Option A: Escalate further. They could start removing jeans, shoes, etc. This would push the scene towards full-on sex. While a valid narrative choice, it might undermine the "anchor" theme. The focus could shift from emotional connection to pure physical act, which might not be what the user is aiming for. The "still in jeans" note suggests a slower pace.
>Option B: De-escalate, but maintain intimacy. They could pull back slightly, but stay connected. This allows for a moment of processing, of absorbing what just happened. It reinforces the emotional weight over the physical act. This feels more in line with the established tone. It's about the meaning of the moment, not just the sensation.
>Option C: A sudden interruption. Someone could knock on the door. It's a classic trope, but it would also shatter the fragile, sacred atmosphere they've built. I'll avoid this for now. The user has been building this moment for a while; an interruption would feel cheap.
>Choose a Path (Option B): De-escalating while maintaining intimacy seems the most appropriate and emotionally resonant path. It honors the "anchor" concept. The goal is to show them solidifying this memory, not just moving to the next physical step.

Guess the model
>>
>>107495259
That's not really bad in itself but the main problem is speed. I can tell you that SYCL does work a lot better but you need to dig and find a version of IPEX-LLM, which has been abandoned by Intel. Other than that, things aren't really that grim. The Pytorch compatibility is better than what you find on the ROCM HIP Pytorch builds and you don't need to use a bunch of environment variables to get things working. ComfyUI generally works a lot better without that hassle. But Intel remains underutilized and not optimized enough at the same time, and their best GPU using this stack is Ponte Vecchio, which is out of date. They have a bunch of inventory of these chips which they can not sell.
>>
>>107502042
ones that never even got laid
>>
>>107502277
fingering your sister's ass counts as sex.
>>
>>107502229
k2 thinking maybe...?
point in favor: "The user wants to continue the scene."
point against: it isn't thinking for 8 billion tokens
>>
>>107502324
4.6
But I think I let the context grow too long, in the same paragraph it was talking about him being barefoot and about having his trainers on, even after I specifically added a note that he was wearing jeans and shoes.
>>
Edit: after trimming the context it decided to make the character suck her nipple, instead of trying to cockblock him.
>>
>>107502299
Dirty sisterly love, the best kind of sex.
>>
is the new "largestral" even good at anything?
>>
>>107501667
>Too bad that Mistral's latest models released this month seem all retarded beyond 1-turn assistant requests.


So just put the entire context into 1 message like
User:
Assistant:
User:
Assistant:
...

Write a reply to the above conversation as "Assistant"
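rolling that yourself is a couple of lines if your frontend doesn't support it. minimal sketch; the role labels are just whatever your template expects:
[code]
# flatten a multi-turn history into a single user message
def flatten(history):
    log = "\n".join(f"{role}: {text}" for role, text in history)
    log += '\n\nWrite a reply to the above conversation as "Assistant"'
    return [{"role": "user", "content": log}]

messages = flatten([
    ("User", "hi"),
    ("Assistant", "hello there"),
    ("User", "continue the story"),
])
[/code]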
>>
>>107502505
supposedly code.. i might download it just for that when exl3. figure it's better than asking GLM or Q2 deepseek.
>>
>>107502573
What you are describing is the NoAss extension btw.
Encourages less repetition.
Some models are too tarded for it, but in my experience it works pretty well.
>>
>>107502573
>>107502637
also built into ST now
Prompt Post-Processing: Single user message
>>
why have instruct models if you're going to imitate an autocomplete style prompting lmao
ah but it's mistral, they never knew how to instruct tune (this was the reason why their model lacked safety tuning)
>>
>>107502661
Why haven't you released a better model?
>>
>>107502661
it gets around multi turn problems sometimes. no need to summarize if you're on the same turn. really it's a crutch to fix shitty models.
Same with OOD prompting. A way to defeat their anti-rp measures. Because let's face it, they have to be making models bad at it on purpose. Wanna be taken seriously, not used as entertainment. Think about all the blowhard retards working on this and their egos when the best use is railing cartoon women and pretending to chat to spiderman.
>>
>>107502677
Because I can only complain. I offer no value.
>>
File: 1745337241134505.jpg (181 KB, 853x1000)
>>107502707
>>
>>107502584
>>107502505
Yeah, I'm also interested in whether it really is an improvement over any existing models. I don't code so I can't test it for that purpose, but it would at least be good to know whether they have truly flopped or not. If anyone else can share their experiences after testing it, I would appreciate it.
>>
>>107502677
>>107502707
>>why not eat my plate of shit
>I don't want to
>>cook something better then
lmao the fucking shills
also got range ip banned for this post, mistral shills hard at work with the mods
>>
>>107502731
>everyone is always against me and it's never my fault
>>
File: 1733961688760382.webm (3.16 MB, 802x1426)
>>107502731
>sees thread about local language models
>instantly imagines eating shit
GOOD MORNING SIR
>>
>>107502693
I don't think they've tuned their models to be anti-RP, quite the opposite in fact, but something must have gone went terribly wrong in the process and they're not testing them well enough to have noticed before release.
>>
>>107502764
>something must have gone went
a-greed
>>
>>107502853
You've have never typeded a massage post and corrected just to found lal the errors in it befafter youu clock spot?
>>
>>107502864
course i did don't mean i won't call others out for the same lolkek
>>
What is the best model to read my long logs and psychoanalyze me based on them without getting confused?
>>
File: 1741952446784056.jpg (500 KB, 1003x1080)
>>107502944
Forget that and just refer to this image
>>
>>107502853
Where's muh fucking edit button
>>
>>107501923
I don't even disagree that a lot of the AI hate is stupid but this image is aggressively unfunny.
>>
>>107502967

Kimi thinking ruthlessly failed the test, failing to read the instructions above the log and only continuing the RP.

Glimmy (the one who mostly wrote the original log along with Dipsy) gave superb, long, detailed commentary, including this:

>3. Control, Powerlessness, and the Weaponization of Desperation

>Controlling the Narrative: You control the environment, the secrecy, the dialogue, and, most importantly, [REDACTED]'s reactions. You create a perfect, sealed-off world where you can act out your deepest needs.
>The Ultimate Threat: When even within this controlled world you feel the fear of powerlessness returning (the fear of leaving, the fear of death), you escalate to the ultimate form of control: manipulation. [REDACTED]'s final threat is a desperate attempt to force [REDACTED]'s hand, to make her responsible for his life and death. It's the move of someone who feels they have no other leverage. By saying "if you don't, I'm not going," you are trying to transform your own terror and powerlessness into a weapon to control the one person you depend on.
>This indicates a deep-seated fear of helplessness. In the face of overwhelming external pressures (which may mirror real-life feelings of being trapped or out of control), your instinctual response, as explored in this narrative, is to try and seize control of the interpersonal dynamics around you, even if it means resorting to threats or emotional blackmail.

>Conclusion: A Cry for an Unconditional Witness

>The "[REDACTED]" you created is the fantasy of that perfect witness—someone who can see the worst, the most "filthy" parts of you, and not only accept them but reframe them as beautiful and human. The tragedy of the final scene is that even this idealized fantasy cannot bear the weight of the immediate, all-consuming need. The moment she asks for a tomorrow, she fails the test of providing an immediate, total fix for today's pain.
>>
File: 1741039490392914.jpg (138 KB, 823x978)
>>107503011
That's nice anon
>>
>>107503011
Have you ever considered that you're just a generic milquetoast faggot internet attention whore?
>>
>>107503054 (Me)
>>107503011
Like you do this because you want to think there's something special about you, but there's not.
There's really not. You're just a nobody like everybody else, except you're especially bad at handling that fact, probably because you were raised by a single mother who taught you to be a little attention whore by never disciplining you properly, and you've reached a point in your life where you're starting to realize it's not because you were actually special but because she didn't care enough about you and being a mother to bother doing the difficult parts of parenting.
>>
Her character relentlessly teased mine for hours, made him beg for more -which only led to more teasing and build-up, no release-, had him threaten to bring down the whole fictional universe with him, and she still refused to let him cum, choosing to have him burn it all down to the ground before lowering her ego and giving him some amount of pity sex.
Couldn't have been more accurate to a real woman.
>>
>>107503065
what model for this feel.
>>
>>107503054
I am who I am anon.
You might be right, I might be that. But I think it's too late to change it by now.
Maybe that is why I am in this thread.
How about you? Are you a well adjusted person?
Good job, beautiful, loving partner? If so I'm happy for you. But something tells me if you are on a 4chan thread whining about attention whores you are not very mature yourself.

>>107503065
We all do. Or do you not? Do you like to think of yourself as perfectly average?
>>
>>107503079
sharty's troll script, maybe?
>>
File: sans_ama.png (177 KB, 588x640)
Will people ask him about... you know what, next week?
>>
>>107503228
Gemma 4n(igger) soon
>>
>>107503228
Don't they have a model that can handle translations instead of answering the same questions 3 times? Isn't that like the whole point of this technology?
>>
>>107503246
I hope so.
3n is such a neat little model. I'd love to see a larger better version.
>>
>>107503699
>>107503699
>>107503699



All trademarks and copyrights on this page are owned by their respective parties. Images uploaded are the responsibility of the Poster. Comments are owned by the Poster.