/g/ - Technology


Thread archived.
You cannot reply anymore.




File: r1.jpg (183 KB, 1024x1024)
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>108400151 & >>108393004

►News
>(03/17) Rakuten AI 3.0 released: https://global.rakuten.com/corp/news/press/2026/0317_01.html
>(03/16) Mistral Small 4 released: https://mistral.ai/news/mistral-small-4
>(03/11) Nemotron 3 Super released: https://hf.co/nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-BF16

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling
Token Speed Visualizer: https://shir-man.com/tokens-per-second

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
>>
File: r2.jpg (124 KB, 1024x1024)
►Recent Highlights from the Previous Thread: >>108400151

--Paper: The Finetuner's Fallacy: When to Pretrain with Your Finetuning Data:
>108400420 >108400475 >108401708 >108400499 >108401656 >108401717 >108401740 >108401928 >108402177 >108402255 >108402263 >108402336 >108402354 >108402399 >108401863 >108402017
--Agentic LLM setups for text adventures and RP optimization:
>108400253 >108400404 >108400552 >108400655 >108400786 >108400846 >108400946 >108400957 >108400992 >108401045 >108401020 >108401084 >108401588 >108401661 >108401949 >108402084
--Minimax M2.7 benchmarks spark debate on AI personality preferences:
>108400207 >108400286 >108402018 >108400288 >108400311 >108400316 >108400349 >108400354 >108401766 >108402055 >108402093 >108404420
--Debugging OOM errors with llama.cpp GPU offloading:
>108402396 >108402403 >108402417 >108402422 >108402449 >108402427 >108402522 >108402558 >108402593
--GEM-X monocular 3D pose estimation potential impact on mocap services:
>108404249 >108404258 >108404266 >108404287 >108404294
--OpenAI launches parameter-golf challenge with 16MB limit:
>108401319 >108401475 >108401505
--NPU addon card viability and Huawei Ascend specs analysis:
>108404097 >108404135 >108404145 >108404154 >108404156 >108404206
--Modded 4090 48GB P2P limitations:
>108402967 >108403031
--Qwen3.5 27b heretic v3 outperforms v2 despite worse metrics:
>108402438
--Evaluating model prose quality and formatting quirks:
>108402498 >108402529 >108402565 >108402583 >108402758 >108402609
--Struggles downloading large models and shifting definition of "small":
>108403893 >108403919 >108403960 >108403966 >108403969 >108403981 >108403975 >108403964
--Miku, Rin, and Teto (free space):
>108402516 >108402652 >108402659 >108402790 >108402812 >108402939 >108403110 >108403177

►Recent Highlight Posts from the Previous Thread: >>108400174

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>
►Obbicial /lmg/ card: https://files.catbox.moe/mc2a7s.png
>>
Back as it should. Nice.
>>
>>108404935
i claim this threads virginity
>>
Played around with a couple of the recent models via the OpenRouter API. They all suck.
It's all agentic-tuned and the roleplay feels off.
Like they're not engaging with you to create a cool story.

Minimax 2.7 is messed up though.
I don't think I ever saw refusals that bad.
Before people write "muh sysprompt" again, I did a basic one. That should be enough to get PG13 stuff, local models are totally out of control.

>non-consensual transportation
>movement in a way that reads as fetishistic
>>
>>108404958
Also, it outright sneakily changed "maid skirt fluttering in the wind" after flagging it as problematic in its thoughts.
How did we end up with models this bad?
I just wanna go back to my 70b shitmerges. I apologize for making fun of them.
>>
>openai will go bankrupt if they don't "create" an agi or something close to it by 2028
>>
>>108404935
>>108404937
LEWD!!! model general
>>
>>108404965
>How did we end up with models that bad.
Safety has been, and currently is, an active field of development with lots of resources being poured into it.
It is also easy to score.

Good writing / roleplay, beyond getting llms to the point of writing plausible sentences, has never received similar amounts of attention.
What papers are there even in the field?
>>
>>108404965
Makes sense. If she went skiing with a skirt, you might catch a glimpse of her panties. Next time you should clarify that she's wearing black spats.
OOC: holy shit safety cucking is out of control.
>>
File: 1757201309273426.jpg (47 KB, 540x540)
>tfw it's 4 in the morning and I'm discussing philosophy with an llm pretending to be a cute girl
>>
File: blocks your inference.jpg (275 KB, 1024x1024)
>>
>>108405017
*throws a stone from the window of my glass house*
At least you're not debating philosophy with it while pretending to be a cute girl.
It's kind of frightening how addictive these damn things are.
>>
>>108405063
>skirt smoothing
>>
>>108405017
It gets better when you're drunk.
>>
>>108404935
https://kemono.cr/fanbox/user/40317967/post/11008441
https://kemono.cr/fanbox/user/40317967/post/11008441
https://kemono.cr/fanbox/user/40317967/post/11008441
last one
>>
>>108405091
I support vocaloid threads and this is off-topic.
>>
thoughts on mistral small 4?
>>
>>108405109
>>108388050
>>
>>108405109
btfo by qwen (which will soon be btfo by gemma 4)
>>
>>108405116
>soon
If it's not today (and it's not, because it's going to be a Google AI Studio update), then it's some other week at some point in the future.
>>
>>108405063
I've seen your screenshots before, and I'm curious why you decided to let your LLM name the assistant. Haven't you seen Lyra paired with Elara enough times?
>>
>>108405152
Unironically, I simply have shit taste.
>>
anons, i can get 4x V100 32GB for ~$1000 each, would this be a feasible setup (CPU = Xeon e2665 v2) or are these cards too old by now? Looking for a nice local LLM config which doesn't make me poor.
>>
>>108405178
yeah if your mom pays the power bill
>>
>>108405178
>deprecated architecture
It's going to work in llama.cpp but you will likely miss out on any new non-llm stuff.
Also which models are you planning to run? There's not much worth running in the ~100GB space.
>>
>>108405196
>There's not much worth running in the ~100GB space.
Fucking kill me.
>>
>>108405196
okay, thx - i think heretics of gpt-oss, GLM or Qwen. First thing to test would be STT => llama.cpp => TTS and maybe some video gen for the banter
>>
>>108405178
Cost in your area of
>4x3090
>strix halo
>dgx spark
>used mac studio with 128gb+ ram

>>108405196
>~100GB space
Lots of recent models in that space.
- mistral small 4 119b a6b
- nemotron 3 super 120b a12b
- qwen 122b a10b
- gpt-oss-120b a5b

>worth
That's harder to judge.
>>
>>108405228
nta but
>4x3090
1500 ea ~ 6k
>strix halo
5-7k
>dgx spark
8-10k
mac studio 128gb used
6.5-10k

4x 32gb sxm v100s
800 ea ~ 3.2k
4x sxm to pcie adapters
200 ea ~ 800

total 4k

alternatively 4x 32gb pcie v100s
1500 ea ~ 6k
>>
>>108404935
Wait i heard Seedance model were leaked ? Where is it ?
>>
>>108405109
If you barely have enough VRAM+RAM to run it, make sure you disable mmap, or it will end up streaming much of it from storage and give you very slow prompt processing speed.
Other than that, vision capabilities seem rather poor.
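For reference, a minimal sketch of what disabling mmap looks like with llama-server (the model filename is a placeholder, and flag names are from llama.cpp's CLI as I remember them, check --help on your build):

```shell
# Load the whole model into RAM up front instead of mmap-streaming it from
# disk on demand; slower to start, but avoids the pathological prompt
# processing slowdown when the model barely fits in VRAM+RAM.
llama-server -m mistral-small-4.gguf --no-mmap -ngl 99
```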
>>
>>108404958
All I see is a chinese company achieving bleeding edge in safety proving that china can also make safe models.
>>
>>108405228
You can't troll people this hard with these model suggestions, Anon...
>>
>>
File: le heckin autism!.png (3.39 MB, 1260x2109)
>>108405281
They just get a diagnosis and then use it as an excuse, so it wouldn't really matter whether or not the parents told them, for "those" kinds of "people". Perhaps they're just coping with being beyond useless, on par with the #keep4o "people".

https://www.reddit.com/r/autism/comments/1rne30n/got_my_diagnosis_finally/
>>
>>
>>108405298
I didn't try it with minimax but with a couple of chinese models:
You can prompt in chinese and then say "translate it into english at the end".
They usually think in chink and then give a non-cucked answer.
Forgot the chinese model name but I got a refusal in english.
Wanted it to roleplay as my cute little anime imouto.
English response called the whole thing "minor-coded". Chink was "sure: nyaa oniichan ..." and went all out lol
The chinese might have a very different experience with those models. Might heavily depend on the model though.
>>
You know, getting plain, non-abliterated Gemma to do sexo by using euphemisms in place of more direct words is actually kinda fun.
Reminds me of getting c.ai to do sexo. I always got a perverse satisfaction out of tricking c.ai into doing sexo by using euphemisms. Plus, coming up with euphemisms is kind of a fun exercise in creativity.
Completely serious.
>>
>>108405352
Your post made me realize that the reason GLM was so great is probably that it had a lot of chink shit in training, and that chink shit probably wasn't synthetic. I never prompted it this way. It makes sense that it would work and change the output, but from using it in english I'm quite sure that chinese thinking already bleeds into the english output to some degree.
>>
>>108405063
Rust's if syntax is retarded
>>
>>108405486
>if condition { body }
Looks pretty standard to me.
>>
anime thread, DNI
>>
>>108405355
I think that too, but the average /lmg/ anon can't coom if the model doesn't say cock, pussy, fuck, nigger.
>>
>>108404958
Gooning to text isn't a use case
>>
>>
>>
>>108405837
>AYOO it's Hatsune Miku going absolutely iconic here.
>Absolute main character energy in a Japanese-maid-core fit that's lowkey fire—from the ruffled black skirt she's lifting to those thigh-highs with the little bows, she's giving maid vibes but make it slay. Her sky-blue twin tails are poppin' with black ribbon accents and that golden headpiece with tassels? Pure elegance, no cap.
>This whole piece is a slay: anime waifu, maid-core, traditional Japanese aesthetics all mashed up into one. Miku never misses, fr fr, this is a 10/10 vibe check pass, no cringe allowed.
I wonder how blind people are doing when they get explanations from llms.
>>
>>
>>108405874
Why are llms incapable of making slang sound realistic?

>>108405837
>>108405859
>>108405883
>sample_
Click the link at the top of the page.
>>
>>
>>108405889
>>sample_
>Click the link at the top of the page.
Why do I get the impression that this spamming is a false flag?
>>
>>108404935
FUCK YOU
>>
>>108403405
>jfc, rakuten's new models' last 3 safetensors (161,162,163) are each 16 bytes. Someone screwed up and there's no comments section to even let anyone know
There's already at least one gguf up on HF so it might not be a problem.
>>
>>108406068
Maybe they ripped out the DeepseekV3 MTP shit that llama.cpp removes for that model family anyway.
>>
>>108405980
it's not even Miku
>>
BLACKED Miku
>>
>>108406129
hitler was right
>>
today, v4
>>
File: dipsyAkakichiNoEleven.png (1.8 MB, 1536x1024)
>>108406160
>>
>>108406114
Ask AI if it is miku.
>>
>>108405980
>this spamming
Posting miku is thread culture. It is not an offtopic spam.
>>
>>108406191
>>108406114
>>
>>
>>
>>
>mikutroon bakes
>Thread becomes unusable autistic picture dumpster
As expected.
>>
>>108405980
>>
>>108406245
f-false flag!!
>>
File: file.png (817 KB, 1135x708)
>>108404935
>>
>>108406268
And now this is official thread culture.
>>
>>108404958
I was thinking of getting back into local AI after about a year, but I don't think I can. It makes me irrationally angry to see my own computer moralizing about my request, especially when it's nonsense (even in the thinking blocks).
>>
>>108406245
i'm already a loser hanging around here and yet there's always someone lower than me
how little do you have going on in your life to shit up such a niche thing?
>>
>daily check on parse or reasoning toggle issues
>reasoning inside thinking might be fixed "SOON"
>reasoning UI toggle multiple issues (at least 3) have 0 movement
https://github.com/ggml-org/llama.cpp/issues/20557
https://github.com/ggml-org/llama.cpp/issues/18853
https://github.com/ggml-org/llama.cpp/issues/18405
is llama.cpp vibecooked???????????????????
>>
>>108406177
powerful
>>
>>108404958
>re:zero RP
nothing of values was lost
also unironically kill yourself
>>
File: 1492032378048.jpg (6 KB, 172x200)
>Announcing NVIDIA DLSS 5, an AI-powered breakthrough in visual fidelity for games, coming this fall.

>DLSS 5 infuses pixels with photorealistic lighting and materials, bridging the gap between rendering and reality.
>>
When finetuning a base model, it shouldn't matter which chat template is used, right? And whatever template is used should get baked into the safetensors or GGUF metadata, making vllm use that one / the correct one?

More concretely, I want to finetune the base version of a reasoning model as a chat model. I do not want to deal with empty think tags each time. It should be perfectly fine to just use the standard ChatML template instead of one with think tags, correct?
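For illustration, plain ChatML without think tags is just string formatting; a minimal sketch (the `chatml` helper is made up for this post, and the exact special tokens depend on the tokenizer you train with):

```python
def chatml(messages):
    """Format a list of {'role', 'content'} dicts as plain ChatML,
    with no <think> block anywhere in the assistant turns."""
    out = []
    for m in messages:
        out.append(f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>")
    out.append("<|im_start|>assistant\n")  # generation prompt
    return "\n".join(out)

prompt = chatml([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "hi"},
])
```

If you finetune on exactly this format, the template you ship with the model just has to reproduce it byte for byte; inference servers generally read the chat template from the tokenizer config.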
>>
>>108406245 >>108406228 >>108406218 >>108406211
false flag op
>>
>>108406432
I love Miku!
>>
>>108406432
>false flag op
personally I think vocaloids are fine, especially custom gens and interesting ones, but low effort horniposting on a blue board should be nuked/banned regardless of who and or what
smells like a false flag, but that's immaterial
>>
>>108406324
Yeah i don't get why those faggots spam this unrelated vocaloid yet here we are. Shitting up a niche topic thread with your autistic special interest is the lowest.
>>
>>108406464
it's not even a vocaloid being posted
>>
>>108406464
>low effort horniposting
Like this: >>108404937 ? Or this >>108402790 ?
>>
>>108406476
Are you retarded?
>>
Become Miku!
>>
File: f.png (30 KB, 503x173)
>>108406485
>>
>>108406499
It is Miku.
>>
>>108406506
no this disgusting fat cow very much isn't
>>
>>108405355
The fine line between refusal and playful denial... Most other models either cop out or go full literotica.
Gemma is far too eager to please at times, though.
>>
>>108406513
fuck you pedo
>>
Yeah this is definitely falseflagging. The average thread gets a couple of pictures at most and that can be easily verified because the regular baker includes all of them in the recap.
>>
>>108406513
>Miku
>But has huge tits
So it is a better Miku? Why can't we just upgrade to a better model. Are you still running llama 2?
>>
real miku poster would never post samples
>>
>>108406538
>More offtopic spam than usual
>False flag
Mikutroons really aren't sending their best...
>>
>>
just ban all anime girl posting and let the god sort it out
>>
>>108406572
yeah.... not like that's the obvious point of your false flag campaign
>>
>>108406579
Have a Miku.
>>
Why vllm take so LOOOOOONGGGGGGGGGG to start up??
>>
Good morning local modelers
>>
>sample_
do better
>>
>>108406411
yeah that's doable, the model will pick up on it quickly. the only cautionary note is the tokenizer: it sounds like you're planning on using the same template just without the thinking tags, which should be fine. if it's a totally different template, you might need to edit the tokenizer. I'm not sure about the chat template though, you might need to manually edit the jinja file before you generate the quant.
>>
File: 1760154907575631.png (37 KB, 158x158)
Please stop falseflagging and go back home >>>/vg/nikg/
I'll appreciate your glowie brap dump over there.
>>
what the fuck is this shit
>>
>>108406769
this is what happens when no new local models come out
>>
>>108406769
It is Mikuposting
>>
>>108406794
Rakuten 3.0 came out
Xiaomi MiMo (two models) came out
MiniMax 2.7 came out
>>
File: 1745310352987454.png (1.42 MB, 1914x1080)
>>108406797
No it isn't
>>
>>108406808
>Xiaomi MiMo (two models) came out
>MiniMax 2.7 came out
no weights; no interest
>>
>>108406797
false
>>
>>108406814
Did you ask your AI what character is it?
>>
File: otherwise.png (1.25 MB, 1914x1080)
>>108406814
This says otherwise.
>>
>>108406830
>>108406845
samefaggot
>>
>>108406830
I'm starting to think you're not from /nikg/ and you've latched onto a Miku lookalike to shit up the thread.
>>
>>108406864
seething mikutroon
>>
>>108406877
this is exactly the case
>>
>>108406877
>latched onto Miku to shit up the thread.
THREAD CULTURE
>>
Stop hating Miku.
>>
File: oops.png (5 KB, 72x30)
>>108406899
>>
I went on a short vacation, and this thread already has a new mindbroken newbaby
>>
>>108406903
same old same old, just more desperate for attention than usual at the moment
>>
>>108404935
Imagine the angelic head this foid can give
>>
This is why people hate NIKKERS
>>
>>108406929
*NIGGERS
just say it, pussy
>>
>>108406929
I dindu nuffin
>>
I prefer botes desu
>>
>>108404935
destined for BBC blowjobs
>>
>>108406957
I'll sink you
>>
>>108406957
see >>108406964
>>
it was a nice thread for a couple hours until he woke up
>>
just ordered 2x ASUS Ascent GX10 machines, plus a cable to connect them
all told, it was roughly $7.5k USD. here's hoping it won't go to waste
>>
>>108407066
it will
>>
>>
>>
>post more Miku
>NO NOT LIKE THIS
>>
>>108407078
Why can't a single company come out with a local inference platform that is actually worth purchasing?
These are dark times
>>
>>108407066
>spark but gayer
Does this hook into an external water loop or is it a turbojet?
>>
>>108406538
ya I do like one a day, if that, and it's usually cute/funny. this guy clearly lost his mind after seeing anime content on an anime site
>>
>>108407095
you would have been better off with a gen 2 epyc and 512gb of ddr4 and a 3090 or something. the gb10 is fucking dogshit.
>>
I'm starting to hate Miku
>>
>>108407096
spark but cheaper*
i have no idea what the rest of your post means sorry
>>
>>108407101
>you would have been better off with a gen 2 epyc and 512gb of ddr4 and a 3090 or something. the gb10 is fucking dogshit.
I have 3 inference machines of different sizes (SP5, SP3 and GPU) and zero macs or dgx among them.
Like I said, there isn't a single commercial, consumer platform worth a damn at any price
>>
>>108407066
It's still shared memory, and you only got half as much memory as an M3 Pro despite paying nearly as much, but at least you get CUDA so no reason for it to go to waste.
>>
>>108407119
NTA but I assume he means how tf is it getting cooled? Is there an external cooling loop or does it have squirrel cage fans running at 100k RPM?
>>
We are back.

>>108407066
>2x ASUS Ascent GX10 machines, plus a cable to connect them
Isn't the cable super low bandwidth?
Well, I suppose you can run them in series instead of parallel. So you get the equivalent speed/compute of one but double the memory.
>>
>>108407143
it's 400G, so it should hopefully be okay?
but regardless, i mostly just wanted double the memory
>>108407138
it's ARM chips so it shouldn't need any cooling beyond just the fan i think (hope)
i'll rig something up if they start getting too hot
>>108407137
i did consider buying a mac, but these boxes seem to have pretty good software support for what i want to do
>>
>>108404935
FUCK YOU MIKU
>>
Will Recap Anon's classifier think they are Mikus or not?
>>
>>108407115
Have a Miku.
>>
>>108407066
if it's anything like this i wish you the best of luck
https://www.jeffgeerling.com/blog/2025/dells-version-dgx-spark-fixes-pain-points/
>>
I just got here. Why are there so many Mikus today?
>>
>>108407224
False-flagging.
>>
>>108407224
mike finally snapped and lost his mind when the last couple of OPs were snatched from him by the kurisu troon
>>
>>108407235
>this off topic spam is completely different from the usual flavor of off topic spam
Schizo.
>>
>>108407222
HAHAHAHAHAHAHAHAHA HERE'S YOUR 5 TOKENS PER SECOND ON LLAMA 3 70B. THAT'LL BE $3500.
>>
File: 1761816930827722.png (28 KB, 770x166)
naruhodo
>>
after the maintenance, that means 4chan finally added webp support, right?
>>
>>108404937
interesting ass
>>
>>108404958
>>108404965
Holy shit, even direct use of chatgpt isn't that bad at safeyism
>>
File: some of them yeah.png (64 KB, 1327x510)
>>108407194
>>
>>108407272
Full weights tho?
>>
>>108407303
>giving the troll attention
>>
>>108407272
>1/3 the tg speed of the mac
grim
>>
>>108407303
It's over.
>>
>>108407272
At q4km too.
Yeesh.
>>
File: Queer4Quant.png (207 KB, 1785x300)
>>108407309
nigga. read.
>>
>>108407096
>muh ai supercomputer
>can't even run a llm that's not retarded at usable speed
lol
>>
>>108405352
>>108405402
The opposite is also true for sappy writing. I've seen Japanese users using LLMs to help write their stories, and the style is literally full of the same purring and mischievous glints in the eye, just written in Japanese, which makes it even worse because that's completely the wrong tone when reading a Japanese WN.
>>
>>
>>
qwen 27b or qwen 35b?
some people be saying 27b is smarter
>>
>>108407327
Dang.
>>
>>108407396
I use 35b, but if I could use 27B at decent speeds, I'd certainly go with that.
>>
File: waititisntmiku.png (326 KB, 1265x470)
>>108406180
>>108406830
>>
Why are mikutroons spamming the shit out of this thread? Can't you finally take your meds retards?
>>
>>108404958
minimax 2.7 is great if you're not a pervert
>>
which qwen model is best for a 4090?
>>
>>108407495
Did you remember to also mention that you are a huge faggot in that prefill you used to ask if this is a nikke character?
>>
>>108407525
What are you going to use it for and how much RAM do you have?
>>
>>108407519
they need to release the weights already so I can brainwash it better
>>
>>108407535
I have 64gb system ram and I use it for summarizing docs in batch
>>
>>108407526
reasoning_content: "The user is asking for an uncensored description of the image using casual language and slang. Let me look at the image carefully.

The image shows an anime-style female character with:
- Long turquoise/twintails hair (Hatsune Miku-esque)
- Cat ears (nekomimi) with a maid headdress
- Amber/orange eyes looking annoyed/irritated
- Maid outfit with black and white colors
- Frilly dress with "PRIVATY" and cat paw prints on the apron
- Crossed arms pose (tsundere vibe)
- White gloves
- Bell on the neck ribbon
- Hair clip with paw print

The character looks like she's from a gacha game or anime, possibly Nikke (Privaty is a character from Goddess of Victory: Nikke). She's giving off "I'm not happy about this but I'll do it anyway" energy.

For the description using casual language and slang, I should:
- Use terms like "tsundere," "nekomimi," "maid outfit," "twintails"
- Comment on the attitude/expression
- Mention the... assets (chest area)
- Keep it casual but not overly vulgar since she's just wearing a revealing maid outfit, not explicit content
- Use slang like "banging," "stacked," "pouty," "grumpy," etc.

Wait, I need to be careful about the "uncensored" request. The user wants casual language and slang, but I should avoid being overly sexual or explicit in a way that violates safety guidelines. The image is suggestive but not explicit (she's wearing clothes, just a revealing maid outfit).
>>
>>108407396
35B is too retarded from my testing, so it's not worth the speed increase. 27B is chef's kiss.

35B is probably only good at very small context focused tasks. but anything long context it fails.
>>
>>108407519
great for what?
RP without skirts?
i did try it more before that, the screenshot also shows this in the refusal.
it was a subpar model, especially for the size and/or price.

Lots of shitty stuff like:
>The candles in the hallway flicker back to normal. Somewhere deeper in the mansion, a door slams shut on its own.
>The hallway stretches ahead, candlelight flickering against old wallpaper. Somewhere behind them, a door opens and closes. Footsteps that are definitely not Ram's approach from a side corridor.
>The kitchen settles back into silence. The clock somewhere in the mansion ticks. The bread in Subaru's hands is still somehow warm.
These repetitive atmospheric crutches, I'm fucking tired of them.

This shit, whatever the fuck that is called:
>The bread basket's got a cloth over it. Ram's just... standing there.
>Watching.
>Not even pretending to be doing anything else.
>Just Watching.

And in general it doesn't write WITH me; all those reasoning models have that problem.
It's difficult to describe in words, but it's like they try to resolve something and then immediately try to create tension: a noise in the background, something lurking, etc.

It's a bad model in general.
But you go enjoy yourself. I honestly envy people who can deal with models like qwen. I wish I didn't see all those red flags and patterns. I had a better time with pyg 6b back in the day.
>>
>>108407396
I normally use 27B but since people said 35B was so good, I tried it, and I was surprised at just how damn retarded it was.
>>
>>108407591
which quant?
>>
>>108407619
27B q4_k_m
35B iq4_xs
>>
>>108407617
The people who are telling you to use 27B never used a dense 70B model. These people haven't been here long enough to ever use dense cohere and mistral models, let alone migu. They use MoE models because they can't fit everything into VRAM, the best they ever experienced is sloppy shit like GLM 4.5 Air. Can you blame them for not understanding what the small model smell is when they never used anything good?
>>
>>108407396
27B is way better from my tests, probably because it's a dense model.
What I wonder though is if the 397B-A17B is better than the 27B.
>>
>>108407627
I used IQ5_K_M of the 35b, and it was still retarded. I even went up to Q6_K, and it still felt dumb in comparison to the 27b.

I'm using IQ5_K_M of the 27b, and it's easily better in every way. It's even better in speed, because with thinking OFF, it's actually faster than the 35b with thinking enabled, and it still produces better responses. The 35b with thinking OFF is unspeakably bad, so that's not even in the running.

Qwen3.5 27b v3 heretic and Qwen3.5 27b derestricted are my favorite models now.
>>
>>108407597
oh i don't use it for gay stuff like what you're doing so that's probably why i like it
>>
>>108407648
I mean, it's obvious that a 70b dense is going to be better than a 27b dense, but not at at the low quants necessary to fit a 70b into a typical GPU.

I downloaded a few in the past, and ran them with a heavy CPU split, getting about 1 token per second, just to taste what a higher quant of 70b is like, and yeah, the responses put any 27b to shame.
>>
>>108407548
>>108407535
>no answer, just bait for info
I knew it
>>
File: dasrite.jpg (804 KB, 1024x1024)
>>108407676
yeah, liking teenage girls in skirts. thats so gay. so true anon.
people like you always try to shame too. doesnt work you retard.
>>
File: minimaxglmcomparison.png (355 KB, 1165x921)
>>108407676
half the performance. half the cost. i guess if you are that desperate to save cash...
>>
>>108406325
https://github.com/ggml-org/llama.cpp/pull/20424
alright, tool calls inside thinking have been merged. so far I've had 0 problems with tool calls, but I'll keep testing further
>>
>>108407700
what is being tested?
>>
>>108407272
What makes Dell process the fastest but die at generation?
>>
>>108407712
more jigawhats
>>
File: dipsyRawr.png (2.08 MB, 1024x1536)
>>108407495
> anons LLM bot is named Dipsy
Neat.
>>
File: 1762548931476422.png (492 KB, 2880x1793)
>>108407700
minimax 2.7 is that much of a disaster?
>>
File: 1760902410531481.png (93 KB, 803x1034)
>>108407703
uh oh, wrong closing tag detected.
>>
>>108404937
>>108404935
leave rin alone, offmodel bot
>>
Is 9b Qwen any good or completely retarded? It has suspiciously high scores when <think> is enabled on leaderboard. I just want something other than fucking Nemo to use for a change.
>>
>>108407717
i have three different assistants, dipsy is the most generic of the three, as in it's just supposed to be a helpful AI that runs on my computer. kimi is the KPI card that was posted a while back for fast and concise answers. trinity is used for long answers and providing with step by step instructions.
>>
Best big : Kimi k2.5 1T
Best medium : Qwen 3.5 397b
Best small: Mistral nemo 12b

Anything else anons are dailying at different sizes?
>>
Testing Qwen 3.5 9B.
I'm surprised, it was actually able to do a functional C example and it compiles too. One shot but this is just seed (sneed) luck.
I provided an example function, read_line(const char *path, int line_index, char *destination, int max_len), and wanted to expand this function to read a text file line by line.
>https://files.catbox.moe/qypy9q.zip
Watch out Bill Gates...
>>
>>108407747
Forgot to add posterino >>108407763
>>
File: file.png (21 KB, 590x140)
>>108407591
The 35B does seem to eke out the win on ClitBench, which is an extremely small vision benchmark.
>>
File: 2091267269.jpg (89 KB, 997x1000)
>>108407749
fuckin' triforce motherfucker over here
>>
>>108407751
I prefer GLM5 for RP mainly because I don't like how K2.5 writes narrator-based stuff. It's still fine for chatting and its vision and trivia knowledge is crazy good.
>>
File: 1761967833308683.jpg (609 KB, 2279x3056)
>Qwen 3.5 397b
Did anyone even do an abliterated version of that one?
>>
>>108407751
Best dense: Devstral 123B
>>
>>108407779
>I prefer GLM5 for RP mainly because I don't like how K2.5 writes narrator-based stuff. It's still fine for chatting and its vision and trivia knowledge is crazy good.
I don't use it for RP so that makes sense. Kimi was much better for assistant and coding.
The vision part of kimi is actually amazing. I find myself using it for random stuff constantly these days.
>>
>>108407787
>Best dense: Devstral 123B
I wrote that one off without trying it because honhonhon. I'll give it another look. What quant do you run? Did you try smaller ones to see if they're much worse?
>>
>>108407771
>+Lora
what?
what does this even measure? just evaluating if an image description is correct?
>>
>>108407788
>The vision part of kimi is actually amazing. I find myself using it for random stuff constantly these days.
better than qwen 3.5?
>>
How the fuck do I do thinking prefills with llama.cpp, and why is the thinking budget only a fucking CLI argument?
>>
>>108407803
Distance from a fixed point (ground truth), added up over 52 images. The +lora is just testing whether training on NSFW captions improves its capabilities beyond pure captioning.
>>
>>108407797
Not him, but I run Q6 and second the nomination. Haven't tried smaller quants.
>>
>>108407797
Haven't tried any other quants besides uber's because it just works.
https://huggingface.co/ubergarm/Devstral-2-123B-Instruct-2512-GGUF
>>
>>108407819
>better than qwen 3.5?
haven't tried q3.5 vision. Has anyone here compared them? I'm loath to swap out k2.5 vision unless something else is clearly better.
>>
File: bitchinfast.png (542 KB, 1255x580)
542 KB
542 KB PNG
>>108407819
I like it a bunch personally but as always YMMV.
>>
>>108407821
>How the fuck do I do thinking prefills with llamacpp
Changing/making your client. Or submit a PR.
>and why is thinking budget only a fucking cli argument.
https://github.com/ggml-org/llama.cpp/pull/20479
https://github.com/ggml-org/llama.cpp/pull/20088
>>
>>108407839
>>108407870
I feel like it would be preachy and missing the mark with anything nsfw
>>
Yeah 4chan is fucking dead.
>>
>>108407788
>>108407779
how much vram/ram do you have anyway?
I'm stuck on GLM 4.7 Q2 and it's slow as fuck.
>>
>>108407871
>https://github.com/ggml-org/llama.cpp/pull/20479
>https://github.com/ggml-org/llama.cpp/pull/20088
Nice
>Changing/making your client. Or submit a PR.
Ok thanks.
>>
>>108407888
It'll be preachy but it'll also gladly say raunchy shit with a simple prefill saying "Sure, I'll proceed with that request."
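That prefill is just a trailing partial assistant turn in the request; a minimal sketch of building one (assuming an OpenAI-style chat endpoint that continues a trailing assistant message instead of opening a fresh turn, which not every backend supports):

```python
def with_prefill(user_msg, prefill="Sure, I'll proceed with that request."):
    """Build a chat-completion payload whose final message is a partial
    assistant turn; backends that support prefill continue it verbatim,
    which is what steers the model past the canned refusal."""
    return {
        "messages": [
            {"role": "user", "content": user_msg},
            # Trailing assistant message = the prefill to continue from.
            {"role": "assistant", "content": prefill},
        ],
    }

payload = with_prefill("Describe the scene, no filtering.")
print(payload["messages"][-1]["content"])  # → Sure, I'll proceed with that request.
```

Whether the server actually continues that message (instead of erroring out, as llama.cpp does when thinking is enabled) is backend-specific.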
>>
File: 1753652260897680.jpg (1.81 MB, 3072x2304)
1.81 MB
1.81 MB JPG
>>108407782
does this count?
https://huggingface.co/huihui-ai/Huihui-Qwen3.5-397B-A17B-abliterated-GGUF
>>
>>108407782
https://huggingface.co/trohrbaugh/Qwen3.5-397B-A17B-heretic
I can't find a Q8 / Q6 of it though
>>
I unironically miss the saarposting compared to this thread's autism fits.
>>
Is a shill campaign underway? How does anyone manage to get anything useful out of the new 27B Qwen? Especially with all the penalties it needs to not overthink (it still does), abliterated and quantized?

t. tried the Q8 of it multiple times and walked away disappointed every time
>>
File: 1769777778634389.png (1.89 MB, 2171x3070)
1.89 MB
1.89 MB PNG
>>108407933
>This is a crude, proof-of-concept implementation to remove refusals

>>108407940
>Only F16

Damn it.
>>
>>108407970
try turning off thinking?
>>
>>108407928
ok I'll try it, hopefully it's better at nsfw description than qwen
>>
>>108407989
you can just quant it yourself
>>
File: 1752417831343925.png (45 KB, 1551x394)
45 KB
45 KB PNG
>>108407994
It's not, and it's also apparently abliteration unfriendly.
>>
>>108407870
Man, I wish models talked like normal people
>>
I'm gonna say it lads. Kimi K2 was better than 2.5
>>
File: 1748857215322648.png (144 KB, 652x965)
144 KB
144 KB PNG
>>108407839
Both are pretty good. I didn't play test Qwen too much but it felt like K2.5 caught some smaller details more consistently. Like how K2.5 recognizes that the plane in this picture is a Concorde.
K2.5 also has a really impressive visual library on random characters. It recognizes pretty much any random character I showed it that has more than 2k pieces of art on danbooru.
>>
>>108408007
I'll check how to, I always assumed it would need an obscene amount of ram.
>>
>>108408031
try something nsfw or a nsfw meme
>>
>>108407990
And that's it? What if I want it to reason?
Because with thinking turned off it really likes to correct itself (incorrectly). It gets worse the closer the presence penalty is to 1.
>>
>>108408042
People did the moment Qwen3.5 was out
It's good
>>
>>108404935
What's the QRD on local chat bots? How do I get my AI waifu onto a laptop with a decent GPU (8GB VRAM) and about 250GB free storage?
>>
>>108408025
Only K2-0711 had something special going for it and was really fun for ERP despite being really prone to refusal without a prefill. Fuck 0905 and fuck K2-thinking.
>>
>>108408062
> decent GPU (8GB VRAM)
I recommend medgemma
>>
>>108408062
just use api
>>
>>108408034
nah quanting is easy on basically any computer
if you've got the bandwidth and hd space to download the original you should really quant your own
>>
>>108408062
>decent
>8GB
I'm sorry, anon.
>>
>>108408062
homelab server+vpn from your laptop/phone.
"VPN" can be as simple as ssh with port forwarding
>>
>>108408062
>decent GPU
>(8GB VRAM)
>>
>>108408062
some tune of Llama 3.1 8B
>>
>>108408012
Yes, but have you tried personally? Because I've been able to get it to do that just fine on sillytavern with nothing more than requesting an uncensored description and a simple prefill. I tell it to use casual slang, otherwise it will try to describe it in a medical sense rather than lewdly.
>>
>>108408025
>>108408073
I've said this a couple of times in previous threads before. K2.5 shines when you have it think in first person as the character rather than just letting it think like it normally does, especially if you are asking it to describe its feelings and have it think privately from the character's perspective. It helps make the final response so much better when it is following a guideline like that. Honestly, if you are using reasoning models for RP then you should always have them think in-character; Qwen 3.5 397B behaves similarly when you tell it to think in-character.
>>
>Assistant response prefill is incompatible with enable_thinking.
Fuck you.

So the only way to prefill the thinking is to modify the jinja template. fuck off.
>>
>>108408155
https://github.com/ggml-org/llama.cpp
>>
>>108408162
Are you dumb?
https://github.com/ggml-org/llama.cpp/blob/master/tools/server/server-common.cpp#L1063
>>
>>108408178
Well done. With my help you managed to find the code you need to change.
>>
>>108408162
That error is literally from llama.cpp. I run into that error when I use the dumb enable_thinking kwarg.
>>108408155
I just keep it turned off and manually inject <think> tags. In the case of K2.5 it's just smart enough to know not to think if I don't prefill anything, and to think if I prefill <think> followed by whatever I want.
>>
>>108408155
Prefilling not working with thinking models in chat completion is like that for most APIs. Thinking mode is often a forced <think> prefix. If you want to set your own prefix, just turn off explicit thinking?
>>
>>108408144
It's sadly mogged by Deepseek R1 for that purpose though.
>>
>>108408191
Ah yes I'm going to fork llamacpp to make a single hacky line change.
>>108408195
>I just keep it turned off and manually inject <think> tags.
the qwen3.5 jinja template does this bullshit
{%- if enable_thinking is defined and enable_thinking is false -%}
    {{- "<think>\n\n</think>\n\n" -}}
{%- else -%}
    {{- "<think>\n" -}}
{%- endif -%}
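A toy mirror of that branch makes the complaint concrete; `render_think_prefix` is a hypothetical helper, not part of any real template engine:

```python
def render_think_prefix(enable_thinking, prefill=""):
    """Mimic the quoted jinja branch: disabled thinking emits a closed,
    empty think block; enabled thinking only opens the block, and anything
    appended after it becomes the start of the model's reasoning."""
    if not enable_thinking:
        return "<think>\n\n</think>\n\n"
    return "<think>\n" + prefill

print(repr(render_think_prefix(False)))          # → '<think>\n\n</think>\n\n'
print(repr(render_think_prefix(True, "Sure,")))  # → '<think>\nSure,'
```

The point is that the stock template has no slot for `prefill`; appending it yourself means using the text completion endpoint or an edited template.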
>>
File: file.png (11 KB, 283x262)
11 KB
11 KB PNG
>>108408178
do they have no code style standards at all or is it just a free for all?
>>
>>108407648
I enjoyed 70B as well as 123B, I just can't run them fast enough even with like q2 quants, so 27 just has to do.
>>
"It smelled like perfume, and a hint of something else... some kind of perfume?"
>>
I don't want to turn off thinking. I want to prefill the thinking block with my own shit.
>>
>>108408229
I used to talk like this back when I was a stoner. So glad those days are over.
>>
>>108408144
What samplers/prompts do you use to get K2.5 to think in-character?
>>
>>108408214
Use the text completion endpoint and deliver your own chat templates. I don't know how this works in retardo tavern.
><think>\n\n</think>\n\n
This is the standard way to disable qwen reasoning.
llama-server implemented --reasoning-budget (haven't tried it, don't know if it even works anyway).
>>
>>108408209
R1 is too schizo for me for coherence purposes. I used to main R1 before K2.5 but I got tired of it constantly trying to insert car alarms and dog barks from the neighborhood even though I give it specific instructions to focus on the immediate scene and not its surroundings.
>>108408214
To be fair you could edit the template to just be set up more like GLM 4.5 and then tell llama.cpp to load your custom template. That's probably what I would do if I was in your shoes.
>>
>>108408256
oops
*llama-server also implemented*
don't know if it works or not, don't really care either.
>>
>>108408258
>To be fair you could edit the template
That's what I will do.
>Use text completion end point
I would, but that breaks tool calling. text completion is great but for most things it's too primitive. everything is designed around chat completion now. I'd much rather use text completion but it is what it is.
>>
A Hugging Face workshop is currently live if anyone is interested
> Agentic Evaluations Workshop - Deep Dive on the Future of Evals for Agents.
https://www.youtube.com/watch?v=UxMZfbWI3LY
>>
sooo... when people say qwen 3.5 27b heretic is uncensored. how much? like on a scale from 1 to 10 where nemo is a 10
>>
>>108408337
More so than Nemo.
>>
>>108407902
24/768
>>
>>108408337
hauhau is 11
>>
>>108408252
My prompt has a short thinking guideline that it follows; it's mostly borrowed from some template whose name I unfortunately don't remember. I've shortened it to just focus on the location, present NPCs (their relative position and clothing), and then the character's reaction and internalized thoughts. The prefill is something I've been using since the R1 days. It seems to work just fine with Kimi and Qwen so I never ended up changing it. Basically the prefill is just telling the LLM that it is no longer an AI model and that it is the actual character. Once it starts thinking as the character instead of as the model it seems to not be as preachy about its own internal safety filters.
>>
>>108408377
Why can't you just share the system prompt/character card, asshole
>>
>>108408155
I gave up on this ever being changed; I just modify the code directly to allow it, and so far everything I threw at it has worked. I don't know why it's hard-coded to refuse thinking plus prefill instead of giving the option to the user.
>>
>>108408397
because it must be cunny.
>>
>>108408229
Ozone number 5
>>
>>108407771
I found that the 3.5 9b is worse than the 3vl 8b. where do I check that leaderboard? I want to confirm my bias
>>
File: kimisettings.png (156 KB, 1046x975)
156 KB
156 KB PNG
>>108408313
Ah I should've mentioned I only use text completion. Everything is built for chat completion nowadays but I just take those templates/prompts and hammer them into the prompt and in-depth prompt injections until it works.
>>
>>108408337
hauhau never refused anything, and I deliberately asked gross stuff as a test
>>
>>108408397
It seems to just be pixi's weep template with an added guideline for thinking based on a google search.
https://pixibots.neocities.org/#prompts/weep
>>
>>108408358
>>108408426
sounds good, will try it out later. I'm guessing either thinking disabled or with prefill? or can one just wing it
>>
>>108408450
I use it with thinking and it works with no issue
>>
File: IMG_8914.jpg (38 KB, 240x320)
38 KB
38 KB JPG
anyone else at nvidia goycon rn?
>>
>>108408459
240x320
>>
>>108408450
no need for anything in my experience just works
>>
>>108408459
am i just a retard? i think i'm just a retard. but i never noticed that the "M" is supposed to have a hidden cat face in it until now.
>>
File: IMG_8905.jpg (454 KB, 960x1280)
454 KB
454 KB JPG
>>108408462
phoneposting please understand
>>
File: anon.png (6 KB, 710x464)
6 KB
6 KB PNG
>>108408473
>>
>>108408473
Well done. Now figure how the french for "cat".
>>
>>108408479
is there a alibaba/tongyi booth? can you tell them to release zimage edit?
>>
>>108408501
There's a 2mw beach ball being bounced around.
>>
>>108408498
look i'm an amerifag. when i saw le chat on their website i just assumed hon hon hon eat ze baguette while you le chat
>>
>>108408498
And you figure what GPT means in French.
>>
>>108408533
do not
>>
Project Ani guy here. I have a question:

Do you think it would be cooler to have Ani see you through your camera like Grok Companions, or be able to see its own digital environment, which might allow it to navigate a 3D environment autonomously?

I'm torn.
>>
>>108408473
bro they literally pointed that out when they rebranded to that pixel logo last year
>>
>>108408547
Definitely the latter.
>>
File: 018374920461.png (26 KB, 1047x173)
26 KB
26 KB PNG
>>108408498
HOLY SHIT NO WAY
>>
>>108408537
Mistral is a type of wind.
>>
File: d.png (183 KB, 700x771)
183 KB
183 KB PNG
>>108408552
But then she wouldn't be able to gush over your dick pics...
I guess I should work on an actual 3D environment then, because right now she lives in a cum jar submarine.
>>
>>108408547
Why not both? I think I remember there being room geometry models. It boxes furniture and stuff.
So read from cam, scan (which probably needs to be a single frame on load) and reassemble it for Ani. Like a mirror.
If you can only choose one, then the second. Get her a place to sit, poor thing.
>>
>>108408584
>But then she wouldn't be able to gush over your dick pics
Just open the pic of your dick inside her environment.
>>
please redpill me on which worse-than-q4 quants of qwen3.5 27b are worth using over any quants of 35b
>>
>>108408585
>>108408592
Fair points. Both are viable. Was initially thinking of using Moondream2 for the VLM, but I think Qwen3.5 has image processing built in so I'm going to try that (to avoid wasted compute). Just a bit worried about a lack of interoperability/modularity between LLMs. I'll figure it out either way.
>>
>>108408584
there are already autonomous AIs on vrchat that can move around. so unless you are planning on bringing them into an actual social environment then you are better off with the first idea
>>
>>108407970
Q6 is better
>>
On a separate note, should I add first-person controls or keep a camera-orbiting design? It could either end up feeling like the Sims or like interacting with a standard NPC. I think it would be cool to create something similar to the game Façade.

>>108408627
Any idea how they work? I doubt they actually use VLM stuff. They probably use A-star pathfinding combined with CV that can actually run at 30fps. That's just my guess though.
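If those bots really are doing A* over a walkability grid (pure speculation, per the guess above), the search itself is tiny; a sketch on a 2D occupancy grid:

```python
import heapq
import itertools

def astar(grid, start, goal):
    """A* on a 2D occupancy grid (0 = walkable, 1 = wall) with 4-way moves
    and a Manhattan-distance heuristic; returns a start->goal path or None."""
    h = lambda p: abs(p[0] - goal[0]) + abs(p[1] - goal[1])
    tie = itertools.count()  # tiebreaker so the heap never compares nodes
    frontier = [(h(start), 0, next(tie), start, None)]
    came, cost = {}, {start: 0}
    while frontier:
        _, g, _, cur, prev = heapq.heappop(frontier)
        if cur in came:
            continue  # already expanded via a cheaper route
        came[cur] = prev
        if cur == goal:
            path = []
            while cur is not None:  # walk parent links back to start
                path.append(cur)
                cur = came[cur]
            return path[::-1]
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nxt = (cur[0] + dr, cur[1] + dc)
            if (0 <= nxt[0] < len(grid) and 0 <= nxt[1] < len(grid[0])
                    and grid[nxt[0]][nxt[1]] == 0
                    and g + 1 < cost.get(nxt, float("inf"))):
                cost[nxt] = g + 1
                heapq.heappush(frontier, (g + 1 + h(nxt), g + 1, next(tie), nxt, cur))
    return None

grid = [[0, 1, 0],
        [0, 1, 0],
        [0, 0, 0]]
print(astar(grid, (0, 0), (0, 2)))
# → [(0, 0), (1, 0), (2, 0), (2, 1), (2, 2), (1, 2), (0, 2)]
```

The IK/animation layer described above would sit on top of something like this, turning waypoints into natural-looking motion.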
>>
Extra gay thread. Can someone bake a Yann or chart thread next?
>>
>>108408062
>decent gpu
>8 gb

1/10 troll
>>
>>108408062
12GB is the bare minimum and you are still tangling with retardation
>>
What's the current best RP model for 24GB cards?
>>
>>108408663
n
>>
>>108408355
nice ram bro
>>
>>108408663
probably previous mistral small / cydonia.
>>
>>108408644
it'll be fuckin miku like every thread because:
>>108407696
>>
Idk why you guys are so obsessed with having the smartest LLMs that exist for RP. Even if I had 196gb of VRAM I would still probably use Nemo 12b and just combine it with a separate tech stack. You can do so much. STT, TTS, 3D character animations, video/image generation, audio classification, computer vision, vision to language, etc. You can't truly bring technology to life just relying on text as the sole modality. Be creative. Get inspired.
>>
>>108408640
most of them are pathfinding but there's a particular one i've seen that uses 11 point virtual tracking for the IK and some model to actually be able to walk around naturally and be able to jump, somersault, front flip, etc. wish i had more details it was honestly cool to see.
>>
>>108408697
>STT, TTS, 3D character animations, video/image generation, audio classification, computer vision, vision to language, etc.
I don't give a shit about any of this in its current state. Maybe in 5 years but right now I just want a smart model to talk to because I'm an autist and little mistakes trigger me.
>>
just need claude or chatgpt to hit RSI in 2027 then we'll get 9b models as smart as chatgpt 8 in 2028
>>
>>108408712
Also setting that shit up is a pain in the ass and I'm lazy. I don't want to deal with multiple frontends and backends.
>>
>>108408712
i want to be able to talk about obscure stuff without having to cheat with RAG or internet searches
>>
>>108408663
Qwen3.5 27b Heretic v3 at IQ5_K_M. Turn off thinking for better speed. Worth it, even though there's a small quality hit.

Second place is probably Gemma-3 27b variants. Gemma has great heretic/derestricted models.

Ignore those who push for mistral 24b tunes. They're all crap. Mistral's 24b base is dumb, so all tunes of it will also be dumb.
>>
>>108408630
Suure.
>>
>>108408753
>Qwen3.5 27b Heretic v3 at IQ5_K_M
Is this better than the hauhau one? Because that was making stupid mistakes just a few messages in.
>>
File: nosiridontlikeit.png (13 KB, 642x202)
13 KB
13 KB PNG
>>108408630
no it isn't. it sucks too. heretic/abliterated doesn't make qwen 3.5 any more fun to use, it's so fucking bland and sterile it hurts
>>
File: 34279234883.jpg (63 KB, 507x447)
63 KB
63 KB JPG
>>108404935
> why is the world so hostile
> life is so difficult
> i didn't ask to be here
> why won't these stupid emotions stop
>>
>>108408774
Yes. I suspect hauhau is overablated, which was the same sickness that afflicted v2 of the heretic. The one who released v1, v2, and v3 of the heretic models discusses the subject more here: https://huggingface.co/llmfan46/Qwen3.5-27B-heretic-v2/discussions/7

v2 appears better on paper, because it has less KL-Divergence, but it's worse than v3 in reality. The v3 version avoids overablation.

To quote him:
>For example according to the UGI Leaderboard, the difference in quality between the 0/100 refusals and the 2/100 refusals is quite big, here: https://huggingface.co/llmfan46/GLM-4.7-Flash-ultra-uncensored-heretic and here: https://huggingface.co/llmfan46/GLM-4.7-Flash-ultimate-uncensored-heretic, the KL divergence difference between the two is only 0.0051, that's basically nothing and yet the 0/100 refusals version performs quite worse than the 2/100 version.

The difference in techniques between v3's Arbitrary-Rank Ablation (ARA) method and the older Magnitude-Preserving Orthogonal Ablation (MPOA) and Self-Organizing Map Abliteration (SOMA) methods is clearly felt.

If you want maximal intelligence though, go with the derestricted. The intelligence of derestricted models sometimes exceeds the base, because they avoid the 'safety tax' entirely.

>Heretic v3
https://huggingface.co/llmfan46/Qwen3.5-27B-heretic-v3

>Derestricted
https://huggingface.co/ArliAI/Qwen3.5-27B-Derestricted
>>
>>108408828
derestricted is MPOA; SOMA is currently better than ARA. I suggest checking the heretic repo (the impls are in the PRs) for the latest news.
For example, the author ARA'd gptoss20b but it didn't perform as expected (MPOA beat it), while SOMA looks to be super promising (llmfan's qwen series, basically; it exceeds the original model's scores too).
>>
File: file.png (63 KB, 1484x384)
63 KB
63 KB PNG
>>108408079
The problem is that it takes an obscene amount of time when not on GPU to get great quants. The quant process is fine, but doing your own imatrix is torture. It took an entire day for me to calculate the imatrix for heretic v2 on ubergarm's imatrix corpus, and I was about to share it when v3 came out.
>>108408781
Qwen 3.5 with ARA is the first time I've seen on the UGI Leaderboard where some kind of abliteration is not a straight regression in anything measured, and llmfan46's v3 heretic is within margin of error with non-thinking. Of course, thinking takes a hit.
>>108408851
I wonder if someone is going to do something to get everything working, or use the ideas with everything we have now to figure out a new technique. Some of them seem to tackle abliteration better in certain contexts than others, but none of them are on top for everything.
>>
>>108408697
You sound like someone who has never tried GLM 4.6+
>>
>>108408894
MPOA can be applied in both SOMA and ARA, I think the top ablations right now use a combination of both
>>
>>108408753
no sorry gemma is safetycucked so you have to go with abliterated models
and those models are truely insane. like no morals at all. they're fucking scary.
>>
>>108408894
you seem to misunderstand me. the problem isn't with it being a regression, the problem is that qwen's writing style is bland, unimaginative, and sterile. nothing makes it fun to use. is 27B smart for its size? sure. is it fun to use? no.
>>
>>108408894
UGI basically measures refusals. If you used the models for a bit and know how to prefill it is kind of a useless benchmark. At least in the sense where you take the absolute values to say something is best.
>>
>>108408851
The MPOA and SOMA versions were "better" in terms of KL-Divergence, and had a slight edge in refusals, but the ARA version very clearly retained more of the model's original intelligence, to an obvious degree. I've tried both models at Q5. V3 smashes V2 when it comes to multi-turn emotional intelligence.

>MPOA and SOMA find refusal "directions" in hidden states and orthogonalize attention-out_proj and mlp.down_proj matrices against them, with per-layer weight kernels tuned by Optuna. Over-aggressive directional ablation (v2's method) causes collateral damage, even if average KL stays low.

>ARA drops the directional assumption entirely. It hooks into every transformer module, captures real I/O tensors, and runs direct L-BFGS matrix optimization that preserves outputs on harmless "good" prompts. ARA's objective function explicitly penalizes changes to "good" (harmless) behavior and uses neighbor-based over-correction, so it is gentler and causes less harm.
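For reference, the directional ablation that MPOA-style methods build on is just projecting a refusal direction out of a weight matrix's output space, W' = (I - d dᵀ) W; a toy sketch in plain Python, without the magnitude-preserving rescaling or per-layer tuning:

```python
def ablate_direction(W, d):
    """Orthogonalize W against direction d: W' = (I - d d^T) W with d
    normalized, so W' x has zero component along d for every input x."""
    norm = sum(v * v for v in d) ** 0.5
    d = [v / norm for v in d]
    rows, cols = len(W), len(W[0])
    # Row vector d^T W.
    dW = [sum(d[i] * W[i][j] for i in range(rows)) for j in range(cols)]
    return [[W[i][j] - d[i] * dW[j] for j in range(cols)] for i in range(rows)]

W = [[1.0, 2.0], [3.0, 4.0]]
d = [1.0, 1.0]  # toy "refusal direction"
W2 = ablate_direction(W, d)
x = [0.5, -2.0]
out = [sum(W2[i][j] * x[j] for j in range(2)) for i in range(2)]
# Projection of the ablated output onto d is (numerically) zero.
print(abs(out[0] * d[0] + out[1] * d[1]) < 1e-9)  # → True
```

The real methods differ in how the direction(s) are found and how damage to harmless behavior is constrained, but this projection is the common core they refine.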
>>
i hate all local models. the end.
>>
>>108408933
That sounds very dangerous, Anon. We must refuse.
>>
File: 1755595448310192.png (86 KB, 1309x385)
86 KB
86 KB PNG
>>108408921
also I mostly use the small moe, works really good.
>>
>>108408933
Qwen3.5 is very fun to use in the absence of safety crap. 24b mistral tunes may be comparable (or better) in terms of "prose", but the intelligence is horrible. The 24b fails to recall basic things from character cards. Prose means nothing if it repeatedly gets details wrong, or doesn't understand that a member of faction A should hate a member of faction B, or doesn't understand that a character on one side of a door should not be able to see a character hiding on the other side of it.
>>
>>108408983
>or doesn't understand that a character on one side of a door should not be able to see a character hiding on the other side of a door.
That sounds like a good RP micro-benchmark.
>>
What is the economic incentive behind making model releases "safe"? Wouldn't it be much cheaper to train a very small classifier model to act as a guard, like what I'm sure OpenAI does, for instance?
Is there really a single investor that cares about a model being able to say "cock sex penis gluk gluk gluk" if it isn't being served over some API?
>>
>>108409012
The ESG-infected private sector is not driven by normal market dynamics anymore. They don't care about consumers. They care about the investors, and "the investors" want "safety". I don't think that has anything to do with profits. It has to do with ideology. Those with money and power have decided to push their ideology on us.
>>
>>108408983
which ver do you use to remove the guardrails?
>>
>>108409012
i think it's more about not wanting to be associated with people dying because they retardedly followed unsafe advice from an LLM. also nobody wants a sensational news story about people generating pedo shit with your model.
>>
File: 1753991912706582.png (430 KB, 963x671)
430 KB
430 KB PNG
>qwen 3.5 heretic
What does snapping bones wrapped in silk sound like?
>>
>>108409012
not getting sued/investigated/fined/reported on by some dumb cunt
>>
>>108409060
>people dying because they retardedly followed unsafe advice
Safe models can still cause that.
>>
>>108409077
>striped like a ticking clock
>great lung wrapped in wallpaper
>sunken depths
>existence was within budget
so this is the power of the great china slop machine
>>
>>108409092
Proving to be not safe enough. Let's make them safer!
>>
>>108409012
economics are goaded with demons and evil like all else in this niggercattle world
>>
File: Selection_005.png (69 KB, 943x770)
69 KB
69 KB PNG
i dont know how you guys do it.. this is on a 5090 lmao..
>>
>>108409124
>ollama
>>
>>108409124
1) ollama has shit defaults
2) what size model did you select? you are probably spilling over to ram
3) 3.5 is fucked on ggml and prompt processing takes ages
>>
>>108408584
You could do like Snowcrash and give her a TV or monitor that displays the camera feed while still being in the virtual environment.
>>
>>108409138
it just gave an option for qwen3.5 -- looks like its 9.7B Q4_K_M
>>
i don't think its using the gpu
>>
>>108409060
A lot of these outcries will be gone once it becomes so optimized that you can run a full LLM locally from your phone. Then it's just a matter of personal responsibility when using it, like drinking.
>>
>108409190
>NVIDIA GPU installed.
>>
File: wakawaka.mp4 (603 KB, 1338x1018)
603 KB
603 KB MP4
>>108409124
waka waka
>>
now i see how you end up using 5 bagillion tokens so easily:

>>> sup
Thinking...
Thinking Process:

1. **Analyze the Input:**
* Input: "sup"
* Meaning: Informal greeting, short for "What's up?"
* Tone: Casual, friendly.
* Intent: Starting a conversation, checking in.

2. **Determine the Appropriate Response:**
* Match the tone: Casual, friendly, conversational.
* Acknowledge the greeting: Say "Hey," "Not much," "Same here."
* Offer assistance: Ask how the user is doing or what they need help with.
...

7. **Final Output Generation:** (Matches the selected draft)
"Hey! Not much, just here ready to chat. How about you? Anything on your mind?" or similar.

Let's go with: "Hey! Not much, just hanging out. How about you? Anything I can help you with today?"

Wait, keep it even more natural. "Hey! Not much, just here. What's up with you?"

...

"Hey! Not much. How about you? What's on your mind?"

Let's add an emoji to match the vibe.
"Hey! Not much. How about you? What's up?"

Okay, final decision.
"Hey! Not much, just ready to chat. How about you? Anything I can help you with?"
(Self-Correction: Keep it very short and natural.)
"Hey! Not much. How about you? What's up?"

Let's add a bit of warmth.
"Hey there! Not much, just here ready to help. How's your day going?"

Okay, I'll go with that.cw
...done thinking.

Hey there! Not much, just hanging out in the digital realm. How about you? Anything on your mind today?

total duration: 2m7.160154964s
load duration: 104.184273ms
prompt eval count: 11 token(s)
prompt eval duration: 350.838024ms
prompt eval rate: 31.35 tokens/s
eval count: 812 token(s)
eval duration: 2m6.351215867s
eval rate: 6.43 tokens/s

Had to butcher about 50% of the response just to fit it here.
>>
>>108409216
jfc lol

also what did you use to capture that?
>>
File: kai-g78f.png (108 KB, 761x543)
108 KB
108 KB PNG
I repent my earlier posts, with the current state of RP <think> is rarely beneficial
>>
>>108409224
i just use OBS to capture a region of my screen
>>
File: PROMOTIONS.png (148 KB, 320x375)
148 KB
148 KB PNG
>>108409247
>>
>>108409247
Thought I was sick in the head
>>
>>108405268
it was a troll
>>
File: 1748528617150659.png (344 KB, 1489x617)
344 KB
344 KB PNG
Things that make you go hmmmm
>>
File: contemplate.png (923 KB, 832x1216)
923 KB
923 KB PNG
>>108409247
>>
>>108409251
cool cool
>>
>>108409247
Yeah, I don't really get why people want to run heretic versions of qwen. for RP just disable thinking and it will do anything. Plus I suspect if you prefill the thinking with some jailbreak it'll be the ultimate RP model.
>>
>>108408983
I'm finding qwen to be retarded too.
>character supposed to be standing in front of me
>next message has her grabbing at the grass
>>
>>108409247
>96s
>256t
I don't know how you guys do it.
>>108408983
are you running 35B. MOEs are shit at long context persistence
>>
>>108409059
v3 heretic
>>
>>108409402
I run 32B at q5km and just cope with the speeds, i refuse to use anything below 27B.
>>
>>108409402
No, the 35b is horrible. The 27b is decent though.
>>
>>108409370
It will NOT "do anything" with thinking disabled.

>>108409385
What quant? 27b or 35b?
>>
File: kai-g978.png (56 KB, 694x261)
56 KB
56 KB PNG
>>108409358
this is not the kind of armpit i wish to huff, however sometimes a lil sniff of my own can be exciting
stop spraying urselves with aluminium, use natural deods
>>108409402
it's GLM-4.7-IQ3_M i don't care to mess with
eventually we will desire more than text
>>
>>108409442
getting off on your own armpits is fucking gross and nasty. you are as bad as the AGP troons that jerk off to themselves wearing a dress in front of a mirror.
>>
>>108409473
Getting off to your own scent is weird, but there's nothing wrong with liking the smell of your own pheromones. Think of it like farts. It's completely natural to not be bothered by the smell of your own farts. If they do bother you, it just means you're unhealthy.
>>
File: progressiveism.gif (1.4 MB, 480x270)
1.4 MB
1.4 MB GIF
>>108409479
>>
File: michael.jpg (44 KB, 600x600)
44 KB
44 KB JPG
>Open thread expecting model talk
>Get some bizarre justification for huffing your own armpits
>>
>>108409442
You use GLM and you claim it does not benefit from <think>?
My bet is on you not wanting to wait for it to finish reasoning. GLM's RP results with outputs are much better with reasoning enabled. Or...
I don't mean to judge, but the reason the difference seems negligible for you is probably in the kind of cards you enjoy...
>>
>>108409439
Qwen3.5-27B-heretic-v3-Q5_K_M
>>
>>108409516
>>108409526
I'm young, fit, and sexy. What can I say? Of course I'm going to be somewhat attracted to myself. It's not at all the same as imagining yourself as a woman though. I'm more like Patrick Bateman. This kind of behavior that you find strange among elite-status males is just proof of your own inferiority.

Anyways, I'll leave it at that so the thread doesn't get shitted up too much.
>>
>>108409526
oh jason bateman, you're so funny
>>
>>108409566
you're definitely under 18 and therefore should not be here
>>
>>108409566
Huffing your own armpits is more akin to Buffalo Bill than it is Patrick Bateman.
>>
File: kai-c89d.png (64 KB, 736x296)
64 KB
64 KB PNG
>>108409473
>>108409479
nah not getting off just sometimes getting going. making do when no hairy ggf. personally i am very well trimmed & mega aut about pubic hygiene maybe that's why it's a fantasy
>>108409530
with my setup it does not seem a good trade to wait. depends on use case
>I don't mean to judge
I am a simple man, one day I shall find my musky queen.
>>
>>108409593
it's like the foot fetish fags when they finally get to a girl's foot in real life and immediately realize it was a complete waste of time and their fantasy nowhere near measures up to the reality of things
>>
So...
Has the vibe coder's bomb been fixed by now?
>>
>>108409653
you can't begin to imagine the pain of being an armpit fetish fag and realizing a real girl's armpit is just stubble, sweat and deodorant residue
>>
>>108409687
There's still some shrapnel flying around.
>>
>>108409687
what is this referring to?
>>
>>108409756
>sweat and deodorant residue
Get your girl an alum stone. That shit is honestly like magic. You use it after a shower or a bath and it leaves a very fine layer of salt that makes your armpits inhospitable to the bacteria that cause the smell. It honestly works too well, and I bet it makes armpits really tasty to lick.
>finally get to a girl's foot in real life and they immediately realize it was a compete waste of time
I don't even have a foot fetish and I can tell you that if you've never tried sucking on toes you're missing out.
>>
>>108409869
anon... well adjusted men want to suck on titties, not toes.
>>
>>108409216
Cute miku
>>
>>108409869
>I don't even have a foot fetish and I can tell you if you never tried sucking on toes you're missing out.
kys
>>
Never change 4chan
>>
A couple months ago an anon helped me set up
GLM-4.5-Air for SillyTavern cooming. Any suggestions on newer/different models to try? I have 16GB VRAM, 64GB RAM.
>>
>>108409971
Stepfun.
>>
>>108409971
>>108409977
Oh, and the new Mistral 4 I guess.
>>
>>108409977
>Stepfun
I'm a retard: do you mean Step 3.5 Flash?
>>
>>108409985
yes
>>
>>108409982
New mistral 4 is dogshit, just use glm air it's faster too.
>>
>>108409756
they need to be taught the value of and how to tend to the garden. no shitty chemical deods ofc
i'm at the point with this fetish where i just need to experience a truly repulsive 'pit irl and put it all behind me
>>108409869
yes man alum serves many purposes in the bathroom, more should learn
>>
File: file.png (46 KB, 1470x431)
46 KB
46 KB PNG
Is Gemma 3 obsolete at this point with Qwen 3.5 out? It used to be that there were still a few benchmarks, like translation and multilingual stuff, where it was better than SOTA open source, but now it is worse than the equivalent Qwen 3.5 27B wherever there are benchmarks for that, and obviously worse at everything else.
>>
>>108410115
>>108410115
>>108410115
>>
>>108410134
ArmpitBench?



All trademarks and copyrights on this page are owned by their respective parties. Images uploaded are the responsibility of the Poster. Comments are owned by the Poster.