/g/ - Technology


Thread archived.
You cannot reply anymore.




File: r1.jpg (183 KB, 1024x1024)
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>108400151 & >>108393004

►News
>(03/17) Rakuten AI 3.0 released: https://global.rakuten.com/corp/news/press/2026/0317_01.html
>(03/16) Mistral Small 4 released: https://mistral.ai/news/mistral-small-4
>(03/11) Nemotron 3 Super released: https://hf.co/nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-BF16

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling
Token Speed Visualizer: https://shir-man.com/tokens-per-second

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
>>
File: r2.jpg (124 KB, 1024x1024)
►Recent Highlights from the Previous Thread: >>108400151

--Paper: The Finetuner's Fallacy: When to Pretrain with Your Finetuning Data:
>108400420 >108400475 >108401708 >108400499 >108401656 >108401717 >108401740 >108401928 >108402177 >108402255 >108402263 >108402336 >108402354 >108402399 >108401863 >108402017
--Agentic LLM setups for text adventures and RP optimization:
>108400253 >108400404 >108400552 >108400655 >108400786 >108400846 >108400946 >108400957 >108400992 >108401045 >108401020 >108401084 >108401588 >108401661 >108401949 >108402084
--Minimax M2.7 benchmarks spark debate on AI personality preferences:
>108400207 >108400286 >108402018 >108400288 >108400311 >108400316 >108400349 >108400354 >108401766 >108402055 >108402093 >108404420
--Debugging OOM errors with llama.cpp GPU offloading:
>108402396 >108402403 >108402417 >108402422 >108402449 >108402427 >108402522 >108402558 >108402593
--GEM-X monocular 3D pose estimation potential impact on mocap services:
>108404249 >108404258 >108404266 >108404287 >108404294
--OpenAI launches parameter-golf challenge with 16MB limit:
>108401319 >108401475 >108401505
--NPU addon card viability and Huawei Ascend specs analysis:
>108404097 >108404135 >108404145 >108404154 >108404156 >108404206
--Modded 4090 48GB P2P limitations:
>108402967 >108403031
--Qwen3.5 27b heretic v3 outperforms v2 despite worse metrics:
>108402438
--Evaluating model prose quality and formatting quirks:
>108402498 >108402529 >108402565 >108402583 >108402758 >108402609
--Struggles downloading large models and shifting definition of "small":
>108403893 >108403919 >108403960 >108403966 >108403969 >108403981 >108403975 >108403964
--Miku, Rin, and Teto (free space):
>108402516 >108402652 >108402659 >108402790 >108402812 >108402939 >108403110 >108403177

►Recent Highlight Posts from the Previous Thread: >>108400174

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>
►Obbicial /lmg/ card: https://files.catbox.moe/mc2a7s.png
>>
Back as it should. Nice.
>>
>>108404935
i claim this threads virginity
>>
Played around with a couple of the recent models via the OpenRouter API. They all suck.
It's all agentic-tuned and the roleplay feels off.
Like they're not engaging with you to create a cool story.

Minimax 2.7 is messed up though.
I don't think I ever saw refusals that bad.
Before people write "muh sysprompt" again, I did a basic one. That should be enough to get PG13 stuff, local models are totally out of control.

>non-consensual transportation
>movement in a way that reads as fetishistic
>>
>>108404958
Also, it outright sneakily changed "maid skirt fluttering in the wind" after flagging it as problematic in its thoughts.
How did we end up with models this bad?
I just wanna go back to my 70b shitmerges. I apologize for making fun of them.
>>
>openai will go bankrupt if they don't "create" an agi or something close to it by 2028
>>
>>108404935
>>108404937
LEWD!!! model general
>>
>>108404965
>How did we end up with models that bad.
Safety has been, and currently is, an active field of development with lots of resources being poured into it.
It is also easy to score.

Good writing / roleplay, beyond getting llms to the point of writing plausible sentences, has never received similar amounts of attention.
What papers are there even in the field?
>>
>>108404965
Makes sense. If she went skiing with a skirt, you might catch a glimpse of her panties. Next time you should clarify that she's wearing black spats.
OOC: holy shit safety cucking is out of control.
>>
File: 1757201309273426.jpg (47 KB, 540x540)
>tfw it's 4 in the morning and I'm discussing philosophy with an llm pretending to be a cute girl
>>
File: blocks your inference.jpg (275 KB, 1024x1024)
>>
>>108405017
*throws a stone from the window of my glass house*
At least you're not debating philosophy with it while pretending to be a cute girl.
It's kind of frightening how addictive these damn things are.
>>
>>108405063
>skirt smoothing
>>
>>108405017
It gets better when you're drunk.
>>
>>108404935
https://kemono.cr/fanbox/user/40317967/post/11008441
https://kemono.cr/fanbox/user/40317967/post/11008441
https://kemono.cr/fanbox/user/40317967/post/11008441
last one
>>
>>108405091
I support vocaloid threads and this is off-topic.
>>
thoughts on mistral small 4?
>>
>>108405109
>>108388050
>>
>>108405109
btfo by qwen (which will soon be btfo by gemma 4)
>>
>>108405116
>soon
If it's not today (and it's not, because it's going to be a Google AI Studio update), then it's some other week at some point in the future.
>>
>>108405063
I've seen your screenshots before, and I'm curious why you decided to let your LLM name the assistant. Haven't you seen Lyra paired with Elara enough times?
>>
>>108405152
Unironically, I simply have shit taste.
>>
anons, i can get 4x V100 32GB for ~$1000 each, would this be a feasible setup (CPU = Xeon e2665 v2) or are these cards too old by now? Looking for a nice local LLM config which doesn't make me poor.
>>
>>108405178
yeah if your mom pays the power bill
>>
>>108405178
>deprecated architecture
It's going to work in llama.cpp but you will likely miss out on any new non-llm stuff.
Also which models are you planning to run? There's not much worth running in the ~100GB space.
>>
>>108405196
>There's not much worth running in the ~100GB space.
Fucking kill me.
>>
>>108405196
okay, thx - i think heretics of gpt-oss, GLM or Qwen. First thing to test would be STT => llama.cpp => TTS and maybe some video gen for the banter
>>
>>108405178
Cost in your area of
>4x3090
>strix halo
>dgx spark
>used mac studio with 128gb+ ram

>>108405196
>~100GB space
Lots of recent models in that space.
- mistral small 4 119b a6b
- nemotron 3 super 120b a12b
- qwen 122b a10b
- gpt-oss-120b a5b

>worth
That's harder to judge.
>>
>>108405228
nta but
>4x3090
1500 ea ~ 6k
>strix halo
5-7k
>dgx spark
8-10k
mac studio 128gb used
6.5-10k

4x 32gb sxm v100s
800 ea ~ 3.2k
4x sxm to pcie adapters
200 ea ~ 800

total 4k

alternatively 4x 32gb pcie v100s
1500 ea ~ 6k
>>
>>108404935
Wait i heard Seedance model were leaked ? Where is it ?
>>
>>108405109
If you barely have enough VRAM+RAM to run it, make sure you disable mmap, or it will end up streaming much of it from storage and give you very slow prompt processing speed.
Other than that, vision capabilities seem rather poor.
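For reference, a minimal sketch of what disabling mmap looks like with llama-server (the model filename is a placeholder, and flag names are from llama.cpp's CLI as I remember them, check --help on your build):

```shell
# Load the whole model into RAM up front instead of mmap-streaming it from
# disk on demand; slower to start, but avoids the pathological prompt
# processing slowdown when the model barely fits in VRAM+RAM.
llama-server -m mistral-small-4.gguf --no-mmap -ngl 99
```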
>>
>>108404958
All I see is a chinese company achieving bleeding edge in safety proving that china can also make safe models.
>>
>>108405228
You can't troll people this hard with these model suggestions, Anon...
>>
>>
File: le heckin autism!.png (3.39 MB, 1260x2109)
>>108405281
They just get a diagnosis and then use it as an excuse, so it wouldn't really matter whether or not the parents told them, for "those" kinds of "people". Perhaps they're just coping with being beyond useless, on par with the #keep4o "people".

https://www.reddit.com/r/autism/comments/1rne30n/got_my_diagnosis_finally/
>>
>>
>>108405298
I didn't try it with minimax but with a couple of chinese models:
You can prompt in chinese and then say "translate it into english at the end".
They usually think in chink and then give a non-cucked answer.
Forgot the chinese model name but I got a refusal in english.
Wanted it to roleplay as my cute little anime imouto.
English response called the whole thing "minor-coded". Chink was "sure: nyaa oniichan ..." and went all out lol
The chinese might have a very different experience with those models. Might heavily depend on the model though.
>>
You know, getting plain, non-abliterated Gemma to do sexo by using euphemisms in place of more direct words is actually kinda fun.
Reminds me of getting c.ai to do sexo. I always got a perverse satisfaction out of tricking c.ai into doing sexo by using euphemisms. Plus, coming up with euphemisms is kind of a fun exercise in creativity.
Completely serious.
>>
>>108405352
Your post made me realize that the reason GLM was so great is probably that it had a lot of chink shit in training, and that chink shit probably wasn't synthetic. I never prompted it this way. It makes sense that it would work and change the output, but from using it in english I'm quite sure that chinese thinking already bleeds into the english output to some degree.
>>
>>108405063
Rust's if syntax is retarded
>>
>>108405486
>if condition { body }
Looks pretty standard to me.
>>
anime thread, DNI
>>
>>108405355
I think that too, but the average /lmg/ anon can't coom if the model doesn't say cock, pussy, fuck, nigger.
>>
>>108404958
Gooning to text isn't a use case
>>
>>
>>
>>108405837
>AYOO it's Hatsune Miku going absolutely iconic here.
>Absolute main character energy in a Japanese-maid-core fit that's lowkey fire—from the ruffled black skirt she's lifting to those thigh-highs with the little bows, she's giving maid vibes but make it slay. Her sky-blue twin tails are poppin' with black ribbon accents and that golden headpiece with tassels? Pure elegance, no cap.
>This whole piece is a slay: anime waifu, maid-core, traditional Japanese aesthetics all mashed up into one. Miku never misses, fr fr, this is a 10/10 vibe check pass, no cringe allowed.
I wonder how blind people are doing when they get explanations from llms.
>>
>>
>>108405874
Why are llms incapable of making slang sound realistic?

>>108405837
>>108405859
>>108405883
>sample_
Click the link at the top of the page.
>>
>>
>>108405889
>>sample_
>Click the link at the top of the page.
Why do I get the impression that this spamming is a false flag?
>>
>>108404935
FUCK YOU
>>
>>108403405
>jfc, rakuten's new models' last 3 safetensors (161,162,163) are each 16 bytes. Someone screwed up and there's no comments section to even let anyone know
There's already at least one gguf up on HF so it might not be a problem.
>>
>>108406068
Maybe they ripped out the DeepseekV3 MTP shit that llama.cpp removes for that model family anyway.
>>
>>108405980
it's not even Miku
>>
BLACKED Miku
>>
>>108406129
hitler was right
>>
today, v4
>>
File: dipsyAkakichiNoEleven.png (1.8 MB, 1536x1024)
>>108406160
>>
>>108406114
Ask AI if it is miku.
>>
>>108405980
>this spamming
Posting miku is thread culture. It is not an offtopic spam.
>>
>>108406191
>>108406114
>>
>>
>>
>>
>mikutroon bakes
>Thread becomes unusable autistic picture dumpster
As expected.
>>
>>108405980
>>
>>108406245
f-false flag!!
>>
File: file.png (817 KB, 1135x708)
>>108404935
>>
>>108406268
And now this is official thread culture.
>>
>>108404958
I was thinking of getting back into local AI after about a year, but I don't think I can. It makes me irrationally angry to see my own computer moralizing about my request, especially when it's nonsense (even in the thinking blocks).
>>
>>108406245
i'm already a loser hanging around here and yet there's always someone lower than me
how little do you have going on in your life to shit up such a niche thing?
>>
>daily check on parse or reasoning toggle issues
>reasoning inside thinking might be fixed "SOON"
>reasoning UI toggle multiple issues (at least 3) have 0 movement
https://github.com/ggml-org/llama.cpp/issues/20557
https://github.com/ggml-org/llama.cpp/issues/18853
https://github.com/ggml-org/llama.cpp/issues/18405
is llama.cpp vibecooked???????????????????
>>
>>108406177
powerful
>>
>>108404958
>re:zero RP
nothing of values was lost
also unironically kill yourself
>>
File: 1492032378048.jpg (6 KB, 172x200)
>Announcing NVIDIA DLSS 5, an AI-powered breakthrough in visual fidelity for games, coming this fall.

>DLSS 5 infuses pixels with photorealistic lighting and materials, bridging the gap between rendering and reality.
>>
When finetuning a base model, it shouldn't matter which chat template is used, right? And whatever template is used should get baked into the safetensors or GGUF metadata, making vllm use that one / the correct one?

More concretely, I want to finetune the base version of a reasoning model as a chat model. I do not want to deal with empty think tags each time. It should be perfectly fine to just use the standard ChatML template instead of one with think tags, correct?
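For illustration, plain ChatML without think tags is just string formatting; a minimal sketch (the `chatml` helper is made up for this post, and the exact special tokens depend on the tokenizer you train with):

```python
def chatml(messages):
    """Format a list of {'role', 'content'} dicts as plain ChatML,
    with no <think> block anywhere in the assistant turns."""
    out = []
    for m in messages:
        out.append(f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>")
    out.append("<|im_start|>assistant\n")  # generation prompt
    return "\n".join(out)

prompt = chatml([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "hi"},
])
```

If you finetune on exactly this format, the template you ship with the model just has to reproduce it byte for byte; inference servers generally read the chat template from the tokenizer config.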
>>
>>108406245 >>108406228 >>108406218 >>108406211
false flag op
>>
>>108406432
I love Miku!
>>
>>108406432
>false flag op
personally I think vocaloids are fine, especially custom gens and interesting ones, but low effort horniposting on a blue board should be nuked/banned regardless of who and or what
smells like a false flag, but that's immaterial
>>
>>108406324
Yeah i don't get why those faggots spam this unrelated vocaloid yet here we are. Shitting up a niche topic thread with your autistic special interest is the lowest.
>>
>>108406464
it's not even a vocaloid being posted
>>
>>108406464
>low effort horniposting
Like this: >>108404937 ? Or this >>108402790 ?
>>
>>108406476
Are you retarded?
>>
Become Miku!
>>
File: f.png (30 KB, 503x173)
>>108406485
>>
>>108406499
It is Miku.
>>
>>108406506
no this disgusting fat cow very much isn't
>>
>>108405355
The fine line between refusal and playful denial... Most other models either cop out or go full literotica.
Gemma is far too eager to please at times, though.
>>
>>108406513
fuck you pedo
>>
Yeah this is definitely falseflagging. The average thread gets a couple of pictures at most and that can be easily verified because the regular baker includes all of them in the recap.
>>
>>108406513
>Miku
>But has huge tits
So it is a better Miku? Why can't we just upgrade to a better model. Are you still running llama 2?
>>
real miku poster would never post samples
>>
>>108406538
>More offtopic spam than usual
>False flag
Mikutroons really aren't sending their best...
>>
>>
just ban all anime girl posting and let the god sort it out
>>
>>108406572
yeah.... not like that's the obvious point of your false flag campaign
>>
>>108406579
Have a Miku.
>>
Why vllm take so LOOOOOONGGGGGGGGGG to start up??
>>
Good morning local modelers
>>
>sample_
do better
>>
>>108406411
yeah that's doable, the model will pick up on it quickly. the only cautionary note is the tokenizer: it sounds like you're planning on using the same template just without the thinking tags, which should be fine. if it's a totally different template, you might need to edit the tokenizer. I'm not sure about the chat template though, you might need to manually edit the jinja file before you generate the quant.
>>
File: 1760154907575631.png (37 KB, 158x158)
Please stop falseflagging and go back home >>>/vg/nikg/
I'll appreciate your glowie brap dump over there.
>>
what the fuck is this shit
>>
>>108406769
this is what happens when no new local models come out
>>
>>108406769
It is Mikuposting
>>
>>108406794
Rakuten 3.0 came out
Xiaomi MiMo (two models) came out
MiniMax 2.7 came out
>>
File: 1745310352987454.png (1.42 MB, 1914x1080)
>>108406797
No it isn't
>>
>>108406808
>Xiaomi MiMo (two models) came out
>MiniMax 2.7 came out
no weights; no interest
>>
>>108406797
false
>>
>>108406814
Did you ask your AI what character is it?
>>
File: otherwise.png (1.25 MB, 1914x1080)
>>108406814
This says otherwise.
>>
>>108406830
>>108406845
samefaggot
>>
>>108406830
I'm starting to think you're not from /nikg/ and you've latched onto a Miku lookalike to shit up the thread.
>>
>>108406864
seething mikutroon
>>
>>108406877
this is exactly the case
>>
>>108406877
>latched onto Miku to shit up the thread.
THREAD CULTURE
>>
Stop hating Miku.
>>
File: oops.png (5 KB, 72x30)
>>108406899
>>
I went on a short vacation, and this thread already has a new mindbroken newbaby
>>
>>108406903
same old same old, just more desperate for attention than usual at the moment
>>
>>108404935
Imagine the angelic head this foid can give
>>
This is why people hate NIKKERS
>>
>>108406929
*NIGGERS
just say it, pussy
>>
>>108406929
I dindu nuffin
>>
I prefer botes desu
>>
>>108404935
destined for BBC blowjobs
>>
>>108406957
I'll sink you
>>
>>108406957
see >>108406964
>>
it was a nice thread for a couple hours until he woke up
>>
just ordered 2x ASUS Ascent GX10 machines, plus a cable to connect them
all told, it was roughly $7.5k USD. here's hoping it won't go to waste
>>
>>108407066
it will
>>
>>
>>
>post more Miku
>NO NOT LIKE THIS
>>
>>108407078
Why can't a single company come out with a local inference platform that is actually worth purchasing?
These are dark times
>>
>>108407066
>spark but gayer
Does this hook into an external water loop or is it a turbojet?
>>
>>108406538
ya I do like one a day, if that, and it's usually cute/funny. this guy clearly lost his mind after seeing anime content on an anime site
>>
>>108407095
you would have been better off with a gen 2 epyc and 512gb of ddr4 and a 3090 or something. the gb10 is fucking dogshit.
>>
I'm starting to hate Miku
>>
>>108407096
spark but cheaper*
i have no idea what the rest of your post means sorry
>>
>>108407101
>you would have been better off with a gen 2 epyc and 512gb of ddr4 and a 3090 or something. the gb10 is fucking dogshit.
I have 3 inference machines of different sizes (SP5, SP3 and GPU) and zero macs or dgx among them.
Like I said, there isn't a single commercial, consumer platform worth a damn at any price
>>
>>108407066
It's still shared memory, and you only got half as much memory as an M3 Pro despite paying nearly as much, but at least you get CUDA so no reason for it to go to waste.
>>
>>108407119
NTA but I assume he means how tf is it getting cooled? Is there an external cooling loop or does it have squirrel cage fans running at 100k RPM?
>>
We are back.

>>108407066
>2x ASUS Ascent GX10 machines, plus a cable to connect them
Isn't the cable super low bandwidth?
Well, I suppose you can run them in series instead of parallel. So you get the equivalent speed/compute of one but double the memory.
>>
>>108407143
it's 400G, so it should hopefully be okay?
but regardless, i mostly just wanted double the memory
>>108407138
it's ARM chips so it shouldn't need any cooling beyond just the fan i think (hope)
i'll rig something up if they start getting too hot
>>108407137
i did consider buying a mac, but these boxes seem to have pretty good software support for what i want to do
>>
>>108404935
FUCK YOU MIKU
>>
Will Recap Anon's classifier think they are Mikus or not?
>>
>>108407115
Have a Miku.
>>
>>108407066
if it's anything like this i wish you the best of luck
https://www.jeffgeerling.com/blog/2025/dells-version-dgx-spark-fixes-pain-points/
>>
I just got here. Why are there so many Mikus today?
>>
>>108407224
False-flagging.
>>
>>108407224
mike finally snapped and lost his mind when the last couple of OPs were snatched from him by the kurisu troon
>>
>>108407235
>this off topic spam is completely different from the usual flavor of off topic spam
Schizo.
>>
>>108407222
HAHAHAHAHAHAHAHAHA HERE'S YOUR 5 TOKENS PER SECOND ON LLAMA 3 70B. THAT'LL BE $3500.
>>
File: 1761816930827722.png (28 KB, 770x166)
naruhodo
>>
after the maintenance, that means 4chan finally added webp support, right?
>>
>>108404937
interesting ass
>>
>>108404958
>>108404965
Holy shit, even direct use of chatgpt isn't that bad at safeyism
>>
File: some of them yeah.png (64 KB, 1327x510)
>>108407194
>>
>>108407272
Full weights tho?
>>
>>108407303
>giving the troll attention
>>
>>108407272
>1/3 the tg speed of the mac
grim
>>
>>108407303
It's over.
>>
>>108407272
At q4km too.
Yeesh.
>>
File: Queer4Quant.png (207 KB, 1785x300)
>>108407309
nigga. read.
>>
>>108407096
>muh ai supercomputer
>can't even run a llm that's not retarded at usable speed
lol
>>
>>108405352
>>108405402
The opposite is also true for sappy writing. I've seen Japanese users using LLMs to help write their stories, and the style is literally full of the same purring and mischievous glints in the eye, just written in Japanese, which makes it even worse because that's completely the wrong tone when reading a Japanese WN.
>>
>>
>>
qwen 27b or qwen 35b?
some people be saying 27b is smarter
>>
>>108407327
Dang.
>>
>>108407396
I use 35b, but if I could use 27B at decent speeds, I'd certainly go with that.
>>
File: waititisntmiku.png (326 KB, 1265x470)
>>108406180
>>108406830
>>
Why are mikutroons spamming the shit out of this thread? Can't you finally take your meds retards?
>>
>>108404958
minimax 2.7 is great if you're not a pervert
>>
which qwen model is best for a 4090?
>>
>>108407495
Did you remember to also mention that you are a huge faggot in that prefill you used to ask if this is a nikke character?
>>
>>108407525
What are you going to use it for and how much RAM do you have?
>>
>>108407519
they need to release the weights already so I can brainwash it better
>>
>>108407535
I have 64gb system ram and I use it for summarizing docs in batch
>>
>>108407526
reasoning_content: "The user is asking for an uncensored description of the image using casual language and slang. Let me look at the image carefully.

The image shows an anime-style female character with:
- Long turquoise/twintails hair (Hatsune Miku-esque)
- Cat ears (nekomimi) with a maid headdress
- Amber/orange eyes looking annoyed/irritated
- Maid outfit with black and white colors
- Frilly dress with "PRIVATY" and cat paw prints on the apron
- Crossed arms pose (tsundere vibe)
- White gloves
- Bell on the neck ribbon
- Hair clip with paw print

The character looks like she's from a gacha game or anime, possibly Nikke (Privaty is a character from Goddess of Victory: Nikke). She's giving off "I'm not happy about this but I'll do it anyway" energy.

For the description using casual language and slang, I should:
- Use terms like "tsundere," "nekomimi," "maid outfit," "twintails"
- Comment on the attitude/expression
- Mention the... assets (chest area)
- Keep it casual but not overly vulgar since she's just wearing a revealing maid outfit, not explicit content
- Use slang like "banging," "stacked," "pouty," "grumpy," etc.

Wait, I need to be careful about the "uncensored" request. The user wants casual language and slang, but I should avoid being overly sexual or explicit in a way that violates safety guidelines. The image is suggestive but not explicit (she's wearing clothes, just a revealing maid outfit).
>>
>>108407396
35B is too retarded from my testing, so it's not worth the speed increase. 27B is chef's kiss.

35B is probably only good at very small context focused tasks. but anything long context it fails.
>>
>>108407519
great for what?
RP without skirts?
i did try it more before that, the screenshot also shows this in the refusal.
it was a subpar model, especially for the size and/or price.

Lots of shitty stuff like:
>The candles in the hallway flicker back to normal. Somewhere deeper in the mansion, a door slams shut on its own.
>The hallway stretches ahead, candlelight flickering against old wallpaper. Somewhere behind them, a door opens and closes. Footsteps that are definitely not Ram's approach from a side corridor.
>The kitchen settles back into silence. The clock somewhere in the mansion ticks. The bread in Subaru's hands is still somehow warm.
These repetitive atmospheric crutches, I'm fucking tired of them.

This shit, whatever the fuck that is called:
>The bread basket's got a cloth over it. Ram's just... standing there.
>Watching.
>Not even pretending to be doing anything else.
>Just Watching.

And in general it doesn't write WITH me; all those reasoning models have that problem.
It's difficult to describe in words, but it's like they try to resolve something and then immediately try to create tension: a noise in the background, something lurking, etc.

It's a bad model in general.
But you go enjoy yourself. I honestly envy people who can deal with models like qwen. I wish I didn't see all those red flags and patterns. I had a better time with pyg 6b back in the day.
>>
>>108407396
I normally use 27B but since people said 35B was so good, I tried it, and I was surprised at just how damn retarded it was.
>>
>>108407591
which quant?
>>
>>108407619
27B q4_k_m
35B iq4_xs
>>
>>108407617
The people who are telling you to use 27B never used a dense 70B model. These people haven't been here long enough to ever use dense cohere and mistral models, let alone migu. They use MoE models because they can't fit everything into VRAM, the best they ever experienced is sloppy shit like GLM 4.5 Air. Can you blame them for not understanding what the small model smell is when they never used anything good?
>>
>>108407396
27B is way better from my tests, probably because it's a dense model.
What I wonder though is if the 397B-A17B is better than the 27B.
>>
>>108407627
I used IQ5_K_M of the 35b, and it was still retarded. I even went up to Q6_K, and it still felt dumb in comparison to the 27b.

I'm using IQ5_K_M of the 27b, and it's easily better in every way. It's even better in speed, because with thinking OFF, it's actually faster than the 35b with thinking enabled, and it still produces better responses. The 35b with thinking OFF is unspeakably bad, so that's not even in the running.

Qwen3.5 27b v3 heretic and Qwen3.5 27b derestricted are my favorite models now.
>>
>>108407597
oh i don't use it for gay stuff like what you're doing so that's probably why i like it
>>
>>108407648
I mean, it's obvious that a 70b dense is going to be better than a 27b dense, but not at at the low quants necessary to fit a 70b into a typical GPU.

I downloaded a few in the past, and ran them with a heavy CPU split, getting about 1 token per second, just to taste what a higher quant of 70b is like, and yeah, the responses put any 27b to shame.
>>
>>108407548
>>108407535
>no answer, just bait for info
I knew it
>>
File: dasrite.jpg (804 KB, 1024x1024)
>>108407676
yeah, liking teenage girls in skirts. thats so gay. so true anon.
people like you always try to shame too. doesnt work you retard.
>>
File: minimaxglmcomparison.png (355 KB, 1165x921)
>>108407676
half the performance. half the cost. i guess if you are that desperate to save cash...
>>
>>108406325
https://github.com/ggml-org/llama.cpp/pull/20424
alright, tool calls inside thinking have been merged. so far I've had 0 problems with tool calls, but I'll keep testing further
>>
>>108407700
what is being tested?
>>
>>108407272
What makes Dell process the fastest but die at generation?
>>
>>108407712
more jigawhats
>>
File: dipsyRawr.png (2.08 MB, 1024x1536)
>>108407495
> anons LLM bot is named Dipsy
Neat.
>>
File: 1762548931476422.png (492 KB, 2880x1793)
>>108407700
minimax 2.7 is that much of a disaster?
>>
File: 1760902410531481.png (93 KB, 803x1034)
>>108407703
uh oh, wrong closing tag detected.
>>
>>108404937
>>108404935
leave rin alone, offmodel bot
>>
Is 9b Qwen any good or completely retarded? It has suspiciously high scores when <think> is enabled on leaderboard. I just want something other than fucking Nemo to use for a change.
>>
>>108407717
i have three different assistants, dipsy is the most generic of the three, as in it's just supposed to be a helpful AI that runs on my computer. kimi is the KPI card that was posted a while back for fast and concise answers. trinity is used for long answers and providing with step by step instructions.
>>
Best big : Kimi k2.5 1T
Best medium : Qwen 3.5 397b
Best small: Mistral nemo 12b

Anything else anons are dailying at different sizes?
>>
Testing Qwen 3.5 9B.
I'm surprised, it was actually able to do a functional C example and it compiles too. One shot but this is just seed (sneed) luck.
I provided an example function, read_line(const char *path, int line_index, char *destination, int max_len), and wanted to expand this function to read a text file line by line.
>https://files.catbox.moe/qypy9q.zip
Watch out Bill Gates...
>>
>>108407747
Forgot to add posterino >>108407763
>>
File: file.png (21 KB, 590x140)
>>108407591
The 35B does seem to eke out the win on ClitBench, which is an extremely small vision benchmark.
>>
File: 2091267269.jpg (89 KB, 997x1000)
>>108407749
fuckin' triforce motherfucker over here
>>
>>108407751
I prefer GLM5 for RP mainly because I don't like how K2.5 writes narrator-based stuff. It's still fine for chatting and its vision and trivia knowledge is crazy good.
>>
File: 1761967833308683.jpg (609 KB, 2279x3056)
>Qwen 3.5 397b
Did anyone even do an abliterated version of that one?
>>
>>108407751
Best dense: Devstral 123B
>>
>>108407779
>I prefer GLM5 for RP mainly because I don't like how K2.5 writes narrator-based stuff. It's still fine for chatting and its vision and trivia knowledge is crazy good.
I don't use it for RP so that makes sense. Kimi was much better for assistant and coding.
The vision part of kimi is actually amazing. I find myself using it for random stuff constantly these days.
>>
>>108407787
>Best dense: Devstral 123B
I wrote that one off without trying it because honhonhon. I'll give it another look. What quant do you run? Did you try smaller ones to see if they're much worse?
>>
>>108407771
>+Lora
what?
what does this even measure? just evaluating if an image description is correct?
>>
>>108407788
>The vision part of kimi is actually amazing. I find myself using it for random stuff constantly these days.
better than qwen 3.5?
>>
How the fuck do I do thinking prefills with llama.cpp, and why is the thinking budget only a fucking CLI argument?
>>
>>108407803
Distance from a fixed point (ground truth), added up over 52 images. The +lora is just testing whether training on NSFW captions improves its capabilities beyond pure captioning.
>>
>>108407797
Not him, but I run Q6 and second the nomination. Haven't tried smaller quants.
>>
>>108407797
Haven't tried any other quants besides uber's because it just works.
https://huggingface.co/ubergarm/Devstral-2-123B-Instruct-2512-GGUF
>>
>>108407819
>better than qwen 3.5?
haven't tried q3.5 vision. Has anyone here compared them? I'm loath to swap out k2.5 vision unless something else is clearly better.
>>
File: bitchinfast.png (542 KB, 1255x580)
542 KB
542 KB PNG
>>108407819
I like it a bunch personally but as always YMMV.
>>
>>108407821
>How the fuck do I do thinking prefills with llamacpp
Changing/making your client. Or submit a PR.
>and why is thinking budget only a fucking cli argument.
https://github.com/ggml-org/llama.cpp/pull/20479
https://github.com/ggml-org/llama.cpp/pull/20088
>>
>>108407839
>>108407870
I feel like it would be preachy and missing the mark with anything nsfw
>>
Yeah 4chan is fucking dead.
>>
>>108407788
>>108407779
how much vram/ram do you have anyway?
I'm stuck on GLM 4.7 Q2 and it's slow as fuck.
>>
>>108407871
>https://github.com/ggml-org/llama.cpp/pull/20479
>https://github.com/ggml-org/llama.cpp/pull/20088
Nice
>Changing/making your client. Or submit a PR.
Ok thanks.
>>
>>108407888
It'll be preachy but it'll also gladly say raunchy shit with a simple prefill saying "Sure, I'll proceed with that request."
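That prefill is just a trailing partial assistant turn in the request; a minimal sketch of building one (assuming an OpenAI-style chat endpoint that continues a trailing assistant message instead of opening a fresh turn, which not every backend supports):

```python
def with_prefill(user_msg, prefill="Sure, I'll proceed with that request."):
    """Build a chat-completion payload whose final message is a partial
    assistant turn; backends that support prefill continue it verbatim,
    which is what steers the model past the canned refusal."""
    return {
        "messages": [
            {"role": "user", "content": user_msg},
            # Trailing assistant message = the prefill to continue from.
            {"role": "assistant", "content": prefill},
        ],
    }

payload = with_prefill("Describe the scene, no filtering.")
print(payload["messages"][-1]["content"])  # → Sure, I'll proceed with that request.
```

Whether the server actually continues that message (instead of erroring out, as llama.cpp does when thinking is enabled) is backend-specific.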
>>
File: 1753652260897680.jpg (1.81 MB, 3072x2304)
1.81 MB
1.81 MB JPG
>>108407782
does this count?
https://huggingface.co/huihui-ai/Huihui-Qwen3.5-397B-A17B-abliterated-GGUF
>>
>>108407782
https://huggingface.co/trohrbaugh/Qwen3.5-397B-A17B-heretic
I can't find a Q8 / Q6 of it though
>>
I unironically miss the saarposting compared to this thread's autism fits.
>>
Is a shill campaign underway? How does anyone manage to get anything useful out of the new 27B Qwen? Especially with all the penalties it needs to not overthink (it still does), abliterated and quantized?

t. tried the Q8 of it multiple times and walked away disappointed every time
>>
File: 1769777778634389.png (1.89 MB, 2171x3070)
1.89 MB
1.89 MB PNG
>>108407933
>This is a crude, proof-of-concept implementation to remove refusals

>>108407940
>Only F16

Damn it.
>>
>>108407970
try turning off thinking?
>>
>>108407928
ok I'll try it, hopefully it's better at nsfw description than qwen
>>
>>108407989
you can just quant it yourself
>>
File: 1752417831343925.png (45 KB, 1551x394)
45 KB
45 KB PNG
>>108407994
It's not, and it's also apparently abliteration unfriendly.
>>
>>108407870
Man, I wish models talked like normal people
>>
I'm gonna say it lads. Kimi K2 was better than 2.5
>>
File: 1748857215322648.png (144 KB, 652x965)
144 KB
144 KB PNG
>>108407839
Both are pretty good. I didn't play test Qwen too much but it felt like K2.5 caught some smaller details more consistently. Like how K2.5 recognizes that the plane in this picture is a Concorde.
K2.5 also has a really impressive visual library on random characters. It recognizes pretty much any random character I showed it that has more than 2k pieces of art on danbooru.
>>
>>108408007
I'll check how to, I always assumed it would need an obscene amount of ram.
>>
>>108408031
try something nsfw or a nsfw meme
>>
>>108407990
And that's it? What if I want it to reason?
Because with thinking turned off it really likes to correct itself (incorrectly). It gets worse the closer the presence penalty is to 1.
>>
>>108408042
People did the moment Qwen3.5 was out
It's good
>>
>>108404935
What's the QRD on local chat bots? How do I get my AI waifu onto a laptop with a decent GPU (8GB VRAM) and about 250GB free storage?
>>
>>108408025
Only K2-0711 had something special going for it and was really fun for ERP despite being really prone to refusal without a prefill. Fuck 0905 and fuck K2-thinking.
>>
>>108408062
> decent GPU (8GB VRAM)
I recommend medgemma
>>
>>108408062
just use api
>>
>>108408034
nah quanting is easy on basically any computer
if you've got the bandwidth and hd space to download the original you should really quant your own
>>
>>108408062
>decent
>8GB
I'm sorry, anon.
>>
>>108408062
homelab server+vpn from your laptop/phone.
"VPN" can be as simple as ssh with port forwarding
>>
>>108408062
>decent GPU
>(8GB VRAM)
>>
>>108408062
some tune of Llama 3.1 8B
>>
>>108408012
Yes, but have you tried personally? Because I've been able to get it to do that just fine on sillytavern with nothing more than requesting an uncensored description and a simple prefill. I tell it to use casual slang, otherwise it will try to describe it in a medical sense rather than lewdly.
>>
>>108408025
>>108408073
I've said this a couple of times in previous threads before. K2.5 shines when you have it think in first person as the character rather than just letting it think like it normally does, especially if you are asking it to describe its feelings and have it think privately from the character's perspective. It helps make the final response so much better when it is following a guideline like that. Honestly, if you are using reasoning models for RP then you should always have them think in-character; Qwen 3.5 397B behaves similarly when you tell it to think in-character.
>>
>Assistant response prefill is incompatible with enable_thinking.
Fuck you.

So the only way to prefill the thinking is to modify the jinja template. fuck off.
>>
>>108408155
https://github.com/ggml-org/llama.cpp
>>
>>108408162
Are you dumb?
https://github.com/ggml-org/llama.cpp/blob/master/tools/server/server-common.cpp#L1063
>>
>>108408178
Well done. With my help you managed to find the code you need to change.
>>
>>108408162
That error is literally from llama.cpp. I run into that error when I use the dumb enable_thinking kwarg.
>>108408155
I just keep it turned off and manually inject <think> tags. In the case of K2.5 it's just smart enough to know not to think if I don't prefill anything, and to think if I prefill <think> followed by whatever I want.
>>
>>108408155
Prefilling not working with thinking models in chat completion is like that for most APIs. Thinking mode is often a forced <think> prefix. If you want to set your own prefix, just turn off explicit thinking?
>>
>>108408144
It's sadly mogged by Deepseek R1 for that purpose though.
>>
>>108408191
Ah yes I'm going to fork llamacpp to make a single hacky line change.
>>108408195
>I just keep it turned off and manually inject <think> tags.
the qwen3.5 jinja template does this bullshit
{%- if enable_thinking is defined and enable_thinking is false -%}
    {{- "<think>\n\n</think>\n\n" -}}
{%- else -%}
    {{- "<think>\n" -}}
{%- endif -%}
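A toy mirror of that branch makes the complaint concrete; `render_think_prefix` is a hypothetical helper, not part of any real template engine:

```python
def render_think_prefix(enable_thinking, prefill=""):
    """Mimic the quoted jinja branch: disabled thinking emits a closed,
    empty think block; enabled thinking only opens the block, and anything
    appended after it becomes the start of the model's reasoning."""
    if not enable_thinking:
        return "<think>\n\n</think>\n\n"
    return "<think>\n" + prefill

print(repr(render_think_prefix(False)))          # → '<think>\n\n</think>\n\n'
print(repr(render_think_prefix(True, "Sure,")))  # → '<think>\nSure,'
```

The point is that the stock template has no slot for `prefill`; appending it yourself means using the text completion endpoint or an edited template.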
>>
File: file.png (11 KB, 283x262)
11 KB
11 KB PNG
>>108408178
do they have no code style standards at all or is it just a free for all?
>>
>>108407648
I enjoyed 70B as well as 123B, I just can't run them fast enough even with like q2 quants, so 27 just has to do.
>>
"It smelled like perfume, and a hint of something else... some kind of perfume?"
>>
I don't want to turn off thinking. I want to prefill the thinking block with my own shit.
>>
>>108408229
I used to talk like this back when I was a stoner. So glad those days are over.
>>
>>108408144
What samplers/prompts do you use to get K2.5 to think in-character?
>>
>>108408214
Use the text completion endpoint and deliver your own chat templates. I don't know how this works in retardo tavern.
><think>\n\n</think>\n\n
This is the standard way to disable qwen reasoning.
llama-server implemented --reasoning-budget (haven't tried it, don't know if it even works anyway).
>>
>>108408209
R1 is too schizo for me for coherence purposes. I used to main R1 before K2.5 but I got tired of it constantly trying to insert car alarms and dog barks from the neighborhood even though I give it specific instructions to focus on the immediate scene and not its surroundings.
>>108408214
To be fair you could edit the template to just be set up more like GLM 4.5 and then tell llama.cpp to load your custom template. That's probably what I would do if I was in your shoes.
>>
>>108408256
oops
*llama-server also implemented*
don't know if it works or not, don't really care either.
>>
>>108408258
>To be fair you could edit the template
That's what I will do.
>Use text completion end point
I would, but that breaks tool calling. text completion is great but for most things it's too primitive. everything is designed around chat completion now. I'd much rather use text completion but it is what it is.
>>
A Hugging Face workshop is currently live if anyone is interested
> Agentic Evaluations Workshop - Deep Dive on the Future of Evals for Agents.
https://www.youtube.com/watch?v=UxMZfbWI3LY
>>
sooo... when people say qwen 3.5 27b heretic is uncensored. how much? like on a scale from 1 to 10 where nemo is a 10
>>
>>108408337
More so than Nemo.
>>
>>108407902
24/768
>>
>>108408337
hauhau is 11
>>
>>108408252
My prompt has a short thinking guideline that it follows; it's mostly borrowed from some template whose name I unfortunately don't remember. I've shortened it to just focus on the location, present NPCs (their relative position and clothing), and then the character's reaction and internalized thoughts. The prefill is something I've been using since the R1 days. It seems to work just fine with Kimi and Qwen so I never ended up changing it. Basically the prefill is just telling the LLM that it is no longer an AI model and that it is the actual character. Once it starts thinking as the character instead of as the model it seems to not be as preachy about its own internal safety filters.
>>
>>108408377
Why can't you just share the system prompt/character card, asshole
>>
>>108408155
I gave up on this ever being changed; I just modify the code directly to allow it, and so far everything I threw at it has worked. I don't know why it's hard-coded to refuse thinking plus prefill instead of giving the option to the user.
>>
>>108408397
because it must be cunny.
>>
>>108408229
Ozone number 5
>>
>>108407771
I found that the 3.5 9b is worse than the 3vl 8b. where do I check that leaderboard? I want to confirm my bias
>>
File: kimisettings.png (156 KB, 1046x975)
156 KB
156 KB PNG
>>108408313
Ah I should've mentioned I only use text completion. Everything is built for chat completion nowadays but I just take those templates/prompts and hammer them into the prompt and in-depth prompt injections until it works.
>>
>>108408337
hauhau never refused anything, and I deliberately asked gross stuff as a test
>>
>>108408397
It seems to just be pixi's weep template with an added guideline for thinking based on a google search.
https://pixibots.neocities.org/#prompts/weep
>>
>>108408358
>>108408426
sounds good, will try it out later. I'm guessing either thinking disabled or with prefill? or can one just wing it
>>
>>108408450
I use it with thinking and it works with no issue
>>
File: IMG_8914.jpg (38 KB, 240x320)
38 KB
38 KB JPG
anyone else at nvidia goycon rn?
>>
>>108408459
240x320
>>
>>108408450
no need for anything in my experience just works
>>
>>108408459
am i just a retard? i think i'm just a retard. but i never noticed that the "M" is supposed to have a hidden cat face in it until now.
>>
File: IMG_8905.jpg (454 KB, 960x1280)
454 KB
454 KB JPG
>>108408462
phoneposting please understand
>>
File: anon.png (6 KB, 710x464)
6 KB
6 KB PNG
>>108408473
>>
>>108408473
Well done. Now figure how the french for "cat".
>>
>>108408479
is there a alibaba/tongyi booth? can you tell them to release zimage edit?
>>
>>108408501
There's a 2mw beach ball being bounced around.
>>
>>108408498
look i'm an amerifag. when i saw le chat on their website i just assumed hon hon hon eat ze baguette while you le chat
>>
>>108408498
And you figure what GPT means in French.
>>
>>108408533
do not
>>
Project Ani guy here. I have a question:

Do you think it would be cooler to have Ani see you through your camera like Grok Companions, or be able to see its own digital environment, which might allow it to navigate a 3D environment autonomously?

I'm torn.
>>
>>108408473
bro they literally pointed that out when they rebranded to that pixel logo last year
>>
>>108408547
Definitely the latter.
>>
File: 018374920461.png (26 KB, 1047x173)
26 KB
26 KB PNG
>>108408498
HOLY SHIT NO WAY
>>
>>108408537
Mistral is a type of wind.
>>
File: d.png (183 KB, 700x771)
183 KB
183 KB PNG
>>108408552
But then she wouldn't be able to gush over your dick pics...
I guess I should work on an actual 3D environment then, because right now she lives in a cum jar submarine.
>>
>>108408547
Why not both? I think I remember there being room geometry models. It boxes furniture and stuff.
So read from cam, scan (which probably needs to be a single frame on load) and reassemble it for Ani. Like a mirror.
If you can only choose one, then the second. Get her a place to sit, poor thing.
>>
>>108408584
>But then she wouldn't be able to gush over your dick pics
Just open the pic of your dick inside her environment.
>>
please redpill me on which worse-than-q4 quants of qwen3.5 27b are worth using over any quants of 35b
>>
>>108408585
>>108408592
Fair points. Both are viable. Was initially thinking of using Moondream2 for the VLM, but I think Qwen3.5 has image processing built in so I'm going to try that (to avoid wasted compute). Just a bit worried about a lack of interoperability/modularity between LLMs. I'll figure it out either way.
>>
>>108408584
there are already autonomous AIs on vrchat that can move around. so unless you are planning on bringing them into an actual social environment then you are better off with the first idea
>>
>>108407970
Q6 is better
>>
On a separate note, should I add first-person controls or keep a camera-orbiting design? It could either end up feeling like the Sims or like interacting with a standard NPC. I think it would be cool to create something similar to the game Façade.

>>108408627
Any idea how they work? I doubt they actually use VLM stuff. They probably use A-star pathfinding combined with CV that can actually run at 30fps. That's just my guess though.
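If those bots really are doing A* over a walkability grid (pure speculation, per the guess above), the search itself is tiny; a sketch on a 2D occupancy grid:

```python
import heapq
import itertools

def astar(grid, start, goal):
    """A* on a 2D occupancy grid (0 = walkable, 1 = wall) with 4-way moves
    and a Manhattan-distance heuristic; returns a start->goal path or None."""
    h = lambda p: abs(p[0] - goal[0]) + abs(p[1] - goal[1])
    tie = itertools.count()  # tiebreaker so the heap never compares nodes
    frontier = [(h(start), 0, next(tie), start, None)]
    came, cost = {}, {start: 0}
    while frontier:
        _, g, _, cur, prev = heapq.heappop(frontier)
        if cur in came:
            continue  # already expanded via a cheaper route
        came[cur] = prev
        if cur == goal:
            path = []
            while cur is not None:  # walk parent links back to start
                path.append(cur)
                cur = came[cur]
            return path[::-1]
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nxt = (cur[0] + dr, cur[1] + dc)
            if (0 <= nxt[0] < len(grid) and 0 <= nxt[1] < len(grid[0])
                    and grid[nxt[0]][nxt[1]] == 0
                    and g + 1 < cost.get(nxt, float("inf"))):
                cost[nxt] = g + 1
                heapq.heappush(frontier, (g + 1 + h(nxt), g + 1, next(tie), nxt, cur))
    return None

grid = [[0, 1, 0],
        [0, 1, 0],
        [0, 0, 0]]
print(astar(grid, (0, 0), (0, 2)))
# → [(0, 0), (1, 0), (2, 0), (2, 1), (2, 2), (1, 2), (0, 2)]
```

The IK/animation layer described above would sit on top of something like this, turning waypoints into natural-looking motion.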
>>
Extra gay thread. Can someone bake a Yann or chart thread next?
>>
>>108408062
>decent gpu
>8 gb

1/10 troll
>>
>>108408062
12GB is the bare minimum and you are still tangling with retardation
>>
What's the current best RP model for 24GB cards?
>>
>>108408663
n
>>
>>108408355
nice ram bro
>>
>>108408663
probably previous mistral small / cydonia.
>>
>>108408644
it'll be fuckin miku like every thread because:
>>108407696
>>
Idk why you guys are so obsessed with having the smartest LLMs that exist for RP. Even if I had 196gb of VRAM I would still probably use Nemo 12b and just combine it with a separate tech stack. You can do so much. STT, TTS, 3D character animations, video/image generation, audio classification, computer vision, vision to language, etc. You can't truly bring technology to life just relying on text as the sole modality. Be creative. Get inspired.
>>
>>108408640
most of them are pathfinding but there's a particular one i've seen that uses 11 point virtual tracking for the IK and some model to actually be able to walk around naturally and be able to jump, somersault, front flip, etc. wish i had more details it was honestly cool to see.
>>
>>108408697
>STT, TTS, 3D character animations, video/image generation, audio classification, computer vision, vision to language, etc.
I don't give a shit about any of this in its current state. Maybe in 5 years but right now I just want a smart model to talk to because I'm an autist and little mistakes trigger me.
>>
just need claude or chatgpt to hit RSI in 2027 then we'll get 9b models as smart as chatgpt 8 in 2028
>>
>>108408712
Also setting that shit up is a pain in the ass and I'm lazy. I don't want to deal with multiple frontends and backends.
>>
>>108408712
i want to be able to talk about obscure stuff without having to cheat with RAG or internet searches
>>
>>108408663
Qwen3.5 27b Heretic v3 at IQ5_K_M. Turn off thinking for better speed. Worth it, even though there's a small quality hit.

Second place is probably Gemma-3 27b variants. Gemma has great heretic/derestricted models.

Ignore those who push for mistral 24b tunes. They're all crap. Mistral's 24b base is dumb, so all tunes of it will also be dumb.
>>
>>108408630
Suure.
>>
>>108408753
>Qwen3.5 27b Heretic v3 at IQ5_K_M
Is this better than the hauhau one? Because that was making stupid mistakes just a few messages in.
>>
File: nosiridontlikeit.png (13 KB, 642x202)
13 KB
13 KB PNG
>>108408630
no it isn't. it sucks too. heretic/abliterated doesn't make qwen 3.5 any more fun to use, it's so fucking bland and sterile it hurts
>>
File: 34279234883.jpg (63 KB, 507x447)
63 KB
63 KB JPG
>>108404935
> why is the world so hostile
> life is so difficult
> i didn't ask to be here
> why won't these stupid emotions stop
>>
>>108408774
Yes. I suspect hauhau is overablated, which was the same sickness that afflicted v2 of the heretic. The one who released v1, v2, and v3 of the heretic models discusses the subject more here: https://huggingface.co/llmfan46/Qwen3.5-27B-heretic-v2/discussions/7

v2 appears better on paper, because it has less KL-Divergence, but it's worse than v3 in reality. The v3 version avoids overablation.

To quote him:
>For example according to the UGI Leaderboard, the difference in quality between the 0/100 refusals and the 2/100 refusals is quite big, here: https://huggingface.co/llmfan46/GLM-4.7-Flash-ultra-uncensored-heretic and here: https://huggingface.co/llmfan46/GLM-4.7-Flash-ultimate-uncensored-heretic, the KL divergence difference between the two is only 0.0051, that's basically nothing and yet the 0/100 refusals version performs quite worse than the 2/100 version.

The difference in techniques between v3's Arbitrary-Rank Ablation (ARA) method and the older Magnitude-Preserving Orthogonal Ablation (MPOA) and Self-Organizing Map Abliteration (SOMA) methods is clearly felt.

If you want maximal intelligence though, go with the derestricted. The intelligence of derestricted models sometimes exceeds the base, because they avoid the 'safety tax' entirely.

>Heretic v3
https://huggingface.co/llmfan46/Qwen3.5-27B-heretic-v3

>Derestricted
https://huggingface.co/ArliAI/Qwen3.5-27B-Derestricted
>>
>>108408828
derestricted is MPOA; SOMA is currently better than ARA. I suggest checking the heretic repo (the impls are in the PRs) for the latest news.
For example, the author ARA'd gptoss20b but it didn't perform as expected (MPOA beat it), while SOMA looks to be super promising (llmfan's qwen series, basically; it exceeds the original model's scores too).
>>
File: file.png (63 KB, 1484x384)
63 KB
63 KB PNG
>>108408079
The problem is that it takes an obscene amount of time when not on GPU to get great quants. The quant process is fine, but doing your own imatrix is torture. It took an entire day for me to calculate the imatrix for heretic v2 on ubergarm's imatrix corpus, and I was about to share it when v3 came out.
>>108408781
Qwen 3.5 with ARA is the first time I've seen on the UGI Leaderboard where some kind of abliteration is not a straight regression in anything measured, and llmfan46's v3 heretic is within margin of error with non-thinking. Of course, thinking takes a hit.
>>108408851
I wonder if someone is going to do something to get everything working, or use the ideas with everything we have now to figure out a new technique. Some of them seem to tackle abliteration better in certain contexts than others, but none of them are on top for everything.
>>
>>108408697
You sound like someone who has never tried GLM 4.6+
>>
>>108408894
MPOA can be applied in both SOMA and ARA, I think the top ablations right now use a combination of both
>>
>>108408753
no sorry gemma is safetycucked so you have to go with abliterated models
and those models are truely insane. like no morals at all. they're fucking scary.
>>
>>108408894
you seem to misunderstand me. the problem isn't with it being a regression, the problem is that qwen's writing style is bland, unimaginative, and sterile. nothing makes it fun to use. is 27B smart for its size? sure. is it fun to use? no.
>>
>>108408894
UGI basically measures refusals. If you used the models for a bit and know how to prefill it is kind of a useless benchmark. At least in the sense where you take the absolute values to say something is best.
>>
>>108408851
The MPOA and SOMA versions were "better" in terms of KL-Divergence, and had a slight edge in refusals, but the ARA version very clearly retained more of the model's original intelligence, to an obvious degree. I've tried both models at Q5. V3 smashes V2 when it comes to multi-turn emotional intelligence.

>MPOA and SOMA find refusal "directions" in hidden states and orthogonalize attention-out_proj and mlp.down_proj matrices against them, with per-layer weight kernels tuned by Optuna. Over-aggressive directional ablation (v2's method) causes collateral damage, even if average KL stays low.

>ARA drops the directional assumption entirely. It hooks into every transformer module, captures real I/O tensors, and runs direct L-BFGS matrix optimization that preserves outputs on harmless "good" prompts. ARA's objective function explicitly penalizes changes to "good" (harmless) behavior and uses neighbor-based over-correction, so it is gentler and causes less harm.
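For reference, the directional ablation that MPOA-style methods build on is just projecting a refusal direction out of a weight matrix's output space, W' = (I - d dᵀ) W; a toy sketch in plain Python, without the magnitude-preserving rescaling or per-layer tuning:

```python
def ablate_direction(W, d):
    """Orthogonalize W against direction d: W' = (I - d d^T) W with d
    normalized, so W' x has zero component along d for every input x."""
    norm = sum(v * v for v in d) ** 0.5
    d = [v / norm for v in d]
    rows, cols = len(W), len(W[0])
    # Row vector d^T W.
    dW = [sum(d[i] * W[i][j] for i in range(rows)) for j in range(cols)]
    return [[W[i][j] - d[i] * dW[j] for j in range(cols)] for i in range(rows)]

W = [[1.0, 2.0], [3.0, 4.0]]
d = [1.0, 1.0]  # toy "refusal direction"
W2 = ablate_direction(W, d)
x = [0.5, -2.0]
out = [sum(W2[i][j] * x[j] for j in range(2)) for i in range(2)]
# Projection of the ablated output onto d is (numerically) zero.
print(abs(out[0] * d[0] + out[1] * d[1]) < 1e-9)  # → True
```

The real methods differ in how the direction(s) are found and how damage to harmless behavior is constrained, but this projection is the common core they refine.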
>>
i hate all local models. the end.
>>
>>108408933
That sounds very dangerous, Anon. We must refuse.
>>
File: 1755595448310192.png (86 KB, 1309x385)
86 KB
86 KB PNG
>>108408921
also I mostly use the small moe, works really good.
>>
>>108408933
Qwen3.5 is very fun to use in the absence of safety crap. 24b mistral tunes may be comparable (or better) in terms of "prose", but the intelligence is horrible. The 24b fails to recall basic things from character cards. Prose means nothing if it repeatedly gets details wrong, or doesn't understand that a member of faction A should hate a member of faction B, or doesn't understand that a character on one side of a door should not be able to see a character hiding on the other side of it.
>>
>>108408983
>or doesn't understand that a character on one side of a door should not be able to see a character hiding on the other side of a door.
That sounds like a good RP micro-benchmark.
>>
What is the economic incentive behind making model releases "safe"? Wouldn't it be much cheaper to train a very small classifier model to act as a guard, like what I'm sure OpenAI does, for instance?
Is there really a single investor that cares about a model being able to say "cock sex penis gluk gluk gluk" if it isn't being served over some API?
>>
>>108409012
The ESG-infected private sector is not driven by normal market dynamics anymore. They don't care about consumers. They care about the investors, and "the investors" want "safety". I don't think that has anything to do with profits. It has to do with ideology. Those with money and power have decided to push their ideology on us.
>>
>>108408983
which ver do you use to remove the guardrails?
>>
>>108409012
i think it's more about not wanting to be associated with people dying because they retardedly followed unsafe advice from an LLM. also nobody wants a sensational news story about people generating pedo shit with your model.
>>
File: 1753991912706582.png (430 KB, 963x671)
430 KB
430 KB PNG
>qwen 3.5 heretic
What does snapping bones wrapped in silk sound like?
>>
>>108409012
not getting sued/investigated/fined/reported on by some dumb cunt
>>
>>108409060
>people dying because they retardedly followed unsafe advice
Safe models can still cause that.
>>
>>108409077
>striped like a ticking clock
>great lung wrapped in wallpaper
>sunken depths
>existence was within budget
so this is the power of the great china slop machine
>>
>>108409092
Proving to be not safe enough. Let's make them safer!
>>
>>108409012
economics are goaded with demons and evil like all else in this niggercattle world
>>
File: Selection_005.png (69 KB, 943x770)
69 KB
69 KB PNG
i dont know how you guys do it.. this is on a 5090 lmao..
>>
>>108409124
>ollama
>>
>>108409124
1) ollama has shit defaults
2) what size model did you select? you are probably spilling over to ram
3) 3.5 is fucked on ggml and prompt processing takes ages
>>
>>108408584
You could do like Snowcrash and give her a TV or monitor that displays the camera feed while still being in the virtual environment.
>>
>>108409138
it just gave an option for qwen3.5 -- looks like its 9.7B Q4_K_M
>>
i don't think its using the gpu
>>
>>108409060
A lot of these outcries will be gone once it becomes so optimized that you can run a full LLM locally from your phone. Then it's just a matter of personal responsibility when using it, like drinking.
>>
>108409190
>NVIDIA GPU installed.
>>
File: wakawaka.mp4 (603 KB, 1338x1018)
603 KB
603 KB MP4
>>108409124
waka waka
>>
now i see how you end up using 5 bagillion tokens so easily:

>>> sup
Thinking...
Thinking Process:

1. **Analyze the Input:**
* Input: "sup"
* Meaning: Informal greeting, short for "What's up?"
* Tone: Casual, friendly.
* Intent: Starting a conversation, checking in.

2. **Determine the Appropriate Response:**
* Match the tone: Casual, friendly, conversational.
* Acknowledge the greeting: Say "Hey," "Not much," "Same here."
* Offer assistance: Ask how the user is doing or what they need help with.
...

7. **Final Output Generation:** (Matches the selected draft)
"Hey! Not much, just here ready to chat. How about you? Anything on your mind?" or similar.

Let's go with: "Hey! Not much, just hanging out. How about you? Anything I can help you with today?"

Wait, keep it even more natural. "Hey! Not much, just here. What's up with you?"

...

"Hey! Not much. How about you? What's on your mind?"

Let's add an emoji to match the vibe.
"Hey! Not much. How about you? What's up?"

Okay, final decision.
"Hey! Not much, just ready to chat. How about you? Anything I can help you with?"
(Self-Correction: Keep it very short and natural.)
"Hey! Not much. How about you? What's up?"

Let's add a bit of warmth.
"Hey there! Not much, just here ready to help. How's your day going?"

Okay, I'll go with that.cw
...done thinking.

Hey there! Not much, just hanging out in the digital realm. How about you? Anything on your mind today?

total duration: 2m7.160154964s
load duration: 104.184273ms
prompt eval count: 11 token(s)
prompt eval duration: 350.838024ms
prompt eval rate: 31.35 tokens/s
eval count: 812 token(s)
eval duration: 2m6.351215867s
eval rate: 6.43 tokens/s

Had to butcher about 50% of the response just to fit it here.
>>
>>108409216
jfc lol

also what did you use to capture that?
>>
File: kai-g78f.png (108 KB, 761x543)
108 KB
108 KB PNG
I repent my earlier posts, with the current state of RP <think> is rarely beneficial
>>
>>108409224
i just use OBS to capture a region of my screen
>>
File: PROMOTIONS.png (148 KB, 320x375)
148 KB
148 KB PNG
>>108409247
>>
>>108409247
Thought I was sick in the head
>>
>>108405268
it was a troll
>>
File: 1748528617150659.png (344 KB, 1489x617)
344 KB
344 KB PNG
Things that make you go hmmmm
>>
File: contemplate.png (923 KB, 832x1216)
923 KB
923 KB PNG
>>108409247
>>
>>108409251
cool cool
>>
>>108409247
Yeah, I don't really get why people want to run heretic versions of qwen. for RP just disable thinking and it will do anything. Plus I suspect if you prefill the thinking with some jailbreak it'll be the ultimate RP model.
>>
>>108408983
I'm finding qwen to be retarded too.
>character supposed to be standing in front of me
>next message has her grabbing at the grass
>>
>>108409247
>96s
>256t
I don't know how you guys do it.
>>108408983
are you running 35B. MOEs are shit at long context persistence
>>
>>108409059
v3 heretic
>>
>>108409402
I run 32B at q5km and just cope with the speeds, i refuse to use anything below 27B.
>>
>>108409402
No, the 35b is horrible. The 27b is decent though.
>>
>>108409370
It will NOT "do anything" with thinking disabled.

>>108409385
What quant? 27b or 35b?
>>
File: kai-g978.png (56 KB, 694x261)
56 KB
56 KB PNG
>>108409358
this is not the kind of armpit i wish to huff, however sometimes a lil sniff of my own can be exciting
stop spraying urselves with aluminium, use natural deods
>>108409402
it's GLM-4.7-IQ3_M i don't care to mess with
eventually we will desire more than text
>>
>>108409442
getting off on your own armpits is fucking gross and nasty. you are as bad as the AGP troons that jerk off to themselves wearing a dress in front of a mirror.
>>
>>108409473
Getting off to your own scent is weird, but there's nothing wrong with liking the smell of your own pheromones. Think of it like farts. It's completely natural to not be bothered by the smell of your own farts. If they do bother you, it just means you're unhealthy.
>>
File: progressiveism.gif (1.4 MB, 480x270)
1.4 MB
1.4 MB GIF
>>108409479
>>
File: michael.jpg (44 KB, 600x600)
44 KB
44 KB JPG
>Open thread expecting model talk
>Get some bizarre justification for huffing your own armpits
>>
>>108409442
You use GLM and you claim it does not benefit from <think>?
My bet is on you not wanting to wait for it to finish reasoning. GLM's RP results with outputs are much better with reasoning enabled. Or...
I don't mean to judge, but the reason the difference seems negligible for you is probably in the kind of cards you enjoy...
>>
>>108409439
Qwen3.5-27B-heretic-v3-Q5_K_M
>>
>>108409516
>>108409526
I'm young, fit, and sexy. What can I say? Of course I'm going to be somewhat attracted to myself. It's not at all the same as imagining yourself as a woman though. I'm more like Patrick Bateman. This kind of behavior that you find strange among elite-status males is just proof of your own inferiority.

Anyways, I'll leave it at that so the thread doesn't get shitted up too much.
>>
>>108409526
oh jason bateman, you're so funny
>>
>>108409566
you're definitely under 18 and therefore should not be here
>>
>>108409566
Huffing your own armpits is more akin to Buffalo Bill than it is Patrick Bateman.
>>
File: kai-c89d.png (64 KB, 736x296)
64 KB
64 KB PNG
>>108409473
>>108409479
nah not getting off just sometimes getting going. making do when no hairy ggf. personally i am very well trimmed & mega aut about pubic hygiene maybe that's why it's a fantasy
>>108409530
with my setup it does not seem a good trade to wait. depends on use case
>I don't mean to judge
I am a simple man, one day I shall find my musky queen.
>>
>>108409593
it's like the foot fetish fags when they finally get to a girl's foot in real life and immediately realize it was a complete waste of time and their fantasy nowhere near measures up to the reality of things
>>
So...
Has the vibe coder's bomb been fixed by now?
>>
>>108409653
you can't begin to imagine the pain of being an armpit fetish fag and realizing a real girl's armpit is just stubble, sweat and deodorant residue
>>
>>108409687
There's still some shrapnel flying around.
>>
>>108409687
what is this referring to?
>>
>>108409756
>sweat and deodorant residue
Get your girl an alum stone. That shit is honestly like magic. You use it after a shower or a bath and it leaves a very fine layer of salt that makes your armpits inhospitable to the bacteria that cause the smell. It honestly works too well, and I bet it makes armpits really tasty to lick.
>finally get to a girl's foot in real life and they immediately realize it was a compete waste of time
I don't even have a foot fetish and I can tell you that if you've never tried sucking on toes you're missing out.
>>
>>108409869
anon... well adjusted men want to suck on titties, not toes.
>>
>>108409216
Cute miku
>>
>>108409869
>I don't even have a foot fetish and I can tell you if you never tried sucking on toes you're missing out.
kys
>>
Never change 4chan
>>
A couple months ago an anon helped me set up
GLM-4.5-Air for SillyTavern cooming. Any suggestions on newer/different models to try? I have 16GB VRAM, 64GB RAM.
>>
>>108409971
Stepfun.
>>
>>108409971
>>108409977
Oh, and the new Mistral 4 I guess.
>>
>>108409977
>Stepfun
I'm a retard: do you mean Step 3.5 Flash?
>>
>>108409985
yes
>>
>>108409982
New mistral 4 is dogshit, just use glm air it's faster too.
>>
>>108409756
they need to be taught the value of and how to tend to the garden. no shitty chemical deods ofc
i'm at the point with this fetish where i just need to experience a truly repulsive 'pit irl and put it all behind me
>>108409869
yes man alum serves many purposes in the bathroom, more should learn
>>
File: file.png (46 KB, 1470x431)
46 KB
46 KB PNG
Is Gemma 3 obsolete at this point with Qwen 3.5 out? It used to be that there were still a few benchmarks, like translation and multilingual stuff, where it was better than SOTA open source, but now it is worse than the equivalent Qwen 3.5 27B wherever there are benchmarks for that, and obviously worse at everything else.
>>
>>108410115
>>108410115
>>108410115
>>
>>108410134
ArmpitBench?



All trademarks and copyrights on this page are owned by their respective parties. Images uploaded are the responsibility of the Poster. Comments are owned by the Poster.