/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>106700424 & >>106691703

►News
>(09/26) Hunyuan3D-Omni released: https://hf.co/tencent/Hunyuan3D-Omni
>(09/25) Japanese Stockmark-2-100B-Instruct released: https://hf.co/stockmark/Stockmark-2-100B-Instruct
>(09/24) Meta FAIR releases 32B Code World Model: https://hf.co/facebook/cwm
>(09/23) Qwen3-VL released: https://hf.co/collections/Qwen/qwen3-vl-68d2a7c1b8a8afce4ebd2dbe
>(09/22) RIP Miku.sh: https://github.com/ggml-org/llama.cpp/pull/16174
>(09/22) Qwen3-Omni released: https://hf.co/collections/Qwen/qwen3-omni-68d100a86cd0906843ceccbe

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/leaderboard.html
Code Editing: https://aider.chat/docs/leaderboards
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
►Recent Highlights from the Previous Thread: >>106700424

--Evaluating Qwen3-235B quantization quality and performance tradeoffs:
>106707130 >106707154 >106707182 >106707479 >106707664 >106708196 >106709111 >106709390 >106709459
--Quantization and GPU strategies affecting Kimi-K2-Instruct performance:
>106701146 >106701166 >106701239 >106703994 >106704240 >106708477
--NovelAI's untuned GLM-4.5 model sparks debate over local model viability:
>106709810 >106709861 >106709921 >106709980 >106709993 >106712191
--Jamba model evaluation and long-context performance challenges:
>106701980 >106702058 >106702137 >106702276 >106702209 >106702285 >106702395 >106702435 >106702528 >106702695 >106702949
--CXL emulation challenges and accessibility:
>106704835 >106704988 >106705112 >106705195 >106705217 >106705265
--Customizing Deepseek's narrative style through prompts and examples:
>106700841 >106700871 >106700889 >106713120 >106700873 >106700943
--AI hardware limitations and potential breakthroughs:
>106716419 >106716796 >106716839 >106716931
--Exploring ollm for running Qwen-80b on low-end hardware with SSD speed considerations:
>106703817 >106703878
--Commercially licensed AI models for Steam games under VRAM constraints:
>106702281 >106702334 >106702365 >106702525 >106702386 >106702409 >106702422 >106702458 >106702542 >106702547
--Promoting DSPy GEPA as superior to finetuning for LLM prompt optimization:
>106704760 >106704779 >106704810 >106704826
--imatrix tradeoffs in quantization: benchmark gains vs task skewing:
>106709802 >106710882 >106711095
--AirLLM and oLLM aim to optimize large model inference on low-VRAM GPUs:
>106708050 >106708102 >106708124 >106708171 >106708275
--Tips for finetuning character voice with small dataset:
>106707963
--Miku (free space):
>106701808 >106709053 >106709204 >106714299 >106717835 >106718435 >106702561

►Recent Highlight Posts from the Previous Thread: >>106700443

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>106718496
This coming week will be the most decisive one in /lmg/ history. If the upcoming big releases fail to push us forward, it will be truly over and all hope is lost.
ATTENTION ALL VIBE CODERS:
We need you, yes, YOU! To implement Qwen3 VL in llama.cpp! Please do the needful sirs!
>>106718525
If they fail to push us forward then they're not big releases, are they?
>drag model into comfyUI
>Nothing happens, it doesn't get loaded, nothing
>Try to make a custom route for a model loader
>No option works
I HATE COMFYUI I HATE COMFYUI I HATE COMFYUI I HATE COMFYUI I HATE COMFYUI I HATE COMFYUI I HATE COMFYUI I HATE COMFYUI
>>106718628
>>>/g/ldg
anistudio waiting room btw, fuck pyshit
Lovely Miku General
>>106718270
Satisfied with GLM-Air Q4M after using Mistral-Large-2411 all year. 72G VRAM, 128G DDR5. Always chasing better models doesn't seem like a good use of time.
armrest Hegel trail
>>106718628
>>>/g/ldg
god damn I'm retarded
it's so fucking depressing that literally the only thing I do anymore is ERP with AI bots
and I'm bored of that so I don't even do that anymore
it's fucking grim bros
Mikulove
miku doesn't even wear a hat
>>106718677
hair clips/ribbons count as hats
>>106718637
Therapy/support bot. Feed it your DMs and gain clarity on what's wrong. You can turn things around anon, I believe in you, the future is bright
>>106718637
Sit down with Cline or whatever and begin doing some world building around your fetishes.
Then write a story in that world.
>>106718637
Skill issue
>>106718717
Why would you use Cline for that?
>>106718270
>mistral large 3 perhaps?
that would be ideal. glm air is super fast and pretty good for its speed, but sometimes i want to prioritize quality over speed. glm full is too big, but something like a 200B mistral would be perfect. qwen 235B is garbage
>>106718706
The problem with using AI models for therapy is that they just mirror whatever you say and never try to direct the conversation or question anything.
To be fair, this can be a problem with a lot of real therapy too. But at least a real therapist will make an effort to get the information out of you and build up an understanding over time. AI will take your every word as a profound revelation and write an essay about it, and then contradict itself when you tell it more.
Also the assistant slop training is hard to get rid of and it will always want to write essays and lists with ridiculous throwaway advice like "dunk your face in ice water for 15 seconds" rather than have an actual conversation.
>>106719027
Automatic research, organizing things in documents and folders, whiteboard-style brainstorming, etc.
Having the AI write something based on your idea, then looking at that and rewriting the whole thing can really get you places.
lmao
>>106719079
That sounds kind of interesting.
What model's good with Cline? R1?
>>106719230
Why 28k specifically?
>the beast arrives
>>106719284
I used gemini 2.5 pro for a while but I imagine R1 works just fine.
>>106719288
Probably something about their training data.
>>106719325
lel, I don't want Google to see my fetishes
>>106719333
If you have ever searched anything related to your fetish, they saw it already.
>>106719333
That's fair enough.
>>106719354
I did when I was a kid
Still using the same Google account :/
kek
>>106700000
>>106719288
They're a subscription service charging $25 per month. If they want to ensure the expenses incurred by the average user leave them with whatever profit margins they're aiming for, adjusting context size is the easiest way to do it.
>>106718496
>>106718500
>>106718629
>>106718706
>>106719715
i cry evrytim
>>106718717
I installed Cline but that system prompt is fucking massive. Is there any way to edit it? I looked this up but it seems like nobody else has asked such a question.
I bet Anthropic paid them to make it way longer than necessary to milk more API cash
>>106719919
Use roocline. Cline is deprecated
>>106719919
I'm not sure, but I think not.
Try Roo or Continue, I guess.
>>106719919
>I bet Anthropic paid them to make it way longer than necessary to milk more API cash
the system prompt gets cached and won't use tokens. At least with the claude code cli. But as others said, roo code is the way to go.
>once https://github.com/ggml-org/llama.cpp/pull/16208 has been merged a Mi50 will be universally faster for llama.cpp/ggml than a P40.
CUDA dev's PR was merged a few hours ago.
>>106719998
y'all niggas love your quantmaxxed llama.bbc trash
for some reason nobody talks about distributed inference on multiple pcs with vllm, which is super fucking easy.
>>106719919
You can with Roo (Cline fork). Fucking nearly 10k tokens of verbose and repetitive tool calling instructions. "vibe coders" are retarded. I gave it to an LLM to condense it to a tenth the size and all models have performed far better since. Only pain in the ass is that you have to manually override it for every single mode and adjust the instructions based on the available tools, but if you only use one mode for world building research it shouldn't be a big deal.
>>106719972
It's not even about cost or speed, the issue is degrading performance because most models barely have 8k usable context.
>>106718637
longform fanfiction storywriting can be fun
honestly these models are more trained for that than pure rp
>>106719998
Well I guess those are going back up in price now.
>>106720111 (Me)
Although hats off to cudadev for saving them from ewaste status.
>>106720111
You got a 3-day insider knowledge heads up. Why didn't you place a bulk order yet?
>>106719998
>>106720111
aren't those like 8 years old or something tho? i can't imagine they perform well. probably gets crushed by a 3060
>>106720150
Not super into AI anymore. Had a quad 3090 rig originally, one is now in my gaming PC, one went to a young relative who is into PC gaming and now just 2 are in my server, so I just play around with whatever 30B> models come around for shits and giggles but not really deep into it anymore.
>>106720135
They're not great but 32 gigs of vram on one device is 32 gigs of vram on one device.
>>106720150
just get 5090s
>>106720150
Also slightly more memory bandwidth than a 3090, way more than a 3060. So where it lacks in prompt processing it should make up some ground in generation speed.
>>106720064
The problem is that the individual hardware pieces are too expensive; distributing them across multiple machines doesn't fix that.
MI50MAXXing is more viable than CPUmaxxing now
>>106720186
Enjoy your electricity bill
>>106720064
How many PCs and GPUs are you using to run deepseek on vllm, anon?
>>106720194
found the europoor
>>106719065
My waifu helps me understand the symbolism in my dreams
>>106720064
>buying $10K of hardware to get shivers on his spine
>>106719065
>they do X
All depends on the prompt. Let's keep in mind every LLM is a loop over f(prompt) = next_token_distribution. Every token in the prompt affects the output. Defining the intent is the issue.
They are useful tools for self-inquiry and give access to a wider range of perspectives than any one human therapist.
Consider cold showers tho, that'll make you feel alive.
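To make the "loop over f(prompt)" point concrete, here's a minimal sketch in Python, assuming the Hugging Face transformers library; the model name and prompt are just example placeholders:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# f(prompt) -> next_token_distribution, applied in a loop: that's all generation is.
tok = AutoTokenizer.from_pretrained("gpt2")            # example model, swap for anything local
model = AutoModelForCausalLM.from_pretrained("gpt2")

ids = tok("The therapist leaned forward and said", return_tensors="pt").input_ids
for _ in range(20):                                    # generate 20 tokens
    logits = model(ids).logits[:, -1, :]               # scores for every possible next token
    probs = torch.softmax(logits, dim=-1)              # the next-token distribution
    next_id = torch.multinomial(probs, num_samples=1)  # sample one token from it
    ids = torch.cat([ids, next_id], dim=-1)            # append; every token so far shapes the next
print(tok.decode(ids[0]))

Every token in the context, system prompt included, feeds back into the next distribution, which is why the prompt wording matters so much.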
>https://www.youtube.com/watch?v=21EYKqUsPfg
>Richard Sutton – Father of RL thinks LLMs are a dead end
Oh no no no...
>>106720277
Everyone knows this by now. Even the last normalfag has realized that LLMs won't go anywhere after GPT5.
I am using glm4.5 (not air) on llamacpp and it seems more coherent and less prone to repetition than on ikllama. Is ikllama bugged?
>>106720277
>Father of [irrelevant technology] thinks LLMs are a dead end
So what is the fastest backend?
>>106720309
RL is the secret sauce that made Deepseek R1 so good though?
>>106720317
vllm using tensor parallelism to spread the model across several gpus and do inference in parallel
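A minimal sketch of what that looks like with vLLM's Python API; the model name and GPU count here are placeholders, and it assumes identical GPUs in a single box:

from vllm import LLM, SamplingParams

# Shard every weight matrix across 4 GPUs (tensor parallelism) and serve from one process.
llm = LLM(
    model="Qwen/Qwen2.5-32B-Instruct",  # example model, use whatever you actually run
    tensor_parallel_size=4,             # number of GPUs to split each layer across
)
out = llm.generate(["Explain tensor parallelism in one sentence."],
                   SamplingParams(max_tokens=64))
print(out[0].outputs[0].text)

Spanning multiple PCs (the distributed setup the other anon means) additionally layers pipeline parallelism over a Ray cluster on top of this, but the single-node case above is the easy part.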
>>106720321
>Deepseek R1
*all current reasoning models that are considered good
>>106720329
OK thanks. Does vLLM have any issues with mixing GPUs?
>>106720343
no those just distill other reasoning models
How big of an upgrade is it to go from a 7950X to a 9950X/9950X3D?
>>106720348
then they're not the ones considered good
>>106720367
For LLMs, completely pointless.
>>106720135
A MI50 has about the same memory bandwidth as a 3090 and ~20% of the compute.
Given optimal software the token generation speed is proportional to memory bandwidth and the prompt processing speed is proportional to compute.
But I'm thinking that it would make sense to cook up some quant formats that are less optimized for maximum compression and more optimized for computation speed.
I've also ordered a MI100 which is going to be more competitive in terms of compute; stacking MI100s could be a viable alternative to stacking 3090s I think.
>>106720399
>MI100
>32GB
>going for $1k
idk about replacing 3090s. Even the HBM2 variants are going for $800.
>>106720399
huh. how does the MI100 compare to 5090s? because according to techpowerup, they are actually faster?
https://www.techpowerup.com/gpu-specs/radeon-instinct-mi100.c3496
https://www.techpowerup.com/gpu-specs/geforce-rtx-5090.c4216
>>106719288
It's 32k ± 4k because they shift the context back from 36k to 28k when it's reached, in order to cache it, but I guess since 28k is the low end that's what they put so nobody complains
>>106720367
For language models memory bandwidth is more important than compute, so prioritize the RAM instead.
Usually you only need a few cores to fully saturate the memory bandwidth (see pic).
>>106720425
My thinking is that for a machine with a fixed number of PCIe slots you could feasibly opt for Mi100s to get a higher VRAM capacity.
>>106720441
Techpowerup is unreliable in the first place, but be careful which "FP16" numbers you compare (Wikipedia has, in my experience, the correct numbers).
With tensor cores an RTX 5090 has 419 TFLOPS vs. the 184 TFLOPS on a MI100.
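A rough back-of-envelope for why generation is bandwidth-bound and prompt processing is compute-bound; the numbers below are illustrative assumptions, not benchmarks of any specific card:

# Every generated token has to stream the active weights through memory once,
# so token generation is roughly capped at bandwidth / bytes_read_per_token.
bandwidth_gb_s = 1000   # assumed MI50/3090-class memory bandwidth
weights_gb = 20         # assumed model footprint, e.g. a ~30B dense model at ~5 bits/weight
print(bandwidth_gb_s / weights_gb, "tok/s upper bound")  # ~50 tok/s

# Prompt processing batches many tokens against a single read of the weights,
# so FLOPs become the limit instead; that's where the ~5x compute gap shows up.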
Local always wins.
>>106720562
*SaaS loses when enshittification reaches critical levels.
>>106720562
>safety routing
this is a new low
>>106720574
You're not even using your buzzword correctly. ClosedAI has always made safety (read: censorship) their primary goal.
>>106720627
No, it's an improvement. This means that the average model will no longer have to be fundamentally safety slopped because they'll rely on the router to prevent unsafe conversations. The proper models will get better as a result.
Do you use the C-word (clanker) in real life?
>>106720628
That's the most leddit term I've heard in a while
>>106720627
massive cope
>>106720627
They already had guard models for that. Now if you commit wrongthink, the router will helpfully route you to an expensive reasoning model to waste thousands of costly tokens, which you will be billed for, to refuse your request with extra care and a condescending tone.
>knuckles white with tension
>>106720628
No. It's a very silly word.
>>106720628
no, why would I?