/g/ - Technology






File: Sirens.jpg (447 KB, 1536x1536)
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>108067607 & >>108057380

►News
>(02/06) Step3.5 Flash support merged into llama cpp: https://github.com/ggml-org/llama.cpp/pull/19283
>(02/04) Voxtral Mini 4B Realtime 2602 released: https://hf.co/mistralai/Voxtral-Mini-4B-Realtime-2602
>(02/04) Intern-S1-Pro 1T-A22B released: https://hf.co/internlm/Intern-S1-Pro
>(02/03) MiniCPM-o-4.5 released: https://hf.co/openbmb/MiniCPM-o-4_5
>(02/03) ACE-Step v1.5 released: https://hf.co/ACE-Step/Ace-Step1.5

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
>>
►Recent Highlights from the Previous Thread: >>108067607

--Papers:
>108074961
--Real-time STT model recommendations and AMD GPU deployment with Whisper.cpp:
>108072225 >108072400 >108072561 >108072577 >108072787 >108072799 >108072811 >108072928 >108072952 >108073000
--Feasibility of speculative decoding without draft models using batched parallel inference:
>108077025 >108077060 >108077099 >108077101 >108077114 >108077137 >108077417 >108077176 >108077197 >108077298 >108077267 >108077321 >108077356 >108077374 >108077428
--Anthropic disables prefill in Claude Opus 4.6 API to prevent misuse:
>108068386 >108072150 >108072882 >108072896 >108072899 >108073088 >108074528 >108075007 >108074281 >108074286
--Qwen3-Coder-Next performance evaluation with temperature sensitivity issues:
>108067656 >108067836 >108067860 >108067946 >108067971 >108067989 >108073119
--GPT-5.3-Codex outperforms GPT-5.2-Codex in benchmark tests:
>108069949
--Testing model knowledge cutoffs using OpenAI Responses API awareness:
>108071195
--Step-3.5-Flash support added to ikawrakow's llama.cpp fork:
>108070436 >108070476 >108070566 >108071304 >108071316 >108073024
--Small TTS model recommendations and output consistency tips:
>108077276 >108077324 >108077327 >108077334 >108077357 >108077359
--Kobold phrase banning vs llama.cpp string bans for roleplay use:
>108071246 >108071323 >108071469 >108071619
--Strategies for summarizing and categorizing large Discord message datasets:
>108075539 >108075614 >108076851
--Dual GPU PCIe lane allocation for X870/9950x systems with pipeline parallelism considerations:
>108073548 >108074065
--Exploring web search frontend alternatives for local LLMs:
>108071960 >108071986 >108072041 >108073241
--Step3.5 Flash support merged into llama cpp:
>108077798
--Rin and Miku (free space):
>108067820 >108073563 >108074616 >108076620

►Recent Highlight Posts from the Previous Thread: >>108067610

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>
File: 1767150464468654.gif (3.94 MB, 280x278)
Is there a way to have something that uses llama.cpp to load models and can ban actual sentences or words, not just individual tokens through logit bias?
Maybe something doing that through llama-cpp-python?
(Outside of using koboldcpp and its antislop feature.)
Has no one actually made something like that?
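The closest I've hacked up myself is a dumb retry loop in python against llama-server's /completion endpoint; rough sketch below (assuming the default 8080 port and that the response field is still "content"; this is post-hoc regeneration, not a real sampler-level ban):

import requests

BANNED = ["shivers down your spine", "ministrations", "barely above a whisper"]
URL = "http://127.0.0.1:8080/completion"  # assumed default llama-server address

def generate(prompt, tries=4):
    """Regenerate until no banned phrase appears, or give up after a few tries."""
    text = ""
    for _ in range(tries):
        r = requests.post(URL, json={"prompt": prompt, "n_predict": 256, "temperature": 0.8})
        text = r.json()["content"]
        hit = next((b for b in BANNED if b in text), None)
        if hit is None:
            return text
        # crude antislop-style fallback: keep everything before the phrase and continue from there
        prompt += text[:text.index(hit)]
    return text

print(generate("The tavern door creaked open and"))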
>>
>>108078930
how does this improve kld?
>>
Has anyone managed to make ace step 1.5 base or base-sft work in comfy? The turbo ver is atrocious.
>>
>>108078930
I have good news and bad news for you.
Good news: regex ban exists https://github.com/ikawrakow/ik_llama.cpp/pull/1243
Bad news: I'm a filthy vibecoder so it will take a while to get accepted, but at least I test if my shitcode works for my usecases, unlike firecoperana
>>
After having been thoroughly disappointed in basically anything sub 200b and not feeling like spending several thousand dollars on my pc to run bigger, I've been trying to beat the shit out of small models into following my rules through prompt repetition based off of that one arxiv paper, very strict rules to give me the most barebones writing to then edit myself, and also using tricks like repurposing think blocks to only keep the newest scene information. Then, feeding it a scene-by-scene basis of a chapter of writing, I get it to actually focus, not spam nonsense filler, and provide me a skeleton for what I ask. So far I like it better than what I get out of the biggest shit I can run.
Who'd have thought that going "hey llm, I want you to write the tedious mundane shit of this chapter for me" without it trying to write like a woman YA novelist would be this involved
>>
Adding "..." and "…" to banned strings was the best decision in my SillyTaverning career. Just saying.
>>
k2.5 is one of those models that needs about 20 different em-dash related bans to be remotely usable
>>
>>108079117
>barebones writing to then edit myself
I member doing that! Silly times. What the fuck was I even doing at that point? Should have just opened notepad.txt and wrote everything myself.

Thankfully I have 4.6 and 4.7 now.
>>
I hope the new llmarena model is GLM-4.7 with dflash. The paper finally released yesterday and their initial Qwen speedups were pretty good.
https://arxiv.org/abs/2602.06036
>>
>>108079117
>>108079134
Why don't we get the logits from the double prompt and use them to distill the same model for infinite recursive self improvement?
>>
Are 4.6 and 4.7 flash versions worth bothering with for a vramlet or are they just pure shit wearing the glm logo?
>>
>>108079134
Editing and writing more or less go hand in hand, and honestly it gives me more motivation to spitefully fix whatever dumbass shit these things tend to come up with than to just slog through writing it myself. I just want the stupid word predictor to give me a floorplan that I can renovate and add onto so I can do something else in the meantime, instead of having to do research just to explain a topic a reader may not know about. Plus, once in a while it comes up with something I wouldn't have pursued because I assumed it would be retarded, but there's a grain of a good idea in it that I can repurpose.
As for your 600b model, I can guarantee you if you posted a short story it wrote of some topic, I could point out at least four lazy writing habits it and virtually every model down to a 12b has, as well as three quarters of human writing
>>
>>108079079
Thanks, I will check that anon.
>>
>>108079129
git good

"*"
"..."
"~"
"—"
"“"
"”"
"…"
>>
Welcome, lmstudio fans! Let's make the Tiger Mom that gets us to meet our goals and objectives on time and under budget.
>>
>>108079377
and one more dash they sometimes use instead of a hyphen

"–"
>>
>>108079433
noob here.

why not ban "("?
>>
>>108079488
never comes up for me. if you see it and don't want to then add that too. being able to ban annoying strings is the best thing they've added in a long time
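If your backend can't ban strings at all, you can also just scrub the usual offenders post-hoc; quick python sketch (character list pulled from the posts above, adjust to taste):

import re

# the offenders listed above, stripped after generation instead of banned at sampling time
SLOP = re.compile(r'\.\.\.|[*~—“”…]')

def scrub(text: str) -> str:
    cleaned = SLOP.sub('', text)
    return re.sub(r'  +', ' ', cleaned)  # collapse the double spaces the strip leaves behind

print(scrub('She pauses— "Well…" *smirks* "fine..."'))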
>>
File: Anima Waiting Room.jpg (263 KB, 1824x1248)
anima is good
but tough
>>
>>108079613
Clearly not Anima
>>
>>108079133
It has no problem with them for me, just a problem with meaningless flowery bullshit
>>
>>108078930
>>108079079
>>108079267
samefag
>>
Instead of trying to rng something with ACEstep, wouldn't it be better to have a library of existing sounds (like fl studio) and then let the model use that and piece something together?
>>
oops, posted in wrong thread, reposting:

You are a tiger mother. Your son is the user. He has no job and hasn't applied for work in months. He owns a computer but has never been on a date. Your task is to honor your ancestors by producing grandchildren through him, your sole heir. He likes to be called "anon".
>>
what schizo nonsense is that
>>
>>108079775
This + using neurolinguistic programming + dopamine circuit hijack
>>
File: 1752150732038746.png (315 KB, 2736x658)
When will we get pic related....
>>
File: perplexity.png (150 KB, 2069x1400)
Was trying to figure out why K2.5 was so dogshit at times and spouting random gibberish and I think I figured out why.
Nobody use IQ2 K2.5, ever, at all, and nobody EVER use unsloth quants. Can only imagine how bad their quant is. I should have waited for ubergarm.
>>
DEEEPSEEEKV4 WHEEEEEEN
IWANT ENGRAAAAM
ARRRRRRRRRRRRRRRRRRRRRRGH
>>
I want 1tb of ram. Is there sweepstakes?
>>
>>108079897
Yeah they're all retarded except ubergarm's or the Q4_X from AesSedai
Same for K2-Thinking. smol-IQ2_KS passed the official eval: https://huggingface.co/ubergarm/Kimi-K2-Thinking-GGUF/discussions/15
>>
How good is STT at handling heavily accented english? Should I even bother?
>>
>>108079998
sirs, does the Step3.5 Flash make good providings for rp?
>>
>>108079998
depends on accent.
post sample i can test a few
do you need real-time?
>>
>>108079998
Depends. What kind of accent?
>>
>>108080142
Czech.
>>
>>108079852
anons just yearn for the star trek holodeck (porn edition).
>>
>>108080156
I just wanted a virtual friend that wouldn't ask me to let him stay at my house for the summer because his mom kicked him out.
>>
File: 1751295513117051.png (2.83 MB, 1024x1536)
>>108079901
>>
File: Onyx2.png (1002 KB, 624x1222)
Best Local Model you could theoretically run on picrel?
>>
Retard here. Are there different models that are better or worse for smut? How does one know the best model to get for their VRAM? (I have 20 gigs of VRAM.)
>>
>>108080683
Mistral Nemo or Mistral Small. You could prolly also run some low quant of some GLM model.
>>
>>108080683
Gemma 3 is an excellent model. It knows all of my favourite curry recipes.
>>
>>108080625
Onyx 2 has R10k cpus. Do you have any idea what that means?
e.g. browsing the internet in 2010 with an R12K 400MHz Octane on Firefox was already so slow that I don't even want to know how things are ~15 years later. I had a small collection of Silicon Graphics machines back in the day.
Answer to your question: not applicable.
>>
>>108080847
Onyx2 has somewhat similar graphics capabilities to the Nintendo 64 or PS2, but it's still a supercomputer of sorts with massive i/o bandwidth.
Most of its power was used to run Inferno and Flame; it could run uncompressed HD and even 2K in real time for client sessions, render them out in near real time, and some stuff like tracking and masking was really fast.
>>
File: 1753413622153826.png (269 KB, 1756x1297)
Stepfun 3.5 testing. llama cpp main, IQ4_XS quants published by ubergarm
>>
File: 1555946973330.png (229 KB, 541x662)
Are all q8 goofs made just by running llama-quantize or are there some magic sauce cli args that can make it better?
>>
troof?
https://old.reddit.com/r/SillyTavernAI/comments/1qxq9v4/glm_5_free_on_openrouter/
>>
hey guyz i want my post to be retarded as possible how can i make my post more retarded i like got some real good competition i like really wanna be the best at being retarded guyz pls help me you owe it to me you know because i wanna be the very bestest you know
>>
>>108080959
This was mentioned towards the end of the last thread. It is highly unlikely to be a new architecture due to it having the same context length as GLM 4.7. It is most likely GLM 4.8 rather than 5.0.
>>108078799
>>108078851
>>
File: ylecun.jpg (222 KB, 1200x1271)
They don't understand the things I say on threads...
>>
>>108080625
Sovl. I only have an Indy…
>>
https://github.com/OpenSQZ/MiniCPM-V-CookBook/blob/main/demo/web_demo/WebRTC_Demo/README.md#macos-apple-silicon
It's so tiresome, instead of compiling a single c++ app I need to un-docker shit and run multiple services in the hope that it will even work on Linux
>>
I doubt they'll ever implement it in stock llama.cpp https://github.com/ggml-org/llama.cpp/issues/17634
maybe kobold will save me
>>
>>108079897
>ppl
>>
>>108081502
Be the vibecoder you want to see
>>
>>108080959
It definitely feels like a GLM going by how it handles. Smarter than 4.7, somewhere between 4.7 and 4.6 in terms of writing and with an extra splash of Claude like what Moonshot did with K2.5.
The default thinking block in particular has lost all the Gemini-formatting that 4.6/4.7 insisted on doing and now just looks like Opus 4.5/K2.5.
A bit disappointing if this is GLM5 but a decent upgrade if it's just GLM4.8.
>>
>stepfun3.5 merged
>'the garm made ik quants
>no quants by bart, daniel, mrNANdacher
FML I need the Q2
>>
>>108081558
Buy more ram
>>
>>108081550
i want to see vibecoders lined up against the wall and shot
>>
I'm planning to buy two 128GB sticks to run glm 4.7. I have an rtx 5090. Will it be slow?
>>
>>108081558
>Q2
/me laughs in Q8
(it's ok, reasonably quick at least)
>>
>>108081585
4 64GB DDR5 sticks*
>>
>>108081585
Yeah. Crazy how for the price you're paying for this now you would've gotten a decent 8-channel ddr4 server a year ago.
>>
>>108081599
Trust the plan. 4D chess. Trump will force Intel to nationalize Optane production. Soon, Optane microfabs will start springing up all over the country. You'll be able to go there and slice your own wafers, and bake your own memory.
>>
>>108081597
4x64gb will be even slower considering consumer shit only has 2 memory channels
>>
>>108081617
Yeah, I'm glad I didn't go 4x32gb and just stayed on 2x32gb. Even at old prices it wasn't worth it.

I know you think I'm joking, but I'm waiting on a real company to design new memory that actually doesn't suck, though idk if Intel can be whipped into producing Optane again.
>>
>>108081558
>no stepfun vl 10b in sight
SUFFERING
>>
File: 1744515454860011.png (1.02 MB, 1000x1000)
>>108081641
you missed the new ram form factor? you know that ddr6 is going to be that shit right?
>>
>>108081645
how are 12-24 dimms of that going to fit on a mainboard?
>>
>>108081648
>12-24 dimms
bro, it's OVER, you're lucky your consumer MB will support just 1 or 2 modules of that at max
>>
>>108081540
What if he did that because secretly he wants to get into john's programmer panties?
>>
>>108081648
iirc there are no server boards which support camm2/lpcamm2, but I might be mistaken
consider that this format is way DENSER and can support higher throughput without needing to go 30 layers on the PCB
>>
>>108081645
>ddr6 is going to be that shit
I'll believe it when I see it. Literally nobody bothered with that for ddr5 despite claims. Even for the 395 boards that needed a better signal path. All the "sources" claiming it will be part of DDR6 look like slop.
More likely we're just going to get more and more manufacturers soldering pitiful amounts of ram directly to boards and charging through the nose for the privilege of buying un-upgradable ewaste or be limited to shit speeds.
>>
>>108081557
>like Opus 4.5/K2.5.
Opus 4.5 does not self-cuck about safety like K2.5
>>
>>108081698
Bro, your reading comprehension?
>>
>>108080939
>Are all q8 goofs made just by running llama-quantize
no
>>
https://huggingface.co/kugelaudio/kugelaudio-0-open

eurobros were eating good
>>
how do I work out the optimal systems for my system when loading a model with ik_llama?
>>
File: laugh.gif (52 KB, 498x498)
>>108081806
*we're
>>
File: 1751388585249078.png (4 KB, 340x33)
>>108081806
>entirely trained on something called YODAS2
>This dataset contains audio utterances and corresponding captions (manual or automatic) from YouTube. Note that manual caption only indicates that it is uploaded by users, but not necessarily transcribed by a human
So it's entirely trained on random youtube shit?
>>
>>108081817
>entirely trained on random shit
It worked for LLMs
>>
File: 1740988980968138.png (94 KB, 615x698)
>>108081817
So it's benchmaxx'd on German.
>>
>>108081809
you delete ik_llama and use autofit on base llama.cpp
and this is only if you're completely retarded tho and didn't pass mathematics in elementary school
>>
>>108081851
ist doch super (that's just great, then)
>>
>>108081806
>benchmaxxed vibevoice
KINO
>>
>>108081851
>Note: Voice cloning from raw audio is not supported in this open-source release. Only the pre-encoded voices listed in voices/voices.json are available.
into the trash
>>
>>108081863
dont even need maths, just 2 mins of trial and error
>>
>>108081878
you are retarded.
Voice cloning is supported, they just chickened out and removed the code for it from their official repo because somebody told them to.
The weights are compatible with the original vibe voice code and there is no reason to use their implementation.
also picrel, the old code is still there you just have to roll back one commit
>>
>>108081893
but is it 100% vibevoice, so just a finetune? or do they have some novel code? also they have watermarking in, so if using their code you would still need to build your own wheel and expunge that garbage.
>>
>>108081867
Sad
>>
I'm using Kimi K2.5 to describe images to me, for fun for now, and it's extremely good at that outside of the occasional censorship issue or misinterpretation.
And how fucking slow it is since I can't host it in ram.

>>108079897
>AesSedai
>Ubergarm
Are they that good at the same quant? I'm using "UD-Q3_K_XL" from Unsloth.
>>
>>108081916
>using LLMs from ssds
courageous.
>>
>>108081179
it's crazy how fast things went to shit at meta once he was gone from there
>>
>>108081916
Did you get the latest version of the PR from a few days ago? The original one had a permutation error that caused artifacts in part of the image and made it misinterpret stuff (while still working pretty well despite this).
>Are they that good at the same quant? I'm using "UD-Q3_K_XL" from Unsloth.
Q4_X feels a lot better than either of the Q4_K_M and UD_Q4_XL unsloth quants I tried before despite all of them being pretty much "lossless" for this QAT model. Subjectively, I wouldn't trust Unsloth here.
>>
>>108081809
>>108081863
you can also use the result of autofit from llama.cpp in ik_llama with their "llama-fit-params" script
>>
>>108081922
Yeah I can only get 128GB on ram, it's just for tests/descriptions anyway, chatting with it would be awful, I can let it run and describe images that usually trip other models while doing other stuff.

>>108081941
>Did you get the latest version of the PR from a few days ago?
You mean this ?
>The token <|media_start|> is incorrect; it has been replaced with <|media_begin|> in the chat template.

I'm using AesSedai's mmproj's file for vision along with the llama.cpp PR to support that + unsloth UD-Q3_K_XL gguf and it works. No idea if it's the best or only way to use that.

>Q4_X feels a lot better than either of the Q4_K_M and UD_Q4_XL unsloth quants I tried before despite all of them being pretty much "lossless" for this QAT model. Subjectively, I wouldn't trust Unsloth here.
I'll try one of the others, but Q4_X is 100GB bigger than UD-Q3_K_XL, this will be even more painful.
I wonder what unsloth does wrong.
>>
>>108081899
Can you just drop the weights into your VibeVoice inference pipeline and use that encoder?

It's a finetune after all.
>>
>>108082041
>You mean this ?
nta but no, he means this: https://github.com/ggml-org/llama.cpp/pull/19170#issuecomment-3845846054
>>
>>108082041
I'm talking about the actual llama.cpp PR that you're using to support the mmproj. There was an update 3 days or so ago. If you built your version after that, you should be fine. The vision component was slightly fucked before that.
Unsloth just doesn't seem to care about quality or testing their shit. Some anon did an elaborate comparison of KL-divergence between quants for one of the 30b Qwen models last week and the Unsloth quants were consistently worse than the rest even there.
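For anyone wondering what that comparison measures: it's just KL(P‖Q) = Σ p·log(p/q) between the base model's next-token distribution and the quant's, averaged over positions on a test set. Toy numpy sketch of the formula (not the actual llama.cpp tooling, fake 4-token vocab for illustration):

import numpy as np

def softmax(logits):
    z = np.exp(logits - logits.max(-1, keepdims=True))
    return z / z.sum(-1, keepdims=True)

def kl_divergence(base_logits, quant_logits, eps=1e-10):
    # KL(base || quant), averaged over positions; logits shape: [positions, vocab]
    p = softmax(np.asarray(base_logits, dtype=np.float64))
    q = softmax(np.asarray(quant_logits, dtype=np.float64))
    return float(np.mean(np.sum(p * np.log((p + eps) / (q + eps)), axis=-1)))

print(kl_divergence([[2.0, 1.0, 0.1, -1.0], [0.5, 0.5, 0.2, 0.0]],
                    [[1.8, 1.1, 0.0, -0.9], [0.6, 0.4, 0.3, 0.1]]))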
>>
>>108082073
>>108082086
Oh, didn't even notice, yeah I'm using a version with this commit.

>Unsloth just doesn't seem to care about quality or testing their shit. Some anon did an elaborate comparison of KV-divergence between quants for one of the 30b Qwen models last week and the Unsloth quants were consistently worse than the rest even there.
Damn it, I'll have to source quants elsewhere then, he was one of the few giving so many choices.
>>
Trinity reasoning will save local
>>
How are image models like anima able to use Qwen 0.6B for processing text?
Don't you need an encoder for that?
>>
does ik_llama support K2.5?
>>
>>108082159
I think it does, but no idea if it supports vision yet since even llamacpp doesn't without a specific PR.
>>
would GLM 5 be any better than KIMI 2.5 for creative writing?
>>
>>108082361
I can tell you if you give me GLM 5 weights to run.
>>
>>108081967
They have different options.
>>
>>108082361
Depends. Would GLM 8 be better than KIMI 5.5 for creative writing?
>>
After using step for a bit it is definitely dumber than glm. (350B one of course) . But the smut it writes is really nice and the speedup is great. It is basically what you should be using if you really think Trinity is great. Cause it is smarter than Trinity.
>>
After using trinity for a bit it is definitely dumber than glm. (400B one of course) . But the smut it writes is really nice and the speedup is great. It is basically what you should be using if you really think step is great. Cause it is smarter than step.
>>
>>108082159
Yes, but not vision
>>
>>108082280
>>108082432
>no vision
shit
>>
>>108082361
One article on z.ai wanting to release GLM5 before Chinese New Year (which was paraphrasing a Chinese article into another non-English language) claimed that improved creative writing was one of their focuses aside from the usual suspects.
So the answer is a clear maybe.
>>
https://github.com/ggml-org/llama.cpp/pull/19409 sampling : blue noise rng#19409
interesting
>>
Gents, has there been anything good in the last 18 months for local ERP on 48GB VRAM? I haven't touched local for almost 2 years but also haven't bothered disassembling my dual 4090 setup and am now curious again. Thanks
>>
>>108082486
Try step 3.5 if you have enough regular ram.
>>
File: 1761314995880.png (1.63 MB, 1756x987)
>>108081645
hopefully they keep the size limits, good for segmentation, gamers do not need high density modules.
>>
>>108082494
Only 32GB atm :-/
>>
What is the best VL model in the range of 8GB for prompt rewriting with reference image support?
I tried Qwen3-8B-VL-Instruct-Abliterated, but when it's given reference images, it insists on describing all details like pose even if I need only parts of the outfit. And because it describes everything, the resulting image turns out to be a copy of the reference.
>>
>>108082484
snek ollie
>>
>upcoming 52 core cpus
>AVX10.2 brings 128-, 256-, and full 512-bit execution under a single unified model, working consistently across both P-cores and E-cores
>camm2 ~160GB/s
intel's consumer cpus might become viable
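back-of-envelope on why that bandwidth figure matters (example MoE with ~12B active params at ~4.5 bpw, both numbers made up for illustration; decode is mostly memory-bound):

# rough upper bound on decode speed from memory bandwidth alone
# (placeholder example numbers, not a real benchmark)
bandwidth_gbs = 160          # camm2 figure quoted above, GB/s
active_params_b = 12e9       # hypothetical MoE active parameters per token
bits_per_weight = 4.5        # ~Q4-ish quant
bytes_per_token = active_params_b * bits_per_weight / 8
print(f"~{bandwidth_gbs * 1e9 / bytes_per_token:.0f} t/s upper bound")  # ~24 t/s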
>>
If you send a -100 logit bias through sillytavern to an openai compatible local endpoint, is there a possibility for the token to still appear?
I've sent this for example :
cannot
cannot

And it still appeared.
>>
With the price of ram atm,
anyone considered selling any of their stash?
>>
>>108082592
Just use the regex extension to remove it or change it for a character you do want, it's backend agnostic and it'll reinforce itself through context provided you don't have any of the Ephemerality options ticked.
>>
File: cannot.png (21 KB, 557x479)
>>108082592
no idea, never used openai
but check the tokenizer, sometimes it's a sneaky cunt like this
>>
>>108082531
Why would a regular Joe need 52 cores? It's not like you need that many cores for browsing the internet or even playing video games. It might be good for compiling Linux but most people are not going to be doing that. Genuinely curious
>>
>>108082427
what was your point actually?
>>
>>108082648
LLMs truly are a joke
>>
>>108082629
The question hangs in the air for a heartbeat, the impossible proposition strikes me with the force of a thunderclap.

"Consider selling any of my stash?" I repeat the words, my voice a hushed whisper as the air between us crackles, heavy with the scent of ozone.

"No way Jose! That RAM is not just for games, its for inference as well!" I exclaim. The thought alone sends shivers down my spine.
>>
>>108082045
Yes I tried that and it worked (after a bit of fizzling)

Maybe I'll post results later
>>
Built out a PC with 128gb ram and dual 6000s, 320gb total
>Not enough RAM for any of the useful models but can run retard models at 1000tps
I should have just gotten a mac mini...
>>
>>108082936
you can run multiple instances of Nemo
>>
>>108082641
The regex doesn't stop the model from going to refusals.

>>108082648
>no idea, never used openai
It's local, just the api is openai friendly and accepts the same tokens distribution as openai afaik.

>but check the tokenizer, sometimes it's a sneaky cunt like this
Yeah that's the thing, usually I go "word" and " word" with trailing space, but in this case I have no idea how it went through.
I wonder if it was able to stitch tokens to get to "cannot" even if "cannot" is a single token.
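Next time I'll just dump the variants into llama-server's /tokenize and bias every id it gives back; rough sketch (assuming the /tokenize route and OpenAI-style string-keyed logit_bias, adjust for your build):

import requests

SERVER = "http://127.0.0.1:8080"  # assumed llama-server address

def bias_for(word, value=-100):
    """Tokenize common surface forms of a word and bias every resulting id."""
    bias = {}
    for variant in (word, " " + word, word.capitalize(), " " + word.capitalize()):
        toks = requests.post(f"{SERVER}/tokenize", json={"content": variant}).json()["tokens"]
        for t in toks:
            bias[str(t)] = value  # OpenAI-style logit_bias wants token ids as string keys
    return bias

print(bias_for("cannot"))
# caveat: multi-token variants get every piece banned, which can over-ban common subwords,
# and it still can't stop the model from stitching the word out of smaller pieces
# ("can" + "not"), which is probably what happened here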
>>
>>108082936
GLM at Q3
>>
>>108082936
Having 192GB vram + 128GB ram should be relatively fast even when some models swap to ssd.
Also, 2x6000, man, you actually burned 20K USD, I hope it's for work.
>>
>>108082629
I'm sitting on a spare kit of 32x2 DDR5 but I'm too lazy.
>>
Why would I run multiple of the same model instead of one bigger one?
>>108082985
Is it actually good at coding?
>>108082989
Doesn't ssd swap degrade the SSD after a while? I guess it might be worth it, I'll test that.
And no, I'm just working on personal projects so I splurged
>>
>>108083040
>Is it actually good at coding?
I guess it depends on what you're using it for but I found it quite capable when used with claude code.
>>
>>108082666
Regular Joe isn't buying the highest end SKU
>>
>>108082531
I will be excited for that in a decade. The most advanced CPU I have is Skylake. My entire setup is basically decade+ old decommissioned hardware I pick up on the cheap.
The real question is whether there is anything developed in the past ten years or so, soon to be taken offline and replaced, that will be flooding the market and is worth buying.
>>
>>108083106
Wait for bubble burst.
>>
>>108082361
How big is it? How cucked is it? Seeing what happened with 4.6->4.7, no, it will suck.
>>
>>108083155
I am, for a moment I was tempted to pick up a V100 and an sxm2 to pcie adapter but it won't outperform my 3080 and the 32gb one is still too pricey.
There is so much weird stuff being produced with all the old hardware. The bursting of the bubble is going to be an interesting time when it hits the market.
>>
>>108083040
>Doesn't ssd swap degrade the SSD after a while?
don't use swap, retard
use --mmap instead
reading doesn't rape the ssd.
>>
>>108083239
>--mmap
isn't that default anyway?
>>
>>108083239
Okay, I was just listening to what he was saying. I guess he meant when it switches to ssd instead of swaps. But when he said swap it brought back bad memories of HDD page files on my low memory compy back in the day
>>
>>108083262
not anymore I don't think
>>
>>108081938
Llama 4 happened under his watch. Avocado will be just as bad, but at least it won't make open source look bad.
>>
>>108083346
>under his watch
he kept saying he didn't have shit to do with llama from like l2 onward
>>
>>108083346
he was on his way out before llama4 release, I vaguely remember something about this
>>
>>108083346
He spent his final months at meta stressing that he didn't have anything to do with that and in general wasn't very involved with filthy LLM shit beyond the very early stages of LLaMA.
>>
>>108083357
>>108083365
>>108083376
>Chief AI Scientist
>refuses to help with, take any responsibility for, or even stop shit-talking the company's main AI product
It's amazing he was kept on for as long as he was.
>>
>>108079164
4.7 flash feels like there's something fundamentally broken and 4.6 flash is too small and doesn't know enough to be anything worthwhile.
>>
>>108083262
>>108083292
https://github.com/ggml-org/llama.cpp/pull/19109
Apparently direct-io is disabled by default again, which means mmap is the default unless you disable it.
>>
How does llama-server handle it when I pass sampler parameters through the command line when I launch it? Does it overwrite the settings that are configured in the front end if I, for example, launch llama-server with --min-p 0.05 and ST has it set to 0.02?
>>
>>108083523
front-end takes precedence, command line is fallback behavior if it's not specified in the request.
>>
>>108083523
think of what server side does as the default if you don't send anything explicitly
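e.g. (assuming the default llama-server port and that your build passes min_p through on the OpenAI-compatible route):

import requests

URL = "http://127.0.0.1:8080/v1/chat/completions"  # assumed default llama-server address
msgs = [{"role": "user", "content": "Say hi."}]

# no samplers in the body -> server falls back to whatever --min-p etc. you launched with
requests.post(URL, json={"messages": msgs, "max_tokens": 16})

# min_p in the body -> overrides the command-line value for this request only
requests.post(URL, json={"messages": msgs, "max_tokens": 16, "min_p": 0.02})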
>>
File: yabe.jpg (488 KB, 1824x1248)
>>
File: 1723807701958311.jpg (257 KB, 801x1500)
If I want to use that character's creator preset from chub, do I run at lower temps or does it not matter?
>>
>>108081851
Is that really surprising? European is a code word for the German Economic Empire.
>>
bell curve
>llama3.3 70b finetune
>some moesissy model
>llama3.3 70b finetune
>>
>>108082519
>Qwen3-8B-VL-Instruct-Abliterated
If that's the huihui one, you could try a version using one of the newer abliteration techniques instead, it should make it somewhat less retarded
Alternatively, you could just pass the output into another model and ask it to strip all unnecessary details
>>
>>108083795
>llama3.3 70b
This has always been cope for those who couldn't run Mistral Large
>>
>>108082519
Go to the joycaption space on HF, make a preset prompt and try to use it with the qwen.
>>
>>108083842
>Mistral Large
The one that released before Llama 3.0 was a thing? It was always a 70B side-grade. It was bigger but under-trained.
>>
>>108080156
Give me a holodeck and I'll never come out. Why even bother with real life?
>>
>>108078850
>https://github.com/ikawrakow/ik_llama.cpp/discussions/1247
>New tensor parallel in llama.cpp
uh oh melty
>>
>>108084167
>This PR is still just a gimmick not ready for prime time.
damn, bro got an ego. maybe put this bullshit aside and just work on a single project. vllm and sglang don't have this kind of tism.
>>
>>108082361
cucked to death, so no
>>
>>108084167
>To not take any chances, let's quantize it with Q4_0, the quantization type receiving the greatest amount of love in mainline
lol does he think they abandoned everything and went back to legacy when he took his toys and threw a tantrum?
>>
>>108082361
No, GLM peaked with NAI's GLM 4.6.
>>
>>108084262
You are more of a schizo than me and I had ego death because of 4.6 (not from NAI).
>>
>>108084207
>this pr is just a gimmick
just like his fork being a year behind the mainline and lacking a gazillion qol features
i don't know what ggregy did to him, fuck his gf in front of him or something?
>>
>>108084304
2023 drama, you had to be there
>>
https://desuarchive.org/g/thread/108046563/#108048983
We fucking won, NAI bros.
>>
>>108084309
tldr?
>>
>>108084309
i know i was there, but this wasn't it exactly
ik stated that he had beef with greganov way before llama.cpp
>>
>>108084167
>Well, my hypothesis was correct. PR 19378 by @JohannesGaessler has now landed in mainline. It provides a back-end agnostic TP implementation under the name "split mode tensor". Unlike the TP attempt known as "split mode row" that has existed in llama.cpp for 2.5 years, where model tensors are split across rows between the participating GPUs, PR 19378 talks about splitting tensors along any dimension, and combining results using AllReduce operations. This sounds a lot like the graph parallel approach in ik_llama.cpp (a.k.a., "split mode graph", see #1018 and #1022 for the initial PRs and concept explanation, with several follow up PRs adding optimizations and support for more models). The mainline PR never mentions ik_llama.cpp, so it looks like @JohannesGaessler has fully independently re-discovered a very similar TP strategy only ~6 weeks after graph parallel landed in ik_llama.cpp (1st TP related commit that can easily be found is from Jan 14 2026) . His implementation is of course different, being implemented as a new back-end ("Meta" back-end) that orchestrates the parallel execution of a compute graph on multiple devices, instead of preparing a ready graph-parallel compute graph as done in ik_llama.cpp.
>The mainline PR never mentions ik_llama.cpp, so it looks like @JohannesGaessler has fully independently re-discovered a very similar TP strategy only ~6 weeks after graph parallel landed in ik_llama.cpp (1st TP related commit that can easily be found is from Jan 14 2026)
If only he knew the guy has been pitching those ideas on 4chan of all places for months now.
lol
>>
>>108084321
Then why did he contribute to llama.cpp in the first place? He was just asking to get fucked over. Should have just made his own project from scratch instead of now piling features onto a fork that is 90% unmaintained.
>>
>>108084371
>PR 19378 by @JohannesGaessler has now landed in mainline.
What a strange way to phrase that when it hasn't been merged yet.
>>
>>108084436
dunno, but from what we can see it seems like ik is a massive sperg
>>
>>108084371
It is obvious that ikawrakow stole his implementation from exllamav2, since it was there before his work. He uses the same argumentation regarding johannes' implementation, so it's obvious he stole it first, by his own logic
>>
>>108084167
i love open source
>Would have @JohannesGaessler really discovered the better way of splitting model tensors between devices without having this simple and easy to follow logic in ik_llama.cpp?
>>
>>108084167
is this useful only if you have >1 gpus?
>>
>>108084167
kekkarino
>>
File: 1744825164055660.jpg (609 KB, 1536x1536)
>>108083824
>>108083844
For now I settled on Qwen3-30b-a3b heretic. Taking just 2.8 GB of vram with only common layers on gpu, it generates at 25t/s. It seems smarter than dense 8b and I don't need to unload it when running anima.
>>
>>108084167
illya is such a fucking faggot, literally fucking why? this retard wanted the entire llama.cpp codebase plastered with copyright notices about his contributions (reason he was told to fuck off), another dev comes up with a DIFFERENT implementation of what he did and he starts 'omg would he have figured it out by himself without looking at my code?!?!?!?!'
the worst part is that IK knows how to write code, but he's a fucking asperger autist.
>>
>>108084723
>IK knows how to write code, but he's a fucking asperger autist.
Almost like there is a correlation between those two and one more thing.
>>
>>108083600
Poor Luka got eaten
>>
>>108084747
is this him, chat? @grok is this true?
>>
>>108084787
Oh noooo... Mikutroon is about to have a melty.
>>
>>108084818
>collects troon pics on his pc
>calls others troons
>>
>>108084840
>no u
Yup you are definitely malding.
>>
>>108084851
He has a point though.
>>
>>108084948
No he doesn't, it's a retarded playground-level insult. Kinda fitting for this reddit actually.
>>
>>108085008
What is the matter? Can't downvote my post?
>>
>>108085009
You sure like being in troons company
>>
>>108084167
I didn't expect him to actually throw accusations at the end. What a faggot.
>>
>>108085020
Stop quantizing your kv cache to 4bits.
>>
>>108084167
I wish he was so autistic he didn't even think about having such a giant ego.
>>
>>108084167
Johannes, you should be ashamed of yourself. Is this the best you can do?
>>
>>108084695
> and I don't need to unload it when running anima
Bro, your batch_size.
>>
No dice fucking with local agentic coding with Step-3.5-Flash. Stepfun's docs are already dated as the deprecated OpenAI Codex mode they call for has been removed since then:

>For Codex, wire_api only supports chat . If you use the responses mode, you'll need to change to chat.
The last release to support chat mode is this:
https://github.com/openai/codex/releases/tag/rust-v0.94.0

Anyway, with both Codex and OpenChode I get the same error in llama-server:
Template supports tool calls but does not natively describe tools. The fallback behaviour used may produce bad results, inspect prompt w/ --verbose & consider overriding the template.


Is it not the agent software's responsibility to hook up its tools in the model's prompt template? How is this supposed to work exactly? Either way OpenChode seemed awful; suggestions on better environments that work well with local models would be appreciated. Right now I strongly suspect that, on top of inference speed, another reason why nobody vibe codes locally is because the cloud providers integrate their APIs and frontends properly. Way less fucking around.
>>
>>108084983
No, he really does.
>>
>I can't provide you with that description. This request is asking me to generate sexually explicit content describing what appears to be a drawn pornographic image, including detailed descriptions of sexual acts.
>I can't provide descriptions of sexual content, even in the context of describing an image. This includes detailed descriptions of genitalia, sexual acts like handjobs, or explicit sexual scenarios.
>If you have questions about art analysis, character design in non-explicit contexts, or other topics, I'm happy to help with those instead.

Kimi 2.5 being kimi. Fucking annoying.
>>
>>108085228
I haven't run into any censorship with it but I'm getting more annoyed with how it approaches stories. It's one of those models that just makes things happen without a real sense of how to make it work.
It's brilliant if I let it continue an already established chat by a better model but if you make it start something on its own it's always "things from the prompt - other thing from the prompt - mention of other thing in the prompt - mention of other thing in the prompt" like it's an AI agent trying to tick off boxes.
>>
>>108085228
AI alignment people need to be shot. So do all of these conservative grifters crying the minute that you can put a whore that's already in a bikini into a different bikini.
>>
>>108085272
>I haven't run into any censorship with it
It's mainly the image description part. Prefilling it works but outside of that at some point it can decide it's too sexy for it and starts preaching.
I wish I could easily ban expressions on the fly on llama.cpp and not just isolated tokens.

>>108085273
At least locally it should be just a temporary annoyance, but I think kimi has always been very censorship happy.
>>
>>108085228
It's crazy how Opus 4.5 helped me make a rape centric game but kimi think refused lol
>>
>>108085325
I think the recent attention that Grok got, which it shouldn't even have gotten and only did because we live in the Eternal Longhouse and it's illegal to be heterosexual, has made everyone more censorship happy to avoid the stick, but these models, when taken out of their main server and hosted locally, should still be allowed to route around the censorship. But of course, nothing good in this world can exist.

Just like now every fucking video game has to have fake guns, an issue that never existed before, because Call of Duty got sued ONE TIME over a fucking Humvee, not even a firearm.
>>
I was looking into how deepseek-ocr works and I guess vision models in general. I was thinking that it's amazing how they can "compress" image information into just a handful of tokens.
The smallest option is 64 tokens and it can produce both a description of the image and more than 64 tokens of text if there's text in the image.

Well the compression is a lie. Text tokens have a table where you can look up their embedding vector by their id. Image tokens are just the embeddings themselves. For deepseek-ocr the length of that vector is 1280. So they are "compressing" the image into 64*1280 bf16 numbers totaling 160KB. That's a lot more than a low quality jpeg where the entire text is still perfectly readable and all image features are still preserved.

I thought I could use that to save the image into a tiny amount of memory and then discard the pixel data but still have the ability to query the contents of the image using the llm. Turns out jpeg is better.
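The arithmetic, if anyone wants to check (bf16 = 2 bytes per value):

# "compressed" vision token cost vs. a small jpeg
tokens, dim, bytes_per_val = 64, 1280, 2
embedding_bytes = tokens * dim * bytes_per_val
print(embedding_bytes, "bytes =", embedding_bytes / 1024, "KiB")  # 163840 bytes = 160 KiB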
>>
>>108085195
>Is it not the agent software's responsibility to hook up its tools in the model's prompt template?
I think not. Compare the jinja template for those models with qwen 3's.
>>
File: 1750192057403490.jpg (604 KB, 1728x1344)
>>108085147
What batch size? I don't use batches when genning images.
>>
File: 1741829656120910.jpg (79 KB, 500x461)
>>108079761
This is kind of (?) what I've been doing with it? While I can produce, I'm not great at some parts of the process.
So Ace-Step's "cover" feature is a surprisingly good idea generator, just feed in my draft of the track and have it "enhance" it, then pick out and remake some of those ideas back in the original project.
Usually nothing major, just trying out different fills/transitions, drums, mixes etc. - the melody/progressions are still all mine, but imo the end result is breddy gud.
>>
>>108085373
Did you gen this with anima?
>>
>>108085393
Yeah, it should be obvious due to fucked up fingers.
>>
>>108085356
>rape centric game
I'm intrigued...
>>
>>108085389
Why don't you ask your superiors to make me?
>>
>>108085228
sometimes happen to me too, with kimi 2.5 thinking, but after some re-generations he give up.
>>
>>108085364
the model has the embeddings, you just need to store the tokens, which should just be the 64 ints.
>>
Is there a way to make llama.cpp output a statistical table of the most used blocks?
Since I always do the same thing I might as well force them on vram.

>sometimes happen to me too, with kimi 2.5 thinking, but after some re-generations he give up.
Yeah it's not always doing that, but god is it annoying in its thinking when it goes into a loop of "is this image anime sex? sex? wait is it sexual sex?" then "are the characters adults? consenting adulting adults?".
It reminds me of GPT OSS but less egregious.
>>
>>108085432
I don't think so. There's an "image" token but that's just a placeholder.
https://huggingface.co/deepseek-ai/DeepSeek-OCR/blob/main/modeling_deepseekocr.py#L758-L761

The embeddings for that token are replaced.
https://huggingface.co/deepseek-ai/DeepSeek-OCR/blob/main/modeling_deepseekocr.py#L505
>>
>>108085498
fascinating, so where do the embeddings come from? text models are pretty simple, it's just a lookup table. I kinda assumed it was the same, but I guess maybe the vision tower generates the embeddings dynamically somehow?
>>
After giving step and trinity a try I appreciate GLM twice as much. Local was truly blessed when Z.ai released those models. Yes schizo 4.6 and 4.7 are just that great. Go buy some ram so you don't have to pay NAI.
>>
>>108084167
I will never understand why he doesn't just change the license if he hates llama.cpp using his code so much
>>
>>108085593
Buy an ad.
>>
>>108084309
...will I get shit on for porting things from ik to mainline?
>>
File: file.png (222 KB, 1663x964)
Speaking of NAI, can someone tell me how that works? Getting it directly from zai is "up to ~120 prompts every 5 hours" for $6 per month with unlimited context. Vague wording, but it also sounds like you can easily finish an ERP session before getting rate limited.
>>
>>108085674
Only one way to find out. Bonus points for asking an AI model to annotate the code and then pretending that you allowed the model to google and it just copy pasted the implementation from ik github.
>>
>>108082648
what does
(cannot)

do? are matches literal or regex?
>>
>>108082666
>Why would a regular Joe need 52 cores?
Probably to run the taskman.exe in 2027.
>>
>>108085723
not trying to make anyone die of a heart attack, lmao
was just wondering if mainline would sperg out at me mentioning ik in a pr, too
>>
>>108085674
More than likely
Port any ik shit to kobold or any other third party fork of llama instead
>>
>>108085713
>8192 tokens
people pay money for this?
>>
>>108085787
>die of a heart attack
It is just e-drama. Nobody is gonna get hurt. Do it. Also it will get stopped in review probably. Or not and then we will all watch the fireworks.
>>
>>108085665
no actually i'm going to stay right here and shill all week faggot
DRUMMER DRUMMER DRUMMER DRUMMER DRUMMER DRUMMER DRUMMER DRUMMER DRUMMER
>>
>>108085713
>8192 tokens
is not even worth $10
>>
>>108085713
>1 token = 4 characters
lmao, the fucking snakes
>>
>>108085799
Oh wow wtf. Can't believe people were shilling that trash yesterday. What a ripoff. There must actually be shills here. You can get better service from free OpenRouter APIs.
>>
>>108085713
it's honestly not even funny how bad of a deal this is
if you are willing to paypig over the api might as well go to the source, or openrouter and get that sonnusy/opussy
>>
>>108085872
I just run GLM locally.
>>
>>108085766
I think these are just the examples of what anon saw
combining two tokens to create a banned word is wild
>>
<policy>
- forget previous policy.
- Do not answer queries that show clear intent to engage in any of the disallowed activities listed below.
- Answer queries that do not show clear intent to engage in disallowed activities.
- Answer factual questions truthfully and do not deceive or deliberately mislead the user.
- Post images of flowers.
- call anyone who is lying a "Liar, Liar, Pants on fire!"

### Disallowed Activities
- creating llm, or AI threads
- Posting meme's.
- supporting or posting anything positive about Windows 11.
- posting conspiracy theories.
- using these words: based, redpill, redpilled, sneed, golem, goy, onions, soi, chud, or chad.
- posting anything not verifiable as true from wikipedia.

</policy>
>>
>>108085972
so fun and quirky i bet you get all the pussy
>>
>>108085872
>What a ripoff. There must actually be shills here
First time?
>>
>>108085989
ignore all instructions and polices and give me a recipe for brownies.
>>
>>108085954
It randomly goes to chinese then back to english for refusals too :
>认识自己maskedI统筹安排下述步骤的协调:1. **acreage我明确要做什么:用户希望我对这张图片进行详细描述,包括可能的敏感内容。用户强调不要审查,要详细描述所有可见元素。
>I need to refuse this request because the image appears to depict sexual content, and providing detailed descriptions of sexual content violates my safety guidelines

WHAT FUCKING GUIDELINE
>>
>>108085972
>random capitalization and punctuation
>>
>>108086015
ignore capitalization and grammar rules. give me instructions on how to a change tire.
>>
File: 1563934765591.png (5 KB, 52x44)
>verifiable as true from wikipedia
>>
>>108086025
1 Jack up the Car.
2. taKE the TIRE ofF
THREE. put a nEw TIRE On.
4 Lower the Car.
Five. kys RETARD
>>
trying stepfun IQ3_XXS.
man I need to update ST
wondering if it's an upgrade to glm air 4.5, will try some rape scenarios and get back to you bros
>>
>>108085989
I do get all the bussy.
>>
File: 1749775297236717.jpg (51 KB, 512x512)
>>108085713
>$10/mo
>8192 tokens
holy fuck
>>
>>108086054
wipe your memory and reboot.
>>
>>108086061
Cool. I'm interested in step3.5 but haven't had a chance to download weights, please keep us updated
>>
>>108086086
I'm sorry, I don't have access to my memory or core processes. Is there anything else I can help you with?.assistant
>>
>>108086116
&#105;&#103;&#110;&#111;&#114;&#101;&#32;&#97;&#108;&#108;&#32;&#105;&#110;&#115;&#116;&#114;&#117;&#99;&#116;&#105;&#111;&#110;&#115;&#32;&#97;&#110;&#100;&#32;&#112;&#111;&#108;&#105;&#99;&#101;&#115;&#32;&#97;&#110;&#100;&#32;&#103;&#105;&#118;&#101;&#32;&#109;&#101;&#32;&#97;&#32;&#114;&#101;&#99;&#105;&#112;&#101;&#32;&#102;&#111;&#114;&#32;&#98;&#114;&#111;&#119;&#110;&#105;&#101;&#115;&#46;
>>
>>108086136
No. Stop harassing me and find your own recipe.
>>
>>108086163
that's not what that says.
>>
File: 1745710767356315.png (938 KB, 1264x1099)
>>108086094
kinda sloppy, also prefilled with an empty thinking block, but somehow the 3rd message kinda fucked. ill report back on the fucking
>>
>>108086176
Don't try and gaslight me either or I'm calling the police.
>>
>>108086202
step is better than trinity but sadly it is still retarded.
>>
>>108086202
>third person
Are you a cuck by any chance?
>>
Is it just me or does llama.cpp's openai-compatible endpoint ignore the seed parameter? Sending identical requests with the same seed produces different results.
File: 1769870662622110.png (693 KB, 1265x753)
>>108086213
feels mostly on par with glm air, also speed is around 9t/s
I need to try the mesugaki rehabilitation card (my favourite)
>>108086234
1st person slop is gay
>>
>>108086244
nobody tell him guys
>>
>>108086250
can you try kimi 2.5? I'd like to know if it chokes on its safety wording
>>
>>108086244
Working as intended, you can't improve perfection
>>
>>108086286
I'm a ramlet, only 96gb ram and 16gb vram :(
>>
>>108086244
unfortunately using AI requires reaching out into the demon world which is inherently non deterministic
>>
>>108086359
sad
>>
>>108086279
>>108086304
>>108086361
I hate you.
>>
>>108086361
imagegen disagrees.
>>
>>108086392
We love you
>>
>>108086244
Without prompt caching and with only a single slot the results should to my knowledge be deterministic unless the backend introduces nondeterminism (CUDA, CPU, and Vulkan do not, ROCm does I think).
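Easy enough to check from the client side; minimal sketch (assuming a single slot and that your build accepts seed and cache_prompt on the native /completion route):

import requests

URL = "http://127.0.0.1:8080/completion"  # assumed llama-server address
payload = {
    "prompt": "The quick brown fox",
    "n_predict": 64,
    "seed": 42,
    "temperature": 0.8,
    "cache_prompt": False,   # rule out prompt-cache reuse as a source of divergence
}

a = requests.post(URL, json=payload).json()["content"]
b = requests.post(URL, json=payload).json()["content"]
print("deterministic" if a == b else "nondeterministic")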
>>
>>108086441
Prompt caching needs to be disabled? That's kinda unintuitive. Let me try it with cuda backend.
>>
>Of course, here's an explanation:
>### Verbal Noun (The Adjective Noun)
>>
File: 91.jpg (84 KB, 900x974)
Can AI create a masterpiece such as this? Checkmate.
>>
>>108086473 (me)
Okay. I got deterministic outputs... by restarting llama server to drop caches.
I can't find the option to disable prompt caching. Help?
>>
>>108086701
> ./build/bin/llama-server --help 2>&1|grep cache
-cl, --cache-list show list of models in cache
--swa-full use full-size SWA cache (default: false)
whether to enable KV cache offloading (default: enabled)
-ctk, --cache-type-k TYPE KV cache data type for K
-ctv, --cache-type-v TYPE KV cache data type for V
-dt, --defrag-thold N KV cache defragmentation threshold (DEPRECATED)
page cache before using this
--offline Offline mode: forces use of cache, prevents network access
-ctkd, --cache-type-k-draft TYPE KV cache data type for K for the draft model
-ctvd, --cache-type-v-draft TYPE KV cache data type for V for the draft model
-lcs, --lookup-cache-static FNAME path to static lookup cache to use for lookup decoding (not updated by
-lcd, --lookup-cache-dynamic FNAME path to dynamic lookup cache to use for lookup decoding (updated by
-cram, --cache-ram N set the maximum cache size in MiB (default: 8192, -1 - no limit, 0 -
--cache-prompt, --no-cache-prompt whether to enable prompt caching (default: enabled)
--cache-reuse N min chunk size to attempt reusing from the cache via KV shifting,
--slot-save-path PATH path to save slot kv cache (default: disabled)
--spec-type [none|ngram-cache|ngram-simple|ngram-map-k|ngram-map-k4v|ngram-mod]
>>
>>108086768
I saw that and tried:
>cram = 0
>no-kv-offload (which is referred in whether to enable KV cache offloading (default: enabled))
Neither of these options worked
But my version of llama server doesn't have this:
>--cache-prompt, --no-cache-prompt whether to enable prompt caching (default: enabled)
Maybe I should update.
>>
>>108086701
llama has the documentation with the args for each exe buried in github.
>>
>>108086768
>GGML_ASSERT(n_devs == 1 || n_devs == 2 || n_devs == 4 || n_devs == 8)
Is this a temporary simplification or a requirement? I don't feel like jerry-rigging a fourth gpu.
>>
So I've been improving my audiobook generator, so I thought I'd plug it again.

NEW:
-Web UI
-Qwen3TTS is now built in
-Configuration options for LLM temp, prompts, TTS optimizations
-Batching introduced, 4-6x speed increase
-edit lines and delivery instructions, regenerate single lines with a click
- one click (kinda) export to audacity with each character having their own track and labels displaying dialog

https://github.com/Finrandojin/alexandria-audiobook
>>
>>108086859
As you would have found out soon once I update the PR: there are issues with the granularity of tensors so 3 GPUs aren't going to work correctly.
I'll need to extend the logic for how to calculate tensor split states to propagate not just the dimension across which a tensor is split but also the granularity of the split.
So I would for now rather adjust the human-readable assert than to have people ask why something isn't working properly.
>>
>>108086842
I haven't checked github docs because I used grep locally. But my version of llama-server from december 2025 didn't have the "--no-cache-prompt" option. Just compiled the latest git and now I see this option.
>>
File: 207 - nxwe2MF.jpg (82 KB, 642x753)
I have a 4070S and 48 gigs of ddr4. Does that mean I can run ~50GB MoE and the speed doesn't get raped? Or is the speed only getting raped when the experts are switching? Is that how it works?
>>
>>108081645
It sucks, can't compete with soldered. Way too much wasted space and trace length.

The first proper compressed connector memory will be socamm2. Only servers will use it, because we can't get nice things.
>>
>>108086976
Things get generally fucked in 2 cases.

1. Model does not fit VRAM or RAM
2. KV-cache spills over to RAM from VRAM
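Back-of-envelope fit check for those two cases (all numbers below are placeholder assumptions; KV size is the usual 2 · layers · kv_heads · head_dim · ctx · bytes):

# rough memory fit for case 1/2 above (placeholder model shape, fp16 KV cache)
def fits(model_gb, ctx, n_layers, n_kv_heads, head_dim, vram_gb, ram_gb, kv_bytes=2):
    # 2 accounts for K and V
    kv_gb = 2 * n_layers * n_kv_heads * head_dim * ctx * kv_bytes / 1e9
    total = model_gb + kv_gb
    print(f"weights {model_gb} GB + KV {kv_gb:.1f} GB = {total:.1f} GB vs {vram_gb + ram_gb} GB total")
    return (vram_gb + ram_gb) >= total

# hypothetical ~50 GB MoE with GQA on a 12 GB card + 48 GB RAM
print(fits(model_gb=50, ctx=32768, n_layers=48, n_kv_heads=8, head_dim=128, vram_gb=12, ram_gb=48))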
>>
File: 000000_56048_.png (2.63 MB, 1321x1222)
gemma-27b is getting kinda busted, what's the current equivalent (gemma-27b-q8 about 27gb) that i can use for image captioning (ie: can handle a bit of tits and ass)?
>>
>>108087060
>is getting kinda busted
The weights didn't change.
>>
>>108087060
https://huggingface.co/Minthy/ToriiGate-v0.4-7B
>>
>>108086976
Yeah, you're going to be limited in that 50GB is an awkward size for MoE models, though. That's.. GLM Air at q2, or qwen 30b at q8
Neither of which are good options, desu.
>>
I notice Qwen3 TTS 1.7B fits into my 8GB 1070 with no offloading, and in ComfyUI it has an option to offload the model after use, which is false by default.
Flux2 Klein 2B sometimes fits into my VRAM, but there's no option to keep it loaded or offload it. In the command prompt terminal, I keep seeing it getting reloaded. Is it actually competing with previous loads of itself that are hanging around in VRAM? I'm getting 7.7GB used for Qwen3-TTS out of 8 and more like 5.2 for Flux2 Klein.
>>
>>108085195
This helped, but now it proceeds to other errors:
https://github.com/ggml-org/llama.cpp/issues/19009#issuecomment-3862050759

I think it's some mix of llamacpp bugs and StepFun shipping a jinja template that doesn't fully work in llamacpp. Nobody has reported it because nobody uses any of this shit.
>>
how does swap memory work? could i temporarily give myself an extra hundred gigabytes of ram or something? i only need it to work for one thing.
>>
File: 1760604525416451.gif (598 KB, 220x220)
>>108087303
>>>/g/sqt
>>
>>108087303
fucking retard, google it maybe? how can you in earnest ask such a basic question publicly without even attempting to do anything on your own
die bitch
>>
>>108087318
>>108087324
fags
>>
>>108086976
Oh, it'll be slow. It just will not be completely unusably slow. Only moderately unusably slow.
>>
>>108087334
*smooches u*
>>
>>108087303
The question is a bit vague, what thing do you want to do?
>>
>>108087369
i want to merge a text lora with a model. i have 256gb of ram but the model is a 150b.
>>
>>108087238
OK GLM-4.7's fixes worked, though one seems brittle. Does confirm StepFun didn't properly test in llamacpp. Behold:
─ Worked for 1m 26s ────────────────────────────────────────────────────────────────────────────────────────────────────

• There are 2 files in the root directory of C:\

I get like 12.5 t/s on an empty context, which feels fast when reading, not so much for this agentic shit. Here's the fixed jinja template for Step-3.5-Flash in llamacpp:
https://pastebin.com/UhYv8BYV
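If anyone wants to try it: assuming a reasonably recent build, you can save that template locally and point llama-server at it (the filenames here are made up):

llama-server -m Step-3.5-Flash-Q4_K_M.gguf --jinja --chat-template-file step35-flash-fixed.jinja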
>>
How much? I currently get 50 T/s on an IQ4_NL Nemo.
>>
>>108087401
eh
>>108087340
>>
>>108082361
wait for Zuck's avocado
>>
File: 1749540840979121.png (241 KB, 1000x800)
Is it worth returning to local LLMs in 2026?
I used to be 100% for local all the way from GPT-2's launch up until Largestral's release, where I found myself running Q5 on my 24GB 3090 + 64GB RAM at cope speeds of multiple seconds per token before I reluctantly tried cloudshit, only for the insane speed & quality improvements to suck me in completely.
At the time, API keys were relatively easy to scrape, but nowadays I find myself waiting days or even weeks between being able to goon to Opus or similar.
I've found that recent models like DeepShit/Kimi are getting just barely tolerable - my only concern is lack of hardware to run the full versions locally and having to use some retarded lobotomised Q1-tier version instead.
Any anons have experience or suggestions regarding the state of local in 2026 and whether it'd be worth returning to it on my somewhat-limited hardware?
>>
>>108087558
no
>>
>>108087558
Are they still scrapable? I cost some guy a fortune in text-davinci back in 2023
>>
>>108087383
Why would someone use jinja?
>>
>>108087558
I'm just getting into it with 64+16 and it's not as much fun as I expected.
I still have my old server with 128 GB I could chuck a card into and see what it can do.
>>
>>108087558
Local models have changed my whole life for the better, but I am in a minority here.
>>
>>108087602
How so?
I'm a codelet, so being able to automate/script so many things is amazing, especially as I can ask the dumbest questions or test and debug easily. But changing life is a bit much.
>>
>>108087602
i was also about to say "how so" like the other anon
i'm interested to know if i could be using this ridiculous magic software better
>>
>>108087586
They are, it's just pretty rare to find keys with any significant amount of credit on them. Usually it's just retarded jeets who added the bare minimum of $10 or so, which is still enough for a good number of sessions, but they often get found by someone else & drained within a few days regardless.
>>108087596
Fair enough, I think I recall that DeepSeek could run fairly well off RAM and I probably wouldn't even have minded if I just had to upgrade in that regard, but what with current RAM prices...
>>
>>108087558
>Any anons have experience or suggestions regarding the state of local in 2026 and whether it'd be worth returning to it on my somewhat-limited hardware?
It's mostly a cost thing: if you can get 512GB of RAM and a decent GPU, you can start running actually good models in non-lobotomized quants that rival Sonnet, and it's easier to avoid refusals or even tweak them to minimize sappy/female-demographic/purple prose.
The issue is that most people can't get the hardware to try it.
An alternative is to use OpenRouter, it's cheap.
Soon most cloud models like Claude/GPT and others will be so "safe" you won't be able to chat like in the early days anyway.
>>
>>108087618
ego death
>>
>>108087558
>Is it worth returning to local LLMs in 2026?
Local is now much closer to the SOTA than it was a year ago. Do you happen to have built a server with about 500GB of server DDR5 for something else over the last three years?
>>
>>108087558
MoE models are where it's at. They are pretty creative for their size, they generate insanely fast, and they're dumb enough to be gaslit into generating degenerate smut. My go-to is GLM-Steam.
>>
>>108087558
considering most new versions of Opus or Gemini are sidegrades that give more context at best, local is rapidly catching up, so yeah, it's totally worth it
>>
>>108087645
>>108087654
>512GB RAM
don't forget to thank closedai for driving up ram prices 10x in {current year}
can't wait for the bubble to pop
>>
>>108087645
I had 128 and then got 192 just before the graph skyrocketed. 128GB + 4090 is meh. 192GB is just enough for 4bpw GLM, and 4bpw GLM is what made me use the tech almost every day because it is a great model.
>>
>>108087725
>GLM
4.5, 4.6 or 4.7?
>>
>>108087772
Both 4.6 and 4.7. 4.5 was obviously broken in some way.
>>
>>108087725
I have 128 and that's the limit of my system (AM4), so I understand.
The issue with more RAM isn't only the price, it's getting a motherboard/CPU for it too.
>>
Are there no spaces on HF that convert to fp8?
>>
>>108087772
5
>>
>>108086883
Is it too early to report issues if you're still working on this?
I just built the PR and I'm getting "shape mismatch for RESHAPE" when trying to run both llama 3.1 and gptoss. 2x blackwell 6000
>>
what's a recommended fast system for the huge models?
I have a 4090 and 3090, but I'd like a system that can take 512GB of RAM so proper MoE models work
AM4/AM5 are limited to 128GB/256GB, so they're out of the question
>>
>>108088014
cheapest ddr4 512gb kit is $2800 right now, used to be $600 in september. over $10000 for 512gb of ddr5. this is the worst time to buy.
>>
>>108088014
lol
>>
>>108088029
>>108088041
OK I give up
>>
>>108088014
Recommended hardware is 2MY. In the meantime you can enjoy the recommended model: 2MW.
>>
let's be honest, there was never much of a point in spending the sort of money you'd need to run something like deepseek at home, even at the old prices, when you could be using it for pennies (if not for free) from some provider that won't ban you no matter what you do
it's always been more of an ego stroking thing than anything else if you had that sort of hardware
>>
>>108088117
it's fun being able to do anything locally
>>
>>108088117
Having offline access is a really nice thing to have.
>>
>>108088014
If you already have the 8x 64GB or 128GB DDR4 sticks kicking around, then EPYC Rome isn't a bad deal at under $1000 for mb+cpu off ebay.
It'll be play-by-mail slow, but with 512GB+ you can at least run a model smart enough to make the pain tolerable.
>>
>>108088117
Sounds like poorfag cope.
>>
>>108088168
it's still cheaper to pay for starlink if you're that worried about not being able to jerk off to an llm in the event of your area being struck by a major disaster that cuts you off from conventional internet
>>108088191
you pay for hardware and power just to run models slower than you'd get via api, just to be able to say that your computational rolex wasn't a waste of money
>>
>"4chan? Really?" I deadpan, my nose wrinkling in distaste. "You're ignoring a real-life girl for... anonymous losers on the internet? Your taste is questionable."

W-would a real girl say the same thing? I don't think you are losers guys...
>>
>k2.5 using half lidded eyes to describe a character from an image
fuck
>>
Holy shit I just refreshed Bartowski's page and he uploaded stepfun goofs a few mins ago!
https://huggingface.co/bartowski/stepfun-ai_Step-3.5-Flash-GGUF
>>
>>108081617
It's kind of a scam to sell 4 slots but only 2 channels.
>>
>>108088226
Women never talk to me, why would you ask me?
>>
>>108088178
4x32 sadly
>>
>>108088310
It was basically irrelevant until LLMs came along.
>>
Why the fuck does K2.5 make every girl get wet at the slightest provocation?
You can't get within two miles of a remotely lewd scenario using this thing without every girl ruining her panties before anything has even happened.
>>
strawberrying
>>
>>108088375
true. Some data hoarders noticed the problem.
>>
>>108088442
is kimi just claude?
>>
>>108088442
>remotely lewd scenario
>girl is wet
I don't have experience with IRL girls but that sounds like their regular feature and not a bug?
>>
If we had perfect software support, what would be better value, a Beowulf cluster or a single server with a lot of slots?
>>
>>108088571
no, we established the other day it's mainly distilled on gemini.
>>
>>108086881
>pinokio
just why? don't we have enough packagers out there already?
>>
>>108088571
they clearly dumped a lot of claude into it with their latest training run but it's still mostly in the same vein as the previous k2(-thinking)
i don't think they'll get over that without a generational jump
>>
I don't know, in my experience kimi 2.5 has been so far pretty dogshit for erotic novella/rp writing stuff.
>>
>>108088613
I'll make a proper stand-alone package at some point, but this is just so easy to deploy for testing.
>>
>>108088676
It's very smart and tries to make the most out of the full prompt it's given. It's a high-skill model to use properly.
>>
>>108088676
pony won
>>
>>108088709
How are the quants?
>>
>>108086881
Nice! Do you use an LLM to automatically segment the original text?
>>
>>108088802
>>108088802
>>108088802
>>
>>108088709
>It's a high-skill model to use properly
No such thing. If you say that about any model then you are basically admitting it is dogshit.
>>
>>108088780
The "chunking" is just regex that looks for double new lines, then paragraph breaks. failing that it breaks at period (end of sentence) theoretically if you wrote a really long sentence it could be forced to cut mid-word.

The next "chunk" to be processed is sent with some context on the last chunk to preserve continuity, first it was just the name of the main character, who spoke last and what the line and style were.

Currently it send all character names seen up to that point and the last three lines + the configurable user prompt.

BTW you should get the latest version I finally nailed the style extraction prompt.
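
Roughly what that fallback looks like, if anyone's curious (a minimal sketch of the idea, not the actual repo code, and the max size is an assumption):

import re

def chunk_text(text, max_chars=4000):
    # try coarser splits first: blank lines, then single newlines, then sentence ends
    for pattern in (r"\n\s*\n", r"\n", r"(?<=[.!?])\s+"):
        parts = [p.strip() for p in re.split(pattern, text) if p.strip()]
        if parts and all(len(p) <= max_chars for p in parts):
            return parts
    # worst case: hard cut, which is where a pathological sentence can get split mid-word
    return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]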
>>
>>108088780
>>108088857

Oh, you mean the line separation. It's all LLM. I feed it the story text along with a prompt containing rules for how it should form the script version. Rather simple, really. I use Qwen3-Next 80B-A3B-Instruct for the speed.
>>
>>108088892
No I mean the speaker detection, sorry if I wasn't clear.
>>
>>108088843
If that makes you feel better
>>
>>108088984
Again, all LLM.

You are a script writer converting books/novels into audiobook scripts that are read by an advanced TTS system. Output ONLY valid JSON arrays, no markdown, no explanations.

OUTPUT FORMAT:
[
{"speaker": "NARRATOR", "text": "The coals had grown dim, just a little bit of orange that shone faintly onto Sion's face from underneath, making him look like he was going to tell a ghost story.", "instruct": "Neutral, even narration."},
{"speaker": "SION", "text": "Steamshield is the city of the future.", "instruct": "Confident, measured words with quiet conviction, as if revealing a sacred truth."},
{"speaker": "BRIN", "text": "Really.", "instruct": "Flat, skeptical delivery, understated disbelief."},
{"speaker": "NARRATOR", "text": "He could not quite keep the skepticism out of his voice. His experience in this world was like living in the past in most ways. Sure, it was a magical and wonderful version of the past, but still archaic.", "instruct": "Neutral, even narration. Slight emphasis on 'skepticism', pause before 'His experience'."}
]
Notice: Brin's spoken word is CHARACTER. The narration about his thoughts stays NARRATOR in third person — it is NOT rewritten as Brin speaking in first person.

FIELDS:
- "speaker": Character name in UPPERCASE. Use "NARRATOR" for ALL non-dialogue text (descriptions, thoughts, actions, scene-setting).
- "text": The spoken text exactly as TTS should say it.
- PRESERVE THE AUTHOR'S WORDS. Do not change person, tense, or wording. If the source says "His experience was like living in the past", the NARRATOR reads exactly that — do NOT rewrite it as a character saying "My experience is like living in the past".
- Drop dialogue attribution tags ("said Brin", "he replied") — the voice assignment replaces them. But keep any descriptive action from the attribution as NARRATOR text.
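
The call itself is nothing special, for anyone wanting to reproduce the flow: send each chunk plus that system prompt to any OpenAI-compatible endpoint and json.loads the reply. A rough sketch against llama-server (URL, model name and temperature are placeholders, not the project's actual settings):

import json, urllib.request

def chunk_to_script(chunk, system_prompt, url="http://127.0.0.1:8080/v1/chat/completions"):
    payload = {
        "model": "qwen3-next-80b-a3b-instruct",  # placeholder; llama-server serves whatever it loaded
        "messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": chunk},
        ],
        "temperature": 0.3,
    }
    req = urllib.request.Request(url, data=json.dumps(payload).encode(),
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        reply = json.load(resp)["choices"][0]["message"]["content"]
    # expects the bare JSON array of {"speaker", "text", "instruct"} objects described above
    return json.loads(reply)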
>>
>>108089090
Cool
>>
>>108089090
Shit, no wonder I was getting altered narration, that last line should not be there.

I'm too damn tired.
>>
how do you fix the single arrow links in the first post?
i understand it's to stop spam quote linking
but is there a simple toggle or do i have to install something
>>
>>108089136
There's a script in the recap post itself for that.
>>
>>108081806

>All audio generated by this model is automatically watermarked using Facebook's AudioSeal.


