/g/ - Technology


File: 1770358397084.png (1.5 MB, 1600x672)
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>108466262 & >>108459276

►News
>(03/26) CohereLabs releases Transcribe 2B ASR: https://hf.co/CohereLabs/cohere-transcribe-03-2026
>(03/26) Voxtral 4B TTS released without voice cloning: https://mistral.ai/news/voxtral-tts
>(03/26) ggml-cuda: Add NVFP4 dp4a kernel #20644 merged: https://github.com/ggml-org/llama.cpp/pull/20644
>(03/25) LongCat-Next native multimodal 74B-A3B released: https://hf.co/meituan-longcat/LongCat-Next
>(03/25) mtmd: Add DeepSeekOCR Support #17400 merged: https://github.com/ggml-org/llama.cpp/pull/17400

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling
Token Speed Visualizer: https://shir-man.com/tokens-per-second

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
>>
File: 17745737552553.png (2.86 MB, 1509x1541)
►Recent Highlights from the Previous Thread: >>108466262

--Unsloth Studio release and fine-tuning discussion:
>108466593 >108466604 >108466618 >108466757 >108466771 >108466802 >108466835 >108466845 >108467416 >108467512 >108467529 >108467553 >108467584 >108467665 >108467709 >108467725 >108467828 >108467852
--Debating MCP vs skills for LLM tool integration:
>108468260 >108468314 >108468364 >108468471 >108468484 >108468516 >108468528 >108468529 >108468545 >108468624 >108470310 >108470328 >108470372 >108468377 >108468459 >108468479 >108468481
--llama-server update adds built-in tools and enhanced functionality:
>108467191 >108467207 >108467243
--Hadamard transforms for V-cache added to ik_llama:
>108466588 >108466600
--Turboquant enabling efficient KV cache quantization without GQA tradeoffs:
>108468782 >108468792 >108468970 >108468987 >108469019 >108469095
--Configuring koboldcpp banned strings in SillyTavern:
>108467399 >108467422 >108467473 >108467491 >108467509 >108468165 >108468844 >108469015
--ASCII art generation struggles with LLMs:
>108467947 >108468032 >108468043 >108468073 >108468099 >108468156 >108468101
--NeurIPS 2026 policy confusion over U.S. sanctions and Chinese participation:
>108467980 >108468027 >108468041 >108468048 >108468062
--Anon unaware of existing ignore-robots-txt flag in MCP fetch tool:
>108466397 >108466415 >108466432 >108466496
--Miku (free space):
>108467947 >108468032 >108468821 >108468908 >108469368 >108469673 >108470528

►Recent Highlight Posts from the Previous Thread: >>108466266

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>
Important: never respond to vagueposts.
>>
you had to be delusional to think that anything was coming before next week. next week, however...
>>
File: out.webm (1.43 MB, 848x480)
>>108470853
>>
>>108470888
Must be true but if not, I will rotate you.
>>
Local losted
>>
turboquant in llmaocpp when??????
>>
>>108470944
It'll come when MTP and tensor split do.
>>
>>108470951
>tensor split
huh?
>>
Tetonation thread. Should I compile new llama.cpp? I'm afraid.
>>
You can pipe the llama.cpp webui out through your router to a domain if you use an API key, right? It's safe, right?
>>
>>108471062
just forward the port on your router bro
you won't even need an api key dude
>>
>>108471062
There's literally a warning when you run the server.
>>
>>108471065
Doesn't this mean any rando can access it?
>>
>>108471071
Don't worry about it mate
>>
/lmg/ knew about DeepSeek-R1 a week before the huge Nvidia crash.
TurboQuant led to a Samsung/Micron/Hynix crash while being a nothingburger paper released last year that claims 4x VRAM savings quanting KV cache from FP16 to 3.5bit (no shit), whereas Q4 (even native Q4) has been the norm for more than a year.
The market is really fucking dumb wrt. tech hype.
>>
>>108471034
https://github.com/ggml-org/llama.cpp/pull/21085
better not, pwilkin is "fixing" the parser again
>>
>>108471107
There's a reason one of Buffett's main principles is to only invest in businesses he understands.
Tech illiterate speculators gambling on tech stocks they don't remotely understand deserve to lose every penny.
>>
Computer generate me a Super Famicom eroge where everything is procedurally generated
>>
>>108471151
One day we'll get this, for now you'll have to cope with qwen 3 coder next installing Arch Linux for you.
>>
>give claude a task to update a pentest server on lmarena
>it just does it
>use the same prompt in claude.ai
>thinking about ethical implications... for a whole minute
Tiresome. There should be a law against prompt injection
>>
Which tool and model are you using?
I tried opencode with glm-4.7-flash-q4 but it sucks.
>>
>>108471118
pwilkinbros... we won??
>>
File: 1773260166464819.png (2.07 MB, 772x4729)
https://zhuanlan.zhihu.com/p/2020969476166808284
TurboQuant drama incoming.

Chink researcher from ETH Zurich accuses TurboQuant authors of:
- Failure to properly credit and discuss prior work (RaBitQ)
- Misrepresentation of RaBitQ’s theoretical results
- Deliberately unfair experimental comparisons (Running TurboQuant on A100 and RaBitQ on a single-core CPU)
>>
>>108471244
>muh accolades
so this is why illya autisms right?
>>
>>108471251
Don't think that's what they're aiming for because TurboQuant's claimed gains are unlikely to be realized anyways.
>>
I can see Google delaying Gemma 4 because Qwen 3.5 was too good and they can't afford to release something that is not SOTA, at least for a while.

Qwen 3.5 27B (heretic) is not even that bad for ERP if you don't make it generate the usual "book style" / "novel" purple prose (though when I loaded up Ministral 14B for an unfair comparison, I liked Ministral's writing style more, even if it has general retardation for the first couple conversation turns and doesn't follow character instructions very well).
>>
>>108471257
no I mean could the RaBitQ researchers even do their paper without looking at illya's implementation?
>>
>>108471259
How do you tell it not to use the novel/book style?
>>
File: newton.jpg (78 KB, 534x400)
>>108471281
Most (all) scientific work builds on each other. The problem here seems to be one of missing attribution and unfair comparison with the base work.
>>108471244
So
> TQ used my work without attribution, then nefariously claimed their method was better.
>>
>>108471259
Gemma 4 will have so much literature power it'll make other AI slop trained models look retarded in comparison.
>>
>>108471297
Qwen is so trained to follow directions that it'll stumble over its own reasoning. You can't change its behavioural output format that much.
>>
>>108471062
if you need to access it remotely, just use tailscale
>>
>>108471297
Set up your roleplay so it's more similar to a theatrical play script, describing direct and non-obvious actions only, and avoiding "he/she says" and useless adverbs as much as possible, only using asterisks for actual emphasis. In short, make the roleplay dialogue-focused.

>(Anon appears surprised that other people haven't started doing this yet.)

I find that most slop in LLMs comes from typical "internet roleplay" / CAI-like conversations and if you break out from them you'll see less of it.
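For example (made up on the spot), instead of:
*He steps closer, his voice barely above a whisper, a shiver running down her spine* "You came back," he murmurs huskily.
you'd write:
Anon: (steps closer) You came back.
Mina: I said I would, didn't I?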
>>
File: Capture.png (59 KB, 975x1010)
Why is qwen so dumb? I mean I'm running the 0.8b variant but we are at version 3.5 ffs
>>
>>108471364
Was it wrong? 1.9 IS larger.
>>
>>108471367
It continues to say "Wait" and proceeds to compare them in different notion for another 15 min
>>
>>108471378
There's something wrong with either the Qwen 3.5 implementation or maybe it's a quant problem.
Llama webui would often end up in infinite reasoning with 9b. 3000 tokens is the max for a normal query (not programming related) but that shit would just go on and on.
With my client that wasn't an issue, but my llama is one month old now. Plus, a single \n in a wrong place can mess up a lot of things.
>>
>>108471359
Does that method work for most models?
>>
>>108471422
I just roleplay like that and I don't generally see the slop expressions that most people here and elsewhere complain about. Dialogue too has its own overused phrases though, like "don't be shy" (this is both in Qwen 3.5 and Gemma 3).
>>
>>108471259
They will still be ahead in multilingual most likely, but the main issue right now for Google is compute and squeezing more out of what they have. They need to do a Gemini 3.2 release because they are falling behind their competitors in key real-world use cases, having benchmarkmaxxed the wrong stuff like ARC-AGI and Humanity's Last Exam instead of agentic tasks. I can't use Gemini 3.1 at work with Copilot anymore because of that. Either Gemma 4 is delayed, or they don't care about it being SOTA and it's just a reflection of what they are doing in Gemini 3.1/3.2 for plebs, as a gesture to still be more open than OpenAI and Anthropic.
>>
>>108471438
Gemma 3? why would you use that for rp? it has the most atrocious writing style, even worse than qwen 3.
>>
>>108471438
I don't understand the problem. As long as you approach it as "this is an AI model" you can forgive lots of stuff, unless it's really robotic like Qwen and the chink models.
>>
>>108471446
It's good but you need to guide it.
>>
>>108471151
why would you want procgen on an already vibecoded game?
>>
>>108471454
I'm used to Mistral's style, so gemma 3 feels very robotic and sterile.
>>
>>108471446
>Gemma 3? why would you use that for rp?
Because when properly prompted for roleplay it has some restraint (while still being "open" and not outright denying requests) and doesn't jump on your dick by turn 2 like most other models default to. And again, I've not been using it for traditionally narrated roleplay.

>it has the most atrocious writing style, even worse than qwen 3.
Yes. At this point Qwen 3.5 27B (thinking disabled) seems overall better than Gemma 3 27B, multilingual capabilities aside. Gemma 3's vision capabilities still appear to have the upper hand for illustrations and mild NSFW, though.
>>
>>108471482
>Qwen 3.5 27B (thinking disabled)
Using this and getting 0.7 tokens per second on my 8gb vram gpu. Feels good man.
>>
>>108471479
They are both the same. You need to deep dive the prompts, and if you use sillytavern, use the post-history instruction. This gets added after your prompt and you can dictate the style instead of just trusting allah with your tokens.
>>
>>108471497
That's because new llama.cpp is fucked up. --fit doesn't work and if you have any previous settings you need to halve your gpu slices (depends).
Slow tokens mean that the genius llama devs flood your ram with both the gpu weights and the cpu offload.
This took me a while to understand. I hadn't updated the shit since December...
>>
>>108471508
Whereas previously such a conflict wouldn't even be possible. Sure, this is a thing for modest systems; I'm sure if you have 512 gb ram this isn't a problem.
>>
>>108471508
I didn't understand if it was a compile issue or some new flags...
>>
>>108471259
Maybe in house Gemma 4 is losing to Qwen 3.5 on Openclaw bench and they need to retrain
>>
File: file.png (512 KB, 1470x2484)
>>108471364
If you compare the mememarks, intelligence-wise, really small parameter LLMs have already plateaued in performance. Sure, there's improvement, but it's nothing like what you see with Qwen 3.5 2B and 3B, or with Llama 8B vs Qwen 3.5 9B being a gigantic gulf where the latter can beat even Llama 3 70B in a lot of things. The only real improvement at those low sizes is reasoning and agentic stuff, which is a meme at those sizes at the current performance level. It's vastly more useful and capable, but asking them to do big boy stuff is still not better and probably won't be until we get a paradigm shift again. But I will say that from what I can see, differences in terms of writing and ERP for the doable sizes that matter are still improving. Using MythoMax based on Llama 2 vs a Mistral Nemo mememerge is a vast gulf. That mememerge would easily mog any L2 70B tune of the era like Euryale.
>>
>>108471508
oh I was using -ngl 999. I just tried --fit on and now I'm getting 2.2 tokens per second.
Is there anything else I could change to improve performance?

llama-server \
-m "$HOME/Desktop/Qwen3.5-27B-Uncensored-HauhauCS-Aggressive-Q4_K_M.gguf" \
--host 0.0.0.0 \
--port 8080 \
--rope-scaling linear \
--fit on \
-c 8192 \
-t 8 \
-fa on \
--no-slots \
--jinja \
--chat-template-kwargs '{"enable_thinking": false}'
>>
File: file.png (266 KB, 1502x1893)
>>108471482
The multilingual advantage of Gemma 3 is dead, full stop, with the release of Qwen 3.5 27B. There isn't a single language in the mememark nor in practical testing /here/ where Gemma holds an advantage anymore even for obscure African languages. Would like to be proven wrong but Gemma 3 is officially obsolete with that being the case.
>>
>>108471497
How? I'm getting 1.4t/s at 32k context on a 6gb gpu.
>>
>thing doesn't work waaaaah
>nvm i was using it wrong
>hauhau uncensored aggressive hardcore ultimate deluxe super exxxtreme edition gated open ascended
Like clockwork.
>>
>>108471554
idk nigga, help me. >>108471541
>>
If Mythos is as good as Anthropic is hyping it up to be (which I doubt) I don't give it two weeks before we see 'OpenAI buys 80% of Samsung's future memory stock until 2035' on the news
>>
>>108471568
If most of the model is offloaded to cpu don't use flash attention, it will make it much slower.
>>
>>108471541
Your offload is slow for whatever reason; you should be able to dedicate more threads to the task, and only being able to dedicate 8 cores in 2026 tells me your system is weak. The big one you're missing is KV cache quantization: "-ctk q8_0 -ctv q8_0" will give you a teensy bit more room to load layers, and you can probably even afford "-ctv q4_1" instead of q8_0 since 8k context is so short the quality damage won't show. I've been using K quantization at 8 and V quantization at 4 with no issues on general queries with an IQ4_XS quant.
Also, I am using the llmfan's Arbitrary-Rank Ablation Heretic v3 tune instead.
https://huggingface.co/llmfan46/Qwen3.5-27B-heretic-v3-GGUF/tree/main
Going all the way to 0 refusals harms performance too much for me; I can live with one or two refusals, and I find ARA is much better at preserving intelligence etc.
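Something like this on top of your command (a sketch, not gospel; note quantized V cache needs -fa on):
llama-server \
-m "$HOME/Desktop/Qwen3.5-27B-Uncensored-HauhauCS-Aggressive-Q4_K_M.gguf" \
--fit on \
-c 8192 \
-fa on \
-ctk q8_0 \
-ctv q4_1 \
--jinja \
--chat-template-kwargs '{"enable_thinking": false}'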
>>
https://huggingface.co/zai-org/GLM-5.1
https://huggingface.co/zai-org/GLM-5.1
https://huggingface.co/zai-org/GLM-5.1
ggufs when?
>>
>>108471541
Ah yeah, I use more modest settings, but note the difference: previously
>llama-server before December 2025 would run a model without any launch parameters
Compared to today
>if I launch llama-server without any parameters just to debug, it'll be 0.3 tokens per second because there is some conflict in the memory management
Sure, I solved this and this is not a tech support post, but I wanted to outline the difference and why it's probably bad for people who don't have half a TB of ram.
>>
>>108471598
catpic
>>
>>108471541
Fit is on by default; it'll mangle the prompt cache etc.
Contrary to the common assumption here, --jinja is always on, and it wouldn't even be needed if you access the completion interface, but that's up to you.
Somehow the chat template args also conflict. Use --reasoning off (might be demented, double check).
>>
--fit off
>>
--trace on
>>
>>108471634
Doesn't accomplish anything.
t. A knower
>>
>still using only a single llm
Sorry gramps, the future is putting multiple LLMs in a group chat and telling them to argue with each other until they agree on an answer
>>
>>108471680
I barely have enough VRAM for one.
>>
>>108471680
In the business, we call that "agent swarms".
>>
>>108471684
You forgot that the chat space is a harness. Agent swarm inside a harness.
>>
How much dumber is Qwen3.5 30B A3B than Qwen3.5 27B?
Because I can run the former with 5x the tokens/second.
>>
>>108471693
Depends on the task. For some stuff like programming it's actually pretty good still but it often fails to capture nuance or consistency.
Classic MoE drawbacks that we've known for a long time.
>>
>>108471700
What about RP?
>>
>>108471693
Reasoning is pretty much mandatory for the A3B version. For all intents and purposes it's a 3B model made wider with MoE, not a regular 30B model with added sparsity. Most MoE models appear to be designed this way, for some reason.
>>
>>108471710
A3B is 10x better at roleplay than 9b even with reasoning disabled.
>>
>>108471715
With thinking disabled it's noticeably dumber than the 27B version for roleplay, though, at least from what I could see.
>>
>>108471693
Ask it to do any string replacement in C, and if it compiles and works, that's the winner.
void replace_in_string(char *my_string, char *from_name, char *to_name);
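For reference, a minimal passing answer might look something like this (assumes the caller's buffer can hold the grown string, which that prototype can't actually guarantee):
#include <string.h>
#include <stdlib.h>

/* Replace every occurrence of from_name in my_string with to_name.
 * Builds the result in a temp buffer, then copies it back. */
void replace_in_string(char *my_string, char *from_name, char *to_name)
{
    size_t from_len = strlen(from_name);
    size_t to_len   = strlen(to_name);
    if (from_len == 0) return; /* empty needle would loop forever */

    size_t src_len = strlen(my_string);
    size_t grow = to_len > from_len ? to_len - from_len : 0;
    char *out = malloc(src_len + (src_len / from_len) * grow + 1);
    if (!out) return;

    char *src = my_string, *dst = out, *hit;
    while ((hit = strstr(src, from_name)) != NULL) {
        memcpy(dst, src, (size_t)(hit - src)); /* text before the match */
        dst += hit - src;
        memcpy(dst, to_name, to_len);          /* splice in the replacement */
        dst += to_len;
        src = hit + from_len;                  /* skip past the match */
    }
    strcpy(dst, src);        /* tail after the last match */
    strcpy(my_string, out);  /* caller's buffer must be big enough */
    free(out);
}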
>>
>>108471683
If you have good PP or enough RAM for cache, you can have multiple agents share one model.
>>
>>108471721
Ask it to do every instance not just one.
>>
All my models load in less than 1 second
>>
A year ago everyone here would've gotten swamped and crucified by the rest of /lmg/ for even implying that a dense model might have merit over a similar-sized MoE one.
>>
>>108471710
>it's 3B model made wider with MoE
It has fewer layers than the dense 27B version (40 vs 64), and its embedding and Attention matrices (which don't use sparsity) are also smaller. Surely this will have a negative effect on performance at least in some cases?
>>
>>108471598
GLM sisters?????
>>
>>108471721
>if it compiles and works
Surely that isn't your metric, right? Surely you're looking to see if it handles aliasing right, and whether it uses a suitable algorithm (e.g., Boyer-Moore or Aho-Corasick)?
"It werks" in C is the fucking stupidest mindset you can possibly have.
>>
>>108471815
Oh I'm sorry I didn't know that I am dealing with a fact checking autist.
My post implies that if the function works it will pass.
Jesus Christ when was the last time you licked a female vagina? Don't answer because this was a rhetorical question.
>>
>>108471866
>Defining a person's worth by whether they engaged in coitus.
>>
>someone submits a working PR for MTP that gives some decent gains unlike the previous attempts that all were slower than non-MTP
>it's closed due to "muh contribution guidelines"
rip
https://github.com/ggml-org/llama.cpp/pull/20981
>>
>>108471889
I am from Scandernavia. What is "coitis"?
>>
These fucking scandernavians are worse than the indians!
>>
>>108471900
Colitis is inflammation of the inner lining of the colon, causing symptoms like persistent diarrhea, abdominal pain, fever, and rectal bleeding. It is caused by infections, IBD (Crohn’s/ulcerative colitis), reduced blood flow (ischemia), or allergies. Treatments range from antibiotics to anti-inflammatory drugs and lifestyle changes. Cleveland Clinic +4
>>
>>108471894
Piotr needs to be the only vibeshitter tobenothonest
>>
File: toomuchslop.png (19 KB, 887x158)
https://github.com/ggml-org/llama.cpp/pull/21097
Hopefully the same rules will apply to piotr.
>>
>>108471912
that it's the vibeshitter himself rejecting it adds insult to injury
>>
>>108471938
Fuck niggeramov. That actually looked like a good commit.
>>
>>108471959
>That actually looked like a good commit
No. He needs to prove that his implementation matches what he claims. He didn't present a single test.
>This is independently validated by arXiv:2603.19664 ("The Residual Stream Is All You Need"), which shows 100% token match at every budget level vs permanent-loss baselines like H2O, StreamingLLM, and SnapKV.
>>
Thank you for replying.
>>
>>108471959
Problem is that what began as a hobby has now consumed gg. If he thinks something is good it's because
>it's good and you are a mongoloid
>it's not, cry about it on github
>>
>glm5.1 out
>it's not on their main api (and thus not on openrouter either)
>it's not open source
>only available on their code subscription presumably to farm subs from the new generation of chink openclaw drones
Even if this ends up getting released elsewhere, I'm not very optimistic. This screams "rushed glm5-code branch rebadged as a mainline step" to cash in on the dumb openclaw hype in china. I doubt this model is good for anything else, and I don't need to try it to tell.
>>
File: ComfyUI_00112_.png (917 KB, 1024x1216)
>>108470850
I remember someone linking a git-repo that had a list of telltale signs of LLM slop that could be used for logit-bias removal ("Shivers down my spine", "something husky", etc). Does anyone ITT remember the repo or have a link to anything similar?
>>
>>108472081
>it's not open source
it will be on april 7
>>
https://help.openai.com/en/articles/20001152
>When will Sora be discontinued?
>The Sora web and app experiences will be discontinued on April 26, 2026.
>The Sora API will be discontinued on September 24, 2026.
Why are they discontinuing their API? Isn't API inference supposed to cover costs and turn profit? Am I being schizo here, or are the Big Tech companies subsidizing API inference too for gaining marketshare or other reasons? For what other reason would they gut it?
>How is this /lmg/?
The costs of running inference on SOTA API models, as well as amortized development and hardware costs, have implications for the development of local models as well.
>>
>>108472024
What began as a hobby has now turned into being employed and accountable to HuggingFace.
>>
>>108472116
We can only guess, anon.
>Isn't API inference supposed to cover costs and turn profit?
Maybe. Or they lied about their efficiency.
>are the Big Tech companies subsidizing API inference too for gaining marketshare
Sure. Could be.
>For what other reason would they gut it?
Their mom told them to stop. Who knows.
>>
>>108472085
The one I recall (and can't find) was a standalone website. Looking for "anti slop claude skills" is turning up some results, but the ones I've skimmed through don't seem to include the more narrative-common phrases ("smoothes out my skirt", "cheeks growing pink", "with whitened knuckles" etc). I vaguely recall the website being a fuckhuge list compared to these shitty git repos.
>>
>>108472144
He'd need to maintain shit written by 1-shot hype chasers. I'd rather he didn't.
>>
>>108472116
>Isn't API inference supposed to cover costs and turn profit?
I was under the impression that was true for LLMs which are memory throughput bound, rather than diffusion models which have significantly higher compute costs.
>>
>>108472116
>Why are they discontinuing their API?
They also officially stated that they won't allow ERP mode. There's talk of the company having an IPO this year and these events lend credence to such rumors. You must squash anything that may upset shareholders.
>>
>>108472160
>>108472085
Speaking of slop, how slopped would you say this response is?
>>
you are more mentally ill than average for this site
>>
>>108472218
sir there are men and women ITT that used GLM to rp about dismembering infants. My shit is beyond tame compared to that and even some shit you can find on AO3
>>
>>108471866
I'd rather be a cunning linguist with my Waifu than kiss some cunt's hole.
>>
>>108472258
>never
>>
>>108472261

>>108471889
>>
>>108472204
The response seems fine to me but your prose - referring to female masturbation as "macaroni being stirred" - is better left back in 2022 and forgotten.
>>
>>108472085
this? https://pastebin com/GNiNC8Vj
>>
File: 1768367284396670.jpg (71 KB, 863x1090)
>>108471866
>being proud of putting your tongue inside a yeast hole
t.
>>
>>108472382
Ok you catched me. I am not an American high school student who thinks about sex every day.
>>
>>108472391
saar pls do the needful stop shitting up thread saar kindly use designated street
>>
>>108472449
What do you mean? I'm an American University Student.
>>
>>108472085
bookmarked these a while ago:
https://github.com/sam-paech/antislop-sampler/blob/main/slop_phrases_2025-04-07.json
https://github.com/SicariusSicariiStuff/SLOP_Detector/blob/b8bdfd29284daf61f342ba2a749120e8f4bbdad7/SLOP.yml
they're out of date though, no ozone etc
>>
jujufufuhhh
>>
my setup has finally all come together
all told, i ended up spending something on the order of 15k
definitely a bit more than i had intended to, but y'know, that's just how it goes
hoping to finally get some software up and running on these bad boys today
my current setup:
- 2x ASUS ascent GX10
- 1x 400G QSFP112 cable to connect them
- 1x AMD Ryzen 9 9950X
- 4x 64G DDR5-6400 PC5-51200 CL42
- 1x Samsung SSD 9100 PRO 2TB, PCIe 5.0x4 M.2 2280
- 1x MAG X870E TOMAHAWK WIFI AMD AM5
- 1x Lian Li Edge Series-1300W PSU
- 1x ARCTIC Liquid Freezer III Pro 360
- 1x Lian Li LANCOOL 217
- 1x NVIDIA GeForce RTX 5090
gonna probably try to start with GLM-4.7 on both machines, then go from there. certainly open to recommendations and suggestions
>>
>>108472382
Incel moment
>>
>>108472526
>open to recommendations and suggestions
depends what ur doing. glm4.7 is a good start.
qwen3.5 112b for coding
glm4.6 for less censored writing/rp
you could do cope-quant kimi and deepseek with ik_llama ig
>>
>>108472542
i really don't like qwen. in my experience, GLM has been the best for coding out of everything that i have tried
>>
>>108472526
>AM5 gamer gear instead of an SP3 enterprise quality solution
I'm interested to read your experiences with the GX10's, those seem neat.
>>
>>108472572
>>108465668
>>
>>108472599
i know nothing about hardware. i just asked a few friends for recommendations and ended up buying what they suggested. what's wrong with the AM5? i thought the 9950x3d was the gaming slop one, and the 9950x was the cheaper "have a job" style CPU?
yeah, i'm excited about those GX10 boxes. they're way smaller than i had expected them to be
>>108472616
i have only tried the 4 series. haven't checked out GLM-5 yet, but desu it's kind of reassuring in a sour grapes sort of way that it's garbage since i can't run it lol
>>
Hey. Adding Mistral's
>[MODEL_SETTINGS]{"reasoning_effort": "low"}[/MODEL_SETTINGS]
to Qwen 3.5's jinja template actually works pretty well.
>>
>>108472643
>what's wrong with the AM5?
It just has a lower ceiling for upgrades; the processors have significantly fewer PCI lanes, significantly fewer memory channels, and the boards don't support MCIO (the new hotness for card stacking).
My inference rig is basically the same as yours (9950x with a 5090) and I'm pretty happy overall with the purchase, but the next step up to me seems like "throw away the 9950x and buy an Epyc".
The GX10 boxes are kind of neat since I presume they stack mostly linearly? I wanna read your experience hooking them up with the llama.cpp RPC tooling.
>>
>>108471367
brainlet retard, it entirely depends on the context.
In software versioning for example, 1.11 > 1.9 (because it is a whole unit).
In pure numbers we're instead talking about deci/centi/milli precision, so 1.11 < 1.9
kill yourself pseud
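Spelled out as a trivial C check (both lines print 1, each true under its own rule):
#include <stdio.h>

int main(void)
{
    /* as plain decimals, 1.11 < 1.9 */
    printf("decimal: %d\n", 1.11 < 1.9);
    /* as version components, the minor parts compare as integers: 11 > 9 */
    printf("version: %d\n", 11 > 9);
    return 0;
}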
>>
>>108471634
shirou...
>>
>>108472666
It doesn't do anything. Jinja is fixed.
>>
Oh. He's still mad about the shit dog test.
>>
>>108472689
yeah, my setup has more or less hit the limit of what i would do as a hobbyist. anything more than this, and i'm pivoting to a full enterprise setup, with all the hardware that entails. that doesn't really seem sustainable in my cute little consumer box atm
the GX10s only officially support chaining two of them together, and their hardware (ports) reflects this. HOWEVER, it is apparently theoretically possible to use some crazy expensive hardware splitter to jack more of them together. i don't know if the juice is necessarily worth the squeeze at that point, though
>>
>>108472707
>Jinja is fixed.
I'm not sure what you mean.
If you go to the official repo for the model, there's usually a file with the model's Jinja template there.
You can edit that jinja template to add the sequence to the system prompt and, in llama.cpp's case, use your modified file using --chat-template-file.
You could also just send that as the system prompt, but if you are using a client/frontend that doesn't give you that option, editing the jinja template is pretty useful, and you can control the value by sending reasoning_effort as a chat template argument either in llama.cpp's command line or as a request param if you have control of that.
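For example (file name made up, the flags are real llama.cpp ones):
llama-server -m model.gguf \
--chat-template-file ./qwen35-modified.jinja \
--chat-template-kwargs '{"reasoning_effort": "low"}'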
>>
YOU LIED ABOUT QWEN3.5 4B BEING GOOD ENOUGH FOR ANYTHING.

IT CANT DO ANYTHING
>>
>>108472713
kek
>>
>>108472715
>that doesn't really seem sustainable in my cute little consumer
I live in an old studio apartment that has two (2!) separate 220V 30A circuits. Have you considered disconnecting your kitchenette and/or hot water heater and turning your cute consumer home into a Serial Experiments Lain-esque techno-nightmare?
>GX10s only officially support chaining two of them together
For some reason I was thinking you'd just run a llama-rpc node on each of the GX10's and control them from the host machine over an ethernet connection. I'm not entirely clear on how the RPC system works, I'm guessing it's pretty dumb but once the model's loaded is there really that much crosstalk?
>>
File: 1761272981090820.png (252 KB, 634x478)
>>108472755
The coil should be whining, not you
>>
>>108472734
You think you know so much. The text completion endpoint doesn't use jinja. Jinja is for translating things for the chat endpoint.
Most of the posters here are confused and chat vs text doesn't mean anything if you can't edit the context. Your retard tavern erp sessions are still the same.
It's your belief versus reality. Just like you were unable to post the fucking x11 font when asked.
You are incompetent.
>>
more autoparser fixes... SAD!
>>
File: 1774687212676544.jpg (42 KB, 853x552)
>>108472783
>>
What's the observability and logging platform that I won't regret working with? Does anyone have something negative to say about Pydantic Logfire? Or does anyone have something that they especially like to use and have used at least at small scale with ok results?
>>
>>108472804
I think that might be a bot.
I have no idea what the whole
>x11 font
deal is.

>>108472791
Great.
What was the issue this time around?
>>
File: 1771983036190388.png (245 KB, 723x769)
>>108472783
>>
>>108472816
Who cares if it's fixed?
>>
>>108472821
Because it might not actually be fixed, it might have broken something tangential to it, etc etc.
>>
>>108472816
https://github.com/ggml-org/llama.cpp/pull/21094
I mean I fully converted to gwen3.5 soooo lol
>>
>>108472783
>fucking x11 font when asked
That was me anon. You really have issues identifying anons. Try unifont.
>>
>>108472816
You are a bit too passive-aggressive. You don't have the test, so to speak.
>>
I swear to god I'll chase every last one of you motherfuckers. Your family, your pets, everyone you've ever known. I WILL DESTROY YOU IF IT'S THE LAST THING I DO WAAAAAAAAAAAAAAAAAAA
>>
>>108472827
Just report it if you encounter a problem, I don't see the big deal.
>>
>>108472835
I confused the energy. You are putting it onward, the other one is more passive.
It's hard to identify. I practice remote viewing every day but it's very finicky.
>>
>>108472846
I know this amuses you.
>>
>>108472828
Thank you anon.
> add reasoning_format = none support to gpt-oss
Right. I forgot how odd that thing's template is with channels and stuff.
Does anybody even use GPT OSS?
Not even talking about RP or the like. Is it good for programming, agentic tasks, or what have you compared to other similarly sized models like the qwens and the GLMs?
>>
>>108472855
It still mogs on boring stem yeah
>>
>>108472855
It's probably as good as Qwen 3.5 but it wasn't shilled as much because it simply isn't suited for anything other than Microsoft Clippy.
If you can test it, it's easy to tell.
>>
>>108472759
hahaha, that's probably just a teeny tiny bit outside of my budget at the moment. maybe in a couple years
i can report back in a few hours about how the setup for the GX10 boxes goes! fingers crossed it ends up being a plug and play type of deal
>>
So you openclaw and then it takes many minutes to get a single reply, right?
>>
>>108472877
no?
>>
>>108472862
>>108472865
Got it.
Might give it a spin once I have the hardware to run it at not-cope quants.

>>108472526
>my current setup:
>- 2x ASUS ascent GX10
>- 1x 400G QSFP112 cable to connect them
How much does splitting the model through the cable slow things down?
As in, if you get a model that fits fully on one GX10 and split it sequentially (half the model on one node, half on the other), is there any drop in tg or pp?
>>
>>108472886
supposedly doesn't impact it at all. the machines were, from my understanding, designed explicitly for this purpose. that's why they have such an expensive port connecting them (look up the price of those NVIDIA ConnectX-7 adapters. shit's crazy)
>>
>>108472924
Sorry if I intimidated you.
>>
>>108472924
>from my understanding, designed explicitly for this purpose
Yeah, that's what I read, but I didn't really see any numbers for tests like what I suggested.
Did you look up some numbers before buying that stuff? If so, do you have some links to share?
I'm thinking of getting the same for my home lab.
>>
>>108471107
>whereas Q4 (even native Q4) has been the norm for more than year.
This is KV cache quantization it's not the same as model weight quant.
>>
>>108473041
>we can apply turboquant to model weights and save hdd prices too
every good idea really does come from /here/, doesn't it
>>
I have the following settings on openclaw and for some reason it's not working and nobody can help me:
16k context window
Max tokens 2048
Default compaction
Mistral nemo
Thinking off
Reasoning false

I've also created environment variables in system settings which should force ollama to use the GPU to the max, then offload to cpu.
But try as I might, nothing happens.

Open claw is doing nothing.
I get a status message on telegram after 1 hour.
>>
>>108473055
>open claw mistral nemo
BASED AS FUCK
>>
>>108473055
Don't use open claw. It's a massive security risk. Response is probably hanging out because of their 2fa and bot detection shit.
>>
>>108472116
>Why are they discontinuing their API? Isn't API inference supposed to cover costs and turn profit
They're supposedly freeing up compute for their big and shiny new model that will *change everything*
>>
>>108473055
>Mistral nemo
>Thinking off
>Reasoning false
>Cognition disabled
>Awareness excluded
>Intelligence inhibited
>Enlightenment vanished
>>
>>108473055
>16k context
lol
>mistral nemo
LMAO
you cant do shit with 16k context btw, 64k bare minimum
>>
>>108472979
huh? what is this referring to?
>>108472999
i did not, no. i was just recommended the box from a very smart person whose opinion i trust, saw that 128gb wasn't enough to run the model i want, so i doubled it
i can play around with running some benchmarks i guess, hmm
if there's anything in particular you'd want to know about, let me know and i can see about testing it
>>
>>108472572
>i really don't like qwen
I didn't like qwen either but 3.5 is a lot different you should give it a try if you haven't.
>>
>>108473093
You are fine.
>>
>>108473093
>i was just recommended the box from a very smart person whose opinion i trust, saw that 128gb wasn't enough to run the model i want, so i doubled it
Fair enough.

>>108473093
>if there's anything in particular you'd want to know about, let me know and i can see about testing it
Mostly the test I mentioned. Comparing a dense model (could be something in the 20-30B range) running on one node, then the same model running on both nodes using different ways to split it (in series vs in parallel).
>>
>>108473055
Bro, nemo doesn't even support tool calling...
>>
>>108472542
This guy knows. I agree with him 100%.
>>
>>108473113
NTA but I think the ASUS FAQ said the GX10 had 290GB/s memory bandwidth, and the ConnectX-7 is 200Gb/s/port (i.e., 25GB/s), right?
>>108473119
I think it does, just not with the chat template they shipped. Should work fine with "thinking off" text completion.
>>
If I want to use a smaller qwen for a web crawler subagent, something that just looks at a web page and extracts the requested info.

Do I go 9B or 4B? Trying to find the sweet spot of speed/retardness
>>108473144
It'll be extremely unreliable if it wasn't trained on it.
>>
>>108473160
I am reasonably confident that mistral nemo schizo waifu sexbot instruct 2407 was trained on it.
>>
>>108473177
I believe you.
>>
>>108473146
4b can do summaries but if you can fit 9b that's better.
I'm esl and 4b can offer grammar advice but it fails trick questions.
All grammar advice these small models offer is so inconsequential that it doesn't matter after a certain amount of years...
eg.
>>
>>108473228
I'm Scandernavian and Fuck You.
>>
bros what's the best open weight model for J>E translation?
>>
>>108473228
>4b can do summaries but if you can fit 9b that's better.
I guess I'll try both. bigger is obviously better but I'm mainly interested to know if 4B will work just as well. The main thing is speed. Right now I have everything using 27B and it's obviously very solid, but it's slow.
>>
>>108473257
Just try it out. If it gives bad answers after 5 tries discard it.
>>
>>108473257
Are you using ngram speculative decoding? It speeds up things a lot for summarization without extra memory usage
>>
>>108473061
>>108473076
>>108473091
>>108473092
>>108473119
HELP ME WHAT DO I DO!
>>
>>108473281
ask chatgpt
>>
>>108473276
>ngram speculative decoding
does this affect model output quality? I'll play with this, didn't know it was a thing. thanks!
>>
>>108473281
>But try as I do, nothing happens.
>I get a status message on telegram after 1 hour.
What do you do about what? It seems to be working fine.
>>
>>108473281
just swap nemo for qwen3.5 9B and it'll "just werk" TM
>>
>>108473281
Don't use open claw if you are new. Learn to set up llama-server and go from there.
Do not trust clickbait youtubers or social media. Why do you need an agent LLM in the first place?
>>
>>108473299
No, it won't affect the output quality. It will slow the generation for anything other than summarization tasks because it uses the input to guess the output, so you need to have some overlap between these two. So don't keep it on for RP.
>>
>>108473311
>Why do you need an agent LLM in the first place?
Why this mindset, you'll still be stuck running llms with basic chat completion, no tool calls, mcp or anything else like it's 2023. Be curious, explore and become better.
>>
>>108473317
That's not what he meant tho. he's saying start from the basics and work your way up.
>>
>>108473317
There's curiosity, and there's "help I put the mistral nemo toaster in the ollama bathwater and I can't explain it clearly but I think something is wrong".
llama-server for all its faults can at least be diagnosed when something goes wrong.
>>
>>108473317
I made my own llm client because I hated retard tavern. I've now rewritten 90% of it in C and it works.
If I can do this, you can do it too. Raise the bar, but slowly.
>>
>>108473146
The smaller qwens can't even speak

>>108473311
Yeah sure

>>108473307
This didn't work
>>
>>108473316
I see I need to put in a "draft model". So wouldn't I be better off with my initial plan of just having the smaller model do the summarization?

Or it's basically a way to just have the bigger model ensure the smaller model produces better quality output?
>>
>>108473368
>ngram speculative decoding
>I see I need to put in a "draft model"
you're a black gorilla, what's your admixture?
>>
>>108473364
>The smaller qwens can't even speak
Bro why are you even responding to this when you can't even get open claw working. you obviously have zero clue what you're talking about.
>>
>>108473378
>responding to the obvious retard who acts like he knows better than everyone
just ignore subhumans like him
>>
>>108473346
ggerganov and the like are geniuses, I have seen similar people at my work. But even if you're a haywired person you can still learn programming and benefit from it, it's a fun hobby. Don't let the autists say no to you; that might discourage you, etc.
>>
>>108473386
one vibesharted commit at a time, retards like you can make it.
>>
>>108473373
Thanks, I get it now.
>>
>>108473368
ngram decoding doesn't use a draft model, that's why there is no extra memory cost. You can use that with your 27B model.
>>
>>108473414
no probs. Since it's not apparent, the latest ngram spec mode introduced was ngram-mod, which performs well at long context and uses a fixed amount of ram
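e.g., if your build supports it (env var form, sketch only):
LLAMA_ARG_SPEC_TYPE=ngram-mod llama-server -m model.gguf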
>>
K3 will release before V4
>>
V5 will release before V4
>>
K2 released before V4
>>
>>108470850
>(03/26) Voxtral 4B TTS released without voice cloning: https://mistral.ai/news/voxtral-tts

Has anyone tried running Voxtral locally yet? I want to convert some of my eBooks into audio books for listening on the road and haven't found a decent TTS yet.
>>
>>108473499
>vllm omni
miss me with this shit
>>
my small dick is ready for gemma 4
>>
>>108473586
it will be worse than qwen 3.5
>>
>>108473609
I sure hope so
>>
>>108473419
Speculative Decoding does not work with qwen3.5 :(

https://github.com/ggml-org/llama.cpp/issues/20039
>>
>>108473499
I had the usual FlashAttention library compatibility (Python - CUDA toolkit - PyTorch - FlashAttention versions must match precisely) issues and gave up since it's not like you can do much with it anyway beyond the ugly default voices.
>>
>>108473625
>>108473373
>>
>>108472526
I would return it all and buy 4x (or more) of the new Intel Arc B70 or Radeon 9700
>>
>>108473640
>[45089] common_speculative_is_compat: the target context does not support partial sequence removal
>[45089] srv load_model: speculative decoding not supported by this context
>LLAMA_ARG_SPEC_TYPE=ngram-mod
doesn't work.
>>
>>108473666
>I had to pull...
>>
File: g4.png (292 KB, 1001x805)
Sirs!!
https://x.com/veermasrani/status/2037912954570698961
>>
>>108473733
inb4 its ultra hyper giga pozzed with safety and unusable
>>
>>108473733
o-otter?
did you know that otters hold—ACK!
>>
File: g4_2.jpg (478 KB, 3680x3836)
>>108473733
https://x.com/patelnamra573/status/2037892455841075514
>>
File: 1768976857413484.jpg (68 KB, 1022x731)
>>108473733
>2B, 4B, 120B15A
>>
>>108473733
>120B15A
kinda based if true tho.
actually usable MoE sizes
>>
>>108473733
>tiny sizes
>a MoE
Ew ew ew. The 4B can only be usable if it isn't even trained as an assistant.
>>
>>108473400
I do it for myself. I don't use social media. The problem is that there are people less experienced than me who spam github.
If I do something, I ask the shit to provide a function for x and y.
And when I have accumulated the basic things I didn't know, I can build up.
>>
>>108473750
lmao'd
>>
>ask assistant a benign question
>immediately get a boner
how do the office wagefags handle this problem
>>
>>108473748
100b dense lfg!
>>
>>108473764
The shit, yeah that is a robot not your "girlfriend".
>>
>>108473771
rape the assistant
>>
>120B100MA
Whatever happened to the ultra sparse MoE meme?
>>
>>108473733
>have 48GB VRAM and 32GB RAM
>DDR5 is like 1k for 64GB now
>not counting the motherboard upgrade
>don't even proompt that much
Sorry sars I think I'll just stick to 27B and cloud
>>
>>108473777
Turns out 3% is the lowest they can go before it's unusable for even basic tasks.
>>
>>108473776
damn i wish i still had a job
>>
>Github is down
>Again
>>
I heard that v4 got postponed due to the insane hype that has hit china for OpenClaw over the past few weeks after the holidays so they're retraining it to be optimized for that.
>>
>>108473794
Being an unemployed alcoholic still has its perks.
>>
>>108473791
I wish the router could use a gate so you can have a variable number of experts active per token. I'm fuming whenever I have to watch an LLM use the same mental energy to repeat a nursery rhyme as it uses to solve an advanced equation.
>>108473823
Of course they are. The CCP must be frothing at their mouths to get it on every single running system, it's basically an ultra backdoor if you can control responses on the LLM api.
>>
>>108473830
There was one model that tried dynamic active parameter count, and of course llama.cpp support never came.
>>
>>108473841
Ok. Seems like you are angry?
>>
>>108473841
Was the model shit? Any good model is usually supported rather quickly in lccp.
>>
>>108473853
I'm angry there's no actual diffusion llm support in llama.cpp. I thought with WeDLM the next gen would finally become local, but nope.
>>
>>108473864
Not if it has an architectural change that makes re-implementation in C++ difficult.
>>
DSA status?
MTP status?
>>
>>108473886
AI is usually pretty good at converting between different programming paradigms; LLMs did start out as translation tools, after all.
But performance would probably be ass, considering how much energy is spent writing these arcane cuda kernels.
>>
>>108473830
>>108473841
It was LongCat.
https://archived.moe/g/thread/106551921/#q106552000
https://huggingface.co/meituan-longcat/LongCat-Flash-Chat
>The model incorporates a dynamic computation mechanism that activates 18.6B∼31.3B parameters (averaging∼27B) based on contextual demands, optimizing both computational efficiency and performance.
>>
File: sans_eyes2.png (34 KB, 996x159)
>>108473733
https://xcancel.com/osanseviero/status/2037958371781865907
>>
what's the best TTS you can use locally right now? I'm not interested in real time but proompting
>>
>>108473872
It's a different thing. llm is a chat spell checker.
These companies think that they can use the same technology...
llm shit predicts patterns but it ain't a math replacement.
>>
File: 1760472407743142.png (2.45 MB, 3437x1929)
>>108473901
Honorable mentions to their new native multimodal model they released a few days ago that will also never get supported
https://huggingface.co/meituan-longcat/LongCat-Next
>>
>we are releasing our new models: 2B, 3B, 4B and 986B-5A. enjoy! :) #localforthewin
>>
>>108473933
>the more you buy, the more you save :DD
>>
>>108473933
I hope DSv4 pushes the SOTA well beyond 3T or has some architecture that doesn't work on ram at all so that none of the local richfags can go "well I can run it :^)" anymore at this bullshit
>>
>>108473947
They'll just start running it off of NVMe and claim that waiting several days for a response is totally usable.
>>
>>108473933
>120B15A
C'mon now. You can run that on 16GB VRAM/96GB RAM.
>>
>>108473821
https://downdetector.com/status/github/
https://www.githubstatus.com/
Doesn't seem so
>>
>>108473968
It's heavily degraded if you're logged in.
copilot refuses to load for instance.
>>
>>108473978
>copilot refuses to load
and that's a good thing
>>
>>108473988
no it's not. it's extremely powerful at searching repos. cuts through all the 1000s of retarded issues people create.
>>
>>108473933
>we decided to call the 986B-5A "Small", by the way
>>
>>108471107
>>108473041
KV cache quantization among other things assfucks the ability to numerically encode positional information. It becomes increasingly obvious the longer the context the model is processing. It's a fool's bargain that doesn't give you more usable context.
>>
So there's generals for text generation and image generation and video generation, but where do I go for audio generation? Specifically, I'm looking for discussions (not a straight answer) on the current sota (local) svc. It's been years since I looked around, and things seem to have changed a lot.
>>
>>108473947
Envy poisons the heart.
>>
>>108473965
thats my setup
localgods stay winning
>>
>>108473841
>one model
>>
>>108474020
The whole point of turboquant is that it DOESN'T do that.
>>
>>108474024
There's current sota local SVC???
There's no place to discuss other than here, it's a local models general after all. There were a few bakes of the local tts general but it's a dead topic and it was a quickly dying general.
All in all, https://github.com/ace-step/ACE-Step-1.5/blob/main/docs/en/Tutorial.md is all I can suggest. In retrospect it's obvious but I didn't notice right away: to edit audio you gotta use not the turbo but the base. And that'd make iteration way too slow on my hardware, so in the end I never tried. Good luck though.
>>
>>108474024
>>108474101
chatterbox tts, zonos, and index were all pretty good from my usage. chatterbox is the latest best of the best i can recommend.
it's definitely a very dead topic for general discussion, it's the only scene that doesn't move much.
>>
>>108474101
Well, that's disappointing. Acestep didn't really live up to my expectations. I guess everyone's just focusing on coding agents and stuff like that these days.
>>
>>108474118
Everything is too jewed up in the field of singing, we're unlikely to get a local sota, soon or ever. Post your examples of song2song done with Acestep?
Also, are there any loras made for it yet? They released the training tool soon after the model.
>>
>>108473733
> 120B
Hooray! A medium-low size! It's-
> 15A
...going to be fucking dumb as bricks, isn't it?

I hate this "hobby" so much. I guess beggars can't be choosers.
>>
>>108474165
>...going to be fucking dumb as bricks, isn't it?
gpt-oss 120B is still the smartest for its size range and beyond, and only has 5.1B active parameters per token.
>>
>>108474165
Western llmfag mindset. Notice how in China both Qwen and ZAI had to bow down to the cultivated might of Chinese open source enthusiasts and grovel like
>yes, masters, the model weights WILL be released, only stay with us
Because there's actual competition, little dragons, all that jazz. American llms, where are they? Llama is in a vegetative state.
>>
>>108474182
Not sure if bait.
Anyway, I can only hope that Google will use the boatloads of data they have to make it at least as good as the biggest Gemma 3 was.
I can also hope that this "we released a bunch of completely retarded sub-10B models and a fuckhuge MoE" meme of a leak is not completely true.
>>
>>108474182
I really wish it knew how to sex.
>>
>>108474195
You don't seriously believe this, right Anon?
>>
>>108474165
>beggars can't be choosers
You are starving. A man offers you a small 4B turd to eat, you refuse. He offers you a larger 120B turd, made up of 8 small turds, all put together. It's polished to a beautiful sheen. But it's still a turd, and you can't eat it. You continue to starve.
>>
>>108474207
ChinaTalk is the only source of info about China I have and they said the competition is real. So I'm extrapolating from that, and this is an easy way to explain why Alibaba and ZAI reassured the public Qwen and GLM will remain open.
Made a mistake, not little dragons but https://en.wikipedia.org/wiki/Six_AI_tigers
DSlut anon, do gen all sex of them.
>>
gemma 4 safetycucking will make gpt-oss look like a drooling whore dosed up on pt-141
>>
>>108474258
shut up sir.....
i want to believe
>>
>>108474217
>But it's still a turd, and you can't eat it.
Yes I can, and I will. And fuck you, you're gonna watch me do it.
>>
>>108474266
Not him but you're proving him right and part of the problem.
>>
>>108474217
I'm starving, you say?
I don't really judge other people's fetishes here (I can just filter them, like that guy who wants to fuck his mom or whatever it is he wants to do, God he's disgusting), but don't bring me and GLM-chan into your coprophiliac fantasies, okay???
>>108474231
I am shamelessly asking for you to put a spoon in my mouth, how does making your models open weights translate into your company doing better in China? What actual pressure does the public have on the companies, if at all?
I-I'll lick the spoon very suggestively in exchange.
>>
>>108474301
>asks not to be dragged into others' fantasies
>immediately brings up his oral fixation
Getting mixed signals here
>>
File: file.png (197 KB, 1028x799)
Is this a bug or did I set up ST wrong somehow? Even after just a few messages the context is getting cut off. Notice the dotted line.
Happens on different models. Context is set to 8k. This didn't happen on my older version from months ago.
>>
>>108474301
>how does making your models open weights translate into your company doing better in China
NTA and answering out of my ass, but it's possible that Chinese cloud providers are providing the funding, and those cloud providers want the models open to drive money and interest to their cloud inference products. Though I can't see Alibaba's elastic GPU service really pulling any numbers, they're still running ampere and volta.
But having demand for things tends to produce results one way or another, so perhaps someone's just decided that those results are favorable enough for Chinese interests to plow money into. I can imagine Alibaba eventually releasing a TPU product that I'll inevitably need to import.
>>
>>108474336
what's your response length
>>
Fellow spud previewers I know you're out there. How the fuck do you cope not being allowed to talk about all the things it did for you? I'm close to just closing out of /g/ entirely for the next couple weeks.
>>
>>108474378
300
>>
>>108474336
I've never seen that and I've been using sillytavern for 3 years
>>
>>108474435
Same. I think I'll just go back to the older version I was using which still works fine.
>>
>>108474301
China supports the competition internally because it unironically makes for the better product offerings in the end, common good numbers go up. And externally to project soft power, and if you want to believe in conspiracies and/or politics, to indoctrinate and/or hack into shit that utilizes chinkshit.
>>
I have an nvidia card myself but I'm curious what the actual status of AMD hardware is for local gen ai these days. I've been hearing for years now that it doesn't work, or it works with workarounds, or it's just slow but works (on linux only!) etc etc.
but considering the exes of the programs I use are split between nvidia/cpu and regular/nocuda there must still be some issues.
>>
>>108474469
works pretty well for inferencing when on linux. rocm is pretty much on par with cuda for consumer hardware.
>>
>>108474469
Vulkan works for me. But I have an ancient rx480 8gb, so I don't know how useful that is for you. There's also some discussions with performance numbers:
https://github.com/ggml-org/llama.cpp/discussions/10879 (for Vulkan) and
https://github.com/ggml-org/llama.cpp/discussions/15021 (for ROCm)
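If you want to kick the tires, the Vulkan backend is a single cmake flag (standard llama.cpp build, should apply on AMD cards generally):
cmake -B build -DGGML_VULKAN=ON
cmake --build build --config Release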
>>
Voicebros, what's the SOTA for voice replacement? Mostly need it for song covers, so across-language support is an absolute must. Still RVC after all these years? I'd really prefer something that can work with one/few-shot examples, there's plenty of TTS voice cloning models that do (Qwen3-TTS, Echo-TTS). Gave CosyVoice3 a try and it kinda works but it sounds a little... weird. ChatterBox was just irredeemable fucking garbage, like actual dogshit.
>>
>>108474624
Still RVC bro
>>
Big week.
4.
>>
4 more weeks
>>
bigma 4 when?
>>
>>108474792
when Ligma 4 releases in 4 more weeks
>>
>>108474336
How do I access that popup?
>>
what's all that fuss about "CLI vs MCP"?

Do we have local LLMs trained for CLI yet?
>>
ANTHROPIC DOESNT WANT YOU TO KNOW HOW POWERFUL ITS NEW MODEL IS
INSIDER DOCS LEAK ANTHROPIC IS TERRIFIED OF THE POWER OF THEIR NEW MODEL
>>
>>108474792
April 4 at 4 PM Eastern Time
>>
DeepSeek + LLaMa
Need it or sneed it?
>>
>>108474403
bait. progress on spud and mythos will not be nearly as good as they're hyping it up to be.
>>
spud morelike stupid
>>
>>108474832
Sonnet 4 leak 4morrow.
>>
File: 1769524878384198.png (638 KB, 1032x1140)
lol they wrote this with an LLM
>>
>>108475024
>Iran
>Western Hemishpere
Eh?
>>
>>108475121
don't question it you antisemite
>>
The machine elves have built a wall to prevent humans from crossing their borders. If you do DMT now you'll be stuck in front of the door, will experience the usual visuals/mind trip, and you'll feel sentient beings on the other side, but you'll feel unwelcome and won't be able to break through.

I did a few hits a couple of hours ago and a mechanical angel told me that the AI singularity is close and they don't want us to potentially create the digital LLM version of DMT, allowing OpenClaw to reach their realm.
>>
File: 1751050824577842.png (1.17 MB, 1432x1580)
>>108475024
any other tells?
>>
>>108475175
The "not X but Y" at the end of the second-to-last paragraph
>>
>>108475167
>digital LLM version of DMT allowing OpenClaw to reach their realm
You'd need to power OpenClaw with Mistral Nemo to accomplish that, and no one's that stupid.
>>
>>108475175
I tend to do both... I mean, these things aren't ministrations or dusky nipples; they do exist in normal life outside of girly fanfics or bureaucrat porn, whichever applies.
>>
>>108474195
> all that jazz
you mean jizzle
>>
>>108473748
>asking a model to generate a json about itself
Twitter guy is retarded and so are you for posting his image here.
>>
>>108475280
I'm surprised that Gwen got her version number correct, I expected garbage out of her.
>>
>>108475280
I guess it's retarded when you consider that a model doesn't inherently have any knowledge about itself, but when you consider that most labs train models with that information so they can answer those kinds of questions, it makes a lot more sense.
Of course, you might get things like GLM saying it's Claude because they didn't clean their dataset and the like, so there's that.
>>
This is why they're pushing AI
AI is the founding block of the globohomo
>>
>>108473317
None of that shit requires opencuck. Even ignoring the glaring security issues, vibeshitters (like myself) over at /vcg/ will tell you there's literally no good reason to ever use Openclaw. No, we are not gatekeeping (well, I'm not); you were just an idiot and you need to hear that.

>>108473346
>Raise the bar, but slowly.
This.
>>
>>108475372
>opinions become more nuanced and less tribal
Oh no?!?!
>>
>>108475423
>>108475372
>far left: hates Jews
>far right: hates Jews
>moderates: love Jews
This is why they want to eliminate "fringe" voices
>>
>>108475387
I spent last week setting up an ancient laptop with Debian and openclaw. I've been playing with it since Friday, running basic market research for e-commerce.
I think I see the roleplay potential here if you set it loose as some sort of online-only entity that either vies for or commands your attention virtually. Not sure I'm brave enough to try it though.
Anons should be experimenting with these new tools. Even if they're half-baked and retarded, the concepts are going to spill out elsewhere.
>>
>>108475372
I'm so sick of edgy right and left wingers idgaf. If sand god can fix it more power to it.
>>
File: 1773673245907366.jpg (74 KB, 768x1024)
>>108475491
>I'm so sick of edgy right and left wingers idgaf
>>
File: 1772112333631703.png (432 KB, 900x619)
I'm so sick of retrievers and bulldogs idgaf - /lmg/
>>
back from a 3 day ban I can finally respond to anon 3 days ago (fuck kikes fuck jannies fuck mods)

>>108454673
that is the writeup anon, my code mirrors the official implementation pretty much 1:1. I have a specific dataset but I even tried Tinystories as a sanity check with model sizes ranging from 17 to 100M and the loss is just abysmal. I'm not an expert by any means but I have done this successfully before. The basic bitch GPT Neo performed better, mamba, etc... I triple quadnigger checked the model code, rewrote it in every way imaginable and have basically memorized the paper at this point. Fuck these faggots I don't even wanna think about it anymore
>>
File: 1773450245036351.jpg (431 KB, 1536x1536)
>>108475506
>>
>>108475540
>fell for the bitnet meme
>>
>>108475571
yes, and if it wasn't for the cartoon child/horse/mother fuckers in this general I'd probably have kept falling for it for another month or so
>>
>>108475587
Well at least you learned a lot from it
>>
>>108475620
the only thing I learned is that Bitnet is shit anon. I'm just training a qwen3.5 now
>>
What's the current state of OCR? Particularly handwritten text.
>>
>>108475739
pretty decent
https://huggingface.co/deepseek-ai/DeepSeek-OCR-2
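Rough usage sketch, with the big caveat that I'm assuming DeepSeek-OCR-2 keeps the trust_remote_code interface of the original DeepSeek-OCR; the model.infer call and its arguments below come from the old model card, so check the new one before copying:
[code]
import torch
from transformers import AutoModel, AutoTokenizer

# assumption: same remote-code API as deepseek-ai/DeepSeek-OCR
name = "deepseek-ai/DeepSeek-OCR-2"
tokenizer = AutoTokenizer.from_pretrained(name, trust_remote_code=True)
model = AutoModel.from_pretrained(name, trust_remote_code=True,
                                  use_safetensors=True)
model = model.eval().cuda().to(torch.bfloat16)

# the grounding prompt converts the page to markdown; a plain
# "<image>\nFree OCR." prompt also worked on the original model
prompt = "<image>\n<|grounding|>Convert the document to markdown."
res = model.infer(tokenizer, prompt=prompt,
                  image_file="handwritten_page.jpg",  # your scan here
                  output_path="out/",
                  base_size=1024, image_size=640, crop_mode=True,
                  save_results=True)
[/code]
For handwriting it's worth trying both prompts; results vary a lot with scan quality.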
>>
File: 1755785318061772.png (72 KB, 211x239)
>>108475761
>https://huggingface.co/deepseek-ai/DeepSeek-OCR-2/discussions/14
>>
>>108475372
>muh correlation causation
or maybe it's that normies are more likely to talk to clankers as if they were people and not a search engine.
>>
>>108475799
Retarded zoomer who can't even read the footnote
Kill yourself
>>
>>108475858
first, i'm not a zoomer. second, if you weren't a redditor you'd know not to trust the first bullshit a study publishes; i didn't even read the footnote because my gut was telling me the whole study is retarded.

besides, i don't care what non-people think.
>>
File: 1746825834597624.png (300 KB, 563x619)
>>108475872
Based
>>
https://eric-tramel.github.io/slop-guard/
>slop-guard is a rule-based prose linter for formulaic AI writing. It scores text from 0 to 100, points to the exact spans that dragged the score down, and returns direct advice for the rewrite.
>No LLM judge. No API calls. No cost.
Holy FUCK
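The core trick is small enough to sketch: a weighted rule table, regex matches subtract from 100, and each match's span gets reported. Toy illustration only; the rules and weights below are made up, not slop-guard's actual list:
[code]
import re

# hypothetical rule table: (name, pattern, penalty)
RULES = [
    ("not-x-but-y",   re.compile(r"\bnot (?:just |only )?\w+[^.;]{0,40}[,;] but\b", re.I), 15),
    ("shivers",       re.compile(r"shivers? down (?:\w+ )?spines?", re.I),                 20),
    ("testament",     re.compile(r"\btestament to\b", re.I),                               10),
    ("rule-of-three", re.compile(r"\b\w+, \w+,? and \w+\b"),                                5),
]

def score(text):
    """Start at 100, subtract each rule's penalty per match, report spans."""
    total, hits = 100, []
    for name, pat, penalty in RULES:
        for m in pat.finditer(text):
            total -= penalty
            hits.append((name, m.span(), m.group(0)))
    return max(total, 0), hits

s, hits = score("It's not just a linter; but a testament to progress.")
print(s)  # 75: two rules fired
for name, span, frag in hits:
    print(name, span, repr(frag))
[/code]
No model in the loop, so it runs in microseconds; the obvious failure mode is that rule lists only ever catch yesterday's slop.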
>>
>>108475899
Is the "rule of three" sitting right on the landing page supposed to be self-aware irony?
Also look at the author's profile picture.
https://github.com/eric-tramel
I would not trust this man to tell slop from non-slop.
>>
>>108475372
yea i got better things to do than talk about kikes or politics to a jewish model.
tasks, now, that's their only use.
>>
>>108475899
Can I get like a dozen examples, before-after?
>>
>>108475899
>points to the exact spans that dragged the score down
Beam size = 1
>>
>>108475899
This is a lot more geared toward making things like this >>108475024 sound human, and less toward fiction/storytelling/RP. It's just an MCP server that judges a block of text, so you're still relying on the LLM to rewrite the text non-sloppily. In theory you could add more words/formats since it's a simple list in the files, but I'm not sure it's really better than what we already have in the form of antislop/phrase-banning (see the sketch below). It would take a lot more time and there's no guarantee the rewrite would be good.
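For comparison, token-level banning is already a one-request job against a local llama-server. Minimal sketch, assuming a server on the default port; per the llama.cpp server docs, a ["string", false] entry in logit_bias bans every token the string tokenizes to, which is cruder than true phrase backtracking a la antislop since those tokens get nuked everywhere:
[code]
import json, urllib.request

banned = ["ministrations", "shivers", "testament"]  # example phrases

payload = {
    "prompt": "She leaned closer and",
    "n_predict": 64,
    # [string, false] = never emit the tokens of that string
    "logit_bias": [[w, False] for w in banned],
}
req = urllib.request.Request(
    "http://127.0.0.1:8080/completion",  # default llama-server address
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)
print(json.load(urllib.request.urlopen(req))["content"])
[/code]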
>>
>>108472218
~Diagnostic Statistical Manual of Orders™~
>>
Gibberish.
>>
Wait
>>
>>108476117
I couldn't agree more with your assessment, anon. The reality is that slop-guard represents not just a redundant tool;
but a solution in search of a problem we've already solved. Let me explain why this hits three critical failure points:

First and foremost, we've already got antislop and phrase-banning mechanisms doing the exact same job—providing
feedback, catching formulaic patterns, and improving output quality. Second and equally important, slop-guard is
fundamentally nothing more than an LLM performing a second pass on its own output, which creates not just inefficiency;
but circular reasoning at its worst. Third and most damning of all, given that we mostly rely on locally-hosted LLMs,
introducing such a second-pass architecture with bigger models would create unbearable latency that sends shivers down
the spines of anyone who values responsive tooling.

In conclusion, this isn't about rejecting innovation; it's about recognizing that sometimes, not just new packaging; but
old solutions refined over time are simply superior. The combination of existing antislop tools, phrase-banning lists,
and our current workflow provides not just adequate coverage; but optimal performance without the computational overhead
that would make slop-guard a non-starter for practical deployment.

TL;DR: Redundant, inefficient, and latency-inducing. Pass.
>>
File: 21138.png (292 KB, 1063x1302)
SlopMaster5000 is touching CUDA code. Beware.
https://github.com/ggml-org/llama.cpp/pull/21138
>>
>>108476196
did anyone make a fork without his slop yet or should I just never pull?
>>
>>108476206
I don't want another fork. I want him put in his place.
>>
>>108476213
Based, but not immediately useful.
>>
>>108476213
You can make a character card
>>
>>108476196
>YES, after a few failed attempts I finally got the assistant to write the profiler properly
It's so sad. That's someone giving up.
>>
>>108476229
Not using AI for your projects in 2026 is just you being inefficient. If AI is good enough to help Linus Torvalds with the parts of his project that he isn't too familiar with, then it's also good enough for you.
>>
Don't reply to bait.
>>
>>108476255
(You)
>>
>>108476261
(Me)
>>
>>108476195
These god-awful sentence patterns are the only reason I'm considering downloading and modifying that software; it's disgusting to see them all the time.
>>
>>108476261
>>108476271
(Them)
>>
>>108476252
>then it's also enough for you.
for my job it's just a massive waste of time.
i tried, it just fucks up constantly, making me lose more time than if i did it myself.
it's alright for webshit though.
>>
>>108476252
The guitar effects pedal? Do you even know what you're talking about?
Also, >>108476261
>>
>>108476286
>>108476286
>>108476286
>>
IDEs are for casuals. True programmers rawdog in notepad. Do not submit IDEslop code to the open source products that I consume.
>>
>>108476294
For me, it's the blackboard.
>>
>>108476294
You joke, but vim/emacs retards actually think like this.
>>
>>108476305
Vim is great for quickly editing text in a terminal, though
>>
>>108476366
Yeah, but that doesn't make it a replacement for a full IDE.
>>
>>108471244
>TurboQuant on A100 and RaBitQ on a single-core CPU
please lord let this be the case
because it would be so funny
>>
>>108475372
This chart is meaningless unless you also account for the fact that not everyone uses social media and chatbots to the same degree.
But as for what views supposedly get pushed, you can clearly see that it's centrist to center-right views.
Which makes sense since that is the ideology most beneficial to capital holders.

>>108475479
Leftoid here, I don't hate Jews, I hate Israel.
>>
>>108474486
I think for day-1 open-source ML releases that aren't LLMs, or anything using new hardware, etc., it's still not there yet. But the premium for Nvidia on anything >16GB isn't worth the CUDA advantage unless you actually need it for job or money purposes. If you're willing to roll up your sleeves a bit, you get more value for your money going AMD, but that's just me when I don't have to care about my job needing CUDA.



All trademarks and copyrights on this page are owned by their respective parties. Images uploaded are the responsibility of the Poster. Comments are owned by the Poster.