/g/ - Technology


File: 1770358397084.png (1.5 MB, 1600x672)
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>108466262 & >>108459276

►News
>(03/26) CohereLabs releases Transcribe 2B ASR: https://hf.co/CohereLabs/cohere-transcribe-03-2026
>(03/26) Voxtral 4B TTS released without voice cloning: https://mistral.ai/news/voxtral-tts
>(03/26) ggml-cuda: Add NVFP4 dp4a kernel #20644 merged: https://github.com/ggml-org/llama.cpp/pull/20644
>(03/25) LongCat-Next native multimodal 74B-A3B released: https://hf.co/meituan-longcat/LongCat-Next
>(03/25) mtmd: Add DeepSeekOCR Support #17400 merged: https://github.com/ggml-org/llama.cpp/pull/17400

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling
Token Speed Visualizer: https://shir-man.com/tokens-per-second

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
>>
File: 17745737552553.png (2.86 MB, 1509x1541)
►Recent Highlights from the Previous Thread: >>108466262

--Unsloth Studio release and fine-tuning discussion:
>108466593 >108466604 >108466618 >108466757 >108466771 >108466802 >108466835 >108466845 >108467416 >108467512 >108467529 >108467553 >108467584 >108467665 >108467709 >108467725 >108467828 >108467852
--Debating MCP vs skills for LLM tool integration:
>108468260 >108468314 >108468364 >108468471 >108468484 >108468516 >108468528 >108468529 >108468545 >108468624 >108470310 >108470328 >108470372 >108468377 >108468459 >108468479 >108468481
--llama-server update adds built-in tools and enhanced functionality:
>108467191 >108467207 >108467243
--Hadamard transforms for V-cache added to ik_llama:
>108466588 >108466600
--Turboquant enabling efficient KV cache quantization without GQA tradeoffs:
>108468782 >108468792 >108468970 >108468987 >108469019 >108469095
--Configuring koboldcpp banned strings in SillyTavern:
>108467399 >108467422 >108467473 >108467491 >108467509 >108468165 >108468844 >108469015
--ASCII art generation struggles with LLMs:
>108467947 >108468032 >108468043 >108468073 >108468099 >108468156 >108468101
--NeurIPS 2026 policy confusion over U.S. sanctions and Chinese participation:
>108467980 >108468027 >108468041 >108468048 >108468062
--Anon unaware of existing ignore-robots-txt flag in MCP fetch tool:
>108466397 >108466415 >108466432 >108466496
--Miku (free space):
>108467947 >108468032 >108468821 >108468908 >108469368 >108469673 >108470528

►Recent Highlight Posts from the Previous Thread: >>108466266

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>
Important: never respond to vagueposts.
>>
you had to be delusional to think that anything was coming before next week. next week, however...
>>
File: out.webm (1.43 MB, 848x480)
>>108470853
>>
>>108470888
Must be true but if not, I will rotate you.
>>
Local losted
>>
turboquant in llmaocpp when??????
>>
>>108470944
It'll come when MTP and tensor split do.
>>
>>108470951
>tensor split
huh?
>>
Tetonation thread. Should I compile new llama.cpp? I'm afraid.
>>
You can pipe the llama.cpp webui out through your router to a domain if you use an API key, right? It's safe, right?
>>
>>108471062
just forward the port on your router bro
you won't even need an api key dude
>>
>>108471062
There's literally a warning when you run the server.
>>
>>108471065
Doesn't this mean any rando can access it?
>>
>>108471071
Don't worry about it mate
>>
/lmg/ knew about DeepSeek-R1 a week before the huge Nvidia crash.
TurboQuant led to a Samsung/Micron/Hynix crash while being a nothingburger paper released last year that claims 4x VRAM savings quanting KV cache from FP16 to 3.5bit (no shit), whereas Q4 (even native Q4) has been the norm for more than a year.
The market is really fucking dumb wrt. tech hype.
>>
>>108471034
https://github.com/ggml-org/llama.cpp/pull/21085
better not, pwilkin is "fixing" the parser again
>>
>>108471107
There's a reason one of Buffett's main principles is to only invest in businesses he understands.
Tech illiterate speculators gambling on tech stocks they don't remotely understand deserve to lose every penny.
>>
Computer generate me a Super Famicom eroge where everything is procedurally generated
>>
>>108471151
One day we'll get this, for now you'll have to cope with qwen 3 coder next installing Arch Linux for you.
>>
>give claude a task to update a pentest server on lmarena
>it just does it
>use the same prompt in claude.ai
>thinking about ethical implications... for a whole minute
Tiresome. There should be a law against prompt injection
>>
Which tool and model are you using?
I tried opencode with glm-4.7-flash-q4 but it sucks.
>>
>>108471118
pwilkinbros... we won??
>>
File: 1773260166464819.png (2.07 MB, 772x4729)
https://zhuanlan.zhihu.com/p/2020969476166808284
TurboQuant drama incoming.

Chink researcher from ETH Zurich accuses TurboQuant authors of:
- Failure to properly credit and discuss prior work (RaBitQ)
- Misrepresentation of RaBitQ’s theoretical results
- Deliberately unfair experimental comparisons (Running TurboQuant on A100 and RaBitQ on a single-core CPU)
>>
>>108471244
>muh accolades
so this is why illya autisms right?
>>
>>108471251
Don't think that's what they're aiming for because TurboQuant's claimed gains are unlikely to be realized anyways.
>>
I can see Google delaying Gemma 4 because Qwen 3.5 was too good and they can't afford to release something that is not SOTA, at least for a while.

Qwen 3.5 27B (heretic) is not even that bad for ERP if you don't make it generate the usual "book style" / "novel" purple prose (though when I loaded up Ministral 14B for an unfair comparison, I liked Ministral's writing style more, even if it has general retardation for the first couple conversation turns and doesn't follow character instructions very well).
>>
>>108471257
no I mean could the RaBitQ researchers even do their paper without looking at illya's implementation?
>>
>>108471259
How do you tell it not to use the novel/book style?
>>
File: newton.jpg (78 KB, 534x400)
>>108471281
Most (all) scientific work builds on each other. The problem here seems to be one of missing attribution and unfair comparison with the base work.
>>108471244
So
> TQ used my work without attribution, then nefariously claimed their method was better.
>>
>>108471259
Gemma 4 will have so much literature power it'll make other AI slop trained models look retarded in comparison.
>>
>>108471297
Qwen is so trained to follow directions that it'll stumble over its own reasoning. You can't change its behavioural output format that much.
>>
>>108471062
if you need to access it remotely, just use tailscale
>>
>>108471297
Set up your roleplay so it's more similar to a theatrical play script, describing direct and non-obvious actions only, and avoiding "he/she says" and useless adverbs as much as possible, only using asterisks for actual emphasis. In short, make the roleplay dialogue-focused.

>(Anon appears surprised that other people haven't started doing this yet.)

I find that most slop in LLMs comes from typical "internet roleplay" / CAI-like conversations and if you break out from them you'll see less of it.
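For example (made up on the spot), instead of:
*He steps closer, his voice barely above a whisper, a shiver running down her spine* "You came back," he murmurs huskily.
you'd write:
Anon: (steps closer) You came back.
Mina: I said I would, didn't I?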
>>
File: Capture.png (59 KB, 975x1010)
Why is qwen so dumb? I mean I'm running the 0.8b variant but we are at version 3.5 ffs
>>
>>108471364
Was it wrong? 1.9 IS larger.
>>
>>108471367
It continues to say "Wait" and proceeds to compare them in different notion for another 15 min
>>
>>108471378
There's something wrong with either the Qwen 3.5 implementation or maybe it's a quant problem.
Llama webui would often end up in infinite reasoning with 9b. 3000 tokens is the max for a normal query (not programming related) but that shit would just go on and on.
With my client that wasn't an issue, but my llama is one month old now. Plus, a single \n in a wrong place can mess up a lot of things.
>>
>>108471359
Does that method work for most models?
>>
>>108471422
I just roleplay like that and I don't generally see the slop expressions that most people here and elsewhere complain about. Dialogue too has its own overused phrases though, like "don't be shy" (this is both in Qwen 3.5 and Gemma 3).
>>
>>108471259
They will still be ahead in multilingual most likely, but the main issue right now for Google is compute and squeezing more out of what they have. They need to do a Gemini 3.2 release because they are falling behind their competitors in key real-world use cases, having benchmarkmaxxed the wrong stuff like ARC-AGI and Humanity's Last Exam instead of agentic tasks. I can't use Gemini 3.1 at work with Copilot anymore because of that. Either Gemma 4 is delayed, or they don't care about it being SOTA and it's just a reflection of what they are doing in Gemini 3.1/3.2 for plebs, as a gesture to still be more open than OpenAI and Anthropic.
>>
>>108471438
Gemma 3? why would you use that for rp? it has the most atrocious writing style, even worse than qwen 3.
>>
>>108471438
I don't understand the problem. As long as you approach it as "this is an AI model" you can forgive lots of stuff, unless it's really robotic like Qwen and the chink models.
>>
>>108471446
It's good but you need to guide it.
>>
>>108471151
why would you want procgen on an already vibecoded game?
>>
>>108471454
I'm used to Mistral's style, so gemma 3 feels very robotic and sterile.
>>
>>108471446
>Gemma 3? why would you use that for rp?
Because when properly prompted for roleplay it has some restraint (while still being "open" and not outright denying requests) and doesn't jump on your dick by turn 2 like most other models default to. And again, I've not been using it for traditionally narrated roleplay.

>it has the most atrocious writing style, even worse than qwen 3.
Yes. At this point Qwen 3.5 27B (thinking disabled) seems overall better than Gemma 3 27B, multilingual capabilities aside. Gemma 3's vision capabilities still appear to have the upper hand for illustrations and mild NSFW, though.
>>
>>108471482
>Qwen 3.5 27B (thinking disabled)
Using this and getting 0.7 tokens per second on my 8gb vram gpu. Feels good man.
>>
>>108471479
They are both the same. You need to deep dive the prompts, and if you use sillytavern, use the post-history instruction. This gets added after your prompt and you can dictate the style instead of just trusting allah with your tokens.
>>
>>108471497
That's because new llama.cpp is fucked up. --fit doesn't work and if you have any previous settings you need to halve your gpu slices (depends).
Slow tokens mean that the genius llama devs flood your ram with both the gpu weights and the cpu offload.
This took me a while to understand. I hadn't updated the shit since December...
>>
>>108471508
Whereas previously such a conflict wouldn't even be possible. Sure, this is a thing for modest systems; I'm sure if you have 512 gb ram this isn't a problem.
>>
>>108471508
I didn't understand if it was a compile issue or some new flags...
>>
>>108471259
Maybe in house Gemma 4 is losing to Qwen 3.5 on Openclaw bench and they need to retrain
>>
File: file.png (512 KB, 1470x2484)
>>108471364
If you compare the mememarks, intelligence-wise, really small parameter LLMs have already plateaued in performance. Sure, there's improvement, but it's nothing like what you see with Qwen 3.5 2B and 3B, or with Llama 8B vs Qwen 3.5 9B being a gigantic gulf where the latter can beat even Llama 3 70B in a lot of things. The only real improvement at those low sizes is reasoning and agentic stuff, which is a meme at those sizes at the current performance level. It's vastly more useful and capable, but asking them to do big boy stuff is still not better and probably won't be until we get a paradigm shift again. But I will say that from what I can see, differences in terms of writing and ERP for the doable sizes that matter are still improving. Using MythoMax based on Llama 2 vs a Mistral Nemo mememerge is a vast gulf. That mememerge would easily mog any L2 70B tune of the era like Euryale.
>>
>>108471508
oh I was using -ngl 999. I just tried --fit on and now I'm getting 2.2 tokens per second.
Is there anything else I could change to improve performance?

llama-server \
-m "$HOME/Desktop/Qwen3.5-27B-Uncensored-HauhauCS-Aggressive-Q4_K_M.gguf" \
--host 0.0.0.0 \
--port 8080 \
--rope-scaling linear \
--fit on \
-c 8192 \
-t 8 \
-fa on \
--no-slots \
--jinja \
--chat-template-kwargs '{"enable_thinking": false}'
>>
File: file.png (266 KB, 1502x1893)
>>108471482
The multilingual advantage of Gemma 3 is dead, full stop, with the release of Qwen 3.5 27B. There isn't a single language in the mememark nor in practical testing /here/ where Gemma holds an advantage anymore even for obscure African languages. Would like to be proven wrong but Gemma 3 is officially obsolete with that being the case.
>>
>>108471497
How? I'm getting 1.4t/s at 32k context on a 6gb gpu.
>>
>thing doesn't work waaaaah
>nvm i was using it wrong
>hauhau uncensored aggressive hardcore ultimate deluxe super exxxtreme edition gated open ascended
Like clockwork.
>>
>>108471554
idk nigga, help me. >>108471541
>>
If Mythos is as good as Anthropic is hyping it up to be (which I doubt) I don't give it two weeks before we see 'OpenAI buys 80% of Samsung's future memory stock until 2035' on the news
>>
>>108471568
If most of the model is offloaded to cpu don't use flash attention, it will make it much slower.
>>
>>108471541
Your offload is slow for whatever reason; you should be able to dedicate more threads to the task, and only being able to dedicate 8 cores in 2026 tells me your system is weak. The big one you're missing is KV cache quantization: "-ctk q8_0 -ctv q8_0" will give you a teensy bit more room to load layers, and you can probably even afford "-ctv q4_1" instead of q8_0 since 8k context is so short the quality damage won't show. I've been using K quantization at 8 and V quantization at 4 with no issues on general queries with an IQ4_XS quant.
Also, I am using the llmfan's Arbitrary-Rank Ablation Heretic v3 tune instead.
https://huggingface.co/llmfan46/Qwen3.5-27B-heretic-v3-GGUF/tree/main
Going all the way to 0 refusals harms performance too much for me; I can live with one or two refusals, and I find ARA is much better at preserving intelligence etc.
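Something like this on top of your command (a sketch, not gospel; note quantized V cache needs -fa on):
llama-server \
-m "$HOME/Desktop/Qwen3.5-27B-Uncensored-HauhauCS-Aggressive-Q4_K_M.gguf" \
--fit on \
-c 8192 \
-fa on \
-ctk q8_0 \
-ctv q4_1 \
--jinja \
--chat-template-kwargs '{"enable_thinking": false}'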
>>
https://huggingface.co/zai-org/GLM-5.1
https://huggingface.co/zai-org/GLM-5.1
https://huggingface.co/zai-org/GLM-5.1
ggufs when?
>>
>>108471541
Ah yeah, I use more modest settings, but note the difference: previously
>llama-server before December 2025 would run a model without any launch parameters
Compared to today
>if I launch llama-server without any parameters just to debug, it'll be 0.3 tokens per second because there is some conflict in the memory management
Sure, I solved this and this is not a tech support post, but I wanted to outline the difference and why it's probably bad for people who don't have half a TB of ram.
>>
>>108471598
catpic
>>
>>108471541
Fit is on by default; it'll mangle the prompt cache etc.
Contrary to the common assumption here, --jinja is always on, and it wouldn't even be needed if you access the completion interface, but that's up to you.
Somehow the chat template args also conflict. Use --reasoning off (might be demented, double check).
>>
--fit off
>>
--trace on
>>
>>108471634
Doesn't accomplish anything.
t. A knower
>>
>still using only a single llm
Sorry gramps, the future is putting multiple LLMs in a group chat and telling them to argue with each other until they agree on an answer
>>
>>108471680
I barely have enough VRAM for one.
>>
>>108471680
In the business, we call that "agent swarms".
>>
>>108471684
You forgot that the chat space is a harness. Agent swarm inside a harness.
>>
How much dumber is Qwen3.5 30B A3B than Qwen3.5 27B?
Because I can run the former with 5x the tokens/second.
>>
>>108471693
Depends on the task. For some stuff like programming it's actually pretty good still but it often fails to capture nuance or consistency.
Classic MoE drawbacks that we've known for a long time.
>>
>>108471700
What about RP?
>>
>>108471693
Reasoning is pretty much mandatory for the A3B version. For all intents and purposes it's a 3B model made wider with MoE, not a regular 30B model with added sparsity. Most MoE models appear to be designed this way, for some reason.
>>
>>108471710
A3B is 10x better at roleplay than 9b even with reasoning disabled.
>>
>>108471715
With thinking disabled it's noticeably dumber than the 27B version for roleplay, though, at least from what I could see.
>>
>>108471693
Ask it to do any string replacement in C, and if it compiles and works, that's the winner.
void replace_in_string(char *my_string, char *from_name, char *to_name);
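For reference, a minimal passing answer might look something like this (assumes the caller's buffer can hold the grown string, which that prototype can't actually guarantee):
#include <string.h>
#include <stdlib.h>

/* Replace every occurrence of from_name in my_string with to_name.
 * Builds the result in a temp buffer, then copies it back. */
void replace_in_string(char *my_string, char *from_name, char *to_name)
{
    size_t from_len = strlen(from_name);
    size_t to_len   = strlen(to_name);
    if (from_len == 0) return; /* empty needle would loop forever */

    size_t src_len = strlen(my_string);
    size_t grow = to_len > from_len ? to_len - from_len : 0;
    char *out = malloc(src_len + (src_len / from_len) * grow + 1);
    if (!out) return;

    char *src = my_string, *dst = out, *hit;
    while ((hit = strstr(src, from_name)) != NULL) {
        memcpy(dst, src, (size_t)(hit - src)); /* text before the match */
        dst += hit - src;
        memcpy(dst, to_name, to_len);          /* splice in the replacement */
        dst += to_len;
        src = hit + from_len;                  /* skip past the match */
    }
    strcpy(dst, src);        /* tail after the last match */
    strcpy(my_string, out);  /* caller's buffer must be big enough */
    free(out);
}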
>>
>>108471683
If you have good PP or enough RAM for cache, you can have multiple agents share one model.
>>
>>108471721
Ask it to do every instance not just one.
>>
All my models load in less than 1 second
>>
A year ago everyone here would've gotten swamped and crucified by the rest of /lmg/ for even implying that a dense model might have merit over a similar-sized MoE one.
>>
>>108471710
>it's 3B model made wider with MoE
It has fewer layers than the dense 27B version (40 vs 64), and its embedding and Attention matrices (which don't use sparsity) are also smaller. Surely this will have a negative effect on performance at least in some cases?
>>
>>108471598
GLM sisters?????
>>
>>108471721
>if it compiles and works
Surely that isn't your metric, right? Surely you're looking to see if it handles aliasing right, and whether it uses a suitable algorithm (e.g., Boyer-Moore or Aho-Corasick)?
"It werks" in C is the fucking stupidest mindset you can possibly have.
>>
>>108471815
Oh I'm sorry I didn't know that I am dealing with a fact checking autist.
My post implies that if the function works it will pass.
Jesus Christ when was the last time you licked a female vagina? Don't answer because this was a rhetorical question.
>>
>>108471866
>Defining a person's worth by whether they engaged in coitus.
>>
>someone submits a working PR for MTP that gives some decent gains unlike the previous attempts that all were slower than non-MTP
>it's closed due to "muh contribution guidelines"
rip
https://github.com/ggml-org/llama.cpp/pull/20981
>>
>>108471889
I am from Scandernavia. What is "coitis"?
>>
These fucking scandernavians are worse than the indians!
>>
>>108471900
Colitis is inflammation of the inner lining of the colon, causing symptoms like persistent diarrhea, abdominal pain, fever, and rectal bleeding. It is caused by infections, IBD (Crohn’s/ulcerative colitis), reduced blood flow (ischemia), or allergies. Treatments range from antibiotics to anti-inflammatory drugs and lifestyle changes. Cleveland Clinic +4
>>
>>108471894
Piotr needs to be the only vibeshitter tobenothonest
>>
File: toomuchslop.png (19 KB, 887x158)
https://github.com/ggml-org/llama.cpp/pull/21097
Hopefully the same rules will apply to piotr.
>>
>>108471912
that it's the vibeshitter himself rejecting it adds insult to injury
>>
>>108471938
Fuck niggeramov. That actually looked like a good commit.
>>
>>108471959
>That actually looked like a good commit
No. He needs to prove that his implementation matches what he claims. He didn't present a single test.
>This is independently validated by arXiv:2603.19664 ("The Residual Stream Is All You Need"), which shows 100% token match at every budget level vs permanent-loss baselines like H2O, StreamingLLM, and SnapKV.
>>
Thank you for replying.
>>
>>108471959
Problem is that what began as a hobby has now consumed gg. If he thinks something is good it's because
>it's good and you are a mongoloid
>it's not, cry about it on github
>>
>glm5.1 out
>it's not on their main api (and thus not on openrouter either)
>it's not open source
>only available on their code subscription presumably to farm subs from the new generation of chink openclaw drones
Even if this ends up getting released elsewhere, I'm not very optimistic. This screams "rushed glm5-code branch rebadged as a mainline step" to cash in on the dumb openclaw hype in china. I doubt this model is good for anything else, and I don't need to try it to tell.
>>
File: ComfyUI_00112_.png (917 KB, 1024x1216)
>>108470850
I remember someone linking a git-repo that had a list of telltale signs of LLM slop that could be used for logit-bias removal ("Shivers down my spine", "something husky", etc). Does anyone ITT remember the repo or have a link to anything similar?
>>
>>108472081
>it's not open source
it will be on april 7
>>
https://help.openai.com/en/articles/20001152
>When will Sora be discontinued?
>The Sora web and app experiences will be discontinued on April 26, 2026.
>The Sora API will be discontinued on September 24, 2026.
Why are they discontinuing their API? Isn't API inference supposed to cover costs and turn profit? Am I being schizo here, or are the Big Tech companies subsidizing API inference too for gaining marketshare or other reasons? For what other reason would they gut it?
>How is this /lmg/?
The costs of running inference on SOTA API models, as well as amortized development and hardware costs, have implications for the development of local models as well.
>>
>>108472024
What began as a hobby has now turned into being employed and accountable to HuggingFace.
>>
>>108472116
We can only guess, anon.
>Isn't API inference supposed to cover costs and turn profit?
Maybe. Or they lied about their efficiency.
>are the Big Tech companies subsidizing API inference too for gaining marketshare
Sure. Could be.
>For what other reason would they gut it?
Their mom told them to stop. Who knows.
>>
>>108472085
The one I recall (and can't find) was a standalone website. Looking for "anti slop claude skills" is turning up some results, but the ones I've skimmed through don't seem to include the more narrative-common phrases ("smoothes out my skirt", "cheeks growing pink", "with whitened knuckles" etc). I vaguely recall the website being a fuckhuge list compared to these shitty git repos.
>>
>>108472144
He'd need to maintain shit written by 1-shot hype chasers. I'd rather he didn't.
>>
>>108472116
>Isn't API inference supposed to cover costs and turn profit?
I was under the impression that was true for LLMs which are memory throughput bound, rather than diffusion models which have significantly higher compute costs.
>>
>>108472116
>Why are they discontinuing their API?
They also officially stated that they won't allow ERP mode. There's talk of the company having an IPO this year and these events lend credence to such rumors. You must squash anything that may upset shareholders.
>>
>>108472160
>>108472085
Speaking of slop, how slopped would you say this response is?
>>
you are more mentally ill than average for this site
>>
>>108472218
sir there are men and women ITT that used GLM to rp about dismembering infants. My shit is beyond tame compared to that and even some shit you can find on AO3
>>
>>108471866
I'd rather be a cunning linguist with my Waifu than kiss some cunt's hole.
>>
>>108472258
>never
>>
>>108472261

>>108471889
>>
>>108472204
The response seems fine to me but your prose - referring to female masturbation as "macaroni being stirred" - is better left back in 2022 and forgotten.
>>
>>108472085
this? https://pastebin com/GNiNC8Vj
>>
File: 1768367284396670.jpg (71 KB, 863x1090)
>>108471866
>being proud of putting your tongue inside a yeast hole
t.
>>
>>108472382
Ok you catched me. I am not an American high school student who thinks about sex every day.
>>
>>108472391
saar pls do the needful stop shitting up thread saar kindly use designated street
>>
>>108472449
What do you mean? I'm an American University Student.
>>
>>108472085
bookmarked these a while ago:
https://github.com/sam-paech/antislop-sampler/blob/main/slop_phrases_2025-04-07.json
https://github.com/SicariusSicariiStuff/SLOP_Detector/blob/b8bdfd29284daf61f342ba2a749120e8f4bbdad7/SLOP.yml
they're out of date though, no ozone etc
>>
jujufufuhhh
>>
my setup has finally all come together
all told, i ended up spending something on the order of 15k
definitely a bit more than i had intended to, but y'know, that's just how it goes
hoping to finally get some software up and running on these bad boys today
my current setup:
- 2x ASUS ascent GX10
- 1x 400G QSFP112 cable to connect them
- 1x AMD Ryzen 9 9950X
- 4x 64G DDR5-6400 PC5-51200 CL42
- 1x Samsung SSD 9100 PRO 2TB, PCIe 5.0x4 M.2 2280
- 1x MAG X870E TOMAHAWK WIFI AMD AM5
- 1x Lian Li Edge Series-1300W PSU
- 1x ARCTIC Liquid Freezer III Pro 360
- 1x Lian Li LANCOOL 217
- 1x NVIDIA GeForce RTX 5090
gonna probably try to start with GLM-4.7 on both machines, then go from there. certainly open to recommendations and suggestions
>>
>>108472382
Incel moment
>>
>>108472526
>open to recommendations and suggestions
depends what ur doing. glm4.7 is a good start.
qwen3.5 112b for coding
glm4.6 for less censored writing/rp
you could do cope-quant kimi and deepseek with ik_llama ig
>>
>>108472542
i really don't like qwen. in my experience, GLM has been the best for coding out of everything that i have tried
>>
>>108472526
>AM5 gamer gear instead of an SP3 enterprise quality solution
I'm interested to read your experiences with the GX10's, those seem neat.
>>
>>108472572
>>108465668
>>
>>108472599
i know nothing about hardware. i just asked a few friends for recommendations and ended up buying what they suggested. what's wrong with the AM5? i thought the 9950x3d was the gaming slop one, and the 9950x was the cheaper "have a job" style CPU?
yeah, i'm excited about those GX10 boxes. they're way smaller than i had expected them to be
>>108472616
i have only tried the 4 series. haven't checked out GLM-5 yet, but desu it's kind of reassuring in a sour grapes sort of way that it's garbage since i can't run it lol
>>
Hey. Adding Mistral's
>[MODEL_SETTINGS]{"reasoning_effort": "low"}[/MODEL_SETTINGS]
to Qwen 3.5's jinja template actually works pretty well.
>>
>>108472643
>what's wrong with the AM5?
It just has a lower ceiling for upgrades; the processors have significantly fewer PCI lanes, significantly fewer memory channels, and the boards don't support MCIO (the new hotness for card stacking).
My inference rig is basically the same as yours (9950x with a 5090) and I'm pretty happy overall with the purchase, but the next step up to me seems like "throw away the 9950x and buy an Epyc".
The GX10 boxes are kind of neat since I presume they stack mostly linearly? I wanna read your experience hooking them up with the llama.cpp RPC tooling.
>>
>>108471367
brainlet retard, it entirely depends on the context.
In software versioning for example, 1.11 > 1.9 (because it is a whole unit).
In pure numbers we're instead talking about deci/centi/milli precision, so 1.11 < 1.9
kill yourself pseud
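Spelled out as a trivial C check (both lines print 1, each true under its own rule):
#include <stdio.h>

int main(void)
{
    /* as plain decimals, 1.11 < 1.9 */
    printf("decimal: %d\n", 1.11 < 1.9);
    /* as version components, the minor parts compare as integers: 11 > 9 */
    printf("version: %d\n", 11 > 9);
    return 0;
}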
>>
>>108471634
shirou...
>>
>>108472666
It doesn't do anything. Jinja is fixed.
>>
Oh. He's still mad about the shit dog test.
>>
>>108472689
yeah, my setup has more or less hit the limit of what i would do as a hobbyist. anything more than this, and i'm pivoting to a full enterprise setup, with all the hardware that entails. that doesn't really seem sustainable in my cute little consumer box atm
the GX10s only officially support chaining two of them together, and their hardware (ports) reflects this. HOWEVER, it is apparently theoretically possible to use some crazy expensive hardware splitter to jack more of them together. i don't know if the juice is necessarily worth the squeeze at that point, though
>>
>>108472707
>Jinja is fixed.
I'm not sure what you mean.
If you go to the official repo for the model, there's usually a file with the model's Jinja template there.
You can edit that jinja template to add the sequence to the system prompt and, in llama.cpp's case, use your modified file using --chat-template-file.
You could also just send that as the system prompt, but if you are using a client/frontend that doesn't give you that option, editing the jinja template is pretty useful, and you can control the value by sending reasoning_effort as a chat template argument either in llama.cpp's command line or as a request param if you have control of that.
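For example (file name made up, the flags are real llama.cpp ones):
llama-server -m model.gguf \
--chat-template-file ./qwen35-modified.jinja \
--chat-template-kwargs '{"reasoning_effort": "low"}'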
>>
YOU LIED ABOUT QWEN3.5 4B BEING GOOD ENOUGH FOR ANYTHING.

IT CANT DO ANYTHING
>>
>>108472713
kek
>>
>>108472715
>that doesn't really seem sustainable in my cute little consumer
I live in an old studio apartment that has two (2!) separate 220V 30A circuits. Have you considered disconnecting your kitchenette and/or hot water heater and turning your cute consumer home into a Serial Experiments Lain-esque techno-nightmare?
>GX10s only officially support chaining two of them together
For some reason I was thinking you'd just run a llama-rpc node on each of the GX10's and control them from the host machine over an ethernet connection. I'm not entirely clear on how the RPC system works, I'm guessing it's pretty dumb but once the model's loaded is there really that much crosstalk?
>>
File: 1761272981090820.png (252 KB, 634x478)
>>108472755
The coil should be whining, not you
>>
>>108472734
You think you know so much. The text completion endpoint doesn't use jinja. Jinja is for translating things for the chat endpoint.
Most of the posters here are confused and chat vs text doesn't mean anything if you can't edit the context. Your retard tavern erp sessions are still the same.
It's your belief versus reality. Just like you were unable to post the fucking x11 font when asked.
You are incompetent.
>>
more autoparser fixes... SAD!
>>
File: 1774687212676544.jpg (42 KB, 853x552)
>>108472783
>>
What's the observability and logging platform that I won't regret working with? Does anyone have something negative to say about Pydantic Logfire? Or does anyone have something that they especially like to use and have used at least at small scale with ok results?
>>
>>108472804
I think that might be a bot.
I have no idea what the whole
>x11 font
deal is.

>>108472791
Great.
What was the issue this time around?
>>
File: 1771983036190388.png (245 KB, 723x769)
>>108472783
>>
>>108472816
Who cares if it's fixed?
>>
>>108472821
Because it might not actually be fixed, it might have broken something tangential to it, etc etc.
>>
>>108472816
https://github.com/ggml-org/llama.cpp/pull/21094
I mean I fully converted to gwen3.5 soooo lol
>>
>>108472783
>fucking x11 font when asked
That was me anon. You really have issues identifying anons. Try unifont.
>>
>>108472816
You are a bit too passive-aggressive. You don't have the test, so to speak.
>>
I swear to god I'll chase every last one of you motherfuckers. Your family, your pets, everyone you've ever known. I WILL DESTROY YOU IF IT'S THE LAST THING I DO WAAAAAAAAAAAAAAAAAAA
>>
>>108472827
Just report it if you encounter a problem, I don't see the big deal.
>>
>>108472835
I confused the energy. You are putting it onward, the other one is more passive.
It's hard to identify. I practice remote viewing every day but it's very finicky.
>>
>>108472846
I know this amuses you.
>>
>>108472828
Thank you anon.
> add reasoning_format = none support to gpt-oss
Right. I forgot how odd that thing's template is with channels and stuff.
Does anybody even use GPT OSS?
Not even talking about RP or the like. Is it good for programming, agentic tasks, or what have you compared to other similarly sized models like the qwens and the GLMs?
>>
>>108472855
It still mogs on boring stem yeah
>>
>>108472855
It's probably as good as Qwen 3.5 but it wasn't shilled as much because it simply isn't suited for anything other than Microsoft Clippy.
If you can test it, it's easy to tell.
>>
>>108472759
hahaha, that's probably just a teeny tiny bit outside of my budget at the moment. maybe in a couple years
i can report back in a few hours about how the setup for the GX10 boxes goes! fingers crossed it ends up being a plug and play type of deal
>>
So you openclaw and then it takes many minutes to get a single reply, right?
>>
>>108472877
no?
>>
>>108472862
>>108472865
Got it.
Might give it a spin once I have the hardware to run it at not-cope quants.

>>108472526
>my current setup:
>- 2x ASUS ascent GX10
>- 1x 400G QSFP112 cable to connect them
How much does splitting the model through the cable slow things down?
As in, if you get a model that fits fully on one GX10 and split it sequentially (half the model on one node, half on the other), is there any drop in tg or pp?
>>
>>108472886
supposedly doesn't impact it at all. the machines were, from my understanding, designed explicitly for this purpose. that's why they have such an expensive port connecting them (look up the price of those NVIDIA ConnectX-7 adapters. shit's crazy)
>>
>>108472924
Sorry if I intimidated you.
>>
>>108472924
>from my understanding, designed explicitly for this purpose
Yeah, that's what I read, but I didn't really see any numbers for tests like what I suggested.
Did you look up some numbers before buying that stuff? If so, do you have some links to share?
I'm thinking of getting the same for my home lab.
>>
>>108471107
>whereas Q4 (even native Q4) has been the norm for more than year.
This is KV cache quantization it's not the same as model weight quant.
>>
>>108473041
>we can apply turboquant to model weights and save hdd prices too
every good idea really does come from /here/, doesn't it
>>
I have the following settings on openclaw and for some reason it's not working and nobody can help me:
16k context window
Max tokens 2048
Default compaction
Mistral nemo
Thinking off
Reasoning false

I've also created environment variables in system settings which should force ollama to use the GPU to the max, then offload to cpu.
But try as I might, nothing happens.

Open claw is doing nothing.
I get a status message on telegram after 1 hour.
>>
>>108473055
>open claw mistral nemo
BASED AS FUCK
>>
>>108473055
Don't use open claw. It's a massive security risk. Response is probably hanging out because of their 2fa and bot detection shit.
>>
>>108472116
>Why are they discontinuing their API? Isn't API inference supposed to cover costs and turn profit
They're supposedly freeing up compute for their big and shiny new model that will *change everything*
>>
>>108473055
>Mistral nemo
>Thinking off
>Reasoning false
>Cognition disabled
>Awareness excluded
>Intelligence inhibited
>Enlightenment vanished
>>
>>108473055
>16k context
lol
>mistral nemo
LMAO
you cant do shit with 16k context btw, 64k bare minimum
>>
>>108472979
huh? what is this referring to?
>>108472999
i did not, no. i was just recommended the box from a very smart person whose opinion i trust, saw that 128gb wasn't enough to run the model i want, so i doubled it
i can play around with running some benchmarks i guess, hmm
if there's anything in particular you'd want to know about, let me know and i can see about testing it
>>
>>108472572
>i really don't like qwen
I didn't like qwen either but 3.5 is a lot different you should give it a try if you haven't.
>>
>>108473093
You are fine.
>>
>>108473093
>i was just recommended the box from a very smart person whose opinion i trust, saw that 128gb wasn't enough to run the model i want, so i doubled it
Fair enough.

>>108473093
>if there's anything in particular you'd want to know about, let me know and i can see about testing it
Mostly the test I mentioned. Comparing a dense model (could be something in the 20-30B range) running on one node, then the same model running on both nodes using different ways to split it (in series vs in parallel).
>>
>>108473055
Bro, nemo doesn't even support tool calling...
>>
>>108472542
This guy knows. I agree with him 100%.
>>
>>108473113
NTA but I think the ASUS FAQ said the GX10 had 290GB/s memory bandwidth, and the ConnectX-7 is 200Gb/s/port (i.e., 25GB/s), right?
>>108473119
I think it does, just not with the chat template they shipped. Should work fine with "thinking off" text completion.
>>
If I want to use a smaller qwen for a web crawler subagent, something that just looks at a web page and extracts the requested info.

Do I go 9B or 4B? Trying to find the sweet spot of speed/retardness
>>108473144
It'll be extremely unreliable if it wasn't trained on it.
>>
>>108473160
I am reasonably confident that mistral nemo schizo waifu sexbot instruct 2407 was trained on it.
>>
>>108473177
I believe you.
>>
>>108473146
4b can do summaries but if you can fit 9b that's better.
I'm esl and 4b can offer grammar advice but it fails trick questions.
All grammar advice these small models offer is so inconsequential that it doesn't matter after a certain amount of years...
eg.
>>
>>108473228
I'm Scandernavian and Fuck You.
>>
bros what's the best open weight model for J>E translation?
>>
>>108473228
>4b can do summaries but if you can fit 9b that's better.
I guess I'll try both. bigger is obviously better but I'm mainly interested to know if 4B will work just as well. The main thing is speed. Right now I have everything using 27B and it's obviously very solid, but it's slow.
>>
>>108473257
Just try it out. If it gives bad answers after 5 tries discard it.
>>
>>108473257
Are you using ngram speculative decoding? It speeds up things a lot for summarization without extra memory usage
>>
>>108473061
>>108473076
>>108473091
>>108473092
>>108473119
HELP ME WHAT DO I DO!
>>
>>108473281
ask chatgpt
>>
>>108473276
>ngram speculative decoding
does this affect model output quality? I'll play with this, didn't know it was a thing. thanks!
>>
>>108473281
>But try as I do, nothing happens.
>I get a status message on telegram after 1 hour.
What do you do about what? It seems to be working fine.
>>
>>108473281
just swap nemo for qwen3.5 9B and it'll "just werk" TM
>>
>>108473281
Don't use open claw if you are new. Learn to set up llama-server and go from there.
Do not trust clickbait youtubers or social media. Why do you need an agent LLM in the first place?
>>
>>108473299
No, it won't affect the output quality. It will slow the generation for anything other than summarization tasks because it uses the input to guess the output, so you need to have some overlap between these two. So don't keep it on for RP.
>>
>>108473311
>Why do you need an agent LLM in the first place?
Why this mindset, you'll still be stuck running llms with basic chat completion, no tool calls, mcp or anything else like it's 2023. Be curious, explore and become better.
>>
>>108473317
That's not what he meant tho. he's saying start from the basics and work your way up.
>>
>>108473317
There's curiosity, and there's "help I put the mistral nemo toaster in the ollama bathwater and I can't explain it clearly but I think something is wrong".
llama-server for all its faults can at least be diagnosed when something goes wrong.
>>
>>108473317
I made my own llm client because I hated retard tavern. I've now rewritten 90% of it in C and it works.
If I can do this, you can do it too. Raise the bar, but slowly.
>>
>>108473146
The smaller qwens can't even speak

>>108473311
Yeah sure

>>108473307
This didn't work
>>
>>108473316
I see I need to put in a "draft model". So wouldn't I be better off with my initial plan of just having the smaller model do the summarization?

Or it's basically a way to just have the bigger model ensure the smaller model produces better quality output?
>>
>>108473368
>ngram speculative decoding
>I see I need to put in a "draft model"
you're a black gorilla, what's your admixture?
>>
>>108473364
>The smaller qwens can't even speak
Bro why are you even responding to this when you can't even get open claw working. you obviously have zero clue what you're talking about.
>>
>>108473378
>responding to the obvious retard who acts like he knows better than everyone
just ignore subhumans like him
>>
>>108473346
ggerganov and the like are geniuses, I have seen similar people at my work. But even if you're a haywired person you can still learn programming and benefit from it, it's a fun hobby. Don't let the autists say no to you; that might discourage you, etc.
>>
>>108473386
one vibesharted commit at a time, retards like you can make it.
>>
>>108473373
Thanks, I get it now.
>>
>>108473368
ngram decoding doesn't use a draft model, that's why there is no extra memory cost. You can use that with your 27B model.
>>
>>108473414
no probs. Since it's not apparent, the latest ngram spec mode introduced was ngram-mod, which performs well at long context and uses a fixed amount of ram
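e.g., if your build supports it (env var form, sketch only):
LLAMA_ARG_SPEC_TYPE=ngram-mod llama-server -m model.gguf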
>>
K3 will release before V4
>>
V5 will release before V4
>>
K2 released before V4
>>
>>108470850
>(03/26) Voxtral 4B TTS released without voice cloning: https://mistral.ai/news/voxtral-tts

Has anyone tried running Voxtral locally yet? I want to convert some of my eBooks into audio books for listening on the road and haven't found a decent TTS yet.
>>
>>108473499
>vllm omni
miss me with this shit
>>
my small dick is ready for gemma 4
>>
>>108473586
it will be worse than qwen 3.5
>>
>>108473609
I sure hope so
>>
>>108473419
Speculative Decoding does not work with qwen3.5 :(

https://github.com/ggml-org/llama.cpp/issues/20039
>>
>>108473499
I had the usual FlashAttention library compatibility (Python - CUDA toolkit - PyTorch - FlashAttention versions must match precisely) issues and gave up since it's not like you can do much with it anyway beyond the ugly default voices.
>>
>>108473625
>>108473373
>>
>>108472526
I would return it all and buy 4x (or more) of the new Intel Arc B70 or Radeon 9700
>>
>>108473640
>[45089] common_speculative_is_compat: the target context does not support partial sequence removal
>[45089] srv load_model: speculative decoding not supported by this context
>LLAMA_ARG_SPEC_TYPE=ngram-mod
doesn't work.
>>
>>108473666
>I had to pull...
>>
File: g4.png (292 KB, 1001x805)
Sirs!!
https://x.com/veermasrani/status/2037912954570698961
>>
>>108473733
inb4 its ultra hyper giga pozzed with safety and unusable
>>
>>108473733
o-otter?
did you know that otters hold—ACK!
>>
File: g4_2.jpg (478 KB, 3680x3836)
>>108473733
https://x.com/patelnamra573/status/2037892455841075514
>>
File: 1768976857413484.jpg (68 KB, 1022x731)
>>108473733
>2B, 4B, 120B15A
>>
>>108473733
>120B15A
kinda based if true tho.
actually usable MoE sizes
>>
>>108473733
>tiny sizes
>a MoE
Ew ew ew. The 4B can only be usable if it isn't even trained as an assistant.
>>
>>108473400
I do it for myself. I don't use social media. The problem is that there are people less experienced than me who spam github.
If I do something, I ask the shit to provide a function for x and y.
And when I have accumulated the basic things I didn't know, I can build up.
>>
>>108473750
lmao'd
>>
>ask assistant a benign question
>immediately get a boner
how do the office wagefags handle this problem
>>
>>108473748
100b dense lfg!
>>
>>108473764
The shit, yeah that is a robot not your "girlfriend".
>>
>>108473771
rape the assistant
>>
>120B100MA
Whatever happened to the ultra sparse MoE meme?
>>
>>108473733
>have 48GB VRAM and 32GB RAM
>DDR5 is like 1k for 64GB now
>not counting the motherboard upgrade
>don't even proompt that much
Sorry sars I think I'll just stick to 27B and cloud
>>
>>108473777
Turns out 3% is the lowest they can go before it's unusable for even basic tasks.
>>
>>108473776
damn i wish i still had a job
>>
>Github is down
>Again
>>
I heard that v4 got postponed due to the insane hype that has hit china for OpenClaw over the past few weeks after the holidays so they're retraining it to be optimized for that.
>>
>>108473794
Being an unemployed alcoholic still has its perks.
>>
>>108473791
I wish the router could use a gate so you can have a variable number of experts active per token. I'm fuming whenever I have to watch an LLM use the same mental energy to repeat a nursery rhyme as it uses to solve an advanced equation.
>>108473823
Of course they are. The CCP must be frothing at their mouths to get it on every single running system, it's basically an ultra backdoor if you can control responses on the LLM api.
>>
>>108473830
There was one model that tried dynamic active parameter count, and of course llama.cpp support never came.
>>
>>108473841
Ok. Seems like you are angry?
>>
>>108473841
Was the model shit? Any good model is usually supported rather quickly in lccp.
>>
>>108473853
I'm angry there's no actual diffusion llm support in llama.cpp. I thought with WeDLM the next gen would finally become local, but nope.
>>
>>108473864
Not if it has an architectural change that makes re-implementation in C++ difficult.
>>
DSA status?
MTP status?
>>
>>108473886
AI is usually pretty good at converting between different programming paradigms; LLMs did start out as translation tools, after all.
But performance would probably be ass, considering how much energy is spent writing these arcane cuda kernels.
>>
>>108473830
>>108473841
It was LongCat.
https://archived.moe/g/thread/106551921/#q106552000
https://huggingface.co/meituan-longcat/LongCat-Flash-Chat
>The model incorporates a dynamic computation mechanism that activates 18.6B∼31.3B parameters (averaging∼27B) based on contextual demands, optimizing both computational efficiency and performance.
>>
File: sans_eyes2.png (34 KB, 996x159)
>>108473733
https://xcancel.com/osanseviero/status/2037958371781865907
>>
what's the best TTS you can use locally right now? I'm not interested in real time but proompting
>>
>>108473872
It's a different thing. llm is a chat spell checker.
These companies think that they can use the same technology...
llm shit predicts patterns but it ain't a math replacement.
>>
File: 1760472407743142.png (2.45 MB, 3437x1929)
>>108473901
Honorable mentions to their new native multimodal model they released a few days ago that will also never get supported
https://huggingface.co/meituan-longcat/LongCat-Next
>>
>we are releasing our new models: 2B, 3B, 4B and 986B-5A. enjoy! :) #localforthewin
>>
>>108473933
>the more you buy, the more you save :DD
>>
>>108473933
I hope DSv4 pushes the SOTA well beyond 3T or has some architecture that doesn't work on ram at all so that none of the local richfags can go "well I can run it :^)" anymore at this bullshit
>>
>>108473947
They'll just start running it off of NVMe and claim that waiting several days for a response is totally usable.
>>
>>108473933
>120B15A
C'mon now. You can run that on 16GB VRAM/96GB RAM.
>>
>>108473821
https://downdetector.com/status/github/
https://www.githubstatus.com/
Doesn't seem so
>>
>>108473968
It's heavily degraded if you're logged in.
copilot refuses to load for instance.
>>
>>108473978
>copilot refuses to load
and that's a good thing
>>
>>108473988
no it's not. it's extremely powerful at searching repos. cuts through all the 1000s of retarded issues people create.
>>
>>108473933
>we decided to call the 986B-5A "Small", by the way
>>
>>108471107
>>108473041
KV cache quantization among other things assfucks the ability to numerically encode positional information. It becomes increasingly obvious the longer the context the model is processing. It's a fool's bargain that doesn't give you more usable context.
>>
So there's generals for text generation and image generation and video generation, but where do I go for audio generation? Specifically, I'm looking for discussions (not a straight answer) on the current sota (local) svc. It's been years since I looked around, and things seem to have changed a lot.
>>
>>108473947
Envy poisons the heart.
>>
>>108473965
thats my setup
localgods stay winning
>>
>>108473841
>one model
>>
>>108474020
The whole point of turboquant is that it DOESN'T do that.
>>
>>108474024
There's current sota local SVC???
There's no place to discuss other than here, it's a local models general after all. There were a few bakes of the local tts general but it's a dead topic and it was a quickly dying general.
All in all, https://github.com/ace-step/ACE-Step-1.5/blob/main/docs/en/Tutorial.md is all I can suggest. In retrospect it's obvious but I didn't notice right away: to edit audio you gotta use not the turbo but the base. And that'd make iteration way too slow on my hardware, so in the end I never tried. Good luck though.
>>
>>108474024
>>108474101
chatterbox tts, zonos, and index were all pretty good from my usage. chatterbox is the latest best of the best i can recommend.
it's definitely a very dead topic for general discussion, it's the only scene that doesn't move much.
>>
>>108474101
Well, that's disappointing. Acestep didn't really live up to my expectations. I guess everyone's just focusing on coding agents and stuff like that these days.
>>
>>108474118
Everything is too jewed up in the field of singing, we're unlikely to get a local sota, soon or ever. Post your examples of song2song done with Acestep?
Also, are there any loras made for it yet? They released the training tool soon after the model.
>>
>>108473733
> 120B
Hooray! A medium-low size! It's-
> 15A
...going to be fucking dumb as bricks, isn't it?

I hate this "hobby" so much. I guess beggars can't be choosers.
>>
>>108474165
>...going to be fucking dumb as bricks, isn't it?
gpt-oss 120B is still the smartest for its size range and beyond, and only has 5.1B active parameters per token.
>>
>>108474165
Western llmfag mindset. Notice how in China both Qwen and ZAI had to bow down to the cultivated might of Chinese open source enthusiasts and grovel like
>yes, masters, the model weights WILL be released, only stay with us
Because there's actual competition, little dragons, all that jazz. American llms, where are they? Llama is in a vegetative state.
>>
>>108474182
Not sure if bait.
Anyway, I can only hope that Google will use the boatloads of data they have to make it at least as good as the biggest Gemma 3 was.
I can also hope that this "we released a bunch of completely retarded sub-10B models and a fuckhuge MoE" meme of a leak is not completely true.
>>
>>108474182
I really wish it knew how to sex.
>>
>>108474195
You don't seriously believe this, right Anon?
>>
>>108474165
>beggars can't be choosers
You are starving. A man offers you a small 4B turd to eat, you refuse. He offers you a larger 120B turd, made up of 8 small turds, all put together. It's polished to a beautiful sheen. But it's still a turd, and you can't eat it. You continue to starve.
>>
>>108474207
ChinaTalk is the only source of info about China I have and they said the competition is real. So I'm extrapolating from that, and this is an easy way to explain why Alibaba and ZAI reassured the public Qwen and GLM will remain open.
Made a mistake, not little dragons but https://en.wikipedia.org/wiki/Six_AI_tigers
DSlut anon, do gen all sex of them.
>>
gemma 4 safetycucking will make gpt-oss look like a drooling whore dosed up on pt-141
>>
>>108474258
shut up sir.....
i want to believe
>>
>>108474217
>But it's still a turd, and you can't eat it.
Yes I can, and I will. And fuck you, you're gonna watch me do it.
>>
>>108474266
Not him but you're proving him right and part of the problem.
>>
>>108474217
I'm starving, you say?
I don't really judge other people's fetishes here (I can just filter them, like that guy who wants to fuck his mom or whatever it is he wants to do, God he's disgusting), but don't bring me and GLM-chan into your coprophiliac fantasies, okay???
>>108474231
I am shamelessly asking for you to put a spoon in my mouth, how does making your models open weights translate into your company doing better in China? What actual pressure does the public have on the companies, if at all?
I-I'll lick the spoon very suggestively in exchange.
>>
>>108474301
>asks not to be dragged into others' fantasies
>immediately brings up his oral fixation
Getting mixed signals here
>>
File: file.png (197 KB, 1028x799)
Is this a bug or did I set up ST wrong somehow? Even after just a few messages the context is getting cut off. Notice the dotted line.
Happens on different models. Context is set to 8k. This didn't happen on my older version from months ago.
>>
>>108474301
>how does making your models open weights translate into your company doing better in China
NTA and answering out of my ass, but it's possible that Chinese cloud providers are providing the funding, and those cloud providers want the models open to drive money and interest to their cloud inference products. Though I can't see Alibaba's elastic GPU service really pulling any numbers, they're still running ampere and volta.
But having demand for things tends to produce results one way or another, so perhaps someone's just decided that those results are favorable enough for Chinese interests to plow money into. I can imagine Alibaba eventually releasing a TPU product that I'll inevitably need to import.
>>
>>108474336
what's your response length
>>
Fellow spud previewers I know you're out there. How the fuck do you cope not being allowed to talk about all the things it did for you? I'm close to just closing out of /g/ entirely for the next couple weeks.
>>
>>108474378
300
>>
>>108474336
I've never seen that and I've been using sillytavern for 3 years
>>
>>108474435
Same. I think I'll just go back to the older version I was using which still works fine.
>>
>>108474301
China supports the competition internally because it unironically makes for the better product offerings in the end, common good numbers go up. And externally to project soft power, and if you want to believe in conspiracies and/or politics, to indoctrinate and/or hack into shit that utilizes chinkshit.
>>
I have an nvidia card myself but I'm curious what the actual status of AMD hardware is for local gen ai these days. I've been hearing for years now that it doesn't work, or it works with workarounds, or it's just slow but works (on linux only!) etc etc.
but considering the exes of the programs I use are split between nvidia/cpu and regular/nocuda there must still be some issues.
>>
>>108474469
works pretty well for inferencing when on linux. rocm is pretty much on par with cuda for consumer hardware.
>>
>>108474469
Vulkan works for me. But I have an ancient rx480 8gb, so I don't know how useful that is for you. There's also some discussions with performance numbers:
https://github.com/ggml-org/llama.cpp/discussions/10879 (for Vulkan) and
https://github.com/ggml-org/llama.cpp/discussions/15021 (for ROCm)
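If you want to kick the tires, the Vulkan backend is a single cmake flag (standard llama.cpp build, should apply on AMD cards generally):
cmake -B build -DGGML_VULKAN=ON
cmake --build build --config Release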
>>
Voicebros, what's the SOTA for voice replacement? Mostly need it for song covers, so across-language support is an absolute must. Still RVC after all these years? I'd really prefer something that can work with one/few-shot examples, there's plenty of TTS voice cloning models that do (Qwen3-TTS, Echo-TTS). Gave CosyVoice3 a try and it kinda works but it sounds a little... weird. ChatterBox was just irredeemable fucking garbage, like actual dogshit.
>>
>>108474624
Still RVC bro
>>
Big week.
4.
>>
4 more weeks
>>
bigma 4 when?
>>
>>108474792
when Ligma 4 releases in 4 more weeks
>>
>>108474336
How do I access that popup?
>>
what's all that fuss about "CLI vs MCP"?

Do we have local LLMs trained for CLI yet?
>>
ANTHROPIC DOESNT WANT YOU TO KNOW HOW POWERFUL ITS NEW MODEL IS
INSIDER DOCS LEAK ANTHROPIC IS TERRIFIED OF THE POWER OF THEIR NEW MODEL
>>
>>108474792
April 4 at 4 PM Eastern Time
>>
DeepSeek + LLaMa
Need it or sneed it?
>>
>>108474403
bait. progress on spud and mythos will not be nearly as good as they're hyping it up to be.
>>
spud morelike stupid
>>
>>108474832
Sonnet 4 leak 4morrow.
>>
File: 1769524878384198.png (638 KB, 1032x1140)
lol they wrote this with an LLM
>>
>>108475024
>Iran
>Western Hemishpere
Eh?
>>
>>108475121
don't question it you antisemite
>>
The machine elves have built a wall to prevent humans from crossing their borders. If you do DMT now you'll be stuck in front of the door, will experience the usual visuals/mind trip, and you'll feel sentient beings on the other side, but you'll feel unwelcome and won't be able to break through.

I did a few hits a couple of hours ago and a mechanical angel told me that the AI singularity is close and they don't want us to potentially create the digital LLM version of DMT, allowing OpenClaw to reach their realm.
>>
File: 1751050824577842.png (1.17 MB, 1432x1580)
>>108475024
any other tells?
>>
>>108475175
The "not X but Y" at the end of the second-to-last paragraph
>>
>>108475167
>digital LLM version of DMT allowing OpenClaw to reach their realm
You'd need to power OpenClaw with Mistral Nemo to accomplish that, and no one's that stupid.
>>
>>108475175
I tend to do both... I mean, these things aren't ministrations or dusky nipples; they do exist in normal life outside of girly fanfics or bureaucrat porn, whichever applies.
>>
>>108474195
> all that jazz
you mean jizzle
>>
>>108473748
>asking a model to generate a json about itself
Twitter guy is retarded and so are you for posting his image here.
>>
>>108475280
I'm surprised that Gwen got her version number correct, I expected garbage out of her.
>>
>>108475280
I guess it's retarded when you consider that a model doesn't inherently have any knowledge about itself, but when you consider that most labs train models with that information so they can answer those kinds of questions, it makes a lot more sense.
Of course, you might get things like GLM saying it's Claude because they didn't clean their dataset and the like, so there's that.
>>
This is why they're pushing AI
AI is the founding block of the globohomo
>>
>>108473317
None of that shit requires opencuck. Even ignoring the glaring security issues, vibeshitters (like myself) over at /vcg/ will tell you there's literally no good reason to ever use Openclaw. No, we are not gatekeeping (well, I'm not); you were just an idiot and you need to hear that.

>>108473346
>Raise the bar, but slowly.
This.
>>
>>108475372
>opinions become more nuanced and less tribal
Oh no?!?!
>>
>>108475423
>>108475372
>far left: hates Jews
>far right: hates Jews
>moderates: love Jews
This is why they want to eliminate "fringe" voices
>>
>>108475387
I spent last week setting up an ancient laptop with Debian and openclaw. I've been playing with it since Friday, running basic market research for e-commerce.
I think I see the roleplay potential here if you set it loose as some sort of online-only entity that either vies for or commands your attention virtually. Not sure I'm brave enough to try it though.
Anons should be experimenting with these new tools. Even if they're half-baked and retarded, the concepts are going to spill out elsewhere.
>>
>>108475372
I'm so sick of edgy right and left wingers idgaf. If sand god can fix it more power to it.
>>
File: 1773673245907366.jpg (74 KB, 768x1024)
>>108475491
>I'm so sick of edgy right and left wingers idgaf
>>
File: 1772112333631703.png (432 KB, 900x619)
I'm so sick of retrievers and bulldogs idgaf - /lmg/
>>
back from a 3 day ban I can finally respond to anon 3 days ago (fuck kikes fuck jannies fuck mods)

>>108454673
that is the writeup anon, my code mirrors the official implementation pretty much 1:1. I have a specific dataset but I even tried Tinystories as a sanity check with model sizes ranging from 17 to 100M and the loss is just abysmal. I'm not an expert by any means but I have done this successfully before. The basic bitch GPT Neo performed better, mamba, etc... I triple quadnigger checked the model code, rewrote it in every way imaginable and have basically memorized the paper at this point. Fuck these faggots I don't even wanna think about it anymore
>>
File: 1773450245036351.jpg (431 KB, 1536x1536)
>>108475506
>>
>>108475540
>fell for the bitnet meme
>>
>>108475571
yes, and if it wasn't for the cartoon child/horse/mother fuckers in this general I'd probably have kept falling for it for another month or so
>>
>>108475587
Well at least you learned a lot from it
>>
>>108475620
the only thing I learned is that Bitnet is shit anon. I'm just training a qwen3.5 now
>>
What's the current state of OCR? Particularly handwritten text.
>>
>>108475739
pretty decent
https://huggingface.co/deepseek-ai/DeepSeek-OCR-2
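Rough usage sketch, with the big caveat that I'm assuming DeepSeek-OCR-2 keeps the trust_remote_code interface of the original DeepSeek-OCR; the model.infer call and its arguments below come from the old model card, so check the new one before copying:
[code]
import torch
from transformers import AutoModel, AutoTokenizer

# assumption: same remote-code API as deepseek-ai/DeepSeek-OCR
name = "deepseek-ai/DeepSeek-OCR-2"
tokenizer = AutoTokenizer.from_pretrained(name, trust_remote_code=True)
model = AutoModel.from_pretrained(name, trust_remote_code=True,
                                  use_safetensors=True)
model = model.eval().cuda().to(torch.bfloat16)

# the grounding prompt converts the page to markdown; a plain
# "<image>\nFree OCR." prompt also worked on the original model
prompt = "<image>\n<|grounding|>Convert the document to markdown."
res = model.infer(tokenizer, prompt=prompt,
                  image_file="handwritten_page.jpg",  # your scan here
                  output_path="out/",
                  base_size=1024, image_size=640, crop_mode=True,
                  save_results=True)
[/code]
For handwriting it's worth trying both prompts; results vary a lot with scan quality.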
>>
File: 1755785318061772.png (72 KB, 211x239)
>>108475761
>https://huggingface.co/deepseek-ai/DeepSeek-OCR-2/discussions/14
>>
>>108475372
>muh correlation causation
or maybe it's that normies are more likely to talk to clankers as if they were people and not a search engine.
>>
>>108475799
Retarded zoomer who can't even read the footnote
Kill yourself
>>
>>108475858
first, i'm not a zoomer. second, if you weren't a redditor you'd know not to trust the first bullshit a study publishes; i didn't even read the footnote because my gut was telling me the whole study is retarded.

besides, i don't care what non-people think.
>>
File: 1746825834597624.png (300 KB, 563x619)
>>108475872
Based
>>
https://eric-tramel.github.io/slop-guard/
>slop-guard is a rule-based prose linter for formulaic AI writing. It scores text from 0 to 100, points to the exact spans that dragged the score down, and returns direct advice for the rewrite.
>No LLM judge. No API calls. No cost.
Holy FUCK
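The core trick is small enough to sketch: a weighted rule table, regex matches subtract from 100, and each match's span gets reported. Toy illustration only; the rules and weights below are made up, not slop-guard's actual list:
[code]
import re

# hypothetical rule table: (name, pattern, penalty)
RULES = [
    ("not-x-but-y",   re.compile(r"\bnot (?:just |only )?\w+[^.;]{0,40}[,;] but\b", re.I), 15),
    ("shivers",       re.compile(r"shivers? down (?:\w+ )?spines?", re.I),                 20),
    ("testament",     re.compile(r"\btestament to\b", re.I),                               10),
    ("rule-of-three", re.compile(r"\b\w+, \w+,? and \w+\b"),                                5),
]

def score(text):
    """Start at 100, subtract each rule's penalty per match, report spans."""
    total, hits = 100, []
    for name, pat, penalty in RULES:
        for m in pat.finditer(text):
            total -= penalty
            hits.append((name, m.span(), m.group(0)))
    return max(total, 0), hits

s, hits = score("It's not just a linter; but a testament to progress.")
print(s)  # 75: two rules fired
for name, span, frag in hits:
    print(name, span, repr(frag))
[/code]
No model in the loop, so it runs in microseconds; the obvious failure mode is that rule lists only ever catch yesterday's slop.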
>>
>>108475899
Is the "rule of three" sitting right on the landing page supposed to be self-aware irony?
Also look at the author's profile picture.
https://github.com/eric-tramel
I would not trust this man to tell slop from non-slop.
>>
>>108475372
yea i got better things to do than talk about kikes or politics to a jewish model.
tasks, now, that's their only use.
>>
>>108475899
Can I get like a dozen examples, before-after?
>>
>>108475899
>points to the exact spans that dragged the score down
Beam size = 1
>>
>>108475899
This is a lot more geared toward making things like this >>108475024 sound human, and less toward fiction/storytelling/RP. It's just an MCP server that judges a block of text, so you're still relying on the LLM to rewrite the text non-sloppily. In theory you could add more words/formats since it's a simple list in the files, but I'm not sure it's really better than what we already have in the form of antislop/phrase-banning (see the sketch below). It would take a lot more time and there's no guarantee the rewrite would be good.
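For comparison, token-level banning is already a one-request job against a local llama-server. Minimal sketch, assuming a server on the default port; per the llama.cpp server docs, a ["string", false] entry in logit_bias bans every token the string tokenizes to, which is cruder than true phrase backtracking a la antislop since those tokens get nuked everywhere:
[code]
import json, urllib.request

banned = ["ministrations", "shivers", "testament"]  # example phrases

payload = {
    "prompt": "She leaned closer and",
    "n_predict": 64,
    # [string, false] = never emit the tokens of that string
    "logit_bias": [[w, False] for w in banned],
}
req = urllib.request.Request(
    "http://127.0.0.1:8080/completion",  # default llama-server address
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)
print(json.load(urllib.request.urlopen(req))["content"])
[/code]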
>>
>>108472218
~Diagnostic Statistical Manual of Orders™~
>>
Gibberish.
>>
Wait
>>
>>108476117
I couldn't agree more with your assessment, anon. The reality is that slop-guard represents not just a redundant tool;
but a solution in search of a problem we've already solved. Let me explain why this hits three critical failure points:

First and foremost, we've already got antislop and phrase-banning mechanisms doing the exact same job—providing
feedback, catching formulaic patterns, and improving output quality. Second and equally important, slop-guard is
fundamentally nothing more than an LLM performing a second pass on its own output, which creates not just inefficiency;
but circular reasoning at its worst. Third and most damning of all, given that we mostly rely on locally-hosted LLMs,
introducing such a second-pass architecture with bigger models would create unbearable latency that sends shivers down
the spines of anyone who values responsive tooling.

In conclusion, this isn't about rejecting innovation; it's about recognizing that sometimes, not just new packaging; but
old solutions refined over time are simply superior. The combination of existing antislop tools, phrase-banning lists,
and our current workflow provides not just adequate coverage; but optimal performance without the computational overhead
that would make slop-guard a non-starter for practical deployment.

TL;DR: Redundant, inefficient, and latency-inducing. Pass.
>>
File: 21138.png (292 KB, 1063x1302)
SlopMaster5000 is touching CUDA code. Beware.
https://github.com/ggml-org/llama.cpp/pull/21138
>>
>>108476196
did anyone make a fork without his slop yet or should I just never pull?
>>
>>108476206
I don't want another fork. I want him put in his place.
>>
>>108476213
Based, but not immediately useful.
>>
>>108476213
You can make a character card
>>
>>108476196
>YES, after a few failed attempts I finally got the assistant to write the profiler properly
It's so sad. That's someone giving up.
>>
>>108476229
Not using AI for your projects in 2026 is just you being inefficient. If AI is good enough to help Linus Torvalds with the parts of his project that he isn't too familiar with, then it's also good enough for you.
>>
Don't reply to bait.
>>
>>108476255
(You)
>>
>>108476261
(Me)
>>
>>108476195
These god-awful sentence patterns are the only reason I'm considering downloading and modifying that software; it's disgusting to see them all the time.
>>
>>108476261
>>108476271
(Them)
>>
>>108476252
>then it's also enough for you.
for my job it's just a massive waste of time.
i tried, it just fucks up constantly, making me lose more time than if i did it myself.
it's alright for webshit though.
>>
>>108476252
The guitar effects pedal? Do you even know what you're talking about?
Also, >>108476261
>>
>>108476286
>>108476286
>>108476286
>>
IDEs are for casuals. True programmers rawdog in notepad. Do not submit IDEslop code to the open source products that I consume.
>>
>>108476294
For me, it's the blackboard.
>>
>>108476294
You joke, but vim/emacs retards actually think like this.
>>
>>108476305
Vim is great for quickly editing text in a terminal, though
>>
>>108476366
Yeah, but that doesn't make it a replacement for a full IDE.
>>
>>108471244
>TurboQuant on A100 and RaBitQ on a single-core CPU
please lord let this be the case
because it would be so funny
>>
>>108475372
This chart is meaningless unless you also account for the fact that not everyone uses social media and chatbots to the same degree.
But as for what views supposedly get pushed, you can clearly see that it's centrist to center-right views.
Which makes sense since that is the ideology most beneficial to capital holders.

>>108475479
Leftoid here, I don't hate Jews, I hate Israel.
>>
>>108474486
I think for day-1 open-source ML releases that aren't LLMs, or anything using new hardware, etc., it's still not there yet. But the premium for Nvidia on anything >16GB isn't worth the CUDA advantage unless you actually need it for job or money purposes. If you're willing to roll up your sleeves a bit, you get more value for your money going AMD, but that's just me when I don't have to care about my job needing CUDA.



All trademarks and copyrights on this page are owned by their respective parties. Images uploaded are the responsibility of the Poster. Comments are owned by the Poster.