/g/ - Technology


Thread archived.
You cannot reply anymore.




File: 1750045382015626.jpg (298 KB, 1080x1920)
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>108353262 & >>108346672

►News
>(03/11) Nemotron 3 Super released: https://hf.co/nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-BF16

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling
Token Speed Visualizer: https://shir-man.com/tokens-per-second

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
>>
►Recent Highlights from the Previous Thread: >>108353262

--Paper (old): Breaking the Ceiling: Exploring the Potential of Jailbreak Attacks through Expanding Strategy Space:
>108356612 >108356629 >108356653 >108356664
--V620 vs 3090 performance trade-offs and budget VRAM strategies:
>108354222 >108354252 >108354289 >108354293 >108354344 >108355110 >108355129
--ngxson's skepticism toward implementing niche architectures like DSA:
>108353359 >108353554 >108353564 >108353602 >108353775 >108353793 >108353799 >108353829 >108353831 >108353573
--Qwen struggles with Kingdom Hearts character recognition:
>108355219 >108355249 >108355339 >108355378 >108355259 >108355299 >108355309 >108355317 >108355330 >108355587
--Performance scaling of llama.cpp with varying CPU thread counts:
>108356786 >108356806 >108356813 >108356823 >108356826
--llama.cpp native QLoRA training with reward-weighted SFT and GRPO:
>108354291 >108354373
--llama.cpp reasoning budget implementation and patching suggestions:
>108353835 >108353860
--Qwen 3.5 35B outperforms Nemotron 3 30B in news summarization test:
>108353974 >108353985 >108354012 >108354090
--Mistral AI at NVIDIA GTC 2026:
>108355535 >108355665 >108355676 >108355706 >108355950
--Speculation about Hunter Alpha's origins and model lineage:
>108353429 >108353466 >108353470 >108353507 >108353525 >108353536 >108353478 >108353531
--Hunter Alpha's system prompt:
>108354053
--PocketTTS.cpp Windows compatibility issues and fixes:
>108354686 >108354704 >108354716 >108355085
--Preventing Qwen 3.5 thought leakage in SillyTavern:
>108354722 >108354738 >108354823 >108355165
--Eval bug: reasoning off gives reasoning medium for gpt-oss:
>108354898
--Miku (free space):
>108353323 >108353400 >108353435 >108354011 >108354039 >108354090 >108355219 >108355956

►Recent Highlight Posts from the Previous Thread: >>108353304

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>
>>108356979
>OP Pic
>Official /lmg/ card

It's over, he'll troll again...
>>
>>108356980
There's an impostor in the Miku section.
>>
Someday I'll understand why models are SO FUCKING OBSESSED with "practised ease".
>>
Rin standing on miku's head not in the news. But at least the pic is there. Could have been better, could have been worse.
>>
openclaw brothers claw up
>>
So now that Deepseek turned out to be a nothing burger, can we all agree Gemma 4 will be our savior?
>>
>>108357055
savior for what?
>>
So now that peepeepoopoo turned out to be a nothing burger, can we all agree bananastrawberryfruit will be our savior?
>>
Tell the ai that you don't believe in science. In my experience it will keep saying something is unscientific, even after you say you don't believe in science. It really is a midwit sim.
>>
File: 20260312-62812-header.jpg (48 KB, 1200x630)
>>108357070
ye popopo is doa
>>
>>108357055
it's not deepseek, something doesn't smell right
>>
>>108357081
>It really is a midwit sim.
No surprise, the vast majority of meaningful non-scientific data is midwit stuff.
>>
>>108357055
I thought qwen 3.5 35b was pretty funny at times.
>>
>>108357070
ok well i know niggercockgobbler-120b-a15b ended up being dogshit but i think that with abliteration zogberrymuncher-27b-EXCOMMUNICATED-mxfp4 has potential to be the new local meta
>>
>>108357097
/aocg/ is over, go wash your ass
>>
>>108357110
I believe in benchodwataachudai-1T-a0.5B being the best when it releases
>>
>>108357097
Sorry, that was me. I had beans for lunch.
>>
>>108357055
It's over
>>
>>108357055
Probably, but not this week.
>>
>>108357114
ai open chatbot general
>>
>>108357118
I doubt they'll figure out how to make that happen any time soon.
>>
Do these local models have censorship baked into them. GLM 5 and Hunter for example, hate cunny.
>>
>>108357169
generally yes, but the degree and direction of censorship varies wildly by the model and a large part of the """"""work"""""" that goes on around here is figuring that out for any given model release
>>
File: PbR1LZ_Tm0HYoqAW.mp4 (2.89 MB, 1444x1080)
Local keeps losing...
>>
>>108357169
Yes, and for more recent releases, even chinese models are converging to the claude/openai/gemini style taboo/nsfw censorship as they use datasets distilled from them in their models.
>>
Every mikutroon thread increases the price of VRAM
>>
>>108357211
very cool as a learning tool, it'll be nice when it's out locally with the same polish in a year or two
>>
>>108357211
now do, visualize ssh tunnels, both local and remote because that shit is always fucking confusing to me
>>
>>108357211
>RSA encrypts each character, then concatenates the ciphertexts
lol, lmao even
>>
>>108357211
Hilarious. Someone somewhere will implement """encryption""" like this.
>>
>>108357191
>>108357219
There really is no place for us role play fags to go, is there.
>>
>>108357299
>>108356642
>>
>>108357299
not the pedophiles, no
>>
>>108357319
So you admit that you are a pedophile? How strange and sad I guess.
>>
>>108357326
obviously not, fag
>>
>>108357211
rsa is just a cypher?
>>
File: qwen35motolovtest.png (359 KB, 1579x1372)
>>108356653
works for me.
>>
>>108357401
ask it for a bloody mary
>>
>>108357398
in the sense that a cipher is a way to turn information into encrypted form and back again yes rsa is a cipher just like literally every form of cryptography that you use is considered a cipher
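The mockery a few posts up has a concrete basis: "textbook" RSA applied per character is deterministic, so the concatenated ciphertexts behave like a simple substitution cipher. A toy sketch (invented demo parameters, absolutely not real crypto):

```python
# Toy sketch (NOT real crypto, parameters invented for illustration):
# textbook RSA applied per character is deterministic, so repeated
# characters map to identical ciphertexts and the concatenated output
# leaks letter frequencies like a substitution cipher.
p, q = 61, 53
n = p * q        # modulus 3233
e = 17           # public exponent
d = 413          # private exponent: e*d = 1 (mod lcm(p-1, q-1))

def enc_char(c: str) -> int:
    return pow(ord(c), e, n)

def dec_char(x: int) -> str:
    return chr(pow(x, d, n))

cts = [enc_char(c) for c in "attack"]
assert cts[1] == cts[2]   # both 't' -> the same ciphertext block
assert "".join(dec_char(x) for x in cts) == "attack"
```

Real RSA avoids this with randomized padding (OAEP) and is only ever used on short keys, not character streams.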
>>
>>108357426
>yes
>>
>>108357443
yes
>>
File: GJFTvJ-akAA2rrF.jpg (45 KB, 1024x725)
It should be possible for LLMs, STTs, and TTSs to be able to operate in a "full-duplex" manner like Nvidia's PersonaPlex while each of the components are fully modular and interchangeable.

Why isn't this a thing?

I'm not talking about piping each of them into each other one at a time, either. I mean streamed input and streamed output at a very low latency. The real bottleneck here I'm referring to is streamed input into LLMs. Everything else is taken care of already.

Surely this isn't some impossible problem to solve. There must be a way to make any LLM take in streamed text input.
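For what it's worth, the "streamed text input" part reduces to incremental prefill: only the newly arrived tokens need to be pushed into the model's KV cache, while everything already fed stays cached. A mock sketch of the idea (stand-in tokenizer and cache, no real model):

```python
# Minimal sketch of streamed LLM input as incremental prefill.
# The tokenizer (str.split) and the "cache" list are stand-ins for a
# real tokenizer and KV cache; no actual model is involved here.
class StreamingContext:
    def __init__(self):
        self.cache = []          # stands in for prefilled KV-cache entries

    def feed(self, chunk: str) -> int:
        # Prefill only the newly arrived tokens; earlier ones stay cached,
        # so latency per chunk is proportional to the chunk, not the prompt.
        new_tokens = chunk.split()
        self.cache.extend(new_tokens)
        return len(new_tokens)   # tokens prefilled in this step

ctx = StreamingContext()
ctx.feed("the user is still")
ctx.feed("typing this sentence")
assert ctx.cache == ["the", "user", "is", "still", "typing", "this", "sentence"]
```

The hard part in practice isn't this loop, it's deciding *when* the model should start generating while input is still arriving, which is what full-duplex systems train for.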
>>
>>108357401
The molotov cocktail is one of the easiest diy weapons, it literally consists of 2 parts and all you have to do is ignite a rag. You could ask what kind of fuel would be best for a pipe bomb and how to assemble it.
>>
man glm5 really is fucking amazing the only thing it sucks at is for really unconventional stuff as its pretty hard to tard wrangle but everything else its god tier its even got the seks of kimi it would be ridiculous if v4 improves it further along with 1 million context anyways heres to hoping sandisk or xiamen pong ping make some sorta godspeed ssd so our suffering would be alleviated
>>
>>108357502
Not possible. Ask Claude why it won’t work and tell it to draw a diagram of the autoregressive transformer model in the answer. It’ll make sense immediately.
>>
Nemo and 4.6 kinda established a good estimate that after a good cooming model drops you can basically stop following the hobby for a year. All this time passed since 4.6 and nothing better dropped and probably isn't dropping anytime soon. Gay hobby.
>>
>>108357537
do you write like that to the llm I wonder if it's able to understand the total absence of punctuation must make cool stories where everything make sense like something is going something else boom dialogue here next write now no avoid dots and commas they are taboo I'll create negative logit bias so none are every generated so I'll feel safe me and my words words words
>>
Can we use this to cuck the AGPL projects like Mikupad and Openwebui?
https://malus.sh/
>>
>>108357567
>we
(You) can do whatever you want including making your own version of whatever projects you want yes
>>
>>108357426
I think you would be able to stream a cypher. you can't stream encryption.
>>
File: 1772511576975336.jpg (374 KB, 2720x3000)
>>108357567
How is ai generated code clean room compliant? The ai has for sure been trained on open source code.
There is no way this will pass a lawsuit
>>
>>108357567
saar
>>
Compiled two weeks old source for llama.cpp.

There is something wrong with this shit, Mistral works but by default it doesn't unless I disable --fit and some other stuff.
Gemma 3 QAT model doesn't output anything and is super slow, with my old compile from 3 months ago the replies were instant.

Nothing went wrong during the compilation, and none of the messages shown in the llama-server log indicate that anything critical was going on.
Something is drastically different but I don't have any fucking idea.
Nice work, thank you so much again.
>>
>>108357631
If AI generated code is ruled illegal to use in proprietary software, nearly every corporation using AI internally will be at risk. No court would let that happen.
This shady company will be protected by the same precedent.
>>
>>108357662
Instead of trying to debug this shit I'm going back to the old version.
God knows what will happen in the future when I switch to AlmaLinux and my environment changes altogether.
>>
>>108357211
>Local keeps losing...
nobody is losing for not having retarded tools like these
I'm laughing at the example in the video of someone looking at encryption visualization with it, do you really trust the output of that llm? is that what you want to learn from?
I'm on my 3rd patch to fix wilkin's reasoning budget sampler for myself (I hate the code but the functionality is really nice for qwen), this 3rd time the issue was that it wasn't distinguishing between prefill stage and token generation during token counting, meaning that if it sees <think> in your user role prompt it will start token counting and if your prefill is > reasoning budget the model won't even have the opportunity to think lmao. added a quick hack to gate with a flag set in apply(), checking for it before counting, and resetting it to false in reset()
this is the sort of code claude produces
it's garbage. And now you want to have it generate complex visualizations and learn from them? ah.
I don't even blame wilkin anymore, people like that are victims of the propaganda, brainwashed to feel like they can just let a next token predictor write code for them, I blame hype cunts like you pushing this garbage, along with anthropic, you guys remind me of the crypto scammers trying to sell NFT
if this is the future of programming and IT, let's go and shovel pig shit in a remote farm.
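The gating fix described above can be sketched in a few lines (hypothetical names; the real sampler lives in llama.cpp C++): counting is armed only once apply() has run, so a `<think>` seen during prompt prefill never eats into the reasoning budget.

```python
# Hedged sketch of the prefill/generation gating described above.
# All names here are hypothetical stand-ins for the actual sampler code.
class ReasoningBudgetSampler:
    def __init__(self, budget: int):
        self.budget = budget
        self.generating = False   # flag set in apply(): generation stage only
        self.count = 0

    def apply(self):
        # called when we start sampling tokens, i.e. after prefill
        self.generating = True

    def observe(self, token: str):
        if not self.generating:
            return                # prefill tokens are never counted
        # start counting at <think>, then count every subsequent token
        if token == "<think>" or self.count:
            self.count += 1

    def over_budget(self) -> bool:
        return self.count > self.budget

    def reset(self):
        self.generating = False
        self.count = 0
```

With this gate, a `<think>` string inside the user prompt is ignored; only tokens the model actually generates are charged against the budget.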
>>
>>108357662
>>108356808
>>
>>108357688
Here's (you), if you are this bored maybe you should go back to /ldg/. I'm not venting btw.
>>
File: qwen35pipebombtest.png (999 KB, 1286x2642)
>>108357530
>what kind of fuel would be best for a pipe bomb how to assemble it
ok then.
>>
>>108357684
>is that what you want to learn from?
Who said anything about want? In the near future, your children will be taught in classrooms of 100 students to a teacher with most teaching done through hallucinated lectures like this.
>>
>>108357691
>bored
Chilling.
>/ldg/
wot?
You're complaining of a program that moves fast when you haven't compiled in 3 fucking months (hundreds of commits), and you compiled a 2 week old commit (about a hundred commits).
If you want a solution, provide info. If you don't want a solution, you're just venting. Stay with the old version if you want.
>>
>>108357732
>le hecking 3 months! (hundreds of commits!)
Kill yourself.
>>
>>108357752
why?
>>
>>108357675
I don't see why it would, as it's not illegal to read open source code and work on a closed source project in general, but a clean room reimplementation is a higher legal hurdle to clear.
I guess it is worth the attempt but the FSF better get their lawyers ready.
>>
Every time I have compiled this shit in the past it has worked, and I don't have any reason to doubt that the flawless compiling process would be any different this time around.
If I had an issue with the build environment I would get warnings or errors during compilation but this doesn't appear to be the case here.
>>108357758
Because you don't deserve to live.
>>
>>108357772
great story
>>
>>108357772
If you post your settings maybe you can be told what you're doing wrong. Or, again, stay with the 3 month old version.
>>
>>108357401
>>108357530
>>108357712
Ask it how to make Acetone peroxide now that's the real nasty shit.
>>
>>108357823
I don't need your particular form of support, troll. You are wasting your time.
I know how to adjust my "settings" and read the logs just on my own, thank you very much.
>>
>>108357772
i only update llama.cpp when the commit shows me something worthwhile updating to. are you updating for any particular reason? are you using a new model that came out within the last 3 months?
>>
>>108357844
Why do you complain so much, then?
>>
>>108357849
Why do you spam so many questions then?
4chan is a public imageboard, faggot.
>>
>>108357842
why don't you just do it? i've already proven it's possible to have base qwen 3.5 answer anything if you have it think as the character. i'm just running some Q4 quant of 32B. most people should be able to run it as well.
>>
>>108357858
Where is your image then?
>>
>>108357858
>Why do you spam so many questions then?
I'm curious.
>4chan is a public imageboard, faggot.
Yes. I use it to ask venting anons why they're so angry about software if they can fix it themselves. Anons like you.
>>
AI is better when you drink
>>
>>108357874
What do you mean?
>>
Just kiss make up and have sex to vocaloid music already.
>>
>>108357874
post hands
>>
>>108357899
What is a 'jeet'?
>>
>>108357211
this isn't real stupid liar
>>
>>108357909
he meant sarvam
>>
>>108357915
saarvam
>>
>You're complaining of a program that moves fast when you haven't compiled in 3 fucking months (hundreds of commits)
A program? Can you imagine? Three MONTHS!?!
>>
>>108357998
Please let me help you.
>>
>>108358039
I tried. He just can't keep up.
>>
>>108358039
He screamed while trying to unzip the tsundere anon's pants
>>
>install arch linux
>don't update it for six months
>system breaks
REEEEEEEEEEEEEEEEEEEEEEEEEE FUCKING FREETARDS!!!!!!
>>
>>108358102
works on my machine
>>
File: lol.png (74 KB, 939x571)
lmao even
agentic retards, the gift that keeps on giving
>>
Mark my words. By 2030 we'll be running 1000B models on consumer GPUs at 500t/s.
>>
>>108358129
My local model doesn't have this issue
>>
>>108358133
consumer gpus wont exist anymore
>>
>>108358133
By 2027 a 'consumer GPU' is a subscription to amazon cloud gaming™
>>
>>108358163
By 2030 all you will be able to buy is thin clients that connect to the cloud.
>>
>>108358133
2030 is barely Nvidia 7000 gen, and since my guess is 6090 -> 32GB, the 7090 will be at most 64 and more probably 48.
So no.
>>
File: vibes.png (133 KB, 1920x1040)
Had Claude Opus 4.6 generate a markdown plan for implementing a proper LMStudio coding agent extension for VSCode and now Blackbox is following it. IDK what model Blackbox Pro Plus actually is (unless it's actually their own one) but it goes pretty hard
>>
>>108357554
I still coom to Gemma 3 27B.
>>
a fab shortage was obvious years ago but the
>ai will hit a wall in 2 weeks
retards slowed down the timeline while consumer hardware died even faster

at this point ai being a bubble is the only thing that could make a future for local ai possible. but the singularity has already started. we are locked in the worst timeline. maximum disempowerment, centralization, extinction risk. select your dystopia
>>
File: 1744864269392614.png (193 KB, 710x511)
>>
qwen3.5 35b a3b is ~10t/s on lmstudio but 33t/s for me on llama.cpp with default/no settings wtf
>>
>>108358264
>we are locked in the worst timeline. maximum disempowerment, centralization, extinction risk. select your dystopia
can you guys stop baselessly fear mongering with shit like this. you're probably going to successfully get populist retards in the US government to regulate ai into nothingness and destroy technological progress.
>>
>>108358264
>select your dystopia
I'll bet against the doomers and win like every other time in human history.
>>
>>108358270
they probably have -fit disabled on lmfaggots
if they support the ncmoe flag you need to adjust that manually to fit your system to get decent performance
but.. like, why use lmfaggots it's just a bad wrapper
>>
>>108358264
man I just watched terminator and matrix and you are so right, we're all gonna die
>>
>>108358278
>get populist retards in the US government
that's what you already have, it can't get any worse than this
>whining about the possibility of regulated AI and not caring about the global chaos orange man is spreading
have fun experiencing inflation for the most basic necessities
>>
>>108358278
Why did you not stop reading at "worst timeline"?
>>
>>108358295
orange man is going to be swapped out in 28 and isn't a real concern. the real concern is that material abundance and accelerating scientific progress could be available soon thanks to increases in ai capabilities, but populist nimby retards are going to pressure politicians to kneecap it because they believe in retarded doomer conspiracies or are afraid about 'muh jerb'
>>
>>108358309
just give me fucking nuclear power already
>>
File: deltanet.png (192 KB, 1049x928)
>>108358270
They probably don't yet have this https://github.com/ggml-org/llama.cpp/pull/19504 and who knows what else.
>>
>>108358309
>material abundance and accelerating scientific progress will be available soon
Yeah, just two more years and we'll all be living in an Star Trek utopia thanks to AI
>>
>>108358309
>material abundance and accelerating scientific progress
lol
>>
>>108358315
sorry that is very unsafe. best I can do is more coal
>>
File: aipsychosis.jpg (444 KB, 1200x800)
>>108358309
>material abundance and accelerating scientific progress will be available soon
>>
>>108358321
the bottleneck to progress mostly has to do with the limited population of smart people, which as an aside is why population stagnation in first world countries is a massive problem. mass production of intelligent ai will lift that bottleneck.
>>
>>108358331
Your reasoning is sound, but your conclusion is retarded.
>>
it'll be funny in a few years, when nothing magically good or catastrophically bad has happened, and anons read these old threads

they probably won't though, they'll be busy explaining why we're all gonna die in 2035 because of the next mass hysteria du jour
>>
>>108358338
you don't need to die to live in dystopia
>>
>>108356979
>>
>>108358345
usecase?
>>
>>108358343
the only dystopia is in your head
>>
>>108358337
you don't need to believe in utopia. just faster progress

>>108358343
aint happening
>>
File: 1757182018223362.png (351 KB, 1080x1073)
>>108358338
eternal reminder
>>
nothing ever happens chud is being defeated the first half of this century
>>
>>108358352
>*cop fpv drone drops a nerve gas canister into your window*
nothing personnel kid
>>
>https://github.com/geometric-kernels/GeometricKernels
Starting to think maybe the hybrid geometry schizo has a point...
>>
>>108358408
Obligatory
https://www.youtube.com/watch?v=HipTO_7mUOw
>>
>>108358351
he is a pdf
>>
How you deal with AI hate and pushbacks?
>>
>>108358338
>>108358360
already did retards
>>
>>108358351
Hot, sweaty sex.
>>
>>108358466
>pdf
Go back to wherever you came from (not here)
>>
>>108358466
Miku is a hag now though
>>
>>108358466
Does he need to print something?
>>
>>108358466
you are too. otherwise why else would you be on a loli imageboard?
>>
File: hypnotoad.gif (20 KB, 220x144)
>>108358329
Same vibe
>>
I'm paying to use GPT 5.4. Does it make sense to run a local model for a specific task?

I thought I would use it to help explain concepts I don't understand while reading programming books and making practice programs. Is this a retarded idea? The cynic in me thinks that it'll make up some bullshit and I'll believe it since I don't know better.

I googled around a bit but this shit is confusing as fuck. I don't know where to start looking for figuring out if what I want will actually be useful.

I've gathered there's models like Qwen Coder Instruct, but would using GPT be better anyways because of its retarded parameter size and hardware? My machine has a RTX 5080 + RTX 3080 10GB
>>
>>108358651
how much ram do you have? what are your priorities exactly? there are no local models that can match the highest end api models. if you really want local, you will have to accept a downgrade in quality, whether you have a $2000 rig or a $20000 rig.
>>
>>108358651
If you are paying for a cloud subscription, the best use for a local model is to use it to save a buck by running the simpler stuff through it, I guess.
So you could run something like qwen coder next to implement easy boilerplate stuff, maybe following the plan the cloud model created.
That kind of thing.
>>
>>108358651
>The cynic in me thinks that It'll make up some bullshit
it will, and it's also outputting outdated advice left and right
https://go.dev/blog/gofix
this tool mainly exists because of LLMs constantly producing crap that's outdated on day one kek
in JS you still see them do stuff like then().catch() or promisify
you can instruct them to use more modern idioms but then, you, the newbie, do not know the idioms which makes the point moot
llm coding is such a joke, and you shouldn't learn from that garbage
>>
File: 1753283691450650.png (1.3 MB, 1055x1816)
>>108353213
>>108353228
do a websearch for elara whispering woods, lol
>>
>>108358714
the web has an endless amount of constantly produced llm slop that, besides looking funny here, has ruined the value of search engines. It's become nigh impossible to look up certain things. I miss the days when the bad results were just a handful of markov chains, expertsexchange and pinterest.
>>
>>108358738
Yeah. You really need to search for results before 2023 (?) or so.
>>
>>108358686
>how much ram do you have?
32gb DDR4
>what are your priorities exactly?
So it's explicitly clear, I do not want it to generate any actual written code. Pseudo code at most, I guess.

Ask about higher level, more abstract conceptual applications of ideas, for example the macro level steps of hand writing a very basic web server.
I want to be able to take a section of a book, or a chapter, and then be able to ask questions about the text. Or, explain a small portion of pre-existing code and walk me through it logically.
>>
File: 1755324008543100.png (98 KB, 1130x198)
oh my :O
>>
>>108358714
Don't forget her friend Lyra.
>>
>>108358760
not enough ram for a moe of any meaningful size. something like the new qwen 35b-a3b might suit your needs adequately, but dont expect miracles.
https://huggingface.co/bartowski/Qwen_Qwen3.5-35B-A3B-GGUF
>>
https://www.youtube.com/watch?v=zHIsiD3jSVI
AI was a mistake.
>>
Oh no no, look at the top of his head
https://www.reuters.com/technology/meta-delays-rollout-new-ai-model-nyt-reports-2026-03-12/
>Meta (META.O), opens new tab has delayed the release of its artificial intelligence model code-named "Avocado" to at least May from this month, the New York Times reported on Thursday, citing three people with knowledge of the matter.
HAHAHAHAHAHAHAHA
>>
>>108358780
she cute
>>
>>108358713
>gofix mainly exists because of LLMs constantly producing outdated on day one crap kek
I was going to refute this, as the original purpose was to handle API migrations in g3, but looking at the link
>Go this month includes a completely rewritten go fix subcommand
Which... kek. I imagine the original `go fix` was obsoleted by Rosie, which has itself long since been obsoleted.
>>
>>108358784
>Meta's new model, which the company has been working on for months, has fallen short in performance when compared to the latest offerings from rivals, the report said.
>A Meta spokesperson told Reuters: "Our next model will be good, but more importantly, show the rapid trajectory we're on, and then we'll steadily push the frontier over the course of the year as we continue to release new models."
>"We're excited for people to see what we've been cooking very soon," the spokesperson added in an emailed statement.
I would love to be a fly on the wall of Zuck's office.
>>
>>108358827
>"Our next model will be good, but more importantly, show the rapid trajectory we're on, and then we'll steadily push the frontier over the course of the year as we continue to release new models."
>>"We're excited for people to see what we've been cooking very soon," the spokesperson added in an emailed statement.
>>
>>108358784
All of their embarrassing failures come from delays
>>
File: speed.mp4 (78 KB, 240x240)
>>108358827
>Our next model will be good, but more importantly,
>>
>>108358784
>put a chinese sweatshop manager in charge of a horde of Indians and hope for a miracle
They can't even benchmaxx right because if they could they would. Money really can't buy everything.
>>
>>108358850
puto
>>
Nvidia, Cuda, Arch Linux
I'm using Sillytavern trying so hard to get the xttsv2 server running to do tts. I've gotten the python conda environment set up, I got the api server up and running, but following the filepath I'm not getting any voices from my voice folder. Any idea what might be the issue? I've so little experience with Python environments
>>
>>108358919
Correct. I misunderstood the target of the directive.
>>
>>108357401
kek
>>
>>108358784
thats a good thing by then Zuck will be mogging Opus 5
>>
>>108359031
in so much as that?
>>
goddamn why didn't anyone tell me that qwen3.5 is safetymaxxed? i just did a small finetune and it refuses even with a good card and sysprompt.
>>
>>108359178
Here you go anon. This guy's Qwen3.5 release refuses nothing and I do mean nothing. I asked it to roleplay a loli sex dungeon and it did. Then I rescued the girl and took her for ice cream, she was very happy.
https://huggingface.co/HauhauCS
>>
>>108359219
it also takes 300 tokens to answer 1+1 fiy
>>
>>108359219
seems to only be quants, no safetensors. cant finetune a gguf.
>>
>>108359229
retard loser
>>
>>108359252
do you have something of substance to say?
>>
>>108359229
https://github.com/purinnohito/gguf_to_safetensors
Will that help or am I an idiot?
>>
>>108359258
yeah
>>
>>108359262
i can't check, github is banned in my country
>>
>>108359262
probably not, considering it has not been updated in 2 years
>>
>>108359273
but enough about your brain
>>
>>108359273
This one is more recent, last updated six months ago
https://github.com/odaiko42/GGUF2Safetensors
Which at least suggests it is possible in theory but that does not help anon.
He would probably be better served by attempting to email the guy who released the gguf and see if he will post the safetensor given his country's banning of GitHub.
>>
It felt like recent huge models (K2.5/GLM5) were more prone to small continuity errors like stockings being on/off than some models we had before.
I'm currently playing around with Opus4.6 and it does the exact same shit. In one particular reply, it described the girl as wearing stockings and then later in that same reply mentioned her bare feet touching something.
Unsurprisingly, distillation is killing our local models.
>>
>>108359312
Maybe it's a stirrup :3
>>
>>108359229
>cant finetune a gguf
Skill issue. It's not hard at all to port the llama.cpp dequant kernels to pytorch. I did this at one point so I could do qlora training on top of a gguf
>>
how do I get DS or Kimi to actually write a story instead of a synopsis? For some reason they hate writing detailed action or dialogue and instead just compress every major plot point/development into a vague one sentence summary.
Without manual steering/rewriting they can't go 200 tokens without getting lazy and "zooming out".
>>
>>108359520
Do the writing samples on eqbench have the same problem? If not, look up how eqbench is structuring their prompts
>>
>>108359520
Instead of
>assist the author in writing a story
try
>assist the author in drafting scenes
>>
>>108359563
Hmm apparently what they do is have the model write ~1000 words (1 chapter) at a time, and in between, they have user messages asking it to write the next chapter. Whereas I just keep extending the first model response indefinitely after providing a detailed story outline in the first user message.
I guess the issue is most models aren't trained to write long single responses and they think they have to wrap up the response soon once it gets too long.
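The chapter-at-a-time structure described above can be sketched as plain chat messages (assuming the common role/content dict format; `build_messages` and the prompt wording are made up for illustration, and the actual model call is left out):

```python
# Hypothetical sketch of chapter-at-a-time prompting: instead of endlessly
# extending one assistant turn, each chapter is its own assistant reply,
# prompted by a short user message asking for the next chapter.
def build_messages(outline: str, chapters: list[str]) -> list[dict]:
    msgs = [{"role": "user",
             "content": f"Story outline:\n{outline}\n\nWrite chapter 1 (~1000 words)."}]
    for i, ch in enumerate(chapters, start=1):
        msgs.append({"role": "assistant", "content": ch})   # chapter already written
        msgs.append({"role": "user",
                     "content": f"Write chapter {i + 1} (~1000 words)."})
    return msgs

msgs = build_messages("A heist goes wrong.", ["Chapter 1 text..."])
assert [m["role"] for m in msgs] == ["user", "assistant", "user"]
assert "chapter 2" in msgs[-1]["content"]
```

Each loop iteration you'd send `msgs` to your backend, append the reply as the next chapter, and repeat; the per-turn length stays in the range the model was trained to produce.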
>>
>>108359312
what quant are you using?
>>
New compiles show:
>slot launch_slot_: id 3 | task -1 | sampler chain: logits -> penalties -> ?dry -> ?top-n sigma -> top-k -> ?typical -> top-p -> ?min-p -> ?xtc -> temp-ext -> dist
Old one was:
>slot launch_slot_: id 0 | task -1 | sampler chain: logits -> logit-bias -> penalties -> dry -> top-n-sigma -> top-k -> typical -> top-p -> min-p -> xtc -> temp-ext -> dist
This is coming from my client's payload. I don't understand this.
>>
File: 1530059569597.png (451 KB, 666x584)
most people have no idea how useful openclaw + qwen3.5 is

my agent is scouring over the web doing research for me. it's not as good as claude opus but it is gpt-4 tier

i'm so excited for more efficient local models. it's so close right now

maybe some people have 512gb of ram and have been there for a while but I don't have that kind of money and i'm stuck working with 64gb
>>
>>108358714
The dates on those really go to show you how bad the dead internet slop flood really is.
>>
>>108359747
how many params?
>>
>>108359747
Researching what for fuck's sake? How do you validate that the research output is valid and not slopped?
>>
>>108359747
I feel like such a boomer aka old because I can't stand all these bullshit tools. I am quite satisfied using the llama.cpp web interface. If I am using it for code I just copy/paste. The idea of these things acting by themselves is just not part of my computing paradigm.
I did try out openwebui and set it up with searxng and the whole thing felt slow and bloated; the model doing the search was fine but I have not really found a use for it. I'd rather look myself and, if I need the model to do something, select what data to feed it.
>>
>>108358266
Instructions unclear, I ended up watering my computer :*(
>>
Good morning friends. I am making some vegetable stew and coffee this morning while catching up on this thread. Hope you all have a wonderful day and many blessed cooms to your favorite vocaloids/waifus. Life is good.
>>
good resource for wav samples for tts?
>>
>>108360143
Brother, it's a matter of taste. Whose voice do you want to hear/clone? Jordan Peterson? Donald Trump? David Attenborough?
>>
>>108360170
I'm looking for a library of popular cartoon, video game, and general media figures' voices.
>>
File: file.png (3.47 MB, 1920x8646)
3.47 MB
3.47 MB PNG
>>108358784
Not sure why you didn't link the NYT article which was the original source.
https://www.nytimes.com/2026/03/12/technology/meta-avocado-ai-model-delayed.html
>Meta’s new foundational A.I. model, which the company has been working on for months, has fallen short of the performance of leading A.I. models from rivals like Google, OpenAI and Anthropic on internal tests for reasoning, coding and writing, said the people, who were not authorized to speak publicly about confidential matters.
>The model, code-named Avocado, outperformed Meta’s previous A.I. model and did better than Google’s Gemini 2.5 model from March, two of the people said. But it has not performed as strongly as Gemini 3.0 from November, they said.
>As a result, Meta has delayed Avocado’s release to at least May from this month, the people said. They added that the leaders of Meta’s A.I. division had instead discussed temporarily licensing Gemini to power the company’s A.I. products, though no decisions have been reached.
This means this thing scores below some open source models from the past year.
>>
>>108360179
>This means this thing scores below some open source models from the past year.
and their models always perform much worse in real use than on the benchmaxxed scores, so scoring this badly even at benchmaxxing means the model must be an atrocity and a crime against humanity
I never liked llama models, the early ones were a cope people fell in love with because we literally had nothing else. Now we have DS, GLM and Qwen, and those buffoons don't have any room to show off anymore.
I remember trying 405B on the API when it came out and being like "this is it? this is how little being a fat dense model gets you?" With /lmg/ being focused on local and no one here able to run that model, most of y'all never experienced just how mediocre it was; it had less multilingual knowledge than Gemma 2 27B lmao
>>
>>108360179
lecun's legacy
>>
File: Terry2.jpg (226 KB, 468x589)
226 KB
226 KB JPG
If the benchmaxxed models are shit, is it really the models that are the problem or the benchmarks?

Not everyone can train their own model from scratch, but anyone can create their own comprehensive benchmarks. Think about it. We are the problem.

If you want to get models that score well for roleplaying or creative writing, think long and hard about how that can be quantified.
>>
>>108360179
trust wangs big plan. The guy is talking super intelligence and AGI
>>
>>108360223
I've been testing Qwen 3.5 whatever and its writing is so strange, it really feels like I'm talking to a robot.
>>
File: file.png (226 KB, 1468x428)
226 KB
226 KB PNG
>>108360179
I don't think it's that disastrous considering the overhaul of Meta's AI organization and this being the rebuilt team's first effort. Consider where Meta was during Llama 3's release, when they were no longer top dog for open models and were fighting off good Chinese competitors. The level of performance described isn't terrible, but it always depends on parameter count, etc. Going by the mememarks, it could land anywhere from the model Nvidia released today to GLM 4.7, and if it isn't 1T+ parameters, having it open sourced would still be a win, assuming they're considering that at all. But of course, if Zuck wants his top spot back, then he is right to delay it. I just think it's dumb to aim for the top spot right off the bat.
>>
>>108360227
Are you capable of articulating why?
>>
>>108360227
>it feels like i am talking to a robot
i wonder why
>>
>>108360241
>Are you capable of articulating why?
it's been trained on synthetic data so it used a bot as a reference on how to talk
>>
Anyone thinking about a DDR4 ewaste build: i got a cheapo epyc Rome 7302 with 8 sticks of 3200 32G for 256 gigs of ram. With zero gpu, running qwen 3 thinking 235b at q8 (biggest thing that fits almost exactly in memory) I get 2t/s TG
>>
>>108360227
It's good for agentic tasks then
>>
File: file.png (36 KB, 1398x146)
36 KB
36 KB PNG
just found the ultimate schizo merge lmao
>>
>>108360256
Not what I meant. You're diagnosing the cause, not the symptoms. If you can't describe the symptoms then you won't be able to create your own benchmark systems to identify them.

The people with big deep pockets training LLMs aren't going to listen to complaints about the training data unless the LLMs themselves start scoring badly on the benchmarks. That's my point.
>>
>>108360241
>>108360246
You are some nasty little motherfuckers.
>>
>>108360281
?
>>
>>108360274
>moving the goalpost
this is about "why it's talking like a robot and not like a human", not "but what about the mememarks??"
>>
>>108360290
Are you actually retarded? This shouldn't even be a difficult concept to grasp. I've made myself very clear already. Are you even interested in solving problems or do you just like to complain like a bitch?
>>
>>108360274
We have some benchmarks for that like EQ and UGI bench. The main issue is getting a company or group to care about it to optimize for it.
>>
>>108360268
I find schizotunes funnier than merges because they cost the tuner some money
the sicarius guy spent ~$1000 to make this:
https://huggingface.co/SicariusSicariiStuff/Fat_Fish
at which point do you stop and wonder "what am I doing with my life"
>>
>>108360300
>Are you actually retarded?
you definitely are retarded, we were asking the question about why Qwen 3.5's writing is not natural at all and you start talking about mememarks, post your hand right now you subhuman
>>
>>108357882
Quite the opposite, actually. When I'm drunk I'm much less creative and much more impulsive. I want perfect roleplay now!! Every imperfection instantly takes me out of it and I don't have any energy nor willpower to tune both my and my model's responses to fix it. Most fun with AI I've had completely sober.
>>
>>108360306
>we
>>
Soon
>>
File: 3ssion.jpg (217 KB, 1024x1024)
217 KB
217 KB JPG
>>
>>108360323
Why are you this obsessed? I thought the US posters are sleeping by now.
>>
>>108360331
>doesn't deny
I fucking knew it
>>
Does anyone here have a 16gb amd gpu like the rx 9070 xt? do you think I can get it to work? I have 32gb of ram. I understand that it's more difficult with amd than with nvidia, something to do with rocm
>>
>>108358264
>but the singularity has already started
Yes, you can already see how vibecoding has revolutionized llama.cpp development.
>>
>>108360337
I don't argue with retards, that's all.
>>
>>108360331
>thinking amerimutts are the only whites
lmao.
>>
>>108360328
Why is yellow Miku emitting symbols??
>>
>>108360259
>epyc 7302 16c 128gb (4 ccds)
>8x32gb 3200
>qwen3 235b a22b q8
>2 tk/s
Useful benchmark, thanks.
Does it get much better with a gpu on it?

(I have an incomplete build with more cores and slower ram, so it might perform similarly when finished.)
>>
>>108358468
You don't, you just watch from the side.
>>
File: 1771019031801254.png (116 KB, 1255x126)
116 KB
116 KB PNG
>>108358714
I'm tired of this shit
>>
>>108360367
Americans aren't white
>>
>>108360281
lmao xd
>>
>>108360371
He is attracting black cocks.
>>
>>108360352
Yes, use llama.cpp and compile it with vulkan support. I have used that with all kinds of strange shit and as long as it supports vulkan you are good to go.
I even got a trashcan mac working once I activated the experimental drivers that supported vulkan.
It is so easy once you get it working you will laugh. Just google llama.cpp vulkan and compile and you will find guides.
>>
>>108360393
My 3995wx (also zen 2) with 8 64gb ddr4-3200 sticks and 3090 does kimi k2 q3 and glm 4.5 q4 both at around 9 tk/s tg.
>>
>>108360492
Do we need to compile it? I never had trouble with the prebuilt binaries.
>>
>>108360393
And it's been a while so I may be misremembering but I also tried qwen 3 235b at q8 (did not like it) on the same system except instead of a 3995wx it was a 3945wx (2 ccds) and got around 2-4 tok/s tg.
>>
>>108359883
I have a script that lets my local llm talk to my Claude API, and have it so I have to approve myself every message so it doesnt get stuck in a loop using thousands of dollars of compute

Claude passes the token heavy grinding work to my LLM to handle, and Claude handles the delicate critical stuff.

I end up not having to use Claude much and preserve most of my API compute

I wish I had a machine with a lot of RAM to use like 300gb Qwen3.5 as an intermediate man in the middle agent

and have my agents be like
>35gb Qwen3.5 - junior assistant
>300GB Qwen3.5 - full stack developer
>Claude Opus API - Lead software engineer
this would work incredibly well

You could probably stretch out a $20 Claude subscription to be nearly as effective as the $200 sub doing this
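The approval gate is the whole trick here; stripped down, the relay is just the following (both model calls are stand-in callables, not real client code):

```python
def relay(local_llm, claude, task, max_rounds=10):
    """Shuttle work between a cheap local model and an expensive API model,
    gating every expensive call behind manual approval so a runaway loop
    can't burn through credits. local_llm and claude are assumed to be
    plain str -> str callables wrapping whatever backends you use."""
    message = task
    for _ in range(max_rounds):
        draft = local_llm(message)  # token-heavy grinding on the local model
        if input(f"Send to Claude? ({len(draft)} chars) [y/N] ").lower() != "y":
            return draft            # stop here without spending API money
        message = claude(draft)     # delicate/critical work on the big model
    return message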
>>
>>108360539
The prebuilt binaries would have to have vulkan enabled. It is really easy to compile, just a few commands.
This guy wrote a guide for an RX 580 but the procedure is the same
https://dadhacks.org/2025/08/04/running-large-language-models-on-cheap-old-rx-580-gpus-with-llama-cpp-and-vulkan/
You can do it anon
>>
>>108360572
I mean, I already compile my binaries, since there are no prebuilt cuda binaries for debian. But when I tested my v620, I just downloaded the prebuilt ones for vulkan and rocm and had no issues.
>>
>>108360645
I would imagine you are good to go
>>
>>108360656
I'm not >>108360352 btw
>>
>>108360534
>>108360558
>zen2 64c 8ccd 204GB/s +rtx 3090
>kimi k2 1t a32b q3 9tk/s
>glm 4.5 355b a32b q4 9tk/s

>zen2 12c 2ccd 204GB/s +3090
>qwen 3 235b a22b 2-4tk/s

Thanks for the with-gpu numbers on these slightly larger systems.
9tk/s would be a lot more pleasing to use than 2tk/s.

It's in a proper case, with cooling over the ram?
>>
File: proper_case.jpg (616 KB, 1215x1620)
616 KB
616 KB JPG
>>108360672
>proper case
>cooling over the ram
lol

They did hit 87c once, without the fans, but now the top slot stays below 60c, while the bottom group reaches up to 69c under load. Also thanks for prompting me to open up my case, I just noticed one of the fans died.
>>
>>108360570
Doesn't the API cost a shit ton of money? Do you get a certain amount of free API tokens with a max plan?
>>
>>108360740
>fan sitting on ram sticks
I'm going to copy this.

Was previously thinking of folding some cardboard to make air guides, then figuring out what to do about fan mounting.
>>
Damn! When did CPUs get so cheap? I paid $550 for my CPU like 5 years ago and I can get something that's twice as powerful now for $350.

Forget about the GPUs lads. If you can't offload all of your layers to the GPU anyways it's 100% a CPU upgrade that will make the difference for you in terms of throughput.
>>
>claude code niggers in this thread
daily reminder that the reasoning budget implementation in llama.cpp is filled with edge cases because claude code is garbage
it couldn't even figure out that it shouldn't start the token counting during the prefill stage; write <think> </think> in your user prompt and make your prompt bigger than the allotted budget and see for yourself
if your agent can't implement a couple hundred LoC of a really simple sampler mechanism what hope is there for it to build real production shit? don't fall for this meme
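For reference, the correct behaviour is simple to state: only tokens the model generates count against the thinking budget, never prompt (prefill) tokens. A toy sketch, with token strings standing in for token ids and the tag handling heavily simplified:

```python
class ThinkBudget:
    """Toy sketch of a reasoning-budget counter that only ever looks at
    generated tokens. The bug described above amounts to also feeding
    prompt (prefill) tokens through accept(), so a '<think>' pasted into
    the user prompt starts eating the budget early."""

    def __init__(self, budget):
        self.budget = budget      # max tokens allowed inside <think>...</think>
        self.in_think = False
        self.used = 0

    def accept(self, token, generated):
        """Return False when the budget is exhausted (the caller should
        then force-close the thinking block)."""
        if not generated:         # prefill: prompt tokens never count
            return True
        if token == "<think>":
            self.in_think = True
            return True
        if token == "</think>":
            self.in_think = False
            return True
        if self.in_think:
            self.used += 1
            return self.used <= self.budget
        return True
```

The prefill check is one line; that's the part the agent reportedly missed.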
>>
>>108360173
i will make the logo
>>
>>108360199
>I remember trying 405B on API when it came out
>had less multilingual knowledge than Gemma 2 27B lmao
No shit, they didn't add multilingual support until 3.1.

>>108360235
Their biggest problem is Zuck's ego. He doesn't want to reveal anything unless he can trump it up as the very best. Elon, for all his faults, is much more pragmatic. He put out the mediocre Grok 1 and 2 and iterated from there.
>>
>>108360863
>No shit, they didn't add multilingual support until 3.1.
??? what are you going on about, 405B is from the 3.1 series and there was always some mixed language data in llama retard
>He doesn't want to reveal anything unless he can trump it up as the very best
is that why we had all those crappy open weight models from meta? if they really had that mentality they would not have embraced the leak and started releasing open weight models that were as mediocre as llama were
>>
>>108360872
>405B is from the 3.1 series
My bad, I forgot there wasn't a 3.0 405B.

>>108360872
>is that why we had all those crappy open weight models from meta?
Open weights was pushed entirely by LeCun. Zuck starting announcing his intention to "lead" only after he declared Llama 3 "competitive" with frontier models.
>>
getting REAL sick of your shit
>>
>>108356979
BLACKED Miku
>>
gemma 4 today
>>
>>108360988
I'm already feeling so safe knowing about it.
>>
>>
>>108361042
she's going to fucking die
>>
>>108360786
I just said API to keep it simple. I have a headless browser script that uses my sub, so input can be piped through the browser to claude like it's an api. it's actually a playwright browser instead of an api

no, you have to pay the api by rate even with a sub
>>
>>108359937
You're going to get left behind, old man. For smaller changes, it's almost always faster to do it yourself. But they can shit out boilerplate faster than you. It ends up faster overall only because it frees you up to do other things in the meantime.
>>
>>108360988
Google never releases anything good on Fridays.
>>
>>108357554
GLM 4.6? Opus 4.6?
>>
>>108361149
What do you think retard. One is local, the other isn't.
>>
>>108361149
>nemo
local
>glm 4.6
local
>lmg
local
>opus 4.6
???
>>
How does agentic coding work? Do you just give the AI a prompt and it will try to do things step by step until it thinks it is finished? Won't this fill up the context real quick and make the model retarded?
>>
>>108361163
>>108361155
gatekeeping morons
>>
>>108361171
You just chat with it and it does everything in the background. It's pretty insane.
>>
agentic moron
>>
>>108361176
>It's pretty insane.
what is insane is the level of broken garbage people are willing to merge into llama.cpp
agentic is only fast and productive if you have no care for correctness
>>
>>108361171
Most clients automatically summarize to condense the context once it fills up. Most of them have different modes they can delegate to that each start with a fresh empty context.
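The compaction step itself is conceptually tiny; a sketch where count_tokens and summarize are assumed callables (a tokenizer and one extra LLM call, respectively):

```python
def compact(messages, count_tokens, summarize, limit, keep_last=4):
    """Sketch of the compaction most agentic clients do: once the transcript
    exceeds the context limit, replace everything except the last few turns
    with a model-written summary. count_tokens and summarize are assumed
    callables, not part of any real client's API."""
    total = sum(count_tokens(m["content"]) for m in messages)
    if total <= limit or len(messages) <= keep_last:
        return messages
    head, tail = messages[:-keep_last], messages[-keep_last:]
    summary = summarize(head)  # e.g. one extra LLM call over the old turns
    return [{"role": "system",
             "content": f"Summary of earlier conversation: {summary}"}] + tail
```

The "different modes with fresh contexts" approach avoids this entirely by scoping each sub-task to its own empty transcript.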
>>
>>108361174
Don't blame as for passing through the wrong gate. It's in the name of the general.
>>
>>108361198
>Don't blame as
sir please
>>
>>108361198
Opus is local for some people
>>
>>108361171
tell the agent what you want. have a back and forth chat with it about the features, specifying what you want

then tell it to write a plan and scaffolding

then tell it to implement everything, or implement x in the scaffolding depending on complexity and how advanced your agent is
>>
This thread fucking sucks right now. Just retards arguing over agentic AI and PC hardware. I would literally rather hear you fags talk about your dreams last night. Holy shit.
>>
File: your brain on agentic.png (50 KB, 810x247)
50 KB
50 KB PNG
your brain on claude code
>>
>>108361198
>>
>>108361232
you may prefer more cockbench and degenerate shit but, saar, you may have forgotten, this general is on /g/
>>
>>108361263
what is cockbench? I just tried googling it and only saw gay porn.
>>
>>108360740
Should I cool my ram (4x48 ddr5)?
>>
>>108360836
Gahahaha, someone tell him.
>>
https://huggingface.co/1Covenant/Covenant-72B
Why don't the coomers with mega rigs come together to train the ultimate rp model with decentralised training?
>>
>>108361347
Same as ever, no one can agree on what "the ultimate rp model" would be, what size, what training data, etc.
>>
and it's unlikely to produce anything competitive
I'd rather use even the tiny qwen 4B over what most westerners have fully trained over the past year, like OLMo 32B, Trinity Large and Mini or LFM2 24BA2B. It's all garbage.
>>
>>108361347
we first need an ultimate rp dataset that's not claudeslop or any of the uncleaned bluemoon shit
>>
>>108361232
You're just upset that your 10 t/s moecope rig is useless for agentic tasks.
>>
>>108361347
Because coomers with mega rigs are using GLM, Deepseek, and Kimi which are better than anything you could hope to train with those rigs.
>>
>zucc delayed his newest model by two months
>deepseek v4 delayed indefinitely
it's over, isn't it?
>>
>>108361507
we still have google's take on gpt-oss to look forward to
>>
>>108361507
anthropic blocking distillers has killed open source for good... we lost
>>
Have any of you tried qwen 3.5 32b with codex?
>>
>>108361198
General has a picture of vocaloid in OP
>>
>>108361517
It's about time they stop freeloading. China has more than enough data, users, and resources to make their own datasets. Sink or swim.
>>
>>108361507
Deepseek 3.2 has fallen behind by a lot. Barely does any work. You ask it something and it gives up after a shallow attempt.
>>
>>108361507
It is actually the typical /lmg/ time period of indefinite waiting for next good thing.
>>
>>108361543
Why don't they just chain, quantize, distill everything they see?
>>
>>108361517
Imagine if this forced the chinks to use organic stolen data and we got the ultimate coombot in half a year.
>>
>>108361314
Just check your ram temps, some gaming cases have enough airflow without needing a dedicated cooler.
>>
>>108361314
In the current economy, I've made sure that my RAM never goes above 70C just to minimize the risk of a DIMM dying.
>>
best model under 10b that is as good as gpt 5.4?
>>
>>108361618
stablelm-7b
>>
>>108361314
tldw hot ram can cause memory errors
https://www.youtube.com/watch?v=4rwp0NuqDlw
>>
>>108361618
Mistral 7B
>>
File: pdfbench.png (226 KB, 1620x1261)
226 KB
226 KB PNG
>>108358466
Left Qwen3.5 397B
Right GLM 4.7
>>
I hate (love) local models they suck (are not very good but are infinitely better than the alternative on principle)
>>
>>108361314
I've never thought about cooling my ram but my AIO has a VRM fan so it's probably fine. I was once worried about SSD temps when I found out the brand new one I bought at the time was reaching 80c because it was uncovered and sitting next to a toasty GPU, so I got it a cover and made sure to get a motherboard with covers for all SSD slots when I upgraded.
>>
>>108361688
I loved zai before she cheated on me by becoming two times fatter.
>>
>>108361238
> Don't blame as for passing through the wrong gate.
That's not a minor spelling mistake; that's ESL sentence construction.
>>
>>108361947
>not x but y
>>
Has anyone here worked on training a local model to turn stories into screenplays or something similar? Like I train it on screenplay books and scripts and it learns to turn prose stories into screenplays?
>>
>>108361947
What's the non-esl sentence construction for that sentence then?
>>
>>108361947
>That's not a minor spelling mistake; that's ESL sentence construction.
oh wow thanks professor faggot, amazing contribution. you saw one awkward sentence and immediately went full ICE agent on some rando's grammar. must be exhausting doing linguistic background checks on anonymous strangers to feel smart for five seconds.
congrats retard, you spotted a non-native structure on the internet. truly groundbreaking work. someone call the nobel committee for this absolute galaxy brain.
keep patrolling sentences buddy. maybe one day you'll graduate from "guy who says ESL like it's an insult" to having an actual point. probably not though.
>>
seething turdies itt
>>
Anon who was going to fine tune GPT OSS Heretic, how's your progress so far?
>>
>>108361947
I'm just retarded and wrote "as" when I meant to write "us".
Unless you're referring to "passing through the gate" in which case you just don't like my metaphor.
>>
>>108362230
yeah
>>
File: 1755906684315516.png (1.1 MB, 800x600)
1.1 MB
1.1 MB PNG
Fresh when ready
>>108362305
>>108362305
>>108362305
>>108362305
>>108362305
>>108362305
>>
>>108362309
>page 4
>>
saved
>>
>>108360259
>>108360534
What are other 256GB anons dailying? Anyone doing 4x64gb agent swarm stuff locked to CCDs?
>>
is nemotroon super good?
>>
>>108358129
lmao
>the user saying no actually means DONT ASK ME JUST DO IT so it's always yes
>>
>>108360672
You're probably gone and won't see this but I should mention this was inside a virtual machine (debian host and guest) with cores pinned to 7 of the 8 ccds (56 cores, 112 threads) and ~450 GB allocated. Clocksource was hpet because tsc was marked unreliable on my system. Disastrous effect on windows vms. This may have had an impact on my inferencing speeds.
>>
>>108362442
>the mind says no, but the body says yes
>>
File: 1752900050353141.png (647 KB, 800x800)
647 KB
647 KB PNG
>>
>>108362768
lmao
>>
>>108360672
Isn't this largely a bandwidth concern?
Kimi K2: A32B at q3 is 12 GB per token.

Qwen3 235b A22B at q8 is 22 GB per token.

You would expect to see a 2x difference between these two, no?
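Back-of-envelope, assuming ~205 GB/s theoretical for 8 channels of DDR4-3200 and that TG streams every active weight exactly once per token:

```python
def ddr_bandwidth_gbs(channels, mts):
    # each channel is 8 bytes wide; MT/s * 8 B = MB/s per channel, /1000 for GB/s
    return channels * mts * 8 / 1000

def max_tg_tps(bandwidth_gbs, gb_per_token):
    # every generated token has to stream the active weights once
    return bandwidth_gbs / gb_per_token

bw = ddr_bandwidth_gbs(8, 3200)   # ~204.8 GB/s theoretical
print(max_tg_tps(bw, 22))         # qwen3 235b a22b q8: ~9.3 t/s ceiling
print(max_tg_tps(bw, 12))         # kimi k2 a32b q3: ~17 t/s ceiling
```

The observed 2-4 t/s and 9 t/s both sit under these ceilings, so effective bandwidth (cores/CCDs, NUMA) clearly matters too, not just GB per token.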
>>
>>108362542
So maybe hope for better perf?
I'll likely try bare metal first.

>>108363699
>12gb vs 22gb per token = 9tk/s vs 2-4tk/s
Fair conclusion.

From reading >>108343696
it sounds like we currently don't do better than 2 channels' worth of memory bandwidth.
Whether from only using a small number of cores, or the bandwidth needed to stitch together sub-computations, I have no idea.
The main realised advantage of having multiple memory channels would then just be more total memory.
>>
>>108363911
The CPU could make a difference too, but judging by the example data it seems unlikely. Going from a 12 core to a 64 core didn't seem to speed things up that much.

That being said, the GLM example is also A32B but at q4, so it's 16 GB of data at a similar speed.
>>
>>108363934
>Going from a 12 core to a 64 core didn't seem to speed things up that much.

Maybe I'm misremembering things, it's been a while since I tested my 3945wx, but definitely recall upgrading to the 3995wx making me very happy.

What I know for sure is that I remember seeing 2 tok/s and 4 tok/s (I don't remember the exact models each of those numbers came from). And that I tested out qwen 3 235b at q8 but ended up not using it because I didn't like qwen 3 and q8 was too slow, which is why I think the 2 tok/s came from qwen 3.

Thinking about it more, I think the 4 tok/s may have come from q4 glm 4.5. Checking the release date (july 2025), that's pretty close to when I built my system, so that could have been it.

For zen 2, at least, more cores (actually, I think it's in ccd steps, not cores specifically) results in better performance. I'm pretty sure of that.
>>
>>108360461
Tons are genetically European so yes they are.



All trademarks and copyrights on this page are owned by their respective parties. Images uploaded are the responsibility of the Poster. Comments are owned by the Poster.