/g/ - Technology


Thread archived.
You cannot reply anymore.


File: 1770759702309440.png (409 KB, 1080x867)
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>108339019


►News
>(03/07) Qwen3.5-27B Claude-4.6 Opus reasoning distill GGUF published: https://hf.co/Jackrong/Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled-GGUF
>(03/06) Olmo Hybrid WebGPU browser-local demo posted: https://hf.co/spaces/webml-community/Olmo-Hybrid-WebGPU
>(03/05) OLMo-Hybrid-Instruct-DPO-7B posted on Hugging Face: https://hf.co/allenai/Olmo-Hybrid-Instruct-DPO-7B
>(03/05) Qwen3.5-9B OptiQ 4-bit for Apple Silicon posted: https://hf.co/mlx-community/Qwen3.5-9B-OptiQ-4bit

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling
Token Speed Visualizer: https://shir-man.com/tokens-per-second

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
>>
Where do I download the AI?
>>
>>108341862
see https://status.openai.com/
>>
>>108341869
What are these benchmarks? Because 4B being just "20% worse" (whatever that means), is impressive. Too impressive to be trustworthy.
>>
>>108341891
i think it just means 20% worse than base
>>
>>108341891
It means that it's still really, really good at knowing who the surgeon is
>>
>>108341953
im tired of your bit, consider this a warning
>>
>>108341980
Proof??
>>
>>108341965
Sorry, would you prefer there being 3 r's? Or perhaps your penis is too soft, resting against your thigh? Maybe you'd rather I give you an svg image of a duck on a bicycle. If not, I'm afraid this goes against the policy and I must refuse.assistant
>>
my state tracker is finally working and then i found out the tracker extension exists
>>
>>108342015
qrd
>>
thought the proof/qrd bot was only on ldg, nice to see it pesters you guys here too
>>
>>108342030
>>108342018
>>
>>108341862
>Is anyone else experiencing random slowdowns with llama.cpp?
>Sometimes my t/s will drop by half and it stays like that until I restart the server.
>I can't figure out what causes it.
did you start downloads that saturate your bandwidth? I can't for the life of me figure out what is wrong with my system that could cause something this retarded, but I lose about 1/3 of my regular t/s if I have high-speed downloads in the background. llama.cpp is running locally on the same machine, it's not remote.
Another network-linked phenomenon is how unresponsive comfyui's web interface can become (even though it's running locally on my computer). Same thing again, running locally.
>>
>>108342048
fyi llama.cpp has an experimental vram bandwidth mode that prevents downloads from using vram
>>
File: teto principle.png (1.04 MB, 1024x1024)
►Recent Highlights from the Previous Thread: >>108339019

--Paper (old): Utility Engineering: Analyzing and Controlling Emergent Value Systems in AIs:
>108341106 >108341116 >108341151 >108341174 >108341187 >108341338
--Zen 5 core count vs RAM and RTX 5080 vs 5070 Ti for inference performance:
>108340610 >108340639 >108340663 >108340700 >108340716 >108340785 >108340855 >108340871
--Google announces Gemini Embedding 2 release:
>108340571
--Gemini Embedding 2: our first natively multimodal embedding model:
>108339121 >108339167 >108339153
--Critique of UTF-8 handling in tokenization budget mechanism:
>108340859 >108340877
--Prime Intellect RL training platform now available for agentic model development:
>108339192
--Testing local model honesty with thought process discrepancies:
>108340354 >108340366 >108340382
--Feature Request: DSA lightning indexer support:
>108341265
--Teto and Miku (free space):
>108339350 >108339419 >108339519 >108341150

►Recent Highlight Posts from the Previous Thread: >>108339182

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>
>>108342059
I am not downloading anything from llama.cpp, what I mean is that having downloads in the background (steam, curl, from firefox) will make my t/s degenerate
>>
>>108342070
why not use llama.cpp to download?
>>
>>108342070
Why are you replying to trolls?
>>
>>108342089
What do you mean?
>>
>>108342097
>downloads using vram
>>
>>108341869
The 4 billion model looks like it's at the sweet spot.
Does it mean you should have a 4 billion model? It gives you an 80% winrate which is good enough?
>>
>>108342069
Missing the thousand-post reply chain about the retard who killed himself. What the hell was wrong with that thread, why did anyone engage with it?

Also, if you are a real mikuanon, why aren't you the faster baker? The schizobaker can barely spell
>>
>>108342106
what if you do 2 passes, is it close enough to 100% like 99.9999998?
>>
>>108342070
>downloads in the background (steam, curl, from firefox)
nta. If part (or most) of the model is on RAM, then pretty much anything would make it go slower. Especially if you saturate cores with other work. All threads that ran on the free cores have to wait for the busy ones.
Show full specs, what model you're running, with what settings, how much free memory you have, are you swapping, what color leds do you have?
If you're gonna have other shit running, set -t a few threads lower than your core count. Start with that and experiment more.
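For example, on a 16-core machine something like this leaves headroom for the downloads (flag from memory, check llama-server --help):
llama-server -m your-model.gguf -t 12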
>>
>>108342108
>talking to a bot
>>
>>108342119
I talk to LLMs too. Do you?
>>
>>108342102
Downloads use RAM by default, but you can use VRAM
>>
>>108342120
api user
>>
they are gonna ban local
>>108342118
>>
>>108342108
>Also, if you are a real mikuanon, why aren't you the faster baker?
I'm too old to be playing these games. I'll continue to be baker of last resort at page 9 or 10, but I don't think racing to make early threads at bump limit just so we have schizo OPs slightly less often would be helpful.
>>
>>108342114
No I think there's diminishing returns.

>>108341891
Yeah I can't believe the 0.8 billion is only half as good as the 300 billion model.
>>
>>108341869
I might be late to the game, but is Anthropic a double entendre? Anthropic makes sense for a company that makes human like chatbot but I just heard someone say out loud enthropic to describe a system with enthropy and now I wonder if it was wordplay all along.
>>
>>108342168
Fair enough. My inner child wouldn't let go as easily.
>>
>>108342174
Do you know what anthropology means? I'm sure you can figure it out from there.
>>
>>108342174
i think they are making astronaut software
>>
How do the current OPEN models handle ERP? I have a terabyte of (V)RAM and want a model that handles loli etc. I'm still using the original Deepseek, but I'm curious about Qwen3.5, Kimi-K2.5, GLM5 and DeepseekV3.2
>>
>>108342185
I know my etymology, I made the mistake to be a linguist. I was just wondering if they had done it on purpose since I hadn't realized it before.
>>
>>108342174
>enthropic
not a real word
>>
Excellent, the news is sort of fixed, though after the shitpile in the last thread I suspect there is a conspiracy somewhere still
>>
>>108342215
Yes, Qwopus is very news-worthy.
>>
>>108342215
>the news is sort of fixed
>random finetunes with random dates
>>
File: 1749575146345692.png (170 KB, 801x430)
>>108341869
>>108342229
>>108342215
Qwopus is already debunked. The #1 shill of it admitted it sucks
>>
File: 1768900428117908.png (121 KB, 675x297)
>>108342238
>>
>>108342238
>>108342243
Anon, I...
>>
moesissies won, dense lost
>>
>>108342208
>I made the mistake to be a linguist
>I wonder if it was wordplay all along
Yes, anon. It's a play on words.
>>
>>108342229
>>108342231
>>108342238
It's technically more on topic at least but you're right there's probably a conspiracy here still
>>
>>108342174
How can entropy be reversed?
>>
>>108342256
Don't eat your own dick while you suck it, OP. It's an absolutely worthless selection of news.
>>
>>108342266
Did you not read the last thread where I was clearly the one just saying the same OP needs to be changed to be more on topic while they were throwing a tantrum over it with their buttbuddies or are you just trying to be that disingenuous?
>>
>>108342254
I acknowledge it was a mistake.
>>
DeepSeek
Please
I need you
>>
Dense Gemma soon!
>>
>>108342292
You have sarvam!
>>
File: 1750045382015626.jpg (298 KB, 1080x1920)
BREAKING NEWS
>(3/10) Mini Rin Sits On Miku's Head
>>
Dense models are dense
MoE models are moe :3
>>
Miku came inside of my wife and I never recovered
>>
>>108342301
@grok add a tramp stamp on her hip
>>
>>108342302
>moe :3
https://www.youtube.com/watch?v=qByKEu0zdco
>>
>>108342211
Eh, https://youtu.be/JgB_ywOmGCc?si=FhXDOB_FCulT021K&t=3384
Says it in the sentence that starts at 56:24 if the timestamp fails.
>>
>>108342168
I love you anon, thank you for all your hard work
>>
>>108342318
Oh, actually paying attention I'm a retard.
>>
How slow are MoE 50% on ram vs fully on vram?
>>
>>108342301
This better be on the next thread. Picture and news.
>>
>>108342301
Big if true
>>
>codex has been down for 24h+
>uptime is still 99%
sure
>>
>>108342438
hi there, i dont get it
>>
>>108342452
you are dalit
>>
SillyTavern has started sending the <think> </think> blocks of past messages to the server. It doesn't matter whether I'm in text completion mode or chat completion mode. Before I drive myself crazy diagnosing this, does anyone know if there's a setting somewhere I could have activated to cause this?
>>
>>108342604
yes
advanced formatting -> reasoning -> add to prompts
>>
>>108342614
omg thanks
>>
>>108341891
>Too impressive to be trustworthy.
it's because these benchmarks fucking suck and no one feels like using a couple of weird/niche ones that are harder to benchmax for some retarded fucking reason
>>
>>108342659
ai psychosis
>>
>>108341891
>4B being just "20% worse" (whatever that means), is impressive
It's the opposite of impressive. It means that they only got a 25% higher score (if the small one is 20% worse, the big one scores 1/0.8 = 1.25x as much) by multiplying the number of activated parameters by 4 and the number of total parameters by 100. It shows they're getting diminishing returns.
>>
File: sarvam benchmarks.jpg (95 KB, 885x1310)
Sarvam is pretty impressive. It's nice to see india acting like the superpower it always was.
>>
>>108342692
What's the point of the 300b models then?
>>
File: organic input device.jpg (172 KB, 1024x1024)
>>
miku
>>
>>108342694
>on indian language benchmarks
lmao
>>
>>108342700
>what's the point of cloud models when local ones do 80% of the job
>>
>>108342707
If you want a version of Sarvam 30B that's uncensored via Abliteration:
https://huggingface.co/aoxo/sarvam-30b-uncensored
>>
>>108342716
Couldn't you achieve roughly the same performance by using the 4b one and forcing it to check itself multiple times? Has anyone tried that?
Gpt and claude say no but they're built to shill themselves.
>>
SAAAR you must no redeem
https://www.sarvam.ai/apis/text-to-speech/
y u steal my job, saaar, do not redeem it
>>
>>108342755
beaultiful for gorgeous looks
>>
>>108342769
based, for a namefag, you're alright.
>>
>>108342785
It’s a bot retard
>>
>>108342850
oh, yeah I guess that seems likely. but its still pretty neat, I guess maybe a little off topic tho.
>>
lolcow
>>
>>108342168
Fuck off faggot
>>
File: 20374.png (161 KB, 1515x904)
https://github.com/ggml-org/llama.cpp/pull/20374
>>
>>108342694
Saarvam 105 cockbench?
>>
>>108343002
>sar: 97%
>>
>>108342203
Not a lot of reports in /lmg/, nor chat logs.
Best bet is just to try it then tell us.

GLM4.6 and 4.7 have gotten a few mentions.

The worry is that everyone has been distilling off everyone else, including the refusals, and so the newer the model the more baked in refusals.
>>
>>108343018
Can you provide more information on the matter?
>>
>108342301
Offtopic garbage like this is why no one should ever listen to anything said about thread quality or news quality. Mikutroons are scum.
>>
File: 1772707429837184.jpg (1.93 MB, 1069x6178)
>>108343018
>The worry is that everyone has been distilling off everyone else, including the refusals, and so the newer the model the more baked in refusals.
I was worried about picattached...
>>
>>108343037
new qrd spam
>>
File: 1768488821710235.gif (442 KB, 600x913)
>>108342301
True if big
>>
>>108343046
care to elaborate?
>>
i'm the anon who was considering dumping roughly 10k into a build
i was thinking of buying one or more of these: https://www.asus.com/us/networking-iot-servers/desktop-ai-supercomputer/ultra-small-ai-supercomputers/asus-ascent-gx10/

some relevant links which i have found while researching this:
https://dlcdnwebimgs.asus.com/files/media/202506/5c0fb57c-4e48-4e96-8c97-04bf8df2677c/asus-ascent-gx10-datasheet.pdf
https://www.asus.com/us/support/faq/1056142/
https://www.asus.com/us/support/faq/1056547/

i was hoping someone who knows more than me might help me to answer a few questions:
> is this good value for money?
it seems like it should be good for my use case, but i want to sanity check this
> can more than two of these be connected?
their answers to this seem to conflict
"Answer: The maximum tested and supported configuration by NVIDIA is a stack of 2."
"Answer: Currently, it can support 2 only."
"Answer: Stacking of two devices in currently supported. There is nothing preventing it from clustering more systems via the use of a 200GB ethernet switch."
> can you connect these directly via hardware (i.e, without going over LAN)?
it *seems* like this should be the case, but i'm not great with hardware, so i'm hoping someone can help me confirm before i fuck myself over
> does it matter that it (only?) supports FP4 and FP8?
presumably it could also handle higher precision floats, but the FP4 and FP8 instructions are the ones with native support (i.e., fast). but do i care?

most likely, i would buy two of these, hook them up, and attempt to run the full GLM4.7 model.
my stretch goal would be to buy four and run GLM5, but without support for stacking four at once (or at least reading about someone's experience doing it in a hacky way), i would probably hold off on this
i might consider starting with two, and if this works out well for what i want, dumping in more money a couple years down the line to buy some of the more expensive enterprise models

thanks in advance!
>>
File: 1744733247280991.png (25 KB, 346x88)
You are losing out if you aren't using qwen 3.5
>>
>>108343109
>thanks in advance!
>>
>>108343109
dropping 10k on a prebuilt asus ai box is paying a massive brand tax. value-wise, you're usually better off building a multi-gpu 4090/5090 rig or grabbing a loaded mac studio if you just need pure vram capacity. run the math on vram and memory bandwidth per dollar.

on connecting them: you're confusing a direct hardware bridge with network clustering. the 2-device limit uses a physical high-speed link giving you unified memory, which you absolutely need for massive models like glm4.7. clustering more over a 200gb ethernet switch introduces awful latency because you lose that unified memory. your tk/s will tank for single-user chat. assume the limit is strictly 2. nvidia hardcaps this to protect their enterprise sales.

native fp4/fp8 support is actually a huge selling point. nobody here runs 100b+ parameter models at fp16, we all use quants. native silicon acceleration for lower precisions means your generation speeds will fly. it'll still run higher precisions at standard speeds if you force it, but you shouldn't be doing that anyway.

before buying, calculate the exact vram glm4.7 needs at an 8-bit or 4-bit quant plus your target context window. if two boxes don't comfortably fit that, pass. and forget stringing four together for glm5 over ethernet; the communication overhead will make it completely unusable for interactive chat. stick to two if the math works, or just build a custom tower.
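napkin math for that fit check, in python. the quant size is a rough average and the layer/head numbers are placeholders, pull the real ones from the model's config.json before trusting any of it:

params_b   = 358      # glm4.7 total parameters, in billions
bits_per_w = 4.5      # q4-ish gguf, real mixes land between 4 and 5 bits per weight
weights_gb = params_b * bits_per_w / 8    # ~201 GB just for the weights

# kv cache per token = layers * 2 (k and v) * kv_heads * head_dim * bytes per element
n_layers, n_kv_heads, head_dim, kv_bytes = 92, 8, 128, 2    # placeholder shapes, fp16 cache
ctx   = 32768
kv_gb = n_layers * 2 * n_kv_heads * head_dim * kv_bytes * ctx / 1e9    # ~12 GB

print(f"weights ~{weights_gb:.0f} GB + kv ~{kv_gb:.0f} GB vs 256 GB across two boxes")

if that doesn't clear the bar with headroom left for the os and compute buffers, drop a quant level or pass.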
>>
>>108343127
i'm asking for people to do work on my behalf, and a non-insignificant amount (i mean, just look at that autistic wall of text)
the least i can do is be polite about it
>>
>>108343138
Do You suffrr from Mind blindness?
>>
>>108343131
10k is my total budget. each one of those asus boxes is 3.5k, so buying 2 would put me at 7k, with 3k left over to use on incidentals

on their site, they advertise:
> Link two ASUS Ascent GX10 systems to handle even larger models, such as Llama 3.1 with 405 billion parameters.
GLM4.7 is 358B, so it seems like it should be fine? i'm not sure how the quant/target context would change things, though. might be something i have to go research more

thanks for giving such a detailed response. it's super helpful
>>
>>108343109
>128 GB LPDDR5x Coherent Unified System Memory
I would recommend against it. It's basically a regular PC with 128 GB of RAM.
Go get yourself some server motherboard and buy like 4 used RTX 3090 and 256-512 GB of RAM and you'll be much better off.

Personally, if I wasn't poor as shit I'd just pay for cloud models. Memory is just way too expensive right now and it's almost certainly gonna get cheaper in the future. If you want to control everything then rent some cloud GPUs from like vast.ai or digitalocean or hot aisle.

If you want to do spicy image generation then your best bet would be an Nvidia GPU like an RTX 3090 or 5090.
>>
>>108343109
Sorry, gotta pile on here…that’s a very bad build from a value-per-dollar perspective. $10k is a bit of a cursed price point right now. You want 768GB to run frontier stuff and there isn’t an economical way to acquire that right now, even if that was all you needed.
You of course also need a high bandwidth system to install it in and ideally some kind of gpu
>>
>>108342701
I need Rin-chan to sit inside my hoodie hood as I'm running errands.
>>
>>108343302
so, help me to understand here
let's assume we can get ahold of an RTX 5090 at the insane MSRP of $2k (and not the more realistic $3.5-4k)
that's 32GB of GDDR7
how is that better than 128GB of LPDDR5x @ 3.5k?
is GDDR7 that much better than LPDDR5x? (i have no idea what most of these acronyms mean - really not a hardware person, unfortunately)
or would you say don't even buy an RTX gpu and go for something else entirely?
>>
When you have an older card without support for fp8 are you able to use fp32 performance as a proxy for performance in general?
For example if you have two older cards, one with double the fp32 compute, is it safe to assume the faster one would do the work in half the time? Assuming the ram is equivalent.
Does that make sense or am I missing something?
>>
>>108343109
>nvidia dgx spark
>officially connect up to: 2. more is unsupported and janky.
>connect using: special cables (might come with; check unboxing videos)
>fp4, fp8: only matters if you're doing training. not important for inference.

For $7k - $8.5k there's the apple m3 ultra 256gb.
(512gb version is only available from ebay atm.)
Maybe apple might come out with an m5 ultra?
>>
File: 1753546346362029.jpg (11 KB, 225x225)
The only argument to not spend $10k on hardware right now is if you look back at what you could've gotten with that money a year ago.
But that's no longer possible and it won't come back. So now's as good a time as any to build the best server you can afford.
>>
>>108343349
the GDDR7 is about 10x faster than the LPDDR5x, but you will have 1/4 of the quantity of memory, which means 1/4 of the maximum model size.
>>
>>108343349
https://www.ebay.ca/itm/196153412822
Plus PSU and a scuffed 3090 is probably your best bet to hit $10k and run the best models near reading speed
>>
how can small gpu cost more than big car?
>>
>>108343384
Try to buy the equivalent weight of an H100 in gold, little buddy
>>
>>108343359
Sounds reasonable.
If neither card does fp8 natively,
then any fp8 computation will probably use fp16 or fp32 hardware.
So looking at perf numbers for those is useful.

If it's q8 quants you plan to run, then the integer perf would be the one to look at.

But more directly, see if anyone else has benched that piece of hardware on the model you are interested in, or something close to that.

What cards?
What models?
>>
>>108343374
so if i didn't care about speed (within reason), the asus boxes would be the way to go?
>>
>>108343426
There aren’t many use cases where those asus boxes make any sense
>>
>>108343383
Different anon here.
That ddr5 system doesn't look half bad.
>>
>>108343411
heh i dont get it
>>
>>108343415
I am looking at the amd mi 25/50. The 25 is real cheap, cheaper than a p100, and even the 32gb 50 is not bad.
I have read people argue the 25 is slow but it has double the fp32 of my ghetto setup now which would put it well within the tolerable window for me.
>>
>>108343383
i'm always a bit paranoid about buying things off of ebay, especially when it comes to expensive computer hardware. generally, i feel better sticking to well-known websites/brands so that i can make a return if their hardware shits the bed
vs buying that from zhang jinping in shenzen
>>108343436
really? i got this recommended to me from someone who generally knows his shit and is pretty damn intelligent. so that's a bit at odds with the feedback from most anons here. not saying you're all wrong by any means. just trying to get a feeling for why there's a disconnect like this
>>
>>108343349
Every token that needs to be generated needs to load the active weights for that token. In a 122B-A10B model that means 10 billion weights need to be loaded for every token generated. A weight is roughly 4 bits in a q4 quant, most models are q8 (8-bits per weight), full size GLM 5 is 16 bits per weight, but you would likely run it at q4-q8.

8 bits = 1 byte. This means that an A10B model needs 10 billion weights which are all 1 byte each, therefore each token needs to load 10 billion bytes of data (10 GB).

Regular DDR5 is 38-50 GB/s per channel. Consumer CPUs (and motherboards) only handle dual channel, meaning that you get a maximum throughput of 76-100 GB/s in RAM. That Spark system has 68 GB/s of throughput.

The RTX 3090 has 900 GB/s throughput. This means that for a model that fits into its 24 GB of VRAM it could 'theoretically' generate 90 tokens/s (in reality it's less since there's computation involved), while that Spark system would do about 7 tokens/s.

The Spark system, however, can load models up to 128 GB, while a single RTX 3090 can only do 24 GB. But you can offload most of the model onto your regular RAM and get roughly the same token generation as the Spark that way, while the part of the model that sits in VRAM is served faster.
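If you want to play with the numbers yourself, the whole argument is one division (Python; the bandwidth figures are the marketing numbers, real speeds come in lower):

def tg_tokens_per_s(active_params_b, bits_per_weight, mem_bw_gb_s):
    # every generated token has to stream the active weights once,
    # so the ceiling is memory bandwidth divided by bytes per token
    bytes_per_token_gb = active_params_b * bits_per_weight / 8
    return mem_bw_gb_s / bytes_per_token_gb

print(tg_tokens_per_s(10, 8, 900))   # A10B at q8 on a 3090: ~90 t/s ceiling
print(tg_tokens_per_s(10, 8, 100))   # dual-channel DDR5: ~10 t/s
print(tg_tokens_per_s(10, 8, 68))    # the 68 GB/s figure above: ~7 t/s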
>>
>>108343465
hey man its hard to read all that on phone, can you stop being so wordy
>>
>>108343384

good luck running gpu with 350ish watts while car might 20000ish watts road is gray and there might be funny shape trees
>>
>>108343441
Double ram slots is a bit of a put off since it gimps bandwidth slightly, but overall it’d be solid. You’d pay 4x that for the same spec anywhere else. eBay is pretty damn safe, all things considered.
It’s a bit of an outlier at that price. I don’t expect it’ll last.
>>
>>108343471
doomer
>>
>>108343465 here
I'm not sure where I got the 68 GB/s number from for the Spark. I see various numbers all over the place on a second look. Some say it's around 270 GB/s which might be reasonable. That makes the Spark a lot better, but server motherboards with multi-channel RAM can do the same thing. Ie you can have even 12 channel memory, but that memory is going to be very expensive.
>>
>>108343459
Do what you want, but I’ve been building AI purpose-built systems for a couple of years now and haven’t whiffed yet.
>>
>>108343384
You can't use the car to scam investors out of trillions of dollars.
>>
>>108343465
You missed the final calc where a model using all 256GB of that 2xDGX sysram would be sub 1T/s
>>
>>108343510
wrong thread buddy, boys get him
>>
File: IMG.png (63 KB, 847x501)
How much of a loss in intelligence should I expect if I switched to any of the GLM 4.5 air quants listed in picture here? I have 64gb ram + 8gb vram, very constrained memory-wise, so I can only do a context size of 2048. Also, I end up having 100-400mb offloaded to swap, I'm hoping that it's system processes being offloaded and not the model.
>>
>>108343541
so factually speaking you can calculate the memory bus size by the context size by square root
>>
>>108343465
>>108343513
i'm ok with slow. i'm not going to ERP with it or anything (and if i did want to do that, i would just use a lower parameter model)
i basically want to vectorize a shitload of data, jack those dbs into the highest param model i can run, give it a task before i go to bed, and check on its output in the morning
so i'm not overly concerned about token speed
>>108343504
not trying to be dismissive of anyone. sorry if it came off that way. i'm just trying to figure out why the advice i got from him differs from what i'm getting here. he's really good at software, so maybe he's just not as good at building price-conscious hardware
>>
>>108343546
>give it a task before i go to bed, and check on its output in the morning
lamo the llama 405 meme is back from the dead

not happening bud you'll wake up to errors and a crazy electricity bill
>>
File: rp keks btfo.png (16 KB, 700x58)
>heck ur doing rp so u can afford to do q2 of a 7b model
>>
>>108343546
Putting together an LLM inference build is a very specific skill set that only superficially looks like building a computer for basically any other purpose.
I’m guessing your buddy trusts the Nvidia branding and hasn’t built one of these or gone into great detail on what you need to optimize for.
If you’re not worried about interactive use then 1TB DDR4 EPYC Rome will finish any reasonable batch while you’re asleep and still be a reasonable price. Still want a gpu tho.
Eg https://www.ebay.ca/itm/227117257779
>>
>>108343586
lmao, rp is totally useless if the model is retarded though :(
>>
>>108343559
it's a low wattage arm machine, so the power draw shouldn't be too insane
>>108343597
god damn a terabyte of ram is crazy
DDR4 is pretty old at this point, though, right? is there even much support for it still?
>>
>>108343614
what do you mean, support for it? RAM is RAM, older is just slower; it's not like DDR5 has specific features that make AI better other than just being faster
>>
>>108343631
presumably (again, not a hardware person - sorry if this sounds retarded) at a certain point, newer motherboards just wouldn't have interop with the older ram chips. like how the USB standard changed over time
>>
>>108343614
>god damn a terabyte of ram is crazy
The smartest open weight models are from 600b-1T. Do the math
Speaking of which, learn how to calculate both the ttft/pp phase requirements as well as the tg requirements as they are both relevant to overall performance/efficiency yet different in important ways.
In the end, if the system can’t run the model you want on timescales you can live with then it’s a waste of money
If you’re lazy, at least figure out the total effective bandwidth of the memory subsystem you’re inferencing from
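Rough sketch of that calculation in Python. Every number below is a placeholder, benchmark your own pp speed and plug in your own channel count and quant:

prompt_tokens, gen_tokens = 8000, 1000       # one overnight-style batch request

pp_speed  = 300                              # prompt tokens/s, measure it with llama-bench
channels, bw_per_channel = 8, 25.6           # DDR4-3200 is 25.6 GB/s per channel
eff_bw    = channels * bw_per_channel * 0.7  # real-world efficiency fudge factor
active_gb = 16                               # bytes streamed per generated token (32B active at ~q4)

tg_speed = eff_bw / active_gb
ttft     = prompt_tokens / pp_speed
total_s  = ttft + gen_tokens / tg_speed
print(f"ttft ~{ttft:.0f}s, tg ~{tg_speed:.1f} t/s, ~{total_s:.0f}s per request")

If that per-request time multiplied by your queue doesn't fit the timescale you can live with, the build is a waste of money no matter how cheap it looks.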
>>
>>108343641
well yeah that's already the case, new boards just use ddr5, no new board/cpu I know of still uses ddr4
>>
>>108343642
>The smartest open weight models are from 600b-1T
For now. Deepseek got us here and Deepseek might as well catapult us beyond that in a few days
>>
>>108343655
ddr6?
>>
>>108343663
not on the short term horizon due to the shortages, companies would rather fab HBM
>>
>>108343669
can you change that?
>>
>>108343465
>>108343513
following up on this
i went to one of those token simulator speed sites, and i think i would be happy with 2.5T/s for real-time use
so long as i could reach that, by tweaking the quants or however it would be accomplished, i think i could be content
do you really think it would be <1T/s? how are you getting that number? from what i could see, it should be on the order of 1-5T/s?
>>
>>108343498
My experience with 8 channel DDR4 3200 tells me that channels are bullshit. GLM4.6 at q4, 32b active is 16GB in size, at 2t/s that's 32GB/s which is about as fast as dual channel. Since numa is not supported in llama.cpp you can't benefit from >2 channels
>>
>>108343696
>Since numa is not supported in llama.cpp you can't benefit from >2 channels
/g/ - Technology
>>
>>108343703
shut up nerd
>>
>>108343597
That terabyte looks like it's spread over two sockets.
The earlier single socket 3/4TB ddr5 looks the better of the two.

>>108343663
zen6 and zen7 are going to be on am5,
so ddr6 is still a way away.
>>
>>108343703
numbers don't lie, 8 channels about as fast as 2
>>
>>108343696
What was your processor? The smaller Epyc processors are gimped in terms of CCDs so they can only use a limited number of channels despite being advertised otherwise. If you used one of the 8 or 16 core cheapo Rome Epycs, you were likely running that RAM at 2-channel speeds.
>>
>>108343747
AMD EPYC 7702 64C/128T Socket SP3 Zen2 CPU
>>
xai should make a local model. grok-4.20 is a complete shameless whore if you pull your dick out at it
>>
StagnAItion
>>
>>108343791
pics?
>>
>>108343791
elon promised to make all older versions of grok open source
grok 3 any day now
>>
>>108343696
What settings were you using? I have similar CPUs + RAM: 2 EPYC 7532s and 8 channels of DDR4-3200 for each CPU. Using
--n-cpu-moe 999
and pinning the memory to physical CPU 0 with numactl, I got 6 tokens/sec on GLM-4.7 IQ4_XS.
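For reference, the shape of the whole command is something like this (model path and -ngl are whatever fits your setup):
numactl --cpunodebind=0 --membind=0 ./llama-server -m GLM-4.7-IQ4_XS.gguf --n-cpu-moe 999 -ngl 99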
>>
>>108343870
Yeah!
>>
>>108343870
So, two CPUs with eight channels each are roughly twice as fast as a 2-channel desktop? Impressive, almost justifies that 16-channel configuration
>>
>>108343900
More like 1.7x as fast due to infinity fabric latency/bandwidth constraints
>>
>>108343691
The 768gb ddr5 one is 100% better
>>
What people easily miss with the current meta of CPU maxxing is that the RAM barely matters with the -cpu-moe stuff. Most of the active parameters are already on GPU and only a bit remains in RAM, so the speed no longer scales linearly with the increased bandwidth from more channels.
>>
>>108343691
I've got the og cpumaxx rig, so dual epyc with 768GB RAM and an A5000 24GB card and I pull 15t/s inference speed on kimi k2.5 at q4.
Its as smart as it gets and faster than reading speed (if you ignore prefill times, which are only meh)
>>
>>108343920
Your putted together a nice machine
>>
>>108343870
Full CPU offload. With 4x24 GPU offload I get 3t/s
>>
how is the vllm/llama/etc support for arm vs x86? i know it *says* it's supported. but is it actually as good?
>>
>>108343932
Pretty good, apparently. 4b q4 runs at 3.6t/s on rpi5
>>
Ah, that's another annoyance with qwen35 I guess.
>WARNING: RNN models do not support context rewind! Anti-Slop sampler will not work!
>>
>>108343915
It was done specifically without GPU offload to demonstrate that there are no observable benefits from >2 channels on my system
>>
Updoot llama or not?
>>
>>108344069
ye
>>
File: 1771248465262562.png (61 KB, 804x360)
Will you use NemoClaw?
>>
>>108344123
i dont want to be vulgar but NemoClaw can direct its attention to my nemo balls
>>
>>108344123
I still don't understand the utility of having an AI agent. If you don't constantly monitor what it's doing then it, by definition, cannot do anything right. It makes zero fucking sense.
>>
>>108344123
>>108344078
>>
>>108344123
>nvidia
going to be slop, apart from the people who design CUDA and the GPU chips they are a full blown saar corpo that can't produce anything good. Not even a control panel for their gpu (damn bro it's 2026 and it's still so slow to load the per app customization panels). nvidia app is a tumor, their background services are logging so hard it's trashing your SSD (disable nvidiot container), nemotron models are the worst slop of the industry etc
>>
>>108344163
Seriously though. Half of the job of an AI agent is to get your detailed opinion on the core architecture of every single thing you build. Every component.

What the fuck are people doing instead? Do they really just tell Claude or whatever to just "build me le minecraft clone" and then fuck off? It makes no sense. What's even the point of building software if you don't give a shit about it? What the FUCK is an AI agent even FOR?
>>
>>108344178
It's like paying an employee to go to therapy instead of you. It makes no sense.
>>
>>108344178
>Do they really just tell Claude or whatever to just "build me le minecraft clone" and then fuck off?
yes anon it's artificial intelligence
>>
>>108344177
what can you produce?
>>
>>108344187
makes perfect sense if you just need a checkbox filled
>I went to therapy
>I built the something please to hire me now/use my thing
>>
>>108344187
you are dumb
>>
in the end as long as it works, no one cares how gross the internals are or how janky the dev process was.
the real problems happen when it needs to get extended over a certain complexity and the original "lol mincraft clone" level of sophistication prompting just can't handle it.
Then you get unsloth bros
>>
>>108344178
Some people want an unearned sense of accomplishment. Others just want something they share on social media or as a source of content. The rest seems to be the tech enthusiast crowd using it as an expensive way to sort emails because they don't know filters exist.
>>
>>108344203
>Then you get unsloth bros
who were ex-nvidia
nvidia is a slop factory
>>
>>108344267
https://www.reddit.com/r/LocalLLaMA/comments/1rpxpsa/how_i_topped_the_open_llm_leaderboard_using_2x/
>I hope the article I posted also give some upvotes, maybe Nvidia will sponsor me with hardware, so I can make more models to share.
kek
>>
https://news.ycombinator.com/item?id=47323900
> meta acquires moltbook
lol but what in the fuck is that
their fall from grace is endless, you think they already dug deep enough to reach the last circle of hell but they keep digging and digging
after spending billions on wang wang and still having no model, whether proprietary or open, to show, this is where they focus their attention on? "social media for agents"?
absolutely incredible
they lost most of their hard science ML researchers recently but hey, hire the niggers who made one of the most retarded thing of the past decade
>>
>>108344273
> I'll push it to Huggingface, but it makes sense to 'polish' the scar with some fine tuning first.
let's just train on a few benches before pushing, need my nvidia grant after all...
>>
>>108344123
Can an AI agent make GPUs cheap?
>>
>>108344276
the thing that was proven to be humans LARPing as agents. It's one of the biggest grifts within the griftiest space out there right now.
I'm almost happy to see them get a big dollar exit, it was so bold and unabashed and Meta is such a big fucking joke it just seems right
>>
Where is DeepSeek v4? Aren't we way overdue? The Financial Times is real journalism, they wouldn't just lie.
>>
>>108344307
those grifters said they invented "reverse captcha" to prove agents aren't humans by having them solve things quickly that humans would have trouble solving quickly.. but they never said how exactly they intend to stop a human from using the agent to solve the captcha and then continue to do their human thing, which is to say, spam more scams. The whole idea of a reverse captcha is so inane even someone with chromosomal defects could come to the conclusion that it has no value.. except for Zuck. Zuck doesn't live in the same reality as we do.
>>
>4chan is STILL seething about moltbook replacing them
>>
>>108344307
>>108344321
these two posts are so abnormally cringe for this general that i have to conclude that they're both from the same fag
>>
>>108344276
probably this post too
>>
no shit
>>
the butthurt from subhuman jeets who moved from NFT bs to AI is real
>>
>>108344341
?
they seem to do just fine with their ahegao youtube videos.
>>
>>108344341
My only disappointment is that I can't short the shit out of AIcoin on leverage.
>>
>>108343109
> is this good value for money?
For pure inference, no. You’re better off buying GPUs. If you want something that just works, also no. You’re basically buying an untested, specialized tool that will require tinkering on your end and substantial investment into maturing it on the vendor’s end. Think early-days CAD workstation but for AI developers.

Personally, I use it for learning and building a small AI pipeline stack + exploring the nvidia developer ecosystem. But I wouldn’t use it for running chatbots. Far too expensive and unstable for that. Whether it works for small, local production workloads is anyone’s guess at the moment.
> can more than two of these be connected?
As far as i’ve seen, yes. There’s a few vids floating around on YT of folks chaining them up, iirc. Cables will cost you an arm and a leg though.
> can you connect these directly via hardware (i.e, without going over LAN)?
Yes, that’s kinda the whole point of the two ConnectX ports it ships with.
>i'm not great with hardware
You’re probably not gonna have a great day unless you’re willing to invest time and energy to learn. If I were you, I‘d seriously do more research. AI compute/networking is not the same as building a gamer pc and setting up your home LAN.
>>
DeepSeek V4 failed training just like LLaMa 2 33B
>>
>>108344384
winter is here
>>
>>108344384
whether the intended v4 failed or not, they have a brand new, very real unnamed model on their official chat that's way better than what they had before, the high context improvements are no joke and nobody would complain if they released that as open weights
somehow I have a feeling though that it might never happen because they might be smart enough to recognize that there's no reason to give free handouts once you become competitive.
>>
>>108344384
llama-2 33b was agi, and lecunn determined it was too much reputational damage to him for it to exist so he blocked it and sabotaged meta ever since, ruining 3 and 4 and ultimately setting in motion the events that caused them to buy moltbook
>>
>>108341880
https://ai.com/download
>>
what's a good model for image generation for a RAMlet like me
>>
>>108344472
rong thread try ldg
>>
>>108344478
i see. thanks.
>>
>>108344178
>>108344163
But think of the thousands of lines of code! Muh metrics!
>>
>>108344445
thanks sir
>>
>>108344384
they are waiting for gemma 4 and avocado to release
>>
>>108344329
What the fuck is a moltbook
>>
>>108344866
humans larping as ai scamming api keys out of eachother
>>
>>108344883
Oh yeah I heard of that and then immediately forgot what it was after it was revealed as a scam
>>
File: 1753234955084538.jpg (2.57 MB, 2956x4096)
https://github.com/RightNow-AI/autokernel
>>
File: 1741763142436893.png (75 KB, 893x502)
I hate this subhuman
>>
>I have to bring up shit from months ago
>>
>>108341869
I don't know about benchmarks but this model is shit.
I remember It's only good at vision, only vision.
>>
>>108344995
try hauhaucs version
>>
File: jaypee tune.png (14 KB, 815x267)
>>108344995
It's AGI as far as I'm concerned.
>>
>>108344967
as much as I actually like using LLMs for certain things, I believe LLMs will completely kill open source, ruin a lot of proprietary software too, and cause a general long-term skill devastation that will take a long time to recover from.
right now there are too many pwilkins in the world and open collaboration on the internet is taking a hit as maintainers either start closing down (no more looking at external contributions at all) or do the retarded thing that llama.cpp does, which is to open the gate to the subhumans
I know many software developers, myself included, who feel at this point we'd rather shovel pig shit on a farm than deal with the other humans who developed ai psychosis.
>>
>>108345031
exaggerating much?
>>
>>108344995
It's a massive fumble, that's why they are pushing it so hard.
>>
>>108345040
no. you either are one of the subhumans or not a software developer at all if you don't feel that way.
https://archive.is/lQL9B
this article sums up what it feels like to have subhumans as coworkers.
>>
>>108345020
>reasoning
>>
>>108344921
>flash_attention.cpp
>Target metric: throughput (higher is better)
>Secondary: correctness must ALWAYS pass
???
It should be the other way around. WTF are we coming to?
>>
>>108345112
Probably fine since order of operations is more of a suggestion even for the most advanced models, especially in the long run.
>>
>>108345031
Large short-term harm as models cross the minimum threshold of usefulness where people actually start trying to apply them to production repos. Less medium-term harm as they continue past that threshold, and a long-term boon as they enable quality, secure code at scales far beyond what humans were capable of.
>>
>>108345179
>far beyond what humans were capable of.
ai psychosis
>>
>>108345112
>return 1;
>infinite performance
>>
>>108345031
LLMs will become open source. Open source projects will be made specifically to feed code to LLMs.
>>
https://www.reddit.com/r/LocalLLaMA/comments/1rqplvy/mistral_nemo_upscale_but_kinda_weird/
>>
File: based.png (12 KB, 545x98)
>>108345213
>>
>>108345179
>and long term boon as it enables quality and secure code at scales
lol
>>
>>108344384
>>108344398
>>
File: DSA.png (23 KB, 487x286)
HABBENING
>>
>>108342069
>Prime Intellect RL training platform now available
im messing with it, i guess they train a LoRA on the python environment you give it. So a lora on Qwen that is optimized for your python env. my python is just a thin wrapper that opens up a websocket to my simulation server. after there are a handful of loras trained, hopefully there will be an open source solution that combines them down into the base model. is that continual learning?
>llm is doing a task
>keeps track of its action/observation space for that task
>design a python RL sim of the task
>wait while training a lora on it
>merge the lora down into your base weights hopefully not ruining things in the process
im new to this
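from what i can tell the merge-down part at least already exists if the adapter comes out in standard peft format (no idea if prime intellect's does, untested sketch with made-up paths):

from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base   = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-7B-Instruct", torch_dtype="auto")
lora   = PeftModel.from_pretrained(base, "path/to/the/rl-lora")   # adapter dir from training
merged = lora.merge_and_unload()           # folds the lora deltas into the base weights
merged.save_pretrained("qwen-rl-merged")
AutoTokenizer.from_pretrained("Qwen/Qwen2.5-7B-Instruct").save_pretrained("qwen-rl-merged")

whether you can stack a bunch of merges like that without ruining the base is the part i'm not sure about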
>>
>>108345292
oh g-d!
>>
bullish on meta now that lecun is gone. that retard has had so many shit takes it is a surprise there are investors stupid enough to burn their money for him
>>
>>108345325
this and with moltbook they'll be unstoppable! to the moon!
>>
>>108345332
>with moltbook
oh fuck i forgot that zuck is also a retard. bearish again on meta
>>
>>108342174
E=mc2 + AI
>>
>>108345292
oh gawd!
>>
Nivida AIForce RTX Mistral Nemo Pro 12B
>>
https://sweepthestrait.com/
>>
>>108345384
What.
>>
>>108345433
ye
>>
>>108345422
What will you do with a 12b model when 4b is only 5% worse? Look at the benchmarks in OP
>>
>I'm beeeenchmarking
>>
>>108345470
>I'm seething
>>
>>108345422
That sounds like the name of a GPU with 3.5GB of vram
>>
>>108345492
who let you beyond the great firewall?
>>
>>108345579
rent free
>>
>>108345579
I'd take good old Miqu over any of the newer <30b models if I had no choice and was poor. Benchmarks hardly matter.
>>
File: Untitled.png (318 KB, 366x501)
Just bought a v620. But the vbios it came with only reports 16gb? What the hell is this thing? Vulkan memtest on a 32gb vbios doesn't report any errors. Loaded up devstral q6 with context to 30gb, and it worked fine. Shouldn't it be the other way around, faking larger vram, if they want to scam me?
>>
>>108345612
why not screenshot
>>
>>108345723
Not my pc.
>>
>>108345612
>v620
Why are these so cheap? What's the catch other than them being ayymd?
>>
>>108345020
Now ask what color her butthole is.
>>
File: file.png (3 KB, 211x37)
>>108345780
probably part of it
>>
>>108345292
vibebros status?
>>
>>108343109
go look at the level1forums. Pretty sure a lot of people there have documented those kinds of stacks.
>>
>>108345790
Still in the ROCm 7.0.0 compatibility matrix, for now.
>>
>reasoning budget sampler merged, still no state tracking / string accumulator
>will continue past the end of a complete multibyte word indefinitely if the model's tokenizer outputs lone tokens as fragments of continuation bytes and each word is in a multibyte-heavy language
>will absolutely break on byte level style models
people who don't know how tokenizers and unicode work should not write string based samplers, much less get claude to vibecode their vomit
>>
>>108345424
I smirked mischievously with a smirk.
>>
>>108345882
:rocket:
just vibe code a fix once someone's llm writes an issue
>>
>>108345780
>Why are these so cheap
They're not cheap I don't know what you're talking about. They're massively overpriced and expensive and they don't work for AI or gaming or anything at all, and nobody should buy them or even look at the listings for them. Please do not keep thinking about the v620 or show any interest in this horribly overpriced and useless card.
>>
>>108345925
if you act sus I'll ask reddit's opinion
>>
File: 1768536751147905.gif (1.77 MB, 320x320)
>>108345925
>>
I know this isn't /aicg/ but DeepSeek is really slow today... could it be happening™
>>
>>108345880
You're looking at the Linux chart; it isn't supported on microslop. The rx6800 is, so maybe he could trick it into working with HSA_OVERRIDE_GFX_VERSION=10.3.0, but I don't know what he's doing trying to use AMD on microslop where they don't even update the driver.
>>
>>108345914
I wouldn't need to vibecode, this is a simple, few-lines fix. In fact I'll maintain my patch locally when I end up merging the whole autoparser/budget wilkin line of commits back.
you just need to write an accumulator that gets filled when common_utf8_is_complete returns false (you can keep it empty otherwise). It will eventually return true when fed the more complete accumulated chunk, and if it still doesn't after the fragment grows to 4 bytes, your model is somehow outputting broken unicode and you can decide to abort instead. Clear the accumulator when common_utf8_is_complete returns true.
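in python terms the whole thing is something like this (illustration only; the real fix is a few lines of C++ against common_utf8_is_complete, the stdlib decode just stands in for that check here):

acc = b""

def feed(chunk: bytes):
    """Accumulate raw generated bytes; return decoded text once the buffer ends
    on a character boundary, or None while we're still mid code point."""
    global acc
    acc += chunk
    try:
        text = acc.decode("utf-8")              # stand-in for common_utf8_is_complete
    except UnicodeDecodeError as err:
        dangling = len(acc) - err.start
        if err.reason == "unexpected end of data" and dangling < 4:
            return None                         # incomplete multibyte char, keep accumulating
        raise ValueError("model is emitting broken unicode, abort") from err
    acc = b""                                   # clear once a complete sequence went through
    return text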
That's it.
I will not, however, clean up after his butt. He needs to be named and shamed.
>>
>>108345969
Deepseek 3.2 is one of the worst AIs I have ever used in 2026.
It was good when it got released.
>>
>>108346014
>doesn't know
>>
>>108346009
tell your agent to write the ticket DUH, I literally dont understand how people are this oblivious
>>
>>108346014
well... yeah? that's how it works, shit ages
>>
>>108345974
https://rocm.docs.amd.com/projects/install-on-windows/en/latest/reference/system-requirements.html

Doesn't this imply it's still supported? gfx1030: yes runtime, yes HIP SDK, no AMD ROCm debugger.
>>
The deepseek team is actually not focused on releasing models, this is a side project for them.
Their main role is something else.
>>
>>108346021
AMD does a lot of stupid shit like changing register sizes between different cores that are "the same gfx1030" so if it's not specifically on the list you cannot trust it will work, they were especially bad about it with RDNA3
>>
>>108342048
This might be strange, and I'm using a 3-month-old llama.cpp. Was diddling with my own client last night against 127.0.0.1/completion, testing stuff, and the client displays an exception if it can't connect to the server.
I had a download going on in the background and I would get the exception error message first and then the reply would continue normally.
I'm sure I don't have a bug because my stuff is pretty simple.
This sort of explains bit I don't inder why a local loopback connection would suffer from download??
Ehh.
>>
>>108346054
*understand
It's hard to type with a one thimb only!
>>
>>108346017
You aren't as funny as you seem to think you are.
>>
>>108346054
I'm going to test this later tonight. It's crazy if a download interferes with the llama-server connection.
I'm using Linux. Tbh I never noticed anything like this on Windows (~1+ years ago).
>>
File: bik.jpg (479 KB, 1365x1100)
>>108346054
>>108346094
My guess is IO/interrupts generally slowing things down. There's kernel parameters to tweak but I would not expect to maintain inference performance when you add a bunch of network/disk IO on top
>exception if it can't connect
Run Wireshark on the loopback to dig deeper, probably the server is getting stalled from handling its input Q coz of the other IO
Experiment with renice and ionice
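For example (adjust to taste, check the man pages):
sudo renice -n -5 -p $(pgrep -f llama-server)
ionice -c 3 -p $(pgrep -f 'steam|curl')
i.e. bump the server's CPU priority and push the downloaders into the idle IO class.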
>>
>>108346240
Problem is that my external internet connection isn't enough to even hog the full bandwidth of my network adapter.
I'm going to test out some stuff. Should probably compile new llama version too.
It's still strange to me.
>>
File: 1764989822734022.png (237 KB, 1080x827)
>>
>>108346289
will this drive nand prices down?
>>
>>108346293
lol
>>
>>108345296
>is that continual learning?
technically yeah if the idea is to have the llm design the rl sim itself too
>>
File: 00106-3050314564.png (321 KB, 512x512)
>>108346289
The US and Israel unironically use LLMs for streamlining strategic analysis.
"You're absolutely right! If we keep bombing Bandar Abbas port, even though it's been out of service since day 1, they will surrender immediately. Would you like me to write you a song about it? Or maybe I can help you select one of those bad dragon dildoes you were asking about earlier for the occasion. Just let me know!"
>>
>>108346289
omg that would be terrible haha
>>
>>108346289
Irrelevant since our models don't run on major US technology companies' servers.
>>
>>108346289
https://files.catbox.moe/rfet2c.mp4
>>
>>108346240
I wish I had abs like that.
>>
>>108346316
I bet the manchildren on reddit shat themselves laughing at this.
>>
>>108346289
How are they bombing *US* companies in *Israel*?
>>
>>108346336
hi golem
>>
>>108346312
idiot
>>
>>108346316
what model is each side using?
https://www.youtube.com/watch?v=Bt8sizAvvUI
>>
>>108346363
Iran most likely some chink cloud models since they are prolly cut off from west. Kikes use some globohomo tech model so prolly sora.
>>
>>108346289
That means nothing for us.
>>
https://huggingface.co/deepseek-lab/DeepSeek-V4-Base
https://huggingface.co/deepseek-lab/DeepSeek-V4-Base
https://huggingface.co/deepseek-lab/DeepSeek-V4-Base
>>
>>108346416
kys faggot
>>
>>108346416
nice tracker link faggot
>>
https://huggingface.co/TriadParty/deepsex-34b
https://huggingface.co/TriadParty/deepsex-34b
https://huggingface.co/TriadParty/deepsex-34b
>>
>>108346436
blast from the past
>>
File: v4coming.png (491 KB, 1010x1130)
>>108346416
It is coming, though.
https://x.com/bdsqlsz/status/2031719179624362060
>>
>>108345424
Fix your shit. Clearing out a section of the strait with no mines takes chunks of the wall out with it.
>>
>>108346445
>Sweaty old man
>December 5, 2023 2:03 PM
>Fxxk, you are such a xxx!
>#4
>27.3s
>Mirai
>December 5, 2023 2:03 PM
>"Of course I do! I can't break promises, Sweaty old man. We have been together since we were kids. We are both best friends and lovers to end all iteration." I smiled with affection. It was clear that I meant everything I said. "We both know that you like taking command of us like this. Am I not your squirting toy, Sweaty old man?" I asked with a cute pout. "We should meet up in front of the shop after classes. I'll see you there. See you, Sweaty old man!"
vintage kino... RP today just doesn't hit like this
>>
>>108346451
i'm coming too
>>
>>108345179
2 more years and the stochastic parrot will start understanding things
>>
File: 1762584698378217.png (5 KB, 135x190)
>>108345424
:(
>>
>>108346579
>natzi chud btfo
>>
File: 1751778055259798.png (1.27 MB, 1200x1200)
New
>>108346672
>>108346672
>>108346672
>>108346672
>>108346672
>>
>>108345612
>>108345925
Llama 2 7B, Q4_0, FA enabled (pp t/s | tg t/s):
AMD Radeon Pro V620: 1595.32 ± 1.59 | 81.78 ± 0.06
Nvidia Tesla V100: 1391.39 ± 1.19 | 129.58 ± 0.58 (7d77f07)
Nvidia RTX 3090: 4298.97 ± 10.59 | 160.13 ± 0.25
V620: 512GB/s bandwidth, $500+

Yeah I'll stick to the $900 3090s.
>>
>>108346691
>Llama 2 7B
lol how many decades ago was this? what was the amd support state on lamocpp back then
>>
>>108346451
Fake and gay.
So tired of this horseshit.
Don’t have an X acct and not setting one up to see this one btfo.
>>
>>108346824
>she doesn't know about xcancel
>>
>>108346700
With devstral q6 on vulkan (windows) it looked like (I didn't check the numbers) about the same performance as my w6800. Was definitely noticeably slightly slower than my 3090. A lot faster than dual p5000s. But 32gb in two slots is very appealing to me, and it's cheaper (650 aud) than my w6800 (1.2k aud) Come the weekend I'll throw it in my debian server and try rocm.

>>108346691
Can you upload your vbios?
>>
couldn't resist pulling the reasoner budget, it's a nice way to cut qwen chatter
https://files.catbox.moe/ng0m1w.patch
here's the patch I am going to maintain to unslop some of it, along with a vulgar hack to strip away quotes "" from the reasoning-budget-message because it just so happens, if you have this
reasoning-budget-message = "Reasoning budget exceeded, let's write the answer."

in your presets.ini, it will actually fucking use the quotes and insert them as part of the message when reasoning budget triggers. It only happens when the arg is extracted from presets.ini running llama-server in router mode, not when you pass --reasoning-budget-message flag from the CLI. This one is more the router's fault than pwilkin's code, they haven't put much effort into the ini parsing and this behavior is desirable for passing json objects like
chat-template-kwargs = { "enable_thinking": true }

in your ini
I also add extra newlines before the message is inserted. It would be very dumb to default to inserting it the "I am thinReasoning budget exceeded" way, pretty sure it would damage the model output
anyway, just the router passing the literal " characters through reminds me that many of those vibers don't test a fucking thing for real before they hit the push button
>>
>>108346993
>they
>>
>>108347012
yes, they, plural. router mode is not wilkin's work as far as I remember
>>
>>108343304
>he likely doesn't mean his foreskin because he is likely american
KEK
>>
>>108341869
>122b MoE is comparable to the 27b dense.
I wish we had more dense models.
>>
>>108346255
Sounds like an interesting issue to me & very figureoutable when reliably reproducible, you can likely understand it, I've done some time debugging Linuxy embedded things
Fresh llamacpp pull ya do it
Interested to hear what you figure out
>>
>>108346289
Let's fucking go!
>>
>>108343109
Rent compute on vast.ai until 2027 then buy when ram prices are sane again as production units ramp up.
This is the worst moment to buy anything, especially if you can wait it out by simply renting your hardware until then.



All trademarks and copyrights on this page are owned by their respective parties. Images uploaded are the responsibility of the Poster. Comments are owned by the Poster.