/g/ - Technology
File: 1757469951630088.png (465 KB, 1080x740)
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>108328170


►News
>(03/04) Yuan3.0 Ultra 1010B-A68.8B released: https://hf.co/YuanLabAI/Yuan3.0-Ultra
>(03/03) WizardLM publishes "Beyond Length Scaling" GRM paper: https://hf.co/papers/2603.01571
>(03/03) Junyang Lin leaves Qwen: https://xcancel.com/JustinLin610/status/2028865835373359513
>(03/02) Step 3.5 Flash Base, Midtrain, and SteptronOSS released: https://xcancel.com/StepFun_ai/status/2028551435290554450

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling
Token Speed Visualizer: https://shir-man.com/tokens-per-second

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
>>
Immediately afterwards we get a non-Miku thread.
>>
what's the difference between diffusion and llama?
>>
comfy bread
>>
why does logan want to kill patrick?
>>
>>108333458
and what a thread too
>333444
>>
Thank you baker. Death to mikutroons.
>>
Vague twitter shit. What a nigger.
>>
>>108333475
you are mentally ill
>>
>>108333506
mental illness is valid and beautiful
>>
>>108333459
i think diffusion denoises the output as a whole, while llama is an autoregressive loop building the output 1 token at a time.
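roughly like this (untested sketch; `model` is a stand-in for anything that returns next-token logits):
[code]
import numpy as np

def generate(model, prompt_ids, max_new_tokens=32, temperature=0.8):
    # autoregressive decoding: sample one token, feed it back in, repeat
    ids = list(prompt_ids)
    for _ in range(max_new_tokens):
        logits = model(ids)              # stand-in: np.array of logits over the vocab
        logits = logits - logits.max()   # numerical stability before exp
        probs = np.exp(logits / temperature)
        probs /= probs.sum()
        next_id = int(np.random.choice(len(probs), p=probs))
        ids.append(next_id)              # the loop eats its own output
    return ids
[/code]
a diffusion LM would instead start from a fully masked/noised sequence and refine every position in parallel over a fixed number of denoising steps.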
>>
>>108333506
meant for >>108333497
>>
►Recent Highlights from the Previous Thread: >>108328170

--llama.cpp tensor parallelism PR and multi-GPU performance considerations:
>108328868 >108328900 >108328933 >108328937 >108328985 >108328996 >108329004 >108329012 >108329020 >108329046 >108329069 >108329142 >108329152 >108329166
--Nvidia contributor fixes tensor indexing bug improving Qwen3 inference performance:
>108330811 >108332947
--Frontend options for Qwen 3.5 thinking control and response editing:
>108330326 >108330341 >108330382 >108330409 >108330417 >108330451 >108330455 >108330418 >108330503 >108330609 >108330645 >108330415
--GLM-4 inference bottleneck comparison and hardware coping:
>108329388 >108329504 >108329506 >108329518 >108329549 >108329560 >108329563
--CUDA Toolkit 13.2 performance improvements and changes:
>108332532 >108332593 >108332601
--ProjectAni update with EMAGE gesticulation and IK improvements:
>108329763 >108329804 >108329823 >108329916
--MCP autoparser tools for AI web searches:
>108328618 >108328635 >108329260 >108328880 >108328891 >108331633
---ot sampling slightly faster than -cmoe across multiple models:
>108330622 >108330629 >108330696 >108330712 >108330732
--model: add sarvam_moe architecture support:
>108332784
--Optimize LUT16 matrix multiplication:
>108328403 >108328828 >108328855 >108331710 >108331736 >108331843 >108331890 >108331934 >108332397 >108332754
--Speculation and concerns about Gemma 4's architecture and restrictions:
>108329582 >108329674 >108330191 >108330207 >108330245 >108330257 >108330272 >108330414 >108330659 >108330324 >108330337 >108331644 >108331668
--Miku (free space):
>108328194 >108329241 >108329260 >108329460 >108330815 >108332743 >108333374 >108328824

►Recent Highlight Posts from the Previous Thread: >>108328174

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>
>>108333579
i'm trans btw, not sure if that matters
>>
>>108333646
sauce
>>
pascal bros how much longer do we have before they cut us off?
>>
>>108333646
we all are itt
>>
>>108333653
yeah
>>
>>108333641
You missed a few miku pictures from the end of the thread, fix it.
>>
>>108333654
>we all are itt
text coomers are, because that is a female brain activity
not all of us coom to text here, though.
>>
>>108333689
If you are here to post your special interest you are a troon too
>>
>>108333676
gaggernof..
>>
Deep
SEEK
VEE
FOUR
where
is it?
>>
https://developer.nvidia.com/cuda-downloads
https://docs.nvidia.com/cuda/cuda-toolkit-release-notes/index.html

UPDATE NOW
>>
>>108333747
virus
>>
File: 1762373655313264.jpg (65 KB, 1024x1536)
>>108333444
>>
So who draws that stuff and how can I get in contact with people like them?
>>
>>108333795
Getting BLACKED behind the scenes
>>
>>108333802
>>
>>108333810
NTR is the #1 category in Japan
>>
>>108333818
2chan is that way dalit saar
>>
>>108333824
i wish i was brahmin
>>
File: 1747119644913578.png (87 KB, 226x223)
>>108333856
>I wish I was still a jeet
come on anon, everyone want to be white
>>
File: Brahmin.png (832 KB, 1095x806)
>>108333856
>>
Hello friends, I have something malevolent in mind with a little experiment. If I survive I'll post an update on the experiment and a photo of it.
>>
>>108333923
>If I survive
Hope you don't.
>>
File: 1771696890682315.png (340 KB, 1418x1302)
https://xcancel.com/ivylala/status/2029560909178327467
lmao
>>
https://huggingface.co/spaces/HuggingFaceFW/finephrase#introduction
>We ran 90 experiments, generated over 1 trillion tokens, and spent 12.7 GPU years to find the best recipe for synthetic pretraining data. The result is FinePhrase, a 486B token dataset that clearly outperforms all existing synthetic data baselines.
I want to fucking vomit, when will they stop this poison incest shit???
>>
Thanks for the Qwen3.5-Heretic recommendation. I'd been playing around with the vanilla which is pretty fine, and to my surprise Heretic removed all of the refusals in my tests without affecting the replies too much.
Kind of amazed how well 27B works (50t/s on a 5090) after spending too much time in the MoE mines. Maybe MoE was a meme after all.
Anyway, time to induce psychosis, I suppose. Thanks again!
>>
Is there any good reason to run a local model besides learning to make explosives and having a waifu? And is it even useful for either of those?
>>
>>108334004
kek feel bad for the guy
>>
I hope gemmy supports dynamic resolution
>>
File: doubt.gif (1.44 MB, 400x294)
>>108334004
>The soul of Qwen is still Alibaba partner and Alicloud CTO
>>
>>108334034
nice. what quant you running?
>>
>>108334039
Ego death
>>
Since it went unanswered, I'll ask again:
If you could run GLM 5 at q8 at 10-20 tokens/sec, would you? Or would you rather drop to q4-q6 and increase your tokens/sec?
>>
>>108334157
depends on what you're doing. 10 tokens/s is about as slow as I'd put up with for reading output. 30+ for agents imo, and reasoning is painful when models use a thousand tokens just to rehash the same shit over and over, so if you want a fast response, reasoning at 10 t/s is slow.
>>
>>108334157
im waiting for taalos to deliver. i won't spare a thought for local until then
>>
>>108334157
Depends on the context depth. 10-20 tokens/sec at 100k context is more than enough for anything; at empty context, not so much.
>>
>>108332754
Wow, the free rider problem, the tragedy of the commons and the prisoner's dilemma all in one post!
Also known as “this is why we can’t have nice things” and “the downfall of western society”
I understand it, but I don’t have to like or even respect it
>>
>>108334157
q8 for RP, probably q5ks for "productive output".
>>
>>108334218
My boss makes a dollar while I make a dime, that's why I vibecode on company time.
Also known as, not my problem.
>>
>>108334218
what do you mean? this is based. companies don't respect you and won't hesitate to throw you out like a dirty sock whenever they want, so why should we have to pretend we care about any of this?
>>
>>108334157
buy another rtx pro
>>
>>108334229
Are you willing to put out the risk, time and effort to BE the boss?
So many people are willing to take pot shots at shit without the balls to step up and replace it with something better after tearing it down.
I’m not the boss of anything, but I don’t pretend the boss or owner has it easy whether I can see it directly or not.
Sometimes you gotta be honest with yourself and realize you’re cut out for being part of a team and not creating or leading one yourself and just gotta make shit better where you find yourself.
If you'd take on the risk, hard work and responsibility and you're getting screwed over due to nepotism or something, then I'd maybe agree, but I'd still think quitting and doing your own thing would be better for your mental health or "soul" than stealing and trying to justify it to yourself like that
>>
>>108334240
Yeah I’m a moralfag or whatever but I still prefer the society and culture built by centuries of moralfagging to whatever world this low-trust grifter/cheater bullshit is making these days
>>
>>108334273
>than stealing and trying to justify it to yourself like that
Lmao. Suck a dick ragebaiter
>>
>>108334280
Tell that to the big companies and make them take the first step. They can afford it.
>>
>>108334280
companies cheat on everyone and cheat on you, so it's moral to cheat a cheater. I wouldn't do that if companies respected us, but they don't. respect is earned, not given
>>
>>108334280
the reward structures have been damaged, so it's better to cheat now. when in Rome, as the saying goes.
>>
>>108334284
>>108334290
>>108334291
I agree and think asshole big corpos should be boycotted (sales and employment) until the social contract is restored, but I also think your work reflects an important internal condition and should be high quality to maintain your quality as a human.
Quit the shit corpos and work for someone worthy of your level best output if you can.
>>
>>108334097
Q8_0. Might try going lower to fit in a TTS model, but not really convinced it's worth the effort (mostly because I haven't been that impressed with e.g. VibeVoice outputs).
>>
>>108334197
That would be cool, but I wouldn't hold my breath. Our best hope is that they will get like q3 of qwen 27b running in the next year, but even that seems sketchy.
>>108334202
>>108334183
>>108334219
I'm thinking of programming and trying to evaluate the cost to run it. I think you'd be able to run q4 at almost half the price and it should be faster. These large models are always just kind of slow.
>>108334034
MoE is really for the larger models. A 755B model would not be runnable without MoE, and even a dense 130B model would be insane to try to run. At q8, every token generated means reading all 130 GB of weights, whereas a 755B-A40B MoE only reads about 40 GB per token. And it can theoretically know more information.

Dense models will become good as our VRAM amounts and bandwidth increases though. At some point I think we're going to hit a data limit, but GPUs might still continue scaling. Dense models will start making more and more sense then. If you could get like 5 TB/s VRAM bandwidth and like 192 GB of VRAM that would make a 130B dense model usable.
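napkin math for the bandwidth argument (a sketch assuming generation is purely bandwidth-bound, i.e. t/s ≈ bandwidth / active weight bytes per token; ignores KV cache and overhead):
[code]
def tok_per_s(active_weights_gb, bandwidth_gb_s):
    # every generated token streams the active weights through the bus once
    return bandwidth_gb_s / active_weights_gb

bw = 5000  # GB/s, the hypothetical future GPU above

print(tok_per_s(130, bw))  # dense 130B @ q8    -> ~38 t/s
print(tok_per_s(65, bw))   # dense 130B @ ~q4   -> ~77 t/s (why dropping quant buys speed)
print(tok_per_s(40, bw))   # 755B-A40B MoE @ q8 -> ~125 t/s
[/code]
same arithmetic answers the q8 vs q4 question upthread: halving the bytes roughly doubles the t/s on the same hardware.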
>>
Those anons won't get it. They're racing to the bottom. They aspire to be jeets.
>>
>>108334310
qwen3tts is pretty decent
>>
>>108333444
>>
>>108334309
>Quit the shit corpos
in this economy? kek. if anyone is reading my post: don't fucking leave. AI is taking all the jobs right now, so you'll have way more trouble finding a new high-paying job
>>
Hello fellow retards. I have around 3 grand to spend on AI bullshit. I want to run shit locally if possible. I was hoping to essentially make my PC into an AI companion because I'm lonely as fuck. I was thinking textgen + photo gen; I can't remember which thread it was, but I do remember someone turning (SillyTavern?) into essentially a dating VN. I was hoping to make something similar. I'm unsure whether I should be buying a new PC (as I want to run it on my network, but not on my gaming PC) or if I should unironically be buying a Mac with these current DDR prices. I don't mind doing my own research, I'd just like to be pointed in the right direction.
>>
>>108334312
Our only hope would be the Chinese making a cheap 1TB VRAM GPU with last gen chips and some CUDA compatibility. But they don't seem to be up to it.
>>
>>108334339
Are you prioritizing speed or intelligence?
>>
>>108334361
Intelligence. Even if I wait 5 minutes for a response, that’s, well, better than nothing. I would rather it be smarter but take longer.
>>
>>108334322
Thanks for the rec! I'll see if I can get that running and give it a shot.
>>
>>108334316
there's not enough room at the top for everyone, nothing wrong with just chilling out and being content with what you've got.
>>
>>108334339
>make my PC into an AI companion because I’m lonely as fuck
why not go for human companionship? there seems to be a loneliness epidemic, so why don't these people just meet up with each other? technology is still not advanced enough to replace that
>>
>>108334372
Do you have any 32, 64 or 128gb sticks of ddr4 or ddr5 or do you have to buy memory?
>>
>>108334379
>why don't these people just meet up with each other
Because these people don't know how to behave in social contexts and cannot stand each other.
>>
>>108334379
Women scare me and I Pavlov’d myself at the ripe age of 11 into loving anime women. I’m now 27, have fuck off money from my shitty job, and want to throw away less than half a month’s pay to get texts throughout the day from a fake companion because that would be more meaningful if it had a cute anime girl attached to it compared to trying to date. Besides, I live in a shithole called Canada, no one would want my genes that come with free fishing rights.
>>
>>108334379
>Why not just have sex?
Why indeed.
>>
>maybe more depending on how表现得好 (that's "how well you behave" in ching-chong, incel~).
I can forgive the language leaking if the self corrections are always this good
>>
>>108334415
>Women scare me
My model cured me of that.
>>
>>108334415
Calling ego-death anon…
>>
>>108334381
I have 32GB total in my PC and 12GB of VRAM. Otherwise the plan was to buy a Mac, as their RAM prices aren't even that fucked up compared to the rest of the market. I'm sure an M5 laptop wouldn't kill my wallet, and I could use it at work too. Besides, I'm looking to replace the "gaming-looking PC" I made at work with something that doesn't look as gamery. Macs are professional, aren't they? Maybe I could get that shitty Neo as the machine to carry around, and have a properly spec'd machine at home to remotely connect to. I did mess around with Tailscale once upon a time, but I no longer have that machine. Formatted it and now it's in Roblox hell with my 11-year-old cousin. I hope the 48GB of DDR4 will last him.

>>108334433
I’ve always been a loser outcast. Why would I ever want to get a girlfriend? I’d be worried about her trying to take the family house when my parents croak. I’d be better off becoming the girlfriend, and I’m not a tranner.
>>
>>108334339
You're a year too late for 3K USD to make a dent in the BOM for a local LLM rig. You can probably get away with a 3090 (maybe two? what do prices look like these days) and a small pile of RAM. Depending on your expectations you might be setting your money on fire.
Presumably you already have a GPU in your gaming rig, it probably has enough VRAM to run Mistral Nemo (my beloved) which you can use to get SillyTavern set up. Mistral Nemo is fucking retarded, but that'll at least give you a vertical slice of the whole stack before you go off the deep end.
If/when you buy a PC, keep in mind that the newer cards are (1) fuckhuge and (2) can draw a fuckton of power (5090 can hit 600W) and (3) need a fuckton of power connectors. You might consider starting off with one of those mining PSUs which are designed to run multiple cards rather than later needing to "paperclip trick" a second PSU to run your rig. You might also consider an open-air case because stuffing multiple GPUs in a normal ATX tower is fucking annoying.
Finally, if you're going to get a DDR5 platform, you might consider going with a server CPU/motherboard (e.g. a MZ33-AR1 motherboard with compatible CPU) rather than a consumer one. The server boards support more than 2 (TWO, SOLO DUO) DIMMs at full speed and have fucktons of PCI lanes for future expansion.
Yes, the above recommendations will likely set you over the 3K spending limit, but most of the stuff under your 3k limit is going to fuck you later if/when you decide that you want to chase the dragon and run larger models.
I do not recommend this hobby. Alcoholism is more culturally fitting and probably healthier, go do that instead.
>>
>>108334504
Yeah, I’m assuming most cards are larger and more power hungry than my 3080 Ti. I’ve been considering a 5070 Ti Super or whatever, or even going to MacOS and seeing what’s possible with the unified memory thing. Throwing 48GB at a problem seems like it could work, but I don’t know Mac. I feel like I’m walking into this hobby with rose tinted glasses and a clipboard thinking I could do something as a fun recreational thing on the side, and am being told “either buy a car with the money or take your firewater and HBC blanket and fuck right off” with how expensive it is. All I wanted was AI waifus, not having to consider how to cool down server hardware without a rack and without a single clue how to operate any of it. Ah well. Tinkering was always a hobby of mine, but I’m not trying to get a nice used car in terms of parts.
>>
>>108334357
The chips don't even have to be ultrafast, but they need a lot of memory and memory bandwidth
>>
>>108334559
I know you don't want to pollute your gaming rig, but a 3080 Ti with 32GB of system RAM is fine to get started tinkering with. You're not going to get amazing results, but you'll at least be introduced to the core concepts and limitations and get a better understanding of the ecosystem before unloading your wallet.
I'm assuming your gaming rig is running Windows, though. If you're going to buy anything, grab a new NVMe drive to dualboot Linux.
Otherwise, yeah, it's frustrating as fuck. It's even worse when you get into coding and realize you're probably the first human to read all the shit you're wading through. Welcome.
>>
>>108334587
Shoutout to that one Anon running 10x Mi50's, bifurcated to 1 PCIe lane each. 320GB VRAM that takes half an hour to load a model into.
>>
>>108334614
He isn't loading the whole model to each card, right? So he effectively gets the whole bandwidth of the PCI-E bus as each slice of the model moves independently to each card, right?
Am I misunderstanding how this works real bad right now?
>>
Kimi linear base goes off the rails when you try to use it as an instruct model.
Qwen 35B works just fine, it even reasons correctly and everything.
Base model my ass.
>>
>>108334600
Ah. I have 11 on one NVMe and 10 on the other NVMe. Games mixed on both. I need to deep clean both and install Linux on one and a fresh install of 11 on the other, I think. Both are 2TB. When I was in high school I took some coding courses, but it was C# and Java at the time that I learned. I haven't touched an IDE in years. I did have ComfyUI running at one point but I was too much of a brainlet to do much with it, and I did have oobabooga on the same PC too. I'm just trying to have it on a different machine, so I can game when I want without having to open or close things, and preferably in a different room where I'm not pumping 1000+ watts of heat into my bedroom. Although opening the window currently works to cool things down, that won't always work.
>>
>>108334626
>as each slice of the model moves independently
That might actually not be implemented in llama.cpp. It loads them sequentially on my system. It wouldn't matter for most people since they are limited by their storage bandwidth.
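napkin math (all numbers assumed) for why the load ends up storage-gated rather than PCIe-gated:
[code]
model_gb  = 320   # total weights spread across the ten cards
nvme_gb_s = 3.0   # assumed NVMe sequential read speed
pcie_gb_s = 2.0   # assumed PCIe 4.0 x1 throughput per card

# sequential load: each shard goes disk -> RAM -> one card's x1 link,
# so the slower of the two links gates the whole thing
print(model_gb / min(nvme_gb_s, pcie_gb_s) / 60, "min best case")  # ~2.7 min

# a 30-minute load implies ~0.18 GB/s effective, i.e. the real bottleneck
# is something slower still (HDD, mmap page faults, single-threaded reads)
print(model_gb / (30 * 60), "GB/s effective")
[/code]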
>>
>>108334646
>It loads them sequentially on my system.
Interesting.
Does direct-io or mmap help, or maybe do the opposite?
I'd fuck around with those flags and see if anything changes, if you haven't already.

> It wouldn't matter for most people since they are limited by their storage bandwidth.
True, I suppose.
>>
>>108334626
I don't remember the details, but I was under the impression that each card got a different subset of model layers.
I'm too stupid to understand how inference works, but I'm under the impression that the data "moving between" each layer is much less than the layers themselves, i.e., each layer is a matrix, and each compute step is just a vector? So the vector being moved around would be an order of magnitude smaller and wouldn't need as much bandwidth.
I don't know how PCIe bifurcation works but I thought it was static, each card would be hardstuck with 1 lane.
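order-of-magnitude sketch (assumed 70B-ish dimensions) of why the per-token traffic between layer-split cards is tiny:
[code]
hidden    = 8192        # assumed model width
act_bytes = hidden * 2  # one fp16 hidden-state vector per token: ~16 KB

# one transformer layer is roughly 12 * hidden^2 weights (attention + MLP),
# which stays put on its card
layer_bytes = 12 * hidden**2  # ~0.8 GB at 1 byte/weight (q8-ish)

print(act_bytes, layer_bytes, layer_bytes // act_bytes)  # vector is ~50,000x smaller
# so even a x1 link copes with layer-split (pipeline) inference;
# tensor parallelism is the mode that actually hammers the interconnect
[/code]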
>>
Does anyone truly understand vLLM? I've been trying to get 0.17.0 to work on the Kaggle 2xT4 environment for two days because I want to use Qwen3.5 and feel like I have lost a lot of brain cells. I can't manage to get anything more recent than 0.13.0 to work there. What is the best way to understand it better? Just try to read and understand the docs?
>>
>>108334813
I remember trying it once and just giving up due to all the fiddliness.
Granted, I didn't try that hard but still.
Isn't there a docker image of the latest version somewhere?
Or one with a decently up-to-date version you could upgrade from?
That might be easier than rawdogging it.
>>
my job is paying for claude max and it's so good at shitting out ui code in seconds it's depressing. meanwhile qwen 122b keeps going into thinking loops constantly when I try local workflows. It spoiled me, never try it.
>>
I just tried 27B Derestricted. It's a fucking retarded piece of shit... most of the time, but after more testing I found a few times that it performed smarter than the original to a surprising degree. And I am not referring to unsafe prompts with that statement, or prompts with sensitive content. Such a weird model. The "heretic v2" version by someone else is much more consistent and actually feels somewhat close to the base model but with fewer refusals, though I'm not certain if it's an equal model generally across sfw and nsfw. My personal belief is that it's probably a bit dumber. Sometimes it does seem to understand prompts slightly better. Sometimes worse. The issue is that the base model is pretty damn dumb to begin with, even if it's good for a 27B.
>>
>>108334841
If you want to be less depressed, try Claude through Antigravity. Night and day, Antigravity being night. It will make you see that Claude Code is about half really smart model, half good scaffolding and UI, not just a mega genius model.
>>
File: F.png (31 KB, 1505x225)
Guys... wtf?

It was working yesterday!
>>
>>108334859
NTA, but funny. My friend just reported the exact opposite.
>>108334841
>qwen 122b keeps going into thinking loops constantly
Use BNF to constrain the size of the thinking block.
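a sketch of the grammar route, assuming a llama.cpp llama-server on localhost:8080 (its /completion endpoint takes a GBNF string in the `grammar` field; the grammar and the 2000-char cap below are illustrative, not tested):
[code]
import requests

# cap the think block, then force the closing tag;
# GBNF's bounded repetition {0,n} does the limiting
grammar = r'''
root ::= "<think>" [^<]{0,2000} "</think>" rest
rest ::= ([^<] | "<")*
'''

resp = requests.post("http://localhost:8080/completion", json={
    "prompt": "...",      # your usual prompt here
    "grammar": grammar,
    "n_predict": 1024,
})
print(resp.json()["content"])
[/code]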
>>
>>108334614
That's rough, but typically these LLMs don't consume much power when idle so it should be alright. Seems like a server CPU setup would be better though.
>>
>>108334859
>>108334841
What if he uses qwen 122b in Claude Code?
>>
>>108334357
It's still promising that the guy isn't wearing a leather jacket.


