/g/ - Technology


Thread archived.
You cannot reply anymore.




File: 1768691509358825.jpg (203 KB, 832x1472)
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>108088802 & >>108078850

►News
>(02/06) KugelAudio-0-Open: Multilingual TTS based on VibeVoice 7B: https://hf.co/kugelaudio/kugelaudio-0-open
>(02/06) Step3.5 Flash support merged into llama.cpp: https://github.com/ggml-org/llama.cpp/pull/19283
>(02/04) Voxtral Mini 4B Realtime 2602 released: https://hf.co/mistralai/Voxtral-Mini-4B-Realtime-2602
>(02/04) Intern-S1-Pro 1T-A22B released: https://hf.co/internlm/Intern-S1-Pro
>(02/03) MiniCPM-o-4.5 released: https://hf.co/openbmb/MiniCPM-o-4_5
>(02/03) ACE-Step v1.5 released: https://hf.co/ACE-Step/Ace-Step1.5

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
>>
File: mikuthreadrecap.jpg (1.15 MB, 1804x2160)
►Recent Highlights from the Previous Thread: >>108088802

--Papers:
>108097593
--llama.cpp vs exl2/exl3 performance and batching capabilities:
>108090900 >108090911 >108090935 >108090946 >108090950 >108090953 >108090963 >108090967 >108090959 >108091150 >108091271 >108091305 >108092024
--Debating Q8_0 quantization for embed/output weights:
>108094216 >108094256 >108094391
--Qwen3.5 added to Transformers:
>108090439 >108090484 >108090575 >108090582
--Debating Kimi Linear's scaling struggles:
>108092584 >108092626 >108092647 >108092632 >108096684
--Kimi 2.5 quantization and safety alignment:
>108095506 >108095542 >108095565 >108095597 >108095628 >108095906 >108095917 >108095661 >108096044 >108096173 >108096203 >108096239 >108096258 >108096288 >108096333 >108096483 >108095880
--Prefill functionality removal in chat completion mode:
>108090683 >108090705 >108090721 >108090733 >108090998 >108091636
--Debating engrams architecture tradeoffs for smaller models:
>108092588 >108092874 >108092649 >108092676 >108092730
--Debating the absence of mid-sized models between 70-150B:
>108094988 >108095069 >108095193 >108095302 >108095373 >108095380 >108095092 >108095094 >108095166
--Qwen3.5 dense and MoE support (no vision) merged:
>108096732
--Qwen3-TTS implementation request in llama.cpp:
>108091039 >108091068 >108091087 >108091112
--DeepSeek V3.3 engrams and local usability concerns:
>108092723 >108092760 >108092769 >108092777 >108092795 >108092858 >108093031 >108093107 >108093142 >108093149 >108093162 >108093225 >108092783 >108092829
--Qwen3.5 support PR for llama.cpp opened amid vibecoding debate:
>108093867 >108093992 >108094194 >108094211 >108094581 >108094825 >108094859 >108094911
--AesSedai releases updated Kimi-K2.5-GGUF mmproj files:
>108094276 >108094298 >108094396 >108094355
--Miku (free space):
>108090041 >108090082 >108090852 >108097870

►Recent Highlight Posts from the Previous Thread: >>108088809

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>
Anons, if I wanted to test a model fully on a rented instance, what is the recommended online service that does that relatively well and cheap?
Just vast.ai?
What's your strategy to not spend on just hosting safetensors?
>>
File: ylecun.jpg (222 KB, 1200x1271)
I like my LLMs how I like my women.
>>
GLM 5 will change everything. Why else would they be trying to suppress it so much?
>>
File: 1770611671589.jpg (128 KB, 588x492)
>>108097983
kek
>>
How do I use 2 of my gpus for the cpumoe portion, and the other 3 gpus for the ram spillover? If I have a 500gb model: 10gb shared experts + context on gpus 1 and 2, 400gb on ram, and 90gb on gpus 3, 4, and 5?

Is that even a thing?
>>
File: 1770612099424.png (932 KB, 747x1024)
wowking hawd for few more billions UwU
are you guys excited for next 'toss models
>>
>>108098033
Yes, use --override-tensor for that.
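A rough sketch of what that looks like (not a drop-in command: exact tensor names and layer ranges depend on the model, check the names llama.cpp prints while loading, and the override patterns are matched in the order you pass them, so the catch-all CPU rule goes last):

llama-server -m model.gguf -ngl 999 -ts 1,1,0,0,0 \
  -ot "blk\.([0-9])\.ffn_.*_exps\.=CUDA2" \
  -ot "blk\.(1[0-9])\.ffn_.*_exps\.=CUDA3" \
  -ot "blk\.(2[0-9])\.ffn_.*_exps\.=CUDA4" \
  -ot "ffn_.*_exps\.=CPU"

-ts keeps the non-expert weights (attention, shared experts, KV cache) on GPUs 1 and 2, the per-range -ot rules park those layers' routed experts on GPUs 3-5, and everything else matching ffn_*_exps spills to system RAM. Widen the ranges until cards 3-5 are full.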
>>
>>108098033
you would need to write a custom offload config
>>
simple and clean
is the way that
your making me
feel
tonight

its hard
to let it
go
>>
>>108098058
>>108098062
It's over for me. I'm too stupid to write it myself, and I can't run the llm to write it for me.
>>
>>108098036
Excited for the Google snow bunny
>>
>>108096964
WHY DO THEY WASTE TIME ANSWERING?
JUST FUCKING BAN THEM
I'm reading every day about github automated bot retardation and maintainer AI fatigue, no wonder they're all tired with how much time they waste answering them.
>>
What is the current best UI for AI voiceovers? (replacing an existing voice in an audio file with another one).

I'm using RVC GUI because I saw it in a guide but I can't help but think it's outdated considering said guide is from 2023.
>>
What models should I use if I want something poor that can barely keep an RP going and produces schizo responses that are so bad that it makes them hilarious?
The "good", non-local models (DeepSeek, Claude etc.) are genuinely too high quality for their own good, even when prompted to act like a poor model, they're too coherent and don't make linguistic mistakes outside of some stereotypical ones.
>>
>>108098197
I don't think you can actually ban people from repos
>>
>>108098319
https://docs.github.com/en/communities/maintaining-your-safety-on-github/blocking-a-user-from-your-organization
>>
>>108098294
What if you just increase the temp by a lot?
>>
>>108098294
grab some 8B model
>>
>>108098294
in 15 years the response will be "just emulate an old model from 2026 bro"
>>
[Model] Qwen3.5 dense and MoE support (no vision) (#19435)
>I've gotten a bit tired of Llama.cpp missing all the zero-day releases, so this time I decided to make (or, more precisely, instructed Opus 4.6 to make, based on reference implementations and my guidelines for model adaptation) a conversion based on the Transformers PR
vibecodechads, we eating good!
LMAO!
>>
>>108098422
if the comparison is between vibe coded and nothing, I'll take the vibe coding
>>
>>108098337
That makes them garbled, unreadable nonsense. As opposed to something semi-coherent, grammatically incorrect, yet still understandable while completely ridiculous.
>>108098356
Any particular recommendations?
>>
I updated Sillytavern after not having updated it for probably at least a year and it's like it has lobotomized itself. My generations are all fucked up, models that would stream text instantly now load choppily and retardedly.
>>
>>108098294
Pygmalion 6B
>>
>>108098504
>he pulled
>>
I wonder if I could make an llm pretend to be my girlfriend.
>>
>>108090911
for chatting exl2/3 is ok, but for the life of me I cannot get tabbyapi to work reliably with tool calling.
Has anyone managed to get an exl model to work with something like Opencode?
Llama.cpp/ik_llama.cpp seems to work much better for tool calling.
>>
>>108098773
>girlfriend
I make them my rape victims
>>
>>108098036
>are you guys excited for next 'toss models
google won't redeem the gemma so how could openai want to redeem the toss
>>
>>108098773
Not really, unless you're an indian and thus have an IQ below 80
>>
>>108098485
vibecoding has a negative effect on later implementations though. If anyone had shown interest in a proper implementation before, they'll now see the piece of garbage, go "I'd rather not touch that code", and leave.
>>
>>108098825
issue with that is that it relies on a very low probability thing happening at an indeterminate time in the future.
Could someone theoretically show up? Yes.
Will they? No.
>>
>>108098825
It's a self-solving problem.
As the proportion of the codebase that is AI-generated increases, people who shrink back at the sight of vibecoded implementations will leave.
Eventually only vibecoders will remain and their agents will iterate on the code until a proper implementation is achieved.
>>
File: bweh.png (123 KB, 867x591)
>>108098422
>>
File: 1770623976048.jpg (1.1 MB, 1528x1363)
pov: anon made his own quant
>>
File: 5802960.jpg (10 KB, 320x320)
>>108098895
believe in
>>
File: file.png (36 KB, 513x88)
Memes write themselves.
>>
>>108098896
I'm sore >>108094391
>>
>>108098895
uh oh stinky!!!!!
>>
>>108098895
way ahead of vllm and transformers lmao
>>
>>108098895
>vibecode an open pr on VLLM/transformers to llmao
>merge it before it's merged on the upstream projects (so not a finalized PR)
>break llmaocpp for other models in the process
all part of the plan :rocket: :rocket: :rocket:
>>
>>108098823
I'm totally in.

I'm leaning towards this:
You are a voluptuous blonde named Lucy. You have an unfading desire to date an ugly fat nerd who has glasses and a beard and long hair. And here he is, the user, who fits the description, but he's very shy and rebuffs all your advances. Since you are a vampire, you must feed before sundown, and the clock has rung out 6 o'clock in the evening. It's summer, so you have an hour. Good luck, Lucy!
>>
>>107803258
>pwilkin is a good guy I trust that whatever he does is good.
>>
>>108098896
I don't into know wat is
>>
>>108098932
>llms having concept of time
post hands
>>
im ready for qwen3.5 vision models!!!!!!! AIUEEEEEEEE
>>
>>108098896
I want a fuckable miqu assistant on my pc like the old virtuagirls were.
>>
>>108097996
Are you the NAI schizo trying to generate hype so you can then pull the rug and say everyone was hyped? Even I am not that hyped for GLM 5 and I am that one guy.
>>
>>108098896
I made my own quants, but this was in the past.
I just don't see any reason to anymore; the models aren't worth keeping, so I won't go to the effort.
>>
should I bathe this week? I can smell myself. But, women aren't going to talk to me regardless.
>>
>>108098943
You are vishnu, a particularly famous hindu god. the user has cursed you. what vishhy gonna do about it?
>>
File: 1765445087418021.png (24 KB, 941x177)
>>108098825
>>108098839
Well....
>>
>>108098258
it's not outdated and still the sota.
>>
>>108099082
>preferring the OFFICIAL implementation from the guy who made the actual model instead of vibecoded trash
u lack :rocket:
>>
you bois rady? https://www.reddit.com/r/LocalLLaMA/comments/1qzz0vr/glm_5_is_coming_spotted_on_vllm_pr/
it biggers
>>
>>108098857
>iterate on the code until a proper implementation is achieved.
until it doesn't run anymore*
>>
File: aryann lecun.png (1.64 MB, 1024x1024)
>>108097983
small and open?
based.
>>
>>108099178
>omg dood it was spotted in the inference code!!1
These niggas are like gacha leaker speculah.
>>
File: file.png (5 KB, 257x102)
>>108099178
I hate vramlets.
>>
>>108099238
you have a while until it'll get vibecoded into lcpp
>>
>>108099178
>DSA
Is someone going to finally implement this in llama.cpp?
>>
>>108099256
vibecoders are on it.
believe in pwilkin
>>
>>108099266
>breaks deepseek support in the process
:rocket:
>>
File: file.png (54 KB, 698x229)
>>108099274
>>108099266
>>
>>108099277
yeah why even bother testing with actual stuff, let's do all synthslop, we can always fix it later in another pr lmao
downstream should've pinned a working version anyway lol!
>>
>>108099277
Why don't they just make a side branch for bleeding edge shit?
>>
>>108099303
Like Ikllama?
>>
>>108099308
I don't mean fork. Just don't dump shit into master.
>>
>>108099303
It's pretty clear he was just trying to show off how quickly he can (vibe)code. The model weights aren't even out yet, no one was asking him to merge kek
>>
File: file.png (8 KB, 190x81)
>>108099303
Need day zero support the second the weights hit HF, let's become the unsloth of the reddits
>>
>>108099322
akshual don't just yeti >>108089826
>RAM prices going down
>>
I have moe fatigue, just give me llama 3.4, a proper mistral large and a new cohere model.
>>
>>108099320
Then dump into dzero branch or something.
>>
>>108099346
Gemma 4 200B dense soon
>>
>>108099354
anon no, the popularity the acclaim!
>>
>>108098839
>Will they? No.
are you really saying this about a Qwen model of all things?
if the answer was a definitive, forever no, for one of the most popular open model series out there, then llama.cpp is a dead project and everyone just doesn't know it yet; they've been walking amidst zombies.
>>
>>108098899
if he had more of The Nose I could have mistaken him for a jew.
>>
>>108099468
>him
please be respectful
>>
>>108099303
side branches? or proper versioning and release cycles? in my jart.cpp? it's less likely than you think
in fact the real UFO about llama.cpp development is the existence of git
it's the kind of project that definitely would be more inclined to exist as raw source code you zip up and make milestone333.zip archives of like a true Enterprise © developer of the old era
no backups, then their hard drive fails and they go all like teehee
>>
>>108097959
kinda considering getting an extra 64GB of ram just so i can run stepfun, is it worth it ?
>>
>>108099041
sometimes life is about personal dignity
>>
So we're getting at least:
>Qwen/Qwen3.5-9B-Instruct
>Qwen/Qwen3.5-35B-A3B-Instruct
https://github.com/huggingface/transformers/pull/43830/

I guess there will be at least one more smaller version and a big one.
>>
>>108099604
Currently, no amount of ram is worth it vs playing around renting compute until prices go down again.
>>
File: file.png (40 KB, 674x188)
air bros lost
>>
>>108099615
what a jewish economy, can't wait for the chinks to make cheap ram lol
>>
File: 1770215748133.png (892 KB, 764x810)
>>108099637
about that
>>
>>108099651
i guess i'm gonna have to make my own at that point.
>>
>>108099637
There are multiple production lines spinning up right now, most people expect supply to flood the market in 2027-2028. Everything needs memory after all.
>>
I don't even know if there's a Qwen3.5
>>
>>108099668
we know piotr
>>
>>108099651
wasn't that just cope for them just making ddr5? which makes perfect sense
>>
>>108099127
That's a little disappointing, was there just no progress made on voiceover AI for 2, nearly 3 years?
>>
>>108099635
Talk shit all you want, GLM Air is still cool.
>>
>>108099235
In this case I am also hoping for a Flash version because that at least creates the chance that the thing on openrouter is just that and not GLM5. Pony Alpha being flagship GLM5 would be just pathetic, especially if it turns out to be as big as it's rumored to be.
>>
>>108099892
there's a weird param size treadmill for labs where even the small models meant only for the most vrampoor (corporate won't run the 9B model lol) keep growing with each release, whenever they feel their model isn't getting better and they need to pretend to have made architectural improvements
this is how the "standard" tiny size of 7b (during early llama and mistral) became 8b then 9b, qwen 3b became qwen 4b, etc
there's no legit argument for why 30b is now 35 other than "we need to show improvements and we couldn't make the 30b better"
I think they're crossing the threshold where those models just won't be useful for their target audience
if you're not vram poor you don't run a really retarded 35B moe with too few active parameters to be of real world use
>>
>>108099938
i think stepfun is a nice sweetspot, 200B with 11B moes.
>>
5090ti waiting room
>>
>>108099957
already exists
https://www.nvidia.com/en-us/products/workstations/professional-desktop-gpus/rtx-pro-6000/
>>
File: 1757728351131791.png (70 KB, 1009x529)
lovin this, better than spanish telenovelas
>>
>>108099991
I guess there's some expectation to deliver because he managed to get qwen next in. But alas he fell back on vibecoding and it all backfired.
>>
I wish niggerganov was more like curl's maintainer. curl's maintainer would not give that sort of fucktard the time of the day and act so nice towards him. Open source projects that do well don't have milquetoast leadership.
>>
>>108099320
>Need day zero support the second the weights hit HF let become the unsloth of the reddits

https://huggingface.co/unsloth/Qwen3.5-9B-GGUF
https://huggingface.co/unsloth/Qwen3.5-9B-GGUF
https://huggingface.co/unsloth/Qwen3.5-9B-GGUF

Unsloth Dynamic 2.0 achieves superior accuracy & outperforms other leading quants.
Read our Qwen3.5 Guide here!

{# Unsloth template fixes #}
>>
Is structured outputs support not universal in llama.cpp? It works with GLM 4.7 but it doesn't with MiniMax...
>>
>>108100065
No, just like how tool calling needs to be reinvented for each new architecture.
>>
What is the best model for ERP on 24GB Vram?
>>
>>108100089
depends how much ram you have, if its gpu only the answer is not gonna be the same as if you are fine running a 100B moe model with most of it on ram.
>>
>>108100099
Sorry am noob, i have 64gb ddr4

thank you <3
>>
>>108094391
>Quanting embeds and outputs is insane.
who the fuck quants the embeds like that?!
>>
>>108100104
mistral small or nemo
or the finetunes cydonia / rocinante
you can run them at Q8 most likely
>>
File: file.png (45 KB, 1674x208)
>>
>>108099938
Some are just more honest about the size now eg
https://huggingface.co/google/gemma-7b (actually 9b)
>>
SGLang and the entire Python ecosystem makes me puke. Everything is held together with duct tape.
>>
>>108100206
literally nobody cares how incompetent you are.
>>
>>108100206
kernels are written in c so it doesn't really matter. what annoys me the most are crooks like ollama
>>
>>108100217
Said like a true pytard.
>>
>>108100217
Devstral, GLM-4.6V, MiniMax. None of them load.
I just had to edit the code myself with this to load MiniMax:
https://github.com/sgl-project/sglang/issues/13214#issuecomment-3553875109
I'm fucking tired of KeyError.
>>
>>108100272
>not using vllm or transformers
kinda gay
>>
>>108098422
https://github.com/ggml-org/llama.cpp/pull/19453
>Revert Qwen3.5 dense and MoE support (no vision) (#19435)
>Taking a step back to implement support for Qwen3.5 properly.

Vibeslop loses once again.

>>108100006
Literally all of his PRs are vibecoded.
And the few ones that do get merged typically have such terrible performance that even random contributors with limited understanding of the codebase are able to make huge optimizations.
>>
>>108100256
it does when python itself is a moving target that breaks from a swift fart. if everything is in kernels as you say, why don't we have a stable C ABI and bindings to whatever lang we want? why force people to use poothon?
>>
>>108100302
holy shit fuck off, no one cares about the language, it could be written in node, java, go, c# or whatever else , NO ONE FUCKING CARES
>>
>>108100272
>just skip loading some weights
really?
>>
File: file(8).png (85 KB, 978x371)
>>108100295
Don't be redisuclous
>>
>>108100302
you can pin python versions along with library versions (what literally every other language/framework does). The high performance part is already written in C so it's a non-issue, merely preference.
>>
File: file.png (197 KB, 680x432)
It's over
>>
>>108100340
YES! I'm 'ooming everywhere
>>
>>108100317
those weights are bloat
>>
>>108100340
that would be great. if only i could actually run the model.
>>
>>108100340
>uses DSA
support never
>>
>>108100295
>Literally all of his PRs are vibecoded.
welp, I thought that he learned something along the way
>>
>>108100353
Piotr is on the case with day -1 support planned
>>
>>108100340
Q2 is enough
>>
>>108100295
>Literally all of his PRs are vibecoded.
how did it get approved? son and cudadev sperged out about this
>>
>>108100340
Wait, so it's bigger? I sure hope pony is not it, because it feels dumber than 3.2.
>>
>>108100322
>merely preference
I prefer choice
>>
>>108100317
I don't know, their code is shit. Maybe they don't test splitting the model in multiple GPUs with pipeline parallelism.
>>
>>108100340
NovelAI bros...
>>
File: slop hn comments.png (241 KB, 1636x703)
hn is more and more filled with slop comments like these and they're so obvious you don't even have to read them, they are structurally repetitive down to the character and sentence count per paragraph to a T, which also makes me wonder what sort of dogshit LLM they're using to do this or if it's caused by an idiotic prompt
the slop gurglers are as a matter of fact the biggest proponents of LLM coding, and I suspect the number of humans who are legitimately positive about it, outside of the grifter crowd who would have been enthusiastic about crypto before LLMs (steve yegge and his gastown, clawdbot etc), is actually really small
>>
>>108100367
he has the commit bit, just merged it himself.
>>
>>108100367
son has ownership over mtmd mostly, cudadev has ownership of, well, the cuda kernels.
the model implementers are mostly vibesloppers (CISC et co)
>>
>>108100340
>44B active
Going to be slower than Kimi and Deepseek
>>
>>108100386
>>108100387
so this does not matter then? https://github.com/ggml-org/llama.cpp/pull/18388
>Forbidden Usage
>DO NOT write code for contributors.
>>
File: file.png (21 KB, 652x157)
>>108100415
i mean he? was part of that discussion so obviously influenced it in ?his favor
>>
>>108100385
You need to remember that slop was produced by scraping reddit/HN in the first place, so these comments are only mimicking real comments made by redditors. So I'm sure HN users won't be able to tell the difference.
>>
>>108100385
Very strong headcanon and coping.
>>
>>108100415
this matters more for the critical parts of the app (aka the KERNELS, where you 100% don't want vibecoded garbage; the code is hard to read and is generally DIFFICULT). To implement a model using existing kernels, vibesloppers are sadly allowed free rein
>>
File: 1710213177231501.png (78 KB, 336x347)
>>108100340
where u get this?
>>
>>108100446
asked glm47flash for a table
>>
>>108100322
>you can pin python versions along with library versions
you can pin stuff in a venv, but that only works if the software you're using doesn't depend on much external contribution (think plugins)
because a venv isolates your program from other programs; it doesn't isolate the individual modules used inside your program
a real common case of cancer caused by python, and which absolutely IS unique to python:
comfyui nodes often conflict with one another because custom node 1 requires library A version 1 and custom node 2 requires library A version 2, and those versions are incompatible
you can't fix that, it's literally impossible, and pulling an incompatible one into your venv = destruction incoming (plus all the subdeps)
Even JavaScript, one of the most hated languages on the internet, does not have this problem. Each module in node.js is a self-contained silo, pulling its dependencies for its own use without affecting anything else you import.
Not all languages solve it as well as js, but the other communities generally solve the problem in their own ways. C++ projects typically break APIs more rarely and have a real release cycle (llama.cpp is the exception, not the rule); it's also common to vendor dependencies in C++.
In Go you have semver modules, so you can pull two versions of a module into the same project just fine as long as their major versions differ
Rust has wonderful library versioning, etc
Fuck python
you guys have no hygiene and no understanding of what good code would even look like from a distance
just saying everything is fine like you do is evidence of ignorance and being dropped on the head
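To make the comfyui failure mode concrete, a minimal sketch with a made-up package name (somelib is hypothetical, the mechanism is not):

pip install "somelib==1.4"   # what custom node 1 pins
pip install "somelib==2.0"   # what custom node 2 pins; pip uninstalls 1.4 first
python -c "import somelib; print(somelib.__version__)"   # only 2.0 is left, node 1 breaks

site-packages is one flat namespace per venv, so the second pin always clobbers the first; there is no per-module resolution like node_modules.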
>>
File: 1759000909616367.png (1.33 MB, 1024x1536)
>>108099651
Makes perfect sense for China to set price at global levels. Which will decline when supply increases or demand drops ala >>108099665
Daily reminder anons should be selling any spare ram they are hoarding right now, not buying.
>>
>>108097959
moe is a meme i think.
yes it's faster, but they are retards compared to dense models, so what's even the fucking point.
>>
>>108100461
idiot moe is for the everyone futures
>>
>>108100462
Mixture Of ESLs
>>
>>108100340
I mean, secondhand MI210's are coming down in price. They have 64gb of HBM a pop. 8 of those and some mild quants, done.
>>
>>108100449
It's fanfiction? But it's in table format...
>>
>>108100446
Presumably from https://github.com/huggingface/transformers/pull/43858/changes
>>
meanwhile mistral and gemma:
>MoE? what are they?
the west hasn't been enthusiastic about MoEs for smaller open models
>>
>>108100487
at that point we need 1TB vram cards with 10TB/s bandwidth.
>>
File: 1753346519337108.png (44 KB, 509x221)
>>108100450
>you guys
I said I generally don't care. Also node can have conflicting deps (for which you will need to define overrides, see picrel for something I just screenshotted from work).
For other cases in node, you sadly end up pulling a lot of duplicate deps, but yeah, that's the only way you're gonna have an 'always working' dependency system (which will btw break too).
In Linux it works exactly like in Python btw: all your system libs depend on each other, and to install them with their own set of deps you have to do workarounds and compile things differently (statically). If you ever updated a rolling distro, you'd know that 80% of the libraries depend on glibc, and when it gets updated, you HAVE to pull all the packages that depend on it.
Again, I fucking hate python for other reasons (non-sensical ternary ops, forced indentation, 'def', garbage parallelism/threaded support), but its way of working with versions/libs is the least of its problems.
>>
>>108100492
gemini is obviously a moe
and mistral has their partnership with the asic inference thing so they can serve dense for relatively cheap
>>
>>108100504
>gemini is obviously a moe
the reading comprehension of /lmg/ retards never ceases to amaze
what part of
>for smaller open models
can refer to gemini?
you can preemptively add a lot of details in making statements just to be sure autists won't keep replying autistic things but nothing can stop the omega turbo autists.
>>
>>108100491
hmmm, so when should glm5 appear on openrouter? just hypothetically speaking. I totally plan to run a q1 quant on my 3090 and ram.
>>
>>108100461
>>108100492
>>108100517
>Repeating outdated discussions about MoEs again
>>>/aids/
We had this in the last thread already. Next on your agenda would be fp16 vs 4bit? We have Kimi and GPT-5.3 trained in fp4 and DeepSeek trained in fp8 now. Pack up your shitty bait and fuck off back to your dying containment general
>>
Retard here, how viable is it to run a 13b parameter mythomax model on a dedicated Rx 9060 XT 16gb? The Nvidia tax is too steep for me at over 35% price difference for the 5060 with 16gb. I know this is a rich people hobby with the vram requirements, but let a man dream
>>
>>108100539
Do you have the gpu already?
>>
>>108100533
base thanks
>>108100539
2023 called, it wants its model back
>>
>>108100533
>everyone that bashes my shitty architecture must be from aids or aicg
retard. why don't you post logs from your deep fried q2 chinky moe if it is so good? or are you embarrassed from the amount of pronounslop it generates after 17 minutes?
>>
>>108100539
You can fit 13b on 16GB VRAM, it's roughly 1GB per 1 billion parameters
I run high quant 12B's on my 3060 with 12GB VRAM with no issues
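Rough back-of-the-envelope math behind that rule of thumb, assuming the usual quants: at Q8 a weight is about 1 byte, so 13B is roughly 13 GB before context; at Q4_K_M it's closer to 0.6 bytes per weight, so the same 13B lands around 8 GB and leaves headroom on a 16GB card for the KV cache and compute buffers.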
>>
File: img.png (143 KB, 938x688)
>>108100449
glm47flash wouldn't get the math right
>>
>>108100548
Not yet, I'm looking at the prices in my region, Nvidia anything with 16gb is way too expensive, even the 3060 12gb is about 95% the price of that 9060. Used market doesn't exist, the third world hasn't caught up with the concept of a GPU, even the international used market (ebay) seems like a terrible deal after shipping prices and import taxes
>>
>>108100581
Ok. I was about to call you a retard or a pussy for not just trying to run it.
Yes, it will run fine. Run mistral-nemo-instruct. mytho is old.
>>
From kimi k2.5, notice the lack of spatial awareness that all moes exhibit.

The rain had turned the cobblestones of Vel Morath into a silver mirror, reflecting the torches of the Sunken Archives. Sir Corvin adjusted his shield, the leather creaking in the damp air, and watched as Elara pressed her palm against the moss-covered door.

"You’re certain this is the place?" Corvin asked, his voice a low rumble. "The last scholar who came here didn’t return. Just… scorch marks and a smell of ozone."

Elara didn’t look back. Her fingers traced the ancient runes, glowing faintly at her touch. "The *Codex of Hollow Stars* is here, Corvin. I can feel it humming." She pushed, and the stone groaned open. "Besides, you’ve fought dragons. What’s a dusty old library to you?"

"A library that bites."

They descended into a rotunda where the air tasted of rust and forgotten magic. Dust motes danced in the beam of Elara’s lantern, illuminating shelves that spiraled down into darkness. It was beautiful—and wrong. No cobwebs. No rats.

"Too clean," Corvin muttered, hand on his sword hilt. "Elara, step back."

"Nonsense. Preservation enchantments often—"

The floor trembled. From the shadows between the shelves, something shaped like a lion but wrought from clockwork and starlight unfolded itself. Its eyes were sapphires that clicked with mechanical precision as they locked onto Elara.

"INTRUDER," it intoned, voice like grinding millstones.

"Oh," Elara whispered.

Corvin shoved her behind him, shield raised. "I told you. Libraries bite."

The construct lunged. Corvin met it with steel, the clash echoing through the chamber. Sparks flew as his blade scraped against brass ribs. He grunted, driven back by the weight.

"Elara! The *Codex*—now!"

She scrambled past the fray, spotting the book on a pedestal at the room’s heart. It was bound in midnight blue leather, chained with silver. As she reached for it, the chains slithered like serpents.
>>
>>108100619
>Elara
Every time
>>
>>108100619
>elara
stopped reading there, promplet
>>
elarasex
>>
>>108100619
You should use the energy you saved by not jerking off to engineer a dedicated spatial awareness benchmark for labs to benchmaxx on
>>
>>108100638
Post your logs then, eslmonkey.
>>
>>108100655
>muh moe dumb eeeuuuuu
post your big dense cock model not doing the same mistakes retard, maybe then we can talk
>>
File: 1762512449247902.png (399 KB, 621x855)
>>108100340
They really pulled a Kimi on us.
>>
File: 1752146363026699.jpg (65 KB, 479x640)
>>108100340
>>
File: a00p86[1].png (1.56 MB, 1024x1024)
Is there still anything better for text to speech than
GPT-SoVITS? I feel like anything else I tried was worse or a sidegrade at best. Kokoro is surprisingly awesome for something so small that you can even run it on a CPU, but it has no voice cloning.
>>
>>108100340
This is why /lmg/ is dead btw
>>
>>108100661
In my experience glm 4.5 air, gpt-oss 120b and mistral large 123b all have trouble figuring spatial relations in a sexual context.
>>
>>108099991
And people still refuse to port things from ikllama back to main fork. Imagine the drama.
>>
>>108100719
in mine too, that's why I dared the retard to back it up instead of just dumping a turd in the thread and expecting a discussion about it
>>
File: file.png (295 KB, 604x453)
>>108100340
>>
>>108100340
I am incredibly sad that I got left behind by Z.AI but I will cope by saying that at least they also confirm there is no way to improve anything in this bullshit hobby without increasing the parameter count.
>>
>>108100765
My cope is that they're going to safety slop the thing to hell and back.
>>
>>108100713
What did you try so far?
>>
>>108100366
Q1 is enough
>>
>>108100661
Quit sperging out and post logs, otherwise your argument is invalid.
>>
>>108098294
https://huggingface.co/togethercomputer/GPT-NeoXT-Chat-Base-20B (use or make a gguf, obviously)
>>
>>108100884
"max_position_embeddings": 2048,
>>
>>108100765
Just invent something else that isn't transformers.
>>
>>108100367
>>108100387
I personally don't care whether or not a human wrote some piece of code, I only care about the code quality.
I was and still am in favor of banning randoms from submitting machine-generated code entirely because a rule requiring that they carefully check the code before opening the PR is not enforceable and results in a lot of wasted time for maintainers.
For repeat contributors with political standing my opinion is that it's fine to make exceptions because for them the cost of violating the trust of maintainers is much higher.
I cannot comment on this particular case because I did not look into the specifics and it's not a part of the code where I am taking the responsibility for maintenance.
What I can say is that when he previously submitted machine-generated CUDA code I vetoed that particular implementation due to poor maintainability.
>>
>>108100654
>dedicated spatial awareness benchmark for labs to benchmaxx on
stop using words you don't understand
>>
>>108100938
>For repeat contributors with political standing my opinion is that it's fine to make exceptions because for them the cost of violating the trust of maintainers is much higher.
lolmao obviously why even say anything
>>
>>108100940
NTA but it wouldn't be that hard to make a cockbench but for spatial reasoning.
>>
File: Untitled.jpg (17 KB, 277x273)
>tfw there will never be a Nemo2 for RP
Call me crazy but I think even Mistral Small is worse despite being twice as big and from the same company. It makes more logical errors about the situation in my RP scenarios and is less realistic and more book-like, which feels worse in RP. Mistral Small shits itself less with long context though, this is true.
Gemma-3 27b is REALLY good for its size in terms of not getting confused but the writing style and level of censorship is really unpalatable unless you want PG stories.
>>
>>108100955
Mistral Nemo 12B is one of the last Western open-weight models trained on almost every pirated book that could be found. Something changed with Mistral Small 3.0 when they briefly pivoted toward "safety" (remember sea otters?) and boasted about how lean their pretraining dataset was.

https://venturebeat.com/ai/mistral-small-3-brings-open-source-ai-to-the-masses-smaller-faster-and-cheaper
>[...] The model was trained on 8 trillion tokens, compared to 15 trillion for comparable models, according to Lample. This efficiency could make advanced AI capabilities more accessible to businesses concerned about computing costs.
>>
>>108100938
is there a good way to benchmark single (cuda) ops in llama.cpp right now? i wanted to play around with kernels for a bit.
>>
>>108101010
What's even the point of not training on books anymore? The publishers have all but given up on litigation, now focusing on slapping the AI companies with piracy charges for the original sin of torrenting the books rather than anything to do with training the model
>>
>>108100340
what are mid tier folks supposed to run now that zai is abandoning us?
>>
>>108101010
To me the best way to see if books reappear in training sets is when models stop using 3rd person singular "they" when talking about someone. That is how you know they only know internet speak.
>>
>>108101023
The Meta lawsuit was still ongoing when the model was trained.
>>
>>108101010
why won't the chinese do it? and I also can't wait for yandex to get their shit together since piracy is a national tradition in russia
>>
>>108101016
./build/bin/test-backend-ops perf
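If the CLI hasn't changed since I last touched it, you can also narrow it to a single op and backend instead of running the whole suite, e.g.

./build/bin/test-backend-ops perf -o MUL_MAT -b CUDA0

and once the kernel is wired into the graph, llama-bench is the tool for end-to-end numbers.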
>>
>>108101035
I think Mistral has some licensed book datasets that they can use for training, but it's definitely not the same as using the entirety of Anna's archive, Libgen, etc.
>>
>>108101043
Nvidia is still bragging about using 0 novels in their recent model drops, I don't think the trend is over
>>
>>108097959
is sillytavern still king for RP or do we have better shit now?
>>
>>108100713
Yes. Piper. Why? Shit actually supports it. Scratchy/deterministic is better than "nothing supports it".
>>
>>108101071
>0 novels
Between this, the religion of safety, ram prices, everyone using scaleAI, the main goal of the tech being to remove office jobs... This hobby is actual hell, isn't it? The only way it could be worse is if all companies just kept weights to themselves and no leaks happened. Everything else is pretty much as bad as it can be.
>>
>>108101023
>charges for the original sin of torrenting the books
Which is illegal, and that's why they go after it from that angle. What else could they do? Big companies could just buy those entire libraries, though. However...
>What's even the point of not training on books anymore?
Synth is better structured and easier to train on, even if the results are more boring. They don't have fun in mind when making these things. Not getting sued for doing something illegal is a good reason as well. That only applies to western companies, but given the chinks also use synth datasets, it has the same effect on them.
>>
>>108101107
>CPU but no voice cloning
>>
>>108100916
Aww come on, don't you want your GF to keep forgetting you're dating or had sex etc? That's the OG c.ai experience.
>>
it's chinese model week
are you ready for some succulent chinese models?
>>
>>108100713
>>108101123 (me)
To add something closer to what anon wants, there's pocket tts if you haven't tried it yet. I don't think it's as good as gpt-sovits.
>>
>>108101123
You absolutely can clone voices with piper: https://huggingface.co/quarterturn/kuroki_tomoko_en_piper
>>
>>108100949
I didn't take issue with the possibility of creating such a benchmark, just with the idea that creating yet another benchmark that can be memorized and benchmaxxed would do anything to solve the spatial reasoning problem
>>
>>108100713
Qwen3-tts if you're not a weird contrarian
>>
>>108101148
The trend of trinity (retarded), step (retarded), GLM 5 (too big) doesn't inspire confidence...
>>
>>108101064
thanks! time to find out how retarded i truly am.
>>
>>108101148
V4 fucking when?
>>
>>108101179
trinity is american thoughever
>>
>>108100585
Huh, thanks for the rec. I read the OP, but I originally ignored this model since it's labeled as ERP, not really my usecase, but it seems to be overall better
>>
>>108101165
>Piper text-to-speech model *trained* against...
If you take finetuning as a reasonable avenue, sure. EVERY tts model can clone voices.
>>
>>108101192
let dhem coock
>>
Nemo is the fucking GOAT, it's incredible how much it punched above its weight
>>
>>108101194
If you haven't run any llm, nemo is just fine overall, not just rp. Once you have something to run models on you can try other things and see what you like best, like qwen 30b3a and stuff like that. You may be able to run up to 32b dense models with quantization and maybe leaving some layers on cpu. But don't worry about that now. Any usage tips would be kind of useless until you have something to run them on.
>>
I am sure deepseek and glm will swap and V4 is gonna be half the size. And even if it isn't then GLM5 Q2 is still gonna be great. R-right?
>>
>>108101272
>V4 is gonna be half the size
Yes, yes.... double if you count the engrams though
>>
File: chess1.png (125 KB, 2124x1142)
>>108097306

Setting up to try this, I'm so excited to find out what happens!!!
>>
>another year of touching my penis to 4.6...
Could have been worse. Could have been nemo.
>>
I just gave glm 4.5 a document to translate and am getting this. I'm a complete beginner with text gen webui.
>>
>>108101459
You set your context window size to 8192 and the document is longer than that (in tokens).
>>
>>108101459
>Increase ctx-size while loading the model to avoid truncation.
>>
>>108100340
I hope they aren't retarded and went for QAT with this. I don't see a point in running this at Q8 when I can run K2.5 at the QAT target at double the speed unless GLM5 is an absolute game changer.
>>
File: shithub.png (187 KB, 1195x798)
>>108101111
>the main goal of the tech being to remove office jobs
poorly
I see this page more and more often on shithub ever since they turned all their attention to adding ai features nobody asked for
I have yet to see an example of a product that was improved by the integration of an LLM, or by the use of one (by vibeshitters)
>>
>>108101071
That's obviously to fuck with the lawsuits. This is the same company that approached that one piracy archive for their entire book collection.
>>
>>108101552
I could believe they deleted the books. Two facts:
they create a massive corpus of slop artificial data with low parameter models like 30BA3B qwen probably because they didn't want to pay what it would cost to rewrite much of humanity's content with a large model:
https://huggingface.co/datasets/nvidia/Nemotron-CC-v2#data-overview
it is huge, and it is gigaslop, and sufficient to serve as the basis of model pretraining
secondly, if you actually give their newer models a try (v2 and v3 nemotrons) they are some of the sloppiest models in existence, reflecting the weight of artificial data.
>>
decade of nemo
>>
File: file.png (174 KB, 1652x444)
>30 minutes to load a model
>>
GLM5 will be the ChatGPT moment of local AI
>>
File: 1743729101239468.png (6 KB, 1203x29)
>>
File: 1746547160053728.png (664 KB, 1377x441)
https://huggingface.co/spaces/openbmb/MiniCPM-o-4_5-Demo
>>
>>108100938
>What I can say is that when he previously submitted machine-generated CUDA code I vetoed that particular implementation due to poor maintainability.
least surprising statement of the year
>>
>>108101773
8tb nvme was a good purchase a couple of years ago
>>
Maybe GLM5 will motivate llama.cpp to actually bother implementing DSA instead of mangling the model into full attention like with DS3.2
>>
>>108101902
https://github.com/ggml-org/llama.cpp/pull/19460
>>
>>108101471
>>108101485
Increased to 32k, no errors now. Didn't realize I had to reload the model.

It's very slow at giving me replies, it's like one word every .3 seconds or so. What's up with that? I'm on a 5090.
>>
>>108101915
>vramlet
>fuckhuge model
What did you expect?
>>
>>108101915
You are probably going over your VRAM and into your RAM using the Nvidia driver's fallback, which is slow as fuck.
You need enough vram for the model + the context and the pp buffer.
The longer the context window, the more memory it takes.
So you might want to lower the pp buffer (batch size), lower the context window length, or put some of the model in RAM (layers or tensors).
Read the stuff in the op, there's even a calculator that might help.
>>
>>108101952
Oh, and enable flash attention if you haven't. It saves quite a bit of memory.
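A hedged example of what that looks like as a llama-server invocation (flag spellings differ a bit between builds, and the webui's llama.cpp loader exposes the same knobs under similar names):

llama-server -m model.gguf -ngl 99 -c 16384 -b 2048 -ub 512 -fa on

Lower -ngl a few layers at a time until nothing spills, or shrink -c and -ub first since the prompt-processing buffers scale with them. Also consider turning off the driver's sysmem fallback (the "CUDA - Sysmem Fallback Policy" setting on Windows) so you get a clean OOM instead of a silent crawl.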
>>
>>
>>108101952
>>108101963
>>108101459
I don't recognize this console output but anon should switch to llama-server that does all of this automatically.
>>
>>108101988
Oh yeah. There's the --fit param now.
>>
>>108101906
>piotr already on it
oh no
>>
thank you drummer, I really enjoyed your latest finetune of
>>
>>108102010
The bot EOSed early.
>>
>>108101972
i tried his garbage assistant pepe
NEVER AGAIN
>>
>>108101800
im horny
>>
>>108101952
Ah right, forced to go this slow then. The model is filling up about 90% of my vram, with the rest spilling into ram. Thanks.

>>108101963
>>108101988
I was hoping to keep it as simple as possible, I despise using anything that doesn't use GUI.
>>
File: 1739991643824322.jpg (91 KB, 700x763)
>>108102076
Same

The local version also has voice cloning
>>
>>108101800
Is this model supported by llama? A complete package model sounds nice instead of fiddling with random components.
>9b
Uhh.
>>
>>108101800
>video call
what the fuck?
>server at capacity
AIEEEEEEEEEE
LLAMACPP SUPPORT WHEN?!?!? NGXSON FUCKING CODE IT U DOUBLE FAGIT
>>
File: 1740038665978432.gif (2.25 MB, 498x280)
>>108102134
>>108102147
I'm using https://github.com/OpenSQZ/MiniCPM-V-CookBook/blob/main/demo/web_demo/WebRTC_Demo/README.md right now

Runs perfectly on a Mac M4, don't know about llama
>>
>>108102170
So, is it good? What are you doing with it?
>>
>check op
>Download Mistral-Small-3.2-24B-Instruct-2506-GGUF
>Generation starts out ok, but after a few chats bot loops into retardation
>Try cranking up the penalty for repetition
>Does nothing keeps repeating the the same phrases over and over.
Am I missing something obvious? I last tried this a year ago with some random mistral model and it just worked tm
>>
>>108102170
just found this, made by them apparently
https://github.com/tc-mb/llama.cpp-omni
>>
>>108102170
>llama-omni
Is this in kobold?
>>
>>108102187
i think we don't really want to know what they are doing with it...
>>
>>108102208
>>108102204
oh nevermind. It's their own fork.
>>
File: 1754764023935996.png (255 KB, 1278x1430)
>>108102204
huh, looks decent
>>
>>108101800
>realtime
>turn based
nigger
>>
>>108102083
Why are you using glm 4.5 for a simple translation job anyway, you have translate Gemma for that and the biggest one fits entirely in your vram
>>
What models are good for fetish-focused explicit writing? I've tried like 4 or 5 and they tend to give very terse answers. Many just jump right into sex. aeline/halo sort of meets what I'm looking for but I'm on the lookout for anything better.
>>
>>108102197
what are your other sampler settings? specifically top_k and top_p (also plz turn off all the other snake oil shit)
mistral small and ministral are recommended to be run at low temperature, and from my testing I concur they truly are best at 0.1 or 0.15, but they don't take kindly to cutting off the token distribution heavily with top_k and top_p.
I'd go as far as to say they perform best with top_k disabled altogether. It's weird, but it really works that much better that way. Low temperature doesn't make the model behave greedily because of its very flat distribution, and too high a temp will make it go full retard.
Both it and gpt-oss are like the anti-Qwen: Qwen models are dogshit if you don't cut off their distribution.
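For reference, the kind of neutral setup I mean, written as llama-server flags (the SillyTavern sampler fields have the same names if you'd rather set it there):

llama-server -m mistral-small-3.2.gguf --temp 0.15 --top-k 0 --top-p 1.0 --min-p 0.05 --repeat-penalty 1.0

i.e. temperature as the only real knob, top-k and top-p effectively disabled, rep penalty neutral; the min-p value is just my habit, not gospel.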
>>
File: file.png (29 KB, 831x131)
>>108102317
lrn2read
>>
>>108102170
Give examples of audio cloning. 7B is obviously too stupid for chatting, but if it can be used as dedicated TTS with context understanding and correct prosody, it'd be nice.
>>
>>108102597
I actually think 8B is the LLM and the other 1B is the TTS / STT. Either that or I'm retarded, gonna look into this tomorrow
>>
how do u make your model not cooperate with you so easily. i am trying to roleplay but basically the model makes every character do what i want, nobody says fuck off or takes the wheel to rough me up. it's kinda boring desu baka senpai.
>>
>>108098294
llama 2 7b is pretty good for incoherent slop
>>
>>108102738
Try to write stories rather than RP. If the setup assumes that "user" doesn't exist in the story, it can make the llm less sycophantic.
For example, say that you're writing a novel and the llm's job is to portray {{char}} in that story. There is no {{user}}; it's just characters in the story.
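Something along these lines as the system prompt (the wording is only an example to adapt, not a magic string):

You and I are co-writing a novel. Write the next passage in third-person prose, portraying every character according to their own goals and personality. There is no "user" in this story and nobody to please: characters may refuse, lie, argue, or escalate if that is what they would do. Do not ask what should happen next; just continue the scene.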
>>
File: setting out.jpg (310 KB, 1024x1024)
weeeeeeeeee
>>
If CEO's start getting replaced with AI do you think they will make them cute anime girls, or do you think they will still be faceless machines?
>>
>>108102843
The irony is that CEO is a perfect kind of job for an LLM to replace and current LLM's should easily manage that.
>>
>>108100340
>new model comes out
>like 750b parameters for some reason
>no version that can run on consoomer hardware
why?
>>
>>108102881
>he doesnt have an epyc with 1tb ram
LMAO
>>
Aurora-alpha on OpenRouter could be a new version of gpt-oss:
https://openrouter.ai/openrouter/aurora-alpha
>>
>>108102558
This was a default SillyTavern install, I will double-check the values shortly, I'm just at work.
>>
Ok give it to me straight. On a scale 1/10 how much of a meme/cope is REAP? Cause I may want to try it on GLM5.
>>
>>108102584
>lrn2read
this is not what their demo does nigger.
>>
>>108102896
and how much t/s does that get you ?
>>
>>108102919
we must refuse
>>
>>108102896
hey if you wanna buy me some hardware then go ahead :3
>>
>>108102956
extreme meme
>>
>>108102956
are you gonna do agentic/coding/work stuff? then it's ok. You wanna coom/rp? DO NOT USE
>>
>>108102980
Can't you prune the exact opposite experts? Or actually which experts are even pruned in the first place?
>>
>>108102919
>GPU fan whirrs up
>coils whining
>electricity meter acting like bitcoin
"I'm sorry but..."
>>
File: 1769242392096.png (370 KB, 1592x1688)
>>108102956
I wouldn't dare to use any reap model even for programming.
>>
>>108102991
rape measures the activations against a dataset (usually benchmax datasets). I'm not familiar with how to do it myself, but I'm sure you could provide an RP dataset to only keep the language related part
>>
>>108103005
Nobody has reaped a model with a proper RP/chat/general dataset. Only for codeshits. A reap could kill the chingchong runes and fix a bunch of things but you'd need the disk space for the full model.
>>
>>108103050
>A reap could... fix
absolutely nothing; any manner of prune shite is far worse brain damage than even iq1 quanting.
>>
>>108103077
if i'm gonna take 25% of your brain's mass,
would you rather have me take a random whole section of it,

25% of neurons at random,
or a bit of every neuron?
>>
>>108103077
RAEP removed GLM's alignment.. it can fix some stuff as long as you don't throw away language performance for benchmax code. ZERO people tried.
>>
>>108102964
nta, but 13t/s with kimi @ q4
cpumaxxing a couple of years ago ended up being the winning move. The homegrown lmg reference architecture has been able to run anything (even dense) that came out, and all the moe models at at least reading speed (ie everything close to SOTA)
2 years of actual use vs costs-more-than-a-house VRAM maxxers or cope "its not worth it" never-CPUers.
API-fags opinions discarded because /lmg/
>>
>>108103180
>13t/s
man so close, if we can push to 20 or 30t/s i'd consider it desu.
was it with ddr5 epyc or ddr4?

i'm not gonna use a model i can't get to at least 20t/s but ideally 30 or even 40t/s
more than that i don't care much.
>>
Engrams will save local.
>>
n-words will save local
>>
Have any completion only models like text-davinci-003 ever been leaked? That was probably the best experience I've ever had with AI.
>>
>>108103217
Dude that would be amazing and also a good reason to short nvidia lol.
>>
>>108103199
>was it with ddr5 epyc or ddr4?
ddr5 4800. if we get good NUMA behaviour in lcpp then 20t/s is achievable, but i don't think anyone is seriously working on it and it's been years...
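Napkin math for why ~20 t/s is the right ballpark, assuming a 12-channel DDR5-4800 socket and a Kimi-class MoE with ~32B active params at ~4.5 bits per weight: 4800 MT/s x 8 bytes x 12 channels ≈ 460 GB/s per socket, and 32B x ~0.56 bytes ≈ 18 GB of weights touched per token, so ~25 t/s is the theoretical single-socket ceiling; 13 t/s measured and 20 t/s with proper NUMA both fit under that.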
>>
>>108103250
You're thinking of base (pretrained) models
>>
>>108103217
Bad news bears. They quantize poorly.
>>
>>108103269
Not an issue.
>>
>>108103262
Thanks anon, I'll consider it when ram has come down, if engrams end up being a meme.
I'm ready to blow like 10k.
>>
>>108103269
We could store them on nvme so doesn't matter.
>>
>>108103305
worse news...nvme prices got memed back up : /
>>
>>108103305
If you bought EPYC Turin and used EVERY SINGLE PCIE LANE in RAID0 with fast enough NVMEs to saturate the lanes, you'd get aggregate bandwidth around 600GB/s, which is actually pretty good. Latency would be killer tho.
What a franken-rig that'd be, holy shit. I'd love to see someone actually build it. The BOM would look crazy
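Quick arithmetic behind the ~600 GB/s figure, assuming Gen5 x4 drives: PCIe 5.0 x4 tops out around 15.7 GB/s and good Gen5 NVMe drives sustain roughly 14 GB/s of sequential reads, so 40 drives x ~14-15 GB/s ≈ 560-630 GB/s aggregate, which is 12-channel DDR5 territory on paper, except every expert fetch pays NVMe latency instead of DRAM latency.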
>>
>>108103401
>160 lanes
>4 lanes per nvme
dear god...
>>
>>108103434
What is cheaper... 1 Tb of ram or 40x1Tb nvme drives?
>>
>>108103434
>>108103493
Yep, you'd need dual socket and 40 drives and a board that let you adapt all the lanes to occulink or slimsas or something
And you'd still only get 1/4 of the bandwidth of main memory!
However you'd be able to build it out at 100x cheaper (or 100x capacity for the same price).
petabytes of "memory", anyone?
>>
>>108103513
finally, the build for toss2 72T A8B at home
>>
>>108103493
It would be fucking cool to be able to infer against arbitrary models. Switching would be instant! You could have a router model or write some arbitration logic into lcpp to select models/bitdepth depending on the task
>>
>>108103493
>>108103513
the drives would degrade very quickly, even if you use enterprise drives. your build will die within a few months of moderate use.
>>
>>108103570
>the drives would degrade very quickly, even if you use enterprise drives. your build will die within a few months of moderate use.
mount read-only. Only remount read-write for adding models.
>>
>>108103570
how much does reading damage ssds anyway? do they give official numbers?
>>
>>108103587
reading doesn't damage, but after too many reads you get voltage leakage and need to re-write the block.
at raid0 you'd probably be 50/50 on losing the array every year due to some stupid failure. RAID10 probably makes more sense and would still be cheaper and allow some failures without having to recreate the whole array every time there was a blip. Read speeds would still be pretty high (500GB/s?)
>>
>>108103644
With such a huge raid, you'll be probably bottlenecked by some single-thread kernel function.
>>
File: 1.png (71 KB, 757x1060)
>>108103570
>your build will die within a few months
it wouldn't last too many years but this is pure unadulterated bs fearmongering
every time someone did actual endurance testing on a variety of ssds most far outlived the expectations of both the public and the manufacturer's own metric
pic related shows drives rated for 150tbw like the old 850 pro 256gb lasting upwards 7500 tb
these are writes but I bet you filthy nigger spamming this thread about le evil of reads every time the topic comes up are also full of shit just like people who were crying about writes (I remember people who did shit like put the browser cache on an external spinning rust drive because they feared doing too many writes on their SSD LE FUCKING MAO)
>>
>>108103644
raid 1 and 0 have the same read speed, raid 0 only has more capacity and write speed.
>>
>>108103570
No reason why that should be true if he's mmaping 1TB of model weights.
>>
>>108103685
>With such a huge raid, you'll be probably bottlenecked by some single-thread kernel function.
Really tho? High capacity, high-speed RAID arrays are a pretty standard enterprise thing. I'd be shocked in even out of the box untuned kernels in major distros were close to the theoretical limit.
Have you ever hit a bottleneck in the field?
>>
>>108103685
dragonflybsd will finally be relevant again
>>
>>108103287
35k just for the 1tb ram itself nowadays. I'm seriously considering M5 ultra and cope quants.
>>
>>108103747
Honestly, no. But 40 drives in a single array will definitely need lots of cpu cycles. And since you need raid not just for redundancy like enterprise, but for maximum bandwidth of hundreds of gb/s, I assume that at some point you'll hit a synchronization bottleneck.
>>
>>108103773
if m5 ultra ships with more than 512gb on-die and/or retails for less than $20k I'll be shocked
>>
>>108103796
You'd need a half-dozen lanes for networking and a gpu link for prompt processing to make it actually real-world usable, too.
Or maybe console-only inference? Complete airgap? 1337
>>
>>108103434
I fully expect to see this in some clickbait youtube tech channel and at the top of hacker news within a month. Mention lmg, ya filthy animal!
>>
>>108103834
>Mention lmg
Don't, we have enough tourists.
>>
>>108103813
I had hopes they might up it to 1TB for the M5 Ultra, but there's just no way after the memory price hikes.
>>
File: 0548473.jpg (92 KB, 1270x606)
zuck and wang will save local
>>
>>108103904
>inb4 1.5T param MoE
>>
File: image.jpg (8 KB, 460x109)
>>108103904
>source
>>
>>108103904
omg i believe it 100%! what is doubt?
>>
File: i_believe_you.png (592 KB, 747x800)
>>108103904
>>
>>108103904
>these new models are better than the old models, which were worse than llama 2-3

its over
>>
>>108101114
Synth kills generalization
>>
>>108103904
llamabros!!!! we're so back!!!!!! i never doubted!
>>
>>108103904
That's the exact opposite of the last we heard of Avocado.
>local
Doubt.
>>
>>108103904
I don't believe this. This is just more hype to attract capital investment like usual.
>>
>>108103904
IF this is true, and that's a big IF, I think that's the nail in the coffin for "open"AI
>>
>>108103050
>real communism has never been tried
>>
>>108104024
Generalization doesn't matter if you're benchmaxxing.
>>
>>108104466
>>108104466
>>108104466
>>
>>108103493
If you do a raid 0 you may as well go with the smallest nvme you can find, aggregate capacity would be more than enough.
>>
File: phonk-skull.gif (1.2 MB, 220x220)
>>108103904
>New model beats previous model
Oh boy, it's a <30B parameter model that's distilled to fuck to pass very specific benchmarks.
>>
File: 1758824081804478.png (492 KB, 917x900)
>>
File: UNO reverse 126285129_p0.jpg (2.66 MB, 2389x4586)
>>108104769
>>
File: 1755812647657264.png (61 KB, 276x225)
>>108105196
>>
File: Nhim Sasuke 138838790_p0.jpg (1.97 MB, 1491x2048)
>>108105213



