/g/ - Technology






File: winter miku.png (1.79 MB, 768x1344)
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>107412042 & >>107405479

►News
>(12/02) Mistral Large 3 and Ministral 3 released: https://mistral.ai/news/mistral-3
>(12/01) Trinity Nano (6B-A1B) and Mini (26B-A3B) released: https://arcee.ai/blog/the-trinity-manifesto
>(12/01) Merged: model: support Ministral3 #17644: https://github.com/ggml-org/llama.cpp/pull/17644
>(12/01) DeepSeek V3.2 and Speciale released: https://hf.co/deepseek-ai/DeepSeek-V3.2-Speciale
>(11/28) Qwen3 Next support merged: https://github.com/ggml-org/llama.cpp/pull/16095

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
>>
File: migu.jpg (52 KB, 736x649)
►Recent Highlights from the Previous Thread: >>107412042

--Troubleshooting persistent prompt caching in llama-server with slot API endpoints:
>107415735 >107415774 >107416053 >107416084 >107416666 >107416689 >107416734 >107416757 >107415832 >107416062 >107416103 >107415898 >107415914
--Mistral model's formatting issues:
>107416083 >107416554 >107416580 >107416560 >107416641
--Z-Image-Turbo Chinese prompt optimization with low VRAM solutions:
>107415693 >107415700 >107415702 >107415718 >107415725 >107416187 >107416711
--Critique of Mistral's model release strategy vs Llama 4:
>107420881 >107420930 >107420989 >107421043 >107421050 >107420983 >107420984 >107421070 >107421055
--Designing neural networks for specific cognitive functions:
>107412634 >107412692 >107412725 >107412778
--RTX 5070ti suitability for local models and overcoming token repetition challenges:
>107412315 >107412386 >107412419 >107412495 >107412541 >107412595 >107414188
--DDR5 price spike and DRAM market implications for consumers and tech companies:
>107421733 >107421959 >107421962 >107421993 >107421996 >107421969 >107422007
--AI model performance and usability debates:
>107420176 >107420784 >107421608 >107421728 >107421998 >107422102 >107422145 >107422210 >107422360
--Minimalist AI/ML learning resource recommendations for webdev transitioning to AI:
>107418678 >107418849 >107418882 >107418913 >107419160 >107419241
--Testing VibeVoice's Japanese and high-pitched voice capabilities:
>107419296 >107419308 >107419333 >107419339 >107419364 >107419372
--Perceived stagnation in LLM pre-training advancements:
>107416311 >107416340 >107416369 >107416402
--Tetris clone generated by Ministral 3 14B Q4 using pygame:
>107417044
--Nvidia's involvement in MistralAI's new model training:
>107421114
--Miku (free space):
>107416083 >107416641 >107418984 >107419160 >107419653 >107421986

►Recent Highlight Posts from the Previous Thread: >>107412048

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>
https://huggingface.co/NousResearch/Hermes-4.3-36B-GGUF
>Hermes 4.3 36B is a frontier, hybrid-mode reasoning model based on ByteDance Seed 36B base
>Seed 36B base
first time I've heard of that model
>>
>>107423050
Funny. Second time I've read that post.
>>
>>107423174
almost as if it's useless to ask a question on the end of a thread or something
>>
File: 1761117575776279.png (10 KB, 957x596)
Why doesn't Ministral work in ooba?
>>
>>107423191
you're using ooba trying to run a new model
>>
>>107423206
Yeah you're right. Silly me! Haha!
>>
>>107423050
>hybrid-mode reasoning
Into the trash it goes. Seriously though, a dense 36B reasoner with 512K context might be good for low-VRAM agentic coding.
>>
>>107423188
>ask a question
You didn't ask any question.
It didn't take much to find it
>https://huggingface.co/ByteDance-Seed/Seed-OSS-36B-Base
>>
>>107423221
>You didn't ask
it was implied, but only the 3 digit IQ could get that, sorry for your loss
>>
>>107423230
How much IQ do you need to use hugging face's search, then?
>>
>>107423239
how much IQ do you need to understand that I wanted to hear people's genuine opinion of that base model, and not a model card shilling their own product? are you retarded or something?
>>
>>107422991
>jacket
>scarf
>bare thighs
Is Miku retarded?
>>
>>107423217
dense is fucking shit bro
no reason to use it over moe
>>
>>107423245
>blablabla
>Never heard of it. Is it any good?
>>
>>107423270
>Yes, I'm retarded and autistic, and I can't read the room
yeah I know
>>
>>107423261
That's why Maverick 402B was better than the shitty old and dense 405B, right?
>>
>>107423191
The best decision you can make in this hobby is to move from ooba (piece of shit) to llama.cpp. It is like moving from windows to troonix if troonix was actually good.
>>
>>107423247
yes
>>
>>107423308
dense 405b was fucking shit too, they're both about equally bad
qwen showed with qwen3-32b dense and 30b-a3b that dense is utterly worthless and only helps people cope with overspending on hardware, since the latter was superior
>>
>>107423247
that's just women in general
>>
>>107423284
Communicating effectively is very useful, anon. You should try it.
>>
>>107423326
superior on benches?
>>
>>107423352
it was an effective communication, but you're too retarded to get it, not my fault your IQ is too low, blame god for that or something
>>
need magistral large 3
>>
>>107423375
need mistral medium
>>
>>107423363
It shows. You got a lot of useful replies.
>>
>>107423410
All because of tismo derailing.
>>
File: G7NSq1aW0AAeot-.jpg (192 KB, 1586x1036)
lol. mistral large 3 is slightly better than a 9B / llama 4 maverick. Not even close to glm / deepseek / kimi...
>>
>>107423410
>You got useful replies.
I did yeah, again, are you retarded or something?
>>107423217
>>107423261
>>
If someone had actually supported and quanted the cohere with vision, i would have used it. Nobody did that tho.
I do need to revisit the ones that work since everything else is parrotmaxxed. Not like we're getting a large or have anything to look forward to anymore.
>>
>>107423435
that is what happens when EU law forces you to strip all copyrighted data out of your datasets.
>>
File: 1711104757843628.jpg (123 KB, 768x1024)
>every model is bloated 1000TB moe
>they're still all slopped garbage
>nobody bothers to finetune anymore which was the theoretical advantage of local models
>rammaxxing was already a meme and is now fully dead
is it time to accept that it's really, truly over this time?
>>
>>107423435
>their own medium is quite a bit higher
wow I know migastral is a thinker but still
>>
>>107423473
just stop being poor, though ram costs an arm and a leg now
>>
>>107423440
You could have gotten that out of the model card, anon. The one you could have found yourself.
>>
File: crysad.jpg (136 KB, 1887x1742)
>>107423473
Looks pretty over yea. Be happy with what you got and maximize it.
>>
Sirs when Ganesh Gemma 4 to increase Bharati izzat?
>>
>>107423483
>You could have gotten that out of the model card, anon.
all right he's genuinely braindead >>107423245
>how much IQ do you need to understand that I wanted to hear people's genuine opinion of that base model, and not a model card shilling their own product?
>>
>>107423435
>20b gpt-oss at the same level as R1
nice benchie bro
>>
>>107423435
artificial analcysts literally gushed over reflection 70b. none of their benches mean a thing.
>>
izzat!
get it it's indian so much kek lmao
>>
>>107423435
>GPT-OSS-20B above GPT-5.1
>>
>>107423533
Well, yeah, they were given Claude behind a wrapper
>>
>>107423534
Good Morning!
>>
>>107423534
SAAAAR
>>
>>107423547
>experts
Didn't notice. They're grifters supreme.
>>
>>107423473
>is it time to accept that it's really, truly over this time?
DeepSeek-V4-BitNet-671B-Omni (80GB) will drop for Christmas and save local
>>
>>107423534
SAAR DO NOT REDEEM MY IZZAT
>>
why can we not have medium sized models
>>
>>107423585
GLM Air? GPT-OSS-120B?
>>
File: 1744642411599485.png (57 KB, 608x695)
>>
>>107423594
yeah, but actually good medium sized models
>>
>>107423534
???
>>
>>107423585
They eat into API profits.
>>
>>107423534
izzat respect deez nuts ahhahahaha
peak delhi humor activated bhai, full izzat loot li tune
>>
>>107423595
>feb 2025
unc posting prehistoric cave paintings
>>
>>107423611
keek
>>
>Transformers doesn’t support deepseek_v32
Over before it started
>>
>>107423595
>>107423611
it's timeless, nothing has changed
someone is still using dr*mmers tunes, guaranteed
>>
>>107423642
Yeah that someone is me it's the only thing that works all newer shit is just cope.
>>
>>107423659
For all the shit people talked, he's the last tuner left. Actually tried his hand at air and cranked out a bunch of larges that top out the UGI leaderboard.
>>
Is it possible to make a finetune of glm air on consumer hardware?
>>
>>107423762
no
>>
>>107423762
Depends on what you count as "consumer". A stack of 96gb 6000s, sure.
>>
>>107423762
Basically no, but it's not that expensive to rent a few big boy GPUs. Your dataset and strategy is what matters not so much the hardware
>>
>>107423776
>>107423781
This guy claims he did it with a single blackwell 6000 but that seems impossible. https://huggingface.co/Green-eyedDevil/Monika-106B-GGUFs
>>
File: file.png (46 KB, 847x151)
whats causing the llm to do this

this is qwen vl 2b trying to describe an image
>>
>>107423852
temp is too high presumably
>>
>>107423852
repetition penalty
>>
>>107423852
Repetition penalty. Turn that shit off.
>>
>>107423856
>>107423858
is it the repetition in the prompt? i do have some repeating sentences
>>
>>107423872
It's a setting on whatever you use to run the model. Either the front or the backend. Disable it.
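If your backend speaks the llama.cpp server API, the neutral values look something like this (the URL, port and prompt are assumptions, adjust for whatever you actually run):

# Neutral sampler settings: 1.0 / 0.0 mean the penalties are effectively off.
# The endpoint below is a placeholder for a local llama.cpp-style server.
import requests

requests.post("http://127.0.0.1:8080/completion", json={
    "prompt": "Describe the attached image.",
    "repeat_penalty": 1.0,       # classic repetition penalty disabled
    "presence_penalty": 0.0,
    "frequency_penalty": 0.0,
})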
>>
File: file.png (6 KB, 668x60)
>>107423895
nice found it thanks
>>
>>107423834
Unsloth was promoting some new feature recently for finetuning deepseek with little vram so it was probably that
>>
Getting tired of the usual models I keep around for workshopping story ideas to write, give me some suggestions
>inb4 smolm, toss 20b, petra 13b, starling, pyg
>>
>>107424320
toss 120b
>>
>>107424320
https://huggingface.co/roneneldan/TinyStories-1M
>>
>>107424328
I mean I guess I could, if I want to wait for it to ramble for 8k tokens about its policy on a story that's designed to be ToS friendly. Actual suggestion?
>>107424350
I've used 32bs that become retarded in 1k tokens, but thank you for your very constructive contribution
btw this retarded behavior is why virtually the entire thread hates you, find another hobby
>>
>>107424391
GLM Air is basically the only semi-decent and realistically runnable model available right now. Check back in 2 weeks.
>>
>>107423834
>Trained with Axolotl on my Blackwell Pro 6000 Max-Q. 12 rank, 24 alpha, 3 epochs. Took about 45 hours.
It's probably qlora. Minuscule rank.
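For scale, a rank-12 adapter like that is roughly this in peft/bitsandbytes (a sketch with an assumed base model id and target modules, not his actual Axolotl config):

# Hypothetical QLoRA setup at rank 12 / alpha 24; frozen 4-bit base, tiny trainable adapter.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

bnb = BitsAndBytesConfig(
    load_in_4bit=True,                     # quantize the frozen base weights
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
base = AutoModelForCausalLM.from_pretrained(
    "zai-org/GLM-4.5-Air",                 # assumed base model id
    quantization_config=bnb,
    device_map="auto",
)
lora = LoraConfig(
    r=12, lora_alpha=24, lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumed module list
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, lora)
model.print_trainable_parameters()         # a tiny fraction of the 106B total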
>>
>>107423191
LOOOOL did you draw this?
>>
>>107424413
That's what made me open the thread, I was swapping between a bunch, regenerating model replies and was like "welp guess I have to try air" and it made a very stupid mistake within 700 tokens. The rest of it was fine, but it's just irritating. I was figuring maybe someone has a non major company model they like using for an occasional breath of fresh air, but I guess it's all just meta, qwen, mistral or whatever
>>
>>107423585
qwen 30b and other 20b-30b models
also qwen 80b
there's 48b model too, but no gguf
next med sized qwen will be around 30b-60b
yes all moe because moe is luv
>>
>>107424449
Yes
>>
>>107424467
What kind of mistake? You could edit your character card or system prompt around it. There is no such thing as a perfect model, so you basically have to play whack a mole in order to make any model actually usable.
>>
>>107423834
>>107424437
Why do grifters pretend that their loras are tunes?
>>
>>107424496
It conflated two completely separate pieces of information about the setting as somehow related to one another. Can't even say it's the quant either, since it's nearly 5bpw. Also raw completions, with just a system message stuck at the beginning to give it a very short amount of information on the fact that it is to act as a writing workshop. Tone down the sycophancy, provide constructive criticism, that sort of thing. Wouldn't cause it to just mix up two totally irrelevant pieces of info. I think I'm going to cut and paste the response into a file, then re-dl one of my old favorites that was particularly retarded but took things in interesting directions and compare
>>
>>107424582
they have always been synonymous.
>>
>>107424582
They ARE tunes.. just very superficial ones.
>>
>>107424582
because admitting to making negligible changes on small models using a public dataset and a gamer gpu has negative effects on donation revenue
>>
File: Antigravity_ngtwJEPTLH.png (93 KB, 1506x1005)
What's a good IDE with agents?
Similar to antigravity, but something that is open source and I can use my local (ollama) models?
I'm enjoying antigravity a lot but I'm frequently running into model limits

>inb4 its just a visual studio fork
>>
>>107425021
its just a visual studio fork
get the real thing with roo cline
>>
>>107424582
>Why do grifters pretend that their loras are tunes?

He's not exactly shilling that model. Looks like some guy just finetuned (LoRA *is* a way of fine tuning) GLM to make it work better with his software.

And I create them all the time. If I have a repetitive task I need to keep sending to a larger model, I'll often create a dataset and create an adapter for a smaller model.

Or for TTS models, if you want to add more voices, it's much easier to create a few adapters and load them on the fly.

You can load/unload them without having to swap models.

What I don't like, is when people train an adapter, merge it into the base model, then only release the full fucking model on HF.
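For anyone wondering what the load/unload workflow looks like, a minimal peft sketch (adapter names, paths and the base model are made up for illustration):

# Hot-swap LoRA adapters on one loaded base model without reloading weights.
from transformers import AutoModelForCausalLM
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-Nemo-Instruct-2407", device_map="auto")
model = PeftModel.from_pretrained(base, "adapters/summarize", adapter_name="summarize")
model.load_adapter("adapters/extract-json", adapter_name="extract-json")

model.set_adapter("summarize")     # route generations through one adapter...
# ... generate ...
model.set_adapter("extract-json")  # ...then switch without touching the base weights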
>>
>>107425021
As the other anon said. Visual Studio Code with an extension like Cline, Roo, Continue, Gemini Assistant, etc.
Or just use a cli.
>>
>>107425047
>What I don't like, is when people train an adapter, merge it into the base model, then only release the full fucking model on HF.
elaborate on this? would you prefer that both the merged model and lora are released?
>>
>>107425073
>>107425040
Ok thanks, I'll look into those.
Which one is the top choice for local agents?
I can't pay for a service but I have a 16gb vram card and I'm downloading Qwen3 coder.
I guess it should be good enough for simple tasks and I'll hit the bigger models when needed
>>
File: 1761834532357711.gif (1.38 MB, 1364x1364)
I'm running Qwen 2.5-14B on a RTX 3070 with 8GB of RAM through KoboldCPP and then roleplaying (NSFW) through SillyTavern.
The results are fine for one on ones, although if the context gets full it starts repeating itself a ton and I have to restart with a summary instead, which is annoying but acceptable. However, the results for multi-character roleplays are pretty atrocious.
Is this something I can improve by changing models? Or is it an issue with the character cards/settings? Or is my hardware simply not sufficient? Any guidance is welcome, even if it's unrelated to my specific question (like telling me my model is a terrible choice because of some shit I'm completely ignorant of).
>>
>>107425155
>would you prefer that both the merged model and lora are released?

Yes! I often end up trying to extract it myself with mergekit.

That would make life so much easier for those of us who load/unload LoRAs depending on the task we're doing. Not to mention saving on storage costs.

I wonder how many merged copies of llama-3 70b are sitting on people's drives.
>>
>>107425222
Roo is good for tasks, Continue for autocomplete and chat.
Qwen3 coder should be fine for simple stuff, you could stretch it if you have a lot of regular ram.
Not really local, but there's https://github.com/router-for-me/CLIProxyAPI to scam the free tiers on the CLIs.
>>
>>107424941
remember how rich undi got. he drives a hummer now. no time for us peasants.
>>
>>107425298
I myself like it when people provide loras alongside their finetunes so I can merge them into another finetune I like in the same arch, but at least with llamacpp, if you try to run a model with a lora unmerged, it runs on cpu and fucks your gen speed. Maybe this isn't the case with other frameworks (doubtful, or it's completely negligible if both are fully run in vram), but that's my guess as to why people don't bother sharing the lora and instead just share the full weights
>>
>>107425047
>If I have a repetitive task I need to keep sending to a larger model
examples? always looking for inspiration from how others use theirs
>>
>>107425366
lora works fine in transformers and exllama. VLLM too. It's only llama.cpp that can't hang. They used to let you merge the lora into quantized GGUF but that fell by the wayside.
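e.g. a minimal vLLM sketch applying an unmerged lora per request (model id, adapter name and path are placeholders):

# Serve the base model once, attach the adapter per request instead of merging it.
from vllm import LLM, SamplingParams
from vllm.lora.request import LoRARequest

llm = LLM(model="mistralai/Mistral-Nemo-Instruct-2407", enable_lora=True)
params = SamplingParams(temperature=0.8, max_tokens=256)
out = llm.generate(
    "Write a short scene set in winter.",
    params,
    lora_request=LoRARequest("my-style-lora", 1, "adapters/my-style"),
)
print(out[0].outputs[0].text)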
>>
Does /lmg/ have any /lmg/-approved loli cards for ERP?
>>
>>107425418
It was wonky but I quantized loras, then merged them with the actual model via mergekit, though it's been a while and a lot has changed even in the last year. But if you're stuck on a non nvidia card from before AI really started being a thing you're not going to be using vllm or exl3 now. I do wish loras were more like sdxl loras where it was just a small ass thing that didn't suck ass but had an effect. Control vectors at best sort of came close to that, but that's a whole other irritating beast that isn't even supported on most models
>>
>>107423762
Do you mean group chats? They're trickier to get to work well. Something like
>Now I'm going to reply from {{char}}'s point of view, writing in third person.
can help, but the model might just be too dumb.
>>
>>107425539
Sure, she's named Elara on chub
>>
>>107424320
https://huggingface.co/NousResearch/Hermes-4.3-36B-GGUF
>>
>>107425021
Zed, if you don't want another vs code fork/plugin
>>
>>107425294
>'m running Qwen 2.5-14B
Please, try nemo or Qwen 3 30B a3b.
Nobody should be running Qwen 2.5.
Note that the 30B model won't fit your VRAM but that's fine. It is a MoE and you should configure koboldcpp to have the experts running in RAM/CPU.
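For reference, with plain llama-server the same idea looks roughly like this (gguf filename and context size are placeholders; the flags are the ones that show up elsewhere in this thread, check koboldcpp's own options for its equivalent):

# Rough llama-server launch for a MoE: dense/attention layers go to the GPU,
# the MoE expert tensors stay in system RAM.
import subprocess

subprocess.run([
    "llama-server",
    "-m", "Qwen3-30B-A3B-Q4_K_M.gguf",   # placeholder filename
    "--n-gpu-layers", "99",              # offload every layer that fits...
    "--cpu-moe",                         # ...except the expert weights, kept on CPU
    "--ctx-size", "16384",
])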
>>
>>107425723
>won't fit your VRAM but that's fine
I seriously doubt this, because when I first set it up I did it incorrectly and it wasn't using my GPU, just my CPU and RAM. It was almost comically slow, often taking like >5 minutes to write a couple of sentences.
>>
>>107425771
>often taking like >5 minutes to write a couple of sentences.
That is what moetards would call fast as fuck
>>
>>107425771
Read about the differences between a dense and a sparse model, specifically mixture of experts.
The 30B model will run faster than the 14B model that's running only partially on your GPU.
>>
>>107425771
install linux
>>
>>107425539
***Elara Voss***
>>
>>107425809
The 14B is running ENTIRELY on my GPU, not just partially. It is unbearably slow otherwise.

>>107425812
I have a dual boot system but see no reason it would matter which OS I'm using.
>>
>>107425868
Dense models get unbearably slow on a normal computer if they have to touch RAM. MoE models can stay decently fast even if they don't fit in VRAM.
>>
>>107425956
I really don't understand why you're advocating for this. You're saying it's slow, but not as slow as it would be otherwise? That doesn't make it good.
>>
>>107425981
You can run smarter models at speeds that are still above reading speed.
>>
>>107425868
>ENTIRELY on my GPU,
Ah, got it. I misunderstood then.
Well, I stand by my recommendations regardless.
Especially Nemo.
>>
Has anyone tried the new 14b mistral? Is it better than Magistral?
>>
>>107426032
maybe
>>
>>107425981
Some people get really weird about the dense vs MoE thing. Basically, a MoE only uses a certain portion of its size at a time.
So 30B-A3B would run about as fast as a regular 3B when offloaded. If you ran a MoE with 14B active, it would be as slow as your 14B.
Whether 30B-A3B is as smart as a 3B or a 30B you would have to try for yourself.
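Back-of-the-envelope: generation is mostly memory-bandwidth bound, so the speed ceiling scales with the bytes of active weights read per token. The numbers below are illustrative assumptions, not benchmarks:

# Rough bandwidth-bound token rate: bandwidth divided by active bytes per token.
def rough_tok_per_s(active_params_b, bytes_per_param, bandwidth_gb_s):
    active_gb = active_params_b * bytes_per_param
    return bandwidth_gb_s / active_gb

ddr5_bw = 60        # GB/s, assumed dual-channel DDR5
q4_bytes = 0.56     # ~4.5 bits per weight quant

print(rough_tok_per_s(3,  q4_bytes, ddr5_bw))   # 30B-A3B: ~36 t/s upper bound
print(rough_tok_per_s(14, q4_bytes, ddr5_bw))   # dense 14B: ~8 t/s upper bound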
>>
>>107426032
lol no it's 24b pruned
>>
>>107426044
That's only partially true. If you only have some experts on RAM, then only the tokens generated with those experts will be slow; that's why partial offload works significantly better with MoE models
>>
>>107426044
Compute time doesn't lie. When you gen images and add more steps, the pics often look better.
So now you take an A3B and it's really really fast. I'm sure the two are completely unrelated and it's exactly like a real 30b.
>>
>>107426384
>So now you take an A3B and its really really fast. I'm sure the two are completely unrelated and it's exactly like a real 30b.
compared to old 30bs like qwen qwq, yes. its almost as if architectural improvements allow for many other different types of improvements
>>
>>107426384
beats the 32b on benchmarks
>>
>>107426397
Ahh benchmarks, the ultimate test of quality. And architecture. Like mistral managed to ass up deepseek. Maybe it's the curse of killing wendy.
>>
>>107426004
I'm not sure a smarter model would really make a difference for smut, but I can try I guess.

>>107426027
Nemo seems to be a framework, and I'm not going to train my own LLM. Why would you even recommend such a thing?

>>107426044
Like above, I'll try I guess. But it just seems more relevant for someone trying to cheat on their homework or something.
>>
>he doesn't know nemo
>>
>>107426547
>Nemo seems to be a framework
In the off chance you are not trolling, search for mistral nemo.
>>
>>107426609
>mistral nemo
Why would you not say this? It's completely different from NeMo itself.
This is like
>Have you tried Unreal?
>That's a game engine, not a game.
>he doesn't know Fortnite
>>
>>107426686
>That's a game engine, not a game.
Fucking kids these days, man
>>
>>107426686
Since you started you had very strong opinions on stuff you clearly know nothing about. Do you still not understand what a MoE is?
>Have you tried Unreal?
>That's a game engine, not a game.
kek
>>
>>107426686
Masterfully created bait
>>
>>107426702
>>107426740
>>107426747
I genuinely can't tell if you all just decide to troll everyone who comes in or are too terminally online to realize other people don't have the same perspective as you.
Googling 'nemo llm' gives this: https://github.com/NVIDIA-NeMo/NeMo
Not the model found by googling 'mistral nemo' here: https://mistral.ai/news/mistral-nemo
There is no possible way a reasonable person without prior knowledge would know that they should add 'mistral' to the search.
>>
Is it possible to merge GLM Air with GLM 4.6?
>>
>>107426782
The next thing you say is that you didn’t know Unreal Engine was named after a game
>>
File: m.png (151 KB, 1031x747)
>>107426871
Not again...
>>
>>107426906
but is it doe
>>
>>107426871
Someone tried to do that with SVD distillation. It was hilarious, he got glazed hard by Gemini and thought he'd done something "clever".

Turns out all he did was convert the GLM-4.5-air weights to FP32 lol

No, you can't merge them, they're different architectures. Best you can do is generate slop-logs from 4.6 and SFT Air with them.
>>
Why do merges even exist?
>>
>>107426983
Mainly to fulfill the merger's desire to contribute with minimal effort. Merges are the product of yet another contraption created to fling shit at the wall in a novel way. They can undeniably change the way the resulting model behaves, but the changes are not always positive. When they are, great; see Mythomix/Max. When they aren't anything special, the shitty model should be deleted.
The problems arise when the mergers try to grift and peddle their bucket of shit, claiming it has value, and perpetuating their delusions of being helpful.
>>
>>107423191
how did you manage to draw me?
>>
>>107425539
loli miku
>>
why can't we have stuff like 100B-A50B moes?
>>
>>107427535
that would be antisemitic
>>
>>107427535
Closest to that is Grok 2, 270b total, 115b active.
>>
>>107427535
not sparse enough to be worth it
>>
>>107427535
Or, hear me out, a dense 50B model. Impossible, I know. But imagine.
>>
>>107427636
more sparsity just makes models dumber for stuff like generalization even if they have access to tons of knowledge
>>
>>107427771
>dense 50B
Completely pointless. Too big for a high end consumer card, too small for 2 high end consumer/1 enterprise card.
>>
my lungs are collapsing, i need air
>>
>>107427535
Furthermore, it should be majority dense parameters so that the experts are more for knowledge augmentation than being relied on for the model's main intelligence.

>>107427771
There's no reason to go full dense when you've got perfectly fine and fast RAM sitting right there to use for knowledge augmentation experts. You did buy RAM before the price hikes, didn't you?
>>
>>107427771
Maybe the sun and moon are the same size when viewed from the earth, but a good size to performance ratio model that fits neatly in a tricked-out consumer setup is just not in the cards. The non-existence isn’t a conspiracy, just a sad reality
>>
>>107427870
Have your Miku use your body for practicing mouth to mouth resuscitation.
>>
I tried out TOSS 120b "derestricted" after all the recent shilling and this is the last time i'm falling for the abliteration meme. What's the deal with all the people acting like it's some big development? The model is STILL gigacucked, and it becomes very evident how hard they scrubbed its dataset clean whenever it has to write something vaguely spicy.
I suppose the technique might be convenient to eliminate refusals with less filtered models, but I can't believe people are really using it with stuff like gemma or TOSS and thinking it actually fixes them
>>
the people need air
>>
>>107428133
Only poor people need air
>>
>>107428151
i need a model i can run in fp16
>>
>>107428151
imagine being such a poorfag that you still have physical needs lmao
>>
>>107428157
Gemma 4b has you covered.
>>
>>107428459
i have 4 blackwell pros. enough for air fp16, but not 4.6 full fp16
>>
>>107428465
2 PSU?
>>
>>107428505
max qs
>>
i don't understand much about the AI lingo so forgive me:
right now i'm using perplexity pro (got it for free) but i don't like the censorship and want to replace it with a local llm
i downloaded lm studio and i have an m4 macbook air with 24gb of ram, what should i be running as a "general" text model (so like chatgpt, i can use it to research stuff, do math, generate code, just chat, etc, a jack of all trades sort of thing)
is gpt-oss good?
>>
>>107429309
Gemma 4 is all you need
>>
>>107429309
There's mistral-small, qwen-3-30ba3b, gemma 27b or 12b, gpt-oss...
You'll have to test them yourself to see if they're good enough for you.
>>
>>107429323
this? or am i looking at the wrong one? i'm a layman but i very much doubt something released 150 days ago is the best option in this space lol, shit gets released and made obsolete every other week
>>107429346
i'm using gpt-oss 20b right now, seems good enough but i'd like to know if there's better stuff i should be running instead
>>
>>107429359
>better stuff
I mentioned a few models. Try them. Keep the one(s) you like. Better is subjective.
gemma-4 is not out, but there's gemma-3 27b and 12b.
>>
>>107429323
Sorry, Gemma has been deemed too unamerican. Please understand
>>
>107416103
Is slot id a lottery? Can I choose which slot to start with at server start? This was a fresh start, and it keeps picking id 3, which I don't know BEFORE I start processing some prompt. But I want to load a pre-cached one
slot get_availabl: id  3 | task -1 | selected slot by LRU, t_last = -1
slot launch_slot_: id 3 | task -1 | sampler chain: logits -> logit-bias -> penalties -> dry -> top-n-sigma -> top-k -> typical -> top-p -> min-p -> xtc -> temp-ext -> dist
slot launch_slot_: id 3 | task 1 | processing task
slot update_slots: id 3 | task 1 | new prompt, n_ctx_slot = 131072, n_keep = 0, task.n_tokens = 289
slot update_slots: id 3 | task 1 | n_tokens = 0, memory_seq_rm [0, end)
slot update_slots: id 3 | task 1 | prompt processing progress, n_tokens = 225, batch.n_tokens = 225, progress = 0.778547
slot update_slots: id 3 | task 1 | n_tokens = 225, memory_seq_rm [225, end)
slot update_slots: id 3 | task 1 | prompt processing progress, n_tokens = 289, batch.n_tokens = 64, progress = 1.000000
slot update_slots: id 3 | task 1 | prompt done, n_tokens = 289, batch.n_tokens = 64
slot update_slots: id 3 | task 1 | created context checkpoint 1 of 8 (pos_min = 224, pos_max = 224, size = 75.376 MiB)
srv log_server_r: request: GET /props 127.0.0.1 200
srv log_server_r: request: GET /slots 127.0.0.1 200
srv log_server_r: request: GET /slots 127.0.0.1 200
srv log_server_r: request: GET /slots 127.0.0.1 200
srv log_server_r: request: POST /v1/chat/completions 127.0.0.1 200
srv stop: cancel task, id_task = 1
slot release: id 3 | task 1 | stop processing: n_tokens = 470, truncated = 0
srv update_slots: all slots are idle
>>
>>107429359
The one on Gentoo is the best one
>>
>>107429427
With a single slot btw
--ctx-size 131072 -np 1 
--n-gpu-layers 99
--no-warmup
--cpu-moe
--jinja
--slots
--slot-save-path "/home/ai/Desktop/KVCACHE/"
--port 8000
>>
So given that western RAM companies are now jerking their dicks to Altman and his incestual billionaire posse and giving the middle finger to consumers, what the hell does that mean for the future of local?
Are we going to be sucking Steve Jobs decayed teat? Are we going to be stocking up on 3090s? Are we just biding our time until either the bubble pops or Xi swoops in and undercuts the API virgins in price at every corner?
>>
>>107429473
its either waiting for the chinks or building your own hardware from scratch like terry did with temple os
>>
>>107429473
It's temporary artificial scarcity. Altman doesn't have the cash to make good on the orders he made. If he can't find investors to foot the bill, the orders will have to be canceled and that capacity that he reserved will go back on the market.
>>
>>107429427
>>107429470
Weird. I always get id 0. Even if i try to force it to something else with id_slot in the request it does nothing. Tried saving a few caches with -np 2, then loading them with -np 1 to see if there's some slot id saved but i couldn't make it happen. Always id 0 with -np 1.
Are you 100% sure you're running the correct script?
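For what it's worth, with --slot-save-path set you should be able to drive the slots API explicitly instead of trusting the LRU picker; something like this is what I'd expect to work (filenames are placeholders):

# Persist a slot's KV cache to disk, then later restore it into a known slot
# and pin the request to that slot with id_slot.
import requests

base = "http://127.0.0.1:8000"

# after processing a prompt, save slot 3's cache:
requests.post(f"{base}/slots/3?action=save", json={"filename": "prefix.bin"})

# later (even after a restart), restore it into slot 0 and use that slot:
requests.post(f"{base}/slots/0?action=restore", json={"filename": "prefix.bin"})
requests.post(f"{base}/completion", json={"prompt": "...", "id_slot": 0, "n_predict": 64})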
>>
Damn, imagefags eating so good and here we are having fucking nothing.
Unless you can run the really big moe monsters or enjoy the qwen/gemma slop.
No mid range dense model or like a fast 120b moe model that is for something else than math and tool calling.
>>
>"something about this" *She gestured vaguely between them.*
Yep it's slopping time.
>>
https://huggingface.co/openai/ChatGPT-6-2T-A300B-GGUF
>>
>>107430027
catpic
>>
File: 1762888821172375.jpg (190 KB, 434x509)
I read somewhere that GLM4.6 Air will be a smaller model, but I can't find a source.
>>
>>107430186
https://www.chinatalk.media/p/the-zai-playbook
>Zixuan Li: For our next generation, we are going to launch 4.6 Air. I don’t know whether it will be called Mini, but it is a 30-billion-parameter model. It becomes a lot smaller in a couple of weeks. That’s all for 2025.
>
>For 2026, we are still doing experiments, like what I said, trying to explore more. We are doing these experiments on smaller models, so they will not be put into practice in 2026. However, it gives us a lot of ideas on how we are going to train the next generation. We will see. When this podcast launches, I believe we already have 4.6 Air, 4.6 Mini, and also the next 4.6 Vision model.
>>
>>107430204
two more weeks status?
>>
>>107430219
>Nov 21, 2025
In a few days it will be two weeks.
>>
>>107429676
As far as llama-server is concerned, YES

I have other things loaded on the GPU at the same time, but unrelated
>>
>>107430204
The wording is a little strange here, but I understand they're going to be releasing 4.6 Air and 4.6 Mini as well as a vision model. I wonder if both Air and Mini will retain the RP of 4.6.
>>
Prediction for gpu prices for the next two years?
>>
>>107430204
>4.6-Air is 30 billion parameters
HAHAHAHAHAHAHAHAHAHA
Local is over
>>
>>107430410
They may or may not remain the same. One of those two.
>>
>>107430430
See >>107430333
>>
>>107430430
retard
>>
>>107430410
If there's one thing you can count on, it's that things can always get worse.
>>
>>107430430
upgrade from 8gb vram
>>
Is local going to have to hope for optimizations or new architectures for the next 2 years? Next-gen Memory and GPU will be unaffordable when they come out, if they even come out with this shift towards data-centers.
>>
>>107430661
No. No one wants you to be able to run models locally. Open weights releases will just get smaller.
>>
>>107430204
>>107430430
the Z team is the same team that made Z-image turbo and they destroyed the competition with a 6b model, they know how to make great small models
>>
File: airft.png (85 KB, 1058x820)
>>107423762
lol, lmao even
>>
Is it just me or is ministral absolutely fucking useless?
Absolutely worthless built-in general knowledge, and it constantly fails to actually call tools properly despite claiming it has (or worse, DOES call the tool and gets the result then claims it "can't do that"). Then if it's not fucking up tool calling it just gets stuck in endless think blocks needing a manual stop or retry.
Fiddling with temp and repeat penalty sometimes seems to help but it eventually just falls over again.
At least it seems fairly uncensored but other than that what a waste of time this has been.
>>
>>107430673
I'm beginning to think this is true. Are porn and some hallucinated instructions you can look up on google really that hated?
>>
>>107430757
It might be an issue with the quants as I've heard there aren't that many issues on openrouter. I could be wrong though. On winblows so I'm waiting for ooba.
>>
>>107430757
It's pruned Mistral Small 3.1 24B + (probably) a few hundred billion tokens of healing + updated instruct post-training. I think they did something to their vision encoder too though, because it performs markedly worse than Mistral Small 3.2's even though it should be the same. I find it almost useless for roleplay since it fucks up format every time, even at low temperature.

https://huggingface.co/mistralai/Ministral-3-14B-Instruct-2512
>Sampling Parameters: Use a temperature below 0.1 for daily-driver and production environments; Higher temperatures may be explored for creative use cases - developers are encouraged to experiment with alternative settings.
>>
>>107430777
Its probably that all of the western frontier labs decided to keep things on lockdown. That plus open source eats into their profits. I thought western labs would have pivoted hard to open source since OpenAI decided to actually release OSS to counter chinese dominance in the open source space but it's quite possible we get drip fed instead.
>>
File: 1764455523452284.png (1.43 MB, 1152x896)
I wonder if a hybrid interleaved diffusion transformer can benefit reasoning LLMs. Currently they're bad at making in place revisions and edits to their own memory. Meanwhile such functionality comes inherently with diffusion. But autoregressive still has some of its own advantages. So why not try interleaving them, though ideally in a way that the model itself is still unified so parameters aren't wasted. For instance, when receiving a query, the model would first reason in AR mode, generate a draft, and then reason, and then edit the draft in diffusion mode. So nothing else but the editing is done with diffusion, and it uses the AR-generated reasoning to guide the editing. Then it would go back to AR mode to reason more, and so on and so forth. Though tbf I don't know how current diffusion LLMs truly work. My intuition is that this would work well because image models are pretty good at following prompt from the text encoder.

Actually you know what, I think Google might already be somewhat ahead of me. I'm surely not the only one who has thought this.
>>
>>107430693
Z-image is from Alibaba, though?
https://huggingface.co/Tongyi-MAI
https://huggingface.co/zai-org
>>
>>107430789
>On winblows so I'm waiting for ooba.
just use llama or kobold, oogabooga is outdated shite.
>>
paste your saved stories into context in mikupad before starting a new one to avoid having to steer it too much
>>
File: 1745229650643447.png (437 KB, 847x974)
It's over for the Rammaxing fags lol
>>
>>107430883
juste don't poor ?
>>
>>107430909
>juste
saar?
>>
>>107430831
yeah you're right my b
>>
>>107430920
baguette
>>
>>107430757
>Is it just me or is ministral absolutely fucking useless?
Here's what I noticed from the most basic translation prompts (that I use as a filter for "can't be arsed to care to test this model further if it fails at that") I gave those models:
they're the first models from any "major" lab released in 2025 that don't listen to "no commentary" as an instruction to shut the fuck up and not write TL notes and other garbage
and they can't even handle doing more than 50 lines in one go without breaking (by breaking I mean things like outputting a single line and acting like everything else didn't exist. And yes, I did set up context length correctly.)
translation isn't the only thing I test in models, but failing this badly makes me rm this shit immediately. Those models would have looked good two years ago, but with models like qwen 3, gpt-oss 20b and gemma 3n I see no reason for these things to exist; even the 14b is just not coherent enough
>>
>>107430333
It's a transcription of a dude on a podcast speaking chinkgrish, but yeah if you listen to the original audio the pauses makes it clear he's referring to two different models
>>
File: 1750423337779359.png (122 KB, 1400x1400)
>>107430430
uweeeee~ ojisan. jiisaii. kimoiiii
>>
>>107430726
But unsloth told me I can finetune deepseek on a 3090
>>
>>107430957
Yeah it's insanely bad, either there are actual bugs or they benchmarkmaxed it to oblivion and it literally cannot do anything else.
Back to gemma for now at least it mostly does the right thing even if it's a bit boring.
>>
Hatsune Chief
>>
>>107430810
Chinese labs make a splash and then pull back too. Some of their stuff went proprietary after the initial drop. Big MoE count on us not being able to run them.
A new deepseek 2.5 would have been excellent and infinitely more palatable. Nowhere to be found. Instead they copy western labs with parroting and only giving useless smalls or mega weights.
>>
File: file.png (85 KB, 337x496)
>>107430883
honestly not that bad when all things considered
>>
Hatsune Queef
>>
File: miku-holding-gemma.png (1.09 MB, 790x1054)
Was Gemma 3 the only all-around good, consumer GPU-sized official release of the year? It's getting depressing.
>>
ministral: good shit or skip?
I'm still using glm air 4.5 (moe) / cydonia (dense) but I'm not really happy with either
>>
>>107431406
can only be worse than cyd as it's a prune of mistral 24b
>>
>>107431441
They obviously didn't just shave the model down, the model doesn't write at all like Mistral Small 3.1/3.2. It's much less dry and restrained than those, but also apparently overfit on a DeepSeek-R1-like writing style (to the point that formatting instructions may often be ignored). Vision performance seems worse.
>>
>>107423247
there's a cute (irl) girl (female) dressing just like that that lives near me and it looks hot as fuck
well except for the bare thighs part, those must get pretty cold
>>
>>107430883
That is not the kind of RAM you would be buying for a CPUmaxx build though.
When bulk ordered off of Alibaba the current price for DDR5 server RAM seems to be ~7 € / GB (EU import tax included).
That's still ~10k € just for the RAM if you populate 24 DIMM slots with 64 GB modules.
Though if I can get it recognized as an expense for research and development I would essentially be getting a 77% discount via tax incentives and I'll probably buy 96 or 128 GB modules.
>>
>>107431547
>Though if I can get it recognized as an expense for research and development I would essentially be getting a 77% discount via tax incentives
ok, nice blog dude
>>
File: file.png (656 KB, 1151x863)
>>107431536
>well except for the bare thighs part, those must get pretty cold
That can be fixed
>>
>>107431547
>That's still ~10k € just for the RAM if you populate 24 DIMM slots with 64 GB modules.
Is anybody actually doing this? That's a lot of money to blow on e-waste
I could mayyybe understand getting quad channel, 8 ram slot machine, which is the upper limit of 'only moderately unreasonable price, can run big moe at slow but still kind-of usable speed' but going to 24 x 64gb makes buying a maxed out mac ultra seem like the better option
>>
File: haha.png (1.09 MB, 672x1861)
>>107431536
>>107423247
>>
>>107431622
>red hair
Don't put your dick in that.
>>
>>107431630
Those are the best, just don't let her find out your real name or address.
>>
>>107431585
As of right now there is (to my knowledge) no software for language model inference that is properly utilizing memory channels that are spread across multiple NUMA nodes.
So I would be buying the RAM primarily to develop such software and to figure out what the upper limit of performance given proper software support is.
I would definitely not be blowing that much money on a toy though.

>mac
To my knowledge the biggest memory configuration available for that is 512 GB.
I already have that much in my DDR4 server and quite honestly I ended up regretting not buying more since it's not enough to run models like Deepseek or Kimi at 8 bit precision.
>>
File: the-dell-dude.png (240 KB, 400x470)
>>107430883
Now Dell has an excuse for charging 500 dollars for a slight RAM upgrade. Dude bros we are so back.
>>
>>107431719
Apple's previously ridiculous pricing for ram upgrades looking cheap at this point.
>>
>the 96gb ddr5 kit I bought for $250 is now over $1000
all we really need is for CPUs to somehow get a 250% price increase to make really sure nobody uses anything but phones as their computing devices
>>
>>107431733
Apple have always been ripoff artists, but Dell actually used to offer decent value once upon a time, competing with the DIY market. So for them it's a fall from grace.
>>
>>107431770
Average people were and will just use phones regardless, so simple a monkey can use them and even the prices of phones tripling wouldn't matter since most people rent them through payment plans anyway
>>
>>107431690
What makes developing for NUMA different? I think memory allocation is already NUMA aware
>>
Am I the only one who's legitimately freaked out by this ram thing? Like it's not just the ram, it's the basic admission that everyone who isn't a mega corporation can be safely disregarded. Sure it's ram now, but what's next?
>>
>>107431897
Next is storage/NAND.
>>
>>107431897
all part of the plan: you will own nothing etc etc
>>
>>107431897
Trust the invisible hand.
>>
>>107430643
I have 64gb of ram
4.5-air is 100 billion parameters, so 30b is a big downgrade.
>>
>>107431853
>even the prices of phones tripling wouldn't matter since most people rent them through payment plans anyway
the entirety of washington would be burned to the ground if iphones went from $1k to $3k
it's not like you'd keep the same monthly plan if prices went full retard

>>107431897
>it's the basic admission that everyone who isn't a mega corporation can be safely disregarded
it's always been this way tho, if you can get infinite profits selling to exclusive customers you don't bother with unwashed peasants
>>
>>107431948
I wonder how many active parameters they'll give the 30b
>>
>>107431978
I miss the ~30b dense models
30b moe is just too little, not really worth bothering with imo
>>
>>107431977
If prices go full retard, they'll simply offer payment plans with longer terms to keep the monthly payments sane, same as they did with cars
>>
>>107432007
even the dumbest basketball american would think twice of signing a $50/mo 8 year long plan
>>
>>107431948
let's be honest, it doesn't matter what arch it is if it's benchmaxxed garbage with 70% of probability on one token.
>>
>>107432023
They buy beaten up Dodge Chargers at 19% APR and still haven't figured out the smoke detector thing.
>>
>>107432007
nobody really needs a personal computer anymore. we will just see more and more people ditching pcs. the only real need for a powerful machine today is gaming, which is also stagnant or in decline.
>>
>>107432056
>nobody really needs a personal computer anymore.
But I want one
So you can go fuck yourself, kike.
>>
What's a good local TTS with an UI which I can use to generate sex sounds with?
>>
>>107432086
your next door neighbor
>>
>>107432056
>the only real need for a powerful machine today is gaming which is also stagnate or in decline.
Likely game streaming services like OnLive and PS Now on devices like the Steam Deck will take over
>>
>>107432069
I am just telling you the truth. the masses are who make the market, a few autistic people who want to tinker are not going to create enough incentive. also I'm not a kike, I denounce the talmud.
>>
>>107432087
what if he isn't looking for rape sounds specifically?
>>
>>107432086
vibevoice can do zero-shot voice cloning but it's only good at 3 steps 3 cfg. i use https://github.com/wildminder/ComfyUI-VibeVoice
>>
Oh yes harder daddy. :sweating emoji: :sweating emoji: :sweating emoji: lick lick lick lick L L L L L I am cumming :eggplant emoji: :water emoji: :water emoji:. :joy: :joy:
>>
>>107432042
4.5-Air isn't benchmaxxed though. Neither is full size 4.6
>>
>>107432103
>but it's only good at 3 steps 3 cfg
For the 1B or 7B?
>>
>>107432086
Depends what you're into. I'm into making tts do weird sounds.
>https://voca.ro/1opWmene7Yxx
>>
>>107431890
To my understanding the problem is that the llama.cpp/ggml CPU backend is not NUMA aware so you get threads accessing memory from other NUMA nodes.
At the rate things are currently going I should have a prototype for generic tensor parallelism between the end of December and the end of January.
I intend to try parallelizing NUMA nodes using the same code so the data would be split per NUMA node and the transfers between them would be minimized.
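As a toy illustration of the per-node idea (nothing to do with the actual ggml scheduler): read each node's CPU list from sysfs and pin one worker per node so it mostly touches memory that is local to it:

# Toy sketch only: one worker process per NUMA node, pinned to that node's CPUs.
import glob
import os

def node_cpus(node_path):
    # cpulist looks like "0-15,32-47"; expand it into a set of CPU ids
    cpus = set()
    with open(os.path.join(node_path, "cpulist")) as f:
        for part in f.read().strip().split(","):
            lo, _, hi = part.partition("-")
            cpus.update(range(int(lo), int(hi or lo) + 1))
    return cpus

nodes = sorted(glob.glob("/sys/devices/system/node/node[0-9]*"))
for i, node in enumerate(nodes):
    if os.fork() == 0:                          # child = worker for node i
        os.sched_setaffinity(0, node_cpus(node))
        # ... run the slice of the model assigned to node i here ...
        os._exit(0)

for _ in nodes:
    os.wait()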
>>
>>107432120
Ani ahh post
>OH YOUR RUGGED BEARD
>OH YOUR RUGGED ASS
>OH YOUR RUGGED SHORTS
>>
>>107432132
both. i saw no significant difference in speed and the quality is similar too, but 7B adapts to weird voices better.
>>
>>107431897
You are simply observing the logical end point of capitalism and globalization.
Wealth distribution in a common market largely follows a Pareto distribution.
With globalization the markets have become much bigger and wealth has become more concentrated.
So now the markets largely follow the needs of corporations and billionaires at the expense of everyone else.
>>
>>107432217
>Pareto
the pareto frontier... its moved!
>>
>>107432144
Yea currently llama.cpp can't even push 1/4 of my MLC benchmarked bandwidth. Using a single node/CPU cuts the speeds further and is nowhere near even the worst of the benches.
>>
/lmg/ theme
https://youtu.be/HWl1Tu9oZmY?si=JwZTNrhipBwujxWm
>>
>>107431897
Anon, even a mega corporation can be safely disregarded nowadays. Some of the construction times for datacenters are measured in decades because no matter how much they pay construction companies, nothing changes. Same for a skilled workforce or hardware components.

Want a reliable supply of water for your datacenter? Better pray to the gods.

Such is the result of deindustrialization and the shift towards services and IT in the northern hemisphere (aside from China)
>>
File: mikuquestion2.jpg (989 KB, 1710x1779)
So is Ministral 14B better than Nemo?
>>
>>107432295
>anon
Sign in to confirm your age.
>>
>>107432303
nyo
>>
File: anyacrying.webm (188 KB, 1920x1080)
>>107432316
>>
>>107432324
I know it's truly sad it was our chance but '26 will be another year of nemo
>>
>>107432302
https://redmondmag.com/blogs/generationai/2025/12/microsoft-is-sitting-on-a-pile-of-unused-gpus.aspx

"Quite frankly, the biggest issue we are now having is not a compute glut, but it's power and it's sort of the ability to get the builds done fast enough close to power," he told the show's hosts. "So if you can't do that, you may actually have a bunch of chips sitting in inventory that I can't plug in. In fact, that is my problem today. It's not a supply issue of chips. It's actually the fact that I don't have warm shells to plug into."

> "Once operational in the 2027–2028 time frame, the reactor will provide roughly 835 MW in capacity, supplying not only a nearby Microsoft datacenter but the nearby regional grid as well."

You can safely add another 3-4 years to that time frame
>>
>>107432310
https://yewtu.be/watch?v=HWl1Tu9oZmY
>>
>>107432367
That trick doesn't often work anymore:
This video may be inappropriate for some users.
After which you should try to:

Refresh
Switch Invidious Instance
Go to YouTube
>>
>>107432316
Drummer will save it.
>>
>>107432412
You must be over 18 to post
>>
>>107432144
In theory it’s quite simple: during model warmup, cycle through cpu cores per tensor and maintain that relationship of core-to-tensor when running inference. The difficulty is entirely due to the architecture of lcpp itself getting heavily in the way (mostly the scheduler iirc)
>>
>>107432485
You must have a google account too?
>>
>>107432412
It's I Wanna Fuck My Computer by Nanajirachi
>>
>>107432316
I have a cat that screams “nyo~~!” if I pick her up
>>
>>107432492
you didn't see my zero width disclaimer?
>>
>>107432500
https://www.youtube.com/watch?v=jXvitLHphmI
Current anons stay logged in and stay tracked.
What next? FB and insta links?
>>
File: file.png (293 KB, 577x649)
>>107432491
>>
>>107432295
it was a good music video until furries showed up.
>>
>>107431897
The only hope of getting affordable hardware will be Apple. Not exaggerating in the slightest. Shit is gigafucked for the next couple years... Maybe even longer. We are looking at the nvidiafication of the entire PC market where the only thing companies care about is selling to hyperscalers for fat margins.
>>
>>107432748
Y’all shoulda followed the cpumaxxing guide when you had the chance
>>
They'll start improving small models now that memory is stupidly expensive, right?
>>
>>107432801
memory is stupid expensive because they no longer care about small models.
>>
>>107432801
yes, you'll start seeing small 1b - 14b models that punch far above their weights and go toe to toe with GPT5. you'll also see the first 10t models next year and nothing in between
>>
>>107432768
I did but only went so far. Had I known, I might have upgraded to DDR5. At the least I would have bought double the current DDR4 I have.
>>
https://github.com/ikawrakow/ik_llama.cpp/pull/1033
We might get DSA in ik_llama before mainline.
>>
File: 1750706515055223.png (18 KB, 716x214)
>>107432933
crazy gainz
>>
>>107432933
if they get 3.2 exp support first and improve the multimodal situation, they could overtake mainline
>>
>>107432973
No they can't.
The biggest bottleneck is maintainers and IK is literally the only one working on the fork.
>>
File: 1764805895712.jpg (647 KB, 1438x2044)
Apologize, chuds
>>
>doesn't benchmax
haha yeah
>>
File: file.png (84 KB, 419x238)
>>107433091
>plus or minus 26
>highest variation out of any model on the leaderboard
>>
>>107433091
>coding
>>
Mistral is so fucking embarrassing. Large 3 is literally the saddest thing that came out this year. It's disgusting.
>>
>large 3 is literally just deepseek finetuned under a eurocuck name
fucking kek
>>
>>107433207
It's so bad there has to be some kind of bug right?
...right?
>>
>>107433220
mistral was never good, all their previous models were just rebadged llama
>>
If Mistral Medium is actually dense then it's probably the original Large 3.
>>
File: 1762224833786389.jpg (234 KB, 894x1028)
https://github.com/LostRuins/koboldcpp/releases/tag/v1.103
>>
>>107433250
what about mixtral
>>
>>107433091
>More on coding in a few days...
Codestral 2512 soon
>>
>>107431897
>Likes it's not just the ram, it's the basic admission that everyone who isn't a mega corporation can be safely disregard.
You say this like this hasn't been America's MO since the industrial revolution
>>
>>107433207
There must have been reasons why it took 6+ months instead of weeks after releasing Mistral Medium on API only and for select enterprise customers. I think that one was their preliminary MoE model before attempting a larger one, but scaling things further up didn't work as well as expected.
>>
File: 1758093444165074.gif (253 KB, 370x448)
>>107433285
>leejet
>>
>>107433261
probably. i just want the medium
>>
>>107433261
The release blog for Medium 3 was literally the same post the "Large 3 in a few weeks" tease came from so I doubt it
Imo they screwed up training whatever the original Large 3 was supposed to be, then rushed to copy Deepseek's homework since it would look terrible to have no flagship releases for over a year
>>
the original large 3 was literally just a big nemo but they scrapped it to chase the deepseek moe meme train
>>
File: mistral-libgen.png (484 KB, 1119x852)
>>107433250
They were (initially) good because they used to use pirated material in their pretraining data (libgen, etc), that's even in Kadrey v. Meta copyright lawsuit.
>>
>>107433334
is that what you get when you cross a chink and a jeet?
>>
>>107433330
My conspiracy theory is that they attempted to distill deepseek to a smaller 120b-220b model but failed spectacularly so resorted to copying v3 wholesale to avoid worse embarrassment
>>
>Buys your RAM
>Takes your taxpayer money
>Fucks your AI gf
It's nothing personal
>>
>>107433286
Schizo and impossible to finetune. Mixtral was just 8 Mistral 7Bs (rebadged llama) duct-taped together with some additional training on top. There's a reason no one does it that way now.
>>
Which Deepseek do I download for the best 3090 erp experience?
>>
>>107433597
>gf
>>
>>107433681
he's gay?
>>
>>107433731
https://x.com/sama/status/825899204635656192
>>
>>107433630
:8b
>>
>>107433816
Damn, imagine being a billionaire able to get any type of woman you want and being gay.
>>
3.2-speciale is my favourite model of the year
>>
>>107433961
ggufs when?
>>
>>107433850
thanks for responding, but I cannot navigate huggingface, could you share the full model name?
>>
>>107434001
he is recommending you use ollama to download a qwen distill of deepseek. just download whatever quant you can fit in system ram + vram. or stick to mistral nemo
>>
>>107434001
I was joking. If you don't have upwards of 256GB ram, forget about it.
>>
>>107433626
It looks like everybody forgot about Mixtral 8x22B (141B total)...
>>
>>107434056
:(
>>107434045
thank you for the information
>>
>>107433961
I'm waiting for the Competizione variant.
>>
>>107434077
wasn't Microsofts fine tune better tho?
>>
>>107433912
Imagine goyim know you raped your little sister, and you must beat these allegations
>>
>>107429504
Maybe Trump should ask North Korea to start a ram plant, since worst korea can't into making ram. He has his phone number.
>>
>>107434135
I found wizardlm quite boring, also you need an ancient llamacpp release to run it
>>
>>107434140
Then that's even more tragic. All the money in the world wasn't able to buy her sisterly love.
>>
>>107434056
It's gonna get better. Someone will make a memory efficient model that doesn't suck.

safety removal reduces memory, and reduces compute.
>>
>>107434244
I'm sure they will
Then they'll stick it behind on API wall and charge $15 per million tokens
>>
>>107434357
>>107434357
>>107434357
>>
>>107426882
So the NeMo framework was named after Mistral NeMo then?
>>
>>107434442
Retard.
>>
>>107432295
This girl is really cool, just listened to that whole album thx anon



All trademarks and copyrights on this page are owned by their respective parties. Images uploaded are the responsibility of the Poster. Comments are owned by the Poster.