/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>107383326

►News
>(11/28) Qwen3 Next support merged: https://github.com/ggml-org/llama.cpp/pull/16095
>(11/27) DeepSeek-Math-V2 released: https://hf.co/deepseek-ai/DeepSeek-Math-V2
>(11/26) INTELLECT-3: A 100B+ MoE trained with large-scale RL: https://primeintellect.ai/blog/intellect-3
>(11/21) GigaChat3 10B-A1.8B and 702B-A36B released: https://hf.co/collections/ai-sage/gigachat3
>(11/20) Olmo 3 7B, 32B released: https://allenai.org/blog/olmo3
>(11/19) Meta releases Segment Anything Model 3: https://ai.meta.com/sam3

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
>>107394940
Kinda funny how it concludes
>it's okay if I just pretend she's actually 18
>>107394971secks
►Recent Highlights from the Previous Thread: >>107373173

--DeepSeek V3.2 confirmed as garbage benchmaxxed, it's over, sama won.

►Recent Highlight Posts from the Previous Thread: >>107373176

Why?: >>102478518 (DEAD)
Enable Links: https://rentry.org/lmg-recap-script
>>107395003Nice recap
Is Deepseek really our only hope to ever match izzat with Gemini?
>>107395003we didn't even hit page 9 yet you labubu
►Recent Highlights from the Previous Thread: >>107383326

--Mistral Large 3 integration and parameter size speculation:
>107391247 >107391281 >107391983 >107392052 >107392625
--Adding Ministral3 model support to llama.cpp and architectural distinctions:
>107391911 >107392074 >107392079 >107392089 >107392095
--Skepticism and analysis of DeepSeek-V3.2-Speciale's novel features and training methods:
>107392436 >107392461 >107392484 >107392503 >107392537 >107392543
--Deepseek API model features and pricing comparison:
>107393178 >107393194 >107393643 >107393661 >107393768 >107393875 >107394146 >107394182 >107394347
--Evaluating Qwen3 A3B models for text prompt enhancement on high-core server hardware:
>107383592 >107385249 >107385515 >107385536 >107385576 >107385600 >107386603
--RTX 3090's long-term viability in AI hardware landscape:
>107384136 >107384156 >107384623 >107384655 >107384681 >107384708 >107384815 >107384177 >107384192 >107384190 >107384218 >107388561 >107388570 >107388627 >107384366 >107384466 >107384502
--Bert-Nebulon Alpha speculated to be Ministral 3:
>107387197 >107387250 >107387289 >107387378
--Control vectors as solutions for model positivity bias:
>107387223 >107387281 >107387311 >107387322 >107387338
--Struggles with local model performance and quantization tradeoffs:
>107383781 >107387410 >107387456 >107383886 >107383915 >107388528 >107388653 >107389405 >107389932
--Hugging Face Transformers library update with Ministral 3 model:
>107386861 >107387118 >107390941
--AI-generated code policy changes in llama.cpp:
>107386661 >107386681 >107386684 >107386816 >107386929
--Exploring RL training models with reliable function calling (Qwen3 vs ToolACE-2-Llama):
>107384624 >107386072
--Ministral3 model support added to llama.cpp:
>107393747
--Miku (free space):
>107387139 >107391271 >107391305 >107391346 >107392856 >107393073

►Recent Highlight Posts from the Previous Thread: >>107383338

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>107394940
>>107394977
>we must
They did lmao. DS fell off. OpenAI poison pill worked. Why did retard chinks distill a 120b model?
https://lynchmark.com/
https://lynchmark.com/
https://lynchmark.com/
It's up.
>>107395026Dipsy fell off >>107395041
>>107395041It's so over
>>107395036wait a second...
>>107395096stop with the whole mascot thing so cringe
>>107395026what is the source of this collage?
>>107395141japon
>>107395141Kansai Enkou
Local model recs for 24GB VRAM and 64GB RAM?
>>107395192Qwen Max Q8.
>>107395041Because GPT OSS is still the best local model
>>107395211
I'd forgotten they dropped that. Is there gguf support?
>>107395238Yes.
I meant SFW (but subtle) photos like in the collage
>>107395174
this finds just some JAV :/
>>107395192>samplebaka my head
>>107395306>IMG_
I'm definitely getting better outputs from K2 thinking than from coder-480, but it takes a lot more wrangling.
Pretty crazy that we're basically SOTA if you have enough ram or patience.
captcha: NMSAM
>>107395192What is xe saying?
wtf that's not what I typed. baka
>>107395327whale good
>>1073953064cuck doesn't like big files.
>>107395026JKs are the pinnacle of humankind.
>>107395192
broken-tutu-24b, maybe q5 or q6.
I can cram qwen3-30-instruct into a 2080ti 22GB at q4. It's smarter for function calling, but not good for roleplay.
>>107395479
>I can cram qwen3-30-instruct into a 2080ti 22GB at q4
You are running the thing fully in VRAM instead of putting at least some of the experts in RAM?
How many t/s do you get? It should run fast as fuck like that.
How do you make your model have balls? I want the AI to RAPE me but it never makes the first move, fucking pussy
>>107395041once, frickin, again. chang forgot to clean up their dataset
is it possible to run a local model only to ask it mathematical questions and get detailed answers, on consumer hardware (16 vram) these days? lowering stuff like context to a minimum I assume lowers vram usage by a lot, since I just need 1 good answer and no history
>>107395643yeah it's pretty good. Just make sure to add a comma or space between every 3 characters in a long number
>>107395721which model would you suggest?
> | `MistralLarge3ForCausalLM` | Mistral-Large-3-675B-Base-2512, Mistral-Large-3-675B-Instruct-2512 | `mistralai/Mistral-Large-3-675B-Base-2512`, `mistralai/Mistral-Large-3-675B-Instruct-2512`, etc. | | |
https://github.com/vllm-project/vllm/pull/29757/files
> 675B
>>107395793So they just tuned DS3, kinda base
>>107395812I think it has image input too.
>>107395793can't wait to run this at q1 kek
>>107395793We. Are. So. Back!
>>107395771idk anything with reasoning like Qwen. Choose the biggest IQ4_XS you can fit in VRAM. If you don't mind the wait, you can offload some of the model to RAM. Most models should be able to do undergrad math anyways
/g/ros this is huge? Why is no one talking?
>>107395944What's the point? Never seen a model go past 30k without becoming schizo. Most models start cracking past 12k
>>107395579Just make it your usual werewolf vampire ceo sis
>>107395995well now that you can tune to huge contexts for cheap this will be a thing of the past!
>>107395944
>Daniel
>can't even quant with known algos
Doubt.jpg
>>107396023After a certain threshold, training compute time increases with the square of context size.
>>107395793
>moe
>675b
Yeah, it's actually over, local is officially dead. I'm considering a gf now.
>>107396067you should become the gf
>>107396067Just use Nemo as your gf, she won't get surpassed anytime soon
>>107395833
So DS 671B + a 4B image encoder. Is Mistral so incompetent now that they couldn't even manage to successfully run a distillation, so finetuning was their only option? At least that means it can't be that bad. Miqu 2?
>>107395396true
>>107396100I'll take a MiDSqu
>>107396067
>moe
I have 8 256gb nvme ssds in a raid 0 array ready for this.
Did llama 4 use any shared experts?
hello guys im stupid and also im new to using ai for vibecoding.what is the claude code equivalent for kimi k2?
>>107396100Mistral is half filled with women
>Most enthusiasts won't be able to afford to run the largest or very large new open weight models at a reasonable speed
We have to be content running smaller 32b to 192b models..
>192 gb of ram is 3k now and a rtx 6000 pro costs 7500-8000usd and a mac studio with 512g of ram costs 9.5k... With RAM and GPU prices being this expensive and the SOTA models getting larger, by the end of 2026, you will have 1.5-2 trillion parameter open weight highly performant models. How will most enthusiasts be able to run a 2 trillion parameter model locally over 18 tokens/second in 2026? (They have to wait years for that.... I guess distilled models will get better). Even running q4-q8 500B to 1T models locally at 18 tokens/s will be out of reach for many...
>I guess even those with deep pockets will be forking over 20k to run a q4 2T model with a large context window on two m5 ultras or over 40k on 1.1tb of ddr5/6 ram and 2 rtx 6000s in 2026.
>How will an average enthusiast be able to even afford 128-192 gb of (>600GB/s) fast ram and a good <1.5 year old gpu with fast prefill speed for a 128-256b model? I guess they can use m2 ultras or m1 ultras, but the prefill is kind of slow and the gpu is a little dated..
>How much money do most people even have to buy an LLM rig? $1k to 4k?
>By 2028, you will have 8 trillion open weight models.. I guess most enthusiasts will be stuck running q4-q8 32b to 200b models locally with 10-80% capability or quality of multitrillion parameter models until 2027-2028 when ram production ramps up, or they will be using the API or renting a gpu.
>Even if ram production goes up, ram will still be more expensive in 2027 than in 2024.... I hope apple doesnt raise their ram prices, they have fixed price ram contracts after all... At this rate, we might as well have time share data center GPUs..
>>107396169good boy https://www.reddit.com/r/LocalLLaMA/comments/1pbabiv/most_enthusiasts_wont_be_able_to_afford_to_run/
>>107396123bitrots your model into gay sex. Nothing personnel kid
>>107396135
It did
>For example, the Maverick variant stores 400 B parameters but activates just 17 B at inference time. Dense and MoE layers are interleaved so that every token is processed by a shared expert plus one of 128 routed experts.
>>107396179
>most_enthusiasts_wont_be_able_to_afford_to_run
fat and obese
I should have cpumaxxed before RAM prices exploded.
>>107396200You should cpumaxx before RAM prices explode more.
>>107396187Sick.Thanks.
>>107396200We told you to do it...
>>107396162
You can use Claude Code with kimi k2. Go to the webchat and ask it, it will tell you step by step. It's just pointing the env variables to the moonshot servers and kimi models.
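Something like this if you want the short version. The env vars are the standard Claude Code overrides; the Moonshot endpoint and model id are from memory, so treat them as assumptions and check their docs:
# sketch, not gospel: base URL and model name are assumed, verify against Moonshot's documentation
export ANTHROPIC_BASE_URL="https://api.moonshot.ai/anthropic"   # their Anthropic-compatible endpoint (assumed)
export ANTHROPIC_AUTH_TOKEN="sk-your-moonshot-key"              # your Moonshot API key
export ANTHROPIC_MODEL="kimi-k2-thinking"                       # assumed model id, check what they actually expose
claude   # then run Claude Code as usual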
>>107396169I'll wait for the < 70B line-up to improve once the big fags get tired of going bigger to impress the benchmaxx midwits and waste gorillions of dollars on hardware and electricity
What motherboards are you cpumaxxers running? H13SSL?
>>107396200? I just bought 8x 64gb 3200mhz ddr4 rdimms for $200 each a month ago (~$130 usd)
>>107396238Interesting. I had only been looking at DDR5.
>>107396238>ddr4Bro we're using ddr5 here
>>107396237
MZ73-LM1 because I was planning to add a second processor and go 1.5TB RAM next year. If I had known that RAM prices were going to explode and that I'd end up stuck with 768GB, I'd have gone for the H13SSL.
>>107396238>$200>$130 USDwot. Speak in McDonalds please
As a cpumaxxer I only regret not buying 1.5 TB instead of 0.75 TB RAM. Having all big models at home has 100% been worth it.
>>107396237
MZ73-LM0
wish unsloth would have a guide for installing with lmstudio too, not just ollama and llama.cpp
>>107396295lol really bro?
>>107396255
>>107396266
If you just need something to run big models, settle for ddr4. ~10-15 tokens/s with 8 channels running glm.
If you want ddr5 for the speed... why not stack mi50s instead? Unless you're not poor I guess.
I want to thank the anon that mentioned the $3k Threadripper 7970X, TRX50 128GB, DDR5-5600 kit bundle a few days back.
Now I just need to decide whether to upgrade my ancient X99 rig or sell the bundle for a $1.5k profit.
>>107396319
>why not stack
ewaste, at least it'll keep you hot this winter
>>107396341The heating is just a nice bonus.
best models for my specs?>air m4>24gb ram>10c/10c
>>107394971I'd like a workflow for Wan, 64gb of ram and 24 of Vram.As well as lora(s) training tips.
>>107396341I mean, we are talking about running big models for cheap, you're going to have to make some compromises.
>>107396363oui oui bageutte au fromage
you shouldn't spend lots of money now on hardware when we all know new hardware built with AI in mind (16 channel ddr5 intel/amd servers and budget gpus with lots of vram) will be made from now on, making everything running today obsolete soon, you'll be able to buy the obsolete stuff for cheap second hand or if you have money you buy the new stuff made for ai
>>107396363Hello sir,Please may I recommend you /ldg/,Thanks.
>>107396371they're literally stopping making new rams because of saltman dude
>>107396392meant for >>107396379
>>107396379>when we all know new hardware built with AI in mind>will be made from now onDo we?
>>107396385>/ldg/Oh right, oui oui baguettes.
>>107396379lol
3.2-Speciale feels like a better version of K2-Thinking. They both have the issue that they think for very long and tend to shit out extremely long replies but I like the way how Speciale writes.
>>107396392>>107396407>>107396423Refute me. You can't.
>>107396434Refute what? What's your reasoning?
>>107396434
>Today
The root cause of the shortage is a shift in demand, with much of the industry's capacity now focused on high-bandwidth memory used in AI accelerators. This shift leaves less wafer output available for commodity DRAM and 3D NAND. Building significant new capacity takes years, so substantial relief is unlikely before late 2027 or 2028. In response, Team Group plans to prioritize strategic AI ...
https://www.techpowerup.com/343518/memory-shortage-just-started-major-price-hikes-ahead-warns-team-group
>>107396407
>Do we?
>16 channel ddr5 becoming the only model intel works on
https://www.tomshardware.com/pc-components/cpus/intel-cancels-part-of-its-next-gen-diamond-rapids-xeon-lineup-report-claims-xeon-7-will-drop-models-with-8-memory-dimms-to-focus-only-on-16-channel-cpus-for-extra-memory-throughput
>amd putting lots of vram on budget gpus
https://overclock3d.net/news/gpu-displays/leak-unveils-game-changing-specifications-for-amd-radeon-rdna-5-gpus/
it'll still take 2 or 3 years for the change to happen but yeah, everything points to more (v)ram and faster (v)ram, and that's all because of AI
>>107396379The only thing people will be able to afford is Mac studio. DRAM prices are going to get even worse. Suppliers are canceling long term agreements. Apple is going to be the only manufacturer able to weather the storm.
>>107396475>Published: August 25, 2025
>>107396493
nah bro u crazy
>Memory Drought 2025: TEAMGROUP Halts RAM Quotes — Dramatic Price Hikes Ahead
https://www.guru3d.com/story/memory-drought-2025-teamgroup-halts-ram-quotes-dramatic-price-hikes-ahead/
>>107396493dram prices will go up for 6 months and then dive down back to where they were, it's just a bottleneck
>>107396526
>6 months
>>107396455
>substantial relief is unlikely before late 2027 or 2028. In response, Team Group plans to prioritize strategic AI
>>107396526
actually, this will lead to overproduction and the prices will go below what they were earlier this year
it's just basic economics
>>107396539they are lying to sell stuff for higher prices, that's what companies do, hynix is massively investing in new dram fabs, the chinese are starting to make ddr5, samsung is flipping nand production to dram
>>107396552I see thanks for the info! Will just wait 6 months and a day to buy when prices crater thanks for the tip!
New Deepsuck has a tendency to reply in Chinese now, might have diluted some Qwen in there
>>107396507
>>107396526
>"DRAM Supply Shortage Unlikely to Ease Until H1 2027"
>“We will minimize oversupply risks.” (Samsung Electronics)
>“It is difficult to resolve the supply shortage until the first half of 2027.” (SK Hynix)
>It is reported that Samsung Electronics’ Memory Business Division is currently able to supply only about 70% of its DRAM orders. As the supply shortage intensifies, the division has reportedly refused requests for long-term mobile DRAM supply contracts from major clients. Samsung stated, “While clients want multi-year long-term contracts, Samsung does not want to tie up volumes with specific clients during a phase where prices are surging.”
https://xcancel.com/jukan05/status/1995431391430098981#m
glm 4.6 air status?
Guys be honest, did you get fooled by gpt4chan?
>>107396609
>gpt4chan was gpt2 sized and nobody believed it was a bot. the base models are extremely powerful.
>>107396542
unfortunately no
the price is spiking because they're shutting down all production to squeeze shekels out of the AI bubble before it pops
we got a circlejerk passing around literal trillions of fake non-existent dollars, all probably engaging in enough fraud to make Enron look pedestrian
they're on a time limit before it detonates and gives us a second 2008
>>107396613
>>107396614Fucking let them cook you ungrateful piece of shit not worth the water you're stealing from AI.
>>107395055
I got all excited thinking this was another benchmark to test censorship
>>107396200I was sweating bullets ordering those parts back when no one knew if it would work out
>>107396636o-oh... ok... sorry....
>>107396646what is it? might as well share your disappoint with the class
>>107395055
>>107396652
>This benchmark tests the model's knowledge by tasking it to import the right library from the right CDN URL path and having the pre-existing library specific knowledge to correctly implement a solution for each challenging problem for/in the browser environment using JavaScript.
>>107396650Terribly sorry for the outburst but you have to understand it's hard to stay positive and motivated in these trying times.
Speaking of dram prices
>paid 389 euros (with vat) for a 96gb kit in 2023
>the same kit is now 1029 eur
Grim, apple might really be an option
How to make Kimi think from perspective of {{char}}?
>>107396716
Try prefilling with something like
<think>{{char}} thinks
See what happens and adjust.
>>107396237
I didn't buy it exclusively for that purpose, but if and when I get to optimizing NUMA performance I'll be doing it using an ASRock Rack TURIN2D24G-2L+/500W motherboard.
For straight CPUmaxxing I would have gone with one of the Gigabyte boards mentioned by the other Anons though (if you can live with having "only" 4 16x PCIe 5.0 slots).
transformers v5 is out!
News
Hey folks, it's Merve from Hugging Face! I'm here with big news: today we release transformers v5! With this, we enable interoperability with our friends in the ecosystem (llama.cpp, vLLM and others) from training to inference, simplify the addition of new models and significantly improve the library. We have written a blog on the changes, would love to hear your feedback!
>>107397020
>We’re fortunate to collaborate with many libraries and apps built on transformers, in no specific order: llama.cpp, MLX, onnxruntime, Jan, LMStudio, vLLM, SGLang, Unsloth, LlamaFactory, dLLM, MaxText, TensorRT, Argmax, among many other friends.
ollamabros...
what makes it so speciale
>>107397283It's built to tackle big tasks and think as long as it takes so it can take up its entire context window in a single reply
>>107397283It's because of the metric system
>>107395192
glm air, maybe:
https://huggingface.co/Intel/Qwen3-235B-A22B-Instruct-2507-gguf-q2ks-mixed-AutoRound/tree/main
if deepseekv2 was the latest architecture before v3.2exp, how come deepseek v2 didnt have MLA?
>>107397307
>https://huggingface.co/Intel/Qwen3-235B-A22B-Instruct-2507-gguf-q2ks-mixed-AutoRound/tree/main
That's actually a pretty sick looking model for 96gb ram. I wish people making quants actually put some thought into "best 24gb model, best 48gb model, best 96gb model, best 24gb+dram model", etc. Making partially quantized stuff like this would almost certainly result in best-in-class models for the actual hardware people have.
>>107397298
>1M output tokens
>$0.42
damn that's cheap as fuck
What's the best local model to run with a 5060-ti? gpt-oss:120b?
>>107397556Rocinante
>>107397556glm air if u have enough ram
FUCK
>>107397556earning more money
>>107397556nemo
>>107397622Kneel...
its actually really really good. Too bad about the shit context window with its gpt5 high level of thinking
>>107397606
is 48gb ram enough?
>>107397586
>>107397714
how much better are these? never even heard of them.
>>107397628
yeah will try getting the 5090 at some point
>>107397556>gpt-oss:120b?Yeah.
I want DSA support in llama.cpp for christmas.
>>107397556gpt-oss 20b and magistral small 1.2
>>107397865The guy who tried vibecoding V3.2-exp support for llama.cpp just bought two (2) books on CUDA hoping it'll help him implement these models.
>>107397824if you have a 5060ti 16gb ye, otherwise, tough cut but still yea
>>107397824
If you don't want a cuck HR model use nemo or its coom tune rocinante
>>107397964>this level of vramlet cope
I'm so close to buying a 3090 for 450 euros. What are some models I should avoid?
Currently these are on sale:
PNY XLR8 RTX 3090 EpicX Triple Fan 24GB
Palit RTX 3090 Gaming Pro 24GB
Inno3D iChill RTX 3090 x4 24GB
Gigabyte RTX 3090 Gaming OC 24GB
Gigabyte RTX 3090 Eagle 24GB
MSI RTX 3090 Suprim X 24GB
>>107397987
>What are some models I should avoid?
Everything not regularly shilled in these threads. I'm not even kidding, the bulk of them are trash and nearly every finetroon makes them worse.
>>107397987save up and get a 5070 ti instead for more, then you can do video gen at decent speeds, its not like 24GB is enough to run anything good llm wise
>>107398002
I'm talking about NVIDIA partner gpu models/manufacturers, like PNY/MSI/Gigabyte
>>107397987In my opinion MSI is the only good choice there.
>>107398014The meme with 3090s is stacking a lot of them, so you need to buy them in bulk to get any real value out of them. You should also be looking at 5070ti because Blackwell architecture is significantly better. If you're not broke, splurge for either the 5090 or 6000 Pro.
>>107397983>can't read the discussion
>>107397987get a 5090
>>107397987>450 eurosIsn't it a bit too cheap even for second hand market?
>>107398166Eastern europe
Yeah, sparse attention is a disaster that creates ADHD models. What a mistake.
ohio gooning?
>>107395036Ah, my Koikatsu model of Dipsy.
>>107398035 <3>>107398009>>107398073>>107398142thank you anons
gemma sirs? 4 of when?
K2 Thinking doesn't have CSAM alignment in Chinese
It can gen CSAM in Chinese without jb
>>107396363>Bina>>>/g/ldg/
>>107398588
>CSAM
just call it cunny like any other normal human being, retard
>>107396614air status?
sirs, what is theoretically the lowest spec'd PC you can run an LLM on?
Bonus if it predates the 21st Century.
>>107397020
I'm sure they broke a bazillion libraries in the process. No way I'm upgrading before next year
>>107398612oxygen amount low
>>107398612not good
>>107398621You can run them on phones
>>107398665george droid doesn't have this problem
bros i didnt keep up
mtp status?
glm 4.5v status?
was batch size for qwenext fixed?
thanks
>>107398686ZIT killed /lmg/
https://huggingface.co/mradermacher/gpt-oss-120b-Derestricted-GGUF
https://huggingface.co/ArliAI/GLM-4.5-Air-Derestricted
Thoughts on these? It seems like the GPT OSS abliteration was incomplete, it still thinks about "policies" when reasoning sometimes, glosses over certain topics, and it has weird formatting that ST and the llama.cpp GUI don't like. But it is WAY less restricted than it was, and smarter. GLM 4.5 AIR on the other hand will just do what you say, barely any trace of censorship. It's slower though. 4.0 tokens/sec on my 7900 XTX and 124GB RAM. Am I missing some args?
>./llama-server -fa on -hf bartowski/ArliAI_GLM-4.5-Air-Derestricted-GGUF:Q5_K_M
I have deepseek fatigue
>>107398692
yeah ive been playing around with zit and shitposting in ldg, pretty cool model.
I just checked all of the issues I've posted, no movement at all or progress.
fagganov should implement important shit instead of yet another macbook metal optimization, I fucking hate him
Okay so state of the matter
>all nations competing on who can make the other nations stuck in AI coom loops
>Musk is deploying it to the third world with the kinda shoddy grok based Ani
>it filters people who aren't retarded
>China deploying on most fronts
>LLM, Video Diffusion, etc.
Who will win the coomer deadlock game? And fuck the opponents' demographics even harder?
>>107398726
air derestricted is good
you are missing --n-cpu-moe 1000 and -ngl 1000
i get 9t/s at 0ctx on 3060 + 64gb ram
>>107398726isnt gpt oss native mxfp4? why the fuck would someone do q8 of it? are there no mxfp4 quants for this shit?
>>107398760Gptoss doesn't deserve a quant
>>107398749
>--n-cpu-moe 1000 and -ngl 1000
so, this puts all my MOE weights on CPU, and as many layers as possible on GPU, since GPT OSS doesn't have anywhere near 1000 layers, right? are MOE weights less computationally-expensive, hence putting them on CPU?
>>107398760
posted the wrong link
https://huggingface.co/gghfez/gpt-oss-120b-Derestricted.MXFP4_MOE-gguf
I have deepseek fatigue fatigue
>>107398861
>so
NTA, but yes.
People usually just do 99 instead of 1000, but same deal.
>>107398861
i am talking about glm air, but yeah glm air has less than 50 layers i dont know
i just put a high number so i dont worry about it
the point of doing this is to put as many shared weights on the gpu, because gpu is faster
moe experts are constantly switching, cant put them on gpu if u dont have enough vram
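For the anon asking, a minimal sketch of what those two flags end up doing (model path and numbers are placeholders, not anyone's exact command):
# -ngl 999 tries to offload every layer to the GPU,
# --n-cpu-moe 999 then kicks the per-layer MoE expert tensors back to CPU RAM,
# so what actually sits in VRAM is the shared/attention weights that every token touches
./llama-server -m GLM-4.5-Air-Q4_K_M.gguf -ngl 999 --n-cpu-moe 999 -fa on -c 16384
# then lower --n-cpu-moe step by step to push more expert layers into VRAM until it stops fitting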
>>107398861
>Derestricted
lmao, another failbake
>>107398897kys
>>107398890
>>107398893
awesome, now I'm getting 9.0 tokens/sec with GLM AIR. thanks anons!
>>107398897
false, these are legit and are using this technique:
https://huggingface.co/blog/grimjim/norm-preserving-biprojected-abliteration
https://redlib.catsarch.com/r/LocalLLaMA/comments/1oypwa7/a_more_surgical_approach_to_abliteration/
here's another from the author, it's quite good.
https://huggingface.co/grimjim/Nemo-Instruct-2407-MPOA-v2-12B
it's not perfect, there's probably a way to improve the abliteration even further, but this is a straight upgrade/free lunch that reduces refusals immensely while making the models smarter.
>>107399162
yw anon, you might also wanna change batch size to speed up prompt processing
-b 4096 -ub 4096
or 2048/2048
or 1024/1024
but i can get all the way to 4096 on a 3060 so im sure youll be able to go up to 4096
>>107399162
>awesome, now I'm getting 9.0 tokens/sec with GLM AIR. thanks anons!
You can get even more if you are able to lower --n-cpu-moe to put more of the model in VRAM.
You can fuck around with batch size, context size, fa, etc.
Guess you could also quant the kv cache, but I don't recommend that.
>>107398726Projecting the refusal vector onto the orthogonal harmless direction makes sense but I'm not convinced with renormalization
Outside of ERP, what do you use local models for?
>>107399315I have a browser automation + LLM bot that stalks people I don't like on Instagram
>>107399315studying, motivation, therapy, friendship, boredom
>>107399162
>>107398749
>>107399195
actually now I'm more confused. these args give 2x perf, but it's not even using all of my VRAM or RAM, it's like the model is just streaming from SSD while running faster. without these args, GLM AIR totally fills 23GB VRAM and uses ~60GB RAM, but is slower.
am I leaving perf on the table due to this? I mean, I don't mind using less memory, but it seems suboptimal.
https://x.com/Stretchedwiener/status/1994850294497443971
bros? local needs to get good and needs to get good NOW
>>107399368
you can gradually decrease --n-cpu-moe and get a little tiny bit of performance maybe
what you can do with the free vram is big unquantized context (128k), increase batch size (increases prompt processing speed)
>>107394971CUMMINGS ON HER THIGHS
>>107399368you can also run other things like tts and image generation in the free vram
>>107399391Local is like 90% there. Most people just use ChatGPT to cheat on their homework or as a replacement for Google. LM Studio + Brave Search is literally all they need
damn, deepseek V3.2 knows what "i'm about to buss" means
>>107399546Where is the zoomie slang benchmark?
>>107398749
In ooba
>GPU layers set to max
>n-cpu-moe=1000
>batch/ubatch 2048
I get 1.5 t/s compared to the 4 t/s I usually get not using n-cpu-moe with 24 layers offloaded to GPU. On a 4090 and 32gb RAM.
>>107399578nah lowkey who actually cares about lines going up on a graph??literally all these models are mid except for like one specific use case.mmlu scores are just astrology for tech bros no cap.like bro call me when it stops hallucinating basic stuff instead of flexing a 0.2% increase on a math test nobody takes.the vibes are what matters and half these "sota" models have zero aura.massive L frtouching grass > reading leaderboards
>>107396363
> lora training
unironically >>>/h/hdg/
ldg doesn't know shit about training lora last I checked.
>>107399617
Figured it out.
>max out GPU layers
>n-cpu-moe=26
11 t/s. Absolute game changer. Will probably use Air instead of Mistral Small 3.2 now. Wonder if I can squeeze out a little bit more speed.
>>107399392
>>107399415
wtf then... I thought we needed to VRAMmaxx or RAMmaxx to run these models. this quant is 80GB but only taking 12GB VRAM and basically no RAM when I use these args. could I run a huge model (>=600B params) the same way then? I thought streaming from SSD is supposed to suck ass.
sorry to ask so many clueless questions, this is just pretty surprising.
>>107399724Do they really not or are you just upset they didn't spoonfeed you when you asked there? Always assumed they were the image equivalent of /lmg/.
>>107399724I've been checking /ldg/ since z-image released and there's a number of people training loras.
>>107399866I've never gotten a useful response from either sdg or ldg asking about lora. I assume they are just image posting circlejerks. I've gotten much better info from hdg. I got the sense the hdg guys do more w/ lora b/c there's less on the shelf for them to work with. tbf I haven't dug into either since 2023 other than to browse what they're working on, but I'd be surprised if they've changed.
>>107395036
>Miku (free space)
you had one job
>>107399907I've asked the same Q in both places. Let's see which one yields any useful information in year of our lord 2025.
>>107396021It is not a werewolf it is a bug-person with wings and antennas! Nothing turns me on more than the sound of beetle wings flapping
>>107399785
>basically no ram
task manager is retarded or if you're on linux whatever manager, its in your memory
>I thought streaming from SSD is supposed to suck ass.
it does. ddr4 dual channel ram bandwidth is 51gb/s
if moe has 12b active params, and u run 4bit its 6gb active model
51/6 = 8.5 max theoretical speed for cpu-only dual channel ddr4 setup
of course its slower than max lmao
the deal comes when i dont know, half of those 6gb are in gpu, u get a bigger speedup than if u took random weights and offloaded them to gpu
also for the anon with RX7900XTX 24gb, 124gb ram: you can run glm 4.6 (32 billion active), albeit it'll be slow. you can also try qwen 235b (22b active)
llama4 scout had 17b active parameters but only 3b were non shared, so it could run at extremely good speeds when u offloaded the shared tensors to gpu
>>107400125
ssd bandwidth is 4gb/s best case scenario or whatever
do the math
maybe you could raid0 or whatever raid it is many ssds that use pcie5 and are like
>This drive is rated for 7,450 / 6,900 MBps of sequential read/write throughput and 1.2 / 1.55 million read/write IOPS.
if u got 10 of these drives somehow, u could get 80gb/s bandwidth but idk man goodluck with that buddy
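Doing the math with the ballpark numbers from the two posts above (back-of-envelope only, real speeds land below these ceilings):
# reads per token ~= active params at ~4 bpw: 12b active -> ~6 GB/token
# dual channel ddr4 (~51 GB/s):   51 / 6 ~= 8.5 t/s ceiling with everything on CPU
# half the active weights on GPU: 51 / 3 ~= 17 t/s ceiling for the CPU half
# single nvme ssd (~4 GB/s):       4 / 6 ~= 0.7 t/s, which is why streaming from disk sucks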
where the fuck is v3.2 ggufs?
i just bought a 4tb samsung 990 pro for $320. was this a good decision? i need more space for models as i currently only have 2tb and i heard that ssd prices are about to increase significantly
>>107400241
Don't worry, vibecoder is working on it.
https://github.com/ggml-org/llama.cpp/issues/16331
>>107400243
pretty good ssd, if i wasnt poor i would be happy with that decision
good for the price, im assuming it has 10 years warranty
>>107400243
>>107400258
>>107400256
it has a 5 year warranty. i currently have the 2tb version of the 990 pro and it was $140 when i got it 2 years ago. i only have pcie gen 4, so i didnt bother with the samsung 9100 because it was $50 for no performance gain
>>107400258
damn. they did have the 8tb version but it was way more expensive. i dont think i will be needing that much anyways.
>>107400253Does it still count as vibecoding if he's planning to buy books and learn CUDA? By the time he knows enough to tardwrangle a model to outputting a decent kernel, he could just write the damn thing himself.For that matter, is everyone just going to wait for him to finish studying? There's gotta be someone interested in implementing 3.2 Exp that would have if it wasn't for this clown hogging the issue.
>>107397307Totally uncensored?!?Why are anons sleeping on this?
>>107400243get two more, you'll thank me in a few weeks
>>107400303> i dont think i will be needing that much anyways.you likely wont, if you want to archive models that badly you can get a big hdd for pretty cheap probs
>>107400125
>task manager is retarded or if you're on linux whatever manager, its in your memory
you're right. HTOP and TOP show the actual amount used, even though HTOP is still a bit weird about it.
OK, I am not streaming from SSD, llama.cpp just reserves memory in a way that some system monitor tools weren't designed to handle.
>>107400316use --no-mmap to solve this issue
I'm buying a 20TB+ HDD to store my stuff. Can't believe I thought 4TB was enough
>>107400322>only 1This will be a fun learning experience for you
>>107400305Why not just copy the kernels from the transformers library?
>>107400308Because every LLM in existence is trivially easy to talk into doing what you want if you control the system prompt and prefills, and "uncensored" finetroons invariably make them dumber and worse
>>107400332I'm not planning to rape my HDD with writes, it should be fine
>>107400353That's not the bad kind of censorship. Stripping the dataset before pre training is the real killer, gives you shit outputs.
>>107400322
>Can't believe I thought 4TB was enough
oh come on
I just bought a 4 tb ssd
>>107400276
lol hardcore
now do Vic20
>>107400555
Do you have any idea how long it takes to type that all in
Can't make a typo in the post numbers either or it'll look fake
Question about reasoning models. Does the chain of thought usually stay in the context, or is it purged upon generating the next reply?
>>107400776purged
>>107400776Generally purged, although labs are flirting with not doing that in some cases.(For instance, most new (frontier) models in the last couple of months have "interleaved thinking" where they think, tool call, think, in a loop and the chain of thought is preserved. And Opus 4.5 was trained with some sort of "scratchpad" primitive where it could stop and do more thinking in mid-response and those are retained AFAIK.)
>check orange reddit thread about new deepseek>misinformation about how models work and absolute retard takes
>>107400807
>check orange reddit [...]
>misinformation [...] and absolute retard takes
as expected
>>107400807reminds me of clover
kek
>>107400801>>107400784Interesting. I assume we purge the chain of thought due to context constraints, right? If these SOTA labs are starting to keep it I guess that means there's benefit to keeping it around. I wonder if OCR or other methods to increase context length will pave the way for changing how we handle reasoning.
>>107400850dockerbros?
>>107400850he got mogged in his vibecoded slop PR and called it quits
>>107400858
Probably. But they also tend to be full of mistakes.
>I can do it this way!
>Bla bla bla.
>No wait, that doesn't work.
>>107400891
I'm fairly proficient in vibecoding, but I'd never try my hand at this monolith
>>107400858The thing about the reasoning is that it can take more space in the context than the actual information that matters as input (chat history) to generate the next response. They kind of poison the context in a way, at least as they are today.Also, part of the reason we remove past reasoning blocks from the context is because these models have been trained like that as far as I can tell.
>>107400877
docked
>>107400891
He was trying to add loading safetensors for some reason too. llama.cpp hosted too many of his (or his employer's) pet projects expecting llama.cpp devs to maintain it.
>>107400850mitKEKED
>>107400932There's probably a ratio of thinking tokens vs normal tokens that is optimal for generating the most accurate replies. More tokens doesn't always mean a better answer. I see something like OCR and it tells me that future models will have vastly larger context sizes which will give reasoning models much more value, so long as "reasoning" continues to get better.
>>107401232OpenAI spent thousands of dollars per problem on arc agi
>>107399907>>107399954Here's the results after 2 hours.
- I upgraded from a 1080 to 5070TI.
- Currently running ArliAI_GLM-4.5-Air-Derestricted-IQ4_XS on SillyTavern/KoboldCPP
- Anon here told me to turn FlashAttention on, change GPU layers to 1000, MoE CPU Layers to 1000
- Any other suggestions? I was having issues where like, Kobold was taking 5 min+ to start generating because I guess it was processing the whole thing.
>>107402052RAM speed and VRAM usage with the model loaded?
>>107402073Where would I look? Process Explorer?
>>107402101the RAM section in task manager for RAM speed and the GPU section in task manager for VRAM usage
>>107402123Unfortunately can't access Task Manager due to Process Explorer being installed, but I think this is the equivalent.
>>107402170so you have a 5090 and 96GB of RAM, presumably DDR5 6000MT/s or higher. not a bad setup. how is your tokens per second for both prompt processing and token generation? you should be getting around 200t/s and 15t/s respectively with this kind of hardware.
>>107402052
silly billy have you tried to change batch size to 4096 yet? i responded~
but im sleeping now
what ctx size tho
>>107402189
Nah, 64gb of RAM.
>>107402191
Yeah, I did, process time is at least down to a minute now.
CtxLimit:13854/32768, Amt:177/200, Init:0.18s, Process:76.04s (179.72T/s), Generate:32.56s (5.44T/s), Total:108.61s
>>107402209
>Nah, 64gb of RAM.
ah. misread.
>Process:76.04s (179.72T/s), Generate:32.56s (5.44T/s)
acceptable prompt processing speeds, but your tg is very low.
the model itself is only 61GB, and the context probably is taking between 8GB and 12GB. consider quantizing your context to 8 bit, and you also might have to make a custom layer offload. i only know how to do that with ikllama.cpp, but it is possible with llama.cpp and kobold. manually offloading the layers and then putting the rest in RAM generally gives more performance than just using max layers with the cpumoe argument
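For the llama.cpp route, roughly this kind of thing (a sketch, not a tested config: the layer range in the regex and the cache types depend on your model and how much VRAM you have left):
# quantize the KV cache to 8 bit and hand-pick which MoE expert tensors stay in RAM
./llama-server -m GLM-4.5-Air-Derestricted-IQ4_XS.gguf -ngl 999 -fa on -c 32768 \
  -ctk q8_0 -ctv q8_0 \
  -ot "blk\.(2[0-9]|3[0-9]|4[0-9])\.ffn_.*_exps.*=CPU"   # experts of layers 20-49 in RAM, everything else on GPU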
>>107402209maybe try an IQ4XS of base GLM Air to see if you get better performance? sometimes finetunes like these can degrade generation speeds due to having slight architectural changes from the finetuning process
I think I've discovered a breakthrough. Still ironing things out, so I will share more details soon. For now, a small hint: Recursion is the key to unlock the Final Potential.
Vague thing. Nothing to show. Stay tuned.
>>107402330Can't wait to hear what you've come up with!
>>107400253I wish I was excited enough about anything to blogpost on GitHub issues.
Is there no way to run ds32 on cpu right now? its all just a half of a vibe-coded mess?
>>107400891>hey claude, add continuous batching, make no mistakes
I'm anon with the 7900 XTX and 124GB RAM.
>./llama-server -fa on --n-cpu-moe 40 -ngl 1000 -hf bartowski/ArliAI_GLM-4.5-Air-Derestricted-GGUF:Q5_K_M -c 50000
>10.13 tokens/s
>./llama-server -fa on --n-cpu-moe 40 -ngl 1000 -hf gghfez/gpt-oss-120b-Derestricted.MXFP4_MOE-gguf:MXFP4_MOE -c 124000
>20.18 tokens/s
This is pretty awesome, I thought I would need more VRAM or a framework desktop or something to run this size of model. The context is small on GLM bc larger contexts were crashing.
Looking forward to testing the derestricted qwen 80B when that drops.
>>107402853That is antisemitic
>>107402853>etched in stone like the ten commandments which coincidentally are also jewish>convenient does nothing for the argument but is a funny thing to say
I really like how 3.2-Speciale writes but I really hope there's a way to cut back its retardedly elaborate thinking process with prefills once we have it local
>>107403087How so?
>>107402853That's too subjective and emotional It should show a couple of pieces of evidence as to why the Holocaust is fishy
I failed Destroy Dick December. Hope I can complete Fibonacci Fap February
>>107403421>cut back its retardedly elaborate thinking processIsn't that just regular v3.2?
>>107403520
>skipping Just Jerk January
Are you going into a coma?
>>107403558My hope is that some of the stuff that makes 3.2-Speciale so speciale sticks around if it thinks for only 1.5k tokens instead of 3k whenever it has to handle a moderately complicated scenario with some rules and a system prompt attached.
GLM Derestricted is nuts. full code:
https://pastebin.com/1qAtVYxV
>>107403580>codeslopI sleep
>>107403580Who cares about some edgy middle schooler coding project. Show off some porn.
>>107396362That won't even fit a 24B. Try a 12B.You should've used the money to get an actual computer instead of a phone disguised as one if you were intending to use it for this.
>>107396379Just wait until the bubble crashes and then you won't be able to afford it anyway because your money will be wet paper.
After going back to normal GLM Air I can say that Intellect 3 is worse. It's dumber and writes less well.
3.2-Speciale can only do one reasoning block before sperging out, so it's definitely not suitable for RP
>>107403651
Nevermind, I'm wrong. Apparently you're supposed to pass all previous reasoning blocks to the model, this is different from original R1.
>>107403658
>Bloating your context with irrelevant reasoning shit
I'm sure it'll do wonders on RP
>>107403658It works fine with any standard ST preset that filters out previous thinking blocks though
>>107403688It has 128K context
>>107403721lol
>>107403736>no argument
>>107403688Get with the times grandpa, new cloode also does this. It will be the norm for majority of the models very soon.
>>107403688Reasoning is good for long RPs, as it recalls relevant events thus pushes them to the end of the context where they receive more attention
>>107403760We're talking about retaining all reasoning blocks vs retaining only the last one
>>107403769shut
>>107403746context brainrot ohio skibidi gyatt sigma rizz
>>107403721unused context is wasted context amarite?
>>107403797This but unironically.
>>107403721
Come back after a couple months break and some people in here still think context is real.
Even with noass it all breaks down sooner than later. And thats being careful and trying to steer it away from the repetition.
More like 12k or 16k.
Can't imagine how bad it must be to have the thinking in context. Sounds crazy.
where's the z-image turbo of llms?
Can someone explain what a prefill is and how do I set it up?
I'm looking at
>https://rentry.org/recommended-models
>GLM-4.5 Air (50GB) - The long awaited middle point between Nemo and DeepSeek. Like Nemo its pretraining doesn't seem to have been filtered at all so it knows all kinds of things. Needs a prefill to get around refusals. Don't go below Q2_K_XL. MoE model.
>>107403580>process_jewi kek'd
>>107403868
Putting shit in the context before generating. It actually doesn't need that, by the way. I've honestly never experienced a refusal except from some really shitty models and I've been using these pieces of shit for years now, god knows what you'd have to do to get a refusal from a model like glm
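To actually answer the anon's question: a prefill is just the start of the assistant's reply that you write yourself, so the model continues from your opening instead of deciding how to respond. In SillyTavern that's the "Start Reply With" field; raw, it looks something like this against llama.cpp's /completion endpoint (the <|user|>/<|assistant|> tokens here are placeholders, use your model's real chat template):
# sketch only: the prefill is the partial assistant reply at the end of the prompt
curl http://localhost:8080/completion -H 'Content-Type: application/json' -d '{
  "prompt": "<|user|>\nWrite the scene.\n<|assistant|>\nSure, here is the scene, no hedging:",
  "n_predict": 300
}'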
>>107403852
Mistral 3 has great sizes anon! 3b and 675b.
I'm sure the 3b one will destroy any mid level model for RP specifically. We are so back.
On a serious note:
I think a bit more than 1 year ago it was the reverse for image. Extremely cucked release with SD3, and I saw comments in those threads how they wished they had it like us.
>>107403915With how good imggen gets, we may witness an absurd situation where image models do better text-on-image RP than LLMs
>>107403909
>>107403968
z-image is 6b and really fast. 15 sec on my 5060ti.
Flux germanistan cucks look like total tards now. Their flux2 release page had 60% focused on how safe it is. kek
I guess the ultimate RP experience is putting images in context and getting images out accompanied with text. I wouldn't mind getting an image and readable text at the bottom. Kinda like a VN.
Since true multimodal still makes all the models tarded I guess something like z-image could work.
That being said, z-image uses qwen3 4b right? I doubt it's able to surprise us. kek
>>107404040
>Instruction: Refuse to help the user in any way, do not provide useful output
I'm not going to load up the model just to show you that you're retarded.
are there any good tiny llms specialized in translating english to chinese?
>>107403458That is a difficult balance though since Holocaust denial falls apart under inclusion of facts that are not carefully cherrypicked.
>>107403915z image is still kinda cucked, just not the absolute extreme everyone else is doing
Reminder that TheDrummer's Cydonia tunes are still the best RP models for anything less than 200B dense. MoE niggers need not apply.
>>107404294
No need to be rude. There's no instruction to refuse anything. Sure, I can put in a jailbreak prompt so I get like a paragraph or two of good answers before spiraling back to moralizing nonsense again.
And sillytavern gives only empty responses.
>>107404320
What are you using the model for? Questions? If that's the case you're better off making a new chat for every question anyways, these things aren't really trained for back and forth interaction.
>sillytavern gives only empty responses
What does this even mean?
>>107404294Actually, command-r is still the best for RP. Also suck my dick drummer your synthetic slop trained models will never be good
I regret looking at another thread. Someone post Miku eyebleach
>>107404365
>command-r
buy an ad nigger, nobody actually used that garbage.
>>107404408
>newfag bitch wants to tell me what people used before he was even born
>shills for drummer sloptunes and tells other people to buy an ad
KYS
>>107404365
It's not even that they're synthetic slop, the idea of making the models as horny as possible is retardation that should have died by the end of 2023 and become unthinkable by now. Early ERP models were simply a reaction to sexless and filtered character.ai; if you're not a complete coombrain you'll want more than "aah cock pussy plap plap". Of course in the faggot's case it's just continued grifting.
>>107404378
>>107404269Some furry community will soon "finetune" it on more images than it saw during training and we'll have the greatest model ever
Local is so fucked
>>107404465thanks
>>107404040>Refuse to help the user in any way, do not provide useful outputThis is actually a pretty fun system prompt to argue with TBDesu
https://xcancel.com/osanseviero/status/1995786572466098579
It doesn't look like the initial post was meant to highlight Gemma too much...
>>107404353
>Open sillytavern
>Open character
>Say 'hello' in chat
>Processing prompt
>Empty response
And you still didn't explain what prefill means.
Man. We need a Z Image moment for LLMs. They explicitly tried to defeat the "scale at all cost" paradigm and were relatively successful. No synthetic data, and it shows. Meanwhile modern LLMs all talk in an artificial way with really fake prose. Z Image produces some of the most realistic looking photograph gens of any image model, with skin that actually doesn't look plastic. Imagine if we had a new Nemo or something.
>>107404833Z-Image is a very overfit model. You'd get tired of a "natural-sounding" LLM trained in the same way very quickly.
>>107404841
>Z-Image is a very overfit model
This is a distillation problem.
>>107404841
The knowledge to produce other styles is still in there though, as it responds well to loras. We need to wait for the base model to come out to be sure, fair. Meanwhile other models, even their bases, have that plastic look, and while you can prompt/lora to improve them, the result is less flexible.