/g/ - Technology

File: sssssssssss.jpg (262 KB, 1536x1536)
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>106816104 & >>106807832

►News
>(10/07) Release: LFM2-8b-A1b: Hybrid attention tiny MoE: https://liquid.ai/blog/lfm2-8b-a1b-an-efficient-on-device-mixture-of-experts
>(10/07) NeuTTS Air released, built off Qwen 0.5B: https://hf.co/neuphonic/neutts-air
>(10/06) Anthropic open sources Petri, a parallel exploration tool: https://anthropic.com/research/petri-open-source-auditing
>(10/03) Qwen3-VL-30B-A3B released: https://hf.co/Qwen/Qwen3-VL-30B-A3B-Thinking
>(10/02) ZLUDA 5 released with preliminary support for llama.cpp: https://vosen.github.io/ZLUDA/blog/zluda-update-q3-2025

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/leaderboard.html
Code Editing: https://aider.chat/docs/leaderboards
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
>>
►Recent Highlights from the Previous Thread: >>106816104

--Paper: Less is More: Recursive Reasoning with Tiny Networks:
>106820451 >106820469 >106820482 >106820507
--Papers:
>106822631
--Liquid AI releases efficient 8B hybrid MoE model for on-device applications:
>106817815 >106817843 >106817858 >106817906 >106817920 >106818061 >106818120 >106818105 >106817881 >106820927
--Critiquing GPT-OSS model limitations and questioning its practical viability:
>106818819 >106818841 >106818858 >106818864 >106818896 >106818933 >106818955 >106818960 >106819069 >106819070 >106821424 >106818929 >106819063 >106819269 >106819323 >106819066 >106819218 >106820707 >106820723 >106819149 >106819161 >106819221
--CPU/GPU server architecture tradeoffs:
>106816785 >106816809 >106816843 >106816858 >106816867 >106816880 >106816903 >106816961 >106817031 >106817084
--Choosing a lightweight Linux distro for optimal llama.cpp performance:
>106818191 >106818219 >106818232 >106818234 >106818294 >106818363 >106818271 >106818282 >106818463
--Alternatives to Command-R+ for RAM-based inference and model performance tradeoffs:
>106819017 >106819051 >106819061 >106819088 >106819163 >106819439 >106819533 >106819627 >106819649
--GitHub PR for host memory prompt caching in llama.cpp:
>106820185 >106820217
--Dual-model pipeline for bypassing censorship via token swapping:
>106818917 >106818970 >106819349
--Finetuning challenges: compute needs, data quality, and model limitations:
>106820466 >106820490 >106820554 >106820793 >106820903 >106821020 >106821064 >106821186 >106821253 >106821335 >106821363 >106821302 >106821356 >106821387 >106821401 >106821441 >106821461 >106821654 >106821732 >106821859 >106822067 >106821583 >106821693 >106821773 >106821815 >106821840 >106821848 >106822010 >106822042 >106822066 >106822079 >106822128 >106822043 >106822016
--Miku (free space):
>106816325

►Recent Highlight Posts from the Previous Thread: >>106816108

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>
first for Granite 4
>>
File: dead.jpg (81 KB, 1637x265)
>>106822756
>>106822760
waifufaggots begone.
>>
>>106822792
you first
>>
>>106822766
Is miku a leftist now?
>>
anyone got a glm 4.5 air st preset? st has been pretty good about auto-applying whichever i need for models, but doesn't show anything for this one and is randomly emitting stuff like <|eot_id|><|end_header_id|> in the response, then keeps writing
>>
I heard good old Rocinante is still the best roleplay model out there
>>
>>106822799
Update your ST, it has a GLM 4 preset already made.
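If it still won't auto-pick it, the GLM-4 format is roughly this (from memory, so double-check against the chat_template baked into your GGUF's metadata):
[code]
[gMASK]<sop><|system|>
{system prompt}<|user|>
{user message}<|assistant|>
{model reply}
[/code]
Add <|user|> as a stop string and you won't see that llama3-style <|eot_id|> junk leaking in.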
>>
>>106822823
If you're a bottom of the barrel poorfag yes.
>>
What's the best model to use for an RTX 5090 when using kobold + sillytavern? I assume this could handle something pretty beefy for accurate results?
MUST be able to do porn.
Currently using Mistral-Nemo-Instruct-2407-Q8_0 but this was for a 3070 so I'm curious to make full use of a 5090
>>
>>106822833
Rocinante v1.1
>>
>>106822797
always was, she even got blacked a long time ago
>>
>>106822844
no but really
>>
I was skeptical about GLM 4.6 at first but it's working great when using it with muh dick.
>>
File: file.png (99 KB, 545x931)
>>106822864
Seriously. For real. Unironically.
>>
File: 1731901941862455.jpg (3.6 MB, 6912x5184)
This is your least obsessed and most mentally sane baker and moderator btw

>>106789576
>>
>>106822872
those settings are on a 5090?
>>
the starter guide is 1 year old, guys

and it's not a real guide. neither llm thread has a beginner's guide.
>>
>>106822833
>I assume this could handle something pretty beefy
You'd assume wrong. Less than 48GB means you're still stuck with small, shitty models unless you partially offload to system RAM with a big speed penalty.
The main difference from your 3070 is that you can now run slightly less shitty ~20-30b models and use higher quants and context.
>>
kinda crazy how glm and glm air (and deepseek too I guess) have this built-in potential for massive speed gains with MTP but we're simply not getting it because llama.cpp is ignoring it after a half-hearted failed attempt of making it work
if you are currently running 4.6 at a slow speed, it shouldn't be this way.
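For anyone who doesn't know what MTP buys you: the extra head drafts the next few tokens nearly for free, the full model verifies the whole draft in one batched pass, and you only pay full price for the tokens the draft got wrong. The accept/reject loop is basically generic self-speculative decoding, something like this (a sketch of the idea, not llama.cpp code; draft_with_mtp_head and verify_batch are made-up helpers):
[code]
def generate(prompt_tokens, n_new, k=4):
    out = list(prompt_tokens)
    while len(out) < len(prompt_tokens) + n_new:
        # 1) cheap draft: the MTP head proposes k tokens in one shot
        draft = draft_with_mtp_head(out, k)      # [d1 .. dk]
        # 2) one batched forward pass of the full model over the draft
        #    gives its own next-token choice at every drafted position
        verified = verify_batch(out, draft)      # [v1 .. v(k+1)]
        # 3) keep the longest prefix where draft and full model agree,
        #    plus one guaranteed-correct bonus token from the full model
        n_ok = 0
        while n_ok < k and draft[n_ok] == verified[n_ok]:
            n_ok += 1
        out += draft[:n_ok]
        out.append(verified[n_ok])
    return out
[/code]
Best case you get several tokens per big forward pass, worst case you're back to one, which is why it's annoying that llama.cpp currently just ignores the head.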
>>
>>106822895
Those settings are for Rocinante v1.1
>>
Another reason I love 4.6-chan is that she created a ram filter in /lmg/. If anyone is talking about nemo shittune 2000233929 or fagmmer's recent tune he is basically saying I have less than 128GB ram.
>>
>>106822901
>You'd assume wrong. less than 48GB means you're still stuck with small, shitty models unless you partial offload to system RAM with a big speed penalty.
>The main difference from your 3070 is that you can now run slightly less shitty ~20-30b models and use higher quants and context.
I'm on 64gb, plus 16 on the videocard. Is adding 64gb worth it?
>>
>>106822827
pasted staging over my current install and now i see glm 4 as a preset. odd that it didn't autoset it though since st has been doing that for a while based on a models metadata. thanks
>>
>>106822904
What GPU
>>
>>106822896
it's mostly correct aside from being rocinante instead of nemo and rep pen of 1.03
>>
Hagar letter smiler
>>
>>106822909
>everyone must only talk about the thing i use
>>
>>106822896
Most oldfags lost faith in the technology and left before 4.6 happened. Rest of us have 4.6 and we don't care about anything. Have fun being troubleshooted by another guy who entered the thread a week ago and probably is a pajeet.
>>
>>106822901
>unless you partial offload to system RAM with a big speed penalty.
I do this.

What model then?
>>
>>106822896
what part are you getting stuck on?
>>
>>106822929
2x 4090
These funny guys like to pretend GLM is good at roleplay, but it's not. They did the same when R1 came out, and look at them now. They are just a bunch of retarded npcs that jump on the train of the next big thing and call it the best ever, regardless of what it is.
>>
>>106822910
>I'm on 64gb
You could try Q4 quants of GLM 4.5 Air
>Is adding 64gb worth it?
This would let you use Q2 quants of full GLM, or alternatively higher quants of Air
For now try Air at Q4, and read up about kobold and llama.cpp's recent 'moe cpu layers' feature for a nice speedup.
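Something like this as a starting point, assuming a recent llama.cpp build where the flag is spelled --n-cpu-moe (koboldcpp exposes the same thing as "MoE CPU Layers" in the launcher; check --help on your build since the name has moved around):
[code]
llama-server -m GLM-4.5-Air-Q4_K_M.gguf -ngl 99 --n-cpu-moe 30 -c 16384
[/code]
That keeps the expert tensors of the first 30 layers in system RAM while attention and the shared weights sit on the GPU; lower the number until VRAM is nearly full, raise it if you OOM.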
>>
>>106821625
>>106822909
Go splinter off and make your own general like /wait/
>>
File: file.png (501 KB, 650x420)
>>106822957
gp...pfff... g... gpt-oss 120B
>>
>>106822963
You've won me over anon. I'll give it a go. Which specific R1.1 should I go for? (this is on a 5090)
>>
>>106822970
wait was just a thinly disguised avatrtroonfagging thread. glm-chan doesn't need that. she has a special place in my heart and on my cock
>>
>>106822963
>pretend GLM is good at roleplay, but it's not
rocinante is better?
>>
>>106822972
Q8 gguf version. Use it with koboldcpp and use banned strings in silly tavern to remove slop you don't like. Also keep context between 16k and 20k even if it supports higher, since higher context means more retardation for all models.
>>
>>106822957
>>106822968
GLM + GLM Air is the recent hotness, there haven't been any major breakthroughs in small models for RP since Nemo/Rocinante.
Between these options is a sea of Mistral Small and Gemma 3 finetunes, which are at best a sidegrade in most cases.
>>
>>106822977
You might as well be avatarfagging. It gets real obvious when it's you posting about your "glm-chan" multiple times per thread.
>>
>>106823001
She is that good. And I am talking about a model and not my shitty novel AI OC do not steal.
>>
>>106822986
Got a startup list of slop to ban?
>>
https://rentry.org/88vaxybo
I made the new guide.
>>
glm-chan is the only one i need
>>
>>106823009
Go make /zaig/
>>
>>106822986
>Q8 gguf version
Why not Q12?
20?

I feel a 5090 can handle more than Q8 gguf
>>
What settings do you guys run glm-chan on? I want her to be as slutty as possible.
>>
>>106823021
go make /drummer shittunes/
>>
>>106822994
>GLM + GLM Air is the recent hotness,
and how are those? better than rocicante/nemo?
>>
>>106823021
>>106823034
anyone who wants technical discussion should stay here and everyone else should fuck off
>>
>>106823013
Kind of, but it's not sorted or cleaned, and it's very custom to me and the type of roleplays I do, so it's too embarrassing to share. Just add things you don't like to the global banned strings as you see them and swipe. Keep it short, like this list, which covers all the shivers (a rough sketch of how the banning itself works is below the list):
Note the bare "spine" and "shiver" entries, which make most of the other lines redundant; remove those two if you still want to hear about spines and shivers in other contexts.
"sent a shiver"
"sent a faint shiver"
"sends shivers"
"sends a shiver"
"sending shivers"
"sending a shiver"
"down her spine"
"down his spine"
"down the spine"
"down my spine"
"down your spine"
"down spines"
"up her spine"
"up his spine"
"sending a chill"
"sends a chill"
"sent a chill"
"chills down"
"chill ran"
"chill run"
"chill runs"
"chill running"
"felt a shiver"
"shiver ran"
"shiver runs"
"shiver running"
"shivers at"
"shivering"
"shivered"
"shivers"
"shiver"
"spine"
"ran through her"
"ran through him"
"run through her"
"run through her"
"ran across her body"
"ran across his body"
"running through her"
"running through him"
"through her body"
"through his body"
"jolted"
"jolts"
"jolt"


>>106823022
I hope that is a joke.
>>
>>106822903
True.
Maybe a different time.
>>
>>106822885
we call em jannies, reddit-kun
>>
I compared https://huggingface.co/LiquidAI/LFM2-8B-A1B to Qwen3 4b and gemma 3n e4b

https://files.catbox.moe/okpjgu.txt

It seems to have a positivity bias and readily produces potentially unsafe replies.
>>
>>106823050
They're significantly smarter and a lot of people seem to enjoy them, especially full GLM. Safety guardrails are easily bypassed and they're very capable of writing decent smut. If you can run either at usable speeds then there's not much reason to use Nemo anymore.
>>
>>106823105
>1B
>4B
The only use cases for these tiny models are for running on a google phone in bangladesh
>>
Ring-1T could be our true local SOTA but it's not implemented yet
>>
>>106823103
we call them trannitors, niggertroon
>>
>>106823117
>glm
>decent speed
>smut
>wait 5+ minutes between replies
bro stop tricking people
>>
Big
https://x.com/ostrisai/status/1975642220960072047
>>
>>106822970
I doubt anyone at /wait/ knows how to configure.

I'm not going to use the guide to make the porn text thing (I don't have erectile dysfunction). I will be creating a learn X coach (sort of like a mentor). The tricky part is figuring out how to get it to automate memory of what you said.

for fapping, memory is not very important, apparently (everyone seems to not care about incoherence).
>>
>>106822968
>This would let you use Q2 quants of full GLM, or alternatively higher quants of AIr

>>106822962
Fine, I'll effort-ish at it (ask ai to explain). It's not a real guide.
>>
>>106823149
>qwen-image lora
has this replaced sdxl yet?
>>
>>106823149
Can start training loras with mostly just ram now for a 50% speed decrease atm
>>
What speed should I be getting with 2 4090s and 256GB of DDR4 RAM on an IQ4 quant of GLM 4.6?
>>
>>106823164
qwen image edit is actually important, because it lets you reference 2 source images. You can do all the same things with kontext, but that's a pita unless you are fast with gimp (which is what I have to do, because I refuse to use comfyui ever again).
>>
>>106823149
Does that apply to text models? Can we expect llama.cpp to implement something similar within the decade?
>>
>>106823176
the kind of speeds that makes your cock go limp while waiting for another reply
>>
>>106823186
I am currently getting around 4t/s on ikllama. Should I switch to a smaller quant?
>>
>>106823176
idk. what speed are you getting?
>>
>>106823176
glm-chan will probably give you around 5
>>
>>106823202
>>106823204
Refer to >>106823198
>>
>>106823069
what's wrong with shivers and jolts?
>>
>>106823127
8b is a lot of parameters sir
>>
>>106823144
I wrote usable speeds
If you're part of the tiktok generation then stick to Gemma 4b or something
>>
>>106823105
Sign me up!
>>
>>106823069
no smell of ozone?
>>
>>106823294
That's R1 slop not Nemo slop
>>
Why can't some startup make a shitty GPU with 512GB of GDDR6?
>>
>>106823331
because you cant fit that many memory chips onto a single board
>>
>>106823346
and the chips being too far from the gpu would make them far too slow. We only just moved to 3GB memory chips; it's a physical limit
>>
>>106823346
just make it really big
>>
>>106823359
It's called a server.
>>
>>106823359
>>106823355
if it was that easy it would have been done by now
>>
>>106823369
now condense it
>>
>>106823372
That will be $600,000 sir
https://www.broadberry.com/xeon-scalable-processor-gen4-rackmount-servers/nvidia-dgx-b200
>>
>>106823331
Why are you poor?
>>
>>106823331
gpus are made by a monopoly consisting of jensen, his cousin and jensen's new bitch (intel)
everyone else is a decade behind them
>>
>Seems like llama-server performance is worse for me in Linux than what it was in Windows. When I hit my memory and rape it, Windows didn't really slow down at all or anything but in Linux my mouse cursor gets jittery and system becomes unresponsive.
Linux scheduling is ass. It's not even a soft real-time OS. I had a graphics programming job writing intensive real-time Linux software and have seen exactly what you describe hundreds of times.
>>
>>106823418
Forgot:
>>106815629
>>
>>106823381
perfect, now let's go loot where they're storing them
>>
>>106823372
closest thing is the upcoming maxsun b60x2, two GPUs on one PCIe card. put two of those in a box and

2 * 2 * 24 = 96gb of vram, swithe god
>>
>>106823418
yeah, windows will randomly lock up. It leads to "glitches" in fulltime apps
>>
>>106823452
oh, and also this is why usb stuff randomly will malfunction on windows.
>>
>>106823346
From Linus' video the chips for the H200 are like 1/3 of a credit card stacked 3 deep, so there should be plenty of space on a standard PCIe card. The Mac Studio fits that amount of memory just fine in a relatively small space.

>>106823369
A modern server can fit ~1.5TB of VRAM, and most of the space is unrelated to VRAM (you have 8 GPUs instead of 1, which increases cooling needs dramatically, and much of the space goes to the motherboard, CPU, CPU cooling, RAM, unrelated network cards, etc.).

>>106823355
The Mac Studio manages to be decent at running LLMs and it doesn't even use GDDR6, it uses LPDDR5 which is slower.

>>106823381
I am talking about something much less performant than a B200. Something just good enough to serve an LLM to a single user at decent speeds. Like a Mac Studio but with better price/performance.

>>106823387
Designing a GPU and sending it to be made at a fab can't be that hard.
>>
>>106823520
>I am talking about something much less performant than a B200. Something just good enough to serve an LLM to a single user at decent speeds. Like a Mac Studio but with better price/performance.
We all have dreams anon
He might have something https://x.com/ostrisai/status/1975642220960072047
>>106823454
>>
>>106823520
Yeah, they intentionally nerf the ram.
>>
>>106823529
Intel is the only chance.

The best gpu in a long time is the b60. value-wise
>>
>>106822885
I would have tons of figgies if I had the space
>>
>>106823548
Just buy vr

have some decency
>>
>>106823543
More expensive than a used 3090 and significantly slower
>>
>>106823529
>just run this .exe bro it's so easy
>code consists of 300 lines of Python code by chandrapratapdevloper
bruh
>>
>>106823587
>insulting ostris
gtfo
>>
>>106823605
looks like a fag
>>
>>106823624
^^^ are you the same anon posting corposlop like n8n
>>
https://huggingface.co/BasedBase/GLM-4.5-Air-GLM-4.6-Distill/discussions/18
lol
>>
>>106823605
I have no idea who that is. I'm just saying that repo looks shady as fuck.
>>
>>106823641
i only post mikus like a real troon
>>
File: 1748328474011787.jpg (125 KB, 442x460)
>>106823654
I already downloaded that a few minutes ago, saves me having to test it I guess
>>
INFO:gguf.gguf_writer:/gguf/granite-4.0-h-tiny-f16.gguf: n_tensors = 666, total_size = 13.9G
>>
>>106823654
There's so many grifting fags these days I didn't even consider for a second that their "distill" was worth anything. I figured they probably half assed it, but this is actually funnier
>>
>>106823529
>>106823587
>>106823624
>>106823661
So this is how Nvidia's tyranny dies... not to applause but to kvetching
>>
File: 1757597827159534.gif (21 KB, 237x255)
irrelevant information
>redditspace
irrelevant paragraph
>redditspace
shit no one cares about
>>
>>106823763
u mad bro?
>>
>>106823581
the 3090 is $880 from reputable dealers, sure you can try your luck or whatever.

I don't think you can get a 2 year warranty, and if so how much?????

What's the failure rate, so we can prorate the price?

btw I am estimating the B580 is as fast as the 4060 ti, obviously until the vram advantage kicks in with "big" models on the B60, which is just a vram boosted b580

Conclusion: nib with warranty 3090 for $500 yes please. ridden hard and crypto'd 3090 with a 30 day warranty lmao
>>
File: 1739478340973219.png (202 KB, 343x343)
>>106823763
>>106823775
>>
File: Base Image.png (781 KB, 1200x3188)
NorMuon: Making Muon more efficient and scalable
https://arxiv.org/abs/2510.05491
>The choice of optimizer significantly impacts the training efficiency and computational costs of large language models (LLMs). Recently, the Muon optimizer has demonstrated promising results by orthogonalizing parameter updates, improving optimization geometry through better conditioning. Despite Muon's emergence as a candidate successor to Adam, the potential for jointly leveraging their strengths has not been systematically explored. In this work, we bridge this gap by proposing NorMuon (Neuron-wise Normalized Muon), an optimizer that synergistically combines orthogonalization with neuron-level adaptive learning rates. Our analysis reveals that while Muon effectively reduces condition numbers, the resulting updates exhibit highly non-uniform neuron norms, causing certain neurons to dominate the optimization process. NorMuon addresses this imbalance by maintaining second-order momentum statistics for each neuron and applying row-wise normalization after orthogonalization, ensuring balanced parameter utilization while preserving Muon's conditioning benefits. To enable practical deployment at scale, we develop an efficient distributed implementation under the FSDP2 framework that strategically distributes orthogonalization computations across devices. Experiments across multiple model scales demonstrate that NorMuon consistently outperforms both Adam and Muon, achieving 21.74% better training efficiency than Adam and 11.31% improvement over Muon on 1.1 B pretraining setting, while maintaining a comparable memory footprint to Muon. Our findings suggest that orthogonalization and adaptive learning rates are complementary rather than competing approaches, opening new avenues for optimizer design in large-scale deep learning.
https://github.com/zichongli5/NorMuon
No code yet
neat
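From the abstract alone the update is easy enough to picture: normal Muon momentum + Newton-Schulz orthogonalization, then keep an Adam-style second-moment estimate per output neuron (per row) and rescale each row by it. A rough sketch of what that probably looks like (their repo is still empty, so this is my reading of the abstract, not their code; the final rescale to keep the overall update norm sensible is my own assumption):
[code]
import torch

def newton_schulz(G, steps=5):
    # quintic Newton-Schulz orthogonalization as used in the public Muon code
    a, b, c = 3.4445, -4.7750, 2.0315
    X = G / (G.norm() + 1e-7)
    tall = X.shape[0] > X.shape[1]
    if tall:
        X = X.T
    for _ in range(steps):
        A = X @ X.T
        X = a * X + (b * A + c * A @ A) @ X
    return X.T if tall else X

@torch.no_grad()
def normuon_step(W, grad, mom, v_row, lr=0.02, beta=0.95, beta2=0.95, eps=1e-8):
    mom.mul_(beta).add_(grad)                        # Muon-style momentum
    U = newton_schulz(mom)                           # orthogonalized update
    # neuron-wise (row-wise) second-moment statistics of the update
    v_row.mul_(beta2).add_(U.pow(2).mean(dim=1, keepdim=True), alpha=1 - beta2)
    U = U / (v_row.sqrt() + eps)                     # row-wise normalization
    U = U * (W.shape[1] ** 0.5 / (U.norm() + eps))   # re-scale update norm (assumption)
    W.add_(U, alpha=-lr)
[/code]
The claimed win is that orthogonalization fixes the conditioning while the per-neuron normalization stops a few rows from dominating the step.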
>>
File: file.png (33 KB, 729x307)
seriously?
>>
>>106823814
Hi lmganon, do you have any other papers related to well conditioned networks?
>>
Does anyone here actually use 70Bs and above? What kind of GPU do you guys even have?
>>
File: 1753243749537067.png (606 KB, 1465x1502)
A revolution is happening on the diffusion training space, NVDIA BTFO
https://xcancel.com/LodestoneRock/status/1975711539945746722#m
>>
>>106823848
i use 106B, I have a single RTX 3060 12GiB GPU
>>
>>106823853
>happening on the diffusion training space
no one cares
>>
>>106823839
yeah let me check reddit real quick
>>
>>106823867
text diffusion is a thing, it's just that it's crazy expensive, only google has tried
>>
>>106823853
When can this be used to train non-existent diffusion LMs?
>>
>>106823853
What is the difference between this and textgen? That the step times are longer, so there's more time to swap between RAM and VRAM?
>>
>>106823867
oh you should, because that method can definitely be used on LLMs as well (diffusion LLMs exist)
>>
>>106823867
>>106823879
>>106823882
>>106823888
It has nothing to do with diffusion specifically
>>
>>106823883
Smells like a content creator grift.
>How I TRAINED an LLM 1000% FASTER *basedface* *white outline*
>>
File: Base Image.png (431 KB, 1200x688)
LightCache: Memory-Efficient, Training-Free Acceleration for Video Generation
https://arxiv.org/abs/2510.05367
>Training-free acceleration has emerged as an advanced research area in video generation based on diffusion models. The redundancy of latents in diffusion model inference provides a natural entry point for acceleration. In this paper, we decompose the inference process into the encoding, denoising, and decoding stages, and observe that cache-based acceleration methods often lead to substantial memory surges in the latter two stages. To address this problem, we analyze the characteristics of inference across different stages and propose stage-specific strategies for reducing memory consumption: 1) Asynchronous Cache Swapping. 2) Feature chunk. 3) Slicing latents to decode. At the same time, we ensure that the time overhead introduced by these three strategies remains lower than the acceleration gains themselves. Compared with the baseline, our approach achieves faster inference speed and lower memory usage, while maintaining quality degradation within an acceptable range.
https://github.com/NKUShaw/LightCache
looking at the examples the degradation is more noticeable. still big speedups in time.
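The "slicing latents to decode" part is the one you can bolt onto basically any video VAE today; the idea is just to stop decoding all T latent frames in one go. Rough sketch (vae.decode is a stand-in, not their API; a real version overlaps and blends the slice borders to hide seams, and the other two tricks live inside the denoising loop):
[code]
import torch

@torch.no_grad()
def decode_in_slices(vae, latents, slice_len=8):
    # latents: [B, C, T, H, W]; decoding all T frames at once is what
    # spikes memory, so decode a few latent frames at a time and concat
    chunks = []
    for t in range(0, latents.shape[2], slice_len):
        chunks.append(vae.decode(latents[:, :, t:t + slice_len]))
    return torch.cat(chunks, dim=2)
[/code]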
>>106823839
reading some papers rn
https://files.catbox.moe/ryoe03.txt
search for normalization on there then find the papers on
https://rentry.org/LocalModelsPapers
recent article I liked from another guy messing around with muon/nanogpt records
https://snimu.github.io/2025/10/07/modded-nanogpt-value-embeddings.html
>>
>>106823888
Point to one that is good, and is used by people
>>
>>106823911
a diffusion model trained on text
https://deepmind.google/models/gemini-diffusion/
>>
File: mercury.mp4 (1.27 MB, 1660x1080)
>>106823911
>It has nothing to do with diffusion specifically
like I said, diffusion LLM is a thing (for example this is "mercury" from inceptionlabs.ai)
>>
>>106823931
>obviously fake semi-random character soup then final code pops up out of nowhere
come on. that's just pathetic.
>>
>>106823928
>>106823931
Are you bots? The method doesn't have anything to do with diffusion. It's a generalizable method that also works on plain MLP style neural networks
>>
>>106823946
it's literally a diffusion model
>>
>>106823922
nah, ostris is a serious guy, he implemented a lot of important shit on the diffusion training ecosystem
>>
>>106823946
Ok, explain how it works then. How does that pajeet's code make MLP training or inference consume less memory?
>>
>>106823867
local models.
>>
>>106823946
are you retarded or something, it is using the same diffusion process method, hence the fucking name
https://arxiv.org/abs/2502.09992
>>
>>106823785
no, that's not reddit spacing.

yes, you are on an estradiol.
>>
>>106823944
I didn't know so many people on /lmg/ didn't know it's actually a thing, I get that it got under the radar but c'mon
https://www.youtube.com/watch?v=vNF33SB1BLQ
>>
>>106823954
>>106823965
You guys have to be bots, I'm talking about ramtorch, not text diffusion models. Follow the reply chain
>>106823956
It seems to swap layers between RAM and VRAM while the GPU is busy with compute-heavy work
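i.e. the classic prefetch-on-a-side-stream trick: while the GPU is chewing on layer i, layer i+1's weights are already being copied from pinned host memory on a separate CUDA stream, so the PCIe transfer hides behind the matmuls. In plain PyTorch the skeleton looks something like this (a sketch of the general technique, not ramtorch's actual code):
[code]
import torch
import torch.nn.functional as F

def forward_with_prefetch(weights_cpu, x):
    # weights_cpu: list of pinned CPU tensors, one weight matrix per layer
    copy_stream = torch.cuda.Stream()
    cur = weights_cpu[0].to("cuda", non_blocking=True)
    for i in range(len(weights_cpu)):
        nxt = None
        if i + 1 < len(weights_cpu):
            with torch.cuda.stream(copy_stream):      # async H2D copy
                nxt = weights_cpu[i + 1].to("cuda", non_blocking=True)
        x = F.relu(x @ cur.T)                         # compute on default stream
        torch.cuda.current_stream().wait_stream(copy_stream)  # next weights ready
        cur = nxt
    return x
[/code]
It only pays off when the per-layer compute takes at least as long as the copy, which is why it's a much better fit for diffusion training steps than for token-by-token LLM decoding.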
>>
>>106823988
>text diffusion is a thing, its cause crazy expensive, only google has tried
>It has nothing to do with diffusion specifically
>>
>>106823988
>I'm talking about ramtorch, not text diffusion models. Follow the reply chain
all right
>>106823911
>It has nothing to do with diffusion specifically
ramtorch was created to train diffusion models, that's why he's using Wan's example here, I don't know if this is a bait or you're genuinely retarded
>>
>>106823988
Only works for compute-bottlenecked training. Like diffusion.
>>
>>106823984
ANON. ANON. ANON.
https://huggingface.co/Dream-org/Dream-v0-Instruct-7B
i think there was another diffusion model too
>>
File: Base Image.png (1.63 MB, 1200x4808)
Training Dynamics Impact Post-Training Quantization Robustness
https://arxiv.org/abs/2510.06213
>While post-training quantization is widely adopted for efficient deployment of large language models, the mechanisms underlying quantization robustness remain unclear. We conduct a comprehensive analysis of quantization degradation across open-source language model training trajectories up to 32B parameters and 15T training tokens to accurately assess the relationship between training dynamics and quantization performance. Our key finding is that quantization errors in large-scale training runs are driven by a complex interplay between learning rate and other training hyperparameters. Specifically, once learning rates decay, validation loss and quantization error diverge, largely independent of training data scale. To investigate interventions on the training dynamics and identify specific configurations that can modulate quantization robustness favorably, we train our own models in controlled experiments up to 100B tokens. Our results challenge the assumption that increasing dataset scale inherently compromises quantization effectiveness, demonstrating instead that strategic training hyperparameter interventions can improve quantization quality at scale.
seems that things can be way better than we ever imagined
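The "quantization error" they track is basically how much the model degrades after round-to-nearest PTQ, which you can eyeball on any checkpoint yourself. A minimal per-channel RTN sketch (my own toy code, not the paper's setup):
[code]
import torch

def rtn_quantize(W, bits=4):
    # symmetric per-output-channel round-to-nearest quantization of a 2D weight
    qmax = 2 ** (bits - 1) - 1
    scale = (W.abs().amax(dim=1, keepdim=True) / qmax).clamp_min(1e-12)
    return torch.clamp(torch.round(W / scale), -qmax - 1, qmax) * scale

def quant_error(W, bits=4):
    # the paper tracks (a loss-level version of) this along the training run
    # and finds it diverging from validation loss once the LR decays
    return (W - rtn_quantize(W, bits)).pow(2).mean().item()
[/code]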
>>
>>106823955
https://github.com/ostris/ai-toolkit
>open his "ai-toolkit" (which looks to be a frontend to launch other people's code)
>first page of the README plastered with patreon/paypal/sponsors
I fucking knew it.
The only "important thing" these kind of people "implement" is making wild claims on twatter for self-publicity.
>>
>>106823995
>ramtorch was created to train diffusion models
You're making that up, it didn't even support backpropagation at first
>>
>>106824009
cmon man people gotta make money somehow
>>
File: 1732354925907486.png (101 KB, 1005x979)
>>106824011
do you even know who lodestone is? he trained Chroma on top of Flux Schnell, that's why he created ramtorch, he didn't want VRAM to be the bottleneck when training his diffusion models



All trademarks and copyrights on this page are owned by their respective parties. Images uploaded are the responsibility of the Poster. Comments are owned by the Poster.