/g/ - Technology

File: sssssssssss.jpg (262 KB, 1536x1536)
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>106816104 & >>106807832

►News
>(10/07) Release: LFM2-8b-A1b: Hybrid attention tiny MoE: https://liquid.ai/blog/lfm2-8b-a1b-an-efficient-on-device-mixture-of-experts
>(10/07) NeuTTS Air released, built off Qwen 0.5B: https://hf.co/neuphonic/neutts-air
>(10/06) Anthropic open sources Petri, a parallel exploration tool: https://anthropic.com/research/petri-open-source-auditing
>(10/03) Qwen3-VL-30B-A3B released: https://hf.co/Qwen/Qwen3-VL-30B-A3B-Thinking
>(10/02) ZLUDA 5 released with preliminary support for llama.cpp: https://vosen.github.io/ZLUDA/blog/zluda-update-q3-2025

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/leaderboard.html
Code Editing: https://aider.chat/docs/leaderboards
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
>>
►Recent Highlights from the Previous Thread: >>106816104

--Paper: Less is More: Recursive Reasoning with Tiny Networks:
>106820451 >106820469 >106820482 >106820507
--Papers:
>106822631
--Liquid AI releases efficient 8B hybrid MoE model for on-device applications:
>106817815 >106817843 >106817858 >106817906 >106817920 >106818061 >106818120 >106818105 >106817881 >106820927
--Critiquing GPT-OSS model limitations and questioning its practical viability:
>106818819 >106818841 >106818858 >106818864 >106818896 >106818933 >106818955 >106818960 >106819069 >106819070 >106821424 >106818929 >106819063 >106819269 >106819323 >106819066 >106819218 >106820707 >106820723 >106819149 >106819161 >106819221
--CPU/GPU server architecture tradeoffs:
>106816785 >106816809 >106816843 >106816858 >106816867 >106816880 >106816903 >106816961 >106817031 >106817084
--Choosing a lightweight Linux distro for optimal llama.cpp performance:
>106818191 >106818219 >106818232 >106818234 >106818294 >106818363 >106818271 >106818282 >106818463
--Alternatives to Command-R+ for RAM-based inference and model performance tradeoffs:
>106819017 >106819051 >106819061 >106819088 >106819163 >106819439 >106819533 >106819627 >106819649
--GitHub PR for host memory prompt caching in llama.cpp:
>106820185 >106820217
--Dual-model pipeline for bypassing censorship via token swapping:
>106818917 >106818970 >106819349
--Finetuning challenges: compute needs, data quality, and model limitations:
>106820466 >106820490 >106820554 >106820793 >106820903 >106821020 >106821064 >106821186 >106821253 >106821335 >106821363 >106821302 >106821356 >106821387 >106821401 >106821441 >106821461 >106821654 >106821732 >106821859 >106822067 >106821583 >106821693 >106821773 >106821815 >106821840 >106821848 >106822010 >106822042 >106822066 >106822079 >106822128 >106822043 >106822016
--Miku (free space):
>106816325

►Recent Highlight Posts from the Previous Thread: >>106816108

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>
first for Granite 4
>>
File: dead.jpg (81 KB, 1637x265)
>>106822756
>>106822760
waifufaggots begone.
>>
>>106822792
you first
>>
>>106822766
Is miku a leftist now?
>>
anyone got a glm 4.5 air st preset? st has been pretty good about auto-applying whichever i need for models, but doesn't show anything for this one and is randomly emitting stuff like <|eot_id|><|end_header_id|> in the response, then keeps writing
>>
I heard good old Rocinante is still the best roleplay model out there
>>
>>106822799
Update your ST, it has a GLM 4 preset already made.
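If it still won't auto-pick it, the GLM-4 format is roughly this (from memory, so double-check against the chat_template baked into your GGUF's metadata):
[code]
[gMASK]<sop><|system|>
{system prompt}<|user|>
{user message}<|assistant|>
{model reply}
[/code]
Add <|user|> as a stop string and you won't see that llama3-style <|eot_id|> junk leaking in.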
>>
>>106822823
If you're a bottom of the barrel poorfag yes.
>>
What's the best model to use for an RTX 5090 when using kobold + sillytavern? I assume this could handle something pretty beefy for accurate results?
MUST be able to do porn.
Currently using Mistral-Nemo-Instruct-2407-Q8_0 but this was for a 3070 so I'm curious to make full use of a 5090
>>
>>106822833
Rocinante v1.1
>>
>>106822797
always was, she even got blacked a long time ago
>>
>>106822844
no but really
>>
I was skeptical about GLM 4.6 at first but it's working great when using it with muh dick.
>>
File: file.png (99 KB, 545x931)
>>106822864
Seriously. For real. Unironically.
>>
File: 1731901941862455.jpg (3.6 MB, 6912x5184)
This is your least obsessed and most mentally sane baker and moderator btw

>>106789576
>>
>>106822872
those settings are on a 5090?
>>
the starter guide is 1 year old, guys

and it's not a real guide. neither llm thread has a beginner's guide.
>>
>>106822833
>I assume this could handle something pretty beefy
You'd assume wrong. Less than 48GB means you're still stuck with small, shitty models unless you partially offload to system RAM with a big speed penalty.
The main difference from your 3070 is that you can now run slightly less shitty ~20-30b models and use higher quants and context.
>>
kinda crazy how glm and glm air (and deepseek too I guess) have this built-in potential for massive speed gains with MTP but we're simply not getting it because llama.cpp is ignoring it after a half-hearted failed attempt of making it work
if you are currently running 4.6 at a slow speed, it shouldn't be this way.
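For anyone who doesn't know what MTP buys you: the extra head drafts the next few tokens nearly for free, the full model verifies the whole draft in one batched pass, and you only pay full price for the tokens the draft got wrong. The accept/reject loop is basically generic self-speculative decoding, something like this (a sketch of the idea, not llama.cpp code; draft_with_mtp_head and verify_batch are made-up helpers):
[code]
def generate(prompt_tokens, n_new, k=4):
    out = list(prompt_tokens)
    while len(out) < len(prompt_tokens) + n_new:
        # 1) cheap draft: the MTP head proposes k tokens in one shot
        draft = draft_with_mtp_head(out, k)      # [d1 .. dk]
        # 2) one batched forward pass of the full model over the draft
        #    gives its own next-token choice at every drafted position
        verified = verify_batch(out, draft)      # [v1 .. v(k+1)]
        # 3) keep the longest prefix where draft and full model agree,
        #    plus one guaranteed-correct bonus token from the full model
        n_ok = 0
        while n_ok < k and draft[n_ok] == verified[n_ok]:
            n_ok += 1
        out += draft[:n_ok]
        out.append(verified[n_ok])
    return out
[/code]
Best case you get several tokens per big forward pass, worst case you're back to one, which is why it's annoying that llama.cpp currently just ignores the head.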
>>
>>106822895
Those settings are for Rocinante v1.1
>>
Another reason I love 4.6-chan is that she created a ram filter in /lmg/. If anyone is talking about nemo shittune 2000233929 or fagmmer's recent tune he is basically saying I have less than 128GB ram.
>>
>>106822901
>You'd assume wrong. less than 48GB means you're still stuck with small, shitty models unless you partial offload to system RAM with a big speed penalty.
>The main difference from your 3070 is that you can now run slightly less shitty ~20-30b models and use higher quants and context.
I'm on 64gb, plus 16 on the videocard. Is adding 64gb worth it?
>>
>>106822827
pasted staging over my current install and now i see glm 4 as a preset. odd that it didn't autoset it though since st has been doing that for a while based on a models metadata. thanks
>>
>>106822904
What GPU
>>
>>106822896
it's mostly correct aside from being rocinante instead of nemo and rep pen of 1.03
>>
Hagar letter smiler
>>
>>106822909
>everyone must only talk about the thing i use
>>
>>106822896
Most oldfags lost faith in the technology and left before 4.6 happened. Rest of us have 4.6 and we don't care about anything. Have fun being troubleshooted by another guy who entered the thread a week ago and probably is a pajeet.
>>
>>106822901
>unless you partial offload to system RAM with a big speed penalty.
I do this.

What model then?
>>
>>106822896
what part are you getting stuck on?
>>
>>106822929
2x 4090
These funny guys like to pretend GLM is good at roleplay, but it's not. They did the same when R1 came out, and look at them now. They are just a bunch of retarded npcs that jump on the train of the next big thing and call it the best ever, regardless of what it is.
>>
>>106822910
>I'm on 64gb
You could try Q4 quants of GLM 4.5 Air
>Is adding 64gb worth it?
This would let you use Q2 quants of full GLM, or alternatively higher quants of Air
For now try Air at Q4, and read up about kobold and llama.cpp's recent 'moe cpu layers' feature for a nice speedup.
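Something like this as a starting point, assuming a recent llama.cpp build where the flag is spelled --n-cpu-moe (koboldcpp exposes the same thing as "MoE CPU Layers" in the launcher; check --help on your build since the name has moved around):
[code]
llama-server -m GLM-4.5-Air-Q4_K_M.gguf -ngl 99 --n-cpu-moe 30 -c 16384
[/code]
That keeps the expert tensors of the first 30 layers in system RAM while attention and the shared weights sit on the GPU; lower the number until VRAM is nearly full, raise it if you OOM.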
>>
>>106821625
>>106822909
Go splinter off and make your own general like /wait/
>>
File: file.png (501 KB, 650x420)
>>106822957
gp...pfff... g... gpt-oss 120B
>>
>>106822963
You've won me over anon. I'll give it a go. Which specific R1.1 should I go for? (this is on a 5090)
>>
>>106822970
wait was just a thinly disguised avatrtroonfagging thread. glm-chan doesn't need that. she has a special place in my heart and on my cock
>>
>>106822963
>pretend GLM is good at roleplay, but it's not
rocinante is better?
>>
>>106822972
Q8 gguf version. Use it with koboldcpp and use banned strings in silly tavern to remove slop you don't like. Also keep context between 16k and 20k even if it supports higher, since higher context means more retardation for all models.
>>
>>106822957
>>106822968
GLM + GLM Air is the recent hotness, there haven't been any major breakthroughs in small models for RP since Nemo/Rocinante.
Between these options is a sea of Mistral Small and Gemma 3 finetunes, which are at best a sidegrade in most cases.
>>
>>106822977
You might as well be avatarfagging. It gets real obvious when it's you posting about your "glm-chan" multiple times per thread.
>>
>>106823001
She is that good. And I am talking about a model and not my shitty novel AI OC do not steal.
>>
>>106822986
Got a startup list of slop to ban?
>>
https://rentry.org/88vaxybo
I made the new guide.
>>
glm-chan is the only one i need
>>
>>106823009
Go make /zaig/
>>
>>106822986
>Q8 gguf version
Why not Q12?
20?

I feel a 5090 can handle more than Q8 gguf
>>
What settings do you guys run glm-chan on? I want her to be as slutty as possible.
>>
>>106823021
go make /drummer shittunes/
>>
>>106822994
>GLM + GLM Air is the recent hotness,
and how are those? better than rocicante/nemo?
>>
>>106823021
>>106823034
anyone who wants technical discussion should stay here and everyone else should fuck off
>>
>>106823013
Kind of, but it's not sorted or cleaned, and it's very custom to me and the type of roleplays I do, so it's too embarrassing to share. Just add things you don't like to the global banned strings as you see them and swipe. Keep it short, like this list, which covers all the shivers (a rough sketch of how the banning itself works is below the list):
Note the bare "spine" and "shiver" entries, which make most of the other lines redundant; remove those two if you still want to hear about spines and shivers in other contexts.
"sent a shiver"
"sent a faint shiver"
"sends shivers"
"sends a shiver"
"sending shivers"
"sending a shiver"
"down her spine"
"down his spine"
"down the spine"
"down my spine"
"down your spine"
"down spines"
"up her spine"
"up his spine"
"sending a chill"
"sends a chill"
"sent a chill"
"chills down"
"chill ran"
"chill run"
"chill runs"
"chill running"
"felt a shiver"
"shiver ran"
"shiver runs"
"shiver running"
"shivers at"
"shivering"
"shivered"
"shivers"
"shiver"
"spine"
"ran through her"
"ran through him"
"run through her"
"run through her"
"ran across her body"
"ran across his body"
"running through her"
"running through him"
"through her body"
"through his body"
"jolted"
"jolts"
"jolt"


>>106823022
I hope that is a joke.
>>
>>106822903
True.
Maybe a different time.
>>
>>106822885
we call em jannies, reddit-kun
>>
I compared https://huggingface.co/LiquidAI/LFM2-8B-A1B to Qwen3 4b and gemma 3n e4b

https://files.catbox.moe/okpjgu.txt

It seems to have a positivity bias and readily produces potentially unsafe replies.
>>
>>106823050
They're significantly smarter and a lot of people seem to enjoy them, especially full GLM. Safety guardrails are easily bypassed and they're very capable of writing decent smut. If you can run either at usable speeds then there's not much reason to use Nemo anymore.
>>
>>106823105
>1B
>4B
The only use cases for these tiny models are for running on a google phone in bangladesh
>>
Ring-1T could be our true local SOTA but it's not implemented yet
>>
>>106823103
we call them trannitors, niggertroon
>>
>>106823117
>glm
>decent speed
>smut
>wait 5+ minutes between replies
bro stop tricking people
>>
Big
https://x.com/ostrisai/status/1975642220960072047
>>
>>106822970
I doubt anyone at /wait/ knows how to configure.

I'm not going to use the guide to make the porn text thing (I don't have erectile dysfunction). I will be creating a learn X coach (sort of like a mentor). The tricky part is figuring out how to get it to automate memory of what you said.

for fapping, memory is not very important, apparently (everyone seems to not care about incoherence).
>>
>>106822968
>This would let you use Q2 quants of full GLM, or alternatively higher quants of AIr

>>106822962
Fine, I'll effort-ish at it (ask ai to explain). It's not a real guide.
>>
>>106823149
>qwen-image lora
has this replaced sdxl yet?
>>
>>106823149
Can start training loras with mostly just ram now for a 50% speed decrease atm
>>
What speed should I be getting with 2 4090s and 256GB of DDR4 RAM on an IQ4 quant of GLM 4.6?
>>
>>106823164
qwen image edit is actually important, because it lets you reference 2 source images. You can do all the same things with kontext, but that's a pita unless you are fast with gimp (which is what I have to do, because I refuse to use comfyui ever again).
>>
>>106823149
Does that apply to text models? Can we expect llama.cpp to implement something similar within the decade?
>>
>>106823176
the kind of speeds that makes your cock go limp while waiting for another reply
>>
>>106823186
I am currently getting around 4t/s on ikllama. Should I switch to a smaller quant?
>>
>>106823176
idk. what speed are you getting?
>>
>>106823176
glm-chan will probably give you around 5
>>
>>106823202
>>106823204
Refer to >>106823198
>>
>>106823069
what's wrong with shivers and jolts?
>>
>>106823127
8b is a lot of parameters sir
>>
>>106823144
I wrote usable speeds
If you're part of the tiktok generation then stick to Gemma 4b or something
>>
>>106823105
Sign me up!
>>
>>106823069
no smell of ozone?
>>
>>106823294
That's R1 slop not Nemo slop
>>
Why can't some startup make a shitty GPU with 512GB of GDDR6?
>>
>>106823331
because you cant fit that many memory chips onto a single board
>>
>>106823346
and the chips being too far from the gpu would make them far too slow. We only just moved to 3GB memory chips; it's a physical limit
>>
>>106823346
just make it really big
>>
>>106823359
It's called a server.
>>
>>106823359
>>106823355
if it was that easy it would have been done by now
>>
>>106823369
now condense it
>>
>>106823372
That will be $600,000 sir
https://www.broadberry.com/xeon-scalable-processor-gen4-rackmount-servers/nvidia-dgx-b200
>>
>>106823331
Why are you poor?
>>
>>106823331
gpus are made by a monopoly consisting of jensen, his cousin and jensen's new bitch (intel)
everyone else is a decade behind them
>>
>Seems like llama-server performance is worse for me in Linux than what it was in Windows. When I hit my memory and rape it, Windows didn't really slow down at all or anything but in Linux my mouse cursor gets jittery and system becomes unresponsive.
Linux scheduling is ass. It's not even a soft real-time OS. I had a graphics programming job writing intensive real-time Linux software and have seen exactly what you describe hundreds of times.
>>
>>106823418
Forgot:
>>106815629
>>
>>106823381
perfect, now let's go loot where they're storing them
>>
>>106823372
closest thing is the upcoming maxsun b60x2, two GPUs on one PCIe card. put two of those in a box and

2 * 2 * 24 = 96gb of vram, swithe god
>>
>>106823418
yeah, windows will randomly lock up. It leads to "glitches" in fulltime apps
>>
>>106823452
oh, and also this is why usb stuff randomly will malfunction on windows.
>>
>>106823346
From Linus' video the chips for the H200 are like 1/3 of a credit card stacked 3 deep, so there should be plenty of space on a standard PCIe card. The Mac Studio fits that amount of memory just fine in a relatively small space.

>>106823369
A modern server can fit ~1.5TB of VRAM, and most of the space is unrelated to VRAM (you have 8 GPUs instead of 1, which increases cooling needs dramatically, and much of the space goes to the motherboard, CPU, CPU cooling, RAM, unrelated network cards, etc.).

>>106823355
The Mac Studio manages to be decent at running LLMs and it doesn't even use GDDR6, it uses LPDDR5 which is slower.

>>106823381
I am talking about something much less performant than a B200. Something just good enough to serve an LLM to a single user at decent speeds. Like a Mac Studio but with better price/performance.

>>106823387
Designing a GPU and sending it to be made at a fab can't be that hard.
>>
>>106823520
>I am talking about something much less performant than a B200. Something just good enough to serve an LLM to a single user at decent speeds. Like a Mac Studio but with better price/performance.
We all have dreams anon
He might have something https://x.com/ostrisai/status/1975642220960072047
>>106823454
>>
>>106823520
Yeah, they intentionally nerf the ram.
>>
>>106823529
Intel is the only chance.

The best gpu in a long time is the b60. value-wise
>>
>>106822885
I would have tons of figgies if I had the space
>>
>>106823548
Just buy vr

have some decency
>>
>>106823543
More expensive than a used 3090 and significantly slower
>>
>>106823529
>just run this .exe bro it's so easy
>code consists of 300 lines of Python code by chandrapratapdevloper
bruh
>>
>>106823587
>insulting ostris
gtfo
>>
>>106823605
looks like a fag
>>
>>106823624
^^^ are you the same anon posting corposlop like n8n
>>
https://huggingface.co/BasedBase/GLM-4.5-Air-GLM-4.6-Distill/discussions/18
lol
>>
>>106823605
I have no idea who that is. I'm just saying that repo looks shady as fuck.
>>
>>106823641
i only post mikus like a real troon
>>
File: 1748328474011787.jpg (125 KB, 442x460)
>>106823654
I already downloaded that a few minutes ago, saves me having to test it I guess
>>
INFO:gguf.gguf_writer:/gguf/granite-4.0-h-tiny-f16.gguf: n_tensors = 666, total_size = 13.9G
>>
>>106823654
There's so many grifting fags these days I didn't even consider for a second that their "distill" was worth anything. I figured they probably half assed it, but this is actually funnier
>>
>>106823529
>>106823587
>>106823624
>>106823661
So this is how Nvidia's tyranny dies... not to applause but to kvetching
>>
File: 1757597827159534.gif (21 KB, 237x255)
irrelevant information
>redditspace
irrelevant paragraph
>redditspace
shit no one cares about
>>
>>106823763
u mad bro?
>>
>>106823581
the 3090 is $880 from reputable dealers, sure you can try your luck or whatever.

I don't think you can get a 2 year warranty, and if so how much?????

What's the failure rate, so we can prorate the price?

btw I am estimating the B580 is as fast as the 4060 ti, obviously until the vram advantage kicks in with "big" models on the B60, which is just a vram boosted b580

Conclusion: nib with warranty 3090 for $500 yes please. ridden hard and crypto'd 3090 with a 30 day warranty lmao
>>
File: 1739478340973219.png (202 KB, 343x343)
>>106823763
>>106823775
>>
File: Base Image.png (781 KB, 1200x3188)
NorMuon: Making Muon more efficient and scalable
https://arxiv.org/abs/2510.05491
>The choice of optimizer significantly impacts the training efficiency and computational costs of large language models (LLMs). Recently, the Muon optimizer has demonstrated promising results by orthogonalizing parameter updates, improving optimization geometry through better conditioning. Despite Muon's emergence as a candidate successor to Adam, the potential for jointly leveraging their strengths has not been systematically explored. In this work, we bridge this gap by proposing NorMuon (Neuron-wise Normalized Muon), an optimizer that synergistically combines orthogonalization with neuron-level adaptive learning rates. Our analysis reveals that while Muon effectively reduces condition numbers, the resulting updates exhibit highly non-uniform neuron norms, causing certain neurons to dominate the optimization process. NorMuon addresses this imbalance by maintaining second-order momentum statistics for each neuron and applying row-wise normalization after orthogonalization, ensuring balanced parameter utilization while preserving Muon's conditioning benefits. To enable practical deployment at scale, we develop an efficient distributed implementation under the FSDP2 framework that strategically distributes orthogonalization computations across devices. Experiments across multiple model scales demonstrate that NorMuon consistently outperforms both Adam and Muon, achieving 21.74% better training efficiency than Adam and 11.31% improvement over Muon on 1.1 B pretraining setting, while maintaining a comparable memory footprint to Muon. Our findings suggest that orthogonalization and adaptive learning rates are complementary rather than competing approaches, opening new avenues for optimizer design in large-scale deep learning.
https://github.com/zichongli5/NorMuon
No code yet
neat
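From the abstract alone the update is easy enough to picture: normal Muon momentum + Newton-Schulz orthogonalization, then keep an Adam-style second-moment estimate per output neuron (per row) and rescale each row by it. A rough sketch of what that probably looks like (their repo is still empty, so this is my reading of the abstract, not their code; the final rescale to keep the overall update norm sensible is my own assumption):
[code]
import torch

def newton_schulz(G, steps=5):
    # quintic Newton-Schulz orthogonalization as used in the public Muon code
    a, b, c = 3.4445, -4.7750, 2.0315
    X = G / (G.norm() + 1e-7)
    tall = X.shape[0] > X.shape[1]
    if tall:
        X = X.T
    for _ in range(steps):
        A = X @ X.T
        X = a * X + (b * A + c * A @ A) @ X
    return X.T if tall else X

@torch.no_grad()
def normuon_step(W, grad, mom, v_row, lr=0.02, beta=0.95, beta2=0.95, eps=1e-8):
    mom.mul_(beta).add_(grad)                        # Muon-style momentum
    U = newton_schulz(mom)                           # orthogonalized update
    # neuron-wise (row-wise) second-moment statistics of the update
    v_row.mul_(beta2).add_(U.pow(2).mean(dim=1, keepdim=True), alpha=1 - beta2)
    U = U / (v_row.sqrt() + eps)                     # row-wise normalization
    U = U * (W.shape[1] ** 0.5 / (U.norm() + eps))   # re-scale update norm (assumption)
    W.add_(U, alpha=-lr)
[/code]
The claimed win is that orthogonalization fixes the conditioning while the per-neuron normalization stops a few rows from dominating the step.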
>>
File: file.png (33 KB, 729x307)
seriously?
>>
>>106823814
Hi lmganon, do you have any other papers related to well conditioned networks?
>>
Does anyone here actually use 70Bs and above? What kind of GPU do you guys even have?
>>
File: 1753243749537067.png (606 KB, 1465x1502)
A revolution is happening on the diffusion training space, NVDIA BTFO
https://xcancel.com/LodestoneRock/status/1975711539945746722#m
>>
>>106823848
i use 106B, I have a single RTX 3060 12GiB GPU
>>
>>106823853
>happening on the diffusion training space
no one cares
>>
>>106823839
yeah let me check reddit real quick
>>
>>106823867
text diffusion is a thing, it's just that it's crazy expensive, only google has tried
>>
>>106823853
When can this be used to train non-existent diffusion LMs?
>>
>>106823853
What is the difference between this and textgen? That the step times are longer, so there's more time to swap between RAM and VRAM?
>>
>>106823867
oh you should, because that method can definitely be used on LLMs as well (diffusion LLMs exist)
>>
>>106823867
>>106823879
>>106823882
>>106823888
It has nothing to do with diffusion specifically
>>
>>106823883
Smells like a content creator grift.
>How I TRAINED an LLM 1000% FASTER *basedface* *white outline*
>>
File: Base Image.png (431 KB, 1200x688)
LightCache: Memory-Efficient, Training-Free Acceleration for Video Generation
https://arxiv.org/abs/2510.05367
>Training-free acceleration has emerged as an advanced research area in video generation based on diffusion models. The redundancy of latents in diffusion model inference provides a natural entry point for acceleration. In this paper, we decompose the inference process into the encoding, denoising, and decoding stages, and observe that cache-based acceleration methods often lead to substantial memory surges in the latter two stages. To address this problem, we analyze the characteristics of inference across different stages and propose stage-specific strategies for reducing memory consumption: 1) Asynchronous Cache Swapping. 2) Feature chunk. 3) Slicing latents to decode. At the same time, we ensure that the time overhead introduced by these three strategies remains lower than the acceleration gains themselves. Compared with the baseline, our approach achieves faster inference speed and lower memory usage, while maintaining quality degradation within an acceptable range.
https://github.com/NKUShaw/LightCache
looking at the examples the degradation is more noticeable. still big speedups in time.
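The "slicing latents to decode" part is the one you can bolt onto basically any video VAE today; the idea is just to stop decoding all T latent frames in one go. Rough sketch (vae.decode is a stand-in, not their API; a real version overlaps and blends the slice borders to hide seams, and the other two tricks live inside the denoising loop):
[code]
import torch

@torch.no_grad()
def decode_in_slices(vae, latents, slice_len=8):
    # latents: [B, C, T, H, W]; decoding all T frames at once is what
    # spikes memory, so decode a few latent frames at a time and concat
    chunks = []
    for t in range(0, latents.shape[2], slice_len):
        chunks.append(vae.decode(latents[:, :, t:t + slice_len]))
    return torch.cat(chunks, dim=2)
[/code]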
>>106823839
reading some papers rn
https://files.catbox.moe/ryoe03.txt
search for normalization on there then find the papers on
https://rentry.org/LocalModelsPapers
recent article I liked from another guy messing around with muon/nanogpt records
https://snimu.github.io/2025/10/07/modded-nanogpt-value-embeddings.html
>>
>>106823888
Point to one that is good, and is used by people
>>
>>106823911
a diffusion model trained on text
https://deepmind.google/models/gemini-diffusion/
>>
File: mercury.mp4 (1.27 MB, 1660x1080)
>>106823911
>It has nothing to do with diffusion specifically
like I said, diffusion LLM is a thing (for example this is "mercury" from inceptionlabs.ai)
>>
>>106823931
>obviously fake semi-random character soup then final code pops up out of nowhere
come on. that's just pathetic.
>>
>>106823928
>>106823931
Are you bots? The method doesn't have anything to do with diffusion. It's a generalizable method that also works on plain MLP style neural networks
>>
>>106823946
it's literally a diffusion model
>>
>>106823922
nah, ostris is a serious guy, he implemented a lot of important shit on the diffusion training ecosystem
>>
>>106823946
Ok, explain how it works then. How does that pajeet's code make MLP training or inference consume less memory?
>>
>>106823867
local models.
>>
>>106823946
are you retarded or something, it is using the same diffusion process method, hence the fucking name
https://arxiv.org/abs/2502.09992
>>
>>106823785
no, that's not reddit spacing.

yes, you are on an estradiol.
>>
>>106823944
I didn't know so many people on /lmg/ didn't know it's actually a thing, I get that it got under the radar but c'mon
https://www.youtube.com/watch?v=vNF33SB1BLQ
>>
>>106823954
>>106823965
You guys have to be bots, I'm talking about ramtorch, not text diffusion models. Follow the reply chain
>>106823956
It seems to swap layers between RAM and VRAM while the GPU is busy with compute-heavy work
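i.e. the classic prefetch-on-a-side-stream trick: while the GPU is chewing on layer i, layer i+1's weights are already being copied from pinned host memory on a separate CUDA stream, so the PCIe transfer hides behind the matmuls. In plain PyTorch the skeleton looks something like this (a sketch of the general technique, not ramtorch's actual code):
[code]
import torch
import torch.nn.functional as F

def forward_with_prefetch(weights_cpu, x):
    # weights_cpu: list of pinned CPU tensors, one weight matrix per layer
    copy_stream = torch.cuda.Stream()
    cur = weights_cpu[0].to("cuda", non_blocking=True)
    for i in range(len(weights_cpu)):
        nxt = None
        if i + 1 < len(weights_cpu):
            with torch.cuda.stream(copy_stream):      # async H2D copy
                nxt = weights_cpu[i + 1].to("cuda", non_blocking=True)
        x = F.relu(x @ cur.T)                         # compute on default stream
        torch.cuda.current_stream().wait_stream(copy_stream)  # next weights ready
        cur = nxt
    return x
[/code]
It only pays off when the per-layer compute takes at least as long as the copy, which is why it's a much better fit for diffusion training steps than for token-by-token LLM decoding.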
>>
>>106823988
>text diffusion is a thing, its cause crazy expensive, only google has tried
>It has nothing to do with diffusion specifically
>>
>>106823988
>I'm talking about ramtorch, not text diffusion models. Follow the reply chain
all right
>>106823911
>It has nothing to do with diffusion specifically
ramtorch was created to train diffusion models, that's why he's using Wan's example here, I don't know if this is a bait or you're genuinely retarded
>>
>>106823988
Only works for compute-bottlenecked training. Like diffusion.
>>
>>106823984
ANON. ANON. ANON.
https://huggingface.co/Dream-org/Dream-v0-Instruct-7B
i think there was another diffusion model too
>>
File: Base Image.png (1.63 MB, 1200x4808)
Training Dynamics Impact Post-Training Quantization Robustness
https://arxiv.org/abs/2510.06213
>While post-training quantization is widely adopted for efficient deployment of large language models, the mechanisms underlying quantization robustness remain unclear. We conduct a comprehensive analysis of quantization degradation across open-source language model training trajectories up to 32B parameters and 15T training tokens to accurately assess the relationship between training dynamics and quantization performance. Our key finding is that quantization errors in large-scale training runs are driven by a complex interplay between learning rate and other training hyperparameters. Specifically, once learning rates decay, validation loss and quantization error diverge, largely independent of training data scale. To investigate interventions on the training dynamics and identify specific configurations that can modulate quantization robustness favorably, we train our own models in controlled experiments up to 100B tokens. Our results challenge the assumption that increasing dataset scale inherently compromises quantization effectiveness, demonstrating instead that strategic training hyperparameter interventions can improve quantization quality at scale.
seems that things can be way better than we ever imagined
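The "quantization error" they track is basically how much the model degrades after round-to-nearest PTQ, which you can eyeball on any checkpoint yourself. A minimal per-channel RTN sketch (my own toy code, not the paper's setup):
[code]
import torch

def rtn_quantize(W, bits=4):
    # symmetric per-output-channel round-to-nearest quantization of a 2D weight
    qmax = 2 ** (bits - 1) - 1
    scale = (W.abs().amax(dim=1, keepdim=True) / qmax).clamp_min(1e-12)
    return torch.clamp(torch.round(W / scale), -qmax - 1, qmax) * scale

def quant_error(W, bits=4):
    # the paper tracks (a loss-level version of) this along the training run
    # and finds it diverging from validation loss once the LR decays
    return (W - rtn_quantize(W, bits)).pow(2).mean().item()
[/code]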
>>
>>106823955
https://github.com/ostris/ai-toolkit
>open his "ai-toolkit" (which looks to be a frontend to launch other people's code)
>first page of the README plastered with patreon/paypal/sponsors
I fucking knew it.
The only "important thing" these kind of people "implement" is making wild claims on twatter for self-publicity.
>>
>>106823995
>ramtorch was created to train diffusion models
You're making that up, it didn't even support backpropagation at first
>>
>>106824009
cmon man people gotta make money somehow
>>
File: 1732354925907486.png (101 KB, 1005x979)
>>106824011
do you even know who lodestone is? he trained Chroma on top of Flux Schnell, that's why he created ramtorch, he didn't want VRAM to be the bottleneck when training his diffusion models



All trademarks and copyrights on this page are owned by their respective parties. Images uploaded are the responsibility of the Poster. Comments are owned by the Poster.