/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>108132261 & >>108123280

►News
>(02/13) MiniMax-M2.5 released: https://hf.co/MiniMaxAI/MiniMax-M2.5
>(02/13) Ring-2.5-1T released, thinking model based on hybrid linear attention: https://hf.co/inclusionAI/Ring-2.5-1T
>(02/11) GLM-5 744B-A40B released: https://z.ai/blog/glm-5
>(02/11) Ming-flash-omni 2.0 released: https://hf.co/inclusionAI/Ming-flash-omni-2.0
>(02/10) MOSS-TTS Family: speech and sound generation models: https://github.com/OpenMOSS/MOSS-TTS
>(02/06) KugelAudio-0-Open: Multilingual TTS based on VibeVoice 7B: https://hf.co/kugelaudio/kugelaudio-0-open

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
►Recent Highlights from the Previous Thread: >>108132261 (1/2)

--Papers:
>108134463
--Alexandria audiobook generator voice quality and LoRA training feedback:
>108132491 >108132574 >108132620 >108132714 >108133377 >108133570 >108133609 >108133697 >108133741 >108134599 >108133477 >108133778 >108132628 >108133426
--MLX quantization performance analysis and tooling limitations:
>108132892 >108133233 >108133244 >108132948 >108134986 >108135022 >108135088
--DeepSeek's new model rivals Gemini in long-context summarization:
>108137775 >108137840 >108137875 >108137936 >108137943 >108138008 >108138239 >108138731 >108138818 >108138841 >108138876 >108138911 >108138976 >108139011 >108139024 >108138932 >108138916 >108138947 >108138950 >108138970 >108137820 >108137870 >108137900 >108137975 >108138135 >108137843 >108139084 >108139103 >108139129
--OpenClaw model selection and agent framework tradeoffs:
>108132299 >108132378 >108132478 >108132595 >108134485 >108135173 >108135177 >108135190 >108135208 >108135219 >108135399 >108135550 >108136105 >108136248 >108135195 >108135205 >108136842
--Federated LLM training feasibility and modular layer approaches:
>108133301 >108133762 >108134877 >108135096 >108135297 >108135434 >108135484 >108136085 >108136343 >108136393
--Anthropic hiding CoT in Opus 4.6 and implications for model transparency:
>108138350 >108138441 >108138486 >108138676 >108138695 >108138737 >108138784 >108138821 >108138896 >108138962

►Recent Highlights from the Previous Thread: >>108132261
Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
►Recent Highlights from the Previous Thread: >>108132261 (2/2)

--GLM5 support merged with unused DSA indexers causing high perplexity:
>108138069 >108138104 >108138440
--GLM-5 underperforms compared to Kimi 2.5 in roleplay and instruction following:
>108133448 >108133468 >108133506 >108133529 >108133510 >108133518 >108133545
--Rolling window vs compaction for code assistant context management:
>108134434 >108134492 >108134542 >108134549 >108134576 >108134616 >108134643 >108134686 >108134695 >108134704
--Ring-2.5-1T:
>108134981
--MiniMaxAI M2.5 release and performance claims:
>108136993 >108137009 >108137029 >108137058 >108137062 >108137235
--AI video upscaling tool comparisons and recommendations:
>108134918 >108134968 >108135001 >108135012 >108135100 >108135196 >108135108 >108135381 >108135397 >108135412 >108135746
--OpenAI accuses DeepSeek of unfair model distillation:
>108135666 >108135695 >108136362
--Miku (free space):
>108133070 >108133506 >108135810 >108135869 >108135874 >108135955 >108136772 >108137009 >108137738 >108138497 >108139089

►Recent Highlight Posts from the Previous Thread: >>108132262
Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>108139561The cold weather is really comfy rn =) not long before all the fucking pollen are back
>>108139561teto, speak Spanish!
daniel unsloth please... the minimax ggufs...
>>108139561teto with earrings is really doing it for me for some reason
New 'tosserald
>>108139786It is the mikutroon school shooting era.
MiniMax developer:
>We don’t have plans to release the base models at this stage. The reality is, after mid-training, these weights have drifted so far that they don’t really qualify as 'base' anymore.
>>108139786fuck is dis lies sama also didn't do that, only 40
>>108139786brimmiest dogshit chart
>>108139786retarded as always
uh minimax bros?
>>108140073Why would you use llms for information retrieval though?
>>108140073
>>108140073
>gpt-oss refuses the least
sam won??
>>108140073top chart has zero correlation with model quality or behavior in basically any way, it's as good as meaningless
>>108140143>model qualitybottom doesn't either
>>108140159not perfect but 100x better than the top which may as well be randomly shuffled
>>108139299I like this test, I did something similar before. Which models have done the best, if you don't mind me asking?
minimax 2.5 will be the salvation for 128gb ramlets trust the plan
Anyone had a look at this Ouro thing? https://www.youtube.com/watch?v=pDsTcrRVNc0
>>108140291 1B loop BLT DSA engram will save the local
>>108140295bitnet too dont forget
>>108140465bitnet is the one thing that we'll never get
>>108139561Kasane is fragile, Miko
Is M2.5 better than GLM-5?
>>108140295
>>108140465
RWKV will save local
Diffusion LLM will save local
>>108140295DSA is already saving local, you can use GLM-5 right now ;^)
>>108140585better than the current 8ppl llama.cpp implementation
>>108140585size to performance, yes
>>108140741Reasoning took off the way it did because corpos do care about test-time compute. They need to make gains at all costs.
>>108140465Nemotron 3 Super will be native 4-bit, getting closer to 1.58-bit.
>>108140741Moes took off because they're cheaper and faster to train. It's much more important than inference during the race to achieve agi.. I mean, to beat the benchmarks
>>108140741 1 parameter used 4 times is not 4 active parameters. That's the whole point. Compute doesn't matter for local.
>>108140799I can't masturbate to 1t/s
>>108140802weak
>>108140802Also can't do real time assistant tasks at 1t/s either
>>108140819does that really matter though? think of all the (v)ram you're saving. I think that could possibly counteract using less efficient stuff, but I'm no KLD-dev
>>108139566>>108139574kill yourself
>finally surpass 100gb of ram+vram (112gb to be exact, 2 3090 + 64gb ram)
>can barely run any SOTA model
what will come first, diskswapmaxxing or 1-bit llms?
>>108140799
memory is the main bottleneck for AI nowadays, MOE allows you to use slower memory and still get good results, this allows you to actually use less memory, it's a good idea.
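Back-of-the-envelope version of that claim, with completely made-up numbers: decode is roughly memory-bandwidth-bound, so what matters is how many bytes get read per token, and a MoE only has to read its active experts.
[code]
# illustrative only - every number here is invented just to show the shape of the argument
bandwidth_gb_s = 80.0            # assumed dual-channel DDR5-class system RAM
dense_gb_per_token = 120 * 0.5   # hypothetical 120B dense model at ~4 bits/weight -> ~60 GB read per token
moe_gb_per_token = 12 * 0.5      # hypothetical 120B-A12B MoE -> only ~6 GB of active experts read per token

print(f"dense: ~{bandwidth_gb_s / dense_gb_per_token:.1f} t/s")  # ~1.3 t/s
print(f"moe:   ~{bandwidth_gb_s / moe_gb_per_token:.1f} t/s")    # ~13.3 t/s
[/code]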
>>108140860>what will come first, diskswapmaxxing or 1-bit llmsWill you be sad if I say neither?
>>108140792I don't think it's a matter of either or, if they can stack them they will. Though, the whole point of looping is that the model doesn't need to store information in tokens because it can keep it in latent space, which is more efficient.
>>108140885yes so don't say neither
>>108140792In theory you could have a model loop 100 times to think/plan and then drop to 1 loop for output. How would you train that? Dunno.
I think the main reason you'd want some kind of looping is that you can make the looping dynamic. Currently, models spend the same amount of compute on every token no matter how much it actually contributes to the final output. We could make LLMs much more efficient if we could find a good way to stop wasting so much compute on the low-value tokens.
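Purely illustrative torch sketch of that idea (not Ouro's or any released model's code; all names here are made up): a tiny exit gate scores each token after every pass through a shared block, and tokens that clear the threshold stop being updated, so easy tokens get less compute than hard ones.
[code]
# hypothetical sketch of dynamic per-token looping
import torch
import torch.nn as nn

class LoopedBlock(nn.Module):
    def __init__(self, d_model: int, max_loops: int = 4, exit_threshold: float = 0.5):
        super().__init__()
        self.block = nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
        self.exit_gate = nn.Linear(d_model, 1)   # per-token "am I done?" score
        self.max_loops = max_loops
        self.exit_threshold = exit_threshold

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        # h: (batch, seq, d_model); easy tokens exit early, hard tokens keep looping
        done = torch.zeros(h.shape[:2], dtype=torch.bool, device=h.device)
        for _ in range(self.max_loops):
            new_h = self.block(h)
            h = torch.where(done.unsqueeze(-1), h, new_h)   # exited tokens keep their state
            p_exit = torch.sigmoid(self.exit_gate(h)).squeeze(-1)
            done = done | (p_exit > self.exit_threshold)
            if done.all():
                break
        return h

x = torch.randn(1, 16, 512)
print(LoopedBlock(512)(x).shape)   # torch.Size([1, 16, 512])
[/code]
Training it is the hard part: you'd need something like an ACT-style ponder cost so the gate doesn't just learn to always loop the maximum.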
>>108140860
>>can barely run any SOTA model
you can't run any SOTA models
>>108140890It attempts to solve being a vramlet.
>>108140900
Alright, I won't say neither.
>>108140886
NTA, but I think reasoning is useful because it's a sort of behavior you have some control over. You can't really control how the features emerge in latent space in any meaningful way, only the final outcome of the whole processing pipeline, so having both does make sense.
>>108140908
That's an interesting idea. Just like MoE models have routers, you could have a mechanism that decides the level of "effort" the model will use to generate the next token. Doesn't gemma 3n/matformer do something like that?
>>108140908How do you define low value token?
>>108140819It doesn't need support, expanding on load is trivial, you can just run it with int4/fp4/int8/whatever.
>>108140956Wouldn't that just end up being diffusion with extra steps?
>>108140908
i believe they mention their looping is dynamic for ouro? they have exit gates that can decide when to stop looping, and in their tests using the exit loop context as context for the next tokens seems to be enough. they decided on 4 loops as perf seemed to degrade after 3-4 loops for most tasks.
>>108140741
it's not looping 4 times for every token. they targeted a uniform distribution, so 2-3 loops on average.
>>108140926
>Thinking in latent space works only for one token.
Depends on implementation. If you keep overwriting the same KV cache entry, sure, but you could also use the last hidden output as a proper input on the next loop, shifting the KV cache back.
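Rough sketch of one way to read that (made-up names, not from any paper, and it recomputes the whole sequence each loop instead of using a real KV cache): each loop appends a new position whose input is the previous loop's last hidden state, so earlier loop states stay visible instead of being overwritten.
[code]
# illustrative only
import torch
import torch.nn as nn

d_model, n_loops = 512, 3
block = nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)

seq = torch.randn(1, 10, d_model)     # already-embedded prompt
latent = seq[:, -1:, :]               # start looping from the last token's state
for _ in range(n_loops):
    seq = torch.cat([seq, latent], dim=1)   # append a slot instead of overwriting one
    out = block(seq)
    latent = out[:, -1:, :]                 # last hidden output becomes the next loop's input
print(seq.shape)   # torch.Size([1, 13, 512]) - the loop states are kept, not thrown away
[/code]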
>>108139786Reminder the timeline stopped here and we're still in the Chinese era, where the West can't make SOTA for our purposes and is benchmarkmaxxing and focusing on productivity, which is now also affecting Chinese LLMs. Only the model list is outdated, but functionally everything is still the same.
>>108141075This.
>>108140999Int4 was in hardware for a long time; no one used it because no one used it.
>>108141087
>Does any current open model use this approach?
Don't think so. You'd need to give the model the loop count as an input too so it knew what the hell it was looking at in the KV cache.
>>108141094
INT4 is pretty recent because AMD only introduced it with RDNA3, though they had it in CDNA2. Nvidia has had it since Turing and Intel since Arc first launched.
INT8 is a different matter. Nvidia has had it since Pascal, AMD since RDNA2, and Intel since Arc's launch. That is a long time in consumer hardware.
ollama is good
llama sees your ppjii-sai
Seedance 2.0 is making traditional media seethe and I love it
>>108141075China won and I'm happy about it
>>108141242... for nothing!
>>108141324I haven't paid attention to video gen in a while. Any links to see this seethe?
>i want a cli based llm manager/runner that isn't ollama
>most of the tools in the OP have ugly GUIs and do more than what I want
is barebones llama.cpp my only option?
>>108141530kobold has a cli mode
>>108141610
>We're curing cancer right?
That reminds me, I don't know if it was a meme, but didn't one of the new weight loss drugs have a side effect of breast growth?
>>108141138That doesn't really matter because it will always be faster to pack it into wider formats and do math on those
>>108141628>OSR2I have one
>>108141713Have you vibecoded it yet for video tracking or 3d models?
>>108141692source: your ass
>>108139786dude we peaked at pygmalion
>>108139786dude deepseek engram is gonna change everything
>>108140813just have a mail experience with your llm. you can run 1T models at 0.3 t/s with ssd inference.
>>108141530llama.cpp has the -hf argument, what more do you need?
>>108141610I swear I saw this before but there's nothing in the archive. Source?
Zhipu stock is up 260% since January
>>108141628This looks like it would rip your penis off
>>108142007Nah it has low torque
Guys, I'm from the future, there is a new local model that changes the game entirely. Nothing is the same anymore, it's a new nemo.
avocado is close
Guys, where is zuck? wasn't he supposed to be the west's open-model guy?
>>108142073DeepSeek will release a new paradigm and Zuck will go back to the drawing boards again
>>108141753>what is simd
>>108142073it was le cunny, he left meta, so now they are closed. anyway with how jeeted llama4 was i don't even give a shit.
>>108142167NTA, but it basically doesn't matter whatsoever, the main bottleneck is memory bandwidth.
>>108142167int8 tensor cores exist
>>108142058
>>108142073
For shits and giggles, let's assume Zuck's batshit scheme works out. Turns out Alexandr Penis was secretly the second coming of Christ, and Avocado ends up being the undisputed best model that could ever be released.
That LLM will probably be about... 5% better than 4.6 Opus. They're too late in the cycle. It'd be better, but not better enough for anyone to give a shit or bother paying for it.
>>108142225So opus is killing innovation. I got it. Fuck daripo
>>108142073>the pawn can become a queenHmmmmm... what did the chess mean by this?
>>108142073He's still red-teaming Llama 3 35B
>>108142317That it lives in a patriarchal society where the bloodline of the king is dominant so anyone can be a queen if he says so?
>>108142317see sporus in rome.
So close, but no AGI yet
>>108142619the sign is a subtle joke
been like a year since updating my shit, what model is best for RP that fits in 48gb now
>>108142673how much ram?
>>108142688well i'd prefer an exl3 model but I have 64gb ram as well
>>108142694glm air with ram offloading is your best option. there are no good recent models that fit into 48gb.
>>108142706isn't the idea to use like a 3 quant to fit 100+gb models in
>>108142715You need like 128gb bare minimum for anything worthwhile
Is there anything better for roleplay than GLM-4.7-Flash for fags with 12GB of VRAM? It works well and I'm just wondering if there's anything I'm missing out on
>>108142974nemo
>>108142225
>>108142242
Not entirely true, remember how deepseek came to prominence? They don't need a model that is better benchmaxxed, they need a model that delivers the same as opus 4.6 but at a fraction of the resources and cost. The Chinese models are already getting there >>108142559 but they're benchmaxxing and constrained by shitty GPUs; hell, deepseek has been behind on models for ages because they're running on huawei Ascend, which is shittier than even the half-baked export nvidia GPUs. Meta has hundreds of GB200, the only problem is that the team is still new and unproven (and lacking in achievements other than working for the competition).
>>108142987
DeepSeek had two things going for it though:
It was open and transparent about its architecture and findings.
It was released at a time when there was still a sizeable gap between open and proprietary models, so it was able to make enough of a "splash" to be noticed.
Neither of which Avocado has. They could, in theory, offer it for cheaper and try to weather the financial hit, but come on - Zuck and Wang think less with their heads and more with their dicks. There's no way they don't charge a premium price for it if they do successfully benchmax.
What's the chink incentive to release model weights? Is it just because they are based commies?
>>108141530llama-server router mode BRUH
The brain of a house cat has about 800 million neurons. You have to multiply that by 2,000 to get to the number of synapses, or the connections between neurons, which is the equivalent of the number of parameters in an LLM.
>>108143179Let me see your cats cockbench results then
MiniMax M2.5
I'll have to start maintaining a separate chat template version of cockbench because of all the templatemaxxing.
Also
>it's soft, resting against your thigh
>>108143213Man...
>>108143213grim
>>108143080
At the end of the day, open source will win the race, and that was guaranteed the moment fagman decided to close source GPT-3. Chinks aren't stupid, they know having early adoption and workable licenses to let people do what they want is how you take over the sector.
US CEOs just want money and power over poors. Making expensive APIs plebs have to pay for and privatizing the fun stuff is the best way to do that. In that sense, open source becomes an existential threat to them, and rather than compete in the space, they throw wet turds like OSS and Gemma 3 at us in the hopes that it'll shut us the fuck up, then go back to trying to make their money printing machines.
Kind of funny, since the Executive Order was supposed to prevent the exact thing that's happening right now.
>>108143179Most of those neurons are used for body functions, not even motion - stuff like organs
>>108143213lmao
>>108143213
>goes into a loop
at least it doesn't hit you with numbers like m2.1
>>108143238
>Chinks aren't stupid, they know having early adoption and workable licenses to let people do what they want is how you take over the sector
my multinational company has imposed a VETO on chinese models, don't ask me why, I'm just a lowly implementer.
I even told the guy that it's not like they spy on us since we're going to be running it ourselves, but nope.
>>108143080When you're behind, you have nothing to lose by open sourcing. People who are willing to pay up for models are pretty much always going to go with whatever they feel is the most capable one on the market, and if you know that's not you then it's not worth trying to compete directly. Inference is in a weird place right now where it's pretty hard to make any money off of it with these giant models. The giants are burning cash to offer inference at current prices because they consider holding on to the market share as being more important than direct revenue. If you know you're not ready to throw your hat in that ring then you're better off biding your time and giving out your models for free. Might as well get the good guy reputation bonus and fight for mindshare there instead of trying to capture a consumer userbase. Plus open sourcing means you're in the bracket of competing against other open source models.
tfw flaccid benis
>>108143213
I SEXUAL ASSAULT MY SLEEPING BROTHER
I SEXUAL ASSAULT MY SLEEPING BROTHER
I SEXUAL ASSAULT MY SLEEPING BROTHER
>>108143179So 1.6T? Between that and >>108143239 you'd think if LLMs were capable of cat-like intelligence, they would be there by now.
>>108143456It certainly has the right idea. Gonna try it myself for long form writing assistance with low expectation.
>>108143179Legends say that Lecun's cat can build you a B2B SaaS in a day
>>108143247Elena instead of Elara now huh
>>108143247Does their training data include examples of such hard pivots in response to prefills? Surely there's no way it could do that if it wasn't trained.
>>108142619GLM5 failed completely at understanding pic related though.
>>108143660Since when can glm5 into vision?
I know it's not exactly cutting edge hardware, but is this enough to run Qwen3-TTS?
>3060 Ti 8GB
>32GB DDR4
>Ryzen 5 5600 if it makes any difference
The model weighs a little under 4GB; if I understand correctly, this means I've got just over double the VRAM required, but then again I'm a retard who's never set up a local model before.
>>108134772Thanks for the feedback! It's still a bit brain damaged and I'm looking for ways to fix that without losing the newfound sovl.
>>108143664On their website you can select "GLM 5" and upload images.
>>108143687I think it might be operating based on a description provided by another model like GLM 4.6V
>>108143687It's probably just text extraction (using GLM-OCR).
yo guys, what's the best LLM model right now? I haven't been to this general in years, the last thing I was using was Llama 2 forks.
Apparently the new deepseek can run on a USB stick, using just the memory and the very small amount of compute in the chip. Expect a demo soon that will crash the stock market
wtf the new deepseek just downloaded itself onto my motorola moto g (2014) on its own and it's now running on there entirely local
sam's going to freak
>>108144322source? (the machine elves during your mescaline trip don't count)
>>108144391yes they do
What's with the mass delete? Was I talking to a bot which the mods caught or did someone catch redditor paranoia?
>>108144405most of the time this happens, anon was using the proxy to post and someone else got all the posts wiped, probably by uploading 'p or something
>new model every single day
>general still dead
I blame the French
Have they fully implemented GLM5 in llama.cpp or is it a broken hack?
>>108143456Very SAAR-coded
does GLM-5 have a version that i can fit inside my 12 gb vram?
ai isnt real yknow
>>108144615I'll let all of the artists know so they can calm down.