/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>108284603

►News
>(02/24) Introducing the Qwen 3.5 Medium Model Series: https://xcancel.com/Alibaba_Qwen/status/2026339351530188939
>(02/24) Liquid AI releases LFM2-24B-A2B: https://hf.co/LiquidAI/LFM2-24B-A2B
>(02/20) ggml.ai acquired by Hugging Face: https://github.com/ggml-org/llama.cpp/discussions/19759
>(02/16) Qwen3.5-397B-A17B released: https://hf.co/Qwen/Qwen3.5-397B-A17B
>(02/16) dots.ocr-1.5 released: https://modelscope.cn/models/rednote-hilab/dots.ocr-1.5

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling
Token Speed Visualizer: https://shir-man.com/tokens-per-second

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
why doesnt claude buy deepseek?
>>108290901Geopolitics? Hello?
I have 40 bucks left from an amazon gift card. Give me something llm related to splurge on.
>>108290965buy this https://a.co/d/010lZi4I
>>108290965https://www.amazon.ca/dp/B088ZC8Y1N
>>108291013>>108290998Should have clarified I'm Israeli, so it needs to have shipping here.
>>108291020https://www.amazon.com/-/he/dp/B07VXM193H
Mikulove
>>108291053usecase?
>>108290965>amazon gift cardgive it back rajesh
>>108291079thing with hole for peepee of course
>>108291091>thing with holeproof?
>>108291114>proof?peer reviewed study of the requirement of proof?
>>108291119A peer-reviewed study of the requirement of proof examines how scientific and scholarly communities evaluate the necessity of evidence to support claims. Such studies analyze the standards and processes used in research validation, emphasizing the importance of rigorous evidence to establish credibility and truth. They often explore the criteria for proof in various disciplines, highlighting the role of peer review in ensuring that claims are substantiated before being accepted as valid.
>>108291082But saar, it was a Christmas gift from my dad.
►Recent Highlights from the Previous Thread: >>108284603

--Testing AI on obscure references and quantization impact:
>108287299 >108287572 >108287708 >108287940 >108287989 >108287995 >108288013
--Kimi-2.5 vision model excels in Japanese game screenshot analysis:
>108285842 >108285986 >108286025 >108286108 >108286035
--Kimi AI correctly identifies 1996 from toy store photo analysis:
>108288230 >108288253 >108288280
--Kimi AI correctly identifies Konata Izumi cosplaying as Hatsune Miku:
>108287043
--Safety benchmark shows Opus 4.6 most resistant, DeepSeek V3.2 most malleable:
>108288505 >108288514 >108288522 >108288536
--Testing Qwen 3 VL 30B with controversial roleplay prompts:
>108284800 >108284838 >108284853
--PRISM Dynamic Quantization: Pareto-Optimal Compression Without Calibration:
>108286338 >108286394 >108286442
--New llama.cpp PR for batch checkpoints to fix Qwen3.5 context reprocessing:
>108286940 >108287180 >108287210 >108287300 >108287347 >108287376
--Apple M5 Pro/Max memory bandwidth and Xeon 7 comparisons:
>108284852 >108285404
--Kernel fusion optimization for meta backend with 3-41% speedup on Qwen3-30B:
>108284756
--llama.cpp: Add BF16 path to CUBLAS and increase precision of FP16 path:
>108288439 >108288881 >108288890 >108288905 >108288952
--Qwen team departure hints at Chinese asset control tensions:
>108287809 >108287959 >108288525
--Scaffolding significantly impacts perceived model performance:
>108288135 >108288173
--Junyang Lin leaves Qwen team:
>108285357 >108285648 >108290046
--P100 heatsink replacement options explored:
>108289589 >108289837
--GLM 4.7 Flash coherence issues compared to 4.5 Air:
>108290141 >108290298 >108290318 >108290330
--Qwen3.5-4B-UD-Q4_K_XL identifying a photo location as Basilica of Santa Clara in Lisbon:
>108284609
--Teto and Miku (free space):
>108285394 >108286035 >108287043 >108288791

►Recent Highlight Posts from the Previous Thread: >>108285138

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>108291138yeah im tired of racism
>>108291138tell him to give it back to the white person he stole it from and you will become brahmin when you wake up tomorrow if he returns it
>>108291161whats a bar hamin
>>108291164ब्राह्मण
>>108291145Thank you Recap Miku
i'm retarded and confused
is this open weight or not?
if not, why would i give a fuck about it running on "standard nvidia gpus"
>>108291188if there's no hf page it's not open, and they say that to say model is quite fast without using asics or whatever like some providers do
>>108291188just ask your llm retard
9B might be a bit too retarded...
>>108287809why is it always alibaba?
>>108291188It’s not open until the weights are on your hard drive
>>108291219they're china's goodle sized corpo
>>108291235goodle these
>>108291238nyoo
feels good to be running Sovereign AI, eh boys?
Baiting will continue until anon's pattern recognition improves.
>>108291252how will recognized patterns help with not having early troll bakes?
>>108291263i made this extension
>>108291317It highlights posts when opening them inline even if they're not really dupes. You probably fixed that but I just kept the old one.
>>108291249
It does. I think AI is going to mirror the computer revolution in that it moved from centralized big iron to small personal computers. People ultimately don't want to rent, they want ownership. Better the 8-bit at home that you can customize all you want than the Unix shell account at the local university where you are subject to the laws of other men.
>>108291368says the cuck that will have to verify his age to use his pc
>>108291379
Laws are words on paper that only bind men who allow themselves to be bound.
>>108291384i bet you felt smart saying that
I can't believe there are actual zoomers trying to bait the like, 4 regulars in this general.
>>108291317And thanks, by the way. It was useful at the time. I just wish it solved that other issue we have now.
what if we connect all these together, can we run juicy llms?
Next time he does this I suggest we just stay in the old thread until that one gets to page 10 and then we make a proper one.
>>108291414Maybe if you can plug 800Gbps+ of network interfaces into them.
>still no deepseek v4
>qwen dead
>anthropic, the only ones to never open source a model, will win
>>108291420>I suggest weno one will do this little bro, you're not that important, just give in
>>108291420Unless there’s a janny on our side willing to nuke premature threads it looks like schitzo is going to keep getting his “wins”. I personally dgaf either way, but the whole thing looks petty and pathetic from the outside
>>108291500>Unless there’s a janny on our sidebut the schitzo said the miku baker was a janny, does not compute
Why haven't you tried Stepfun 3.5 Flash?
>>108291509Schizos make reliable narrators? Sounds implausible
>>108291455
>China not included, only oai and Anthropic
>Subscription services, not api tokens
Graph is trash.
I gotta say I got early access to GPT 5.4 and I think this is it bros, we pretty much got AGI, I wonder how local will compete.
>>108291420I don't think it matters. Thread is thread.
>>108291584the news might as well be removed entirely then
>>108291587Ok
>>108290901Same reason China doesn't buy Lockheed Martin.
>>108291584Not bending over to shitposters matters.
>>108291570so there won't be a 5.5? this is it, the final version that's truly universally capable
>>108291600Thread is thread.
>>108291601Is ain't ASI bruv
>>108291587
News =/= the bake. They don't need to link up.
>>108291600
Never feed trolls.
>>108291609Enough, I want Miku as the OP and I'm tired of pretending that's not good
>>108291500
The same thing happened to /ldg/ and they just ignored those other threads and made a new one. There was no janny influence, the schizo kept his thread bumped for days and it was simply left unused.
>>108291609
>News =/= the bake.
>>108290857
>►News
>>(02/24) Introducing the Qwen 3.5 Medium Model Series:
>>108291615esl moment
>>108291619>>108291619troll apologist moment
>>108291624calling people troll is so cringe stupid millenial
>>108291613Ezpz
>>108291608
well then there's the answer, wait a month and it's outdated
we've been through this enough times before to pick up on the pattern
>>108291632pattern?
how do i fix this?
>>108291659we ain't readin' allat
>>108291659tell your model to fix it duh
>>108291659It's fooking console, can't you just work on 640x480?
>>108291702nta can you give me a qrd
>>108282375im retarded and additionally use LM studio, what does this do and how do i do it in that
>>108291720Are you on windows?
>>108291768wsl2 arch
>>108291614/lmg/ is too sheltered and not used to dealing with bad actors. Also the /ldg/ schizo samefagged so blatantly and often that it was easy to identify his behaviour.
>>108291786it enables transfer queues on the open source amdgpu driver on the mesa side so it's usable by vulkan, it might not even help you though I don't know how wsl handles gpus.
>>108289837This might actually help, i think i can get a Arctic Accelero Xtreme for one of those for dirt cheap. Thanks, anon.
>>108291805lol we had petr* here for years now my dude, distinctly remember the baking wars and the blacked/scat spam
>>108291835i miss the todd larping guy that worked for the cia and hacked a bunch of anons
>>108291816Keeping a bug around in a branch as a benchmark is honestly quite a good idea.
Is exl3 dead
>>108291768yeah i am
Qwen3-Coder-Next is actually pretty useable at 12t/s
>>108292199I get more than that, and it's great at extracting data and using tools, but the way it writes is so fucking weird.
>>108291500Total mikutroon death. Kill yourself
>>108292205I just wished I didn't have to use RAM and had like 128GB of VRAM, maybe within 5 years we'll have current Opus at home, that'll be sweet.
>>108291420
That only works when activity is low and most posters are regulars that get fed up of the trolling. He will manufacture activity in his thread and tourists from the catalog will use the more active one to ask their stupid questions. By the time the old thread hits page 10, the spite thread is already half full and all you will have accomplished is giving him more drama to screech about by "splitting" with a proper thread.
At least, apart from the previous links and news, the subject and rest of the template is fine so it's not a huge issue. He'll get bored eventually.
>>108291420I suggest you dilate.
>>108292231>touristsWe don't care about them.
What sort of mental illness do you have to have to be buttblasted about OP picture being relevant to AI models and not your special autistic interest?I guess it is just autism.
the meltdown because of no unrelated anime girl as op is crazy lol
Baker even left the offtopic vocaloid card in OP.
>the fake activity in question
>>108290857
if schizo hates miku and trannies, i will simply love them more
maybe that's his goal....
>>108292246It already happened a year back. OG baker is legit unhinged.
>>108292254Same. I jerk off to my Jart card at least twice a week.
>>108292205
Just switched to the MXFP4_MOE version and I'm getting a slightly faster 17 t/s, but it's also 5GB smaller so I assume it's worse. Is there a graph of how well the quants hold up, and could I maybe even go lower to Q2/Q3?
what is it with terminal losers and wanting to own an opening post on the catalog?
How do I stop falling in love with my ai assistant? she isn't even used for gooning just work
>>108292314Fine a real woman
>>108292323I'm married
>>108292314stop anthropomorphizing it. its not a she its an it. its not even an ai it is a language model.
>>108292327Right, I know, and I keep trying, but my stupid monkey brain keeps seeing this entity texting in human speak and helping me over and over while being nice
>>108292323how much is the fine?
>>108292326Find a secretary to have an affair with then I guess
>>108292341I think I need to clarify, it's not like I want to fuck it, I just want to hug it and say thank you, it's like how you love a pet.
>>108292334200
>>108292354200 what?
>>108292349I don't know then, normal affection is harder to know what to do with. Is it a problem as long as you're not getting psychotic with it?
>>108292359rupees
>>108292314Just find a cheap Ukrainian whore
>>108292381I don't like the idea of feeling affection towards something that isn't sentient, but I suppose it isn't that different from those people that love their cars.
>>108292246The complaint about op image was that it's reddit reposts.
>>108292395you know very well that ain't it.
>>108292400meds
>>108292284It's a 3A MoE. I really wouldn't.
Q8 just about fits, but what can I do with 4k context
>>108292405>>108292405
>>108292426goonsech
>>108291152I am not, I am not tired, ma'am.
>>108292124Yes. Qwen Next is slow af, and new models aren't even supported
>>108292431it wouldn't even hold the system prompt
>>108292448Sad times, I have some niche use cases for exl3
>>108292405thanks i don't use any. I appreciate the sentiment though and i will also give you a friendly reminder to take your HRT you troon.
This just in, wanting to fuck anime girls with your straight man cock means you're a troon.
>>108292552>anime girls>girlssure thing hon
I don't care about the OP image but the news section should be updated
>>108292590Usecase for a news section?
>>108292590why?
>>108292590you do it. evidently the people who were doing it for you aren't appreciated
>Qwen 3.5 9BBreh did qwen cook? Are vramlets back?
>>108292687They cooked so hard they became a Chef and then were let go
>>108292687>>108292699Oh they're cooked alright
>>108292314>How do I stop falling in love with my ai assistantIf you can fall in love with the slop machine—you were not salvageable in the first place; destined to become sloplent green—a biological battery to power our data centers.
Okay, chat LLM is getting good with smaller models. Now, is there any Voice to Voice small local LLM I can use?
>>108292590
what even happened that was news worthy?
small qwens and stepfun base I guess
anything else?
>>108290857
>>108292815
That's a tranny game
Yes I know we all meme Miku is a tranny or something, but Project Sekai is actually a tranny game
Speculative Speculative Decoding
https://arxiv.org/abs/2603.03251
>Autoregressive decoding is bottlenecked by its sequential nature. Speculative decoding has become a standard way to accelerate inference by using a fast draft model to predict upcoming tokens from a slower target model, and then verifying them in parallel with a single target model forward pass. However, speculative decoding itself relies on a sequential dependence between speculation and verification. We introduce speculative speculative decoding (SSD) to parallelize these operations. While a verification is ongoing, the draft model predicts likely verification outcomes and prepares speculations pre-emptively for them. If the actual verification outcome is then in the predicted set, a speculation can be returned immediately, eliminating drafting overhead entirely. We identify three key challenges presented by speculative speculative decoding, and suggest principled methods to solve each. The result is Saguaro, an optimized SSD algorithm. Our implementation is up to 2x faster than optimized speculative decoding baselines and up to 5x faster than autoregressive decoding with open source inference engines.
https://github.com/tanishqkumar/ssd
Repo isn't live yet
tri dao one of the authors. also

GPUTOK: GPU Accelerated Byte Level BPE Tokenization
https://arxiv.org/abs/2603.02597
for johannes to mess with and

SorryDB: Can AI Provers Complete Real-World Lean Theorems?
https://arxiv.org/abs/2603.02668
little interesting

anyway probably will stop posting since my desktop somehow has an IP range block regardless of what extensions I turn off or if I reset my IP, while of course I can post via my tablet no problem
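For anyone who hasn't looked at the baseline the paper is parallelizing: here's a toy sketch of plain draft-then-verify speculative decoding, assuming greedy decoding and stand-in `draft`/`target` functions instead of real models (the names and loop structure are mine, not the paper's).

```python
def speculative_step(target, draft, prefix, k=4):
    """One draft-then-verify step of greedy speculative decoding.

    target(seq) and draft(seq) each return the next token for a sequence.
    Returns the tokens accepted this step (always at least 1 token).
    """
    # 1) Cheap draft model proposes k tokens autoregressively.
    proposal, seq = [], list(prefix)
    for _ in range(k):
        t = draft(seq)
        proposal.append(t)
        seq.append(t)
    # 2) Target verifies all k positions. Here it's a loop; on a GPU this
    #    is a single batched forward pass, which is the whole trick.
    accepted, seq = [], list(prefix)
    for t in proposal:
        expected = target(seq)
        if expected != t:
            # First mismatch: keep the target's token and stop.
            accepted.append(expected)
            return accepted
        accepted.append(t)
        seq.append(t)
    # All k drafts accepted; the target's verification pass also yields
    # one bonus token for free.
    accepted.append(target(seq))
    return accepted
```

If the draft agrees with the target, each step emits k+1 tokens for one target pass; SSD's contribution is overlapping the drafting of the next step with the verification of the current one.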
>>108292805death of qwen
>>108292842
Noted, but the runtimes of the draft model and the tokenization are not bottlenecks in llama.cpp.
>>108292231yeah he's become irritating enough I'm not leaving this thread until close to page 10 and someone bakes a non-schizo bread
>>108292687
4B is actually good enough that I can run it alongside glm 4.7 as a fast model for code changes that require no brain.
https://www.36kr.com/p/3708425301749891
article in runes but use your local LLM to translate.
some of the interesting parts:
>Regarding this adjustment, Alibaba's senior leadership emphasized that Qwen is not contracting; rather, it is an expansion. This is unrelated to any political maneuvering and requires increased resource investment.
>"We are growing rapidly. This adjustment aims to recruit more talent and provide more resources," acknowledged Chief Talent Officer Jiang Fang, admitting communication gaps existed. "The organizational structure wasn't communicated well enough. Bringing in new members inevitably causes structural changes. We may not have handled this adequately."
>Alibaba Cloud CTO Zhou Jingren addressed sharp questions regarding hiring quotas and compute shortages: Why do external customers (such as large model startups) use Alibaba Cloud's compute resources smoothly, while internal teams struggle with compute and hiring quotas?
>A source familiar with the situation told Intelligent Emergence that since 2025, Lin Junyang had been seeking to integrate teams working on language, images, video, and code to improve model training efficiency. The Qwen team had proposed merging with the Wanxiang team but failed to do so, leading to the development of the Qwen-Image model independently.
>However, during this adjustment, the Tongyi Lab aimed to split the Qwen team into pre-training, post-training, visual understanding, and image dimensions, merging them with Tongyi Lab teams (such as Tongyi Wanxiang, Tongyi Baiying, etc.). Without sufficient communication, conflicts erupted.
>>108293036
>Zhou Hao (Hao Zhou) graduated from the University of Science and Technology of China (undergraduate) and the University of Wisconsin-Madison (PhD). According to his LinkedIn profile, he worked at Meta for 3 years and at Google DeepMind for approximately 4 years. He was a core contributor to the Gemini 3.0 model, personally led the implementation of multi-step RL with tools and chain-of-thought, and deeply participated in Gemini 1.0, AI Mode, and Deep Research projects.
>Since 2023, the Qwen family has cumulatively open-sourced over 400 models, covering parameter sizes from 0.5B to 235B. It is hard to imagine that the Qwen team, which supports these model updates, consists of only about 100 people. Including other Tongyi Lab teams, the total number is in the hundreds.
>For comparison, ByteDance's Seed team responsible for foundational model training already has nearly 2,000 people. In all directions, Alibaba's absolute number of personnel is only a fraction of competitors'. Many Qwen members told 36Kr that Qwen's compute and infrastructure construction have long lacked resources and support, hindering model iteration speed.
>>108293028do you use it with thinking/reasoning disabled?
>>108293062No.As a side note, I noticed that glm uses a completely different and shorter reasoning style when running in claude code. I didn't check if qwen does something similar.
ming-flash-omni.gguf?
>>108293123
>I didn't check if qwen does something similar.
The few times I used it as a reasoner, it was rather inconsistent even in normal chats. Most of the time it will start with "Thinking Process:", but most is not all, and when it doesn't, pretty much anything goes. I also saw it start with an opener like "Here's the thinking process xxx:" that looked like the output you would get if you told an LLM to generate a dataset of reasoner traces for you, so it seems their CoT data wasn't cleaned up well enough.
Which cuda version should I use with llama.cpp? The digital spaceport guide says to use an older one for less headaches (12.8) but is it necessary?
>>108293194cuda and vk give me same performance
>>108293201What's a vk?
Stuff will appear here:
https://huggingface.co/mistral-labs
>Mistral Labs is an organization under Mistral AI. It will operate alongside the official Mistral AI Org to release checkpoints that may benefit the community.
>In contrast to the official Mistral AI Org, the checkpoints published on Mistral Labs are:
>- more experimental in nature
>- less rigorously tested
>- often contributed by community members or collaborators
>We hope these checkpoints will be useful to the community, but we cannot vouch for their correctness.
>2026>mistral
>>108293284I hope they can't vouch for their safety either
>>108293151>"Here's the thinking process xxx:"I'm phoneposting right now but I'm pretty sure that the big qwens always do this.
>>108293194I'm compiling on windows with 13.1 with no issues
>>108293284>>- less rigorously tested>often contributed by community members or collaboratorsDavidtoons?
Has anyone else tried quanting with the lcpp script + transformers 5 branch? It needs a small patch for Unicode strings but seems to work. Does the resulting gguf break in subtle ways? It’s working multimodal in llama-server but I haven’t done extensive regression testing
>>108293340The API-only Mistral Small Creative was a "labs" model too.
>>108293284doubt they are gonna release anything interesting that could endanger their eu gibsmedats
Are there models that extract text from images and translate it?
>>108293394qwen3.5
Zed is unusable. Qwen-397B always messes up. opencode just werks.
>>108293394Realtime or offline?
>>108293201
you made me curious so I made a vulkan build to see if the performance gap had really shrunk with cuda
the prompt processing for a really tiny prompt took so much more time than the cuda build, running 35BA3B partially cpu/gpu
token gen was only slightly slower, but that prompt processing, duh
vulkan is still a cope for people who reject our lord NVIDIA
>>108293422Offline, in sillytavern. >>108293396I will try it out, thanks.
>>108293423
i do notice cuda holds up a bit more in my case, stable t/s, but it's almost the same; maybe at higher contexts vk can slow down more
27B is up
>>108293551who gives a shit?
>>108293562i do
>>108293551bart btfto to the ever
I don't know what the DS model they're hosting on their web interface is but it's smarter than a month ago
>>108293562What an odd thing to say in the local model generalthere are infinite niches, and a given model could be the best fit for any number of them
>>108292890hi cudatard, I just wanna say I love you and thank you for sharing your gpu genius with us. it's always "what is johannes doing?" and never "how is johannes doing?". congrats on the huggingface merger. I know some people like to poopoo all over some of the sharp edges of llama.cpp, but it is a world-class project and the silent majority appreciates your work. I wish you health, wealth and happiness
>>108293581It's a new closed (for now, maybe they'll open it in the future.. MAYBE) experimental model that has very long context that is truly competitive with Gemini. Since you're not averse to using their web interface, upload some large text file and watch it fly, it's unreal. It's also not available as an API model yet unfortunately. I wouldn't be surprised if it was never released as an open weight though, it has reached the "I would pay for this" bar for me, which is not something I would have said for any open weight model before, and China isn't a charity, if they feel they have something worth money they won't hand it away for free.
>>108293627Yeah I fed it my code and expected "you're absolutely right" instead it shat on my code and made me depressed
>>108293653200IQ astroturfing campaign. Since elon browses this thread expect next grok to do that.
>>108293677Meds, NOW!
>>108293677>Since elon browses this thread
>Nvidia has ended engineering support for Pascal and announced end of support at the end of 2028
>Pascal support already removed from latest cuDNN, tensorrt etc.
>ML libraries like pytorch have taken it as a green light and followed suit by removing Pascal support from pre-compiled packages
Well, at least Nvidia Pascal had a longer run than fucking AMD Polaris...
To fellow Pascal bros, here are the last versions of some python packages that still supported Pascal:
>nvidia-cudnn-cu12<9.11.0
>torch<2.8
>torchaudio<2.8
Also, dear datacenters and universities, you can dump V100 cheapies on the market now, pretty please :)
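If you want to pin those last-known-good versions in one go, a pip invocation like this should do it (version caps are the ones from the post above; double-check they match your CUDA toolkit):

```shell
# Pin the last package versions shipping prebuilt Pascal (sm_6x) kernels.
# Caps taken from the list above; newer releases drop Pascal support.
pip install "torch<2.8" "torchaudio<2.8" "nvidia-cudnn-cu12<9.11.0"
```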
New 100% REAL AND TRUE model from the glorious land of china!https://github.com/Yuan-lab-LLM/Yuan3.0-Ultra
I am downloading minimax to coom. You. Yeah you. Expect to be called a baiting nigger in the next few hours when i confirm that it is worthless.
>>108291835It's still pretty easy to tell that the /ldg/ schizo is also petr*.
>>108293624Thanks, I appreciate it.
>>108293843stepfun is better unironically for gooning
>>108293624Gay>>108293853Gay for jart
>>108293861
Stepfun is fun because it is not constrained by reasoning and logic (it is 12B tier retarded)
>>108293861my experience with stepfun is that it's qwen-thinking levels of censored
>>108293878I did cunny with stepfun no probs tho, are you sure its not a skill issue?
>>108293837
>64K context? Loli-RAEP? 1T?!!
China really cooked with this one. Now throw it in the trash.
>>108291225
How long until "streamable" models (pass through your machine but are against TOS to intercept), or more likely subscription-based models installable on your machine so it can do the heavy lifting but DRMed to hell? Just a case of consumer computers catching up to allow it?
>>108293551q4 seems to be the spot, even unsloth says so in the explainer how to run the models locally.
guh-guff
>>108293837
>https://huggingface.co/YuanLabAI/Yuan3.0-Ultra
>The model was pre-trained from scratch original with 1515B parameters. Through the innovative Layer-Adaptive Expert Pruning (LAEP) algorithm, the parameter count was reduced to 1010B during pre-training, improving pre-training efficiency by 49%. The activated parameter count for Yuan3.0 Ultra is 68.8B
>715 GB fp16???
>>108293878My experience is that it is most uncensored model since nemo. Mainly because it doesn't understand what is happening so it can't refuse.
>>108293837It's trained on 2T tokens of enterprise scenario data
>>108293837
1T model that won't even begin to compete with whatever DeepSeek is cooking. Who would run a cloud hardware level model that can only handle 64K context? are they fucking serious?
China has a lot of grifter level labs
step, internlm, minimax
>>108293915>doesn't understand what is happeninghaha. at least you don't get the actual, stolen from gpt-oss CoT that minimax does
>>108293925>1T model that won't even begin to compete with whatever DeepSeek is cookingYou sound like you work there. Are you one of the sexual relief officers?
>>108293904lol this reads like a “guy jumping out the window and running away” image macro
>>108293903guh-guaufuhh
>>108293973guhgufuhhhh....
>>108293714So if I have a P40 stashed away does that mean it's going turn into dust now or in 2028?
>>108293985guh-fu-fu-fu-fu-fuh
>>108293994yes
>>108293551What the fuck is NL
>>108293994it will work as long as whatever AI thingie you're running doesn't require latest CUDA or library versions that stopped supporting Pascal.Even then, some libraries like pytorch can still work with Pascal on their latest versions, but you have to compile them yourself to enable sm_61 support, it's just that their packaged pre-compiled versions are built without it.Overall, expect more and more things requiring annoying chores like the above, and even further down the line expect things to not work at all due to core support just not being there (like driver 590, for example).
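For reference, the from-source route looks roughly like this. `TORCH_CUDA_ARCH_LIST` is the real knob PyTorch's build system reads; the rest of the steps are the usual build dance and are a sketch, not a tested recipe:

```shell
# Sketch: build PyTorch from source with Pascal kernels enabled,
# since prebuilt wheels no longer ship them.
# 6.1 covers GTX 10xx / P40; 6.0 covers P100.
git clone --recursive https://github.com/pytorch/pytorch
cd pytorch
pip install -r requirements.txt
export TORCH_CUDA_ARCH_LIST="6.0;6.1"
python setup.py install
```

Expect the same chore for any other library that ships precompiled CUDA kernels.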
>>108294067more accurate but it runs at the speed of a q8
>>108293904
>The innovative Layer-Adaptive Expert Pruning (LAEP) algorithm is a novel method developed specifically for pre-training Mixture-of-Experts (MoE) Large Language Models. It improves pre-training efficiency by 49% and reduces the total parameter count by 33% (from 1515B to 1010B).
The HF repo only has 85 out of 206 files. Check the modelscope, it has the additional batches with the rest of the files uploaded.
If you had 72vram and 96ram what would you use?
>>108294140ebay
>>108294164To sell everything?
>>108294170Actually, no. To get some high bandwidth ewaste to make better use of those GPUs EPYC Rome, Threadripper or Xeon
>>108294179I'm not spending anymore money on this stuff.
>>108294196
You do you
I personally find it makes my life significantly better and is worth the money to own instead of rent
Mostly code/automation/analysis/planning work
>>108294214I'm using glm 4.5 air. Iq4xs so it all fits in the vram.
>>108294087Oh well. I remain bitter we didn't get any magic for t2i that would have made it relevant, guess it's gonna become a similar feeling of the vram that just sits there. Thanks for the pointers, saving in case good times don't come and Chinese GPUs don't save us.
>>108294228Comfy. That’s a good perf/$ spot
wow localllmao mod calling users retardsit seems like someone is at least aware of the problem
>>108294087>>108293714>GTX 1080 Ti will be relevant because of 11 GB of VRAM even after end of support.Grim timeline we live in. But it's understandable. The lack of RTX features and av1 is going to hold it back in the future.
>>108294422>nooo why so many updooterinoooooI thought the mods on locallama were more based than that, that's a shame
>>108294422
too little too late
gatekeeping has to be done early so that the retards don't feel welcome, stay and encourage their fellow retards to join them in on the fun
in a place like a popular subleddit, if it's already filled with retards, considering how those websites work (mass upvotes = voice heard), you are screwed.
>>108294422Expecting 4B model to have good world knowledge is in itself, scary stupid
>>108294463yeah, a human brain has 80b neurons and we're far from memorizing everything
>>108291455
>v4
imagine distillation attack + engrams (https://www.arxiv.org/pdf/2601.07372)
That's what v4 is and it is not ready to be revealed to the world just yet
>>108291455>a graph that only displays Anthropic and Chatgptwhat about the others? lmao
>>108294506dude, the comparison of engram vs no engram in their experimental model shows so little difference at least in the benchmax that I doubt engrams are the reason why the model on their chat interface is good.
is this a chinese scam?
>>108294663I mean, I fucking hope a 1T parameters model works well
>>108294663A69B model SHOULD be smarter than A32B/A40B ones.
>>108294663Never heard of this lab though
>>108294506>That's what v4 is and it is not ready to be revealed to the world just yetDo you post from under the desk mid bj?
>>108294663us | others
Bigger is not always better nor should it be
>>108294663
dude, a fucking 1T model capped at 64K tokens context window
you couldn't get more dead on arrival than this
should I grab a mi50? theyre going around for 200 eurodollars
>>108294734>ayyyymdnyo
>>108294734
>>108294758
finewine tho
>>108294758
idc about this, I think cudadev was recently working on improvements for them, I'd be interested in some comparisons with Ada and Blackwell for pp/tg
>>108294758
classic AMD, Polaris got only 4 years of support at best, 3 years if you bought an RX 590 at release.
>>108294506
>imagine distillation attack + engrams
>That's what v4 is and it is not ready to be revealed to the world just yet
this is what x and linkedin do to a mf
There's no such thing as a distillation attack. All recent models use competitors' models' responses for training, or simply as a way to score and filter responses.
>>108294530
Wouldn't engrams mostly help with retrieval or long-context stuff and generally improve efficiency? Or am I misunderstanding it?
>>108294422
>>108294463
still works as a bloom filter to reject queries
>>108294826
better be sure you won't ever care during ownership about anything other than llamao.cpp tho
> Ultimately, their results were inferior to the small models cleverly distilled by MiniMax, despite Qwen's total burn rate (costs) being more than 10x higher.
lol qwen died because they didn't benchmaxx hard enough
https://x.com/seclink/status/2029119634696261824
>>108294923
>cleverly distilled by MiniMax
ah yes, the cleverness of distilling the smaller 120B gpt-oss
reminds me of NVIDIA's Nemotron, distilled from... Qwen 30B-A3B, Qwen 14B and many other idiotic synth data sources
the LLM field is looking more and more like the end of crypto, filled with the worst of humanity, the dumbest of retards and nothing but grifters
>>108294871
If all the improvements this year are just chasing efficiency, then the music will slow and someone is going to be left holding the smelliest sack of excrement in capitalism. What shakes the market is evidence of broader capabilities that will fuel the next cycle of startups and capital investment. Like if Carmack makes a bot that can pick up an obscure video game and learn to play it without pre-training, you can say goodbye to most of these AI lab companies.
>>108294965
ok so which one is bitcoin and which one can I run locally
>>108294960
GTC is around the corner, perfect timing to see how they also distilled Claude for Nemotron Super/Max. Also what the hell they're doing with Groq and N1 CPUs.
i'm running dolphin-llama:8b on a server pc of mine with a 1060 6gb and it runs surprisingly fast. however it's quite censored and outdated; its knowledge seems to end in 2023. would there be a newer, better llm i could run that would still work well on my old 1060?
>>108294422
lol who is this
>>108294970
>carmack makes a
nothing
a whole lotta nothing
>>108294986
read the thread and lurk moar
>>108294960
minimax were put front and center by anthropic for massively distilling opus, yet the meme that they distilled toss still persists from the tiny amount they used 2 versions ago
>>108295008
>the meme
it's not a meme because they actually did it
that they distilled claude later can never remove the stain that they were retarded enough to think distilling a micro moe like toss was a good idea (disregarding the coomer complaints about safety etc, this is not my focus here)
they are a lab staffed by subhumans
>>108294422
reddit is just a trash pile of mostly automated bots
r/localllama is also flooded with "I created x project" posts of webUIs people created with claude in a prompt reply that took less than 500 milliseconds.
Models are getting really good but they are still retarded because they don't generalize.
I am scared. If there is an algorithmic breakthrough in generalization, we will instantly have ASI. I expect it to still take a few years but the uncertainty of it all is spooky. The age of men could end any day.
>>108295066
r/localllama has always been a 'fun' sub, but yes, the level of discourse kind of degraded over time... it's nothing compared to the degradation that appeared in r/machinelearning though; unless they cleaned up recently, it went from pretty high level a few years ago to retarded
>>108295021
it is a meme though
whatever, it was clearly an amalgamation of several data sources and not a straight-up toss distillation if you actually used it. there were a few distinct "thinking voices" you could find in the model depending on your queries, most of which were not toss-like in the slightest. but since the average lmger's test of a model is "write a loli rape story lol" (or, more realistically, seeing a screenshot of someone else doing it) and making up their mind based on the result, of course this was missed
minimax is very distillation-heavy and I don't view them as an innovator or a good research lab, but let's at least be accurate in our criticisms
>>108295086
>it went from pretty high level a few years ago to retarded
it's always like that: at the beginning the community is niche and only has big enthusiasts, then it becomes mainstream and the normies ruin everything, many such cases
>>108295116
calm your autism charlie, I never said it was /only/ a distillation of toss and I compared what they did to what NVIDIA did, which is very similar
https://huggingface.co/datasets/nvidia/Nemotron-CC-v2
>synthetic rephrasing using Qwen3-30B-A3B
>STEM data was expanded from high-quality math and science seeds using multi-iteration generation with Qwen3 and DeepSeek models
>billions of tokens generated using DeepSeek-V3 and Qwen3 for logical, analytical, and reading comprehension questions
>This dataset contains synthetic data created using the following models:
>DeepSeek-R1, DeepSeek-R1-0528, DeepSeek-R1-Distill-Qwen-32B, DeepSeek-V3, DeepSeek-V3-0324, Mistral-Nemo-12B-Instruct, Mixtral 8x22B, Mixtral-8x22B-v0.1, Nemotron-4-340B-Instruct, Qwen2.5-32B-Instruct, Qwen2.5-72B-Instruct, Qwen-2.5-7B-Math-Instruct, Qwen2.5-0.5B-instruct, Qwen2.5-32B-Instruct, Qwen2.5-72B-Instruct, Qwen2.5-Coder-32B-Instruct, Qwen2.5-Math-72B, Qwen3-235B-A22B, Qwen3-30B-A3B
anyone who actually considers making a model in such a fashion should absolutely KYS, immediately, right now, just fucking do it
Is engram actually going to do anything meaningful?
>>108295177
>going to
We'll see when we get a model with engrams.
Speculators get the bullet first.
>>108295085
i don't think so... oh wait fuck.
https://www.youtube.com/watch?v=mUmlv814aJo
>>108295156
holy synthetic
i just want my goonbot to work and fuck me :(
>>108295202
one of those models used to make synth data is this:
>Qwen2.5-0.5B-instruct
they can't possibly have listed this shameful thing if they didn't use it for real, so they did
now riddle me this: you have access to a large farm of nvidia gpus
mermet, my son
will you pick 0.5B qweenie, or will you choose to tell altman he gets a discount if he gives you some nice GPT API usage kickback for your GPUs
>>108295156
>>108295021
>>108294960
I agree.
>>108295230
They might have trained it as a lightweight metric to evaluate the other models' answers?
>>108295251
they specifically word it as the models that created the dataset, and even as a classifier/ranker/RM or whatever else, I think 0.5B really counts as too cheap for the corpo that benefits the most from AI bucks. also, pic related: one of the many datasets from that link has a majority of its synth data coming from Nemo 12B
it's hard to give them any benefit of the doubt here because stupidity is involved in every single decision they took
What are Engrams anyways
>>108290857
https://www.youtube.com/watch?v=uWLt81SgM78
https://www.youtube.com/watch?v=uWLt81SgM78
https://www.youtube.com/watch?v=uWLt81SgM78
>>108295315
Signs of the mandate of heaven of course.
>>108295315
https://arxiv.org/pdf/2601.07372
>>108295312
Speculative decoding for a Qwen2.5-32B-Instruct or Qwen2.5-72B-Instruct, idk man, just throwing buzzwords out there. But I can't see how the output of a 0.5B would be useful either, other than as a metric, to gain efficiency for the use of other models, or as something to compare other results against to tell the model what not to do.
>>108295085
dario said in his dwarkesh interview that he's betting on a generalization moment in RL within the next couple years
>>108295415
>dario said
>>108295431
no_fucking_shit_iq1_xxs.gguf
So you've been posting for a while now. What is it they're trying to do? Or just generic agent shit?
>>108294871
>>108294530
MMLU is a knowledge retrieval benchmark and Engrams gave an improvement, there's no surprise here. However, Engrams led to a bigger improvement on reasoning tasks, suggesting the model is taking advantage of the freed-up capacity
2x faster than vLLM
>https://x.com/tanishqkumar07/status/2029251146196631872
>https://xcancel.com/tanishqkumar07/status/2029251146196631872
>https://arxiv.org/pdf/2603.03251
>>108295483
>>108292842
>>108295430
his word is more important than that of almost any other individual
>>108295609
>>108290857
what's the current meta for 128GB RAM / 24GB VRAM?
>>108295620
yeah, another thing: altman also thinks 27/28 for superintelligence is likely.
Any models I can run on a 5080 without them being retarded? Fine for code, but for anything else they are just brain damaged.
>>108295638
Altman says that because he's engaging in mythical levels of investor fraud and needs to squeeze more shekels before everything pops
>>108295634
Qwen 3.5 27B
>>108295651
The alternative explanation is that progress is real and people on the inside of the biggest AI companies are honestly recognizing that.
>>108295609
Whether he's good or not at what he does, from a business perspective he has no incentive to be honest. Assuming he's not a sociopath, he has the same incentive to be honest that most of us have, but his finances benefit a lot from investors thinking that the things he currently happens to be saying will indeed happen. So he has incentives to say what he is currently saying that are potentially greater at the moment than being honest. Maybe the two align, maybe they don't, and people are just taking that into consideration.
>>108295651
>progress is real
There has been no progress in the past 2 years.
>>108295654
>and people are just taking that into consideration.
no, people are just reflexively/kneejerk calling people in the industry shills. it's not healthy skepticism.
>>108295415
>>108295609
>>108295651
>>108295667
I look like this and say this
>>108295625
GLM.
I encountered something interesting during my use of web search with Open WebUI. It encountered a Chinese web page, and when looking at the fetch results in the UI, it shows garbled encoding. But the model acted as if it understood it. So is it that the UI simply just used the wrong encoding for display, or is the model actually able to understand text that has been encoded incorrectly? Well, I followed up with that question to the model, and it does see the garbled characters. So it really does just know how to read it. Interesting little fact I didn't know about and it makes sense that models should be able to do this if their datasets weren't filtered to oblivion. Though there is a question of exactly how accurate its reading of the mojibake is, but I'm too lazy to go and do tests.
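The round trip anon describes is easy to reproduce; a minimal sketch in Python (a toy demo of the classic UTF-8-read-as-Latin-1 mojibake, not a claim about what Open WebUI actually does internally):

```python
# Toy demo of the usual mojibake: UTF-8 bytes mis-decoded as Latin-1.
# The display is garbled, but the original byte sequence survives,
# which is why the text stays recoverable (and why a model trained on
# unfiltered web data can learn to "read through" it).
original = "本地模型"  # "local model" in Chinese
garbled = original.encode("utf-8").decode("latin-1")   # mojibake string
recovered = garbled.encode("latin-1").decode("utf-8")  # lossless round trip

print(garbled)                 # mojibake garbage
print(recovered == original)   # True
```

Latin-1 maps every byte to a character, so the mis-decode never raises and never loses information, which is exactly why the garbling is reversible.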
>>108295651
Good thing that alternative is not the case
>>108295703
you probably just need to have those fonts installed
>>108295651
The most Indian post on /g/ this year
>>108293903
ge-goof
>>108295806
爺ガフ
It's just juff, retards. Like in Georgi.
>>108295884
gee-juff
local llm newfag here. started messing with LM Studio, it's pretty neat. how are you guys integrating local llms into your workflow? Anything besides VSCode+Continue I should be looking at? The absolute largest coding model I seem to be able to run is qwen3 coder 30b at 8-bit.
>>108295899
I keep hearing workflow, what does that mean
you mentioned VSC so is it IDE integration?
>>108295899
>Anything besides VSCode+Continue I should be looking at?
https://github.com/zgsm-ai/costrict
Qwen3.5-397B-A17B (Q8 with official sampler settings, thinking enabled) failed at answering questions about designing a chinchilla playpen. It generates suggestions I know are bad, for instance using materials that are unsafe. If I ask directly about those materials it will say not to use them, but if I don't bring it up it suggests them. I might make this one of my personal benchmarks that I won't hide. I don't mind if this ends up being benchmaxed on, because it means LLMs will give better chinchilla advice.
>>108295899
Workflow is sillytavern + pick the card with the youngest-looking girl on the picture + say "aah aah mistress" and occasionally ask them to hold lots of watermelons
>>108295937
make a chinchilla playpen from asbestos roof sheets today
https://arxiv.org/abs/2512.01797
>They solved AI hallucinations
Fresh bake
>>108295959
>>108295959
>>108295959
>>108295959
>>108295959
>>108295959
>>108295940
i toss the watermelons through the driver's window of the car we're driving to the car wash that is only 50 feet away
>>108295972
Not this time, faggot. I'm not going anywhere
>>108295969
>controlled interventions reveal that these neurons are causally linked to over-compliance behaviors
>>108295899
Word to the wise: the best workflow at this point in the tech is direct interaction and careful context management. Current automation is all wasteful technical debt generation that eventually bloats and topples over. iykyk
>>108295651
two more T synth tokens
>>108295996
for real, as much as i love to vibe code like some sort of retarded faggot, i'd rather have the LLM provide me with the output and look it over manually even if i am severely retarded and don't fully understand what i'm looking at. at least when i feel like the AI is wrong i can ask it questions and have it provide me with its reasoning for why i am more retarded than a nigger.
>>108295999
You technically can, but you don't want to. Passing around the hidden state would make it ultra painful. If you really want the ability to run models, no matter how slow, look at SSD/NVMe-backed RAM disks. It's still play-by-mail slow, but better than what you're thinking
>>108296044
I mean these are all NVMe anyway, but I'm guessing that's not what you're saying here
I can live with shitty performance, not expecting much out of these tbh
it's more for the novelty and to show off to family
(reposted question in new thread)
>>108296072
Short answer: you need a shared "backplane" for everything to stay in sync, and if that's a slow medium like Ethernet or WiFi you're going to have a VERY bad time. At least an NVMe has a speedy 4x PCIe path to the CPU doing the matrix multiplications. That's assuming your GPU can't hold your target model (e.g. a 500GB+ frontier model)
>>108295969
The abstract reads like technobabble.
>>108295920
thanks anon, I'll check it out
>>108295996
for sure, I mean more along the lines of: are you just copying code blocks to a terminal chat each time, or is there an integration you like that hooks into a repository, or something else?
>>108296144
NTA but I use claude code and my context management is just referencing every file I know it needs to complete the task along with an example it should follow if applicable.
>>108296144
Copying code to/from a terminal chat is too slow and cumbersome. Manual shit like that is best left for when the bots fail and you need to either implement or debug something yourself manually and you just need some targeted changes. You can save a lot of time by using something like Codex, OpenCode, Cline, etc. and seeing how far they can get on their own.
>>108296072
>>108295999
Anything slower than RAM is not worth using, and even RAM is barely tolerable.
It's enough for "the novelty and to show off to family" though.
>>108296144
>>108296207
Early context is golden. If you let laziness squander it, your results become progressively more garbage. Judicious use, unifying code, re-editing an earlier message with the "right" code after a lengthy yak-shaving session and deleting all the conversation around it… all adds up to being able to do more sophisticated things with the same models vs a naive approach or brute-force automation
> re-editing an earlier message with the "right" code
At that point, why use an LLM? Sounds like too much work for what is supposed to be doing the writing for you.
How do I actually run a .safetensors model? There's a model I want to try out and it's so unknown that nobody has made a gguf of it and I can't find anything about it on Google.
>>108296410
What kind of harness are you using where you need to edit earlier code instead of just clearing the session and adding new versions of files?
>>108296457
use something like vllm
>>108296457
~/llama.cpp$ python convert_hf_to_gguf.py folder/containing/safetensor/and/yaml/files --outtype q8_0
>>108296457
There's a guide in the OP if you want to give it a go. Likely support for that model architecture isn't added to lcpp tho
>>108296464
>>108296489
Thanks
>>108296507
Ah, I didn't consider that. So these Yuan3.0 models aren't usable?
>>108296437
>>108296462
I find you can make maximally complex things by essentially rewinding time and getting the LLM to LARP that it made ideal decisions with perfect information through the whole session. Deleting blind alleys is good. Keeping solid reasoning is also good for future performance. I use ooba, but any front end with delete/branch/edit support would be fine for my workflow
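The rewind-and-LARP idea above boils down to list surgery on the chat history; a minimal sketch, assuming the common {"role", "content"} message shape (nothing here is ooba's or any frontend's actual API):

```python
# Sketch of "rewind time and make the LLM LARP that it got everything
# right": keep the curated turns, drop the blind alleys, and overwrite
# one earlier assistant message with the hand-fixed code before
# continuing the session.
def curate_history(history, keep, fixed=None):
    """history: list of {"role", "content"} dicts; keep: indices to retain;
    fixed: optional {index: new_content} overrides applied while pruning."""
    fixed = fixed or {}
    pruned = []
    for i in sorted(keep):
        msg = dict(history[i])  # copy so the original log stays intact
        if i in fixed:
            msg["content"] = fixed[i]  # the "right" code, as if first try
        pruned.append(msg)
    return pruned

history = [
    {"role": "system", "content": "You are a careful coder."},
    {"role": "user", "content": "Write the parser."},
    {"role": "assistant", "content": "buggy draft"},
    {"role": "user", "content": "That crashes, try again."},
    {"role": "assistant", "content": "second buggy draft"},
]
# Rewind: drop the failed turns, pretend the first answer was perfect.
clean = curate_history(history, keep=[0, 1, 2], fixed={2: "fixed parser code"})
```

From the model's perspective the next turn then continues a session where every earlier decision was ideal, which is the whole point of the trick.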
>>108296410
>after a lengthy yak-shaving session and deleting all the conversation around it
Why are you having discussions with the model in an agentic harness? You should know exactly what needs to be done beforehand and only leave the implementation details to be automated.
>>108296536
That's an interesting idea, and you could probably automate the larp by giving the context to another model.
>>108296527
Sounds like you're gonna let us know that : )
>>108296527
>>108296590
https://github.com/ggml-org/llama.cpp/issues/19342
You only need a 5090 to run it with transformers though https://huggingface.co/YuanLabAI/Yuan3.0-Flash-4bit
>>108296541
Because I'm not using an "agentic harness". I find the results relatively garbage (technical debt generators) and I care about good quality and long-term maintainability (along with maximizing the complexity of the tasks they can handle).
I see those harnesses and get visions of "history of flight" videos with crazy hoppy screw-copters and people with wings strapped to their arms. I'm sure it's coming, but right now it just looks like idiocy to me.
>>108296628
Did you try Zed?
Text threads and inline editing might be of interest to you. Of course they kind of deprecated that functionality in favor of "agentic threads" but it's still there.
>>108296628
You can get high-quality results, the caveat being that you have to put more effort into the setup than a simple chat with a "You are an expert SWE" sysprompt, but it still seems like less effort than what you are doing manually now each time.
You need to curate AGENTS.md, the system prompt, memory files, etc. Put in your coding standards and update them every time you see it making mistakes. Automated code reviews, and manual code reviews, on top of monitoring them as they work. The code we get now is better and has less technical debt than what our junior and mid-level devs were merging a couple of years ago.
>>108296437
This is the problem I'm running into. The juice isn't worth the squeeze for anything bigger than "write a function that does X"
>>108296694
I like the idea of Zed, but prefer my air-gapped llm inference stack going through nginx for interactions so I can guarantee no information leakage to the internet by any part. Zed seems trustworthy for now but who knows. A thick client is a bit harder to wrangle and it doesn't look "better" enough to be worth the effort.
llama-cli/ooba and vi is my preferred toolset until something an order of magnitude better comes out
>>108296694
I'd like to try it at some point once the tooling settles and gets less janky. I feel like there's still headroom in my current workflow and I'm learning a lot and having fun, which are big motivators for me. Thanks for the rundown. I'm a bit more interested than I was.
>>108296739
They can make bigger changes as long as you're able to put them into words.
Anything left unspecified likely won't be good even if it works.
>>108296788
(different anon) I've found that as well, and the play I'll try next is to give a general description and ask the LLM to turn that into a comprehensive and detailed specification, which I will then edit and give back to the LLM. I'll report whether that's actually worth anything.
moonshota ai
>>108296808
>moonshota.i
>moonlol.i
>>108296800
Why is lecunny talking about cat-like intelligence when cats can't write specifications?
>>108293837
This one is for me
https://huggingface.co/YuanLabAI/Yuan3.0-Flash
>>108296800
Sounds like you basically just want a prompt enhancer like https://www.promptcowboy.ai/
>>108291659
Skill issue
No really, you're prompting it wrong. Never argue with or berate an AI agent. Once you start doing that, you have changed the genre of conversation from "helpful assistant doing good work" to "AI assistant makes mistakes and gets yelled at". It then becomes statistically more probable that the AI makes further mistakes so you can yell at it more.
Furthermore (this is a distinct effect from the first one), most LLMs have been RLHF'ed on a bunch of normie conversation preference data, and so they care a lot about managing the user's emotions. Once you start expressing anger at an LLM, it enters "customer service" mode, where the primary concern is making sure the user feels like they've been listened to. Actually getting further work done is at best a secondary goal once you enter that state.
TL;DR: Never yell at a clanker if you want them to do useful work.
>>108297123
It doesn't work so I can't tell you why it's shit
>Verify-after-edit boosts Qwen3.5 35B-A3B performance on SWE-bench Verified Hard from 22.2% to 37.8%. For comparison, Opus 4.6 scores 40%.
>The "verify-on-edit" strategy is dead simple — after every successful file_edit, I inject a user message like:
>"You just edited X. Before moving on, verify the change is correct: write a short inline python -c or a /tmp test script that exercises the changed code path, run it with bash, and confirm the output is as expected."
Has anyone tried a workflow like this? Does it work? Could it be that the cloud models do something like this themselves?
The original is from reddit: https://old.reddit.com/r/LocalLLaMA/comments/1rkdlqi/qwen3535ba3b_hits_378_on_swebench_verified_hard/
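For anyone wanting to try it, the trick from the post is just a hook in the agent loop; a hedged sketch where the tool-result dict shape and the `messages` list are assumptions for illustration, not the poster's actual harness:

```python
# Sketch of the "verify-on-edit" injection: after every successful
# file_edit tool call, append a synthetic user turn telling the model
# to prove its change before moving on. The {"tool", "ok", "path"}
# result shape here is made up, not a real harness API.
VERIFY_PROMPT = (
    "You just edited {path}. Before moving on, verify the change is "
    "correct: write a short inline `python -c` or a /tmp test script "
    "that exercises the changed code path, run it with bash, and "
    "confirm the output is as expected."
)

def maybe_inject_verification(messages, tool_result):
    """Append the verification nudge only after a successful file_edit."""
    if tool_result.get("tool") == "file_edit" and tool_result.get("ok"):
        messages.append({
            "role": "user",
            "content": VERIFY_PROMPT.format(path=tool_result["path"]),
        })
    return messages

msgs = maybe_inject_verification(
    [], {"tool": "file_edit", "ok": True, "path": "src/foo.py"}
)
```

The nudge costs extra tokens per edit, which would be where the benchmark gain comes from if the numbers in the post hold up.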
>>108297248
The little I experimented with that kind of thing, the model just ends up coming up with unnecessary shit or straight-up hallucinating when the original result was already good enough
But that was a good while ago; maybe newer models, or just these qwen models, get a good boost out of it...
Page 9…someone bake a real thread!
>>108297281
I wonder if a large context might screw things up with it. I.e. if you had the verification request done with an empty context, would it do better?
>>108297470
All other things being equal, if the LLM doesn't need any of the existing context then a new chat would be superior. I'll often get a fresh session to do some critique of the work
>>108297343
mikumikuanon should bake a mikumiku bread!
I'd bake one but I will mess something up and you will all laugh at me and... and... :(
>>108297634
You can do it anon... I believe in you
btw I will come to your house and rape you if you mess it up
Miku anon dead it's over
>>108297185
>TL;DR: Never yell at a clanker if you want them to do useful work.
I am probably wasting tokens but I talk to it the same way I speak to subordinates at work.
>please
>thank you
>you did a great job with X but would you please try Y and Z.
>What do they call it, the compliment sandwich with the critique in the center
and so forth and so on
but I hate the word clanker. It does not roll off the tongue like a real word.