/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>107604598 & >>107595736

►News
>(12/17) Introducing Meta Segment Anything Model Audio: https://ai.meta.com/samaudio
>(12/16) MiMo-V2-Flash 309B-A15B released: https://mimo.xiaomi.com/blog/mimo-v2-flash
>(12/16) GLM4V vision encoder support merged: https://github.com/ggml-org/llama.cpp/pull/18042
>(12/15) Chatterbox-Turbo 350M released: https://huggingface.co/ResembleAI/chatterbox-turbo
>(12/15) llama.cpp automation for memory allocation: https://github.com/ggml-org/llama.cpp/discussions/18049

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
►Recent Highlights from the Previous Thread: >>107604598

--MiMo-V2-Flash release and SWA vs GA debate in long-context model training:
>107610359 >107611212 >107611389 >107611538 >107611607 >107611641 >107611726 >107612033 >107612057 >107612103 >107612118 >107612128 >107612346 >107612451
--Evaluating modern language models for roleplay efficiency vs cost:
>107607970 >107608333 >107608570 >107608582 >107608588 >107608691 >107608373 >107608741 >107608449 >107608489 >107608674 >107608716 >107608757 >107608804 >107608943
--Skepticism about undertrained historical LLM project despite potential for time-gated models:
>107606603 >107606628 >107606672 >107606741 >107606844 >107606718 >107606784 >107606757 >107606770 >107606932
--MiraTTS model critique and audio quality debate:
>107607864 >107607903 >107608117 >107609014 >107609123 >107609241 >107609363
--Cost and performance considerations for local AI builds:
>107604956 >107604998 >107605235 >107605247
--Gemma's hiring for multimodal AI research sparks skepticism about job requirements:
>107613937 >107613974 >107614102 >107614027
--Google MedASR release and Llama 4 speculation:
>107605308 >107605884 >107606028
--LLMs as psychological tools: experiences, limitations, and philosophical debates:
>107607433 >107607442 >107613464 >107607458 >107607522 >107607532 >107607610 >107607651 >107607747 >107607825 >107607872 >107607896 >107607918 >107607942 >107607954 >107607972 >107607912 >107607580 >107607725 >107607757 >107607832 >107607734 >107607907 >107607945 >107608083 >107609148 >107613528 >107613599 >107613646 >107613702 >107613792 >107613896 >107613644 >107613717 >107613821 >107613858 >107613912 >107613990 >107614028 >107614195 >107614243 >107614338 >107614050 >107614222
--/lmg/ Book Club:
>107606330 >107606677
--Miku (free space):
>107604899 >107614611 >107614788

►Recent Highlight Posts from the Previous Thread: >>107604607

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
A lot of new TTS models were released in the past few weeks. Do any of them have a better quality-and-speed-to-system-requirements ratio than GPT-SoVITS yet, with voice cloning? Or a C++ implementation that doesn't need a venv?
>>107614872
Not really, sovits is still the best.
So let me get this straight. After a week of anons screenshotting some retard on twitter, google released nothing of note?
Thoughts on Huihui-Qwen3-VL-32B-Thinking-abliterated? Apparently it's the first 100% uncensored model on the UGI leaderboard. Will do literally anything you ask.
>>107614887
As in the last thread: just use 4.6 to destroy your identity, and then you won't care about new models not releasing.
>>107614887
MedASR and FunctionGemma sir?
>>107614872
chatterbox/vibevoice/sovits are really the end. I enjoyed echo-tts but that faggot won't release the cloning part.
>>107614896
>MedASR
English- and Chinese-only Whisper for medical texts. Oh yeah, that's exactly what we've been waiting for.
>>107614872
You don't need more than sovits.
>>107614887
Which retard?
>>107614882
>>107614903
>>107614930
Now if only someone made a llama.cpp/stable-diffusion.cpp-style single binary that doesn't require all the usual Python bloat.
>>107614887
BLOODY VISHNU BASTERDS NO REDEEM
>>107614892
The most embarrassing shit is seeing that leaderboard mentioned here. It is literally faker and gayer than math benchmarks, and the guy running it is a globohomo faggot.
>>107614972
I thought koboldcpp supports some of them.
>>107614974
>globohomo faggot
Ahhh, thank you anon. /lmg/ really sucked today and it needed that.
>>107614977
I think it uses tts.cpp, which unfortunately only works with a handful of models (Kokoro and a few others that I've never heard of and that don't seem worth it). No GPU or Vulkan support either.
>>107614972
Understandable. I'm running it in Docker.
>>107614887
It seems that anons forgot the golden rule: nothing ever happens.
>>107614999
>docker
Was it called Docker because of docking?
>>107615001
that's not true. bad things often happen
>>107614887
I have never run or cared about a Google LLM, why start now? It's obviously m-m-m-monster-cucked out the gate.
>>107615001
LLMs didn't exist 10 years ago.
>>107615068
Gemmas have the most world knowledge out of all similarly sized models. Her main disadvantage is that she can't say "cock" without help.
>>107615079
I'm sure you're always willing to help guide her knowledge along, ey?
>>107615079
Because they did not filter the pretraining data hard enough. Did you think they would make the same mistake again? Especially when the Gemma team was the one bragging that they were able to reduce the model's ability to reproduce the training data. Any Gemma 4 would have just been another gpt-oss.
>>107614903
>I enjoyed echo-tts but that faggot won't release the cloning part.
What are you talking about? echo-tts supports cloning out of the box. It's actually the best cloning TTS for English-only.
I'm just wondering, why are we waiting for Gemma 4? Does Gemma excel at anything in particular? The way I see it, most people here want local models mainly for RP, but Gemma is cucked, no? Or do you guys use local models for coding too? If yes, why?
>>107615290
Just general assistant shit desu, as a ramlet/vramlet.
>>107615290
I think the saars are so plentiful they can shill even here. I've never seen a good Google model log; they only release tiny shit for vrammies. /lmg/ membership card denied.
>>107615290
>local
>why
Because it's local.
>>107615172
>approx
generalization GOOD
>>107615418
Yeah, but when coding you'd want to use the best tool for the job. Why use a local 27B model, which will also slow your system down, when you can just use Gemini 3 Pro for free, unlimited?
>>107615377
Approximation is not generalization. God even made two different words to express the distinct concepts.
>>107615418
i only use things i can run myself. small code models are pretty capable but if you need something bigger, llama 3 70b and devstral 2 123b are quite good.
>>107615455
>he can't see the correlation
I bet you can't even visualize apples, retard. Go do more psychiatrist sessions with GLM, you fucking low-IQ human waste.
>>107615290
Gemma is a prude but she's good at RP and chat, and smarter than anything else around that size.
Mistral Small and Nemo are supposed to be good for ERP but they're too retarded for me. Nemo particularly seems braindead. I can't stand to use them to build up to anything.
>>107615478
devstral is fucking dogshit, the tool calling is fucking SHIT (tested fp8 through cline)
>>107615418
That's how you get yourself killed with a drone.
>>107615465
I guess it depends on what you are doing. I couldn't see myself using local LLMs for coding, as I would at least want them to incorporate Google searches for recent things. Like, working with the ESP-IDF is way better with Gemini 3 Pro since its API is constantly updating and deprecating older shit.
There's really no decent community for talking about LLMs on the internet, huh. This place is mainly about coomers, and leddit... leddit... I just saw this nugget:
https://old.reddit.com/r/LocalLLaMA/comments/1pragtf/open_source_llm_tooling_is_getting_eaten_by_big/
Pure LLM slop thread. Of 99 current comments, only two people even notice it's fucking slop. How retarded of an LLM user do you have to be that you spend hours with those tools and don't notice when someone gens this kind of slop:
> This isn't just my stack. The whole ecosystem is churning.
> That's when it clicked.
> We're not choosing tools anymore. We're being sorted into ecosystems.
I kinda wish someone with guts of steel made a classic vBulletin/phpBB webforum with heavy moderation or something. Instaban slop posters AND people who are too retarded to notice slop.
>>107615478
Someone said they got it working better by reducing the default rep pen (sketch below); otherwise you need to wait for a fix, either in llama.cpp or for Cline to add native tool support.
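For reference, a minimal sketch of that workaround, assuming a local llama-server on the default port 8080. The `repeat_penalty` field is from llama.cpp's /completion API; whether it actually fixes Devstral's tool calling is only the anecdote above:
```python
# Minimal sketch: disable the repetition penalty on a local llama.cpp server.
# Assumes llama-server is already running on the default port 8080.
import json
import urllib.request

payload = {
    "prompt": "You have a `read_file` tool. Call it on config.json.",
    "n_predict": 256,
    # 1.0 = penalty off. Rep pen punishes the repeated braces/keys that
    # structured tool-call JSON is full of, which can mangle the output.
    "repeat_penalty": 1.0,
}
req = urllib.request.Request(
    "http://127.0.0.1:8080/completion",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.loads(resp.read())["content"])
```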
>>107615270
When it was posted here, something was withheld over "safety". Did that change?
>Below are samples, and further below is a more detailed description of the model. We plan on releasing model weights/code, though we are not planning on releasing the speaker-reference transformer weights at this time due to safety concerns.
>>107615478
Never tried the tool calling, just coding. Devstral 2 suggested a few things that worked where Qwen 2.5/3 missed.
>>107615503
For my use I don't need internet search, but I'm pretty sure kobold added that a while back. Might exist in lcpp too.
>>107615504
>want to talk about models
>annoyed when people use models to help put their own thoughts into words
???
>>107615504
I love the dissonance some people have when they both get heavily into the hobby and hate it when they see AI output where they didn't expect it.
>>107615549
>their own thoughts
No, I assure you, no form of human thought was put into this logorrhea.
>>107615552
>I love the dissonance some people have when they both get heavily into the hobby and hate it when they see AI output where they didn't expect it.
Incredible. People expect human interaction to be with other humans, preferably not the lobotomized kind.
>>107615508
Forgot to mention I tested the CLOUD model, that's why I specifically said fp8, not Q8.
Anyway, I've been battling with a problem this past week and I've tried a combination of local and remote free models.
Local:
>gemma 27b
>qwen 3 next 80b
>qwen 3 code 30b
>gptoss 120b
Remote:
>grok-code-fast
>glm 4.6
>devstral 123b
>minimax m2
I gave them the same instructions, the same traces/observations, and even the same pointers as to where the problem was located/what to look for. ALL of them failed. Some of them had really bad tool support (glm 4.6 and devstral), and when they worked they couldn't find the solution. I iterated the errors/solution with them anywhere from 2 to 6 times, then I gave up. Literal days spent tard-wrangling LLMs.
Then I said FUCK IT, let's try Antigravity with gpt3 pro. The 1st iteration failed, but it got really close; on the 2nd iteration I gave it the small push it needed and BAM, bug fixed.
Problem is that I'm a capable programmer, but I got fixated on having the AI fucking solve the bug. If I had instead just decided to look at the source myself, I would've solved it in maybe 2~ hours (gemini took 30~ mins).
This was a really big reality check for me.
Yes, you can tell LLMs to code a lot of stuff for you, and I even spend time validating, but SADLY cloud models are at another level. SAD
>>107615571
I feel like if you unfried all the "that is the most important question" preference training and added long-term memory, LLMs (4.6 or better) would be much more interesting to talk to than 99% of people.
>>107615591
>Antigravity with gpt3 pro
??????????????????????????????????????????????????????????????
>>107615644
He means Gemini.
>>107615619
>I feel like if you unfried all the "that is the most important question" preference training
Another leaderboard would have to take the place of lmarena first.
>>107615524
https://voca.ro/1kdrd2885gib
>>107615591
I write VBA automation scripts at my work and I gave gpt4 a try for like 2 days. Then I realized it would be faster if I wrote it myself. And I didn't ask it for a complete program, just single functions, or rewriting what I already wrote with some explicit pointers I knew it would need. Can't imagine how vibe coders work if they don't know what the model should know before it starts writing.
>>107615770
A lot of vibe coders don't care; to them whatever the AI produces is a black box, and they just look at the result (not the code) and iterate.
Of course they end up with garbage code, but they don't really care.
>>107615770
You used a year-old model and complain about its quality, and then say you can't imagine how others work?
I'd ask if you're retarded, but geez.
>>107615591
>gpt3 pro
>>107615770
You're supposed to do it through an agentic framework like Claude Code or Codex or opencode.
I don't know if Excel has some way to run a macro from a file. It's not the best situation to apply vibecoding. Vibecoding works best with the sloppiest languages (Python, JS) and not that well with anything slightly obscure.
Also
>gpt4
Nigger, what?
>>107615797
This is the vibemax way, and it will be much of the code in your future. The hellscape where we can do anything but don't do anything well.
>>107615701
I got it from their blog: https://jordandarefsky.com/blog/2025/echo/
>>107615770
>>107615797
If LLMs could really boost productivity or code quality, it would be known by now. Years of this shit and even the people who make those models are incapable of showing the utility of those things for code, lmao.
Claude Code is filled with retarded bugs like this:
https://github.com/anthropics/claude-code/issues/6481
that are simply not getting any attention, because their crap was legacy code (legacy code being code no one on the team understands) from day one, and the way they built their TUI is almost unfixable.
Or how about this gold nugget:
https://github.com/openai/openai-python/issues/2472
That one is legendary for appearing on an OpenAI video stream for the GPT-5 launch, where they asked their model to vibe-fix the issue. Months later, nothing happened; the issue still exists, just crickets.
If OpenAI and Anthropic can't use their own models to produce better code than paying a team of cheap third-worlders, what's the use of AI? lmao
>>107615913
The issue is in the chair, not the LLM. If you can't get anything from current LLMs, you're beyond retarded. Even non-coders are able to get fully functional apps by now.
>>107615949
>If you can't get anything from current LLMs you're beyond retarded
So Anthropic and OpenAI employees are beyond retarded, huh.
>>107615905
I see. That blogpost is misleading. Even the block-wise diffusion weights are held back; the blogpost references them as:
>It is unlikely that we will include the block-wise fine-tuned weights in our initial release.
>>107615913
In reality, Claude Code feels polished. And so do their web apps. They have good design.
>>107615874
Thing is, it knew the functions, and the snippets I gave it basically handed it the correct syntax. And it had correct syntax, but it would just randomly decide to do something retarded that makes zero sense.
Hugging Face is aware that people don't care about their website, they just like the bandwidth, right?
>>107616009
What do you mean? They make money the same way GitHub does: free for individuals so it becomes an industry standard, while charging teams and corporations.
feet
>>107615913
OK, now show the alternative to Claude Code that is not vibecoded. Oh wait, it doesn't exist, or it has vastly fewer features, to the point that it's not even remotely worth using despite Claude Code having a few bugs here and there. But I respect that you've done your homework and brought real examples to the table.
Cross-platform TUI applications that have to support complex behavior like multi-line input and paste support (for example, distinguishing a newline from the keyboard from a newline that comes with pasted text) are a pain in the ass to build, and they are pretty much bound to have edge cases, because a lot of stuff is hard to do without heuristics (if the text comes in a burst then it's a paste, if it comes with a few milliseconds in between it's the user typing, etc.; see the sketch below). And that's without getting into incompatibilities between terminal apps. The only thing that probably comes close to that level of complexity might be emacs.
>>107616008
The GPT-4 family *sounds* good stylistically but is retarded (by modern standards) when it comes to intelligence. I was roleplaying a teenager in an orphanage with it and it completely changed the names of the residents from one scene to the next. OpenAI was never able to make their non-thinking models progress past 2024, and their thinking models haven't advanced noticeably since o1.
Both Gemini and Anthropic are kicking OpenAI's ass, as well as the Chinese distilling from those two.
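A minimal sketch of that timing heuristic, with illustrative thresholds (not taken from any real TUI's source):
```python
# Burst heuristic: if characters arrive faster than a human could type,
# treat the run as a paste rather than keystrokes.
import time

PASTE_GAP_S = 0.005  # gaps under 5 ms between chars suggest a paste

def classify(events: list[tuple[float, str]]) -> str:
    """events: (monotonic timestamp, char) pairs as read from the terminal."""
    if len(events) < 2:
        return "typed"
    gaps = [b[0] - a[0] for a, b in zip(events, events[1:])]
    # A real implementation would prefer bracketed paste mode
    # (pastes wrapped in ESC[200~ ... ESC[201~) when the terminal supports
    # it, and fall back to timing only when it doesn't.
    return "pasted" if max(gaps) < PASTE_GAP_S else "typed"

if __name__ == "__main__":
    now = time.monotonic()
    burst = [(now + i * 0.001, c) for i, c in enumerate("hello\nworld")]
    print(classify(burst))  # -> pasted
```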
MiMo seems alright. Parrots less than GLM but not as smart.
>>107616009
I jerk off to the huggingface-chan card at least once a month.
Post membership card local gens, on your hardware only.
>>107616098
LEANS IN VOICE DROPPING TO A WHISPER A SMIRK PLAYING ON HER LIPS WINKING FORGET YOUR OWN NAME
>>107616118
Nothing XTC and some token bans can't fix (sketch below). The bar on coherence is really fucking low this year.
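For anyone who wants to try that combo, a minimal sketch using llama.cpp's /completion sampler fields. The token IDs are made-up placeholders; resolve real ones for your model via the server's /tokenize endpoint:
```python
# Minimal sketch: XTC plus hard token bans on a local llama.cpp server
# (default port 8080 assumed; token IDs below are placeholders).
import json
import urllib.request

payload = {
    "prompt": "She leans in and",
    "n_predict": 200,
    "xtc_probability": 0.5,  # chance per step that the XTC sampler fires
    "xtc_threshold": 0.1,    # cull tokens above this prob, keeping the least likely survivor
    # false = hard-ban: these token IDs can never be sampled
    "logit_bias": [[12345, False], [67890, False]],
}
req = urllib.request.Request(
    "http://127.0.0.1:8080/completion",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.loads(resp.read())["content"])
```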
I would like to posit a groundbreaking theory: slop in cooming is only a problem if your penis hasn't been touched by a fuckhuge MoE. The second it gets touched by a fuckhuge MoE, slop concerns become secondary. The problem was always the model being retarded. The slop is just a red herring and the most obvious thing you can see in the output, but if the model knows what you want and writes unique stuff, slop becomes bearable, because slop has its source in real written smut. I would like to call this the law of "Real ERP has never been tried".
>>107616115
>>107615959
They hire literal high-school dropouts and jeets. Is that surprising? I aced my bachelor's thanks to fucking GPT-3.5. The assignment was to build a Java CLI program (a calendar with alarms and stuff) with a minimum of 1k LOC in 4 hours. Barely anyone even knew ChatGPT existed back then. And now retards like you are acting smug because you can't get a single method out of fucking SOTA models? Who are you kidding, dumbass? Just apply for disability.
>>107616115
I should've cropped it.
>>107616265
Welcome. What is being printed? A sign of the end, when we cannot understand the machinations of our own destruction.
>>107616330
Nothing, she just lives there.
>>107616299
What turbojeetoid university did you go to that required you to make a final project in Java with a minimum line count in 4 hours, wtf?
Does your AI nag you about your menial tasks too?
>>107616299
>now retards like you are acting smug
The irony.
>>107616348
Hope she is comfy. Don't leave your Miku in a hot car / oven / 3D printer.
>>107616360
A German university, and I won't argue about the pointlessness of the assignment. There is a correct way to prompt LLMs that most programmers can't wrap their heads around. Exhibit A: >>107616399
>>107616386
Honestly, I gotta try having an LLM nag me for procrastinating. Maybe that helps.
>>107616386
Well, it's mostly me venting. Still, that's a good idea.