/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>106491545 & >>106481874

►News
>(09/05) Klear-46B-A2.5B released: https://hf.co/collections/Kwai-Klear/klear10-68ba61398a0a4eb392ec6ab1
>(09/04) Kimi K2 update for agentic coding and 256K context: https://hf.co/moonshotai/Kimi-K2-Instruct-0905
>(09/04) Tencent's HunyuanWorld-Voyager for virtual world generation: https://hf.co/tencent/HunyuanWorld-Voyager
>(09/04) Google released a Gemma embedding model: https://hf.co/google/embeddinggemma-300m
>(09/04) Chatterbox added better multilingual support: https://hf.co/ResembleAI/chatterbox

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/leaderboard.html
Code Editing: https://aider.chat/docs/leaderboards
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
►Recent Highlights from the Previous Thread: >>106491545--Klear-46B model training methodology and benchmark performance analysis:>106492824 >106492846 >106492855 >106492872 >106492877 >106492882 >106492885 >106492903 >106493017 >106493058 >106493088--AI-generated loli podcast creation using VV voice cloning and GLM text generation:>106495961 >106495966 >106496018 >106496034 >106496055 >106496061 >106496121 >106496139 >106496144 >106496157 >106496189 >106496197 >106496208--German supercomputing expansion and copyright law compliance challenges:>106493305 >106493329 >106493355 >106493378 >106493423 >106493481 >106494001 >106493977 >106493529--Qwen Max model updates and community collaboration efforts:>106491646 >106492302 >106492366 >106492394 >106492411 >106492421 >106492430 >106492428--Balancing data quality and diversity in machine learning training:>106492910 >106492929--VibeVoice-Large's capabilities and controversy:>106494251 >106494424 >106494708 >106494778 >106495648 >106494801 >106494950 >106495166 >106495298 >106495187 >106495273 >106495566 >106495590 >106495612 >106495637 >106495639 >106495671 >106495689 >106495101--Challenges with managing R1's censorship and card-based context switching:>106493572 >106495514--Temperature settings tradeoff between tool call accuracy and answer quality in local LLMs:>106491720 >106491751 >106491761 >106491845 >106491888 >106491989--Gender bias in doctor riddle from Qwen3-Max-Preview:>106493573 >106494565 >106494593 >106496265--Qwen3-Max-Preview (Instruct) outperforms peers in benchmark tests:>106492622 >106492630 >106492638--VibeVoice model optimization challenges for single-voice applications:>106496609 >106496636 >106496646--Analyzing Qwen3 Max's distinctive generation style:>106493524--Miku (free space):>106493154 >106493503 >106493190 >106494251 >106497578►Recent Highlight Posts from the Previous Thread: >>106491549Why?: >>102478518Enable Links: https://rentry.org/lmg-recap-script
Mikulove
You have them, right?
>>106497597geez Peter, TWO mics?
>A loli whispers in my earTHANK YOU MICROSOFTAHAHAHAHA
I hate Microsoft.
Xi, please release the same or a better model.
Please, your parade was very impressive.
>>106497909the only good chinese model is wan
>>106497916
>the only good chinese model is wan
The only good local is wan? are you baiting me?
>>106497909
here's your chinese tts bro
https://www.youtube.com/watch?v=mnfLp9O96ak
>>106497927
model, deepseek / kimi are nowhere near claude / gpt / gemini
where do I get good voice clone clips
>>106497909
The model is from MSRA in Beijing with a full Chinese team.
It's by all means a Chinese model.
what are some absolutely necessary loli voices I should be cloning right now?
>>106498009Aya Hirano
holy shit there are no good online tools for making multiple cuts to an mp3 file lmfao
Where can I get the removed VibeVoice-Large?
>>106498156click it then press delete, then go to your recycle-bin and empty it
>>106497992youtube
>>106498148ask your favorite llm about ffmpeg
Looking for suggestions for an uncensored lite local model for using on my phone. Purely informational.
>>106498210this, gpt5's codex / claude code automate so much stuff for me, I just ask it to make some script to do something and it takes like a minute
>>106498210
nah, ffmpeg does a lot, but if you need to cut up an audio file a whole bunch of times to remove other characters' voices or sounds, you really need a GUI to plan the cuts instead of typing a billion things into a terminal. found some site called Soundtrap that can do what I want. if I get into this enough then I'll just download Audacity or something
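For the many-cuts case, one way to avoid typing every command by hand is to write the unwanted ranges down once and let a script work out what is left. This is only a minimal sketch: `keep_segments` and the example ranges are made up for illustration, times are assumed to be in seconds, and each kept range would still have to be handed to ffmpeg or an editor afterwards.

```python
def keep_segments(duration, cuts):
    """Return the (start, end) ranges that remain after removing `cuts`.

    duration: total clip length in seconds.
    cuts: iterable of (start, end) ranges to remove; they may overlap
          or be given out of order.
    """
    kept = []
    cursor = 0.0  # end of the last removed range seen so far
    for start, end in sorted(cuts):
        # Clamp each cut to the clip so out-of-range values are harmless.
        start = min(max(start, 0.0), duration)
        end = min(end, duration)
        if start > cursor:
            kept.append((cursor, start))
        cursor = max(cursor, end)
    if cursor < duration:
        kept.append((cursor, duration))
    return kept


# Hypothetical 60 second clip with three unwanted ranges.
print(keep_segments(60, [(10, 20), (15, 25), (50, 70)]))
```

Overlapping cuts are merged and anything past the end of the clip is ignored, so the list can be jotted down loosely while listening.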
>>106497992
just make your own in Audacity. Most good TTS models only want 20-30 seconds, so focus on quality above all. The audio should be clean with no noise. This is where most people fuck up, because the voice they are trying to clone doesn't have good audio sources (music, sound effects, static, background noises, traffic etc).
People do share models but you'll eternally be using taylor swift, peter griffin, etc.
>>106498228tell it to make you a tool for doing that. GPT5 can one shot it
>>106497992
>Pirate TV show/movie
>Extract audio with ffmpeg
>Trim down to bits you need
>?????
>Profit
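The extract-and-trim steps can be sketched as a small wrapper around the ffmpeg CLI. This is an illustration under assumptions, not a recommended pipeline: the file names and timestamps are placeholders, `ffmpeg_clip_cmd` and `cut_clip` are hypothetical helpers, and ffmpeg has to be on PATH for the non-dry-run path. It only uses the standard `-i`, `-ss`, `-to` and `-vn` options.

```python
import subprocess


def ffmpeg_clip_cmd(src, dst, start, end):
    """Build an ffmpeg command that writes the audio between start and end
    (in seconds) from src to dst, dropping any video stream."""
    return [
        "ffmpeg",
        "-i", str(src),     # input file (video or audio)
        "-ss", str(start),  # start of the wanted range
        "-to", str(end),    # end of the wanted range
        "-vn",              # no video in the output
        str(dst),           # output format is chosen from the extension
    ]


def cut_clip(src, dst, start, end, dry_run=True):
    """Return the command; only run ffmpeg when dry_run is False."""
    cmd = ffmpeg_clip_cmd(src, dst, start, end)
    if not dry_run:
        subprocess.run(cmd, check=True)
    return cmd


# Placeholder names: a 25.5 second reference clip from a longer episode.
print(" ".join(cut_clip("episode.mkv", "clip.wav", 754.0, 779.5)))
```

Writing to `.wav` re-encodes to uncompressed audio, which costs disk space but avoids stacking lossy encodes on a short reference clip.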
Is there a local version of nano banana anyone has made? the ones I've seen on Hugging Face went down quickly
>>106498265
nano banana is not a local model, it's a google model
>>106498240
yeah I figured
>>106498251
am I missing something? is ffmpeg easier to use than I thought for something like this?
>>106498148
>>106498210
>>106498226
kek, I remember some time ago when I was cutting up the audio for some other TTS. I was too lazy to install kdenlive or some other shit, so I asked deepseek for an ffmpeg command. idk what happened but it didn't work (think I installed it wrong or something), so I just asked it to make a PowerShell script instead, which just worked lol XD. you literally just put the mp3/mp4 in the folder, give it the second to cut from and the second to cut to, and it does it. fucking awesome how jank you can get with LLMs, it's really a lot of fun
>>106498271
Actually using Audacity would probably be easier. I'm so used to the CLI that I sometimes forget GUIs exist.
>>106498271
ffmpeg is for nerds who love command lines. If you want usable stuff, use Audacity, or maybe even DaVinci Resolve, which will do audio fine for free.
>>106498265the best image model out right now that can be run on your gaming PC is qwen image, you can run it if you have 16gb of vram.here is a guide from a man who is definitely not a pedohttps://www.youtube.com/watch?v=0yB_F-NIzkc&t=303s
>>106498269
thank you anon, that's disappointing. is there anything really comparable I can use locally?
>>106498319
sorry, didn't see this reply. lol this guy looks sus
just want a good model to edit wallpapers with
>>106498334qwen image / qwen edit?
Sonoma Sky/Dusk Alpha are likely the next LLaMA or a new Meta line of models (possibly proprietary)
>>106498390
lol no, it's grok, just ask it, and it's shit
new kimi is great btw, like actually better than sonnet imo
hmm, it gets pretty schizo at 1.3. tried 1.4 and have tried higher but I dunno.
https://vocaroo.com/1dlL1nEjQeny
said voice clone file
https://vocaroo.com/1orutFZaUpJb
>>106498428
10 steps are far too few, try like 50
>>106498412It's shit. I accidentally used v3.1 instead of the new kimi for one of my tests and I actually had a much better time with that before I noticed.
Can anyone give me an estimate of how many t/s I could get with pic related and a 5090? If 3090 + 96 GB + SSD can run R1 at 0.88 t/s, how much of an increase would it be with 512 GB of DDR5 over PCIe 5.0 x16 + 96 GB of DDR5 + 32 GB of VRAM?
I spent 3 hours looking at ComfyUI and all this crap because you told me it was easier on us vramlets, and I finally got the ComfyUI VibeVoice thingy running, but when I try to generate I get stuck on this:
>Downloading VibeVoice model: VibeVoice-Large...
>Fetching 17 files: 0%| | 0/17 [00:00<?, ?it/s]
An hour later it's still stuck there. I force-stop ComfyUI and restart it and it still gets stuck on that.
>>106498668
>ddr5
so the same speed as using it on regular ddr5?
>>106498668
It's going over PCIe 5.0 x16, so the theoretical maximum bandwidth over that connection is about 64GB/s in each direction (128GB/s only if you count both directions).
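To put a number on that, here is a rough back-of-the-envelope sketch. PCIe 5.0 runs at 32 GT/s per lane with 128b/130b encoding, so x16 works out to roughly 63 GB/s per direction; 128GB/s is the figure you get by adding both directions together. The token-rate bound below assumes every active weight crosses the link once per token, uses R1's 37B active parameters, and picks 4.5 bits per weight purely as an illustrative quant size. Real setups keep some weights in VRAM and lose time elsewhere, so treat it as a ceiling for the link alone, not a prediction.

```python
def pcie_gb_per_s(gt_per_s=32.0, lanes=16, encoding=128 / 130):
    """Usable one-direction bandwidth in GB/s (defaults: PCIe 5.0 x16)."""
    return gt_per_s * lanes * encoding / 8


def max_tokens_per_s(bandwidth_gb_s, active_params_b, bits_per_weight):
    """Upper bound on t/s if all active weights are streamed once per token.

    active_params_b: active parameters per token, in billions.
    bits_per_weight: average quantized size (an assumption, not a spec).
    """
    gb_per_token = active_params_b * bits_per_weight / 8
    return bandwidth_gb_s / gb_per_token


link = pcie_gb_per_s()
print(f"PCIe 5.0 x16: {link:.1f} GB/s per direction")
print(f"R1 ceiling over the link: {max_tokens_per_s(link, 37, 4.5):.1f} t/s")
```

Under those assumptions the link tops out at roughly 3 t/s for weights that live behind it.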
>>106498676
Download from modelscope into ComfyUI/models/tts/VibeVoice-Large.
Redpill me on nanobanana
>>106498704
>ComfyUI/models/tts/VibeVoice-Large.
*ComfyUI/models/tts/vibevoice/VibeVoice-Large
>>106498707SOTA but it's likely actually genie3 creating a virtual reality where the prompt is real and taking a virtual photo off that
>>106498319do you think it can be done with 12gb?
>>106498721That sounds incredibly convoluted for what's essentially Photoshop: Gemini Version but it's cool how it works
>>106498668
I dunno, R1 was kinda hard to run and I haven't tried it since jan cuz I hate its writing style.
I have a build with a 5090 and 2x 5060s for 64GB VRAM / 160GB DDR5 (4000MHz). On Linux that got me 5 tokens a second at 4k context on full GLM Q4 with some mmap and maybe 48GB of VRAM in use (hard to balance MoE layers, I suck). Presumably if I bought a proper 256GB DDR5 (6000MHz) kit I'd be getting more tokens per second, maybe 8 or so, even with 8k context I wanna say.
That's a 200GB model. 400GB DeepSeek is gonna cut that in half unless you invest in tons of VRAM.
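The "maybe 8 or so" guess lines up with simple bandwidth scaling. As a sketch only: this assumes the quoted "4000MHz" and "6000MHz" mean 4000 and 6000 MT/s, a dual-channel desktop board with 64-bit channels, and generation speed that is purely limited by system RAM bandwidth. None of that is confirmed by the post, and the function names are made up.

```python
def ddr_gb_per_s(mt_per_s, channels=2, bus_bytes=8):
    """Peak DDR bandwidth in GB/s: transfers/s x bytes per transfer x channels."""
    return mt_per_s * bus_bytes * channels / 1000


def scaled_tps(measured_tps, old_mt_per_s, new_mt_per_s):
    """Scale a measured token rate linearly with memory speed."""
    return measured_tps * new_mt_per_s / old_mt_per_s


print(ddr_gb_per_s(4000))         # peak at the current speed
print(ddr_gb_per_s(6000))         # peak with the faster kit
print(scaled_tps(5, 4000, 6000))  # 5 t/s scaled by the 1.5x speedup
```

Linear scaling gives 7.5 t/s from the measured 5 t/s. The part of the model that sits in VRAM does not get faster, so the real gain would likely be smaller.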
how much potable water is being drunk because of this?
how many forests are being burned because of this technology?
People accepted computers because their energy output is low.
Now that is gone.
>>106498787almost nothing, and water is not destroyed, that is not a thing, it just condenses back into water after being turned to steam
>>106498787if burning electricity for cars is considered green, then burning electricity for ai is even cleaner (and doesn't cause tonnes of rubber plastic contamination through tires)
>>106498787dying of thirstcomputers drank it allme go too far
>>106498787
>People accepted computers because their energy output is low.
No, they accepted them because the utility is high. Now it's even higher.
when are we going to get tts.cpp and vibevoice GOOFS
fuck this python nonsense, couldn't be bothered to set any of it up all over again for every new shitty web UI and whatever that gets released
>tfw mom is mad at you again for using up all the house water to talk to the ai
>>106498787leftist detected
>>106498787I set fire to the amazon (both the river and the rainforest) just to ahh ahh mistress, and I'd do it again.
>>106498787
You have identified the issue but not the cause. We have water shortages because people reproduce endlessly until we reach a breaking point. The main use of water is to grow FOOD.
Most electric plants and datacenters consume a lot of water, but that water is cycled through the plant and then returned to the environment shortly after, so their numbers look high on a graph but are effectively very low compared to other uses.
They do use a lot of power though. They need to chill out a bit on large training runs and pointlessly making tiny improvements.
>>106498428Where can I get the model?
>>106498787i’d burn the entire amazon if it meant i get to rp with my robot loli
>>106498959https://www.modelscope.cn/models/microsoft/VibeVoice-Large/files
>>106498704
>>106498676
ok, I got it from the torrent in the last thread, took a whole friggin hour to download it.
Honestly yeah, I see a pretty massive improvement: previously it took me 3 minutes to generate 15 seconds of speech with the Large model, now I generated 40 seconds of speech in 80 seconds.
MASSIVE improvement
Here's the reason why vibvoice large was taken down: https://voca.ro/1bCzVodtGtHZ(had to use a voice clone of porn moaning to get it reliably to do this. The base clip sounded this fake and inauthentic too so maybe someone can do better)
Fucked up how a picture is worth a thousand words yet LLMs are way more resource intensive than diffusion models
>>106499113I feel unsafe right now. Like, my whole life is in danger.
>>106499132Now I'm basically raping you : https://voca.ro/15bXrL5GeAS9
>>106498787Energy consumption is how civilisation advances
>>106499149Take care when using the EXCLAMATION MARK(!) IN VIBEVOICE! I find it hilarious when it instantly goes to 11 with mic clipping and distortion
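If the shouting is unwanted, one low-effort workaround is to tone the script down before it reaches the TTS. The snippet below is a guess at a pre-processing step, not anything VibeVoice itself provides: `calm_script` is a made-up helper, and whether all-caps words matter as much as exclamation marks is not established in the thread.

```python
import re


def calm_script(text):
    """Soften a TTS script: lowercase ALL-CAPS words and turn runs of '!'
    into a single full stop."""
    # Words made of two or more capital letters are treated as shouting.
    # This also lowercases acronyms, so check the result by hand.
    text = re.sub(r"\b[A-Z]{2,}\b", lambda m: m.group(0).lower(), text)
    # Any run of exclamation marks becomes one period.
    return re.sub(r"!+", ".", text)


print(calm_script("Don't you DARE!!"))
```

Single capital letters such as "I" and normally capitalized words are left alone.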
>>106499173Example: https://vocaroo.com/127ZooPcK7mj
Don't you dare!!
>>106499113can it do japanese?
>>106499220Yeah real. English sucks for sex. Though I guess it could be worse.
>>106497859It's the Chinese Family Guy knockoff.
>>106499189Kek you weren't kidding, that really went apeshit.Was that just an exclamation mark or was it allcaps too?
>>106499018
Some of the results I've been getting: I stole that degenerate's Gwen voice >>106498428 and just ran it through an AI voice cleaner, since all those cartoon sound effects and background noise ruin the sample,
and a Violet sample I had prepared before
https://vocaroo.com/1llO81h1n7kR
Also, you need to find more even-keeled samples; that sample will only produce a hopped-up, angry, yelling Gwen.
Also, from what I've seen the sample options mostly produce garbage, so turn them off. The steps seem to be fine at 30 at most; I didn't see massive improvement beyond that point, it only slows down with diminishing returns.
>>106499220No. I put in some jav moans as the model. Even I can tell it's bad. I generated this several times and I never got the same passion or breathiness, grunts etc that the English voice could. https://voca.ro/1aruRYcd92sp>>106499364It contextually just sorta figures it out. Theres no prompting or anything, but you can say "I'm gonna sing a sad song about etc" and it will try to do it kind of. Voice models seem to help push it in various directions too. I bet it could sing better if I just put in a song.
>>106499389Which one?
>>106499415>fakingu yes
>>106499415JeesasHow about Chinese?
>>106499425which one what?
>>106499415>https://voca.ro/1aruRYcd92sp lmao, that's funny though
>>106499442AI voice cleaner.
>>106492238
>>106497310
If you're using AMD, be aware that the default for --flash-attn is now "auto", which means it is enabled if the backend supports it.
On master the AMD FlashAttention performance can be quite bad though, so try "-fa off" and re-try after https://github.com/ggml-org/llama.cpp/pull/15769 has been merged (if you have an old AMD GPU).
>>106498834
bark.cpp https://github.com/PABannier/bark.cpp already exists, though the last commit was 10 months ago.
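For anyone scripting their launches, the FlashAttention advice above can be sketched as a launcher that makes the setting explicit instead of relying on the new default. The model path is a placeholder and `llama_server_cmd` is a hypothetical helper. The "auto" and "off" values come from the post; "on" is assumed here to be the third accepted value.

```python
import shlex


def llama_server_cmd(model_path, flash_attn="auto"):
    """Build a llama-server command line with an explicit --flash-attn value."""
    if flash_attn not in ("on", "off", "auto"):
        raise ValueError(f"unexpected --flash-attn value: {flash_attn!r}")
    return ["llama-server", "-m", model_path, "--flash-attn", flash_attn]


# Placeholder path; turn FlashAttention off to compare speed on an older AMD GPU.
print(shlex.join(llama_server_cmd("models/example.gguf", flash_attn="off")))
```

Running the same prompt once with "off" and once with "auto" is a quick way to see which is faster on a given card.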
>>106499448
cleanvoice, literally the first one I found while googling lol
You have to create an account and it has limited uses. You know what, we're in /lmg/:
someone please point me to the best local voice cleaner model
>>106497597
>https://www.theverge.com/anthropic/773087/anthropic-to-pay-1-5-billion-to-authors-in-landmark-ai-settlement
>Anthropic to pay $1.5 billion to authors in landmark AI settlement
>$3000 per book
Pack it up boys, it's over.
>>106499449
>bark.cpp https://github.com/PABannier/bark.cpp already exists though the last commit was 10 months ago.
That's specific to the bark model though, and I think VV will be more difficult to implement support for since it's actually a diffusion model + a Qwen LLM.
>>106499477seeing comments cheering it i think people deserve the humiliation ritual that is the modern world
>>106499477they should be releasing claude 1.2 instead
>>106499488
>seeing comments cheering it i think people deserve the humiliation ritual that is the modern world
this, why the fuck do they want to make their own jail, humanity was a mistake
>>106499477
hey, where's my 3,000 dollars? I've been typing bullshit onto the internet for years. When someone tells the AI not to act like an uninformed angry idiot, that's MY DATA they're using.
>>106499477looool
>>106499121
That's... the natural implication of that phrase
You need 1000x the resources to generate the words for 1 picture
>>106499477holy fuck dude, this is actually horrible, the US really wants to lose the AI race to the chinks or what?
>>106499497Humanity can accomplish amazing things, it's humans that are the problem. Once you realize that anything under 120 IQ can barely be considered sapient you'll know universal sufferage and internet access was a mistake.
>>106499477>1% of the company's worthoh no
>>106499497The market is a thing that allows me to buy things. But when it goes away i probably wont need it.
>>106499518
We have invested hundreds of billions in AI and hundreds of billions more in hardware to run it. We have invested 50 times more in AI than in nuclear fusion.
That settlement is a token gesture to say we did the right thing. And you are correct in thinking that if we actually acted with integrity and morality, other countries would surge ahead of us as we shot ourselves in the foot. If you think this tiny crumb is gonna slow us down you're kind of dumb. If anything it shows the worst that could happen and emboldens lawbreaking as a known expense. A slap on the wrist is the worst that can happen.
>>106499477That's Anthropic's problem. Should have given Orange Jew a few appeasement gifts.>>106499518They have multiple groups of jews infighting for money while chinks can act as one. They don't care so much about the long term, as long as it instantly profitable it's okay. The market will fix it. EU on the other hand put on safety IoT cock cage on and is begging to be dommed by both.
>>106499449>bark.cpp https://github.com/PABannier/bark.cpp already existsNice, thanks.I also saw that koboldcpp added support for something called TTS.cpp although in my (very limited) experience it's really slow on PC and the developer seems to be a macfag because that's the primary platform.
>>106499574
>If you think this tiny crumb is gonna slow us down you're kind of dumb.
I don't think you realize how serious this is. All emerging companies will need billions of dollars to obtain the data necessary to train their models. This will destroy everything; only large companies will be able to afford it. The US killed itself in that race, they didn't just shoot themselves in the foot this time.
>>106499521>Once you realize that anything under 120 IQ can barely be considered sapientfact, and I say this because have 121 IQ kek
Has anyone tried VV with some Japanese dlsite voice works? I'm curious how it would handle going from Japanese to English.
>>106499521Most humans that report 120+ IQ are benchmaxxed.
>>106499521>Once you realize that anything under 120 IQ can barely be considered sapient you'll know universal sufferage and internet access was a mistakeI felt that way after seeing the lack of gamers and reviewers mention how utterly broken the AI is in the new shinobi (where you can stand next to many enemies and not ever take a single bit of damage)people are worthless
>>106498702
4t/s for GLM-4.5-FP8...
>>106499521>Once you realize that anything under 120 IQ can barely be considered sapient you'll know universal sufferage and internet access was a mistake.and the average IQ will goes down and down due to the fact the africans are the only ones making a shit ton of babies, this world is fucked, I pity the future generation
>>106499415>https://voca.ro/1aruRYcd92spKek that's actually not bad though, it just seems to have lost the context of what it's doing.
>>106499735
I mean, on one hand it is a 5x increase, but on the other, 3000 USD + your own RDIMMs is get-another-GPU territory.
>>106499121LLMs need a much better world model than image models. If you fuck up just one word, it can completely break a sentence or turn it into nonsense, but nobody cares if some blurry background detail on an image is a bit deformed. Or even some foreground details in many cases.
>>106499702I guess this answers my question >>106499415
>Kimi turned out to be censored
>Deepseek is still autistic
>GLM wasn't much of anything
What's /g/ using for ERP these days, after the rose-colored glasses of that 'new model prose' have worn off?
>>106499780R1
>>106499780Literally nothing
So does vibevoice have stuff like [laugh] or it's words only?
>>106499389Ok enough fun with this thing for today, I'm really impressed by the inflections and effects that it gives the scripts, really surprising model all around https://vocaroo.com/19GSroXyQYlT
>>106499831
>really surprising model all around
that's why they wanted to shut the model down, it's too good for local
>>106499831
>and don't get me started
Did an LLM write this script?
>>106499863
no, but my imagination is pretty stunted anone-kun
Right now I'm just trying to come up with funny throwaway scripts to test this sheez
>>106499875This would be more of a storytime than an RP but here is some human slop I wrote for the Open Assistant dataset:>In the land of South Korea K-pop used to reign supreme. Anyone listening to other genres of music was ridiculed and even openly discriminated against. But no one had it as bad as the fans of Japanese idols. Gangs of K-pop mutant mecha warriors roamed the streets and when they found an idol fan they would be lucky to get away with just a beating. Their favorite thing to do with idol fans was to use their built-in speakers to blast K-pop at such volumes that it would rupture the idol fans' eardrums so they would never be able to hear the voice of their favorite idols again. Sometimes they would switch it up by spewing acid from their mutant bodies for the same effect.>A lone blacksmith knew that the K-pop warriors would be coming for him next. He had made a small figurine of a vaguely humanoid monster with sharp claws and pointed horns. With all of his heart he prayed to Hatsune Miku, begging her to bring his statue to life so that it may protect idol fans from their terrible persecution - and his prayer was answered. Hatsune Miku descended from the heavens and with her divine powers she brought the statue to life. She named the monster Pulgasari, the eater of iron and steel.>And Pulgasari did indeed become stronger and bigger as he consumed more and more metal. To grant him even bigger powers Hatsune Miku brought the radioactive remains of the Fukushima reactor core to Korea so that Pulgasari may feast on them. And as the radiation entered Pulgasari's body he began to mutate, growing stronger and stronger by the second. The blacksmith knew that with Pulgasari on their side the time for rebellion had come and so he rallied his fellow idol fans to finally rise up en masse.
>>106499189>you- FUCKING NIGGER!
>>106499113that's bredd good actually :-D
>>106499875>>106499907>It wasn't long until the K-pop warriors realized that something was wrong: a giant, radioactive monster was marching towards their headquarters and thousands of rebel idol fans were following it. Thanks to their mechanical bodies the K-pop warriors were able to quickly concentrate their forces and a battle of epic proportions ensued. The K-pop warriors reasoned that they would only need to take down Pulgasari and their victory would be assured, but their strategy ended up backfiring. With each felled mecha warrior that Pulgasari consumed his wounds wound close and he emerged even stronger than he had been before. Eventually the K-pop warriors realized their mistake but it was too late; Pulgasari had killed too many of them and they were being overrun.>The battle ended with a crushing defeat for the K-pop warriors and their headquarters was occupied by the idol fans. But Pulgasari's hunger for metal did not stop. He continued to feast on the corpses of the defeated mecha warriors and then went on eat any kind of metal he could find. Hatsune Miku, realizing that Pulgasari's hunger would never be satisfied, quickly hid herself in a bell just as Pulgasari was eating it. Pulgasari devoured her and as he realized what he had done he turned to stone while letting out a heart-wrenching scream. Touched by Hatsune Miku's heroic sacrifice the fans of different music genres established an uneasy peace. Whether this peace would last only time would tell but if the menace of K-pop mutant mecha warriors were to ever rear its ugly head again, then Pulgasari will be back to stop them.
>>106499149holy shit man she needs to calm down
>>106499780giantess woman.her ass is your new home.
>>106496501>>106496504Thank you, absolute legends. Now it not only doesn't OOM, but works faster in some scenarios where it wasn't OOMing.
>>106499839Is it still available somewhere?
>>106499967no
>>106499967yes
>>106499967maybe
>>106499967https://www.modelscope.cn/organization/microsoft
>>106499907
>>106499916
damn anon, that's a lot of shit, it took 12 whole minutes to generate. be sure to listen to the end :)
https://vocaroo.com/12ef4CDQg9pZ
I made clones of Sarah and Ellie from TLOU and Violet from The Incredibles. I especially like how you can hear paper shuffling at some points, and Sarah flubs a line once
>>106499916
>>106500081
I just noticed it was your script that flubbed the line, but it generated as a genuine mistake of someone reading too fast, incredible
>>106500081
Cool, thank you.
The intonation is still off for e.g. "Hatsune Miku", or more generally for emphasizing the intended emotions of the story, but for something that is machine-generated this is very impressive.
If someone were to leave me a voicemail using this I don't think I could reliably tell that it's not a human.
>>106500089
Yeah, I wrote this at like 1 am.
Now that he wave reached the plateau of XXXB/30~50A MoE models, how are we going to run the next upcoming MoE 70b~100b active parameters SOTA? Even CPUMAXXing and Macs start being slow as shit at those active parameter sizes.
>>106500288the trend is towards lower active param count, not higher
>>106500288>Now that he wave reached the plateau of XXXB/30~50A MoE modelsWe still haven't hit that, biggest niggers on the block are <40BA, and trending downward.
>>106499814
words only. You can type haha and it will kind of do an actual laugh, but I couldn't get it to do more than that. Maybe if you put laughing in the voice clone... I didn't try
https://voca.ro/1iU4VFpN4gXK
>>106500404What voice did you sample to get this croaking harlot?
>>106500288
- better quants
- different experts quanted differently
- wait for amd's giant multi-channel apus
I'm thinking of getting a 5060 16gb later this year, but I'm worried about a price hike. I'm using a dinosaur 2060.It looks like a good, enduring buy. Even that nip blog says it's a very good entry-level card.It's a shame AAA gaming is so shitty these days that the only thing you'd want a good card for is 'playing' with AI.