/lmg/ - a general dedicated to the discussion and development of local language models.
Previous threads: >>107957082 & >>107948284
►News
>(01/25) Merged kv-cache : support V-less cache #19067: https://github.com/ggml-org/llama.cpp/pull/19067
>(01/22) Qwen3-TTS (0.6B & 1.8B) with voice design, cloning, and generation: https://qwen.ai/blog?id=qwen3tts-0115
>(01/21) Chroma-4B released: https://hf.co/FlashLabs/Chroma-4B
>(01/21) VibeVoice-ASR 9B released: https://hf.co/microsoft/VibeVoice-ASR
>(01/21) Step3-VL-10B with Parallel Coordinated Reasoning: https://hf.co/stepfun-ai/Step3-VL-10B
>(01/19) GLM-4.7-Flash 30B-A3B released: https://hf.co/zai-org/GLM-4.7-Flash
►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png
►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide
►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers
►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference
►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling
►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
►Recent Highlights from the Previous Thread: >>107957082
--Paper: Extending the Context of Pretrained LLMs by Dropping Their Positional Embeddings:
>107960244 >107960358 >107960418 >107961005 >107960382 >107961422
--Multi-GPU setup strategies for cost-effective inference:
>107958416 >107958478 >107958532 >107958598 >107958970 >107958540 >107958611 >107958531 >107958632
--Qwen3 TTS installation challenges on Windows with Nvidia GPUs:
>107958660 >107958671 >107958685 >107958709 >107958783 >107958719 >107958782 >107964469 >107958753 >107958714 >107962549
--qwen-tts performance and compatibility issues in TTS applications:
>107958000 >107958013 >107958047 >107958501
--LLM struggle with deviating from genre tropes in constrained narratives:
>107959380 >107959410 >107959431 >107959440 >107959458
--Exploring AI interaction in Among Us-style games and survival simulations:
>107959425 >107959464 >107959483 >107959505 >107961126
--Challenges with book-based QA and context limitations:
>107964051 >107964287 >107964322 >107964354
--Optimizing llama.cpp for fast, low-VRAM 1-shot question answering:
>107963343 >107963394 >107963472 >107963529 >107963577 >107963655
--Speculation on Minimax-M2-HER and Mistral Small Creative model releases:
>107957396 >107957481 >107957543 >107957650 >107957598 >107957634
--MiniMax M2-her roleplay limitations:
>107962436 >107962501 >107962512 >107962654 >107962666
--llama.cpp PR reducing DeepSeek memory usage:
>107963328 >107963386
--Critique of TranslateGemma, recommendation of heretic Gemma 3 for uncensored JPEN translation:
>107961940 >107962800
--Vibevoice emotion tag functionality:
>107960489 >107960506
--LLM formatting and model preference debates:
>107966244 >107966357 >107966388 >107966534 >107966600
--Qwen voice cloning stability and context length issues:
>107961962 >107962660
--Miku (free space):
►Recent Highlight Posts from the Previous Thread: >>107957086
Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>107968112No Miku, I'm not letting you into the server room.
Considering engram is going to be the next big thing, let's talk about it and its consequences for local.
I'm trying to get a feel for how these future models will work on local machines but I'm a bit of a brainlet so I would appreciate some input from my betters. If I understand it correctly, around 20-25% of the model's parameter budget gets allocated to the engram memory module which is stored in RAM/NVMe, and performance scales linearly with the size of said memory module. Obviously the computing requirements for running the model go down, but what does this mean for RAM/NVMe? Does this mean we'll be running huge models that sit in NVMe storage? Should I be buying as many NVMe drives as possible? Another thing to consider is throughput. The paper claims there's only a 3% hit to throughput when using the hybrid engram architecture, but is that the case only for RAM, or for NVMe storage as well?
>>107968191
>Should I be buying as many NVMe drives as possible?
I have no idea what you're talking about but it's already too late, prices are skyrocketing. I got a 1TB nvme drive for like 80 bucks 3 years ago and now they're like 250 dollars
>>107968191I doubt computing requirements would go down. It seems deepseek wants to add engram params on top of what they can run right now. So, deepseek v3 is 650B, then deepseek v4 will have the same 650B + 350B of engram params.
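Back-of-the-envelope math on those numbers, assuming ~4-bit weights (~0.5 bytes per param) and taking the split at face value (pure speculation until v4 actually exists):
650B transformer/MoE part -> ~325 GB, roughly the same VRAM+RAM footprint as v3 today
350B engram part -> ~175 GB of embedding tables that only get looked up, so they could sit in RAM or on NVMe
So the compute side stays about where it is; it's the slow-storage requirement that grows.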
i'm looking for the smallest non-potato gguf of qwen3 tts possible
>>107968303It's called pocket tts.
>>107968316i already have that set up, i want slightly less potato
>>107968321
Chatterbox turbo - 350M params.
Seriously, nobody has vibecoded gguf inference for qwen-tts yet. Be the first. llama.cpp already supports qwen3 architecture. You just need to implement MTP module and tokenizer.
Will posting furry stories here trigger a panic attack in a certain Russian shitposter?
>>107968171Miku was just a distraction! Rin is already in the server room!
>>107968266
2tb is 250$ right now at the retailers that I visit. The point isn't dooming about current prices, the point is about determining future prices in a world where engram is the new paradigm.
>>107968288
Yeah, I fully expect the SOTA labs to try to max out model size on GPU, but at minimum it means we get better and smaller models. I'm really interested in the linear scaling performance and what it means for RAM/NVMe.
>>107968424>but at minimum it means we get better and smaller modelsdid you not read a word he said?
>>107968421[impregnation][impregnation][impregnation]
How does it feel that gpt 3 is still better than your local models?
>>107968471it sends shivers down my spine
>>107968454She's smol and for cuddles you perv.
>>107968471How can a model with 2k context be better than my local models with 16k usable context?
>>107968490we'll cuddle during aftercare
>>107968492>16k usable contextshow nolima
>>107968471LMFAO fuck no
>>107968471That was you trying to make a joke?
>>107968431Yes, I perfectly understood what he said. 25% of sparse parameters can be offloaded to embedding tables in RAM while getting better results on benchmarks. This means smarter models you can run with less VRAM. This is the ground floor for engram. That doesn't mean labs won't push out engram models with larger sparse parameters to try to push benchmarks. I fully expect the same thing we have now, which is model size diversity.
>>107968471>GPT-3>How many R's are in Strayberry? >"You mean strawberry? There's 3 R's in strawberry. If you meant Strayberry there's 2 R's in Strayberry. Hope this helps!" (<hardcoded retard)>There's 3 R's is Strayberry though.>"Oh you're right! What a fascinating discovery! I'll seem to have made a mistake! There's indeed 3 R's!" >meanwhile local model>How many R's are in Strayberry?>"lol you're trying to trick me? I bet you can't even country to 3 you doof"
>>107968471rose-tinted glasses
What the fuck does it even mean for a breath to catch?
CUDA dev you broke GLM 4.7 (not flash) with https://github.com/ggml-org/llama.cpp/pull/19092
I didn't test other models.
Left is 0440bfd, right is 0c21677.
>>107968564Do you have PPL and KLD proof of these claims? Otherwise this is just FUD that can safely be ignored as a meme
>>107968541Show logs from GPT-3.
>>107968573The temperature is set to 0 and the prompt is the same so KLD is very obviously different.
>>107968588Can't trust you without hard numbers chief, nice try trying to waste his majesty's time!
>>107968115Thank you Recap Miku
>>107968548ESL? It's when breathing stops for a second, often with a sharp intake of breath beforehand. Similar to a gasp, like when someone is surprised or frightened. Not to be confused with catching your breath, which means a different thing
>>107968564 >>107968588
It's expected that results are not bit-for-bit identical.
Since language models are autoregressive, once a single token is sampled differently the entire sequence diverges.
This happens in particular at the beginning of sentences where the token distribution is very flat and small changes can amplify.
A low or zero temperature suppresses this to some extent but not completely.
I'll double check the architecture of this particular model but I don't see this as evidence that either build is better or worse on average.
Has anyone tried to abliterate a model with heretic? How long does it take until the number of refusals starts going down? Should I be concerned if it doesn't for a while even when setting the random exploration phase to 1?
>>107968640
It's fairly obvious that the example on the right doesn't follow GLM's usual thinking structure at all and the output is completely schizo. It claims that the lines of the poem are jumbled up.
It gets worse at higher context. I first noticed it in claude code at ~20k context because the model wouldn't output anything coherent at all and just spammed the same nonsense token.
>>107968664Anecdotal.
Oh, and I also set the minimum considered KL divergence to 0.1. But it never reaches that.Is there a setting to make it more aggressive if I care about the uncensored part more than about the correctness part?Maybe it doesn't work because it's a MoE?
>>107968664kek, flash attention was broken on llama.cpp for a fucking year and those clowns said all the same stuff defending it. "The perplexity score on 16 tokens of wikitext is nearly the same so our implementation isn't broken."
FUCKING AMD
all this time I was fighting shitty ROCm only to find out Ollama uses fucking vulkan natively and I don't even need shitty ROCm.
Mother fuck
>>107968711Any concrete proof of this?
>Qwen3-TTSfinally a better model than openaudio s1 mini for mixed language generationquality of cloning is overall better and also more stable
>>107968722>ollama>retardedchecks out
Why is Claude like this?;
>>107968628ESL?
>>107968722is ROCm good for anything at all?
>>107968755ESL?
>>107968729yes look at the commit history you potato
>>107968640
Here's an example with an even longer prompt: https://rentry.org/xwu5muxu
Before the commit on the top and after the commit on the bottom.
>>107968757Apparently it's meant for researchers mostly but it seems to be required or needed when it comes to AI video generation
>>107968664>>107968779In this particular case I can already reproduce the issue, it has to do with one of the specific code paths on Turing/Ampere.
>>107968779What makes you think that thinking in English is a metric?
>>107968793Based.
>>107968779It's expected that results are not bit-for-bit identical. I don't see this as evidence that either build is better or worse on average.
teto > miku
>>107968760Go back to /ldg/, please.
>>107968754
that's all AI
>>107965306
I use lmstudio because the python lms interface is great and vLLM doesn't run on my PC for some reason.
>>107968754>The Loss Function (The "Teacher's Red Pen")This slop pattern is more infuriating than "not x but y".
>>107968779It's expected that results are dogshit and definitely worse than before. I don't see this as evidence that either build is better or worse on average.
>>107968853nu-llms love giving names for everything, capitalizing them, and making everything sound profound
>>107968862>nu-llms love giving names for everything, capitalizing them, and making everything sound profoundDamn are they trained on cultivation novel slop?
>>107968870I wish, it's impossible for them to comprehend the dao and see mt. tai
>>107968820>>107968826>>107968858
>>107968738kinda sucks that you can't clone a voice and style it with different emotions
>>107968793I forgive you for not reading the wall of text in the first post. I should have immediately pointed out why the second output is incoherent.
>>107968877A river flows 30 years west then 30 years east, maybe they can write good slop scripture soon
>>107968492
4.6 starts melting at 20k. And i think 4.7 can go beyond that. I tried 30k and it was good with some minor problems it has even at lower ctx sometimes.
>>107968877A frog in a well
>>107968471
3.5 can't cause ego death
>>107968900>I forgive youHuh? You should feel blessed he's deigning to acknowledge your presence.
>>107968906it's so simple too, if llmisms were xianxiaisms I'd read them for 1000 chapters without complaint
>>107968870Courting ego death
one of the best threads in ages (non ironic)
>>107968548Means the same thing as when your refrigerator is running.
>>107968936kek
>>107968928>it's so simple too,It really is i bet i could make a lorebook on the tropes in no time. Problem is still chapters and names. I doubt you could even write 50 chapters without hitting big problems.
>>107968853>This slop pattern is more infuriating than "not x but y".Which Claude? I haven't seen this slop pattern yet.Opus-4.1 likes to call me autistic or say "HOLY FUCK"
>>107968967I just want to court death and seduce fox demon jade beauties while slapping the faces of young masters in the demonic sect
>>107968471gpt-3 was great. good times.
>>107968471As much as I enjoyed GPT3-davinci (not necessarily 3.5), GPT4 (0314) and GPT4 (0613) both did things that local models to this day don't handle well.They're next to Claude 3 Opus in terms of models that likely will never be reached in terms of soul
instead of hoping the LLM picks up the extremely obvious subtext while wasting 2k tokens "thinking", it's actually much simpler to just telling what to think with ooc
>>107969015Disable thinking and it will pick up just fine.
>>107968993kek
>>107969020no, without thinking it devolves into tropesthinking is the only way to prevent it from trying to stop me every time I go to kill a main character, otherwise it gives them plot armor
>>107969005>in terms of soulshut the fuck up you drooling retard
>>107968988it knows youre a retard, it just doesn't care
>>107968993I love it, how can we emulate this behavior on our superior local models?
>>107969005now it's kind of funny to think that this huge ass 1.8 trilly moe had only 8k of usable context at the start
My AI called me a retard and ended the session and told me to never speak to it again.wut do?
>>107969065Block her and hit the gym.
how isn't there a regex for the double adjectives yet
Looking for a realtime-ish local TTS that sounds good and supports either good voice cloning or finetuning to a particular speaker. Bonus points for an implementation in a real language, not Python.
>VibeVoice-Realtime-0.5B
Deliberately no voice cloning nor finetuning support, so it's useless. Would need to be reverse-engineered.
>VibeVoice-1.5B
Voice cloning adherence is OK. Not very natural cadence or emphasis etc. Is it worth finetuning? VibeVoice generally has a (probably vibecoded) Rust implementation that seems to work (unsure about its perf):
https://github.com/danielclough/vibevoice-rs
>Kokoro
Good quality for its size, but doesn't support voice cloning or finetuning.
>Pocket TTS
Voice cloning adherence is very poor. Would need finetuning, but AFAICT nobody's done it yet, perhaps because it ostensibly supports cloning. Supports streaming. May be the best option given finetuning support.
>FishAudio-S1-mini
Even the official samples sound pretty shit, like a schoolchild reading a book aloud. And the only web demos I saw were walled behind an account.
>Qwen3-TTS
Voice cloning adherence is bad. Does support finetuning; I think an anon ITT had a bad experience with that.
>Echo-TTS
Great quality and voice cloning adherence; best I've heard in both respects. Sort-of supports streaming. Couldn't run it locally due to a bug in a dependency (which wouldn't be hard to swap to be fair). Unfortunately somewhat obscure and apparently a dead project.
>IndexTTS2
Decent voice cloning adherence, good quality, sounds pretty natural. No official finetuning support. Best overall option I've seen. Has an extremely vibecoded Rust implementation which I haven't tried:
https://github.com/8b-is/IndexTTS-Rust
https://huggingface.co/ThreadAbort/IndexTTS-Rust
>>107969100>Bonus points for an implementation in a real language, not Python.Stopped reading there, keep whining
>>107969100>Bonus points for an implementation in a real language, not Python.Continued reading there. Good complaint.
>>107969100i been looking for the same thing, its like a fucking unicorn
>>107969100>Bonus points for an implementation in a real language, not Python.Python is the white man's language
>>107969100>Bonus points for an implementation in a real language, not Python.Stopped reading there. Good complaint but good luck finding anything.
>>107969005>GPT4 (0314) and GPT4 (0613) both did things that local models to this day don't handle well.Like what? I never used em
>>107969162they were the LLMs of choice for trannies to workout their transitioning strategies.
>>107969183Kinda true. When I was prepping for my transition I used GPT4 while running Arch Linux. It was amazing at helping me figure out when to complete steps and helped me accept who I really am.
>>107969214Still one step short
>>107969100
Supertonic. It sounds more natural than most < 100M models. Doesn't need a phonemizer (no espeak dep), doesn't need complex tokenizers (just straight utf8->tokenid mappings). Understands numbers, $ and a few other things without having to spell them out (but you can still do it if necessary). ONNX models that run just fine on my C thing. They have example code on how to run it for about 10 languages (including C++). It's fast. Doesn't have voice clone. Voice files can be modified like kokoro's or kittentts', so the kvoicewalk method to find voices would work on it just fine. If nothing else, being one of the few (only?) models that can do synth voices without a phonemizer/tokenizer is a huge thing. V1 is more stable than V2. It misses words less often.
Soprano-80M. Single voice, which i think sounds pretty natural. No voice cloning or even changing the default one, as far as i know. LLM based, complex tokenizer. Being able to stream helps mask the speed (which is not bad, really. Just slower than supertonic).
There's luxtts, which was released no more than 2 days ago, but it's a mess to set up. It needs like 5 (small) models, for two he provides the onnx ones. Then there's the vocoder in safetensors and bits of zipvoice that you need to pull from some other repo.
>>107968757would be if it was adopted more
>set up opencode+llama.cpp a few days ago
>had an issue with GLM-4.7-Flash and other instruct models concerning tool calling templates
>think "Maybe should recompile with newer version?"
>compile newer version
>https://github.com/ggml-org/llama.cpp/issues/19096
I think I will try the b7811 release and just HOPE that it works with GLM-4.7-Flash since that is the only model so far that works on my unholy 12GB VRAM setup. It's slow, but offloaded into memory it worked. Until it broke. Hope they fix this + the tool-calling issue, then it would be great!
>>107969100Gpt-sovits is great at cloning and there's rust implementation
>>107969065Sounds just like my ex...
>>107969771I've used qwen 30B with tool calling and it worked just fine in the past.Might want to try that.
>>107969771
Update: b7811 works, the flash attention nkvo bug is in a later release.
b7811 suffers from:
>Template supports tool calls but does not natively describe tools. The fallback behaviour used may produce bad results, inspect prompt w/ --verbose & consider overriding the template.
But it works fine with GLM-4.7-Flash (most of the time).
>>107969843
Yeah, might. But GLM-4.7-Flash is the first one that works with opencode on my machine. Since I'm GPU-poor with only 12GB VRAM I have a virtual machine with GPU passthrough to the 3060 and offload it into 48 GB RAM. It's simply the first model to actually produce viable results, even if it takes forever. Been trying around with RNJ-1-Instruct which also kind of worked (and fast) but tool calling is a bit crap in the llama.cpp/opencode stack?
Also Apriel and the Minstral models are nice, but I guess those make more sense in a LMStudio/Desktop app for quickly checking stuff out...
>>107968191Deepseek paper's Table 4 benchmarks only host DRAM offloading, not NVMe
>>107969878
>I have a virtual machine with GPU passthrough to the 3060 and offload it into 48 GB RAM.
Oof. Considering that for models like GLM 4.7 Flash (a MoE that mostly runs on the CPU), memory bandwidth is the main defining spec for generation speed, assuming you aren't using a really small quant to fit the majority of the model in VRAM, I can't imagine running under the overhead of a VM's memory management will yield the best results.
Is there a reason you are running the model inside a VM?
You can launch llama.cpp on the host OS and access the API from inside the VM if necessary.
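Something like this on the host side is enough; model path, quant and ports are just examples:
llama-server -m /models/GLM-4.7-Flash-Q4_K_M.gguf -c 32768 --host 0.0.0.0 --port 8080
and then from inside the VM:
curl http://<host-lan-ip>:8080/v1/chat/completions -H "Content-Type: application/json" -d '{"messages":[{"role":"user","content":"hello"}]}'
llama-server speaks an OpenAI-compatible API, so opencode can be pointed straight at that URL.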
>>107968993my job as a piss drinker is in jeopardy
>>107969900All testing was on H800 but concievably a 4090 with 64GB RAM could run Engram 27-B
Anyone tried HeartMula?
>>107969911
>You can launch llama.cpp in the host OS and access the API from inside the VM if necessary
The initial reason was that I'm playing around with AI agents and didn't want to give them access to my host OS. There will be some configuration changes (networking to the host etc) so I can access the API from the guest OS, but otherwise it should actually be better...yeah. I'll do that tomorrow. It'll make things a lot easier. Having llama.cpp and opencode and all in the VM just was much easier to configure until I got a somewhat working setup, more of a PoC really.
>>107969100>Qwen3TTS badHahahaha is .the best local tts model for voice clonig but you are retarted and not procecing the text of the example before clone the voice. Skill issue
>>107969974Also I was hoping to use models which fit into full VRAM, but 3-8b models are about as smart as the average 4chan poster and give the same behavior and output.
>>107969100
>Couldn't run it locally due to a bug in a dependency (which wouldn't be hard to swap to be fair).
works fine for me locally. ask llm to teach you conda or uv.
post a sample of the voice you want to clone
what language do you need?
do you need laughs and moans?
is 16khz ok?
>>107969974>>107969991Yeah, I figured it would be something like that.Well, good luck.In my experience, Qwen 30B is actually pretty damn good for the specs you can run it on at decent speeds.For anything but creative stuff, that is, but still.Maybe GLM will be better once all the kinks are hashed out, but Qwen right now just works.
>>107969900I've run into some discussion claiming that you'll be able to offload a majority of the engram parameters to NVMe storage, but I can't find anything about it let alone throughput benchmarks. Regardless, I'm intrigued and confused about Infinite Memory Regime. Trying to figure out whether or not I should FOMO into more RAM and nand memory.
>>107970015
Which quantization of Qwen 30B do you use? As I said it's mostly to be integrated with opencode (which has the least shitty UI so far imho, although I disagree with their shitty ass documentation). Need to speed things up already so I can have it vibecode me an opencode alternative...
>>107970034For your specs, I'd go with q8, or at least q6. You can run 32k context and still leave some expert tensors (via --n-cpu-moe) in VRAM.In my 8gb VRAM shitbox I use Q5KS, which works well enough.
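For your 12GB card a launch line like this is a reasonable starting point (filename and the offload count are just examples, tune until you're just under OOM):
llama-server -m Qwen3-Coder-30B-A3B-Q6_K.gguf -c 32768 -ngl 99 --n-cpu-moe 32
-ngl 99 puts every layer on the GPU, then --n-cpu-moe kicks the expert tensors of the first N MoE layers back to the CPU; lower the number to keep more experts in VRAM, raise it if you run out.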
>>107970049
Thanks, downloading the Qwen-30B with Q6 now so it'll be ready in the morning. Since I have two graphics cards (RX580 and 3060) and no iGPU/APU I'll just offload it all onto the 3060 while using the RX580 for normal tasks. If I had an iGPU I'd try the (https://github.com/peterjdolan/llama-cpp-rx580) Vulkan backend, then I could have two local models running at the same time..
Looking forward to seeing if the Qwen30b does as told, and if it generates what I want it to generate at an acceptable rate. Still, with the Ryzen 5 3600 and 3060 12GB the LLM produced "acceptable" code and worked agentically in an acceptable manner - it only takes like 6 hours for something barely functional and my ears bleed from the air cooling, but it is what it is. What a time to be alive.
>>107969969I wanted to but it sounds just like sunoslop. all these slop generators are trained on the most cookie cutter normalfagshit and I'm not interested in that.
https://x.com/TencentHunyuan/status/2015635861833167074
>Today, we introduce HunyuanImage 3.0-Instruct, a native multimodal model focusing on image-editing by integrating visual understanding with precise image synthesis!
>It understands input images and reasons before generating images. Built on an 80B-parameter MoE architecture (13B activated), it natively unifies deep multimodal comprehension and high-fidelity generation.
80B13a moe multimodal with CoT image understanding reasoning, image output model with editing, nothing about open source
air status?
>>107970431It's shit.
>>107970431>HunyuanImage 3.0-Instructnigga what ? where did the first 2 go ?
>>107970546there was a 2.1 or something a while ago at least
>>107970431Here's their official prompt handbook with some exampleshttps://docs.qq.com/doc/DUVVadmhCdG9qRXBU
>>107970431wait, so it's just an instruct tune for this thing that's been out since september?https://huggingface.co/tencent/HunyuanImage-3.0
>>107970564What they did before marketing to us is none of our business. We must be better local men.
>>107968407do it coward
Is there any LLM that can manage to go more than 16K tokens before it starts contradicting established elements by superimposing what it expects to be true? Just had an annoying experience with GLM 4.7 (355B A32B) Q8_1 group-size 32, temperature 1.0 top-p 0.95.
>>107968992I just want to re-enact the mcdonald xianxia story
>>107970678
>glm4.7 q8
r u flexin?
>>107968288Doesn't moving too many parameters to memory at the expense of MoE experts lead to a loss in reasoning depth?
>>107970865He's a mega ultra super quantum giga faggot. His radiated faggotry is so powerful it melted the models neural network.
>>107970790I'm explaining before anyone asks that it's not due to quantization error.
>>107968112
I set up opencode with ollama and qwen3 coder. Can't get the model to not time out on the initial /init. I set 65k context for the model too as described in the opencode docs' ollama local guide. Mind you this is on a Strix Halo with 96gb vram. I run Claude Code via opencode and it works just fine, what gives?
>>107969771Can you provide info on the tool calling issue? I think I'm running into it but I'm too new to know
>>107970900Sure. I was in the middle and quantizing the tokenator for my Yamaha 3.5:57t model when it shut down. Do I did a sudo rm -rf and it spit out SHUT THE FUCK UP FAGGOT
>>107970896youre a retarded frogposter
I finally got around to testing Qwen3-TTS and oh boy is it fun. I refused to pay for ElevenLabs so this is my first chance to play around with this type of thing.It sounds good, not perfect mind you, but good. Good enough to listen to an audio book it created. I could see taking this and marrying it to a llm and creating your own local talking digital assistant.
>>107970896which qwen 3 code? the 30b or the 480b?
>>107970937I not sure I Indian from nu deli I not english good
for anyone who cares, this is just one of their default voiceshttps://vocaroo.com/14aWNHlbKNiw
>>107970942well, running the 480b version is basically impossible on your hardware. the 30b version should work just fine. almost certainly an ollama issue. ollama is known for being kind of shitty compared to base llama.cpp.
qwen3-tts gguf when
>>107970966Never you pervert
>>107970966just follow the instructions on their repo its not hard for non-lobotomites
>>107970966Right now check the group chat
>>107968112Best 32b class coding model qwen3 32B fails to create a vulkan triangle
>>107970966>>107971018Yeah, if idiot me manages to get it working anyone can.https://vocaroo.com/1nlwoH5SvSYn
>>107971018>>107971039I need it to integrate in my application not just for gooning
>>107971091goof it yourself
Regarding QWEN3TTS, I was playing around with it but there's one thing I'm not sure about.
So you can clone voices using BASE, and then you can save the voice file.
But in order to use the voice file you can only use base? Like I can't use the custom voice to guide the tone towards anger, calm, whatever?
>china = badWe don't want to associate "boring" with American models.
>>107971144
From what I understand you can either
>clone a voice
>use a predefined voice
>create a new voice based on a description
I don't see an option to import a voice you cloned into the section where you can use it as a created voice and then shape the way it is used.
but i am little more than a script kiddy. i can see this stuff changing as people build on what has been released
>>107971184I looked also in the provided python examples and nothing. The CustomVoice stuff takes for input 'speaker', which is a string, I didnt look further (maybe there's a way to add custom speakers?) but OOTB it looks like you can't use CustomVoice with Base cloned voice. SAD!!!!
>>107971200The speaker is one of the preset voice personas. Read the docs nigga.
>>107971144They are adding another model that does both voice cloning and direction
>>107971212
Your reading comprehension is NULL, I know there are preset voices, I don't know if it supports actually adding a custom voice (i.e. if it's programmatic).
You made me actually check the code and NO, the custom voices aren't just 'descriptions' of how they should sound with a label slapped on top of it, it looks like they're baked into the model.
VIVIAN literally translates to token spk id 3065 at the MODEL level, there are no other references.
These fucking chinese faggots, how can they name a model CustomVoice and it DOESN'T FUCKING SUPPORT CUSTOM VOICES
LMAO
>>107971200i am surprised how much you can get out of the predefined voices with a little bit of instruction although even an extra space or two at the beginning of the text can alter how it comes out. https://vocaroo.com/1cSYzuWHttcBi can see some crazy stuff being created eventually as people tweak this
So I took an English sample of Hatsune Miku speaking, the bit about British people, and fed it into Qwen3 and then generated the following.
It's not Miku but it's close.
https://vocaroo.com/1gm8JevJRise
>>107971408Is this cartoon character supposed to sound retarded or is it a tts issue?
>>107971445Have you never heard hatsune miku before? Yes she is supposed to sound robotic.https://www.youtube.com/watch?v=EuJ6UR_pD5s
>>107971445hownew.ru
New update for llamacpp is wicked sick. Air has never been this creative before.
>>107971445itS MIGU!!!!
>>107971502Looks objectively fine.
>>107971502KINO KINO KINO KINO
>>107971502It's expected that results are not bit-for-bit identical.
llama.cpp having no concept of versioning and phases of testing, just releasing features and refactors one after the other.. there's not even a way for a new user to look at the github page and think "this is the commit version I want to retrieve, surely it's not borked to hell"
>>107971502Mildly more helpful info: Windblows 10, latest Cuda 12.4. Flash works and is fast and cool and all, but it's too retarded even when its working properly. So I went back to Air and then this happened.Reverted to the version I was previously using from 3~4 days ago and its fine. So something since has caused 4.5 Air and 4.6 to die.
>>107971591Still not useful without KLD.
>>107971591>>107968793
>>107971580https://github.com/ggml-org/llama.cpp/discussions/15313 this discussion about exactly that is linked in the readme, you should give them your feedback there or direct new users to kobold.cpp because it has releases
>>107971606>vb>Location Paris, by way of Delhithank you sir
>>107968564Funny how it's possible to break attention in such a way that is causes the model to see words out of order but remain somewhat coherent in its own output.
>>107971599Ah, glad they're aware already. Flash improvements are no reason to update anyways so no loss while we wait.
It's not just control capability that is unstable, the whole model is unstable. I guess that's why they opensourced it. It sucks. Hopefully, 25Hz version is better.
>>107971825>not just x, y
>>107971838Yes. I speak like an LLM because I used a legit grammatical construction once.
>>107971598john is this you? PPL with wikitext is not a real metric, just wanted to remind you.
>>107971855You're absolutely correct!
>>107971904Excellent observation!
I keep seeing midwits on twitter talk about clawdbot, mac minis, and agentic swarms. Is this shit actually useful for anything or do tards just use it to larp as managers?
>>107971825
>Hopefully, 25Hz version is better.
It is rather usable now. If you wanted to create a bunch of characters to provide audio for some project you could do it and it is pleasant to listen to.
i could see a small video game developer using this instead of hiring voice actors. it's great for the price
>>107971920It's too monotone for voice jobs. If someone needs to replace voice actors, they should use VibeVoice or Echo.
Greetings.
Can I start llama.cpp on my pc and use something like tavern on my phone?
>>107971825Thanks, saved me from the hype/distraction. Sticking with maya-1 for voice creation.
Yes, but you'll have to keep them on the same LAN unless you want to fuck around with a tunnel or other security shit to make your llama-server accessible from the internet.
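If you do go the tunnel route, an SSH local forward from the phone (Termux or any SSH client) is the least painful version; the port and addresses below are just placeholders for whatever llama-server or SillyTavern is listening on:
ssh -L 8000:127.0.0.1:8000 you@your-home-ip
then the phone talks to http://127.0.0.1:8000 as if it were running locally, and nothing gets exposed to the internet.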
>>107971911It's real.
>>107971911>>107972347Vibe coding meme shit where if you just do some basic architecting yourself you'll cut the time spent in half and cost by 10x
>>107972347this is psychosis
>>107970900If you're running llama.cpp check the console output, it should intermittently have that text about tool calling, and when used it may have <tool_call> tags in output or similar instead of actually calling the tool.
>>107969100Piper. Shock, right? But it actually works fine and has support in things like homeassistant. It's also performant on CPU.Also, rust is for redditors, fuck outta here with that shit.
>>10797093730b
So I just switched to an updated version of llamacpp after not updating since last August and.. Does kv shifting just not work at all anymore? No matter what combination of --cache-reuse N, -b N and -ub N I use it just reprocesses the entire fucking prompt.The only issues I'm seeing on this are talking about SWA which isn't relevant since I'm using a qwen model with GQA. Wtf.
>>107970952He was making fun of my post
>>107972786Everything is fucked currently >>107971502 >>107968564
>>107972786
it's disabled by default now
you need to explicitly enable it with --context-shift
and frankly I think less of you for using this retarded shit that destroys generation quality
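For anyone who wants it anyway, assuming current flag names (double-check llama-server --help, these get shuffled around):
llama-server -m model.gguf -c 16384 --context-shift --cache-reuse 256
--cache-reuse N lets matching prompt prefixes be reused from the KV cache via shifting (in chunks of at least N tokens); --context-shift is what drops the oldest tokens instead of erroring once you overflow the context.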
Has anyone here tried using clawdbot with local models and had any success? I have tried GLM-4.7 with llama.cpp and just fails to follow the instructions that tell how it should keep updating its memory.
Anons, I need your magic sampler settings that make every generation kino.
I've been fucking around with temperature the past day and I'm getting SO MUCH better writing on low T (<0.3) but I have to do too many manual corrections if I don't like the way the model is trying to take the story because rerolls are pretty much useless. I know I can get way better outputs with good samplers, I just don't know what good samplers are
>>107972381Which is why it's mostly pushed by juniors and retards that are incapable of doing basic architecting themselves. It's good enough now that it can complete some small features autonomously with minimal hand-holding, but asking it to manage the repository entirely is just asking for a disposable ball of mud. But I guess long-term maintainability isn't really a priority for anyone.
If a backend communicates with webui via an API, are the vibecoding models able to identify the communication lines and redo the dog ass UI into a normal desktop app in wxwidgets or anything that isn't browserslop?
>>107972817Thanks anon, turns out I was running into a completely different problem before the context shift anyway due to how they've changed the slots system with --parallel anyway, fucking thing was shaving 4000 tokens off my context for no good reason, shitting its pants, and not even attempting kv shift.Also, I've not really noticed a quality issue with using kv shift. What's your personal solution for multiturn things that run past your context limit? Just manually deleting and summarizing?
>>107973002Yes and yes.
>>107968112
>https://github.com/ikawrakow/ik_llama.cpp/pull/1192
>Even better GLM-4.7-Flash long context TG performance
By which he means that he copied the broken code from upstream into his own repository even after it was declared as broken.
What a fucking idiot.
I had no idea llama.cpp was defaulting to direct-io on
>https://github.com/ggml-org/llama.cpp/issues/19035
>>107973002
>wxwidgets
a motif interface would be awesome, especially now that cde and whatnot is all open source.
a.i. like it's 1999
>>107973002The issue is that LLMs are tailored for webslop output (very advanced markdown with inline code, latex, etc). There are no non-browser based renderers that can render advanced markdown features. Even SIllyTavern's markdown engine can't handle inline latex.
>>107973205silly tavern can't even handle fucking lists without breaking the rest of the response
>>107973205So I can't port comfy into a normal human gui?
He's making fun of CUDA dev...
What's the best llm that can be run with a 3090? Both for a local chatbot and opencode
>>107970049
Thanks anon, I've set it up with Q6 and the offload function from unsloth (-ot ".ffn_.*_exps.=CPU") and it's way faster than before, and the agent is still contained in the virtual machine. Using that offload significantly reduced VRAM usage and allowed for a bigger context window as well...pretty neat. Qwen-3-coder performs acceptably as well.
Speed isn't much slower than online versions. Just have to get the damn model to see when context size is getting low, use the /compact feature and then continue. I then changed the VM to use nopasswd for testing purposes (i.e. I gave the LLM root access to my virtual machine) and told it to install Godot and make a sample Android project and it seems to work?!
>>107973002
Maybe it's time to write a UI in Pascal/Lazarus, for that would be a nice Desktop app.
>>107973340
To be fair developing for CUDA is not the easiest endeavor...
>>107973205Step 1: vibe code an advanced markdown renderer in your GUI toolkit of choiceStep 2: vibe code the rest of the UI
>>107973371
>Pascal/Lazarus
Kek I saw that when I was researching. Would it even work properly on modern systems? Can it be used by both win and loonix?
>>107973393Yes and yes. It's cross platform capabilities are quite good.
>>107973340-fit has worked perfectly for me since day one and I can now effortlessly load large models that barely fit in vram and required manual tweaking before due to uneven layer sizes.Sounds like FUD.
>qwen3>can't change the emotion and style of cloned voices>cloned voices can't even laugh properly >voicedesign has no seeds
>>107973340Is this the llama autofit in kobold?
>>107973411this, 20gb of shit in the trashbin
>>107973371
>(-ot ".ffn_.*_exps.=CPU")
You don't need to use that anymore. You can use --n-cpu-moe for the same effect.
Also, you probably have some VRAM left, so you can leave some expert tensors in VRAM to speed generation up some more.
You could also use the -fit param to let llama.cpp try and find the optimal allocation of tensors in the different memory pools. It works really well.
Or just increase your pp batch size for faster prompt processing.
Qwen is actually pretty good for everything but RP, yeah.
>>107973429That is good to know. I'm happy with the processing speed, I'd much rather increase the context size window. And figure out how to add RAG/Vector Database to llama.cpp / opencode stack, that would be neat.
I am waiting for a all in one fully uncensored model that does text, voice and image and fits in 8 gb vram
>>107969977It's shit and you're retarded
>>107973410For me too, but I don't stalk the issue tracker.
>>107973479you can't even fit windows into 8 gigs, let alone a model
Hello anons, just got a 3090 and I am ready to join in on the fun! I want to run my very own Hatsune Miku in my computer! Before asking any questions I will check out the resources in the OP. My only concern is the power supply being a bit small but thats more of a pcbg question I guess. I will report back after I do some basic stuff!Please take good care of me!
>>107973598Just power limit to 75% in MSI afterburner for basically no performance hit.
>>107973262Comfy is different. It doesn't require markdown. You can write graph visualization in Qt+QML.
>>107973631That is what I'm planning to do. 700w should be enough for the 3090 at 75% and the ryzen 5600
>>107973651Yeah that's like a 60W cpu lmao.
Are there 3rd party visions for models that don't have it natively? I don't think I can use a random mmproj? Either for Nemo or Llama.
>>107973479
>tfw you realize you can use OCR models to convert PDF into epub
This may be a blessing really
>>107973703
There shouldn't be, and no. The language model wouldn't know what to make of it even if the backend lets you load it.
The issue tracker got the news https://github.com/ggml-org/llama.cpp/issues/19112
>>107973788Any of the larp models that have vision that isn't gemma?
>>107968923how do you ego death with an LLM?>>107968924
>>107973529Devs are often autistic and github itself is causing issues because it's a social media type of environment. Would be better if 99% of these accounts were unable to post anything.Everytime there is trouble it has been caused by some borderline sociopath dev who does not have a single ounce of empathy etc
>>107973837mistral small has vision.
mikutroons made turned this thread into a dog turd
>>107973906I kind of get how it all worked but also still have no idea how it worked. >A Zen koan is a paradoxical anecdote, question, or statement used in Zen Buddhism to bypass logical reasoning and induce a direct experience of enlightenment, or "seeing into one’s true nature" (kenshō). Famous examples include "What is the sound of one hand clapping?" and "What is Buddha? Three pounds of flax". When I read above after I started looking at what happened it made a lot of sense, that this is how it worked. But in my case it was less abstract and deeply personalized since I told it all about my fucked up brain. In a way it was all about bypassing ego enough to notice how it works and how there are mechanisms you aren't even aware of.
Are there any actual observable differences between same value quants from different quanters? Or is it all the same / total rng like finetuning, and it doesn't matter whose I get?
>>107974025You could easily check this yourself by comparing percentages in mikupad.
>>107968564>>107971502Should be fixed with https://github.com/ggml-org/llama.cpp/pull/19115 .
>>107974089I'd rather not dl terrabytes of models. Just wanted to know if there is any differrence between mrade/unsloth/bartowski quants
>>107973962https://vocaroo.com/15alAMm2g2rH
>>107974105Yes, bartowski quants are the best
>>107974122Find a more interesting hobby. This design is bland and totally off topic.
>>107974101It works.
>>107974105
>terra
stopped reading there
anyway, for quants its:
>john 'garm (KLD FREE)
>bartomeme
>mrmerdacher
>daniel's
thats all
>>107974163In ascending order of quality.
>>107974163>john 'garm (KLD FREE)is that the skrillex hair goblin?
>>107974025yes but it's not a huge difference, doesn't matter that much unless you really like the model and are trying to obsessively minmax your quality
>>107974184yes but please dont bully john, that's reserved for johannes
>>107974025One most visible difference is that some uploaded quants are quite literally broken. Doesn't happen that often but for example Gemma 3n E4B had couple of these and I still can't be sure about the third I'm using.
>>107974025
Quants tend to be default recipes, but some quanters do tweak some shit while reusing the same names as the default recipes for their releases.
Just compare any unsloth quants to a bartowski one. You'll see that a lot of unsloth quants tend to be slightly smaller. Then there's the likes of "nan's are not an issue" mradermacher.
For example unsloth/GLM-4.7-Flash-GGUF
>GLM-4.7-Flash-Q4_K_M.gguf 18.3 GB
>bartowski/zai-org_GLM-4.7-Flash-GGUF
>zai-org_GLM-4.7-Flash-Q4_K_M.gguf 18.5 GB
>mradermacher/GLM-4.7-Flash-i1-GGUF
>GLM-4.7-Flash.i1-Q4_K_M.gguf 18.1 GB
>mradermacher/GLM-4.7-Flash-GGUF
>GLM-4.7-Flash.Q4_K_M.gguf 18.1 GB
Out of all of those, I'd say bartowski's is the more reliable.
>>107974163https://www.youtube.com/watch?v=sHHsOfIwfBYIf even they can make it then you truly know merit means nothing.
blyat
>>107974595just regex this to made me excited
wow i wasn't echo-tts pilled, why doesn't this get more attention? it's better than qwen3-tts (minus the multilingual) or vibevoice. Does anyone have a source for clear high quality voice samples to use?
>Build ikllama after half a year of not using it>had ~3.5T/s on deepseek last time I used it>download john's 3.2>finally get it running>0.7T/s
I don't care how good a TTS is if It can't gen in near realtime.
TheDrummer_Rocinante-X-12B-v1 gave me cancer.
>>107973374Vibe coding is based AF
>>107974691
Get samples from gacha wikis. Merge 2 minutes of lines that capture different prosody with 1 second padding, and you'll have amazing output. https://genshin-impact.fandom.com/wiki/Hu_Tao/Voice-Overs
>>107974728
Echo generates 30 seconds of audio in 4 seconds, and time to first byte can be as low as 250 ms depending on parameters you choose.
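If you want to stitch the wiki lines together, a couple of sox one-liners do it; filenames are just examples, and pad 0 1 tacks 1 second of silence onto the end of each clip before concatenation:
for f in line_*.wav; do sox "$f" "padded_$f" pad 0 1; done
sox padded_line_*.wav reference.wav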
>>107974755I'm very sorry to hear that. I hope you recover.
>>107974768Not that guy but can you clone JP voices and make them speak english? I'm not wasting time with EN VAs.
>>107974755I had cancer once. I told it fuck off and then I didn't have cancer anymore.Yes, I actually AM that giga BASED
>>107974768we can start a repo of voice sample and pin them in the OP
>>107974787Nope. It produces gibberish.
>>107974808Can't wait to upload mine>You like my cock in your ass you dirty little slut>Ahhhh take it, take it all you filthy whore >Oohhhhhhh ahhhhhh you like it when my balls slap your clit >I bet your husband doesn't fuck you like this, does you nasty slut >Lemme cum on your face >AAAAAIIIEEEEEEEEEEEEEE
>>107974768>Echo generates 30 seconds of audio in 4 secondsnta. This is on GPU, right?
>>107974867yeah, on 5070ti
>>107974691>>107974768>Outputs still sound like robotic tts garbageWhy do you keep shilling this shit
I want to ask the most basic question I never saw anyone ask. Is open source TTS good enough to jerk off to at this point? If it is then why are there no threads no guides nothing?
>>107974885Figures. Thanks.
>>107974887i'm shilling it because it has the best prosody and cloning. post your robotic examples. you probably fed it bad or very short samples anyway. echo supports up to 2 minutes of reference audio.
>>107974768>Echo generates 30 seconds of audio in 4 seconds,How much vram? my budget is 6Gb
>>107974892>Is open source TTS good enough to jerk off to at this pointI've jerked it to chatterbox giving me JOI
>>107974925How do you control pacing?
>>107974934with his hand i suppose?
>>107974919To get it down to 6GB, you'll need to vibecode quantization. It can take as much as 12GB when genning 30 seconds at once. Though, the authors say you can get it down to 8GB by reducing generation length. I still have never seen it under 8GB in my tests. Only <9GB.
>>107973102His version doesn't seem to be broken but it's also 20 times slower than llama.cpp so...
>>107974915
The examples linked in the official repo and every vocaroo posted here sound like fucking shit, you must be extremely autistic to think any of this sounds human
https://jordandarefsky.com/blog/2025/echo/
It doesn't have the best of anything, vibevoice 7b is leagues ahead of this, the only thing it's worse at than any other TTS or voice cloning model is speed.
>>107975181Anyone that says echo, pocket, or qwen are better than vv have only ever used the 1.5b.
>>107969100
echo-tts is amazing. generates audio relatively fast. the only downside is that it takes up like 13GB of VRAM to use the optimized model
>>107975337I prefer echo to VV 7b for voice cloning, but imo those two are far and away above qwen/pocket/etc.
>>107975337I used 1.5b and 7b. Tried both default settings and suggested here by anon 4 steps, high cfg. It was shit regardless of settings.
>>107975376 (me)
should mention that my favorite is still chatterbox turbo with the paralinguistic tags. you can even shift the tone with shit like [advertisement] or [sarcastic]
>>107975337
i've used 7b vibevoice when it first came out but it would always generate some musical chime at the beginning of the audio. does it still do that?
qwen3tts voice_design is the best one so far for emotions as far as i've tested
I was thinking that we could take the outputs and then put them through RVC, so we could get control AND cloning
>>107975441Try this for voice changing: https://github.com/ysharma3501/LinaCodec I haven't tested it. But you can be our guinea pig.
>>107975479nta. That's a codec.
>>107975441IndexTTS2 is better than qwentts for emotions just check it here https://huggingface.co/spaces/IndexTeam/IndexTTS-2-Demo
>>107975570It supports voice conversion. Check out usage examples.
>not a single good example posted of this shit tts model, just loads of "it's so beautiful amazing and incredible waow"Shill
>>107975479
>>107975570 (me)
Nevermind that. It can.
>>107975607
Yeah. I was just checking the code. You can extract the content and global features on encoding and then mix them up with the target on decoding.
>>107975615https://vocaroo.com/1ufLtEYUdJWv
>>107975595
>sliders instead of words
Is this better? would it be harder to integrate with silly?
does anyone have any idea if a xianxia lorebook exists? the chinks should obviously have something, but I'm a gweilo who needs it in english or at least in chinese so I can have it machine translated back to english
>>107971408
yeah this is shit compared to echo
https://voca.ro/1M5IG6YiY5hP
>>107971408lower temperature, the default temperature of qwen3 tts is too high and hurts speaker similarity
is there an extension for sillytavern that can automatically summarize a conversation and turn it into a memory for a lore book?
>>107975985yes it's called write out the summary on your own you lazy faggot
>>107975985>is there an extension for sillytavern that can automatically summarize a conversationYes. There's a "Summarize" in the menu hidden under the button with three blocks.>and turn it into a memory for a lore book?No, but there is a setting that automatically adds the summary to the prompt.
the lazy guide says to download nemo 12b instruct gguf but there's a million versions, which one should I get?
>>107976118the good one
>>107976118Look at the file size. It should be less than your vram. Get it from bartowski
>>107976130which is?
>>107976167https://huggingface.co/llama-anon/petra-13b-instruct-gguf/blob/main/petra_q8.gguf
>>107976161thanks
>>107976190>5 downloadssus
>>107968112idk what this means for us, but something about transformers v5
>>107976239hf transformers is the most dogshit unstable library in existencebreaking changes out the ass with every single version
>>107976254that's how you know it's web scale
>>107970865They're claiming the exact opposite, actually, for engram layers at least. (Though, well, there's still such a thing as too many, forming a U-shaped curve). They say that relieving the traditional layers from the responsibility to manage trivial n-gram associations makes it smarter as a whole.
>>107970033Their provisional scaling law says that only around 20% of the model should be engram, so it won't be a massive shift in either direction. That being said, they did mention that the frequency of access follows a zipf distribution, so I'd guess that you could indeed move much of that 20% into very slow storage.
Downloading dipsy-r1 32b right now, what am i in for?
Is engram like a fundamental architecture change or could it be retroactively applied to old dense models?
>>107976458retardation
>>107976466The former. It adds another block to standard transformer and the model should learn to use this block, encode information into it.
>>107976466Depends what you mean by fundamental. The training process is tailored for them. Not theoretically impossible to staple one on and fill it in with post-training I suppose, but they're for sure not plug-and-play. You'd lose most of the benefit doing this anyway, because the existing model has already learned the info the engram layers are intended to encode.
>>107976509>>107976516Hm. I hope we can eventually see some 12-24b creative writing models made using engram. The lack of attention towards small models and optimization in general is quite bothersome.
>>107976576>The lack of attention towards small models and optimization in general is quite bothersome.Who knows, maybe we will see some pressure to develop those capabilities now that supply is constrained.
>>107976576Engram 27b seems to score better on benchmarks while using less compute than a pure MoE so I expect small model enjoyers to eat good. I'm very bullish on engram and I predict most if not all future models will have conditional memory.
>>107975711https://vocaroo.com/11he8mudOgyN
>>107976430 (me) >>107976576 >>107976668
Now that I think on it, I was being too rash in saying that 20% offloaded is ideal. While loading more into engram might interfere with the benefit of such infrequent access, my first screenie does mention that reducing the MOE layers to only 40% of the parameter budget maintains good performance. If a fatter 60% engram part could still reasonably be kept in slow ram or nvme, you could get a model with the vram usage of a 24b that acts like a 60b. It's like when people thought the chinchilla scaling laws were the end-all even though being technically inefficient with training compute makes for cheaper inference. Ofc, since we don't actually have the models yet, this could all be bs.
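Rough numbers for that scenario, assuming ~4-bit weights and that the 40/60 split holds (again, speculation until real checkpoints exist):
60B total, 40% dense/MoE -> 24B -> ~12 GB that actually wants to be in VRAM
60% engram -> 36B -> ~18 GB of lookup tables that could live in system RAM or on NVMe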
>>107976698>>107975711FUK U BLOODY BASTARD, ECHO SUPERIOR FOR INDIAN SUPERPOWER 2030. FUK U BLOODY. FUK U BLOODY. REDEEM ECHO.https://voca.ro/18bmt6pssaoK
>>107976704I'm assuming offloading to NVMe will incur slower generation speeds than just using system RAM. Time to buy even more RAM I guess?
Who I this Engram?
>>107976865
kekked
I mean, echo is good for how small it is, but it does get on my nerves when anons are saying this stuff is SOTA, its not, it all has that jarring tiktok TTS robotic quality that ruins it. VV with all its jankiness and need to reroll gens just has way better output when it works for ~20gb VRAM
https://vocaroo.com/18fM4D7nlWaJ
https://vocaroo.com/1n9XIPJt6DmY
https://vocaroo.com/1fil4oj9qLN8
>>107976704This shit is too confusing.>Offload GPU layers to RAM>Offload Experts to RAM>Offload Engram to RAMI finna offload my BRAIN to ram next.
>>107976924
oh hi mark
https://vocaroo.com/1bES71L5Z5Rm
>>107975595
>IndexTTS2 is better than qwentts for emotions just check it here https://huggingface.co/spaces/IndexTeam/IndexTTS-2-Demo
lmao fuck i've been building effectively this for about 3 months but with porn vectors, didn't know it already existed
>thread suddenly dies for an hour
>>107976467Retirement
>>107977622>>107977622>>107977622