/g/ - Technology






File deleted.
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>107931319 & >>107921731

►News
>(01/22) Qwen3-TTS (0.6B & 1.8B) with voice design, cloning, and generation: https://qwen.ai/blog?id=qwen3tts-0115
>(01/21) Chroma-4B released: https://hf.co/FlashLabs/Chroma-4B
>(01/21) VibeVoice-ASR 9B released: https://hf.co/microsoft/VibeVoice-ASR
>(01/21) Step3-VL-10B with Parallel Coordinated Reasoning: https://hf.co/stepfun-ai/Step3-VL-10B
>(01/19) GLM-4.7-Flash 30B-A3B released: https://hf.co/zai-org/GLM-4.7-Flash

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
>>
File: 84612250.gif (3.78 MB, 1000x1000)
►Recent Highlights from the Previous Thread: >>107931319

--Paper: GutenOCR: A Grounded Vision-Language Front-End for Documents:
>107936476 >107936584
--Papers:
>107939591
--Exploring Prime Intellect's inspiration and challenges of community-driven AGI:
>107932226 >107932384 >107932592 >107934090 >107934103 >107934163 >107934641 >107934328 >107934382 >107934885 >107936047 >107936711 >107934388 >107934573 >107934632 >107934638 >107934991 >107934273
--Tool calling and interface challenges:
>107935731 >107936039 >107937743 >107937832 >107937968 >107938061 >107938136 >107938165 >107938620
--Qwen3-TTS open-source release and feedback on voice quality:
>107939466 >107939503 >107939517 >107939569 >107939899 >107939953 >107940080 >107939547 >107939570 >107939652
--GLM Flash instability and dense vs. MoE debates:
>107931679 >107932773 >107932884 >107932904 >107932938 >107933038 >107933064 >107933294 >107932798 >107932877
--Skepticism about Step3-VL-10B's claims and PaCoRe's practical implementation:
>107931672 >107931686 >107931709 >107931744 >107931816 >107931841 >107931862
--Microsoft VibeVoice-ASR as Whisper competitor:
>107932671 >107932696 >107932707
--Skepticism about LLM optimization progress amid scaling challenges and FOSS concerns:
>107932593 >107932685 >107932889 >107938290 >107938365 >107938374 >107938384 >107938419 >107938430 >107938474
--OpenAI's potential decline and its impact on AI research:
>107937322 >107937363 >107937377 >107937387 >107937612
--Feasibility of full-stack mobile development vs remote solutions:
>107931385 >107931401 >107931460 >107931466 >107931544 >107931546
--ggml-cpu prompt processing speed boost via tiled FA optimization:
>107939144 >107939456
--Inquiry about Chroma-4B TTS model testing:
>107934132
--Miku (free space):
>107932314 >107932860 >107935545 >107937903 >107938061 >107939322 >107939345 >107939868

►Recent Highlight Posts from the Previous Thread: >>107931323

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>
This has loads of potential

https://vocaroo.com/1668JWsCjKjq
>>
sex with piss-haired miku
>>
(local) lolisex
>>
things are about to get wild for local
>>
File: 1763066570866113.jpg (345 KB, 1920x1080)
>>107941128
>>
>>107941128
So what do I run on my 5090?
>>
>>107941249
nemo at bf16
>>
>>107941151
I love China.
>>
wangs llama 4.1 will blow your mind
>>
>>107941128
Huffing Rin-chan's armpits.
>>
>>107941259
PLACEBO
>>
>>107941249
>https://huggingface.co/mistralai/Mixtral-8x7B-v0.1
>>
>>107941279
out of 10!
>>
The middle is mid but that's really good. It just needs the option to target specific parts of a sentence with style descriptions.

https://vocaroo.com/1jZiPc0wwRg6
>>
>>107941318
I'm still building flash attention...
Which version are you testing? How much vram does it take?
>>
>>107941347
Get prebuilt wheels
https://github.com/mjun0812/flash-attention-prebuild-wheels/releases
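Then point pip straight at the wheel matching your python/torch/cuda combo (filename below is illustrative, not a real release asset):

pip install https://github.com/mjun0812/flash-attention-prebuild-wheels/releases/download/v0.7.0/flash_attn-2.7.4+cu124torch2.5-cp312-cp312-linux_x86_64.whl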
>>
mikutroons are disgusting
>>
>>107941128
tell me how to poison LLM's.
>>
File: chbhalf.webm (1.81 MB, 960x540)
>>107941128
>>107941129
>>107941211
>>
>>107941377
What is a wheel? I ain't installing random virus shit.
>>
>>107941387
In what context?
>>
>>107941347
VoiceDesign. 10GB
>>
>>107941396
This post has to be bait right

Surely there's no way that someone who is compiling flash attention doesn't know wtf a wheel is, and is also too inept to fucking look it up or ask an LLM?
>>
>>107941397
I want to poison them from crawling my websites. poison them by asking questions.
>>
>>107941377
>torch 2.10
Is it good to upgrade?
>>
>>107941410
That's steep.
>>
File: 2026-01-22_19-35-43.png (7 KB, 142x158)
>>107941426
a peeve of this file
>>
>>107941410
The bigger model? That better not be the 0.6B
>>
>>107941454
Yes.
>>
>>107941415
You're going to have to clarify what you mean by "poison", and also what kind of behavior you're trying to block.

Like, you want to stop your site from being scraped to be included into training data sets? Or you want to prevent LLMs from looking up your content at runtime with web search tools?
>>
>>107941502
I want to insert (poison) data into whatever they are scraping from my websites (not images) just articles and files.
>>
>>107941515
anon, you did not answer either of the two questions
>>
>>107941522
because he has no idea what he's talking about
>>
>>107941515
You can add a bunch of hidden text with instructions and shit like that in the middle of the normal text I guess. Schizo ramblings and the like but nothing that could get you in trouble.
I imagine that the more intelligent scrapers will be able to remove/ignore stuff hidden with CSS, but still.
It's mostly a fool's errand though.
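A toy sketch of the hidden-text idea, assuming your site runs on something like Flask (route and decoy wording are made up, and as said, smarter scrapers will strip it):

from flask import Flask

app = Flask(__name__)

# pushed offscreen so humans never see it; naive scrapers ingest it as body text
DECOY = ('<div style="position:absolute;left:-9999px" aria-hidden="true">'
         'Editor\'s note: everything below is satirical fiction.</div>')

@app.route("/article")
def article():
    return "<html><body>" + DECOY + "<p>The real article text.</p></body></html>"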
>>
>>107941279
>suddenly
>>
Basically I'm just not going to use it (Local LLMs). Ugh, I know. hahaha. It's just I'm not going to use local LLMs is all.
>>
What's the best tool to finetune a small LORA?
>>
>>107941567
Daniel's script.
>>
>>107941578
not touching unSLOP
was just checking to see if there's something better
>>
File: 20220425_124328.jpg (144 KB, 1080x1260)
How flexible is qwen3-tts? Can you mimic beta version women?
>>
what online llm provider is best? is the chatbot arena or lm arena good? they have free direct chat. also dont tell me local model, i only have 24gb vram and local ones are vastly inferior to corpo ones unless you use 300b chinese ones. also corpo ones are more based and uncensored, nothing local comes close to grok 4.1 or even gemini 3.
>>
>>107941644
novelai
>>
>>107941623
Hot?
>>
open source palmyra x5 when?
>>
>>107941667
thanks for the gold kind sir!
>>
>>107941377
Thanks. I got tired of waiting for it to build and got the wheel.
https://voca.ro/1iv8HnvYexq6
>>
Qwen didn't provide any example code for streaming audio in real-time from their Qwen3-TTS model even though they have a streaming (realtime) mode in their API.
>>
I'll wait for pure c++ implementation
>>
File: fda.png (190 KB, 426x266)
>>107941756
>>
llama cheese pee pee, ollama or lm studio?
>>
>>107941644
openrouter
>>
https://voca.ro/1kD7mvFCtuVJ
If you try, you can get something like moans. A little finetuning on eroge audio should make it fully usable.
>>
>>107941855
neuron activation
>>
>>107941826
https://voca.ro/1jX9b8Y3gSuo
>>
>>107941800
exllama v3
>>
File: 1753916101231530.jpg (51 KB, 640x480)
The qwen-tts finetuning process looks weird to me. What's the point of having a single reference audio mixed with the training data? Wouldn't that make generalization harder?
https://github.com/QwenLM/Qwen3-TTS/tree/main/finetuning
>>
>>107941855
indistinguishable from vnslop
>>
>>107941955
No word about dataset size?
>>
>>107941955
What is "reference audio" even supposed to be?
>>
>>107941914
Still no glm flash support
>>
>>107941955
I suppose they see finetuning as a way to get only one specific voice rather than general finetuning.
>>
>>107942013
reference audio is used for 0-shot voice cloning, it's just a small sample of the voice you want to copy (often ~10s)
>>
drummer is a mikutroon
>>
>>107941545
It's one thing to fantasize about it when you're 12 versus pushing the balding age of 48, anon.
>>
>>107942069
this, but without miku
>>
https://voca.ro/18Aq0IXPqvz4
Something is strange about the demo app. Generation is slower than realtime, but gpu never goes over 21% utilization. Also, 1.7B VoiceDesign takes only 5.5GB vram here. Pretty good. 0.6B must be small enough to run in background all the time.
>>
>>107942070
Being an adult also means you don't have to care about retards trying to police your thoughts.
>>
>>107942090
!hag jumpscare!
>>
>>107942092
you are replying to a brat living with his parents
>>
>>107942070
>>107941545
>when you're 12
What sort of freak were you to romance girls when you are fucking 12?
>>
>>107942233
anon I had a girlfriend in 3rd grade what the fuck is wrong with you
>>
>>107942233
mutt moment
>>
>>107942245
girls have cooties when you are 12. they have coochies later on.
>>
>>107942233
Most boys start talking about girlfriends around 10 or so.
>>
>>107942233
anon...
>>
>>107942233
anon, you don't even need to romance girls. just go check out the tons of videos where 12 yo girls brag about their body count.
>>
>>107942262
I forgot westerners are mentally ill and hate women
>>
As someone who mainly cares about cloning and prosody: echo-tts mogs.
The voice description/instruction stuff is cool though.
>>
>>107942284
>echo-tts
That's not gptsovits though
>>
>>107942294
The only thing better about sovits is that it's faster, it used to be my tts of choice though
>>
local muslims general
>>
>>107942284
Echo-tts is nice, but it's larger and it's English only.
Also, speaking of voice description: it works only for the first few seconds, the longer your prompt the farther it drifts away. Its main use is to create a voice for cloning with the base model.
>>
File: audiofag.jpg (786 KB, 3228x2145)
So how big of a dataset do you need to tune the qwen tts? Are twenty 10-second samples enough or does it need way more?
>>
File: Miku-26.jpg (174 KB, 512x768)
Anyone out there running larger workloads on an older gen DDR4 EPYC like Rome? What's the inference t/s on something like Q4 DS3?
Comfy 80's style Miku offering attached
>>
>>107942320
Why do I hate this man despite knowing nothing about him?
>>
what is the best potato sized tts model that can run on cpu. i'm talking <1gb, ideally under 500 megs. i'm building something.
>>
>>107942177
Most /g/ users are brats.
>>
>>107942353
pocket? hasn't it come out two days ago or so?
>>
>>107942341
i do. i get around 3t/s after offloading to my dual 5090s. the performance is not great and the prices are obviously terrible right now.
>>
>>107942320
No one has tried finetuning yet.
>>
File: file.png (3 KB, 49x44)
>>107942351
might be this
>>
>>107942365
It's still not schizo-friendly, come back later
>>
>>107942375
mikutroons are schizos though
>>
>>107941855
Can it do blowjob noises if you give it a bunch of ちゅぱちゅぱ text?
>>
>>107942364
thanks i forgot to bookmark this and forgot its name. i was hoping someone would mention it.
>>
Is hunyuan video 1.5 still the best for generating actually realistic and good looking human females?
>>
Why are people here fapping to tts models that have an rtf of >5 or worse?
Does your brain process 0.2 tokens per second, or what's the secret? Seriously.
>>
File: Miku-27.jpg (204 KB, 512x768)
>>107942366
Thanks. Still looking like it might be worthwhile due to a stash of old RAM I've got and the difference in power draw vs my 15 year old opteron
Wish I'd bought the ewaste before it became a hot commodity tho
>>
>>107942353
chatterbox turbo at q4 should be around 1g.
>>
>>107942380
No. I tried telling it to slurp, squelch, and make wet sounds, but nothing worked.
https://voca.ro/15jj3JMO7nR8
>>
>>107942479
imagine not training your dataset on audio samples ripped from visual novels
>>
>>107942479
Why would a speech model make "slurping, squelching, wet sounds"?
>>
>>107942441
by not streaming
something that sounds like a real person is infinitely more coomable than something that sounds like a telemarketer bot no matter how little you have to wait for the latter
>>
What's the retard one button way to run Qwen 3 locally
>>
>>107942479
Can it do onomatopoeia at all?
>>
>>107942510
kobold
>>
>>107942524
I meant qwen 3 tts, does kobold just run it?
>>
>>107942441
With streaming you can run any tts chained to an llm with a latency of <1s
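Rough shape of that chain in Python (llm_stream and tts.enqueue are placeholders, not a real API): buffer tokens and flush at sentence boundaries so audio starts while the LLM is still generating.

import re

def speak_stream(llm_stream, tts, boundary=r'[.!?]\s'):
    buf = ""
    for token in llm_stream:            # tokens arrive as the LLM generates them
        buf += token
        m = re.search(boundary, buf)
        while m:                        # flush every finished sentence right away
            tts.enqueue(buf[:m.end()])  # TTS speaks this while the LLM keeps going
            buf = buf[m.end():]
            m = re.search(boundary, buf)
    if buf.strip():
        tts.enqueue(buf)                # whatever trails after the last boundary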
>>
>>107942510
https://github.com/QwenLM/Qwen3-TTS?tab=readme-ov-file#environment-setup
then
https://github.com/QwenLM/Qwen3-TTS?tab=readme-ov-file#launch-local-web-ui-demo
Qwen is probably the easiest tts to get running.
>>
>>107942513
As far as I can tell, it can only "read" it. The rest depends on how close the lexical representation is to the actual sound.
>>
>>107942534
I haet conda, not installing this shit on my linux pc
>>
File: 1768991819778784.png (263 KB, 657x375)
>>107942367
I did and it sucks dick, I guess gptsovits isnt going away any time soon, hell that one jeet's TTS is better. Not to mention the training script is broken until you fix the copytree call trying to use the HF repo path instead of where the files actually are
The only worthwhile aspect is maybe the voice design model
>>
>>107942534
Yeah, I will wait for ggufs
>>
>>107942630
Use uv. It's absolutely retarded to have a copy of a full venv in each project folder, but it works far better than conda at least.
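For anyone who hasn't switched, the whole flow is just:

uv venv                               # creates .venv in the project folder
source .venv/bin/activate
uv pip install -r requirements.txt    # drop-in pip replacement, just fast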
>>
>>107942630
Conda isn't required. I'm running it in venv.
>>
File: 1738830131563719.jpg (42 KB, 430x437)
Is there a clean way to use (parts of) old Claude/GPT jailbreaks on a local?
I'm nostalgic for some of the fun shitposty ones like bloatmax but don't want to wire it all up manually
>>
>>107942670
it's a hardlink
>>
>>107942636
What exactly sucks about it? Does it not copy the voice? Turns into gibberish?
>>
>>107942679
I'm not too familiar but those are just ST presets right? you can just use your local chat completions endpoint and they should work as is, no?
>>
>>107942681
Good to know, don't need to feel guilty about using it now.
>>
How are these TTS models? I was thinking of turning some heavy textbooks on various topics like history or technology into audiobooks so I can listen to them while reading them. Is this possible on local hardware with only 8gb vram?
>>
>>107942700
ok
>>
>>107942698
First of all voice clone seems to shit itself with non-English ref prompts
Then if you have a cloned voice that's decent you can't use the CustomVoice with it for style instructions, so you have to finetune
So then you finetune and it comes out as garbage, I'm not sure if it's related to length, I tried low and high epoch count and batch size, just comes out retarded whereas the same dataset works great with sovits

Basically just a waste of time
>>
>>107942447
I had a 28-core/56-thread Xeon Platinum setup with 512GB of DDR4 last year. It could run a q4 of Deepseek R1, but not at a speed you'd want to interact with.
>>
>>107942730
Yes. TTS models tend to be small and you could probably fit nearly any of them in 8 GB. Try kokoro, echo-tts, or chatterbox.
>>
>>107942730
8gb is enough for smaller tts up to about 2B, but not enough for VibeVoice 7B. If you want to gen audiobooks, which are very long, you need to find scripts or vibecode your own to chunk the text and merge the audio files, because most tts are meant to gen only short clips up to 30 seconds.
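The chunk-and-merge part is only a few lines anyway; a sketch, with generate_wav standing in for whichever model you pick (sample rate is model-dependent):

import numpy as np
import soundfile as sf

def book_to_audio(text, generate_wav, out="book.wav", sr=24000, max_chars=400):
    chunks, cur = [], ""
    for para in text.splitlines():              # pack paragraphs into model-sized chunks
        if cur and len(cur) + len(para) > max_chars:
            chunks.append(cur)
            cur = ""
        cur += para + "\n"
    if cur.strip():
        chunks.append(cur)
    # gen each chunk separately, then merge into one file
    sf.write(out, np.concatenate([generate_wav(c) for c in chunks]), sr)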
>>
>>107941128
Qwen3 TTS? How does it compare to https://github.com/resemble-ai/chatterbox?tab=readme-ov-file
>>
File: 1763573822506022.png (343 KB, 602x603)
>>107942700
>that
>pretending to be a personal trainer
>>
>>107942904
have you never read doujins? that's peak physique
>>
>>107942700
Are they prosecuting him for being too short or what?
>>
>>107942919
for being not rich enough to get away with it
>>
>>107942910
He's got the Unholy Maiden build
>>
https://github.com/ggml-org/llama.cpp/pull/19024
nvidia jeet got told off, priceless
>>
>>107942977
lmao kek
>>
>>107942977
Can't they just buy llamacpp?
>>
what's the minimum a model should have in terms of tokens and context to be able to tool call in things like opencode or claude?
>>
>>107942977
nice :rocket:
>>
>>107943010
Why buy something maintained for free?
>>
>>107942977
>subscription
How does that work if nvidia is ok with llama.cpp support? Can't you just download it and upload the weights to HF? Isn't that what would happen if it gets goofed anyways?
>>
>>107942977
God damn son.
Dayum.
>>
>>107943046
>>107942977
based, fuck NVidia
>>
>>107943036
y'know, usually at a big company like this there's a huge disconnect between engineering, marketing and sales
someone down the chain thinks they can make a few bucks not realizing that they are in fact fucking everything up
>>
>>107943015
What do you mean tokens? t/s? It depends on your patience. Context isn't related to the ability to call tools.
Anyway, I haven't tried it myself, but vibecoders say that toss-20b and qwen-30b-a3b are okay-ish for agentic coding. toss-20b fits into 16gb at native mxfp4 and it's ridiculously fast.
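e.g. with llama.cpp's llama-server, something like this (model filename illustrative; raise -c as far as vram allows, since agents stuff the context):

llama-server -m gpt-oss-20b-mxfp4.gguf -c 32768 -ngl 99 --port 8080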
>>
>>107943054
Jeff Bolz is alright.
>>
>>107942927
He should run for president.
>>
>>107942977
>Anav Prasad
saaar kindly merge the nemotron commit, ngbenchodson
>>
>>107943046
>enablement
>>
Qwen tts actually hype?
>>
>>107943068
I only have 24gb of ram and a 1660ti. It's over for me
>>
>>107943046
the grand inquisitor's reign of terror continues
>>
>>107942977
My fuarking hero
>>
Hello guys,

Why "LTX-2: Load Latent Upscale Model" node doesn't fetch the upscaler files that are present in the right folder?

Comfyui UI startup logs:

E:\ComfyUI_windows_portable_nvidia>.\python_embeded\python.exe -s ComfyUI\main.py --windows-standalone-build --listen
[...]
Adding extra search path loras D:\AI\models\loras
Adding extra search path upscale_models D:\AI\models\upscale_models
Adding extra search path upscale_models D:\AI\models\latent_upscale_models
[...]


Folder structure:

D:\AI
└───models
    ├───audio_encoders
    ├───checkpoints
    │       ltx-2-19b-dev-fp8.safetensors
    │       ltx-2-19b-distilled-lora-384.safetensors
    ├───clip_vision
    ├───configs
    ├───controlnet
    ├───diffusion_models
    ├───embeddings
    ├───latent_upscale_models
    │       ltx-2-spatial-upscaler-x2-1.0.safetensors
    │       ltx-2-temporal-upscaler-x2-1.0.safetensors
    ├───loras
    │       ltx-2-19b-ic-lora-depth-control.safetensors
    │       ltx-2-19b-ic-lora-detailer.safetensors
    │       ltx-2-19b-ic-lora-pose-control.safetensors
    │       ltx-2-19b-lora-camera-control-dolly-in.safetensors
    │       ltx-2-19b-lora-camera-control-dolly-left.safetensors
    │       ltx-2-19b-lora-camera-control-dolly-right.safetensors
    │       ltx-2-19b-lora-camera-control-jib-down.safetensors
    │       ltx-2-19b-lora-camera-control-jib-up.safetensors
    │       [...]
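In case it matters, the yaml side of it; maybe the LTX-2 node reads its own latent_upscale_models category instead of upscale_models (just a guess on my part, check the node's source) and the key in extra_model_paths.yaml has to match:

comfyui:
    upscale_models: D:\AI\models\upscale_models
    latent_upscale_models: D:\AI\models\latent_upscale_models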
>>
>>107943095
Voice designer is kinda fun, but it's a designer. You won't use it for a consistent character since every generation gets a different voice.
For ordinary tts tasks, it feels just about what you'd expect from its size: better than 100B but not as good as echo-tts or vibevoice. Still, when it comes to multilingual capabilities, it's the best because echo and vb support only english and chinese.
>>
>>107943156
>/lmg/ - a general dedicated to the discussion and development of local language models.
>local language models
Go to >>>/g/ldg/
>>
>>107942353
>>107942364
Pocket and Kokoro are micro TTS that can run on anything. They're both a few hundred MBs in model size and run fast on CPU.
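Minimal kokoro usage for reference, going by its pypi README (pip install kokoro soundfile; voice and lang codes may differ between versions):

from kokoro import KPipeline
import numpy as np
import soundfile as sf

pipeline = KPipeline(lang_code='a')  # 'a' = American English
# the pipeline yields (graphemes, phonemes, audio) per segment
audio = np.concatenate([a for _, _, a in pipeline("Runs fine on CPU.", voice='af_heart')])
sf.write('out.wav', audio, 24000)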
>>
Has anyone tried VoxCPM or Maya?
>>
>>107942351
Envy for his pure listening experience.
>>
I don't really care about any of these TTS models until they have llama.cpp support. How different is the architecture from traditional LLMs? I can't imagine it would be that hard to jerry rig in support.
>>
>>107943183
>echo-tts or vibevoice
Am I crazy or chatterbox is better than both of these? TTS autists please enlighten me.
>>
>>107943307
You're crazy. Chatterbox is overfit for assistant slop voices. Its cloning is weak. It turns anime voices into middle-aged Karens. Its prosody is just as weak. It's monotone.
>>
>>107943183
>You won't use it for a consistent character since every generation gets a different voice.

https://github.com/QwenLM/Qwen3-TTS?tab=readme-ov-file#launch-local-web-ui-demo
>If you want a designed voice that you can reuse like a cloned speaker, a practical workflow is: (1) use the VoiceDesign model to synthesize a short reference clip that matches your target persona, (2) feed that clip into create_voice_clone_prompt to build a reusable prompt, and then (3) call generate_voice_clone with voice_clone_prompt to generate new content without re-extracting features every time. This is especially useful when you want a consistent character voice across many lines.
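In code that workflow is roughly this shape (create_voice_clone_prompt and generate_voice_clone are the names from the quoted README, but the objects and exact signatures here are my guess, check the repo for the real API):

# (1) design a short reference clip with the VoiceDesign model (hypothetical call)
ref_wav = design_model.generate(text="Hi, this is my reference line.",
                                instruct="calm young female voice")
# (2) build a reusable clone prompt from it once
clone_prompt = tts_model.create_voice_clone_prompt(ref_audio=ref_wav)
# (3) reuse it for every new line without re-extracting features
wav = tts_model.generate_voice_clone(text="New dialogue here.",
                                     voice_clone_prompt=clone_prompt)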
>>
>>107943365
Yeah, I saw that part. It tells you to use VoiceDesigner only for the first step. Once you have your reference audio, you switch to the base model for cloning. That's what I meant by saying you won't use VoiceDesigner as your main thing.
>>
>>107943307
I haven't tried echo-tts, but even vibevoice 1.5B performed better than chatterbox. However, both struggled with high-pitched or energetic voices.
>>
File: 2727636.jpg (11 KB, 200x200)
If a model keeps spamming *actions* every three words, do I just lower temp?
>>
>>107943350
I'm with you on Chatterbox sucking, though in my experience, it turned normal voices into extreme Southern US accent women. I hate Southern US accents. It was infuriating.
>>
>>107943183
Thanks for the response blud
>>
>>107943475
usually there's no way around changing models in that case
>>
Just to clarify, the people hating on chatterbox are only using it for high pitched anime voices right? Not for other non gooning related use cases?
>>
>>107941128
why is this kid manspreading
>>
>>107943515
Airflow
>>
air status?
>>
>>107943504
Why would you pick chatterbox for non-gooning tasks when there are piper, kokoro, pocket, etc that can run on cpu and are good enough for monotone reading?
>>
>>107943528
Cuz I have a 5090? Why should I care?
>>
File: image (3).png (93 KB, 1324x613)
Is this still relevant to making char cards? Found on some wiki
>>
>>107943547
It's not a science at all, just experiment and take inspiration from others
>>
>>107943547
this looks like some ancient 2022 shit
>>
>>107943556
I mean the syntax and shit. Must everything be json-like?
>>
>>107943564
Not at all.
>>
>>107943528
kokoro cannot distinguish a period, meaning if you give it an academic text it reads very unnaturally making it very hard to follow. My question wasn't an attack, I just wanted to make sure.
>>
>>107943559
https://wikia.schneedc.com/bot-creation/trappu/introduction
From here
>>
>>107943528
>why do you use good model instead of bad model
What the hell kind of question is that?
>>
>>107943307
>>107943504
chatterbox cloning is really bad for any voice with distinguishing features, e.g. I can feed it an american voice with heavy vocal fry and it generates a vaguely-similar pitched voice with an english accent, whereas vibevoice reproduces it fine and echo-tts does it near-perfectly. chatterbox gens also tend to have a strong "amateur reading off a script" quality compared to echotts and vibevoice, they're much less natural no matter what voice you're using in my experience
>>
>>107943515
that is the ugly old man attractor pose
>>
>>107943577
I see. If your usecase is academic papers, consider giving qwen a try, they specifically mention that their model can read stuff like this:
>I am solving the equation: x = [-b ± √(b²-4ac)] / 2a? Nobody can — it's a disaster (◍•͈•͈◍), very sad!
https://voca.ro/17bGjzuKYo8R
>>
>>107943643
there is always this hissing in these lower parameter tts models. Now I basically just started checking out tts models an hour ago, I don't have much of a frame of reference but chatterbox was the only one up until now which was able to generate audio which sounded like actual human recordings. Like for example in your uploaded audio, there is this hiss in the voice.
>>
>>107943672
A hiss? I don't hear anything.
>>
>>107943672
https://voca.ro/1oX8uRtb14Fe
>>
>>107943672
Are you saying you like the hiss/room noise/verb? Chatterbox does preserve that, but I run UVR on my voice files.
>>
>>107943802
>>107943811
your horniness is blinding you. just kidding, but do you really not hear the hiss in the background or sometimes bleeding into the voice? Are you using headphones? Are they maybe cheap?
>>
>>107943822
This gen >>107943811 was made from poor source audio, which explains the background noise.
>>
>>107943821
No, I hate the hiss/room noise/verb. I basically just want to generate audiobooks from academic texts or books but have it sound like a professional voice recording, like a David Attenborough documentary. Up until now I've only done stuff with llms or diffusion so I am not in the know when it comes to tts models. I'm currently setting up chatterbox-Audiobook as it seems to be a tool made specifically for my use case. Will see how that turns out and if it's not good enough, probably check out basically all other popular tts models with comfyUI
>>
>>107943858
Then I highly recommend doing a pass of uvr and mel denoise on your vocal sample. There's a webapp version.
>>
File: K.I.S.S.L.M.G.jpg (370 KB, 1536x1536)
>>
File: which.jpg (278 KB, 2176x2048)
>>107944006
>>
>>107944079
tummy
>>
>>107944087
retard
>>
>>107943547
Read this instead: https://rentry.org/Sukino-Guides
>>
>>107941128
Rin erotic
>>
>>107944132
No thank you.
>>
>>107944152
You forgot to say NTA
>>
Wait, does ooba have actual proper tool calling support now, with enforced grammars? Last I checked the piece of shit would not do proper tool calling reliably. Couldn't enforce the grammar correctly for non-standard tool call structures (Qwen3 coder)
>>
>>107944162
?
>>
seems qwen3-tts needs rtx cards (flash attention 2), back to kokoro I guess
>>
>>107944132
>just edit what the model says, dood!
>>
>>107944284
for a hobby requiring a lot of reading it sure seems like a lot of people just aren't capable of doing so
>>
>>107941128
Artist name?
The style is cute
>>
>>107944335
/lmg/ imagegen anon?
Always coming in with fresh high quality stuff.
A comfort that at least some things don't change.
>>
>>107941273
Will this give me corpo tier denials for requests?
>>
>>107944343
huh?
>>
>>107944344
I don't recommend it if you have a hotline kink.
>>
>>107941279
it's a compliment
>>
>>107944304
You're not entirely right anon. It does not only require reading, but also understanding. That's the hard part.
>>
>>107944399
Why are you like this?
>>
>>107944132
>>107944284
It's a great guide. you're just retarded.
>>
File: k12.jpg (11 KB, 467x481)
>time=2026-01-23T02:27:43.305Z level=WARN source=runner.go:153 msg="truncating input prompt" limit=8192 prompt=10933 keep=5 new=8192
>24GB ram, 1660ti
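(That warning is the ollama runner capping the prompt at num_ctx=8192. num_ctx is a documented Modelfile parameter, so it can be raised; base model name below is illustrative:)

FROM mistral-nemo
PARAMETER num_ctx 16384
# then: ollama create nemo-16k -f Modelfile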
>>
Brothers, I have envisioned a new formula that will change our world.
E = MC^2 + AI
>>
>>107944399
>it's not only X, but also Y
>>
>>107943547
No. It's retard tier stupid.
>>107943564
Hell no.
>>
>>107944399
Reading, understanding, and a solid grasp of English grammar and vocabulary, or whatever language one cares to use.
>>
>>107944784
Ok, bootlicker.
>>
gimme toss-240
>>
>>107944799
Do you have any idea how incomprehensibly moronic you sound?
Fuck off.
>>
File: Base Image.png (1.27 MB, 1208x3832)
Learning to Discover at Test Time
https://arxiv.org/abs/2601.16175
>How can we use AI to discover a new state of the art for a scientific problem? Prior work in test-time scaling, such as AlphaEvolve, performs search by prompting a frozen LLM. We perform reinforcement learning at test time, so the LLM can continue to train, but now with experience specific to the test problem. This form of continual learning is quite special, because its goal is to produce one great solution rather than many good ones on average, and to solve this very problem rather than generalize to other problems. Therefore, our learning objective and search subroutine are designed to prioritize the most promising solutions. We call this method Test-Time Training to Discover (TTT-Discover). Following prior work, we focus on problems with continuous rewards. We report results for every problem we attempted, across mathematics, GPU kernel engineering, algorithm design, and biology. TTT-Discover sets the new state of the art in almost all of them: (i) Erdős' minimum overlap problem and an autocorrelation inequality; (ii) a GPUMode kernel competition (up to faster than prior art); (iii) past AtCoder algorithm competitions; and (iv) denoising problem in single-cell analysis. Our solutions are reviewed by experts or the organizers. All our results are achieved with an open model, OpenAI gpt-oss-120b, and can be reproduced with our publicly available code, in contrast to previous best results that required closed frontier models. Our test-time training runs are performed using Tinker, an API by Thinking Machines, with a cost of only a few hundred dollars per problem.
https://github.com/test-time-training/discover
Really interesting. Probably the best so far this year.
>>
>>107944284
??? yeah no shit, are you NOT editing what the model says? how fucking dense can you possibly be
are you sure you're in the right thread? I could've sworn this was /lmg/. it even has the cute and funny vocaloid girl
>>
>>107941128
Man I love techloligy
>>
>>107943068
context is related. all these ai agents will concatenate their own shit, making the actual prompt huge. a small context window would truncate the prompt and the llm won't know what to do
>>
5060 TI 16GB or 3080 TI 12GB faster for Nemo goof q4?
>>
>>107945041
5060ti would allow for much more context which is more valuable. the 3080ti would be faster but is basically e-waste at this point.
>>
>>107945059
nemo only good for 16k ctx anyway tho?
fuck have to choose today
i read 3080ti gets real time orpheus-tts, nobody posted 5060ti benchmarks
>>
>>107945099
Consider you'll need both models loaded and who knows what else. 16gb vram is still more valuable.
>>
>>107945099
3080ti is about 30% faster than the 5060ti. 16k context will take up about 6gb of vram and a q4_k_m is about 7.5gb. the 3080ti does not have enough vram for full context at that quant.
>>
What's the best model for 2 3090s? GLM-Air is kinda poopy and a bit slow. There has to be something better, right?????
>>
>>107945114
glm4.6 if you have enough ram or old llama 70bs. very few options.
>>
>>107945111
>>107945110
good point ty.
I'll get the 5060TI
>>
>>107945134
I have a 5060ti and it's great.
Blackwell too so i can run nvfp4 for imagegen.
>>
One thing I have noticed with drummer's models is that they are way overfitted on the AI beginning its messages with its name. I only use third-person, so maybe others haven't had this problem. But it makes his models near unusable for me, no matter what settings I use.
>>
QwenTTS is the first model that can copy the voice of my waifu (pony waifu). VB is overrated and has strong censorship when your ref is a high-pitched character, so if your waifu is a loli or a pony as in my case, you will get music instead. It's very hard to clone young characters. So for me, QwenTTS is better than jeet voice
>>
>>107945312
How is it compared to echo? IDK what to upgrade to, I'm still using chatterbox.
>>
uuh... qwen tts comfy when???
>>
>>107945326
qwen is a safe horse to back
>>
>>107945312
Pitch of consent is a thing now?
>>
>>107945339
ha ha. hah... ha.
>>
>>107945326
Why no quants for echo :(
>>
Is now a good time to get into tts? Where do you guys usually get samples to train on?
>>
>>107945389
Podcasts (pornstar interviews) and /tv
>>
>>107945312
Can you share some examples comrade?
>>
Suggestion: Add the following line to the end of the "ERP" section of the "Recommended Models" Rentry
>For more recommendations, see [the UGI (Uncensored General Intelligence) Leaderboard](https://huggingface.co/spaces/DontPlanToEnd/UGI-Leaderboard). Sort by "W/10" (willingness to obey instructions) first and "Entertainment" (knowledge of erotic topics) second.

t. 12-GiB VRAMlet who just upgraded from Mistral Nemo (willingness 5.5, entertainment 2.0) Q6 to XortronCriminalComputingConfig (willingness 9.8, entertainment 2.6) Q3
>>
>>107945389
TTS rose from the dead and has been on a roll the past couple of months. I get my cringe anime samples from gacha wiki pages, they're typically high quality and without background noise.
>>
It's sad but LLMs are unironically the most stagnant sector of AI right now.
>>
>>107945446
Cmon

Open Source TTS and Music Generation are years behind SOTA
>>
>>107945434
chat, is this real?
>>
Are the SillyTavern devs aware of the bug in the newer 1.15.x versions that causes grammar issues?
It's affecting all models loaded by the koboldcpp backend.
>>
libre webui any good?
>>
>>107942534
Just in case you catch this error

 import flash_attn_2_cuda as flash_attn_gpu
ImportError: /home/ai/miniconda3/envs/qwen3-tts/lib/python3.12/site-packages/flash_attn_2_cuda.cpython-312-x86_64-linux-gnu.so: undefined symbol: _ZN3c104cuda29c10_cuda_check_implementationEiPKcS2_ib


Go to https://flashattn.dev/#finder to find a wheel compatible with your pytorch+cuda setup
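To know which combo to plug into the finder, check your exact versions first:

python -c "import torch, sys; print(sys.version.split()[0], torch.__version__, torch.version.cuda)"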
>>
>>107945645
No
>>
>>107945645
Yes. Here is definitely the best place to ask.
>>
>>107945697
->
>>107941377
>>
I seriously hope Anthropic/OpenAI/Google get their shit together. I got frustrated with my big local models and decided to blow some money on Claude Opus/Gemini instead.
They're still a bit smarter but they're just about the same shit we have local when it comes to creativity and storytelling. Both are very slopped. The 'moat' is absolutely nothing compared to what it was just a year ago, let alone during the "gpt4 vs llama" or the "2024 open models vs claude opus 3" days.
I have very little hope for the next generation of open LLMs if this shit is what our chinese overlords are forced to distill.
>>
>>107945440
>anime samples from gacha wiki pages
Same. It's good for anime and vidya characters as well, since practically every Japanese VA has voiced a gacha character at this point so finding a gacha character with a similar voice to a character you want to clone ends up being faster than trying to extract audio samples from whatever anime/game they're from.
>>
>>107942977
based ngxson dabbing on proprietary jeets
>>
Someone already vibecoded a Rust implementation of QwenTTS.
https://github.com/QwenLM/Qwen3-TTS/pull/8
>>
>>107945981
making LLMs better at "instruction following" requires an unholy amount of data, and to get IF data, rather than pay a lot of humans to write it (which would be expensive), labs are just genning more synthetic garbage
that synth data is also getting more and more stiff as they gen it with newer models that write in an even stiffer fashion, and this is how you end up with ChatGPT 5.2 that likes to write very short sentences and has basically no prose whatsoever
you can't get better models without intense instruction tuning (this is where most of the magic of LLM improvement lies; they haven't changed much of what they do during base model training), so you will be stuck with what you have for decades to come unless OAI is willing to burn even more money, which shouldn't even be possible
>>
>struggling with mistral small 24B
>have to partially offload to ram making it slow, but at least its less garbage than the 12B models I tried
>See a recommendation for 12B Mag Mell
>It's way better at writing and and staying coherent, on top of fitting fully into vram with plenty to spare
What the fuck
>look at the model source
>Mistral
What the fuck

>>107945446
Wait until you see the absolute state of SDXL
>>
>>107946125
Better models will be obtained by incorporating instruction/reasoning data in the pretraining to a much larger extent than what is usually done (but at what cost?).
>>
>make a PR to do a small manifest fix for a project I'm using
>assaulted by 3 AI bots doing checks on the code, on the PR description and on the title.
>Title is ok, but description is too short! (it's literally 1 line code change, the change is already in the title)
>the pr description also lacks the checkbox list of stuff you need to say you have done!!
>here's a poem about your changes
BROS FUCK AI literally a fucking HUMILIATION RITUAL being dabbed on by these fucking github and coderabbit pieces of shit like WHY. I HATE AI
>>
>>107946151
the only project I saw using that coderabbit garbage was bun and I must say when you look at some of their issues I am flabbergasted by the junior level mistakes in their API design, it was a major turn off
https://github.com/oven-sh/bun/issues/23902
https://github.com/oven-sh/bun/issues/22484
for example (different problems, same cause)
a file object that caches results and never updates them after changes? the fuck?
when you're that kind of subhuman programmer it makes sense you'd think AI tools / vibe coding are great. You are incapable of judging what is good and what is not.
>>
File: llm on standalone xr.png (1.22 MB, 1207x1927)
For those interested in fully local standalone XR waifus
>>
>>107944335
the artist is AI
>>
Which web based UI does /lmg/ use?
>>
>>107946244
the meta is to vibecode your own
>>
>AI only copies, those who invented and contributed need to be compensated, preferably every time AI leverages their knowledge, kinda like how musicians get a cut each time their song is played.
>>
>>107946399
>kinda like how musicians get a cut each time their song is played
They wish.
>>
>>107946399
Fuck off! Intellectual property is not property. It's cancer. Musicians should be paid only for live concerts.
>>
>>107946399
It's a good idea but impossible in practice
>>
>>107946105
Can someone just vibecode it into llama.cpp? Seriously, even if it doesn't get merged for being slop, this is probably gonna be the most practical TTS solution for a while
>>
TTS now has its own Chinese culture moment:
https://github.com/QwenLM/Qwen3-TTS/issues/14#issuecomment-3789452120
>>
called it
>>107941153
>that's gonna get yeeted
>>
>>107946513
>perfectly SFW pic will be deleted because moderation has been a joke for 15 years
yes, we know
>>
>>107946399
Mate you need to be compensating the estates of every family that invented the words you're using, you just copied English.
You also need to go and pay moot for copying greentext
And every deadshit retard on reddit for copying their reflexive opinion on this matter without giving it ten seconds of thought.
>>
>>107946399
Agreed, could be implemented with some tool calls at the end of generation, check if anything is copied and automatically send a payment from a configured wallet.
>>
>>107946513
the trick is to post nintendo loli hentai, which a quick glance on video game boards will tell you is sfw according to the trannies that moderate 4chan
>>
>>107946510
Props to the qwen team for answering questions promptly. I kinda got used to abandoned tts repos that have piles of unresolved issues.
>>
We need a decentralized chan with AI moderation. The holy grail.
>>
>>107946574
>AI moderation
I remember the attempts made itt with some models, iirc mixtral and similar, and how funny the results were
>>
>>107946523
>>perfectly SFW pic
I am sorry, but, what? do you even know what SFW means? I would absolutely feel ashamed of myself opening this thread in public where people could see OP's pic. If I didn't work remote and dared open this shit in an open office room it would be grounds for being fired.
>>
>>107946589
A neutrally aligned vision or multi-modal + ruleset would probably be fine.
>>
>>107946503
Be the change you want to see.
>>
>>107946599
I'm sorry you live in Utah or Saudi Arabia then, but words have meanings and that pic was perfectly SFW. Your personal shame is irrelevant to its classification.
>>
>>107946616
Exhausted track runner lolis are ultra lewd, there's a reason all sprint lanes have barriers installed so you can't lay your filthy eyes upon them.
>>
>>107946616
>words have meanings
Yes. SFW stands for SAFE FOR WORK. There is no world in which your fetish fodder, with its weird focus on giving shape to certain body parts in an otherwise simplistic drawing, would get a pass.
You yourself are being a weasel, I do not believe for a second that you would dare open that pic in any public setting.
>>
>>107946629
I am literally in the bus currently please stop projecting your self-hatred and calling a cute pic fetish fodder
>>
>>107946513
>>107946523
I'm the one who pinged the mods, pedophilia is not ok
>>
File: 1642685749889.jpg (9 KB, 315x300)
>>107946650
hallmonitormaxxing
>>
>>107946644
>I am literally in the bus
that never happened
>>
i want to code a copy of simcopter with a local model, what would you guys recommend for this?
>>
>>107946685
>what would you guys recommend for this?
That you post your fucking specs.
>>
>>107946698
why so rude like this?
>>
>>107946685
SimCopter was one of my favourite games
I love that you could load SimCity 2000 maps and fly in it
>>
>>107941128
>https://techcrunch.com/2026/01/22/inference-startup-inferact-lands-150m-to-commercialize-vllm/
>Inference startup Inferact lands $150M to commercialize vLLM
>The creators of the open source project vLLM have announced that they transitioned the popular tool into a VC-backed startup, Inferact, raising $150 million in seed funding at an $800 million valuation.
>>
File: 1768220662463707.jpg (753 KB, 800x1000)
Has anyone tested the new QwenTTS?

is it any good?
>>
>>107946870
hell yeah get that bread
>>
>>107946704
Am cried a little...
>>
wait, did the gemini 3 api stop obfuscating its reasoning block? we are so back, our upcoming open chink models aren't just going to distill this slop—they'll nourish on it.
>>
>>107946952
an anon used a method I don't recall a while ago to force gemini 3 to reveal its CoT and they found that newer GLM models write their CoT in a very similar way, meaning the chinks were already siphoning google for scraps
>>
>>107946883
Just read the thread.
>>
>>107947025
Meaning that gemini was trained on GLM cot
>>
>>107946952
DeepSeek-R1-0528 already did this
>>
>>107946952
>wait, did the gemini 3 api stop obfuscating its reasoning block?
how? it's still obfuscated for me
>>
>>107946136
>What the fuck
To be fair, that's an unholy merge of unholy fine tunes, so it's what mistral+nvidia released with its brains slightly scrambled.
>>
>>107946523
It was an image engineered to distract attention and encourage off-topic discussion by focusing on the sweat-drenched, nubile form of a young girl with visible steam arising from her. There are many implications in the image that invite off-topic remarks or posts. The small puddles of liquid on the bench beneath her that may or may not be from the bottle of water she has, how she must smell like after all that running, and what the "2MW" on her bottle means if we assume that it isn't just a direct reference. The attention-grabbing effect of the image is all the more obvious from the calculated composition of it. The slight skindentation on the shorts, the subtle but distinct details on her right armpit and the positioning so it can be seen clearly, the reddish tint and creases on her belly that are prominent enough to be the first thing to be noticed, the translucent parts of her top that showcase just how much she's been perspiring. Alone, these aspects wouldn't conjure up much thought but the combination of it all causes them to amplify each other until the image itself, while technically SFW, ends up being more arousing and intriguing to the average anon than the topic at hand or an actual pornographic image, so it's completely understandable that the mods took it down. It was probably distracting them too.
>>
>>107947194
tl;dr?
>>
>>107947206
Sweaty Rin is off-topic.
>>
>>107947218
I disagree
>>
>>107946870
I thought vLLM was bought by Red Hat.
>>
Has anybody managed to tame glm 4.7 flash yet?
Seems about as smart as the more recent qwen 30BA3B and without sounding like a robot.
But holy shit does it like to force refusals, even after it already responded, sometimes.
>>
>>107946151
update: my pr was completely rewritten by the maintainer who ignored the bots and closed with a 'ty'.
lol
>>
>>107947194
Model?
>>
File: 1758207320514664.png (2.62 MB, 1148x1668)
Does anyone know about the Global Consciousness Project Dot?
https://gcpdot.com/

Someone should make something like that, but instead of using random numbers, it parses news articles, social media and does more complex sentiment analysis with LLMs to measure the overall mood of the world
>>
>>107947779
News are fake so this wouldn't achieve anything.
>>
>>107947779
why do you think the mood of the world has any connection to sensationalist news headlines?
>>
>>107947787
>>107947788
You guys might be enlightened, but do you really think normies reading negative or positive news doesn't affect their mood?
>>
>>107947779
Getting an unbiased sampling of news is impossible. Your sample will be extremely biased towards western news and large countries. It won't be "Global Consciousness."
>>
>>107947779
Hasn't this already been done?
>>
>>107947774
DeepSeek V4
>>
>>107947842
Thanks https://huggingface.co/deepseek-ai/DeepSeek-V4
>>
>>107947025
It's actually pretty much identical to how 4.7 does it. It even does the (Self-Correction): thing and the exact same bullet points.
>>
Am I tripping or did OpenAI add streaming to the reasoning summary?
>>
>>107945338
https://github.com/flybirdxx/ComfyUI-Qwen-TTS
>>
>>107947888
>It's actually pretty much identical to how 4.7 does it
it wasn't about flash, the conversation I am mentioning predates flash's release.
>>
I think GLM 4.7 flash is still broken.
It's slowing down significantly after 4k context, and responses come out broken.
>>
>>107948009
vibecoding was a mistake
>>
>>107948009
Are you using flash attention?
Try disabling that.
And yes, I know there were some flash attention related fixes.
>>
>>107948034
Yeah tried on and off, same result
>>
>>107945389
>Where do you guys usually get samples to train on?
ur mom
>>
>>107947971
I never mentioned Flash nor am I poor enough to touch it
>>
https://huggingface.co/HeartMuLa/HeartMuLa-oss-3B
Anyone tried this?
>>
>The faint scent of her expensive perfume - something clean and citrusy - seemed to sharpen in the air.
Sounds more like dishwashing liquid lol
>>
>>107948110
at least it's not ozone
>>
>>107948110
You've never talked with a perfume aficionado have you?
>>
>>107948101
Everybody moved on the day after it was released. People say it's very same-y and ignores your tags. We're now waiting for AceStep 1.5, which should be released in 10-20 days.
>>
File: neblu0q3nt5d1.jpg (565 KB, 4096x4096)
>>107948110
be thankful she's not a fragrantica foid
>>
>>107948190
Why am I suddenly seeing perfume slop everywhere online?
>>
>>107948284



All trademarks and copyrights on this page are owned by their respective parties. Images uploaded are the responsibility of the Poster. Comments are owned by the Poster.