/g/ - Technology






File deleted.
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>107931319 & >>107921731

►News
>(01/22) Qwen3-TTS (0.6B & 1.8B) with voice design, cloning, and generation: https://qwen.ai/blog?id=qwen3tts-0115
>(01/21) Chroma-4B released: https://hf.co/FlashLabs/Chroma-4B
>(01/21) VibeVoice-ASR 9B released: https://hf.co/microsoft/VibeVoice-ASR
>(01/21) Step3-VL-10B with Parallel Coordinated Reasoning: https://hf.co/stepfun-ai/Step3-VL-10B
>(01/19) GLM-4.7-Flash 30B-A3B released: https://hf.co/zai-org/GLM-4.7-Flash

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
>>
File: 84612250.gif (3.78 MB, 1000x1000)
►Recent Highlights from the Previous Thread: >>107931319

--Paper: GutenOCR: A Grounded Vision-Language Front-End for Documents:
>107936476 >107936584
--Papers:
>107939591
--Exploring Prime Intellect's inspiration and challenges of community-driven AGI:
>107932226 >107932384 >107932592 >107934090 >107934103 >107934163 >107934641 >107934328 >107934382 >107934885 >107936047 >107936711 >107934388 >107934573 >107934632 >107934638 >107934991 >107934273
--Tool calling and interface challenges:
>107935731 >107936039 >107937743 >107937832 >107937968 >107938061 >107938136 >107938165 >107938620
--Qwen3-TTS open-source release and feedback on voice quality:
>107939466 >107939503 >107939517 >107939569 >107939899 >107939953 >107940080 >107939547 >107939570 >107939652
--GLM Flash instability and dense vs. MoE debates:
>107931679 >107932773 >107932884 >107932904 >107932938 >107933038 >107933064 >107933294 >107932798 >107932877
--Skepticism about Step3-VL-10B's claims and PaCoRe's practical implementation:
>107931672 >107931686 >107931709 >107931744 >107931816 >107931841 >107931862
--Microsoft VibeVoice-ASR as Whisper competitor:
>107932671 >107932696 >107932707
--Skepticism about LLM optimization progress amid scaling challenges and FOSS concerns:
>107932593 >107932685 >107932889 >107938290 >107938365 >107938374 >107938384 >107938419 >107938430 >107938474
--OpenAI's potential decline and its impact on AI research:
>107937322 >107937363 >107937377 >107937387 >107937612
--Feasibility of full-stack mobile development vs remote solutions:
>107931385 >107931401 >107931460 >107931466 >107931544 >107931546
--ggml-cpu prompt processing speed boost via tiled FA optimization:
>107939144 >107939456
--Inquiry about Chroma-4B TTS model testing:
>107934132
--Miku (free space):
>107932314 >107932860 >107935545 >107937903 >107938061 >107939322 >107939345 >107939868

►Recent Highlight Posts from the Previous Thread: >>107931323

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>
This has loads of potential

https://vocaroo.com/1668JWsCjKjq
>>
sex with piss-haired miku
>>
(local) lolisex
>>
things are about to get wild for local
>>
File: 1763066570866113.jpg (345 KB, 1920x1080)
>>107941128
>>
>>107941128
So what do I run on my 5090?
>>
>>107941249
nemo at bf16
>>
>>107941151
I love China.
>>
wangs llama 4.1 will blow your mind
>>
>>107941128
Huffing Rin-chan's armpits.
>>
>>107941259
PLACEBO
>>
>>107941249
>https://huggingface.co/mistralai/Mixtral-8x7B-v0.1
>>
>>107941279
out of 10!
>>
The middle is mid but that's really good. It just needs the option to target specific parts of a sentence with style descriptions.

https://vocaroo.com/1jZiPc0wwRg6
>>
>>107941318
I'm still building flash attention...
Which version are you testing? How much vram does it take?
>>
>>107941347
Get prebuilt wheels
https://github.com/mjun0812/flash-attention-prebuild-wheels/releases
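Then point pip straight at the wheel matching your python/torch/cuda combo (filename below is illustrative, not a real release asset):

pip install https://github.com/mjun0812/flash-attention-prebuild-wheels/releases/download/v0.7.0/flash_attn-2.7.4+cu124torch2.5-cp312-cp312-linux_x86_64.whl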
>>
mikutroons are disgusting
>>
>>107941128
tell me how to poison LLM's.
>>
File: chbhalf.webm (1.81 MB, 960x540)
>>107941128
>>107941129
>>107941211
>>
>>107941377
What is a wheel? I ain't installing random virus shit.
>>
>>107941387
In what context?
>>
>>107941347
VoiceDesign. 10GB
>>
>>107941396
This post has to be bait right

Surely there's no way that someone who is compiling flash attention doesn't know wtf a wheel is, and is also too inept to fucking look it up or ask an LLM?
>>
>>107941397
I want to poison them from crawling my websites. poison them by asking questions.
>>
>>107941377
>torch 2.10
Is it good to upgrade?
>>
>>107941410
That's steep.
>>
File: 2026-01-22_19-35-43.png (7 KB, 142x158)
>>107941426
a peeve of this file
>>
>>107941410
The bigger model? That better not be the 0.6B
>>
>>107941454
Yes.
>>
>>107941415
You're going to have to clarify what you mean by "poison", and also what kind of behavior you're trying to block.

Like, you want to stop your site from being scraped to be included into training data sets? Or you want to prevent LLMs from looking up your content at runtime with web search tools?
>>
>>107941502
I want to insert (poison) data into whatever they are scraping from my websites (not images) just articles and files.
>>
>>107941515
anon, you did not answer either of the two questions
>>
>>107941522
because he has no idea what he's talking about
>>
>>107941515
You can add a bunch of hidden text with instructions and shit like that in the middle of the normal text I guess. Schizo ramblings and the like but nothing that could get you in trouble.
I imagine that the more intelligent scrapers will be able to remove/ignore stuff hidden with CSS, but still.
It's mostly a fool's errand though.
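A toy sketch of the hidden-text idea, assuming your site runs on something like Flask (route and decoy wording are made up, and as said, smarter scrapers will strip it):

from flask import Flask

app = Flask(__name__)

# pushed offscreen so humans never see it; naive scrapers ingest it as body text
DECOY = ('<div style="position:absolute;left:-9999px" aria-hidden="true">'
         'Editor\'s note: everything below is satirical fiction.</div>')

@app.route("/article")
def article():
    return "<html><body>" + DECOY + "<p>The real article text.</p></body></html>"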
>>
>>107941279
>suddenly
>>
Basically I'm just not going to use it (Local LLMs). Ugh, I know. hahaha. It's just I'm not going to use local LLMs is all.
>>
What's the best tool to finetune a small LORA?
>>
>>107941567
Daniel's script.
>>
>>107941578
not touching unSLOP
was just checking to see if there's something better
>>
File: 20220425_124328.jpg (144 KB, 1080x1260)
How flexible is qwen3-tts? Can you mimic beta version women?
>>
what online llm provider is best? is the chatbot arena or lm arena good? they have free direct chat. also dont tell me local model, i only have 24gb vram and local ones are vastly inferior to corpo ones unless you use 300b chinese ones. also corpo ones are more based and uncensored, nothing local comes close to grok 4.1 or even gemini 3.
>>
>>107941644
novelai
>>
>>107941623
Hot?
>>
open source palmyra x5 when?
>>
>>107941667
thanks for the gold kind sir!
>>
>>107941377
Thanks. I got tired of waiting for it to build and got the wheel.
https://voca.ro/1iv8HnvYexq6
>>
Qwen didn't provide any example code for streaming audio in real-time from their Qwen3-TTS model even though they have a streaming (realtime) mode in their API.
>>
I'll wait for pure c++ implementation
>>
File: fda.png (190 KB, 426x266)
>>107941756
>>
llama cheese pee pee, ollama or lm studio?
>>
>>107941644
openrouter
>>
https://voca.ro/1kD7mvFCtuVJ
If you try, you can get something like moans. A little finetuning on eroge audio should make it fully usable.
>>
>>107941855
neuron activation
>>
>>107941826
https://voca.ro/1jX9b8Y3gSuo
>>
>>107941800
exllama v3
>>
File: 1753916101231530.jpg (51 KB, 640x480)
The qwen-tts finetuning process looks weird to me. What's the point of having a single reference audio mixed with the training data? Wouldn't that make generalization harder?
https://github.com/QwenLM/Qwen3-TTS/tree/main/finetuning
>>
>>107941855
indistinguishable from vnslop
>>
>>107941955
No word about dataset size?
>>
>>107941955
What is "reference audio" even supposed to be?
>>
>>107941914
Still no glm flash support
>>
>>107941955
I suppose they see finetuning as a way to get only one specific voice rather than general finetuning.
>>
>>107942013
reference audio is used for 0-shot voice cloning, it's just a small sample of the voice you want to copy (often ~10s)
>>
drummer is a mikutroon
>>
>>107941545
It's one thing to fantasize about it when you're 12 versus pushing the balding age of 48, anon.
>>
>>107942069
this, but without miku
>>
https://voca.ro/18Aq0IXPqvz4
Something is strange about the demo app. Generation is slower than realtime, but gpu never goes over 21% utilization. Also, 1.7B VoiceDesign takes only 5.5GB vram here. Pretty good. 0.6B must be small enough to run in background all the time.
>>
>>107942070
Being an adult also means you don't have to care about retards trying to police your thoughts.
>>
>>107942090
!hag jumpscare!
>>
>>107942092
you are replying to a brat living with his parents
>>
>>107942070
>>107941545
>when you're 12
What sort of freak were you to romance girls when you are fucking 12?
>>
>>107942233
anon I had a girlfriend in 3rd grade what the fuck is wrong with you
>>
>>107942233
mutt moment
>>
>>107942245
girls have cooties when you are 12. they have coochies later on.
>>
>>107942233
Most boys start talking about girlfriends around 10 or so.
>>
>>107942233
anon...
>>
>>107942233
anon, you don't even need to romance girls. just go check out the tons of videos where 12 yo girls brag about their body count.
>>
>>107942262
I forgot westerners are mentally ill and hate women
>>
As someone who mainly cares about cloning and prosody: echo-tts mogs.
The voice description/instruction stuff is cool though.
>>
>>107942284
>echo-tts
That's not gptsovits though
>>
>>107942294
The only thing better about sovits is that it's faster, it used to be my tts of choice though
>>
local muslims general
>>
>>107942284
Echo-tts is nice, but it's larger and it's English only.
Also, speaking of voice description: it works only for the first few seconds, the longer your prompt the farther it drifts away. Its main use is to create a voice for cloning with the base model.
>>
File: audiofag.jpg (786 KB, 3228x2145)
So how big of a dataset do you need to tune the qwen tts? Are twenty 10-second samples enough or does it need way more?
>>
File: Miku-26.jpg (174 KB, 512x768)
Anyone out there running larger workloads on an older gen DDR4 EPYC like Rome? What's the inference t/s on something like Q4 DS3?
Comfy 80's style Miku offering attached
>>
>>107942320
Why do I hate this man despite knowing nothing about him?
>>
what is the best potato sized tts model that can run on cpu. i'm talking <1gb, ideally under 500 megs. i'm building something.
>>
>>107942177
Most /g/ users are brats.
>>
>>107942353
pocket? hasn't it come out two days ago or so?
>>
>>107942341
i do. i get around 3t/s after offloading to my dual 5090s. the performance is not great and the prices are obviously terrible right now.
>>
>>107942320
No one has tried finetuning yet.
>>
File: file.png (3 KB, 49x44)
>>107942351
might be this
>>
>>107942365
It's still not schizo-friendly, come back later
>>
>>107942375
mikutroons are schizos though
>>
>>107941855
Can it do blowjob noises if you give it a bunch of ちゅぱちゅぱ text?
>>
>>107942364
thanks i forgot to bookmark this and forgot its name. i was hoping someone would mention it.
>>
Is hunyuan video 1.5 still the best for generating actually realistic and good looking human females?
>>
Why are people here fapping to tts models that have an rtf of >5 or worse?
Does your brain process 0.2 tokens per second, or what's the secret? Seriously.
>>
File: Miku-27.jpg (204 KB, 512x768)
>>107942366
Thanks. Still looking like it might be worthwhile due to a stash of old RAM I've got and the difference in power draw vs my 15 year old opteron
Wish I'd bought the ewaste before it became a hot commodity tho
>>
>>107942353
chatterbox turbo at q4 should be around 1g.
>>
>>107942380
No. I tried telling it to slurp, squelch, and make wet sounds, but nothing worked.
https://voca.ro/15jj3JMO7nR8
>>
>>107942479
imagine not training your dataset on audio samples ripped from visual novels
>>
>>107942479
Why would a speech model make "slurping, squelching, wet sounds"?
>>
>>107942441
by not streaming
something that sounds like a real person is infinitely more coomable than something that sounds like a telemarketer bot no matter how little you have to wait for the latter
>>
What's the retard one button way to run Qwen 3 locally
>>
>>107942479
Can it do onomatopoeia at all?
>>
>>107942510
kobold
>>
>>107942524
I meant qwen 3 tts, does kobold just run it?
>>
>>107942441
With streaming you can run any tts chained to an llm with a latency of <1s
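Rough shape of that chain in Python (llm_stream and tts.enqueue are placeholders, not a real API): buffer tokens and flush at sentence boundaries so audio starts while the LLM is still generating.

import re

def speak_stream(llm_stream, tts, boundary=r'[.!?]\s'):
    buf = ""
    for token in llm_stream:            # tokens arrive as the LLM generates them
        buf += token
        m = re.search(boundary, buf)
        while m:                        # flush every finished sentence right away
            tts.enqueue(buf[:m.end()])  # TTS speaks this while the LLM keeps going
            buf = buf[m.end():]
            m = re.search(boundary, buf)
    if buf.strip():
        tts.enqueue(buf)                # whatever trails after the last boundary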
>>
>>107942510
https://github.com/QwenLM/Qwen3-TTS?tab=readme-ov-file#environment-setup
then
https://github.com/QwenLM/Qwen3-TTS?tab=readme-ov-file#launch-local-web-ui-demo
Qwen is probably the easiest tts to get running.
>>
>>107942513
As far as I can tell, it can only "read" it. The rest depends on how close the lexical representation is to the actual sound.
>>
>>107942534
I haet conda, not installing this shit on my linux pc
>>
File: 1768991819778784.png (263 KB, 657x375)
>>107942367
I did and it sucks dick, I guess gptsovits isnt going away any time soon, hell that one jeet's TTS is better. Not to mention the training script is broken until you fix the copytree call trying to use the HF repo path instead of where the files actually are
The only worthwhile aspect is maybe the voice design model
>>
>>107942534
Yeah, I will wait for ggufs
>>
>>107942630
Use uv. It's absolutely retarded to have a copy of a full venv in each project folder, but it works far better than conda at least.
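For anyone who hasn't switched, the whole flow is just:

uv venv                               # creates .venv in the project folder
source .venv/bin/activate
uv pip install -r requirements.txt    # drop-in pip replacement, just fast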
>>
>>107942630
Conda isn't required. I'm running it in venv.
>>
File: 1738830131563719.jpg (42 KB, 430x437)
Is there a clean way to use (parts of) old Claude/GPT jailbreaks on a local?
I'm nostalgic for some of the fun shitposty ones like bloatmax but don't want to wire it all up manually
>>
>>107942670
it's a hardlink
>>
>>107942636
What exactly sucks about it? Does it not copy the voice? Turns into gibberish?
>>
>>107942679
I'm not too familiar but those are just ST presets right? you can just use your local chat completions endpoint and they should work as is, no?
>>
>>107942681
Good to know, don't need to feel guilty about using it now.
>>
How are these TTS models? I was thinking of turning some heavy textbooks on various topics like history or technology into audiobooks so I can listen to them while reading them. Is this possible on local hardware with only 8gb vram?
>>
>>107942700
ok
>>
>>107942698
First of all voice clone seems to shit itself with non-English ref prompts
Then if you have a cloned voice that's decent you can't use the CustomVoice with it for style instructions, so you have to finetune
So then you finetune and it comes out as garbage, I'm not sure if it's related to length, I tried low and high epoch count and batch size, just comes out retarded whereas the same dataset works great with sovits

Basically just a waste of time
>>
>>107942447
I had a 28-core/56-thread Xeon Platinum setup with 512GB of DDR4 last year. It could run a q4 of Deepseek R1, but not at a speed you'd want to interact with.
>>
>>107942730
Yes. TTS models tend to be small and you could probably fit nearly any of them in 8 GB. Try kokoro, echo-tts, or chatterbox.
>>
>>107942730
8gb is enough for smaller tts up to about 2B, but not enough for VibeVoice 7B. If you want to gen audiobooks, which are very long, you need to find scripts or vibecode your own to chunk the text and merge the audio files, because most tts are meant to gen only short clips up to 30 seconds.
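The chunk-and-merge part is only a few lines anyway; a sketch, with generate_wav standing in for whichever model you pick (sample rate is model-dependent):

import numpy as np
import soundfile as sf

def book_to_audio(text, generate_wav, out="book.wav", sr=24000, max_chars=400):
    chunks, cur = [], ""
    for para in text.splitlines():              # pack paragraphs into model-sized chunks
        if cur and len(cur) + len(para) > max_chars:
            chunks.append(cur)
            cur = ""
        cur += para + "\n"
    if cur.strip():
        chunks.append(cur)
    # gen each chunk separately, then merge into one file
    sf.write(out, np.concatenate([generate_wav(c) for c in chunks]), sr)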
>>
>>107941128
Qwen3 TTS? How does it compare to https://github.com/resemble-ai/chatterbox?tab=readme-ov-file
>>
File: 1763573822506022.png (343 KB, 602x603)
>>107942700
>that
>pretending to be a personal trainer
>>
>>107942904
have you never read doujins? that's peak physique
>>
>>107942700
Are they prosecuting him for being too short or what?
>>
>>107942919
for being not rich enough to get away with it
>>
>>107942910
He's got the Unholy Maiden build
>>
https://github.com/ggml-org/llama.cpp/pull/19024
nvidia jeet got told off, priceless
>>
>>107942977
lmao kek
>>
>>107942977
Can't they just buy llamacpp?
>>
what's the minimum a model should have in terms of tokens and context to be able to tool call in things like opencode or claude?
>>
>>107942977
nice :rocket:
>>
>>107943010
Why buy something maintained for free?
>>
>>107942977
>subscription
How does that work if nvidia is ok with llama.cpp support? Can't you just download it and upload the weights to HF? Isn't that what would happen if it gets goofed anyways?
>>
>>107942977
God damn son.
Dayum.
>>
>>107943046
>>107942977
based, fuck NVidia
>>
>>107943036
y'know, usually at a big company like this there's a huge disconnect between engineering, marketing and sales
someone down the chain thinks they can make a few bucks not realizing that they are in fact fucking everything up
>>
>>107943015
What do you mean tokens? t/s? It depends on your patience. Context isn't related to the ability to call tools.
Anyway, I haven't tried it myself, but vibecoders say that toss-20b and qwen-30b-a3b are okay-ish for agentic coding. toss-20b fits into 16gb at native mxfp4 and it's ridiculously fast.
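e.g. with llama.cpp's llama-server, something like this (model filename illustrative; raise -c as far as vram allows, since agents stuff the context):

llama-server -m gpt-oss-20b-mxfp4.gguf -c 32768 -ngl 99 --port 8080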
>>
>>107943054
Jeff Bolz is alright.
>>
>>107942927
He should run for president.
>>
>>107942977
>Anav Prasad
saaar kindly merge the nemotron commit, ngbenchodson
>>
>>107943046
>enablement
>>
Qwen tts actually hype?
>>
>>107943068
I only have 24gb of ram and a 1660ti. It's over for me
>>
>>107943046
the grand inquisitor's reign of terror continues
>>
>>107942977
My fuarking hero
>>
Hello guys,

Why "LTX-2: Load Latent Upscale Model" node doesn't fetch the upscaler files that are present in the right folder?

Comfyui UI startup logs:

E:\ComfyUI_windows_portable_nvidia>.\python_embeded\python.exe -s ComfyUI\main.py --windows-standalone-build --listen
[...]
Adding extra search path loras D:\AI\models\loras
Adding extra search path upscale_models D:\AI\models\upscale_models
Adding extra search path upscale_models D:\AI\models\latent_upscale_models
[...]


Folder structure:

D:\AI
└───models
    ├───audio_encoders
    ├───checkpoints
    │       ltx-2-19b-dev-fp8.safetensors
    │       ltx-2-19b-distilled-lora-384.safetensors
    ├───clip_vision
    ├───configs
    ├───controlnet
    ├───diffusion_models
    ├───embeddings
    ├───latent_upscale_models
    │       ltx-2-spatial-upscaler-x2-1.0.safetensors
    │       ltx-2-temporal-upscaler-x2-1.0.safetensors
    ├───loras
    │       ltx-2-19b-ic-lora-depth-control.safetensors
    │       ltx-2-19b-ic-lora-detailer.safetensors
    │       ltx-2-19b-ic-lora-pose-control.safetensors
    │       ltx-2-19b-lora-camera-control-dolly-in.safetensors
    │       ltx-2-19b-lora-camera-control-dolly-left.safetensors
    │       ltx-2-19b-lora-camera-control-dolly-right.safetensors
    │       ltx-2-19b-lora-camera-control-jib-down.safetensors
    │       ltx-2-19b-lora-camera-control-jib-up.safetensors
    │       [...]
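In case it matters, the yaml side of it; maybe the LTX-2 node reads its own latent_upscale_models category instead of upscale_models (just a guess on my part, check the node's source) and the key in extra_model_paths.yaml has to match:

comfyui:
    upscale_models: D:\AI\models\upscale_models
    latent_upscale_models: D:\AI\models\latent_upscale_models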
>>
>>107943095
Voice designer is kinda fun, but it's a designer. You won't use it for a consistent character since every generation gets a different voice.
For ordinary tts tasks, it feels just about what you'd expect from its size: better than 100B but not as good as echo-tts or vibevoice. Still, when it comes to multilingual capabilities, it's the best because echo and vb support only english and chinese.
>>
>>107943156
>/lmg/ - a general dedicated to the discussion and development of local language models.
>local language models
Go to >>>/g/ldg/
>>
>>107942353
>>107942364
Pocket and Kokoro are micro TTS that can run on anything. They're both a few hundred MBs in model size and run fast on CPU.
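Minimal kokoro usage for reference, going by its pypi README (pip install kokoro soundfile; voice and lang codes may differ between versions):

from kokoro import KPipeline
import numpy as np
import soundfile as sf

pipeline = KPipeline(lang_code='a')  # 'a' = American English
# the pipeline yields (graphemes, phonemes, audio) per segment
audio = np.concatenate([a for _, _, a in pipeline("Runs fine on CPU.", voice='af_heart')])
sf.write('out.wav', audio, 24000)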
>>
Has anyone tried VoxCPM or Maya?
>>
>>107942351
Envy for his pure listening experience.
>>
I don't really care about any of these TTS models until they have llama.cpp support. How different is the architecture from traditional LLMs? I can't imagine it would be that hard to jerry rig in support.
>>
>>107943183
>echo-tts or vibevoice
Am I crazy or chatterbox is better than both of these? TTS autists please enlighten me.
>>
>>107943307
You're crazy. Chatterbox is overfit for assistant slop voices. Its cloning is weak. It turns anime voices into middle-aged Karens. Its prosody is just as weak. It's monotone.
>>
>>107943183
>You won't use it for a consistent character since every generation gets a different voice.

https://github.com/QwenLM/Qwen3-TTS?tab=readme-ov-file#launch-local-web-ui-demo
>If you want a designed voice that you can reuse like a cloned speaker, a practical workflow is: (1) use the VoiceDesign model to synthesize a short reference clip that matches your target persona, (2) feed that clip into create_voice_clone_prompt to build a reusable prompt, and then (3) call generate_voice_clone with voice_clone_prompt to generate new content without re-extracting features every time. This is especially useful when you want a consistent character voice across many lines.
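In code that workflow is roughly this shape (create_voice_clone_prompt and generate_voice_clone are the names from the quoted README, but the objects and exact signatures here are my guess, check the repo for the real API):

# (1) design a short reference clip with the VoiceDesign model (hypothetical call)
ref_wav = design_model.generate(text="Hi, this is my reference line.",
                                instruct="calm young female voice")
# (2) build a reusable clone prompt from it once
clone_prompt = tts_model.create_voice_clone_prompt(ref_audio=ref_wav)
# (3) reuse it for every new line without re-extracting features
wav = tts_model.generate_voice_clone(text="New dialogue here.",
                                     voice_clone_prompt=clone_prompt)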
>>
>>107943365
Yeah, I saw that part. It tells you to use VoiceDesigner only for the first step. Once you have your reference audio, you switch to the base model for cloning. That's what I meant by saying you won't use VoiceDesigner as your main thing.
>>
>>107943307
I haven't tried echo-tts, but even vibevoice 1.5B performed better than chatterbox. However, both struggled with high-pitched or energetic voices.
>>
File: 2727636.jpg (11 KB, 200x200)
If a model keeps spamming *actions* every three words, do I just lower temp?
>>
>>107943350
I'm with you on Chatterbox sucking, though in my experience, it turned normal voices into extreme Southern US accent women. I hate Southern US accents. It was infuriating.
>>
>>107943183
Thanks for the response blud
>>
>>107943475
usually there's no way around changing models in that case
>>
Just to clarify, the people hating on chatterbox are only using it for high pitched anime voices right? Not for other non gooning related use cases?
>>
>>107941128
why is this kid manspreading
>>
>>107943515
Airflow
>>
air status?
>>
>>107943504
Why would you pick chatterbox for non-gooning tasks when there are piper, kokoro, pocket, etc that can run on cpu and are good enough for monotone reading?
>>
>>107943528
Cuz I have a 5090? Why should I care?
>>
File: image (3).png (93 KB, 1324x613)
Is this still relevant to making char cards? Found on some wiki
>>
>>107943547
It's not a science at all, just experiment and take inspiration from others
>>
>>107943547
this looks like some ancient 2022 shit
>>
>>107943556
I mean the syntax and shit. Must everything be json-like?
>>
>>107943564
Not at all.
>>
>>107943528
kokoro cannot distinguish a period, meaning if you give it an academic text it reads very unnaturally making it very hard to follow. My question wasn't an attack, I just wanted to make sure.
>>
>>107943559
https://wikia.schneedc.com/bot-creation/trappu/introduction
From here
>>
>>107943528
>why do you use good model instead of bad model
What the hell kind of question is that?
>>
>>107943307
>>107943504
chatterbox cloning is really bad for any voice with distinguishing features, e.g. I can feed it an american voice with heavy vocal fry and it generates a vaguely-similar pitched voice with an english accent, whereas vibevoice reproduces it fine and echo-tts does it near-perfectly. chatterbox gens also tend to have a strong "amateur reading off a script" quality compared to echotts and vibevoice, they're much less natural no matter what voice you're using in my experience
>>
>>107943515
that is the ugly old man attractor pose
>>
>>107943577
I see. If your usecase is academic papers, consider giving qwen a try, they specifically mention that their model can read stuff like this:
>I am solving the equation: x = [-b ± √(b²-4ac)] / 2a? Nobody can — it's a disaster (◍•͈•͈◍), very sad!
https://voca.ro/17bGjzuKYo8R
>>
>>107943643
there is always this hissing in these lower parameter tts models. Now I basically just started checking out tts models an hour ago, I don't have much of a frame of reference but chatterbox was the only one up until now which was able to generate audio which sounded like actual human recordings. Like for example in your uploaded audio, there is this hiss in the voice.
>>
>>107943672
A hiss? I don't hear anything.
>>
>>107943672
https://voca.ro/1oX8uRtb14Fe
>>
>>107943672
Are you saying you like the hiss/room noise/verb? Chatterbox does preserve that, but I run UVR on my voice files.
>>
>>107943802
>>107943811
your horniness is blinding you. just kidding, but do you really not hear the hiss in the background or sometimes bleeding into the voice? Are you using headphones? Are they maybe cheap?
>>
>>107943822
This gen >>107943811 was made from poor source audio, which explains the background noise.
>>
>>107943821
No, I hate the hiss/room noise/verb. I basically just want to generate audiobooks from academic texts or books but have it sound like a professional voice recording, like a David Attenborough documentary. Up until now I've only done stuff with llms or diffusion so I am not in the know when it comes to tts models. I'm currently setting up chatterbox-Audiobook as it seems to be a tool made specifically for my use case. Will see how that turns out and if it's not good enough, probably check out basically all other popular tts models with comfyUI
>>
>>107943858
Then I highly recommend doing a pass of uvr and mel denoise on your vocal sample. There's a webapp version.
>>
File: K.I.S.S.L.M.G.jpg (370 KB, 1536x1536)
>>
File: which.jpg (278 KB, 2176x2048)
>>107944006
>>
>>107944079
tummy
>>
>>107944087
retard
>>
>>107943547
Read this instead: https://rentry.org/Sukino-Guides
>>
>>107941128
Rin erotic
>>
>>107944132
No thank you.
>>
>>107944152
You forgot to say NTA
>>
Wait, does ooba have actual proper tool calling support now, with enforced grammars? Last I checked the piece of shit would not do proper tool calling reliably. Couldn't enforce the grammar correctly for non-standard tool call structures (Qwen3 coder)
>>
>>107944162
?
>>
seems qwen3-tts needs rtx cards (flash attention 2), back to kokoro I guess
>>
>>107944132
>just edit what the model says, dood!
>>
>>107944284
for a hobby requiring a lot of reading it sure seems like a lot of people just aren't capable of doing so
>>
>>107941128
Artist name?
The style is cute
>>
>>107944335
/lmg/ imagegen anon?
Always coming in with fresh high quality stuff.
A comfort that at least some things don't change.
>>
>>107941273
Will this give me corpo tier denials for requests?
>>
>>107944343
huh?
>>
>>107944344
I don't recommend it if you have a hotline kink.
>>
>>107941279
it's a compliment
>>
>>107944304
You're not entirely right anon. It does not only require reading, but also understanding. That's the hard part.
>>
>>107944399
Why are you like this?
>>
>>107944132
>>107944284
It's a great guide. you're just retarded.
>>
File: k12.jpg (11 KB, 467x481)
>time=2026-01-23T02:27:43.305Z level=WARN source=runner.go:153 msg="truncating input prompt" limit=8192 prompt=10933 keep=5 new=8192
>24GB ram, 1660ti
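(That warning is the ollama runner capping the prompt at num_ctx=8192. num_ctx is a documented Modelfile parameter, so it can be raised; base model name below is illustrative:)

FROM mistral-nemo
PARAMETER num_ctx 16384
# then: ollama create nemo-16k -f Modelfile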
>>
Brothers, I have envisioned a new formula that will change our world.
E = MC^2 + AI
>>
>>107944399
>it's not only X, but also Y
>>
>>107943547
No. It's retard tier stupid.
>>107943564
Hell no.
>>
>>107944399
Reading, understanding, and a solid grasp of English grammar and vocabulary, or whatever language one cares to use.
>>
>>107944784
Ok, bootlicker.
>>
gimme toss-240
>>
>>107944799
Do you have any idea how incomprehensibly moronic you sound?
Fuck off.
>>
File: Base Image.png (1.27 MB, 1208x3832)
Learning to Discover at Test Time
https://arxiv.org/abs/2601.16175
>How can we use AI to discover a new state of the art for a scientific problem? Prior work in test-time scaling, such as AlphaEvolve, performs search by prompting a frozen LLM. We perform reinforcement learning at test time, so the LLM can continue to train, but now with experience specific to the test problem. This form of continual learning is quite special, because its goal is to produce one great solution rather than many good ones on average, and to solve this very problem rather than generalize to other problems. Therefore, our learning objective and search subroutine are designed to prioritize the most promising solutions. We call this method Test-Time Training to Discover (TTT-Discover). Following prior work, we focus on problems with continuous rewards. We report results for every problem we attempted, across mathematics, GPU kernel engineering, algorithm design, and biology. TTT-Discover sets the new state of the art in almost all of them: (i) Erdős' minimum overlap problem and an autocorrelation inequality; (ii) a GPUMode kernel competition (up to faster than prior art); (iii) past AtCoder algorithm competitions; and (iv) denoising problem in single-cell analysis. Our solutions are reviewed by experts or the organizers. All our results are achieved with an open model, OpenAI gpt-oss-120b, and can be reproduced with our publicly available code, in contrast to previous best results that required closed frontier models. Our test-time training runs are performed using Tinker, an API by Thinking Machines, with a cost of only a few hundred dollars per problem.
https://github.com/test-time-training/discover
Really interesting. Probably the best so far this year.
>>
>>107944284
??? yeah no shit, are you NOT editing what the model says? how fucking dense can you possibly be
are you sure you're in the right thread? I could've sworn this was /lmg/. it even has the cute and funny vocaloid girl
>>
>>107941128
Man I love techloligy
>>
>>107943068
context is related. all these ai agents will concatenate their own shit, making the actual prompt huge. a small context window would truncate the prompt and the llm won't know what to do
>>
5060 TI 16GB or 3080 TI 12GB faster for Nemo goof q4?
>>
>>107945041
5060ti would allow for much more context which is more valuable. the 3080ti would be faster but is basically e-waste at this point.
>>
>>107945059
nemo only good for 16k ctx anyway tho?
fuck have to choose today
i read 3080ti gets real time orpheus-tts, nobody posted 5060ti benchmarks
>>
>>107945099
Consider you'll need both models loaded and who knows what else. 16gb vram is still more valuable.
>>
>>107945099
3080ti is about 30% faster than the 5060ti. 16k context will take up about 6gb of vram and a q4_k_m is about 7.5gb. the 3080ti does not have enough vram for full context at that quant.
>>
What's the best model for 2 3090s? GLM-Air is kinda poopy and a bit slow. There has to be something better, right?????
>>
>>107945114
glm4.6 if you have enough ram or old llama 70bs. very few options.
>>
>>107945111
>>107945110
good point ty.
I'll get the 5060TI
>>
>>107945134
I have a 5060ti and it's great.
Blackwell too so i can run nvfp4 for imagegen.
>>
One thing I have noticed with drummer's models is that they are way overfitted on the AI beginning its messages with its name. I only use third-person, so maybe others haven't had this problem. But it makes his models near unusable for me, no matter what settings I use.
>>
QwenTTS is the first model that can copy the voice of my waifu (pony waifu). VB is overrated and has strong censorship when your ref is a high-pitched character, so if your waifu is a loli or a pony as in my case, you will get music instead. It's very hard to clone young characters. So for me, QwenTTS is better than jeet voice
>>
>>107945312
How is it compared to echo? IDK what to upgrade to, I'm still using chatterbox.
>>
uuh... qwen tts comfy when???
>>
>>107945326
qwen is a safe horse to back
>>
>>107945312
Pitch of consent is a thing now?
>>
>>107945339
ha ha. hah... ha.
>>
>>107945326
Why no quants for echo :(
>>
Is now a good time to get into tts? Where do you guys usually get samples to train on?
>>
>>107945389
Podcasts (pornstar interviews) and /tv
>>
>>107945312
Can you share some examples comrade?
>>
Suggestion: Add the following line to the end of the "ERP" section of the "Recommended Models" Rentry
>For more recommendations, see [the UGI (Uncensored General Intelligence) Leaderboard](https://huggingface.co/spaces/DontPlanToEnd/UGI-Leaderboard). Sort by "W/10" (willingness to obey instructions) first and "Entertainment" (knowledge of erotic topics) second.

t. 12-GiB VRAMlet who just upgraded from Mistral Nemo (willingness 5.5, entertainment 2.0) Q6 to XortronCriminalComputingConfig (willingness 9.8, entertainment 2.6) Q3
>>
>>107945389
TTS rose from the dead and has been on a roll the past couple of months. I get my cringe anime samples from gacha wiki pages, they're typically high quality and without background noise.
>>
It's sad but LLMs are unironically the most stagnant sector of AI right now.
>>
>>107945446
Cmon

Open Source TTS and Music Generation are years behind SOTA
>>
>>107945434
chat, is this real?
>>
Are the SillyTavern devs aware of the bug in the newer 1.15.x versions that causes grammar issues?
It's affecting all models loaded by the koboldcpp backend.
>>
libre webui any good?
>>
>>107942534
Just in case you catch this error

 import flash_attn_2_cuda as flash_attn_gpu
ImportError: /home/ai/miniconda3/envs/qwen3-tts/lib/python3.12/site-packages/flash_attn_2_cuda.cpython-312-x86_64-linux-gnu.so: undefined symbol: _ZN3c104cuda29c10_cuda_check_implementationEiPKcS2_ib


Go to https://flashattn.dev/#finder to find a wheel compatible with your pytorch+cuda setup
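To know which combo to plug into the finder, check your exact versions first:

python -c "import torch, sys; print(sys.version.split()[0], torch.__version__, torch.version.cuda)"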
>>
>>107945645
No
>>
>>107945645
Yes. Here is definitely the best place to ask.
>>
>>107945697
->
>>107941377
>>
I seriously hope Anthropic/OpenAI/Google get their shit together. I got frustrated with my big local models and decided to blow some money on Claude Opus/Gemini instead.
They're still a bit smarter but they're just about the same shit we have local when it comes to creativity and storytelling. Both are very slopped. The 'moat' is absolutely nothing compared to what it was just a year ago, let alone during the "gpt4 vs llama" or the "2024 open models vs claude opus 3" days.
I have very little hope for the next generation of open LLMs if this shit is what our chinese overlords are forced to distill.
>>
>>107945440
>anime samples from gacha wiki pages
Same. It's good for anime and vidya characters as well, since practically every Japanese VA has voiced a gacha character at this point so finding a gacha character with a similar voice to a character you want to clone ends up being faster than trying to extract audio samples from whatever anime/game they're from.
>>
>>107942977
based ngxson dabbing on proprietary jeets
>>
Someone already vibecoded a Rust implementation of QwenTTS.
https://github.com/QwenLM/Qwen3-TTS/pull/8
>>
>>107945981
making LLMs better at "instruction following" requires an unholy amount of data, and to get IF data, rather than pay a lot of humans to write it (which would be expensive), labs are just genning more synthetic garbage
that synth data is also getting more and more stiff as they gen it with newer models that write in an even stiffer fashion, and this is how you end up with ChatGPT 5.2 that likes to write very short sentences and has basically no prose whatsoever
you can't get better models without intense instruction tuning (this is where most of the magic of LLM improvement lies; they haven't changed much of what they do during base model training), so you will be stuck with what you have for decades to come unless OAI is willing to burn even more money, which shouldn't even be possible
>>
>struggling with mistral small 24B
>have to partially offload to ram making it slow, but at least its less garbage than the 12B models I tried
>See a recommendation for 12B Mag Mell
>It's way better at writing and and staying coherent, on top of fitting fully into vram with plenty to spare
What the fuck
>look at the model source
>Mistral
What the fuck

>>107945446
Wait until you see the absolute state of SDXL
>>
>>107946125
Better models will be obtained by incorporating instruction/reasoning data in the pretraining to a much larger extent than what is usually done (but at what cost?).
>>
>make a PR to do a small manifest fix for a project I'm using
>assaulted by 3 AI bots doing checks on the code, on the PR description and on the title.
>Title is ok, but description is too short! (it's literally 1 line code change, the change is already in the title)
>the pr description also lacks the checkbox list of stuff you need to say you have done!!
>here's a poem about your changes
BROS FUCK AI literally a fucking HUMILIATION RITUAL being dabbed on by these fucking github and coderabbit pieces of shit like WHY. I HATE AI
>>
>>107946151
the only project I saw using that coderabbit garbage was bun and I must say when you look at some of their issues I am flabbergasted by the junior level mistakes in their API design, it was a major turn off
https://github.com/oven-sh/bun/issues/23902
https://github.com/oven-sh/bun/issues/22484
for example (different problems, same cause)
a file object that caches results and never updates them after changes? the fuck?
when you're that kind of subhuman programmer it makes sense you'd think AI tools / vibe coding are great. You are incapable of judging what is good and what is not.
>>
File: llm on standalone xr.png (1.22 MB, 1207x1927)
For those interested in fully local standalone XR waifus
>>
>>107944335
the artist is AI
>>
Which web based UI does /lmg/ use?
>>
>>107946244
the meta is to vibecode your own
>>
>AI only copies, those who invented and contributed need to be compensated, preferably every time AI leverages their knowledge, kinda like how musicians get a cut each time their song is played.
>>
>>107946399
>kinda like how musicians get a cut each time their song is played
They wish.
>>
>>107946399
Fuck off! Intellectual property is not property. It's cancer. Musicians should be paid only for live concerts.
>>
>>107946399
It's a good idea but impossible in practice
>>
>>107946105
Can someone just vibecode it into llama.cpp? Seriously, even if it doesn't get merged for being slop, this is probably gonna be the most practical TTS solution for a while
>>
TTS now has its own Chinese culture moment:
https://github.com/QwenLM/Qwen3-TTS/issues/14#issuecomment-3789452120
>>
called it
>>107941153
>that's gonna get yeeted
>>
>>107946513
>perfectly SFW pic will be deleted because moderation has been a joke for 15 years
yes, we know
>>
>>107946399
Mate you need to be compensating the estates of every family that invented the words you're using, you just copied English.
You also need to go and pay moot for copying greentext
And every deadshit retard on reddit for copying their reflexive opinion on this matter without giving it ten seconds of thought.
>>
>>107946399
Agreed, could be implemented with some tool calls at the end of generation, check if anything is copied and automatically send a payment from a configured wallet.
>>
>>107946513
the trick is to post nintendo loli hentai, which a quick glance on video game boards will tell you is sfw according to the trannies that moderate 4chan
>>
>>107946510
Props to the qwen team for answering questions promptly. I kinda got used to abandoned tts repos that have piles of unresolved issues.
>>
We need a decentralized chan with AI moderation. The holy grail.
>>
>>107946574
>AI moderation
I remember the attempts made itt with some models, iirc mixtral and similar, and how funny the results were
>>
>>107946523
>>perfectly SFW pic
I am sorry, but, what? do you even know what SFW means? I would absolutely feel ashamed of myself opening this thread in public where people could see OP's pic. If I didn't work remote and dared open this shit in an open office room it would be grounds for being fired.
>>
>>107946589
A neutrally aligned vision or multi-modal + ruleset would probably be fine.
>>
>>107946503
Be the change you want to see.
>>
>>107946599
I'm sorry you live in Utah or Saudi Arabia then, but words have meanings and that pic was perfectly SFW. Your personal shame is irrelevant to its classification.
>>
>>107946616
Exhausted track runner lolis are ultra lewd, there's a reason all sprint lanes have barriers installed so you can't lay your filthy eyes upon them.
>>
>>107946616
>words have meanings
Yes. SFW stands for SAFE FOR WORK. There is no world in which your fetish fodder, with its weird focus on giving shape to certain body parts in an otherwise simplistic drawing, would get a pass.
You yourself are being a weasel, I do not believe for a second that you would dare open that pic in any public setting.
>>
>>107946629
I am literally in the bus currently please stop projecting your self-hatred and calling a cute pic fetish fodder
>>
>>107946513
>>107946523
I'm the one who pinged the mods, pedophilia is not ok
>>
File: 1642685749889.jpg (9 KB, 315x300)
>>107946650
hallmonitormaxxing
>>
>>107946644
>I am literally in the bus
that never happened
>>
i want to code a copy of simcopter with a local model, what would you guys recommend for this?
>>
>>107946685
>what would you guys recommend for this?
That you post your fucking specs.
>>
>>107946698
why so rude like this?
>>
>>107946685
SimCopter was one of my favourite games
I love that you could load SimCity 2000 maps and fly in it
>>
>>107941128
>https://techcrunch.com/2026/01/22/inference-startup-inferact-lands-150m-to-commercialize-vllm/
>Inference startup Inferact lands $150M to commercialize vLLM
>The creators of the open source project vLLM have announced that they transitioned the popular tool into a VC-backed startup, Inferact, raising $150 million in seed funding at an $800 million valuation.
>>
File: 1768220662463707.jpg (753 KB, 800x1000)
Has anyone tested the new QwenTTS?

is it any good?
>>
>>107946870
hell yeah get that bread
>>
>>107946704
Am cried a little...
>>
wait, did the gemini 3 api stop obfuscating its reasoning block? we are so back, our upcoming open chink models aren't just going to distill this slop—they'll nourish on it.
>>
>>107946952
an anon used a method I don't recall a while ago to force gemini 3 to reveal its CoT and they found that newer GLM models write their CoT in a very similar way, meaning the chinks were already siphoning google for scraps
>>
>>107946883
Just read the thread.
>>
>>107947025
Meaning that gemini was trained on GLM cot
>>
>>107946952
DeepSeek-R1-0528 already did this
>>
>>107946952
>wait, did the gemini 3 api stop obfuscating its reasoning block?
how? it's still obfuscated for me
>>
>>107946136
>What the fuck
To be fair, that's an unholy merge of unholy fine tunes, so it's what mistral+nvidia released with its brains slightly scrambled.
>>
>>107946523
It was an image engineered to distract attention and encourage off-topic discussion by focusing on the sweat-drenched, nubile form of a young girl with visible steam arising from her. There are many implications in the image that invite off-topic remarks or posts. The small puddles of liquid on the bench beneath her that may or may not be from the bottle of water she has, how she must smell like after all that running, and what the "2MW" on her bottle means if we assume that it isn't just a direct reference. The attention-grabbing effect of the image is all the more obvious from the calculated composition of it. The slight skindentation on the shorts, the subtle but distinct details on her right armpit and the positioning so it can be seen clearly, the reddish tint and creases on her belly that are prominent enough to be the first thing to be noticed, the translucent parts of her top that showcase just how much she's been perspiring. Alone, these aspects wouldn't conjure up much thought but the combination of it all causes them to amplify each other until the image itself, while technically SFW, ends up being more arousing and intriguing to the average anon than the topic at hand or an actual pornographic image, so it's completely understandable that the mods took it down. It was probably distracting them too.
>>
>>107947194
tl;dr?
>>
>>107947206
Sweaty Rin is off-topic.
>>
>>107947218
I disagree
>>
>>107946870
I thought vLLM was bought by Red Hat.
>>
Has anybody managed to tame glm 4.7 flash yet?
Seems about as smart as the more recent qwen 30BA3B and without sounding like a robot.
But holy shit does it like to force refusals, even after it already responded, sometimes.
>>
>>107946151
update: my pr was completely rewritten by the maintainer who ignored the bots and closed with a 'ty'.
lol
>>
>>107947194
Model?
>>
File: 1758207320514664.png (2.62 MB, 1148x1668)
Does anyone know about the Global Consciousness Project Dot?
https://gcpdot.com/

Someone should make something like that, but instead of using random numbers, it parses news articles, social media and does more complex sentiment analysis with LLMs to measure the overall mood of the world
>>
>>107947779
News are fake so this wouldn't achieve anything.
>>
>>107947779
why do you think the mood of the world has any connection to sensationalist news headlines?
>>
>>107947787
>>107947788
You guys might be enlightened, but do you really think normies reading negative or positive news doesn't affect their mood?
>>
>>107947779
Getting an unbiased sampling of news is impossible. Your sample will be extremely biased towards western news and large countries. It won't be "Global Consciousness."
>>
>>107947779
Hasn't this already been done?
>>
>>107947774
DeepSeek V4
>>
>>107947842
Thanks https://huggingface.co/deepseek-ai/DeepSeek-V4
>>
>>107947025
It's actually pretty much identical to how 4.7 does it. It even does the (Self-Correction): thing and the exact same bullet points.
>>
Am I tripping or did OpenAI add streaming to the reasoning summary?
>>
>>107945338
https://github.com/flybirdxx/ComfyUI-Qwen-TTS
>>
>>107947888
>It's actually pretty much identical to how 4.7 does it
it wasn't about flash, the conversation I am mentioning predates flash's release.
>>
I think GLM 4.7 flash is still broken.
It's slowing down significantly after 4k context, and responses come out broken.
>>
>>107948009
vibecoding was a mistake
>>
>>107948009
Are you using flash attention?
Try disabling that.
And yes, I know there were some flash attention related fixes.
>>
>>107948034
Yeah tried on and off, same result
>>
>>107945389
>Where do you guys usually get samples to train on?
ur mom
>>
>>107947971
I never mentioned Flash nor am I poor enough to touch it
>>
https://huggingface.co/HeartMuLa/HeartMuLa-oss-3B
Anyone tried this?
>>
>The faint scent of her expensive perfume - something clean and citrusy - seemed to sharpen in the air.
Sounds more like dishwashing liquid lol
>>
>>107948110
at least it's not ozone
>>
>>107948110
You've never talked with a perfume aficionado have you?
>>
>>107948101
Everybody moved on the day after it was released. People say it's very same-y and ignores your tags. We're now waiting for AceStep 1.5, which should be released in 10-20 days.
>>
File: neblu0q3nt5d1.jpg (565 KB, 4096x4096)
>>107948110
be thankful she's not a fragrantica foid
>>
>>107948190
Why am I suddenly seeing perfume slop everywhere online?
>>
>>107948284



All trademarks and copyrights on this page are owned by their respective parties. Images uploaded are the responsibility of the Poster. Comments are owned by the Poster.