/mlp/ - Pony Preservation Project (Thread 160) - Pony

Anonymous
Pony Preservation Project (Thread 160) 06/07/26(Sun)07:55:37 No.43289079

File: altOP.jpg (1.26 MB, 2119x1500)

Welcome to the Pony Voice Preservation Project!
youtu.be/730zGRwbQuE

The Pony Preservation Project is a collaborative effort by /mlp/ to build and curate pony datasets for as many applications in AI as possible.

Technology has progressed such that a trained neural network can generate convincing voice clips, drawings and text for any person or character using existing audio recordings, artwork and fanfics as a reference. As you can surely imagine, AI pony voices, drawings and text have endless applications for pony content creation.

AI is incredibly versatile: basically anything that can be boiled down to a simple dataset can be used for training to create more of it. AI-generated images, fanfics, wAIfu chatbots and even animation are possible, and are being worked on here.

Any anon is free to join, and there are many active tasks that would suit any level of technical expertise. If you’re interested in helping out, take a look at the quick start guide linked below and ask in the thread for any further detail you need.

EQG and G5 are not welcome.

>Quick start guide:
docs.google.com/document/d/1PDkSrKKiHzzpUTKzBldZeKngvjeBUjyTtGCOv2GWwa0/edit
Introduction to the PPP, links to text-to-speech tools, and how (You) can help with active tasks.

>The main Doc:
docs.google.com/document/d/1y1pfS0LCrwbbvxdn3ZksH25BKaf0LaO13uYppxIQnac/edit
An in-depth repository of tutorials, resources and archives.

>Online speech generation
haysay.ai

>Active tasks:
Research into animation AI
Research into pony image generation

>Latest developments:
pastebin.com/4p00iUZM

>The PoneAI drive, an archive for AI pony voice content:
drive.google.com/drive/folders/1E21zJQWC5XVQWy2mt42bUiJ_XbqTJXCp

>Clipper’s Master Files, the central location for MLP voice data:
mega.nz/folder/jkwimSTa#_xk0VnR30C8Ljsy4RCGSig
mega.nz/folder/gVYUEZrI#6dQHH3P2cFYWm3UkQveHxQ
drive.google.com/drive/folders/1MuM9Nb_LwnVxInIPFNvzD_hv3zOZhpwx

>Cool, where is the discord/forum/whatever unifying place for this project?
You're looking at it.

Last Thread: https://desuarchive.org/mlp/thread/43127073/#43127073

Anonymous
06/07/26(Sun)07:55:44 No.43289080

FAQs:
If your question isn’t listed here, take a look in the quick start guide and main doc to see if it’s already answered there. Use the tabs on the left for easy navigation.
Quick: docs.google.com/document/d/1PDkSrKKiHzzpUTKzBldZeKngvjeBUjyTtGCOv2GWwa0/edit
Main: docs.google.com/document/d/1y1pfS0LCrwbbvxdn3ZksH25BKaf0LaO13uYppxIQnac/edit

>Where can I find the AI text-to-speech tools and how do I use them?
A list of TTS tools: docs.google.com/document/d/1y1pfS0LCrwbbvxdn3ZksH25BKaf0LaO13uYppxIQnac/edit#heading=h.yuhl8zjiwmwq
How to get the best out of them: docs.google.com/document/d/1y1pfS0LCrwbbvxdn3ZksH25BKaf0LaO13uYppxIQnac/edit#heading=h.mnnpknmj1hcy

>Where can I find content made with the voice AI?
In the PoneAI drive: drive.google.com/drive/folders/1E21zJQWC5XVQWy2mt42bUiJ_XbqTJXCp
And the PPP Mega Compilation: docs.google.com/spreadsheets/d/1T2TE3OBs681Vphfas7Jgi5rvugdH6wnXVtUVYiZyJF8/edit

>I want to know more about the PPP, but I can’t be arsed to read the doc.
See the live PPP panel shows presented on /mlp/con for a more condensed overview.
2020 pony.tube/w/5fUkuT3245pL8ZoWXUnXJ4
2021 pony.tube/w/a5yfTV4Ynq7tRveZH7AA8f
2022 pony.tube/w/mV3xgbdtrXqjoPAwEXZCw5
2023 pony.tube/w/fVZShksjBbu6uT51DtvWWz

>How can I help with the PPP?
Build datasets, train AIs, and use the AI to make more pony content. Take a look at the quick start guide for current active tasks, or start your own in the thread if you have an idea. There’s always more data to collect and more AIs to train.

>Did you know that such and such voiced this other thing that could be used for voice data?
It is best to keep to official audio only unless there is very little of it available. If you know of a good source of audio for characters with few (or just fewer) lines, please post it in the thread. 5.1 surround audio is generally required unless you have a source that is already free of background noise. Preferably post a sample or link. The easier you make it, the more likely it will be done.
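
As background for the 5.1 requirement: in a 5.1 mix the dialogue normally sits mostly in the front-center channel, so isolating that channel already strips much of the music and effects. Below is a minimal sketch using Python's standard `wave` module. It assumes the track has already been decoded to a 16-bit, 6-channel WAV in the usual WAV channel order (front-center at index 2). `extract_center` is a hypothetical helper, not part of any PPP tool, and older Python versions may refuse the WAVE_FORMAT_EXTENSIBLE header that many 5.1 WAVs use.

```python
import array
import sys
import wave

FRONT_CENTER = 2  # FC in the usual WAV 5.1 order FL, FR, FC, LFE, BL/SL, BR/SR (assumed)

def extract_center(src_path, dst_path, channel=FRONT_CENTER):
    """Copy one channel of a 16-bit PCM multichannel WAV into a mono WAV.

    Returns the number of frames written.
    """
    with wave.open(src_path, "rb") as src:
        n_channels = src.getnchannels()
        if src.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        if channel >= n_channels:
            raise ValueError(f"file has only {n_channels} channels")
        rate = src.getframerate()
        samples = array.array("h")
        samples.frombytes(src.readframes(src.getnframes()))
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian
    # Samples are interleaved frame by frame, so every n_channels-th value
    # starting at `channel` belongs to that channel.
    mono = samples[channel::n_channels]
    if sys.byteorder == "big":
        mono.byteswap()  # back to little-endian for writing
    with wave.open(dst_path, "wb") as dst:
        dst.setnchannels(1)
        dst.setsampwidth(2)
        dst.setframerate(rate)
        dst.writeframes(mono.tobytes())
    return len(mono)
```

This only isolates the channel; anything the mixers put in the center alongside the voice (some effects, some music) still needs cleaning afterwards.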

>What about fan-imitations of official voices?
No.

>Will you guys be doing a [insert language here] version of the AI?
Probably not, but you're welcome to. You can however get most of the way there by using phonetic transcriptions of other languages as input for the AI.

>What about [insert OC here]'s voice?
It is often quite difficult to find good quality audio data for OCs. If you happen to know any, post them in the thread and we’ll take a look.

>I have an idea!
Great. Post it in the thread and we'll discuss it.

>Do you have a Code of Conduct?
Of course: 15.ai/code

>Is this project open source? Who is in charge of this?
pony.tube/w/mqJyvdgrpbWgZduz2cs1Cm

PPP Redubs:
pony.tube/w/p/aR2dpAFn5KhnqPYiRxFQ97

Stream Premieres:
pony.tube/w/6cKnjJEZSCi3gsvrbATXnC
pony.tube/w/oNeBFMPiQKh93ePqTz1ns8

Anonymous
06/07/26(Sun)07:56:49 No.43289085

Quick thread to see the state of things.

15.ai is now dead forever.
https://x.com/fifteenai/status/2060098921582772567

Anonymous
06/07/26(Sun)10:04:02 No.43289213

Is there a dataset describing the scenes of the show, kinda like a transcript but with detailed information on who does what?

Anonymous
06/07/26(Sun)10:36:20 No.43289238

>>43289213
I don't think there would be such a thing for MLP (otherwise it would have been scraped for data years ago). I guess the closest would be to try to rip the Audio Description/Video Description tracks for blind people from Netflix and other TV shows that have them, and see if that could somehow be used as a dataset in whatever project you are planning.

Anonymous
06/07/26(Sun)12:34:38 No.43289345

>>43289085
This MF just abandons every project of his, doesn't he?
He'll abandon that marketplace soon enough and then announce a brand new project that he totally won't abandon, bros!

Anonymous
06/07/26(Sun)13:50:54 No.43289447

File: coolshit.jpg (1.35 MB, 7096x2020)

>>43289238
I doubt it would be detailed enough. I heard multimodal llms can take in video, even small ones like the ones I can run on my machine, so one could theoretically tag the entire show like this.

That aside, picrel is a compilation of experiments I did a while ago while playing with show frame compression. I trained a hierarchical VQ autoencoder that encodes 256x384 frames into two maps of codebook indices, 16x24 and 8x12 (each codebook has 1024 entries), and then tried to generate the larger map from the smaller map using discrete diffusion. It's quite undertrained, but I got bored of it. Just thought you guys would appreciate the abominations; some of them are even cute.
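
Some back-of-the-envelope arithmetic on how hard that setup compresses a frame, assuming 8-bit RGB input (the post doesn't say). Each index into a 1024-entry codebook costs log2(1024) = 10 bits, and the two maps correspond to 16x and 32x spatial downsampling of the 256x384 frame. `vq_frame_stats` is just an illustrative helper.

```python
import math

def vq_frame_stats(height, width, maps, codebook_size, channels=3, bits_per_channel=8):
    """Bits for a hierarchical VQ code of one frame vs. the raw frame.

    maps: (h, w) shapes of the code-index maps, one index per cell.
    Assumes 8-bit RGB input by default (assumption, not from the post).
    """
    bits_per_index = math.log2(codebook_size)
    code_bits = sum(h * w for h, w in maps) * bits_per_index
    raw_bits = height * width * channels * bits_per_channel
    downsample = [(height // h, width // w) for h, w in maps]
    return code_bits, raw_bits, raw_bits / code_bits, downsample

code_bits, raw_bits, ratio, factors = vq_frame_stats(256, 384, [(16, 24), (8, 12)], 1024)
print(code_bits, raw_bits, round(ratio, 2), factors)
# 4800.0 2359296 491.52 [(16, 16), (32, 32)]
```

So the two index maps hold 600 bytes per frame against about 295 KB of raw pixels, roughly 490x smaller.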

Anonymous
06/07/26(Sun)14:08:33 No.43289481

>>43289447
I think I saw some models months ago that were able to watch a 10s video and describe what was happening in a scene, along with any interaction people had with the setting. I wish I could remember what the name of it was, since that sounds like something you could potentially reuse for your stuff.
Now that I type all of this, I kind of wish there was an easy way to create an automated audiobook from a fic, but that would require a TTS model combined with some LLM that understands which characters are in the story and automatically swaps between the character and narrator voices, as well as adding any relevant background music and sound effects.
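
The voice-swapping step boils down to splitting a fic into narration and quoted dialogue and assigning each quote a speaker. Below is a minimal sketch in which a naive "nearest known character name after the quote" heuristic stands in for the LLM. `split_dialogue`, the character list and the narrator label are all hypothetical, and this isn't how any existing tool does it.

```python
import re

QUOTE = re.compile(r'"([^"]+)"')

def split_dialogue(text, characters, narrator="Narrator", window=40):
    """Split text into (speaker, line) segments.

    Narration goes to `narrator`. Each quoted span goes to whichever known
    character name appears earliest within `window` characters after the
    closing quote (a crude stand-in for LLM speaker attribution), else to
    the narrator.
    """
    segments = []
    pos = 0
    for m in QUOTE.finditer(text):
        before = text[pos:m.start()].strip()
        if before:
            segments.append((narrator, before))
        tail = text[m.end():m.end() + window]
        speaker, best = narrator, None
        for name in characters:
            idx = tail.find(name)
            if idx != -1 and (best is None or idx < best):
                best, speaker = idx, name
        segments.append((speaker, m.group(1).strip()))
        pos = m.end()
    rest = text[pos:].strip()
    if rest:
        segments.append((narrator, rest))
    return segments

text = 'The library was quiet. "Spike, take a letter," said Twilight. Spike groaned. "Again?" he asked.'
for speaker, line in split_dialogue(text, ["Twilight", "Spike"]):
    print(f"{speaker}: {line}")
```

Here "Again?" falls back to the narrator because the tag is just "he asked". Pronoun attribution like that is exactly where an LLM would earn its keep. Each (speaker, line) pair would then be sent to the TTS with that speaker's voice.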

Anonymous
06/07/26(Sun)14:26:49 No.43289508

>>43289481
>easy way to create automated audiobook from a fic
I made such an app a while ago https://files.catbox.moe/cwj64u.mp4
https://drive.google.com/drive/folders/14zMbURz1SuYNMoewX88EjR8sHEkcaKXa

Anonymous
06/07/26(Sun)14:41:23 No.43289524

>>43289508
>if yours only supports cuda 11.x but you still want to run on gpu, run the following inside PVT folder: runtime/python.exe -m pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
Could it be made to work with cu116? My system+GPU can't get Python to work with anything above that, and I'm not going to be upgrading my PC for at least the next two or three years.

Anonymous
06/07/26(Sun)15:00:59 No.43289556

>>43289524
Replace cu118 with cu116 and try it, should work.
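
The only change is the CUDA tag at the end of the index URL. The sketch below builds the same pip command for any tag. The `runtime/python.exe` path comes from the quoted instructions, and `torch_install_cmd` is a hypothetical helper. One caveat, as far as I know: PyTorch only published cu116 wheels for older releases (the 1.12/1.13 line), so pip will pull one of those, and whether PVT runs on that version is untested here.

```python
# Hypothetical helper; the command itself mirrors the PVT instructions.
INDEX_URL = "https://download.pytorch.org/whl/{tag}"

def torch_install_cmd(cuda_tag, python="runtime/python.exe"):
    """Argument list for installing torch/torchvision/torchaudio from one CUDA wheel index."""
    return [python, "-m", "pip", "install", "torch", "torchvision", "torchaudio",
            "--index-url", INDEX_URL.format(tag=cuda_tag)]

# To actually run it from inside the PVT folder:
#   import subprocess
#   subprocess.run(torch_install_cmd("cu116"), check=True)
print(" ".join(torch_install_cmd("cu116")))
```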

Anonymous
06/08/26(Mon)02:58:25 No.43290551

Oh hey, the lights are back on here?

Anonymous
06/08/26(Mon)17:02:42 No.43291353

Does anyone know if there has been an upgrade in openly available music AI models, or is the YuE music model from a year or two ago still the "best" option?

Anonymous
06/08/26(Mon)18:01:17 No.43291450

What would people think about trying to make some kind of collaborative "album", with Anons posting their pony songs made with whatever ai model/service of their choice every Friday?

Anonymous
06/09/26(Tue)01:19:13 No.43291959

>>43291450
The idea sounds interesting. I'm not sure if there are enough Anons who are actively genning to make that system work though.

Anonymous
06/09/26(Tue)11:24:23 No.43292467

Up.

Anonymous
06/09/26(Tue)21:14:10 No.43293091

File: angrycelly.png (447 KB, 880x706)

Hello! Maybe someone here could help me find a certain audio file I'm looking for. It's Princess Celestia saying "And so, as punishment for your insolence and treachery, I hereby sentence you to death by decapitation! May the blade fall swiftly and your head be raised high as a warning to all those who would dare to defy the will of the crown." I can't remember whether it was AI generated or audio from Nicole Oliver herself or a fan dub, but I remember that the quality was very good. I know I listened to it in November 2024, but it may have been posted way before that. I scoured the PPP threads on desuarchive and Clipper's master files and did not find it.

Anonymous
06/09/26(Tue)21:19:05 No.43293096

>>43293091
>The PoneAI drive, an archive for AI pony voice content:
drive.google.com/drive/folders/1E21zJQWC5XVQWy2mt42bUiJ_XbqTJXCp
Try here; if it was genned in a thread, it'll be in there.

Anonymous
06/09/26(Tue)21:55:01 No.43293144

>>43293096
Thank you, Anon. There's lots of good audio in there but, alas, I could not find the file.

Anonymous
06/09/26(Tue)23:16:16 No.43293256

>>43293091
I found it! I can't believe I missed it the first time; it's right there on desu in Nov 2024 just like I said, >>41653178
Direct link to the audio, for anyone interested:
https://vocaroo.com/14eyuFuDu0Zs

Anonymous
06/10/26(Wed)02:08:34 No.43293432

>>43293091
>>43293256
That sounds pretty OOC for Celly.

Anonymous
06/10/26(Wed)13:23:40 No.43293971

nine

Anonymous
06/11/26(Thu)03:28:27 No.43294893

>>43293971
plus one

Anonymous
06/11/26(Thu)17:57:08 No.43296088

>can't use raw audio clip because singing reverb fucks with the voice conversion
>can't use the DeReverbed audio because random bits of words are just nuked from the clip
dear song makers, plz stop using aggressive autotune and fake-ass reverb, it's making finding a good song for AI pony covers way more difficult than it needs to be.

Anonymous
06/11/26(Thu)18:44:24 No.43296146

>https://u.pone.rs/basikmdv.mp3
>technobridge523 - BLACK SUNSET (Cover)
What a strange feeling, bumping into a brand-new AI cover of an old original AI song made in a PPP thread two years ago.

Anonymous
06/12/26(Fri)01:12:29 No.43296650

What's the latest and greatest in local voice gen?

Anonymous
06/12/26(Fri)09:43:01 No.43297041

we are so dead

Anonymous
06/12/26(Fri)09:52:27 No.43297051

>>43289079
We drove this deep into the heart of the Archives. We thought it was dead. We were wrong.

Anonymous
06/12/26(Fri)12:57:06 No.43297359

>>43297041
we are so alive

Anonymous
06/12/26(Fri)17:22:59 No.43297664

GPT-SoVITS on haysay is fucked and keeps giving an error. It's the only decent one that allows text to speech and has all the good characters.

Anonymous
06/12/26(Fri)17:49:32 No.43297705

>>43297664
Hmm, I'm guessing HydrusBeta's Amazon server decided to crash out on that specific TTS code, which sucks balls. I'm guessing it wouldn't help to say there's a local UI installation of GPT-SoVITS that can run pretty OK on 8GB of VRAM? ~~worst comes to worst, you could always request an audio clip from anons in the thread here; as long as it's pony related it would be fine~~

HydrusBeta
06/12/26(Fri)22:07:20 No.43297961

>>43297664
There was a JSON file that got into a corrupted state. I've regenerated it, and I think that fixed the issue. Let me know if the problem persists.

Anonymous
06/13/26(Sat)01:13:00 No.43298139

>>43297705
I've been working around it by using the characters on the TTS one and converting them into the other voices with mixed, since those require an audio input.

>>43297961
This worked, thank you anon.

Anonymous
06/13/26(Sat)05:15:20 No.43298411

>>43297961
Thanks for the fix.

Anonymous
06/13/26(Sat)14:48:22 No.43298884

>>43297961
Thanks anon!

Anonymous
06/13/26(Sat)19:58:22 No.43299278

>>43297961
noice one, m8!

Anonymous
06/14/26(Sun)02:47:54 No.43299651

>>43289085
Didn't he say he was going to open source this stuff eventually? Now would be a good time.

Anonymous
06/14/26(Sun)13:02:59 No.43300095

>>43299651
I don't think he ever did, and even his posting in the PPP was very sporadic in 2019/2020. It would be cool if he had, or if he at least wrote documents showing the exact steps of how it was made, so people with the know-how could recreate it with modern tech.

Anonymous
06/14/26(Sun)13:21:09 No.43300117

>>43299651
>>43300095
I'm certain he at least stated several times that he was gonna release a paper about his research, which he never did, and likely never will.

Anonymous
06/14/26(Sun)21:56:16 No.43300618

>>43300117
Damn

Anonymous
06/15/26(Mon)00:19:20 No.43300755

>>43289079
hey check out my fuckass tts. I'll release model and stuff for realsies soon
samples:
https://files.catbox.moe/b5asjw.wav
https://files.catbox.moe/2vk68m.wav
demo:
http://198.53.64.194:35029/

Anonymous
06/15/26(Mon)01:12:50 No.43300798

File: Mare ooo.gif (428 KB, 235x274)

>>43300755

Anonymous
06/15/26(Mon)04:30:15 No.43300957

File: 645067__safe_artist-colon(...).png (630 KB, 5095x5993)

>>43300755

https://files.catbox.moe/hpg3m1.wav

https://files.catbox.moe/xrrt16.wav

https://files.catbox.moe/uz1fyc.wav

Anonymous
06/15/26(Mon)08:47:02 No.43301171

>>43300755
Is this your own arch or did you tune something?

Anonymous
06/15/26(Mon)11:06:57 No.43301307

>>43301171
Both. I'll explain later.

Anonymous
06/15/26(Mon)14:32:31 No.43301515

>>43300755
>https://files.catbox.moe/x2bt7n.wav
hey, that's neat. I can do TF2 audio shitposting again.

Anonymous
06/15/26(Mon)15:28:13 No.43301579

>>43300755
This is really, really good - at least, from the small samples I've generated. And I bet it sounds even better if you know what the hell you're doing with top_p, temperature, and those other autistic sliders. Is there a release window?

Anonymous
06/15/26(Mon)15:29:49 No.43301581

>>43301579
>>43300755
Also, just my intuition, but did you use S1 voice clips for your model? If so, bravo.

Anonymous
06/15/26(Mon)17:19:12 No.43301712

>>43301579
Tomorrow at latest, tonight at earliest.

Anonymous
06/15/26(Mon)17:24:35 No.43301727

>>43301712
Awesome. Tested it a bit more and it'll probably be my daily driver for AI TTS. Will you be posting guides and whatnot, especially for things like interjections (hm, huh, wow, etc.)? Doing those just results in long silences, which makes me think it might be bugged or something. And is the 30s audio output limit for the testing phase only?

Anonymous
06/15/26(Mon)17:35:15 No.43301735

>>43301727
>Will you be posting guides and whatnot, especially for things like interjections (hm, huh, wow, etc.)
Not even I know how to get those out of the model; that will require further research.
>And is the 30s audio output limit for the testing phase only?
Yeah, plus a rate limit so everyone gets to test it. When running it by yourself you can do whatever you want.

Anonymous
06/15/26(Mon)17:41:08 No.43301745

File: p7o3wwkqyz271-760489712.png (93 KB, 498x375)

>>43301515

https://files.catbox.moe/chaxe4.wav

Anonymous
06/15/26(Mon)22:21:03 No.43302090

>>43301745
kek

Anonymous
06/16/26(Tue)01:46:20 No.43302289

>>43300755
Heyo, any chance of an option to automatically convert the WAV files to MP3 when generating them? (I'm just too lazy to convert every single file by hand.)
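
Until something like that exists in the UI, a batch conversion is easy to script. Below is a minimal sketch that shells out to ffmpeg, which is assumed to be installed and on PATH, using the LAME MP3 encoder. `-qscale:a 2` is LAME's VBR quality setting (lower is better, roughly 170-210 kbps at 2). The helper names and the folder are hypothetical.

```python
import shutil
import subprocess
from pathlib import Path

def mp3_command(wav_path, quality=2):
    """ffmpeg command converting one WAV to an MP3 next to it (VBR via libmp3lame)."""
    wav_path = Path(wav_path)
    mp3_path = wav_path.with_suffix(".mp3")
    return ["ffmpeg", "-y", "-i", str(wav_path),
            "-codec:a", "libmp3lame", "-qscale:a", str(quality), str(mp3_path)]

def convert_folder(folder, quality=2):
    """Convert every .wav in `folder` (case-sensitive match on Linux) and return the MP3 paths."""
    if shutil.which("ffmpeg") is None:
        raise RuntimeError("ffmpeg not found on PATH")
    written = []
    for wav in sorted(Path(folder).glob("*.wav")):
        cmd = mp3_command(wav, quality)
        subprocess.run(cmd, check=True, capture_output=True)
        written.append(Path(cmd[-1]))
    return written
```

Usage: `convert_folder("my_gens")` after a generation session. The WAVs are left in place, so nothing is lost if a conversion fails.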

Anonymous
06/16/26(Tue)06:15:55 No.43302597

File: 1663805093515.png (152 KB, 868x920)

>>43300755
https://u.pone.rs/hyboomqt.wav
Alright, I've been pretty jaded towards voice AI as of late, but I gotta admit this is pretty impressive.

Anonymous
06/16/26(Tue)12:28:47 No.43303044

>>43302597
Yup. It'll still require some audio splicing between multiple generations in some cases for the best possible take, but the quality floor has a notable bump to it compared to past TTS software I've dabbled in.

Delta
06/16/26(Tue)13:23:14 No.43303129

File: mosst tts pny completeguide.png (513 KB, 1805x2460)

>>43300755
MOSS-TTS-1.7B-PNY v0.1: a finetune of MOSS-TTS + my custom vocoder for 48kHz audio

HuggingFace: https://huggingface.co/ZDisket/MOSS-TTS-PNY
Colab Notebook: https://colab.research.google.com/drive/1tDIYCMumcW5w3JWnQ0tBGyAr-ZpaaXBB
Public demo: http://198.53.64.194:35029/

See pic related on how to run on Google Colaboratory.
For local setup on your own hardware, you want at least 13GB of VRAM. Model runs ~1.5x realtime on a single RTX 5090 with the optimized runner. Download from HF and ask Claude Code to set it up for you.
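
For anyone unfamiliar with "x realtime" figures: as used here, the figure is seconds of audio produced per second of wall-clock time, so 1.5x realtime means a 30 s clip takes about 20 s to generate. Below is a small timing helper. `generate` is a hypothetical stand-in for whatever synthesis call you're benchmarking and is assumed to return samples at a known sample rate.

```python
import time

def speed_vs_realtime(audio_seconds, wall_seconds):
    """Above 1.0 means faster than realtime (1.5 -> 1.5 s of audio per second of compute)."""
    return audio_seconds / wall_seconds

def benchmark(generate, text, sample_rate=48_000):
    """Time one call to `generate(text)`, which must return a sequence of samples."""
    start = time.perf_counter()
    samples = generate(text)
    elapsed = time.perf_counter() - start
    return speed_vs_realtime(len(samples) / sample_rate, elapsed)
```

Some tools report the inverse (compute time divided by audio length, lower is better) as the "real-time factor", so it's worth checking which convention a number uses before comparing.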

>>43301171
From a technical perspective, this consists of two models: 1. A finetune of MOSS-TTS with fixed speaker conditioning, and 2. A very custom iSTFTNet2 vocoder that turns hidden states of the MOSS Audio tokenizer into 48kHz audio (which can also be repurposed for singing voice conversion).
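
For those wondering what the iSTFTNet part refers to: vocoders in that family have the network predict a magnitude and phase spectrogram and let an inverse STFT produce the waveform, instead of doing all the upsampling to audio rate with neural layers. The sketch below covers only that final inverse-STFT step, in pure Python. It is not Delta's vocoder. Since there's no trained network here, the "predicted" spectra come from an STFT of a known test signal so the round trip can be checked. The tiny n_fft=16 / hop=4 just keeps the naive DFT fast.

```python
import cmath
import math

def hann(n):
    """Periodic Hann window."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]

def stft(x, n_fft, hop):
    """One-sided STFT as (magnitudes, phases) per frame, i.e. what an
    iSTFTNet-style head predicts directly instead of raw samples."""
    w = hann(n_fft)
    frames = []
    for start in range(0, len(x) - n_fft + 1, hop):
        seg = [x[start + i] * w[i] for i in range(n_fft)]
        spec = [sum(seg[i] * cmath.exp(-2j * math.pi * k * i / n_fft) for i in range(n_fft))
                for k in range(n_fft // 2 + 1)]
        frames.append(([abs(s) for s in spec], [cmath.phase(s) for s in spec]))
    return frames

def istft(frames, n_fft, hop):
    """Magnitude/phase frames -> waveform via inverse DFT + weighted overlap-add."""
    w = hann(n_fft)
    length = n_fft + hop * (len(frames) - 1)
    out = [0.0] * length
    norm = [0.0] * length
    for f, (mag, phase) in enumerate(frames):
        half = [m * cmath.exp(1j * p) for m, p in zip(mag, phase)]
        # A real signal's spectrum is Hermitian: X[n_fft - k] = conj(X[k]).
        full = half + [half[k].conjugate() for k in range(n_fft // 2 - 1, 0, -1)]
        start = f * hop
        for i in range(n_fft):
            v = sum(full[k] * cmath.exp(2j * math.pi * k * i / n_fft)
                    for k in range(n_fft)).real / n_fft
            out[start + i] += v * w[i]
            norm[start + i] += w[i] ** 2
    # Dividing by the summed squared window undoes analysis + synthesis windowing.
    return [o / n if n > 1e-8 else 0.0 for o, n in zip(out, norm)]

n_fft, hop = 16, 4
signal = [math.sin(2 * math.pi * 3 * t / 64) + 0.3 * math.cos(2 * math.pi * 7 * t / 64)
          for t in range(128)]
rebuilt = istft(stft(signal, n_fft, hop), n_fft, hop)
```

In a real vocoder the magnitude and phase come out of the network rather than from `stft`, and an FFT replaces the O(n²) sums.
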
>>43301515
The TF2 speakers are a bit lower quality because they were thrown in as an afterthought. Next version will include better emotion control and quality.

Anonymous
06/16/26(Tue)14:08:38 No.43303171

>>43303129
Neat, thanks anon, gonna check it out

Anonymous
06/16/26(Tue)15:53:20 No.43303320

>>43303129
It works! Very expressive, though it underpronounces some words.
https://litter.catbox.moe/inza6f99g9ioti7z.wav
https://litter.catbox.moe/epx5r2ssfynizrzv.wav
I think it would benefit from running the LM quantized with GGML. Its current ~12GB VRAM footprint makes it impractical for most tasks, and it's pretty slow too.
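
Back-of-the-envelope numbers for the quantization idea: weight memory is parameter count × bits per weight / 8. This assumes ~1.7e9 parameters, as the model name suggests, and counts the LM weights only. Activations, caches, the separate vocoder and framework overhead aren't included, so it won't match the observed footprint by itself. Real 4-bit formats also store per-block scales, so their effective bits per weight sit a little above 4.

```python
def weight_memory_gb(n_params, bits_per_weight):
    """Memory taken by the weights alone, in GB (10**9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9

for bits in (32, 16, 8, 4):
    print(f"{bits:>2}-bit: {weight_memory_gb(1.7e9, bits):.2f} GB")
# 32-bit: 6.80 GB
# 16-bit: 3.40 GB
#  8-bit: 1.70 GB
#  4-bit: 0.85 GB
```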

Anonymous
06/16/26(Tue)16:08:09 No.43303339

>>43303129
>Delta
>Google Colab
This feels like one hell of a throwback to the very early days of PPP. Welcome back and godspeed.

Delta
06/16/26(Tue)18:02:52 No.43303475

File: anf shake tail.gif (198 KB, 560x526)

>>43303320
Known issue, probably due to not having much data. Try lowering the temperature for now, to something like 0.6. Temperature basically controls how "creative" the model is; higher temperature is more chaotic.
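
Here's a minimal sketch of what the temperature and top_p sliders usually do in LLM-style samplers. The demo's exact implementation isn't stated, so treat this as the generic technique. Logits are divided by the temperature before the softmax, so values below 1 sharpen the distribution toward the most likely token. Top-p (nucleus) sampling then keeps only the smallest set of most-likely tokens whose probabilities add up to p.

```python
import math
import random

def softmax(logits, temperature=1.0):
    """Softmax of logits / temperature (temperature must be > 0)."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def top_p_filter(probs, top_p):
    """Nucleus filter: keep the most likely tokens until their cumulative
    probability reaches top_p, then renormalise. Returns {token: prob}."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cum = [], 0.0
    for i in order:
        kept.append(i)
        cum += probs[i]
        if cum >= top_p:
            break
    total = sum(probs[i] for i in kept)
    return {i: probs[i] / total for i in kept}

def sample(logits, temperature=1.0, top_p=1.0, rng=random):
    """Draw one token index: temperature scaling first, then top-p filtering."""
    nucleus = top_p_filter(softmax(logits, temperature), top_p)
    tokens = list(nucleus)
    weights = [nucleus[t] for t in tokens]
    return rng.choices(tokens, weights=weights, k=1)[0]
```

With logits like `[2.0, 1.0, 0.0, -1.0]`, dropping the temperature from 1.0 to 0.6 raises the top token's probability from about 0.64 to about 0.81. That's presumably why a lower temperature reins in the chaotic takes.
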
>It's current ~12 gb vram footprint makes it impractical for most tasks, and it's pretty slow too
It's very unoptimized; a 1.7B transformer should run much faster. I should figure something out for that soon.
>>43303339
Thank you. I'm only getting started. Need to improve this and figure out singing voice conversion and LLM finetunes ~~in 2024 I finetuned Llama on a desuarchive dump of /mlp/ and found out I could ERP with it in greentext~~

Anonymous
06/16/26(Tue)19:13:37 No.43303552

File: 1511063309973.png (344 KB, 685x1024)

>>43303129
Cool, also Celestia is best pony.

Anonymous
06/16/26(Tue)19:56:22 No.43303611

>>43303320
>~12 gb vram footprint makes it impractical for most tasks
Sadly, this. The reason RVC exploded as an AI voice tool was that it was poorfag-friendly: it could run in 4GB of VRAM and train any voice on an 8GB card.

Anonymous
06/17/26(Wed)03:31:19 No.43304033

>>43303320
>litterbox
Damn

Anonymous
06/17/26(Wed)12:43:22 No.43304608

>>43303129
noice

Anonymous
06/17/26(Wed)20:11:39 No.43305180

Bump.

Anonymous
06/18/26(Thu)02:06:22 No.43305634

>>43303129
I was watching Family Guy reruns on CyTube the other day and thought, what the hell, I'll test the TTS using the show's dialogue. This tool actually rocks. The audio quality is INSANELY crisp and practically artifact-free (at least for the M6 and Trixie; haven't tested anybody else yet). I think once you release better emotion control, it will truly be head and shoulders above 15.ai, because as it stands, I'm not sure there's a way to force a certain emotion into a line's delivery unless the dialogue itself clearly betrays that emotion.

https://files.catbox.moe/u3hjsj.wav
https://files.catbox.moe/ob9myx.wav
https://files.catbox.moe/yurj0p.wav
https://files.catbox.moe/o5olvb.wav
https://files.catbox.moe/bogptq.wav
https://files.catbox.moe/75ttqf.wav
https://files.catbox.moe/fx1zry.wav
https://files.catbox.moe/ifr120.wav
https://files.catbox.moe/isd7iz.wav
https://files.catbox.moe/k9espp.wav
https://files.catbox.moe/vsw445.wav
https://files.catbox.moe/l1ahoj.wav

Godspeed, fren. Looking forward to those updates.

Anonymous
06/18/26(Thu)04:41:01 No.43305746

>>43303475
I played around with optimization: q4 quantization doesn't speed it up; the bottleneck is n_vq_for_inference. Halving it (16 instead of 32) produces audio in half the time, but it's not as good: https://litter.catbox.moe/uqzjohq2d3i03o9k.wav (but I assume you already know all this). I wonder if 32 codebook channels is overkill for this task, since it only has a limited set of voices to represent.
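
Some context on why halving n_vq_for_inference trades quality for speed. This assumes it sets how many residual-VQ codebook levels get predicted and decoded, which is my reading of the name; the post doesn't say. In residual vector quantization each level encodes what the previous levels missed. Dropping the later levels therefore leaves a coarser reconstruction, while also leaving fewer codes to predict per frame. Below is a toy pure-Python RVQ with random codebooks, unrelated to the actual MOSS tokenizer.

```python
import random

def nearest(codebook, v):
    """Index of the codebook vector closest to v (squared Euclidean distance)."""
    return min(range(len(codebook)),
               key=lambda i: sum((c - x) ** 2 for c, x in zip(codebook[i], v)))

def make_codebooks(levels, size, dim, rng):
    """Random codebooks with shrinking scale per level. Entry 0 is the zero
    vector, so a level can always 'do nothing' and the error never grows."""
    books = []
    for level in range(levels):
        scale = 0.5 ** level
        book = [[0.0] * dim] + [[rng.gauss(0, scale) for _ in range(dim)]
                                for _ in range(size - 1)]
        books.append(book)
    return books

def rvq_encode(codebooks, v):
    """Greedy residual VQ: each level quantizes what the previous levels left over."""
    codes, residual = [], list(v)
    for book in codebooks:
        i = nearest(book, residual)
        codes.append(i)
        residual = [r - c for r, c in zip(residual, book[i])]
    return codes

def rvq_decode(codebooks, codes, n_levels):
    """Reconstruct using only the first n_levels codes (like a smaller n_vq)."""
    out = [0.0] * len(codebooks[0][0])
    for book, i in zip(codebooks[:n_levels], codes[:n_levels]):
        out = [o + c for o, c in zip(out, book[i])]
    return out

rng = random.Random(0)
books = make_codebooks(levels=8, size=64, dim=4, rng=rng)
x = [rng.gauss(0, 1) for _ in range(4)]
codes = rvq_encode(books, x)
for n in (1, 2, 4, 8):
    err = sum((a - b) ** 2 for a, b in zip(x, rvq_decode(books, codes, n)))
    print(n, round(err, 4))
```

The squared error printed for each n can only stay flat or shrink as more levels are used. How much the last levels matter for a small, fixed set of voices is exactly the open question raised in the post.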

Anonymous
06/18/26(Thu)07:57:02 No.43305871

>>43301515
Good luck, anon - I made the Heavy say
>We are going to destroy Israel AND generate ponies!

Delta
06/18/26(Thu)15:06:10 No.43306298

>>43305634
Emotion control is in the plans. I'll stick with Cookie's BERT conditioning technique. Audio still has artifacts on the TF2 speakers as those represent only a minority in its training set.
>>43305746
I've got it up to ~3x realtime with some cudagraph and torch compile witchcraft
>I wonder if 32 codebook channels is overkill for this task since it only has a limited set of voices to represent
That I was wondering too. It would probably require training a vocoder specifically for that.
>>43303611
Is RVC still the go-to for singing voice conversion? I can swap out the vocoder in it for mine to improve its audio quality

Anonymous
06/18/26(Thu)15:20:11 No.43306318

>>43306298
>Is RVC still the go-to for singing voice conversion?
Yep.

Anonymous
06/18/26(Thu)15:34:06 No.43306329

>>43303129
These models sound amazing! What are the limits? Are you able to train a derpy model or is there not enough data?

Anonymous
06/18/26(Thu)17:10:56 No.43306445

>>43306298
I'll be waiting for updates

Anonymous
06/19/26(Fri)02:11:28 No.43307079

>>43306298
>Emotion control is in the plans.
noice

Anonymous
06/19/26(Fri)02:42:16 No.43307109

>fucking around interdimentionally
>https://files.catbox.moe/bbmjgz.mp3
An old audio conversion/remix of the panel of the same name from last year (or two years ago?). Reposting to see if other anons would enjoy this meditation thing.

Anonymous
06/19/26(Fri)03:52:01 No.43307177

Did Clipper's 1st master file MEGA get nuked? Whenever I try to access it, it just loads indefinitely.

Anonymous
06/19/26(Fri)13:06:35 No.43307688

>>43307177
Seems all there to me.

Anonymous
06/19/26(Fri)19:45:27 No.43308222

>>43305634
kek these lines

Anonymous
06/19/26(Fri)19:58:43 No.43308246

>>43307177
Nvm, it's working again

Anonymous
06/20/26(Sat)04:26:59 No.43308820

>>43294893

Anonymous
06/20/26(Sat)11:19:36 No.43309193

snowpity

Anonymous
06/20/26(Sat)13:42:55 No.43309365

File: time to be alive.png (53 KB, 600x280)

I'm glad you Anons are all well.

Anonymous
06/20/26(Sat)17:30:08 No.43309693

>>43307109
comfy audio

Anonymous
06/21/26(Sun)02:11:35 No.43310315

>>43309365
Sunbeam

Anonymous
06/21/26(Sun)05:47:16 No.43310466

>>43309693
Thanks. I had an early version where I converted the OG audio with just RVC, but the result was very meh. Luckily rvc-sovits came out around that time, so I could gen Luna talking in a proper calm yet reassuring emotional style.

Anonymous
06/21/26(Sun)15:28:28 No.43311143

>>43310789
ai mare

Delta
06/21/26(Sun)17:58:40 No.43311332

>>43307079
>>43305634
Working on emotion control. It's more complicated than I expected, because this big-ass transformer model is less responsive to conditioning than models of old. Regardless, I'm toying around with an emotion category + an adjustable energy value (from 0 to 100).
For example, Twilight, same sentence, same Happy emotion:
95% energy: https://files.catbox.moe/smt36d.wav
15% energy: https://files.catbox.moe/nl0opy.wav
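
Delta hasn't said how the category + energy pair is fed to the model, so the following is purely a hypothetical sketch of one common way to build such a conditioning vector. It looks up an embedding for the emotion category and appends the energy slider rescaled from 0-100 to 0-1. It also shows how mixing two emotions could work in principle, by linearly blending their embeddings. The emotion names, embedding values and dimension are made up.

```python
# Hypothetical emotion embeddings; in a real model these would be learned parameters.
EMOTIONS = {
    "neutral": [0.0, 0.0, 0.0],
    "happy":   [0.9, 0.2, -0.1],
    "sad":     [-0.7, -0.3, 0.2],
    "angry":   [0.1, 0.9, 0.5],
}

def condition_vector(emotion, energy, mix_with=None, mix=0.0):
    """Build [emotion embedding..., energy] for one utterance.

    energy: slider value 0-100, rescaled to 0-1.
    mix_with / mix: optionally blend in a second emotion with weight `mix` (0-1).
    """
    if not 0 <= energy <= 100:
        raise ValueError("energy must be in [0, 100]")
    if not 0.0 <= mix <= 1.0:
        raise ValueError("mix must be in [0, 1]")
    emb = EMOTIONS[emotion]
    if mix_with is not None:
        other = EMOTIONS[mix_with]
        emb = [(1 - mix) * a + mix * b for a, b in zip(emb, other)]
    return emb + [energy / 100]

print(condition_vector("happy", 95))
print(condition_vector("happy", 50, mix_with="sad", mix=0.5))
```

Whether a blended embedding actually produces a convincing in-between delivery depends on how the model was trained. A model that only ever saw single categories may not interpolate well.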

>>43306329
Not enough data for Derpy, unless we start using fan VAs (and if they have enough clean data for that)

Anonymous
06/21/26(Sun)19:28:02 No.43311442

File: 1370206539512.png (31 KB, 181x248)

>>43311332
Dang, nice work anon! Thanks for the swift turnaround on the feedback! If you're able to implement this successfully, it would help out a ton for the projects I have planned. Keep it up!

Anonymous
06/21/26(Sun)21:17:55 No.43311620

>>43311332
We are so back

Anonymous
06/22/26(Mon)06:48:19 No.43312305

>>43311332
Very nice, but is there any reason why she sounds more enthusiastic on the 15% than the 95% one?

Delta
06/22/26(Mon)10:41:35 No.43312608

>>43312305
Woops, I flipped the labels. They are in the wrong order; 95% is the 15% and 15% is the 95%.
On another note, I found an NSFW voice pack: https://opennsfw.carrd.co/#vo2 and I'm getting it transcribed so I can throw it into the training set of the next model. The gist is that it will teach the model this stuff and transfer those capabilities to the pony speakers too.

Anonymous
06/22/26(Mon)13:00:20 No.43312766

>>43311332
>emotions controlled by percentage
fuck yes, this is exactly what I've wanted forever. Please tell me you're planning to add an option to mix two emotions together as well?

Anonymous
06/22/26(Mon)13:14:09 No.43312779

>>43311332
anon... I literally cannot think of a way it gets better than this
genuinely, what else is there beyond this? it's functionally everything we could need: it's 1:1 with the voices, I hear nothing robotic, it has the emotion, and all the little things that make the ponies who they are
the one time I keep up with /ppp/ and it's already at the endgame

Anonymous
06/22/26(Mon)16:10:42 No.43313052

>>43312608
>Woops, I flipped the labels.
Ah yeah I figured.
Good work though! She sounds really happy in the higher energy sample.

Anonymous
06/22/26(Mon)16:16:47 No.43313059

>>43312608
>horny 100%
aw yiss

Anonymous
06/22/26(Mon)18:56:17 No.43313228

>>43303129
>>43313194
awesome

Anonymous
06/23/26(Tue)02:22:20 No.43313699

Up.

Anonymous
06/23/26(Tue)02:30:14 No.43313705

>>43313228
Now, this is all I could have asked for and more

Anonymous
06/23/26(Tue)07:32:20 No.43313914

>>43303129
if hydrus is still active in this thread, can we get this on haysay?

Anonymous
06/23/26(Tue)11:48:05 No.43314139

>>43312608
mister delta please please oh please tell me this emotions update will be coming soon, all of dash's lines are screaming and i can't get her to settle the fuck down for emotional lines

Poopsikins
06/23/26(Tue)11:51:09 No.43314144

>>43311332
amazing work, can't wait to see more updates!
so happy that the ppp is back, it motivates me to gen again.
https://files.catbox.moe/8g084q.wav

ThunderShy
06/23/26(Tue)12:50:55 No.43314205

hi ponys whats up guess whos back

ThunderShy
06/23/26(Tue)12:52:33 No.43314207

>>43314144
poopsikins, thought u were gone

Delta
06/23/26(Tue)13:01:49 No.43314222

>>43312766
First version will just let you select from a set of emotions plus an energy slider, as that works decently.
>>43312779
There's still a lot of work to be done, like improving voice conversion and LLMs
>>43314139
Yes. Technically everything's in place, but I'm trying to bundle NSFW too, which is proving a bit complicated. If I can't figure it out by the end of the week, I'll just release it as is.

Anonymous
06/23/26(Tue)13:18:14 No.43314237

>>43314222
i love you thank you nonny this is fucking awesome

Anonymous
06/23/26(Tue)16:48:18 No.43314498

>>43314144
sunbeam

Anonymous
06/23/26(Tue)17:45:21 No.43314572

>>43314222
As someone in the minority who preferred using reference audio, do you plan to add an option for that? For some things, it's faster for me to just speak the lines and get it right once compared to the ritual of spamming generate for half an hour with 15. This certainly outputs faster, but I prefer to fine tune with different takes instead of adjusting emotions with sliders, especially because things like stammering and fluctuating tone of voice can't be changed easily with TTS. But so-vits and RVC are certainly showing their age compared to this.

Anonymous
06/23/26(Tue)21:14:23 No.43314851

>>43314207
Don't call it a comeback.

Anonymous
06/23/26(Tue)23:34:50 No.43315015

>>43314222
>I'm trying to bundle NSFW too
like
fucking lewd noises and a "horny" meter or some shit for the pony voices?
oh boy

Anonymous
06/23/26(Tue)23:40:55 No.43315021

>>43312608
>On another angle, I found some NSFW voice pack: https://opennsfw.carrd.co/#vo2 ; I'm getting it transcribed so I can throw this into the training set of the next model. The gist of it is that it will teach the model this stuff and transfer capabilities to the pony speakers too.
BASED BASED BASED BASED BASED

Anonymous
06/23/26(Tue)23:41:50 No.43315022

>>43315015
>>43315021
i'm so fucking excited to put it in my vn mod you anons have no idea

ThunderShy
06/24/26(Wed)01:07:06 No.43315097

>>43314851
i will call it a comeback if i want to. the reason why i left this forum was because of idiots like you, and you know what ultimately happened? the thread died. im here to at least provide content and keep this thing alive, so no, go fuck yourself with a cucumber degenerate fuck face no one fucking likes you

Delta
06/24/26(Wed)01:16:09 No.43315106

>>43314222
>>43315015
>>43315021
NSFW kinda works but is leading me down a research black hole at da moment (current iteration is extremely unstable), so I am dropping it for now in favor of focusing on emotion control.
Like, these are the only decent-ish samples I could squeeze out of it: https://files.catbox.moe/ehxmzw.wav ; https://files.catbox.moe/ebwxit.wav
>>43314572
Oh yeah, I plan on getting voice conversion upgraded soon.

Anonymous
06/24/26(Wed)01:30:59 No.43315122

>>43315106
>dat dashie audio
>cum for me anon
UUUUUUUUUUUUUNFFFF!!!!!!!!!!!!!!!
>dropping it
Fuck my nigger existence

Anonymous
06/24/26(Wed)02:14:22 No.43315158

>>43315106
I wouldn't mind some beta unstable testing version being on the page for the time being, but also if that's going to be a giant fucking hassle to have then don't bother yeah. Cool work though.

Anonymous
06/24/26(Wed)04:20:23 No.43315251

>>43315158
Also, adding onto this, I think NSFW is less of a priority than enabling further general control anyways. Being able to control emotions and general sentiment etc. is great, but being able to control intonation by placing some sort of emphasis on certain words would do a ton to cut down the gacha in getting what you want.

That being said, NSFW is still peak and something I'd love to see at some point. Some kind of ASMR toggle to go with it to make it sound like it's being whispered in your ear would likely do a lot for some anons too kek.

Anonymous
06/24/26(Wed)05:25:31 No.43315322

>>43315106
I hope you will make it a docker image with all the optimizations and shit

Anonymous
06/24/26(Wed)08:15:26 No.43315474

>>43315106
Btw, what are the current VRAM and CUDA requirements to run this?

Anonymous
06/24/26(Wed)16:44:31 No.43316060

mare bump

Anonymous
06/25/26(Thu)06:48:09 No.43316730

>>43316060
Yeah I'm definitely excited to see what's cooking with the new TTS. It's worth keeping the thread boomped for it. The /chag/ usage of it was so damn cool already.

Anonymous
06/25/26(Thu)18:42:17 No.43317647

File: tf2 1750730551364987.png (265 KB, 860x1000)

>https://u.pone.rs/gykynemn.mp3

Anonymous
06/25/26(Thu)21:40:45 No.43317896

>>43317647
and the inside of a horse

Anonymous
06/26/26(Fri)01:33:51 No.43318088

>>43316730
>by end of week
just two days away, just two days away, it's like christmas fucking morning

Anonymous
06/26/26(Fri)09:27:23 No.43318465

>>43318088 (checked)
Man I completely spaced that. God I'm so excited. I've still been using the demo like daily. I'm holding onto some lines I generated for potential pony shitpost projects and I'm really excited about it. This stuff is an OC inspiration goldmine.

Anonymous
06/26/26(Fri)13:04:37 No.43318684

>>43317647
Truth

Anonymous
06/26/26(Fri)14:24:41 No.43318783

>https://www.youtube.com/watch?v=45-GDaNgfM4
new BGM kino ~~it's mostly an instrumental, but there are still some AI voice bits in there so it counts~~

Anonymous
06/26/26(Fri)14:27:43 No.43318794

>>43318783
>pony zone was 5 years ago
i can feel my bones crumbling to dust

Anonymous
06/26/26(Fri)19:04:33 No.43319180

>>43318794
so do i bro, so do i

Anonymous
06/27/26(Sat)07:45:00 No.43319862

>>43315106
Oh, forgot to ask: is there a way to remove the 30-second output limit? It's still present even when run locally.

Anonymous
06/27/26(Sat)13:38:22 No.43320261

Pump seed inside mares

Delta
06/27/26(Sat)15:55:38 No.43320423

File: 3458815 (1).gif (2.51 MB, 1440x1080)

These big transformer models are pretty hard to condition on emotion; mine was ignoring the labels so I had to devise an output head which runs emotion classification, so that forces the model to actually pay attention to the emotion labels. It works but is a 'lil bit more subtle than in other models. For example, Rainbow Dash:

Angry, 95% energy: https://u.pone.rs/evxtaazv.wav
Neutral, 20% energy: https://u.pone.rs/oquscjwf.wav

But it should get stronger with more training, I'll release this in the next 2-4 days ~~So much technical stuff going on, maybe I should see if I can do a university-style lecture at a future Mare Fair.~~
Pic unrelated, Lucky Roll makes me hard
>>43319862
Whoopsie. There's a config knob in the code somewhere; it should be easy to find, or just ask your favorite coding agent to do so. I'm fully sloppilled and don't read or write code manually anymore. If you don't have any subscriptions, OpenCode offers free usage of good enough models.
>>43315474
13GB VRAM, any CUDA that can run modern pytorch will do
>>43315122
NSFW will be in some future version.
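A minimal sketch of the general technique Delta describes (an auxiliary emotion-classification head): pool the backbone's hidden states, predict the emotion label from them, and add that classification loss to the main TTS loss, so training is penalized whenever the hidden states don't carry the emotion. Delta didn't post code, so all of this is assumed: the mean pooling, the single linear head, the hypothetical `aux_weight=0.1`, and plain Python lists standing in for PyTorch tensors to keep the arithmetic visible.

```python
import math


def mean_pool(hidden_states):
    """Average a [seq_len][dim] list of hidden-state vectors into one [dim] vector."""
    seq_len = len(hidden_states)
    return [sum(col) / seq_len for col in zip(*hidden_states)]


def linear(x, weights, bias):
    """Emotion logits: weights is [num_classes][dim], bias is [num_classes]."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def cross_entropy(logits, target):
    """Numerically stable softmax cross-entropy for one example."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]


def total_loss(tts_loss, hidden_states, emotion_label, head_w, head_b, aux_weight=0.1):
    """Main TTS loss plus a weighted auxiliary emotion-classification loss.

    The aux term can only shrink if the hidden states encode the emotion,
    so minimizing it pushes the backbone to pay attention to the label.
    """
    logits = linear(mean_pool(hidden_states), head_w, head_b)
    aux = cross_entropy(logits, emotion_label)
    return tts_loss + aux_weight * aux, aux
```

In a real model the head would be something like an extra `nn.Linear` trained jointly with the backbone; the loss weight is a tuning knob, and a larger weight makes the label harder to ignore at some cost to the main objective.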

Anonymous
06/27/26(Sat)15:59:59 No.43320427

>>43320423
>13GB VRAM,
Fug
>any CUDA that can run modern pytorch will do
Double fug

Anonymous
06/27/26(Sat)16:29:57 No.43320467

>>43320423
Oh shit, was just making a post about how excited I was for the next version. Fucking awesome. I'm really looking forward to the chance to have Ponk actually speak softly in some lines. I was putting in cutesy comfy waifu stuff for her to say, but she would always yell it, haha. Twilight's been the most consistently tonally good voice for me so far from my experimenting.

Anonymous
06/27/26(Sat)16:30:49 No.43320468

>>43320427
Yeah I wish it were more compact so that I could load it with a language model

Anonymous
06/27/26(Sat)22:57:15 No.43320891

>>43320468
yeah, would be nice if there were voice and text models that weighed under 1GB, to make it possible to have a "discussion" with multiple characters at the same time that can't actually read each other's minds like the current LLMs do.

Anonymous
06/28/26(Sun)04:20:47 No.43321094

>>43318794
>pony zone was 5 years ago
Good God, how did that happen?

Poopsikins
06/28/26(Sun)21:32:18 No.43322236

>>43314144
what I've made so far over the last few days, all with MOSS.

Awkward Dash and Trix:
https://files.catbox.moe/8g084q.wav
~~https://youtu.be/9NO_LqQfNpA?t=10~~

Smart Dash:
https://files.catbox.moe/p3l2om.wav
~~https://www.youtube.com/watch?v=SBiXajenKrg~~

Why would anon do that:
https://files.catbox.moe/nbalhb.wav
~~https://www.youtube.com/watch?v=m3MaTuv6QHI~~

Anonymous
06/29/26(Mon)02:07:19 No.43322632

>>43322236
kek, nice work.
~~wouldn't mind Trixie sucking on my nose ifkwim~~

Anonymous
06/29/26(Mon)05:21:18 No.43322792

>>43322236
Man, this TTS really is something else. Bravo, anon! Have you ever considered making some audiobooks for a short fimfiction story, perchance? I feel like that would be a great use-case; the TTS is just that good.

Poopsikins
06/29/26(Mon)06:12:56 No.43322830

>>43322792
>Have you ever considered making some audiobooks for a short fimfiction story,

Maybe, I've thought about it before. I'd rather see if I could animate a short story rather than creating an audiobook, though animation takes forever. There are a few fics that come to mind that I'd like to adapt someday (with the authors' permission, if they're still contactable).

Anonymous
06/29/26(Mon)08:14:35 No.43322939

>>43322830
> I'd rather see if I could animate a short story rather than creating an audiobook, though animation takes forever.
For your own good, anon, I would heavily advise against that. Start small. Make an animated skit inspired by something, under a minute or so, like one of those youtube shorts. The feeling of achieving those small goals consistently will give you the motivation to do something longer. And by animation, I hope you don't mean hand-drawn or flash animation, lol; even just PNGs sliding across the screen like the tax breaks animation would suffice. Don't burn yourself out.

Anonymous
06/29/26(Mon)10:58:40 No.43323079

Why does dsv4 have such niggerishly slow prompt processing?

Anonymous
06/29/26(Mon)11:06:11 No.43323090

>>43323079
wrong board sorry

Anonymous
06/29/26(Mon)19:56:14 No.43323669

File: matrix 405b-9f20-ba4def269701.jpg (447 KB, 1440x900)

~~Not pony related, but I just wanna say I am a big fan of vibecoding small scripts that would have taken me a day or two to write, but that I can get an LLM to make in under a minute~~

Delta
06/29/26(Mon)23:08:19 No.43323889

File: 3817484.jpg (71 KB, 1090x550)

>>43303129
>>43320423
Updated the model with first iteration of emotion control. Links remain the same:

HuggingFace: https://huggingface.co/ZDisket/MOSS-TTS-PNY
Colab Notebook: https://colab.research.google.com/drive/1tDIYCMumcW5w3JWnQ0tBGyAr-ZpaaXBB
Public demo: http://198.53.64.194:35029/

You now have 12 emotion classes to choose from, plus an energy slider. They do influence the output, but it's more of a nudge than a demand; you still have to craft your prompts. Regardless, I hope this makes it easier to get what you're looking for. Nonverbal is an emotion class reserved for NSFW mode sometime in the future.
Pic unrelated. Also, rate limits on the public demo have been doubled as I've now got an optimized runner that does single batch inference at 3.5x realtime.

Anonymous
06/30/26(Tue)03:43:50 No.43324133

>>43323889
excite

Anonymous
06/30/26(Tue)08:27:47 No.43324263

>>43323889
Thanks for yet another release! Quick question: how do I enable the optimized runner? Just paste the code from HF and then run the gradio? Or do I have to stick to powershell?

ThunderShy
06/30/26(Tue)12:09:33 No.43324467

>>43323889
i was wondering if you guys could also try adding more ponies, such as the student six and the cmc's

ThunderShy
06/30/26(Tue)12:17:00 No.43324470

oh and thorax

Delta
06/30/26(Tue)12:30:47 No.43324486

>>43324263
The code from the HF repo and Gradio already has all the optimizations. I think some are turned off by default because of the startup cost: this speed is achieved by using TorchInductor and compile witchcraft to turn the whole inference flow into one big kernel to reduce overhead, but that takes 5-10 minutes at startup, which is no problem for a long-running server.
Also, expect the demo to go from a Gradio app to something more refined UI-wise. Maybe I'll give it a real name. Taking suggestions

Anonymous
06/30/26(Tue)16:36:43 No.43324798

>>43323889
Oh fucking awesome. I was just using the demo in bed and then I woke up to see the new settings and checked the thread then. Was so damn hype. Only responding now but yeah this is sick.

As a Ponkfag I am especially pleased, as beforehand she would yell 99% of her lines whereas Twilight was fantastically pitched most of the time. Putting "calm" on "0% energy" has led me with a lot of softer lower pitch Pinkie speech compared to before, which I love a ton. Her voice is so cute when it's chill. Extra rate limit is also really appreciated as I love playing with this.

Anonymous
06/30/26(Tue)17:39:31 No.43324899

>>43324798
>Putting "calm" on "0% energy" has led me with a lot of softer lower pitch Pinkie
>https://u.pone.rs/krthqfzm.wav
holy fug, yeah this works great, finally a tts Ponk that doesn't talk like she just chug a entire barrel of energy drink.

Anonymous
06/30/26(Tue)18:33:48 No.43324950

File: file.png (620 KB, 2516x498)

>>43323889
I tried to use the Colab and I got this error

Anonymous
06/30/26(Tue)18:39:34 No.43324959

>>43324798
>softer lower pitch Pinkie speech
I may be a ponkfag now.

Anonymous
06/30/26(Tue)19:21:38 No.43325014

File: 1363919329737.png (196 KB, 830x962)

>>43324899
Yeah exactly. She sounds so fuckin' cute dude. Pinkie's always been the roughest one in TTS in my experience, but this is finally starting to deliver something really nice.

I'm glad you found that combo useful too. I've been experimenting with the style text shit and have gotten some interesting results. If I come across good sentences that have really nice results for prompting I'll try and share them here.

Here, before posting this I agonizingly fucked around and managed to get a couple cute ones.

>ponk on the inside
https://u.pone.rs/lubytuhc.wav

>ponk gf (despite her being a pony) ~~(this one required genning it piecemeal and stitching together because it was kind of long, but I hope you guys like it, it took me a minute but I think it's really cute)~~
https://u.pone.rs/sbdyftft.wav

(also >>43324959 extremely based, hope you like the audio clips I made for this post, you're my bro now if that's true)
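Genning a long line piecemeal and stitching the clips together usually sounds smoother with a short crossfade at each seam than with a hard cut. A minimal sketch, assuming each clip is a list of float samples at the same sample rate (48 kHz for this model's vocoder) and an overlap measured in samples; the poster didn't say how they joined theirs, and `crossfade_concat`/`stitch` are hypothetical helpers.

```python
def crossfade_concat(a, b, overlap):
    """Join two mono clips (lists of float samples), linearly crossfading
    the last `overlap` samples of `a` into the first `overlap` samples of `b`."""
    if overlap <= 0:
        return a + b
    if overlap > min(len(a), len(b)):
        raise ValueError("overlap is longer than one of the clips")
    mixed = []
    for i in range(overlap):
        t = (i + 1) / (overlap + 1)  # weight of `b`, rising from ~0 to ~1
        mixed.append((1 - t) * a[len(a) - overlap + i] + t * b[i])
    return a[:-overlap] + mixed + b[overlap:]


def stitch(clips, overlap):
    """Crossfade a list of clips together in order."""
    out = list(clips[0])
    for clip in clips[1:]:
        out = crossfade_concat(out, list(clip), overlap)
    return out
```

At 48 kHz, a 10 ms overlap is 480 samples. A linear fade is the simplest choice; an equal-power (sine/cosine) fade avoids the slight volume dip a linear fade causes at the seam when the two clips are uncorrelated.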

Anonymous
06/30/26(Tue)22:26:28 No.43325260

>>43323669
Examples?

Delta
06/30/26(Tue)23:10:44 No.43325312

I'm going to see if OAI Codex can port this model to C++/GGML if I just leave it on a loop.
>>43324950
Fixed. The new optimized path switched to TorchScript instead of ONNX for the vocoder and the Colab demo didn't download that artifact. Also, since Gradio is being weird with share links, the demo uses a Cloudflare tunnel.
>>43325014
Cool stuff anon. If you find interesting ways of using the model, do share them.

Anonymous
07/01/26(Wed)00:30:03 No.43325391

>>43325014
non-screechy ponka is best ponka. both are fine, of course, but I like it better when she's less histrionic. extremely cute gens btw

Anonymous
07/01/26(Wed)05:27:00 No.43325582

>>43325312
>port this model to C++/GGML
Yes please. Setting this up on Windows is a pain and I get random crashes on WSL.

Anonymous
07/01/26(Wed)11:01:35 No.43326118

mares

Anonymous
07/01/26(Wed)15:30:01 No.43326420

>>43326118
I love them.

Delta
07/01/26(Wed)16:44:36 No.43326545

>>43325312
>>43325582
Turns out it could. With the CUDA backend, practical VRAM requirements with GGML become 10GB with everything in float16, 8.2GB with the model quantized to 8-bit, or 7.8GB with 6-bit. Working on the Vulkan backend, which is the most vendor-agnostic, then I'll release it.
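Back-of-the-envelope weight memory for the numbers above, assuming the 1.7B parameter count from the MOSS-TTS-1.7B-PNY name and counting only the transformer weights (no vocoder, audio tokenizer, inference cache or runtime buffers). Real GGUF formats also store per-block scales, so Q8_0, for example, lands closer to 8.5 bits per weight than 8; the function therefore takes bits per weight as an input rather than guessing a format.

```python
def weight_gb(num_params, bits_per_weight):
    """Memory taken by the weights alone, in GB (10**9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9


PARAMS = 1.7e9  # assumption: parameter count taken from the model name

for label, bits in [("float16", 16), ("8-bit", 8), ("6-bit", 6)]:
    print(f"{label:>8}: {weight_gb(PARAMS, bits):.2f} GB of weights")
```

Going from 16 to 8 bits saves about 1.7 GB of weights and going from 8 to 6 bits about 0.4 GB, roughly in line with the 10 → 8.2 → 7.8GB steps; the rest of the footprint is everything this estimate leaves out.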

Anonymous
07/01/26(Wed)17:14:07 No.43326588

>>43326545
>7.8 with 6-bit.
Noice! Finally I can run TTS offline for whatever project I want without worrying about the net connection randomly going to shit for no reason
>Working on Vulkan backend
not an AMDfag but I would imagine it would be nice to have this for the linux people

Anonymous
07/01/26(Wed)20:03:54 No.43326809

File: emmy the robot as pony 354131.png (346 KB, 2685x3189)

hooray for the new era of ai ponies!

Anonymous
07/01/26(Wed)20:35:53 No.43326855

https://u.pone.rs/wedifxxb.wav

~~I'm so sorry but I was cackling like a dipshit edgy 12 year old boy at this one.~~

Anonymous
07/01/26(Wed)21:05:46 No.43326892

>>43326855
kek

Anonymous
07/01/26(Wed)23:41:21 No.43327060

File: file.png (324 KB, 1111x503)

>>43325312
The Colab no longer crashes, but now it is stuck on this without spitting out a URL

Delta
07/02/26(Thu)00:04:13 No.43327086

>>43327060
Fixed, refresh and try again. There was some funky shit that meant the polling was stuck.
>>43326588
>>43326545
Works on Windows. This is with the Vulkan backend for the TTS and ONNX/DirectML for the vocoder on 1x RTX 3080 Ti.
https://u.pone.rs/dvhptahc.mp4
Relatively slow because of Windows overhead and the bunch of junk I have open, but on a clean Linux install with a solid GPU it's 2.5x realtime.

Anonymous
07/02/26(Thu)02:45:46 No.43327264

>>43327086
Looking forward to using it.

Anonymous
07/02/26(Thu)03:58:35 No.43327336

File: bc7a083bf66bc5cd2357436c3(...).png (22 KB, 122x108)

>>43326545

Anonymous
07/02/26(Thu)06:26:16 No.43327428

>>43327086
progressbeam

Anonymous
07/02/26(Thu)13:32:50 No.43327801

>>43327336
yay

Anonymous
07/02/26(Thu)19:48:16 No.43328369

Haysay is dead, so I decided to run RVC itself locally, but I can't find the Maud Pie RVC model. Every lead ends in a dead link, and it seems other models by the same person who made that one are dead too.

Anonymous
07/02/26(Thu)20:50:51 No.43328468

>>43328369
Hay Say is back up now. Sorry for the unexpected downtime.
Yeah, looks like the original Maud Pie model is a dead link now. I went ahead and re-uploaded it here, in case you still want to run it locally:
https://huggingface.co/hydrusbeta/hay_say_reuploaded_models/tree/main/rvc/Maud%20Pie

>>43313914
Yes, I'll look into it this weekend.

>>43323889
Thank you for doing the hard work of bringing us ponies on MOSS-TTS with emotion control. Good stuff so far.

Anonymous
07/03/26(Fri)04:42:22 No.43328894

>>43328468
>Hay Say is back up now. Sorry for the unexpected downtime.
Thanks Anon!

Anonymous
07/03/26(Fri)04:47:25 No.43328896

>>43328468
It's down again, and thanks. Also, pretty much everything from KenDoStudio is gone.

Anonymous
07/03/26(Fri)08:27:48 No.43329070

>>43328896
Still down for me as well.

Anonymous
07/03/26(Fri)11:49:32 No.43329273

>>43328896
>KenDoStudio is gone.
damn

Anonymous
07/03/26(Fri)16:48:37 No.43329611

>>43323889
I would love an API version of this that can take in longer text. I have so many cool ideas I could do

Anonymous
07/03/26(Fri)19:03:34 No.43329805

>>43325014
I don't pay much attention to this thread now (not because idc, I'm just too busy with uni), but I remember that Australian guy running his voice through VITS, and I thought she sounded adorable as an Australian.

>>43324899
>finally a tts Ponk that doesn't talk like she just chug a entire barrel of energy drink
Nooo!

Anonymous
07/04/26(Sat)00:29:44 No.43330191

>>43327086
Sorry to bother, but any ETA on a reference audio function?

Anonymous
07/04/26(Sat)01:43:47 No.43330246

>>43327086
I'd like to ask: how do you train the voices for MOSS? Is there a guide somewhere? What are the specs needed to train a character?

Anonymous
07/04/26(Sat)01:44:26 No.43330247

>>43327086
hey nonny, is there any potential for a phonemizer or something like it to be trained for this model? it gets complicated/uncommon words very wrong very often. it doesn't understand how to pronounce things that aren't in its training data, which is all mlp dialogue. these are all just assumptions about the infrastructure, but the problem remains the same, and it's incredibly jarring to hear a word like 'erosion' pronounced 'seyfloerser' like a slurred drunk mare

Anonymous
07/04/26(Sat)06:28:26 No.43330547

>>43328896
>Still no HaySay
Is it time to get worried?

ThunderShy
07/04/26(Sat)09:45:17 No.43330695

>>43328369
i might have it :)

ThunderShy
07/04/26(Sat)09:48:06 No.43330700

hydrus beta, if you're out there, what's happening with the haysay.ai website bud?

HydrusBeta
07/04/26(Sat)11:53:11 No.43330883

>>43330547
>>43330700
Hay Say went down a couple more times, unexpectedly. I have an idea as to what's causing it but I'm not 100% certain yet. I've added some additional metrics monitoring and I'll be keeping an eye on the server in case it happens again. The server is back up right now.

Anonymous
07/04/26(Sat)13:12:31 No.43331049

>>43330247
nvm, i fixed it by asking fable to train a LoRA on a dataset that included the voices with fewer samples as often as the ones with more of them.
before/after a/b test wav:
https://u.pone.rs/tnonbtdv.mp3
the lora is included in the /chag/ AI VN tts mod and can be used independently of it with some finagling:
https://u.pone.rs/sfwhsrkz.zip
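Drawing low-sample voices as often as high-sample ones amounts to inverse-frequency sampling: weight each utterance by 1 / (number of utterances from its speaker), so every speaker is drawn equally often in expectation. A minimal sketch with a made-up speaker list; the anon's actual training setup wasn't posted, and `balanced_weights`/`sample_batch` are hypothetical helpers. In a PyTorch pipeline, `torch.utils.data.WeightedRandomSampler` accepts the same kind of per-sample weights.

```python
import random
from collections import Counter


def balanced_weights(speakers):
    """Per-utterance weights so each speaker is drawn equally often in expectation."""
    counts = Counter(speakers)
    return [1.0 / counts[s] for s in speakers]


def sample_batch(utterances, speakers, k, rng=random):
    """Draw k utterances with replacement using speaker-balanced weights."""
    return rng.choices(utterances, weights=balanced_weights(speakers), k=k)


# Made-up example: Twilight has 4x as many clips as Derpy.
speakers = ["twilight"] * 8 + ["derpy"] * 2
utterances = [f"clip_{i:02d}" for i in range(len(speakers))]
batch = sample_batch(utterances, speakers, k=10, rng=random.Random(0))
```

Sampling with replacement means the small speakers' clips get repeated within an epoch, so overfitting on those few clips is the trade-off to watch.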

Anonymous
07/04/26(Sat)14:16:38 No.43331172

The CIA got Delta it seems. The model was too good.

Delta
07/04/26(Sat)16:03:22 No.43331314

>>43331172
Nay, I'm just very focused on the Windows scheibe. Basically, the demo works, but with a bunch of quirks that make it not 12GB-VRAM-safe if you want something like output longer than 7 seconds, so I have to implement and test streaming.
~~and was blendering something for second life, but that turned out harder than I thought~~
>>43330191
Like, RVC-style? Probably 2 weeks
>>43330247
>>43331049
Ah, dataset balanced sampling. Well done anon, I gotta do that.
>>43329611
Yeah. I still haven't tested the model's long context abilities.

Anonymous
07/04/26(Sat)18:13:22 No.43331525

>>43331314
>Probably 2 weeks
noice, danke Delta-kun

Anonymous
07/05/26(Sun)02:39:16 No.43332220

>>43331314
kino

ThunderShy
07/05/26(Sun)07:07:59 No.43332450

>>43330883
also you should try to add this tool

MOSS-TTS-1.7B-PNY v0.1: a finetune of MOSS-TTS + my custom vocoder for 48KHz audio

HuggingFace: https://huggingface.co/ZDisket/MOSS-TTS-PNY
Colab Notebook: https://colab.research.google.com/drive/1tDIYCMumcW5w3JWnQ0tBGyAr-ZpaaXBB
Public demo: http://198.53.64.194:35029/

See pic related on how to run on Google Colaboratory.
For local setup on your own hardware, you want at least 13GB of VRAM. Model runs ~1.5x realtime on a single RTX 5090 with the optimized runner. Download from HF and ask Claude Code to set it up for you.

>>43301171
From a technical perspective, this consists of two models: 1. A finetune of MOSS-TTS with fixed speaker conditioning, and 2. A very custom iSTFTNet2 vocoder that turns hidden states of the MOSS Audio tokenizer into 48KHz audio (which can be also repurposed for singing voice conversion).
>>43301515
The TF2 speakers are a bit lower quality because they were thrown in as an afterthought. Next version will include better emotion control and quality.

Anonymous
07/05/26(Sun)07:28:47 No.43332477

>>43332450
What the hell is wrong with you?

Anonymous
07/05/26(Sun)12:22:59 No.43332865

>>43332477
>he just copy pastes the whole fucking thing
lel incredible retardation

Anonymous
07/05/26(Sun)19:12:51 No.43333578

>>43331314
Yeah, looking forward to playing with it

Anonymous
07/06/26(Mon)00:27:41 No.43333966

snowpity

ThunderShy
07/06/26(Mon)07:46:27 No.43334283

>>43332477
>>43332865
retards are talking about themselves again

Anonymous
07/06/26(Mon)16:49:03 No.43334918

File: 9374659585773829.jpg (68 KB, 682x682)

Delta
07/06/26(Mon)17:40:53 No.43334990

Windows MOSS-TTS runner. Currently a terminal interface (type in anything to do TTS, use / for commands)
System requirements: At least 12GB VRAM GPU (any brand), modern Windows 10/11, make sure to update your GPU drivers.
Runner: https://u.pone.rs/rlsymqeb.zip
Instructions: Run download-models.bat. Once that's done, run run-vulkan-directml-quant.bat for 12GB GPU, run-vulkan-directml-full.bat for >16GB GPUs. It will open a console.
Source code: https://u.pone.rs/yzragcxy.zip
This uses GGML for inference with transformers (Vulkan backend which is vendor-agnostic) and ONNX with the DirectML backend for the vocoder.
You can find the quantized models themselves under https://huggingface.co/ZDisket/MOSS-TTS-PNY-GGUF ; the .bat downloads from there.

This thing is a research preview for playing around with.

Anonymous
07/06/26(Mon)18:55:54 No.43335100

File: twilight embarazada.png (1011 KB, 1098x904)

The MOSS-TTS model seems to automatically understand other languages to an extent, even if you don't change the "language" field. Twilight will speak Spanish with an American English accent. Mispronunciations abound, however, and it takes some finagling.
https://files.catbox.moe/yp2wdl.mp3

Anonymous
07/06/26(Mon)19:40:26 No.43335172

>>43334990
>run run-vulkan-directml-quant.bat for 12GB GPU, run-vulkan-directml-full.bat for >16GB GPUs
So which option do I choose for 7.8GB VRAM? Or is that not yet implemented?

Delta
07/06/26(Mon)21:22:26 No.43335303

>>43335172
There is none (at least not without quantizing both models to hell); it turns out my earlier 7.8GB VRAM figure was based on only the models being loaded, without any inference cache. Sorry.
The MOSS team has a nano model, so that could be next after RVC
https://github.com/OpenMOSS/MOSS-TTS-Nano
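Why a loaded-models-only figure undercounts: during generation a transformer also keeps a key/value cache that grows linearly with sequence length, on top of compute buffers and the vocoder. A rough sketch of the standard KV-cache estimate; the layer and head numbers in the example are hypothetical, not MOSS-TTS's real config.

```python
def kv_cache_bytes(layers, kv_heads, head_dim, seq_len, bytes_per_elem=2, batch=1):
    """Bytes for a transformer's key/value cache.

    Each layer stores one K and one V tensor of shape
    [batch, kv_heads, seq_len, head_dim]; bytes_per_elem=2 means float16.
    """
    per_layer = 2 * batch * kv_heads * seq_len * head_dim * bytes_per_elem
    return layers * per_layer


# Hypothetical config values, for illustration only.
example = kv_cache_bytes(layers=28, kv_heads=8, head_dim=128, seq_len=1000)
print(f"{example:,} bytes (~{example / 1e9:.2f} GB)")
```

For those made-up values this prints `114,688,000 bytes (~0.11 GB)`; doubling `seq_len` doubles the cache, which is why longer outputs are the first thing to push a tight VRAM budget over the edge.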

Anonymous
07/06/26(Mon)21:30:09 No.43335314

>>43335303
what gpu are you using to train these models

Delta
07/06/26(Mon)21:38:50 No.43335325

>>43335314
AMD Instinct MI300X.

Anonymous
07/06/26(Mon)22:19:26 No.43335368

>Drop in for my once-in-a-blue-moon check-in at slash em el pee slash
>Check this thread out to see any interesting developments
>Nothing new in OP, scroll a bit
>Delta

https://vocaroo.com/1bvGHJAmkbx2
transcript:
Goodness grace! This is the smoothest I've ever heard of a text to speech! I might actually follow along for this one, given I have a GPU (gee pee you) or two I could enslave for this... Oh, I do hope someone manages to get pure phonetic alphabet input working on these, as those would honestly make control way easier. Þorn and Eð are my favourite letters...
https://vocaroo.com/1jTTlFQudWAy
Another test for fun
As for what hardware I have... Quite the mix:
RX 7900XTX
RTX 3060
Arc Pro B50
I'll try to follow along this thread, but I oughta get to sleep. I'll try experimenting tomorrow with the 7900XTX - the 3060 is currently an SDXL slave

Anonymous
07/07/26(Tue)03:14:13 No.43335700

mares

Anonymous
07/07/26(Tue)03:25:20 No.43335711

File: 912489237813017851.png (177 KB, 1029x732)

>>43335303
Not sure if that helps, but afaik when they quantize transformer models they usually leave attention layers at high quants (like q8) and quant the mlp layers more aggressively, say, q4. MLP makes up most of the parameters while being more resistant to quantization, at least in natural language. But I'm not sure whether it would transfer to this task because sequences are much shorter.
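The mixed-precision idea above can be sketched as an ordered per-tensor rule table. The name patterns assume llama.cpp-style GGUF tensor naming (`blk.N.attn_q.weight`, `blk.N.ffn_up.weight`), which may not match how the MOSS-TTS-PNY GGUFs are actually laid out, and the bits-per-weight figures are the nominal storage costs of those GGUF types. How such a table gets fed to a quantizer depends on the tool, so this only computes the assignment and the resulting average width.

```python
import fnmatch

# Nominal storage cost (bits per weight) of some GGUF tensor types.
BITS_PER_WEIGHT = {"F32": 32.0, "F16": 16.0, "Q8_0": 8.5, "Q6_K": 6.5625, "Q4_K": 4.5}

# Ordered rules, first match wins. Norm vectors are tiny, so keep them full
# precision; attention stays at Q8_0; the large MLP/FFN matrices get Q4_K.
RULES = [
    ("*norm*", "F32"),
    ("*attn_*", "Q8_0"),
    ("*ffn_*", "Q4_K"),
]
DEFAULT_TYPE = "Q6_K"  # embeddings, output head, anything unmatched


def pick_type(tensor_name: str) -> str:
    """Return the quant type for a tensor name according to RULES."""
    for pattern, qtype in RULES:
        if fnmatch.fnmatchcase(tensor_name, pattern):
            return qtype
    return DEFAULT_TYPE


def mean_bits(tensors: dict[str, int]) -> float:
    """Parameter-weighted average bits per weight for a {name: n_params} map."""
    total_bits = sum(n * BITS_PER_WEIGHT[pick_type(name)] for name, n in tensors.items())
    return total_bits / sum(tensors.values())


if __name__ == "__main__":
    # Toy layer with made-up shapes: FFN dominates the parameter count,
    # so the average lands much closer to Q4_K than to Q8_0.
    layer = {
        **{f"blk.0.attn_{p}.weight": 4096 * 4096 for p in "qkv"},
        "blk.0.attn_output.weight": 4096 * 4096,
        **{f"blk.0.ffn_{p}.weight": 4096 * 14336 for p in ("up", "gate", "down")},
        "blk.0.attn_norm.weight": 4096,
        "blk.0.ffn_norm.weight": 4096,
    }
    print(f"{mean_bits(layer):.2f} bits/weight")
```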

Anonymous
07/07/26(Tue)03:30:28 No.43335716

>>43335711
and yes, picrel shows the ggufs from your repo, I indicated what I would try changing. Looking forward to and preserving my cum for the release.

Anonymous
07/07/26(Tue)08:44:38 No.43336020

snowpity

Anonymous
07/07/26(Tue)14:50:42 No.43336692

>>43334990
uhhh, how do I get it to work on linux?

Delta
07/07/26(Tue)19:00:12 No.43337124

>>43336692
Not officially supporting Linux because I can't be bothered with supporting every distro from Ubuntu to RaritysSmellyFlankOS. However, the source code does compile and works in Linux fine, so you can figure it out with AI.
>>43335711
I should check that out

Anonymous
07/07/26(Tue)20:01:20 No.43337245

>>43337124
>Not officially supporting Linux because I can't be bothered with supporting every distro from Ubuntu to RaritysSmellyFlankOS.
Koboldcpp's portable-ish Linux executable has worked fine on both my Fedora server and my Tumbleweed main machine, and likely works on Arch and Debian. It mostly depends on how it's packaged and what libraries it uses.
The vast majority of distros people commonly use have a few base distros as "upstream": Ubuntu, Mint and Pop!OS from Debian, Nobara and Bazzite from Fedora, Manjaro, Endeavour and CachyOS from Arch.
>TL;DR By building for Debian, Arch and Fedora you cover most of the Linux ecosystem.
Still, if you want people to just compile it for their own setups and not worry, outline _a_ path to make it work on Linux and I'm sure most can follow along with the necessary changes.
Will try compiling the thing. If successful on my Tumbleweed, I'll try to document it and if it does become a single packed executable in the end I'll probably upload it.

Anonymous
07/07/26(Tue)20:04:29 No.43337251

>>43337124
>Not officially supporting Linux because I can't be bothered with supporting every distro from Ubuntu to RaritysSmellyFlankOS
Why not make an AppImage?

Anonymous
07/07/26(Tue)22:01:38 No.43337398

>>43337245
Soooo
Here's how it went... As Delta said I figured it out with AI. Just plain Claude Sonnet 5 medium effort no reasoning. And Claude also wrote this thingy:
>Got MOSS TTS building and running on Linux with Vulkan on an AMD 7900XTX, no CUDA/DirectML/Windows.
>Deps: vulkan-devel, shaderc+shaderc-devel (glslc is in the non-devel pkg on Tumbleweed, headers in -devel, annoying split), spirv-headers (not pulled in automatically, cmake will just fail on ggml-vulkan's CMakeLists until you install it separately). ONNX Runtime 1.26.0 CPU tarball, unused in the end — go GGUF vocoder route instead, explained below.
>cmake -DMOSS_TTS_ENABLE_VULKAN=ON -DMOSS_TTS_ENABLE_CUDA=OFF, build moss-tts-tui + moss-tts-engine targets specifically (there's a moss-vocoder-onnx target too but it's gated behind a flag that defaults off, don't bother, TUI doesn't need it, vocoding's in-process).
>Two real gotchas if you've got multiple GPUs:

>It picks Vulkan device index 0 by default. If you've got an iGPU or a second card ahead of your main one in enumeration order, it'll try to allocate on that instead and OOM on something that should be trivial. GGML_VK_VISIBLE_DEVICES=N env var fixes it.
>The .onnx vocoder file only works with --provider cpu/cuda/directml. If you want Vulkan, you need the separate .gguf vocoder file and --provider ggml-vulkan. Different file, different provider family, README doesn't make this obvious.

>Once on the GGUF vocoder + ggml-vulkan: hard crash, GGML_ASSERT(src0->type == GGML_TYPE_F32) failed inside ggml_vk_build_graph. F16 vocoder graph hits a Vulkan kernel that only supports F32 input somewhere. Not a build problem, not a driver problem — Vulkan backend genuinely can't run this op combo in F16 right now.
Fix: --provider ggml-cpu for the vocoder instead of ggml-vulkan, keep everything else on Vulkan. Main model + decoder4 (the actually expensive parts) still run on GPU, vocoder falls back to CPU. Works fine, vocoder's cheap enough that CPU fallback doesn't matter.
Back to me, the human:
>TL;DR yeah you can get it working on a setup similar to mine I guess
The Vulkan device-index part was because my main machine has an Arc A310 in the first slot handling the desktop and displays. That way, it handles all the browsers and whatnot while the big boy card does everything else.
Dunno what else is particularly iffy about this report or something
Gonna make a zip containing the cmake commands Claude made and whatnot
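The two gotchas in that report boil down to a small decision rule for a launcher script. A hedged Python sketch: the `GGML_VK_VISIBLE_DEVICES` variable and the provider names (`cpu`, `cuda`, `directml`, `ggml-vulkan`, `ggml-cpu`) come from the report, while the helper names and the rule "an F32 GGUF vocoder might be fine on Vulkan" are untested assumptions, not the project's actual logic.

```python
import os
from pathlib import Path
from typing import Mapping, Optional

ONNX_PROVIDERS = {"cpu", "cuda", "directml"}


def vocoder_provider(vocoder_path: str, want_gpu: bool, f16_weights: bool,
                     onnx_gpu: str = "cuda") -> str:
    """Pick a --provider value that matches the vocoder file format.

    .onnx vocoders only take ONNX Runtime-family providers; .gguf vocoders only
    take ggml-family ones. An F16 GGUF vocoder on ggml-vulkan was reported to hit
    GGML_ASSERT(src0->type == GGML_TYPE_F32), so it falls back to ggml-cpu.
    """
    suffix = Path(vocoder_path).suffix.lower()
    if suffix == ".onnx":
        if onnx_gpu not in ONNX_PROVIDERS:
            raise ValueError(f"{onnx_gpu!r} is not an ONNX-family provider")
        return onnx_gpu if want_gpu else "cpu"
    if suffix == ".gguf":
        # Untested assumption: only the F16 graph trips the Vulkan assert.
        if want_gpu and not f16_weights:
            return "ggml-vulkan"
        return "ggml-cpu"
    raise ValueError(f"unrecognized vocoder format: {vocoder_path}")


def env_for_vulkan_device(index: int, base: Optional[Mapping[str, str]] = None) -> dict:
    """Copy of the environment that pins ggml's Vulkan backend to one device."""
    env = dict(os.environ if base is None else base)
    env["GGML_VK_VISIBLE_DEVICES"] = str(index)
    return env


# Usage (file name is a placeholder):
#   env = env_for_vulkan_device(1)  # skip e.g. a display GPU enumerated as device 0
#   provider = vocoder_provider("vocoder.gguf", want_gpu=True, f16_weights=True)
#   # -> "ggml-cpu"; pass env=env and "--provider", provider to the launched binary
```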

Delta
07/07/26(Tue)22:36:39 No.43337439

>>43337245
>>43337251
I'll support Linux as in give a "build it yourself" recipe.
>>43337398
Sounds about right. I forgot to say the vocoder is partially implemented in GGML but it's better to use ONNX, the GGML port is incomplete because a ton of operations have to be implemented from scratch (libllama was designed for transformers, not 2D convs). Unfortunately for Linux there is no DirectML equivalent; gotta use CUDA for NVIDIA GPU and MIGraphX for AMD.
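In ONNX Runtime terms, the backends Delta lists are execution providers: `DmlExecutionProvider` (DirectML) on Windows, `CUDAExecutionProvider` for NVIDIA and `MIGraphXExecutionProvider` for AMD on Linux, with `CPUExecutionProvider` as the universal fallback. A minimal sketch of building a preference list for an ONNX vocoder session, assuming the onnxruntime Python package; GPU vendor detection is left to the caller and `choose_providers` is a hypothetical helper.

```python
import sys

# Preferred accelerator per (platform, vendor); CPU is always appended last.
PREFERENCES = {
    ("win32", "any"): ["DmlExecutionProvider"],
    ("linux", "nvidia"): ["CUDAExecutionProvider"],
    ("linux", "amd"): ["MIGraphXExecutionProvider"],
}


def choose_providers(available, vendor: str, platform: str = sys.platform) -> list:
    """Keep only the preferred providers that this onnxruntime build offers."""
    key = (platform, "any") if platform == "win32" else (platform, vendor)
    wanted = PREFERENCES.get(key, []) + ["CPUExecutionProvider"]
    return [p for p in wanted if p in available]


# Usage (requires onnxruntime; the model path is a placeholder):
#   import onnxruntime as ort
#   providers = choose_providers(ort.get_available_providers(), vendor="amd")
#   session = ort.InferenceSession("vocoder.onnx", providers=providers)
```

Filtering against `get_available_providers()` matters because a CPU-only onnxruntime wheel simply lacks the GPU providers, in which case the list degrades to CPU instead of failing.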

Anonymous
07/07/26(Tue)22:36:47 No.43337441

>>43337398
https://u.pone.rs/klpogtoq.zip
Here's:
Some cmake commands
A script to run it
A readme
For building and running a Linux version
Hopefully others can iterate on this

Anonymous
07/07/26(Tue)22:39:51 No.43337443

>>43337439
>Sounds about right. I forgot to say the vocoder is partially implemented in GGML but it's better to use ONNX, the GGML port is incomplete because a ton of operations have to be implemented from scratch (libllama was designed for transformers, not 2D convs). Unfortunately for Linux there is no DirectML equivalent; gotta use CUDA for NVIDIA GPU and MIGraphX for AMD.

sucks given MIGraphX is supposedly deprecated and the new method is python heavy or... something idk
That means what I have in >>43337441 is... Eh, it is what it is.
Hopefully improvements come around, I just did what I could. Now time to figure out how to operate this from the tui

Anonymous
07/07/26(Tue)22:55:29 No.43337451

>>43337441
oh
warning:
the build script is incomplete: it doesn't pull in the onnx runtime
my and claude's mistake

Anonymous
07/07/26(Tue)23:02:14 No.43337458

>>43337451
To fix that mistake, add -DMOSS_TTS_ENABLE_ONNX=ON into the cmake thingy and then in the run commands change the vocoder model to the .onnx one and provider to cpu
it is now very late and the sun is shining brightly here...
https://voca.ro/16NdL4gLBT3m
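Putting the flags from this exchange together, a sketch of the configure and build steps as a Python script that shells out to CMake. The options `MOSS_TTS_ENABLE_VULKAN`, `MOSS_TTS_ENABLE_CUDA` and `MOSS_TTS_ENABLE_ONNX` and the targets `moss-tts-tui` and `moss-tts-engine` are the ones quoted above; the `build` directory and Release build type are assumptions, and any variable needed to point CMake at an ONNX Runtime install is project-specific and not covered here.

```python
import subprocess


def cmake_commands(src: str = ".", build_dir: str = "build",
                   onnx: bool = True, jobs: int = 8):
    """Return the configure and build command lines as argument lists."""
    configure = [
        "cmake", "-S", src, "-B", build_dir,
        "-DCMAKE_BUILD_TYPE=Release",          # assumption, not from the thread
        "-DMOSS_TTS_ENABLE_VULKAN=ON",
        "-DMOSS_TTS_ENABLE_CUDA=OFF",
        f"-DMOSS_TTS_ENABLE_ONNX={'ON' if onnx else 'OFF'}",
    ]
    build = [
        "cmake", "--build", build_dir, "--parallel", str(jobs),
        # Several targets after one --target needs CMake 3.15 or newer.
        "--target", "moss-tts-tui", "moss-tts-engine",
    ]
    return configure, build


def run_build(**kwargs) -> None:
    """Echo and execute each step, stopping on the first failure."""
    for cmd in cmake_commands(**kwargs):
        print("+", " ".join(cmd))
        subprocess.run(cmd, check=True)


# Usage, from the source checkout:
#   run_build(onnx=True)
```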

Anonymous
07/08/26(Wed)02:21:53 No.43337665

>>43335368
>that battles atlas generation
fucking kek, I'm not the only one who sees a beautifully demented pro-AI and anti-human message in that song then. Love that track, absolutely did not expect to see it here.

Anonymous
07/08/26(Wed)07:36:35 No.43337974

Uh-oh, the docs haven't been updated in a hot minute. How would you like to list your new TTS model, Delta?

Anonymous
07/08/26(Wed)10:00:10 No.43338073

>>43337665
Media Molecule did choose some absolute bangers to license...
If I knew how to use music tools and the bare singing necessary I'd be down to make a full cover with one of those rvc things or whatever BGM was using
Though I don't see any specific messaging

BGM
07/08/26(Wed)18:28:28 No.43338923

File: happyooo.gif (630 KB, 300x354)

>>43331314
>2 weeks
Nice, exciting stuff. I can't wait to try it out with speech-to-speech conversion.

Delta
07/08/26(Wed)18:48:43 No.43338953

>>43337974
You decide
>>43338923
Oh, hi BGM. Still using RVC right?

BGM
07/08/26(Wed)20:13:00 No.43339073

>>43338953
For now, yes. Your new thing looks very promising.

Anonymous
07/09/26(Thu)03:29:07 No.43339625

snowpity

Anonymous
07/09/26(Thu)07:31:14 No.43339896

Any chance of including the training process and script? Especially if it includes the emotion control, as having that on a larger variety of voices would be pretty amazing.

Anonymous
07/09/26(Thu)14:27:39 No.43340399

>>43339896
Yeah it would be stellar.

Anonymous
07/09/26(Thu)19:27:56 No.43340832

>>43338073
Yeah, my first exposure was LittleBigPlanet back when too. I've since come to love the lyrics for how absolutely diabolical they've become since the rise of AI lmao. An AI pony cover sounds perfect.

Anonymous
07/09/26(Thu)22:23:59 No.43341014

https://voca.ro/1kIzNC7MHlGg
REAL AUDIO OF ZECORA SPEAKING HER NATIVE LANGUAGE
(no seriously I have no idea how this happened)

Anonymous
07/10/26(Fri)00:59:03 No.43341186

>>43338073
You reminded me of Passion Pit - Sleepyhead, kek. Would be a neat song to do a mare cover of, if not to just do the high-pitched parts!

Anonymous
07/10/26(Fri)07:41:12 No.43341475

Oh boy, time to restart some of my older projects with the new voice tools.

Anonymous
07/10/26(Fri)12:39:21 No.43341926

>>43339625
Very insightful.

Anonymous
07/10/26(Fri)18:45:02 No.43342734

>>43341926
mare

Anonymous
07/10/26(Fri)23:47:56 No.43343294

Any panel for /mlp/con?

Anonymous
07/11/26(Sat)03:27:47 No.43343521

>>43343294
A little late to bring up. No, nothing has been discussed.

Anonymous
07/11/26(Sat)07:48:32 No.43343767

>>43343521
perhaps with all the new development there will be some panel made for marecon

Anonymous
07/11/26(Sat)10:37:57 No.43344077

Hello Delta,

While trying to run your Moss TTS model on CPU, I ran into a few obstacles I thought I should let you know about.

First, the default value for the option --decoder4-features-onnx is "ort_sessions/decoder4_features_fp32/model.onnx", but I don't see that model in your repo. Based on the code, I think this option is supposed to point at the vocoder onnx model, so I specified istftnet2_decoder4_50hz/istftnet2_decoder.onnx and that seemed to work. I recommend changing the default.

Second, I was still unable to get the ONNX vocoder working because I got this error during the decoding step:
>File "...\moss_tts_torchopt_runner_bundle\portable_tts_runtime.py", line 624, in decode_outputs
>lengths_input_name = self.decoder4_session.get_inputs()[1].name
>IndexError: list index out of range
I'm not sure what's wrong here.

Lastly, lines 177 and 178 of portable_tts_runtime.py enforce the use of a cuda device when the option --decoder4-features-runtime is torch_fp16. However, this seems entirely unnecessary. In fact, I had to comment out those two lines in order to use the istftnet2_decoder_cpu.ts model, and it generated output successfully.
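The `IndexError` is what happens when code indexes `get_inputs()[1]` for a lengths tensor but the exported decoder graph only declares one input. A defensive sketch that looks the inputs up instead of assuming them; `resolve_decoder_inputs` and `build_feed` are hypothetical helpers, not the code in portable_tts_runtime.py, and they work on the list of input names that onnxruntime's `session.get_inputs()` exposes via `.name`.

```python
from typing import Optional, Sequence


def resolve_decoder_inputs(input_names: Sequence[str]) -> tuple[str, Optional[str]]:
    """Return (features_input, lengths_input or None) for a decoder graph.

    A graph exported without a lengths input simply has one entry, so the
    second name is optional rather than assumed.
    """
    if not input_names:
        raise ValueError("decoder graph declares no inputs")
    lengths = input_names[1] if len(input_names) > 1 else None
    return input_names[0], lengths


def build_feed(input_names: Sequence[str], features, lengths) -> dict:
    """Build an input feed, passing lengths only if the graph accepts it."""
    feat_name, len_name = resolve_decoder_inputs(input_names)
    feed = {feat_name: features}
    if len_name is not None:
        feed[len_name] = lengths
    return feed


# With onnxruntime:
#   names = [i.name for i in session.get_inputs()]
#   outputs = session.run(None, build_feed(names, features, lengths))
```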

Delta
07/11/26(Sat)11:42:34 No.43344198

File: discriminators2.png (1.72 MB, 1866x1048)

I've gotten my whole iSTFTNet3 thing (what powers the custom decoder responsible for the audio quality) into RVC and it's training (a multi-speaker model with all notable characters). After solving GAN instability, this will take like a week.
>>43344077
Thank you. I'll get that looked at
>>43339896
Yeah, I plan to. Right now, the codebase is a vibe coded mess.
>>43343294
A bit too late for that, but at some other con I want to present some things, including this model. Basically, I have a custom vocoder that is a highly upgraded iSTFTNet2, including ConvNeXt-V2 blocks in the generator, and a custom discriminator setup: MPD, MS-STFT and MS-CQT-D (from https://arxiv.org/abs/2311.14957). But unlike vanilla implementations, I found out log1p magnitude and instantaneous frequency are more numerically stable, and I have dedicated full-band (sees the whole frequency range) and high-band modules. Blah blah.
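For anyone wondering what "log1p magnitude and instantaneous frequency" means: instead of showing a discriminator raw STFT values, you compress the magnitude with log(1 + |X|), which is bounded at 0 and has no log(0) blow-up on silence, and you replace the raw phase, which wraps around every 2π, with its frame-to-frame change. A toy, dependency-free sketch of the two features, assuming a Hann window and the common phase-vocoder definition of instantaneous frequency (phase advance minus the bin's expected advance, wrapped to [-π, π)). It only illustrates the representation and is not Delta's discriminator code.

```python
import cmath
import math


def stft(x, n_fft=64, hop=16):
    """Naive Hann-windowed STFT: a list of frames, each with n_fft // 2 + 1 bins."""
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / n_fft) for n in range(n_fft)]
    frames = []
    for start in range(0, len(x) - n_fft + 1, hop):
        seg = [x[start + n] * window[n] for n in range(n_fft)]
        frames.append([
            sum(seg[n] * cmath.exp(-2j * math.pi * k * n / n_fft) for n in range(n_fft))
            for k in range(n_fft // 2 + 1)
        ])
    return frames


def wrap(angle):
    """Wrap an angle into [-pi, pi)."""
    return (angle + math.pi) % (2 * math.pi) - math.pi


def log1p_magnitude(frames):
    """log(1 + |X|): compresses dynamic range and stays finite at zero."""
    return [[math.log1p(abs(c)) for c in frame] for frame in frames]


def instantaneous_frequency(frames, n_fft=64, hop=16):
    """Per-bin deviation of the phase advance from the bin-centre expectation."""
    out = []
    for prev, cur in zip(frames, frames[1:]):
        row = []
        for k, (a, b) in enumerate(zip(prev, cur)):
            expected = 2 * math.pi * k * hop / n_fft
            row.append(wrap(cmath.phase(b) - cmath.phase(a) - expected))
        out.append(row)
    return out
```

A sinusoid sitting exactly on a bin centre has a constant phase advance, so its instantaneous-frequency deviation at that bin is about 0 in every frame even though its raw phase keeps rotating; that smoothness is what makes the feature easier to learn from than raw phase.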

Anonymous
07/11/26(Sat)14:39:07 No.43344696

>>43344198
noice progress

Anonymous
07/11/26(Sat)17:08:45 No.43344996

>>43344198
>I found out log1p magnitude and instantaneous frequency are more numerically stable, and I have dedicated full-band
i have no idea what that means but im happy it makes pony voices better

Anonymous
07/12/26(Sun)00:39:45 No.43345702

snoof

Anonymous
07/12/26(Sun)05:02:07 No.43346012

>>43341475
Whatchya working on Anon?

Anonymous
07/12/26(Sun)06:26:12 No.43346097

>>43346012
just wanted to get some greentexts turned into short audio clips, since TalkNet TTS tends toward a neutral/deadpan delivery, while RVC and other voice conversion models just aren't compatible with my voice.

Anonymous
07/12/26(Sun)11:31:37 No.43346429

>>43346097
Ah, understandable. Text to speech no good?

Anonymous
07/12/26(Sun)12:55:30 No.43346531

>>43346429
older models were good when doing a few words or short sentences, but when trying to put something longer together I can hear the inconsistencies between each clip (speed, tone of voice, emotion, random word mispronunciations), which leads to generating 5 to 10 times more clips than necessary just to sort out the good ones. With Delta's new model, the control makes it so much easier to get the voices to sound the way they're directed to.

Anonymous
07/12/26(Sun)16:15:52 No.43346930

By the way Delta, could you add a toggle to disable the emotion control? I still have the initial release and I feel like its voice lines sounded more accurate (not by much, but just enough to notice).

Anonymous
07/12/26(Sun)16:24:46 No.43346968

Hello Delta,

I don't think the "style text" input is used at all in MOSS-TTS-PNY. I'm guessing that's a mistake? Or maybe I am just missing something.

Line 713 of portable_tts_runtime would use style_text, but only if style_features_dim != 2
https://huggingface.co/ZDisket/MOSS-TTS-PNY/blob/main/moss_tts_torchopt_runner_bundle/portable_tts_runtime.py#L713
But... style_feature_dim is set to 2 in the config file and is never overridden in the code as far as I can tell:
https://huggingface.co/ZDisket/MOSS-TTS-PNY/blob/main/moss_tts_local_clipper_checkpoint/config.json#L106
By placing some debug points, I verified that neither line 559 (which loads the feature extractor) nor 713 (which runs the extractor) of portable_tts_runtime.py is ever executed.

By the way, is there another place you would prefer people to report issues and make feature requests, or is it fine for us to just post them in this thread?

Anonymous
07/13/26(Mon)00:08:10 No.43348252

>>43346968
Hello Delta,

I just wanted to report another thing I found. It seems that "Audio top-k" is also never used. Line 726 of portable_tts_runtime.py always passes a value of `None` to the model's generate method. I think that should be `audio_top_k` instead:
https://huggingface.co/ZDisket/MOSS-TTS-PNY/blob/main/moss_tts_torchopt_runner_bundle/portable_tts_runtime.py#L726
Also, the CLI version is missing an --audio-top-k option in run_tts_torchopt.py - there are only options for audio-temperature and audio-top-p - and line 214 of run_tts_torchopt.py just passes `None` for its value to the `synthesize` method.
https://huggingface.co/ZDisket/MOSS-TTS-PNY/blob/main/moss_tts_torchopt_runner_bundle/run_tts_torchopt.py#L149
https://huggingface.co/ZDisket/MOSS-TTS-PNY/blob/main/moss_tts_torchopt_runner_bundle/run_tts_torchopt.py#L214
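The suggested fix is ordinary argparse plumbing: add the option, default it to `None` so top-k stays disabled unless asked for, and forward the parsed value instead of a literal `None`. A sketch in which only the option names `--audio-temperature`, `--audio-top-p` and `--audio-top-k` come from the post; `synthesize` is a stand-in stub that echoes its settings, not the real method, and `None` for temperature/top-p is assumed to mean "use the runtime's default".

```python
import argparse


def positive_int(value: str) -> int:
    """argparse type that rejects k < 1."""
    n = int(value)
    if n < 1:
        raise argparse.ArgumentTypeError("top-k must be >= 1")
    return n


def build_parser() -> argparse.ArgumentParser:
    p = argparse.ArgumentParser(description="MOSS-TTS sampling options (sketch)")
    p.add_argument("--audio-temperature", type=float, default=None)
    p.add_argument("--audio-top-p", type=float, default=None)
    # New option; the default of None keeps top-k disabled as before.
    p.add_argument("--audio-top-k", type=positive_int, default=None)
    return p


def synthesize(text, audio_temperature=None, audio_top_p=None, audio_top_k=None):
    """Stub standing in for the real synthesize(); returns what it was given."""
    return {"text": text, "temperature": audio_temperature,
            "top_p": audio_top_p, "top_k": audio_top_k}


def main(argv=None):
    args = build_parser().parse_args(argv)
    # Forward the parsed value rather than hard-coding None.
    return synthesize("Hello", audio_temperature=args.audio_temperature,
                      audio_top_p=args.audio_top_p, audio_top_k=args.audio_top_k)
```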

Anonymous
07/13/26(Mon)05:04:38 No.43348679

>>43341014
Sounds like simmish.

Delta
07/13/26(Mon)10:31:12 No.43349081

>>43346930
Emotion control cannot be turned off currently.
>>43346968
Style text is a relic from a previous attempt of mine to add emotion control: it fed Cookie et al. BERT hidden states in as a conditioning input, which I later found out didn't influence the model.
>By the way, is there another place you would prefer people to report issues and make feature requests, or is it fine for us to just post them in this thread?
This thread is fine.
>>43348252
Top-k sampling can be enabled but I keep it off by default because temperature and top-p are enough.
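For context on that default, a sketch of the generic sampling pipeline for one step's logits: temperature scaling, optional top-k truncation, then nucleus (top-p) truncation. This is the standard technique in one common ordering, not MOSS-TTS's actual sampler. Top-p already adapts the candidate count to how peaked the distribution is, which is why top-k is often redundant once temperature and top-p are tuned.

```python
import math
import random
from typing import Optional, Sequence


def sample_token(logits: Sequence[float], temperature: float = 1.0,
                 top_p: float = 1.0, top_k: Optional[int] = None,
                 rng=None) -> int:
    """Sample an index from logits with temperature, optional top-k, and top-p."""
    if temperature <= 0:
        raise ValueError("temperature must be > 0")
    chooser = rng if rng is not None else random
    scaled = [l / temperature for l in logits]
    peak = max(scaled)
    probs = [math.exp(s - peak) for s in scaled]  # unnormalized, numerically stable
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    if top_k is not None:  # None means top-k is disabled
        order = order[:max(1, top_k)]
    total = sum(probs[i] for i in order)
    # Nucleus: keep the smallest high-probability prefix whose mass reaches top_p.
    kept, mass = [], 0.0
    for i in order:
        kept.append(i)
        mass += probs[i] / total
        if mass >= top_p:
            break
    return chooser.choices(kept, weights=[probs[i] for i in kept], k=1)[0]


# Usage:
#   rng = random.Random(0)
#   token = sample_token(step_logits, temperature=0.8, top_p=0.9, rng=rng)
```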

Anonymous
07/13/26(Mon)17:18:06 No.43349764

mares

Anonymous
07/13/26(Mon)22:23:48 No.43350325

snoofpity

Anonymous
07/14/26(Tue)00:21:27 No.43350459

File: Website that wasn't even (...).png (27 KB, 536x294)

It's funny that he says this, like 15.Ai wasn't only usable for a grand total of one year out of the eight years the site has been up.

Anonymous
07/14/26(Tue)04:18:30 No.43350771

>>43344198
sunbeam

Anonymous
07/14/26(Tue)07:31:08 No.43351025

>>43350459
Yeah, the fun thing about the technology is that once something is made, somebody can recreate it, even if it takes five years to do so.

Anonymous
07/14/26(Tue)10:03:42 No.43351241

>>43349081
Ah, no biggie. I'll just alternate between the initial release and the latest as needed.

By the way, I hope you take your time polishing the next release, especially the bug-fixing. I didn't even know the style text function was just window dressing. I think the list of requirements might need some updating too, since I recall running into some dependency issues that I had to go and solve on my own (something to do with triton IIRC). Plus, there was an issue with the speaker names only registering as Speaker (ID), so I had to manually find out which character had which ID in order to generate with the corresponding character. I also think the installation requirements bear updating: they make it seem as if the CUDA version for PyTorch doesn't really matter, but I think I ran into some issues when installing versions above 13.0 (I forget whether it was related to the triton dependency issue or not), which were only fixed once I downgraded to 12.6. Perhaps I'm retarded with computers and fucked up along the way (very likely), but maybe you could consider doing a fresh install yourself to see if the install process is in proper order?

Just take your time to incorporate the feedback in this thread and take care not to rush things. You're doing great so far, but you're being a tad bit sloppy at times.

Anonymous
07/14/26(Tue)12:51:53 No.43351455

>>43351025
It's been 5 years and nobody has.

Anonymous
07/14/26(Tue)19:01:39 No.43352124

>>43351025
mare snoof

Anonymous
07/14/26(Tue)19:02:09 No.43352125

Bumo.

Anonymous
07/14/26(Tue)21:51:31 No.43352448

Site down atm for anyone else?

Delta
07/14/26(Tue)23:07:51 No.43352630

>>43352448
Instance went offline. It was being hosted on a Vast RTX 5090 instance; if it continues being dead by tomorrow, I'll switch to something on RunPod.
>>43351241
Been researching the model itself for a long time, so as soon as I got it working, I got everything else ready as quickly as possible for a release. Next stuff should be more polished.

Anonymous
07/15/26(Wed)04:04:53 No.43353055

>>43352630
>if it continues being dead by tomorrow, I'll switch to something on RunPod.
Thanks

Anonymous
07/15/26(Wed)07:29:31 No.43353258

>>43351455
The speed of the board has decreased a lot as well.

Anonymous
07/15/26(Wed)09:50:56 No.43353419

>>43351455
rip

Anonymous
07/15/26(Wed)14:54:46 No.43353854

mares

Anonymous
07/15/26(Wed)19:26:24 No.43354305

File: 40k 1780516651858476.jpg (545 KB, 2792x2724)

>>43353854
thats the plan

Anonymous
07/16/26(Thu)02:40:11 No.43354838

>>43354305
robomares

Anonymous
07/16/26(Thu)06:01:58 No.43355013

>>43354305
~~That's a stallion.~~

Anonymous
07/16/26(Thu)10:57:20 No.43355286

>>43351455
Haysay is a thing. Though it's not quite the same.

Anonymous
07/16/26(Thu)15:14:55 No.43355660

File: 260042.jpg (430 KB, 4000x1884)

Just finished a cover of ABBA with custom pony lyrics.
https://u.pone.rs/wdeknkzd.flac

Anonymous
07/16/26(Thu)15:54:32 No.43355737

File: 4564464.gif (189 KB, 100x125)

>>43355660
sounds great anon!
keep up the good work

Anonymous
07/16/26(Thu)15:58:27 No.43355751

File: 1701105.gif (452 KB, 297x221)

>>43355660
HOOOOLY SHIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIT

Anonymous
07/16/26(Thu)16:00:23 No.43355756

File: derpy dance happy.gif (722 KB, 1080x1080)

>>43355660
hell yeah, this is the quality of stuff that keeps me coming back. thank you for that anon!

Anonymous
07/16/26(Thu)17:59:18 No.43356044

>>43355660
kino!

Anonymous
07/16/26(Thu)21:40:58 No.43356539

>>43355660
Lovely work anon!

Anonymous
07/16/26(Thu)22:56:19 No.43356694

henlo frens. how 2 MOSS?

Anonymous
07/16/26(Thu)23:16:10 No.43356728

File: 6405138.png (550 KB, 5000x5000)

https://voca.ro/15vaRks9fDfv

Anonymous
07/16/26(Thu)23:20:29 No.43356736

File: 7413403.gif (1.46 MB, 498x291)

https://voca.ro/1hldXDn4d7ga

Anonymous
07/17/26(Fri)04:41:40 No.43357104

>>43355660
ossum

Anonymous
07/17/26(Fri)10:19:41 No.43357594

>>43356694
eat the Moss

Anonymous
07/17/26(Fri)14:33:47 No.43358262

>>43355660
Well, cool things still happen every now and then.

Anonymous
07/17/26(Fri)15:30:42 No.43358409

File: 7270131.gif (547 KB, 498x462)

>>43355660

Anonymous
07/17/26(Fri)19:52:28 No.43358904

From what I can see in /chag/, the text models are more or less on the same level as good greentexts, and the AI voice stuff is pretty much solved; the only other thing left is pony robot bodies to make waifus truly real.

Anonymous
07/17/26(Fri)19:58:57 No.43358916

>>43358904
I could never figure out how to make it work. If it doesn't require me to have a terabyte of space to run them locally, it requires me to buy fucking Gemini or some stupid shit.

HydrusBeta
07/17/26(Fri)23:11:57 No.43359159

File: yay.png (61 KB, 587x288)

Moss TTS has been added to Hay Say. It is, however, veeeery sloooow. Performance has always been an issue with Hay Say, but it's quite noticeable this time. I am looking into ways to make it faster. In the meantime, you can reduce the "RVQ Codebook layers" parameter to reduce the inference time, but at the cost of audio quality. Delta's Gradio demo is still up, too, and it's way faster.

For haysay.ai, I have implemented an output cap of ~20 seconds. On a local installation, you can increase that limit by editing the docker-compose.yaml file; search it for the --max-new-tokens option. Every 12.5 tokens adds about 1 second to the limit.
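Since the relationship is linear (about 12.5 tokens per second of audio, so each token covers roughly 80 ms), converting between a --max-new-tokens value and an output length is simple arithmetic. The helper names below are hypothetical; only the 12.5 figure comes from the post.

```python
import math

TOKENS_PER_SECOND = 12.5  # from the post: every 12.5 tokens adds about 1 second


def tokens_for_seconds(seconds: float) -> int:
    """Smallest max-new-tokens value that allows `seconds` of audio."""
    return math.ceil(seconds * TOKENS_PER_SECOND)


def seconds_for_tokens(tokens: int) -> float:
    """Approximate audio length produced by `tokens` generated tokens."""
    return tokens / TOKENS_PER_SECOND
```

By this arithmetic a ~20 second cap corresponds to roughly 250 tokens, and allowing about a minute of output would take a value of around 750.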

HydrusBeta
07/18/26(Sat)00:47:15 No.43359242

>>43359159
Welp, that was short-lived. The server crashed 3 times within an hour and a half. Memory usage spiked just before the crash. I've removed Moss TTS from haysay.ai until I can figure something out. It is still available to run in Hay Say locally.

ThunderShy
07/18/26(Sat)04:23:56 No.43359444

>>43303129
hello delta, i was wondering if you're able to add more ponies to moss tts, like the Student Six and others. more voices would be great to see :)

Anonymous
07/18/26(Sat)09:37:56 No.43359680

mares

Anonymous
07/18/26(Sat)12:00:12 No.43359932

>>43359159
>For haysay.ai, I have implemented an output cap of ~20 seconds.
You mean for Moss TTS, or does that cap apply to all models now?

Anonymous
07/18/26(Sat)15:14:50 No.43360379

>>43359159
>Moss TTS has been added to Hay Say.
sunbeam

HydrusBeta
07/18/26(Sat)15:45:15 No.43360459

>>43359932
Only to MossTTS.

Anonymous
07/18/26(Sat)19:29:50 No.43360993

>>43357594
got sick :(

Anonymous
07/19/26(Sun)01:12:13 No.43361488

>>43360993
wtf sick mare

Anonymous
07/19/26(Sun)04:18:52 No.43361678

>>43352630
>>43355660
>>43359159
Thanks for all your work, Anons.

Anonymous
07/19/26(Sun)13:24:23 No.43362404

File: qw3bhis.gif (59 KB, 86x129)

>>43355660
lovely

Anonymous
07/19/26(Sun)15:50:34 No.43362722

mares

Anonymous
07/19/26(Sun)16:15:38 No.43362835

>https://pony.tube/w/6ERFDbLpSCKZv4iWtGu5Zn
darn, wish people were posting poners or catbox mp3 links with their videos.

Anonymous
07/19/26(Sun)18:52:36 No.43363292

>>43362835
What do you mean?

Anonymous
07/19/26(Sun)19:31:20 No.43363368

>>43363292
>What do you mean?
Like, random people will post AI cover songs as videos, but there isn't a link to just get the song. So a lot of the time I need to download the whole video and throw it into Audacity to extract just the song. I don't want to sound like a /mu/ autist, but most streaming sites mess around with the encoding/format of the creator's video file, so whatever I download will be degraded in quality to some degree.

Anonymous
07/19/26(Sun)22:35:53 No.43363686

>>43363368
Ah, understandable.

Anonymous
07/20/26(Mon)03:48:18 No.43364021

>>43363368
the audio will still be compressed but you can download youtube vids straight to an audio file using ffmpeg or yt-dlp.
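A sketch of both routes in Python. The yt-dlp options use its `FFmpegExtractAudio` postprocessor with `preferredcodec: "best"`, which keeps the source audio codec rather than re-encoding when it can; the ffmpeg command stream-copies the audio track out of an already-downloaded video (`-vn -c:a copy`), so nothing is transcoded a second time. File names and the URL are placeholders, and both tools have to be installed separately.

```python
import subprocess


def ytdlp_options(outtmpl: str = "%(title)s.%(ext)s") -> dict:
    """yt-dlp options: best audio-only stream, extracted without re-encoding."""
    return {
        "format": "bestaudio/best",
        "outtmpl": outtmpl,
        "postprocessors": [{
            "key": "FFmpegExtractAudio",
            "preferredcodec": "best",  # keep the original codec when possible
        }],
    }


def download_audio(url: str) -> None:
    import yt_dlp  # third-party: pip install yt-dlp
    with yt_dlp.YoutubeDL(ytdlp_options()) as ydl:
        ydl.download([url])


def ffmpeg_copy_audio_cmd(video_path: str, out_path: str) -> list:
    """ffmpeg command that drops the video and copies the audio stream as-is."""
    return ["ffmpeg", "-i", video_path, "-vn", "-c:a", "copy", out_path]


def extract_audio(video_path: str, out_path: str) -> None:
    subprocess.run(ffmpeg_copy_audio_cmd(video_path, out_path), check=True)


# Usage (placeholders):
#   download_audio("https://www.youtube.com/watch?v=...")
#   extract_audio("cover.mp4", "cover.mka")
```

With stream copy the output container has to accept the source codec; a Matroska audio file (.mka) takes almost anything, so it avoids having to check whether the track is AAC, Opus or something else first.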

Anonymous
07/20/26(Mon)06:25:11 No.43364257

>>43362404
Bouncy mare.

Anonymous
07/20/26(Mon)06:58:50 No.43364290

Can we get a status update Delta?

Anonymous
07/20/26(Mon)12:53:04 No.43364926

Uppo.

Anonymous
07/20/26(Mon)18:41:34 No.43365642

File: 7CD05B215DE50D15E359672F8(...).png (576 KB, 1250x900)

Anonymous
07/20/26(Mon)19:27:46 No.43365739

>>43352630
>>43364290
would likewise love to hear, if there's anything to share. I'm excited for V2V with this model.

Delta
07/20/26(Mon)20:18:30 No.43365824

>>43364290
>>43365739
Struggling to get RVC working with my changes; it's a more delicate model than I thought. (Training either collapses due to GAN instability or fails to deliver quality.)

Anonymous
07/20/26(Mon)20:24:40 No.43365836

>>43365824
Don't stress yourself out over it too much! It was supposed to be a cheeky lil' request on our part; if you haven't found a viable path towards implementing it, I'd rather you just focus on developing the main TTS function. That's the main draw, anyway

Anonymous
07/20/26(Mon)20:29:04 No.43365840

File: Amy's Calling!.png (150 KB, 689x689)

>>43365824
Damn, here's hoping you find the secret formula. The T2V sounds great, a similar V2V would be fantastic to have in the toolset.

Anonymous
07/21/26(Tue)02:29:20 No.43366428

>>43355660
More than a decade later and the creativity never runs dry. I really love that from this fandom. Very good job and thanks for sharing this with us!

Anonymous
07/21/26(Tue)06:12:51 No.43366699

>>43365840
Love me some Sisi.

Anonymous
07/21/26(Tue)12:04:38 No.43367236

snowpity

Anonymous
07/21/26(Tue)15:51:23 No.43367736

>https://u.pone.rs/ohzbuqxz.mp3
incredibly important message from purplesmart

Anonymous
07/21/26(Tue)18:51:47 No.43368168

>>43367736
But they are tasty!

Anonymous
07/21/26(Tue)18:53:17 No.43368171

File: smart 1754718043789810.png (3.31 MB, 2792x2959)

>>43367736
