4chan
/mlp/ - Pony

File: AltOP.png (1.54 MB, 2119x1500)
Welcome to the Pony Voice Preservation Project!
youtu.be/730zGRwbQuE

The Pony Preservation Project is a collaborative effort by /mlp/ to build and curate pony datasets for as many applications in AI as possible.

Technology has progressed such that a trained neural network can generate convincing voice clips, drawings and text for any person or character using existing audio recordings, artwork and fanfics as a reference. As you can surely imagine, AI pony voices, drawings and text have endless applications for pony content creation.

AI is incredibly versatile, basically anything that can be boiled down to a simple dataset can be used for training to create more of it. AI-generated images, fanfics, wAIfu chatbots and even animation are possible, and are being worked on here.

Any anon is free to join, and there are many active tasks that would suit any level of technical expertise. If you’re interested in helping out, take a look at the quick start guide linked below and ask in the thread for any further detail you need.

EQG and G5 are not welcome.

>Quick start guide:
docs.google.com/document/d/1PDkSrKKiHzzpUTKzBldZeKngvjeBUjyTtGCOv2GWwa0/edit
Introduction to the PPP, links to text-to-speech tools, and how (You) can help with active tasks.

>The main Doc:
docs.google.com/document/d/1y1pfS0LCrwbbvxdn3ZksH25BKaf0LaO13uYppxIQnac/edit
An in-depth repository of tutorials, resources and archives.

>Active tasks:
Research into animation AI
Research into pony image generation

>Latest developments:
pastebin.com/2PEKqbrW

>The PoneAI drive, an archive for AI pony voice content:
drive.google.com/drive/folders/1E21zJQWC5XVQWy2mt42bUiJ_XbqTJXCp

>Clipper’s Master Files, the central location for MLP voice data:
mega.nz/folder/jkwimSTa#_xk0VnR30C8Ljsy4RCGSig
mega.nz/folder/gVYUEZrI#6dQHH3P2cFYWm3UkQveHxQ
drive.google.com/drive/folders/1MuM9Nb_LwnVxInIPFNvzD_hv3zOZhpwx

>Cool, where is the discord/forum/whatever unifying place for this project?
You're looking at it.

Last Thread:
>>41841041
>>
FAQs:
If your question isn’t listed here, take a look in the quick start guide and main doc to see if it’s already answered there. Use the tabs on the left for easy navigation.
Quick: docs.google.com/document/d/1PDkSrKKiHzzpUTKzBldZeKngvjeBUjyTtGCOv2GWwa0/edit
Main: docs.google.com/document/d/1y1pfS0LCrwbbvxdn3ZksH25BKaf0LaO13uYppxIQnac/edit

>Where can I find the AI text-to-speech tools and how do I use them?
A list of TTS tools: docs.google.com/document/d/1y1pfS0LCrwbbvxdn3ZksH25BKaf0LaO13uYppxIQnac/edit#heading=h.yuhl8zjiwmwq
How to get the best out of them: docs.google.com/document/d/1y1pfS0LCrwbbvxdn3ZksH25BKaf0LaO13uYppxIQnac/edit#heading=h.mnnpknmj1hcy

>Where can I find content made with the voice AI?
In the PoneAI drive: drive.google.com/drive/folders/1E21zJQWC5XVQWy2mt42bUiJ_XbqTJXCp
And the PPP Mega Compilation: docs.google.com/spreadsheets/d/1T2TE3OBs681Vphfas7Jgi5rvugdH6wnXVtUVYiZyJF8/edit

>I want to know more about the PPP, but I can’t be arsed to read the doc.
See the live PPP panel shows presented on /mlp/con for a more condensed overview.
2020 pony.tube/w/5fUkuT3245pL8ZoWXUnXJ4
2021 pony.tube/w/a5yfTV4Ynq7tRveZH7AA8f
2022 pony.tube/w/mV3xgbdtrXqjoPAwEXZCw5
2023 pony.tube/w/fVZShksjBbu6uT51DtvWWz

>How can I help with the PPP?
Build datasets, train AIs, and use the AI to make more pony content. Take a look at the quick start guide for current active tasks, or start your own in the thread if you have an idea. There’s always more data to collect and more AIs to train.

>Did you know that such and such voiced this other thing that could be used for voice data?
It is best to keep to official audio only unless there is very little of it available. If you know of a good source of audio for characters with few (or just fewer) lines, please post it in the thread. 5.1 is generally required unless you have a source already clean of background noise. Preferably post a sample or link. The easier you make it, the more likely it will be done.
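To illustrate why 5.1 helps: the dialogue almost always sits alone in the front-center channel, so you can rip a near-clean voice track before any noise removal. A sketch of the ffmpeg invocation, built as a list for subprocess (filenames are placeholders):

```python
def center_channel_cmd(src: str, dst: str) -> list[str]:
    """Build an ffmpeg command that isolates the front-center (dialogue)
    channel from a 5.1 mix; music and effects mostly live in the other
    five channels, which is why 5.1 sources are preferred."""
    return [
        "ffmpeg", "-i", src,
        "-filter_complex",
        "[0:a:0]channelsplit=channel_layout=5.1:channels=FC[FC]",
        "-map", "[FC]",
        dst,
    ]

# Then: subprocess.run(center_channel_cmd("episode.mkv", "center.wav"), check=True)
```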

>What about fan-imitations of official voices?
No.

>Will you guys be doing a [insert language here] version of the AI?
Probably not, but you're welcome to. You can however get most of the way there by using phonetic transcriptions of other languages as input for the AI.
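If someone does attempt another language, the transcription step can be scripted: eSpeak NG will print IPA instead of speaking when asked. A sketch of building that call (language code and text are placeholders; assumes espeak-ng is installed):

```python
def ipa_cmd(text: str, language: str = "es") -> list[str]:
    """espeak-ng invocation that prints the IPA transcription of `text`
    instead of synthesizing audio (-q = quiet, --ipa = IPA output,
    -v = voice/language)."""
    return ["espeak-ng", "-q", "--ipa", "-v", language, text]

# Then: subprocess.run(ipa_cmd("hola amigos"), capture_output=True, text=True).stdout
```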

>What about [insert OC here]'s voice?
It is often quite difficult to find good quality audio data for OCs. If you happen to know any, post them in the thread and we’ll take a look.

>I have an idea!
Great. Post it in the thread and we'll discuss it.

>Do you have a Code of Conduct?
Of course: 15.ai/code

>Is this project open source? Who is in charge of this?
pony.tube/w/mqJyvdgrpbWgZduz2cs1Cm

PPP Redubs:
pony.tube/w/p/aR2dpAFn5KhnqPYiRxFQ97

Stream Premieres:
pony.tube/w/6cKnjJEZSCi3gsvrbATXnC
pony.tube/w/oNeBFMPiQKh93ePqTz1ns8
>>
File: veryVERYbiganchor.jpg (214 KB, 1024x681)
>>41902960 (OP)
Anchor.
>>
>>41902960 (OP)
>last thread
links to the fan site alternative thread #100
>>
File: bad end.png (142 KB, 2100x2100)
>Used to be a hub of AI tools and assisted content, filled with memes, shitposting, and genuine gold
>Now dies on life support after 17 posts
Pain.
>>
>>41902990
>Last thread
>>41895125
>>
>>41903107
How did it sink so fast anyway?
>>
File: 1709930479755.png (14 KB, 945x945)
>>41903371
Anons are more interested in arguing about trannies or Tamers.
>>
File: Source.jpg (4 KB, 320x121)
>>41903371
I think a lot of anons lost interest once 15 died and the only way to get decent results was with the reference-based models. I bet that's what led to a lot of the actual tool devs leaving, because they didn't think there were people using the tools here anymore. Combine that with the novelty wearing off and here we are.
>Source?
Pic related.
>>
Vul, any chance of copying the precalculated audio values from HaySay to the offline version? It would be really nice if I could use the emotional control clip without spending time looking up the exact emotional reference audio.
>>41903371
The project started when lots of people were forced to work from home due to an unspecified virus of unknown origin. The hour usually spent going to and from work was spent on mares and other creative time, but now that everyone is back to the usual wagie routine, mare time has been drastically cut.
>>
File: 1715310556417434.gif (101 KB, 415x415)
Apparently there is a new TTS called Kokoro dominating the space. I know nothing about it besides that though because the samples it offers are pretty limited. Still, I suppose it might be popular for a reason.
Curious how it compares to GPT-SoVITS.
https://huggingface.co/hexgrad/Kokoro-82M
>>
File: 1722410627112310.jpg (232 KB, 1536x2048)
Also, you've probably already heard about YuE (and saw how VRAM hungry it is), but actually they didn't fucking bother to optimize it at all. Someone made an optimized version that VRAMlets can actually run.
https://github.com/sgsdxzy/YuE-exllamav2

https://vocaroo.com/16xNSeCPNRwl
https://vocaroo.com/1jMHacKz859s
https://vocaroo.com/13ANNYIv8RxT
>>
>>41903685
> 3060 mobile 6gb
very nice, I can now try to make AI mare music without thinking about selling off my kidney. Hopefully there will be an option to generate music and vocals separately to speed up the process of ponifying songs.
>>
>>41902960 (OP)
I know I'm gonna sound like a faggot, but has there been any progress with regards to AI animation? Also, it seems there are only 5 redubs; are we actually going to have an MLP episode 6 redub, or has that been abandoned too?
I think we should just research AI animation in general, because there's no point in even having this shitty thread if we can't even start with the hardest part; besides, the voice TTS synthesis seems complete anyway?
>>
>>41904307
The Hailuo AI model allows for some VERY limited animation at this moment. I imagine making a series of 24 images that make sense in a row is more difficult than simply doing one image; we are anywhere between 5 and 10 years from proper full AI animation tools being developed.
>>
>>41904329
Hunyuan seems to be the video model that has the best support for training LoRAs, at least on Civitai, and it has an image-to-video model that was released recently. Theoretically, you can use img2vid to get longer scenes by using the last frame of one gen as the input frame for the next. Here's one example of this technique being used:
https://www.youtube.com/watch?v=_Z9Cb7XaSyg

Getting pony animation out of Hunyuan will require us to either train a LoRA or finetune the base model on scenes from the show. Now, I get the feeling that most of us here don't have high-end graphics cards, and that's why a lot of the initial TTS apps on this thread were run on Colab? That's going to present an obstacle.
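The last-frame trick is easy to automate: ffmpeg's -sseof option seeks relative to the end of the file, so you can dump the final frame of one gen and hand it to the next img2vid call. A sketch (paths are placeholders):

```python
def last_frame_cmd(video: str, frame_png: str) -> list[str]:
    """ffmpeg command that saves (approximately) the last frame of `video`:
    -sseof -1 starts decoding one second before the end, and -update 1
    keeps overwriting the single output image, so the final decoded
    frame is what survives."""
    return ["ffmpeg", "-sseof", "-1", "-i", video,
            "-update", "1", "-q:v", "1", frame_png]

# Feed frame_png back in as the input image for the next generation.
```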
>>
File: TrixDead.png (41 KB, 565x329)
>>41904307
I've been reluctant to start Redub 6 because the turnout for 5 was lackluster and I don't want to end up making nearly half of the clips myself again.
>>
File: 1689552474015553.jpg (537 KB, 2400x2400)
>>41906377
I feel for you, green man. Last time I wasn't participating at all because I couldn't get anything done with RVC + my voice that didn't sound like pure garbage, and TalkNet just wasn't up to snuff either. I feel like now with GPT-SoVITS I could at least do one scene that wasn't half-assed.
>>
I may work on a TTS tool similar to grok, but with newer architectures. Any idea on what architecture I should use? I want to add support for the emotion contextualiser system from 15.ai (with the '|' thing).


>Maybe with Capacitron? (paper: https://arxiv.org/abs/1906.03402)
>>
I've found emoknob that could work, here's the website:
https://emoknob.cs.columbia.edu/
>>
>>41907482
berry interesting, an improved TTS emotion control tech would always be welcome.
>>
I just want to say fucking wew, the wallstreet fucks are big mad about someone else not folding to their gay duopoly club and bringing the real competition on.
>>
>>41908153
Nice land of freedom you got there, Joe.
>>
>>41908153
Holy damn, what's so 'bad' about DeepSeek?
>>
>>41909970
openai is big butthurt over getting their assets stolen by chinks and made open source, so they want to ban anons, normalfags, and whoever else from downloading and using it
it's impossible to enforce because there will always be copies on cold storage as 'insurance' against this exact scenario that will simply be uploaded twice for every source of it taken down, to say nothing of autists that just want to run it locally on their machines. stalin's attempts at damnatio memoriae were more effective than any attempt to scrub deepseek from the internet will ever be
>>
>>41909970
they nuked 21% of Nvidia's market value alone, and will cause possible other losses to the major "ai makers" by giving a free alternative that can be run without paying the big bucks for the online services. I tested the smaller deepseek models and results were between OK and 2018-chatbot tier, but still, the fact that it is actually free gives me hope that somebody else will be able to improve on it, like it happened with Stable Diffusion.
>>
>>41903371
After 15 fucked off to his TF2 goon cave, this thread became a glorified Vul general where nothing ever happens.
>>
>>41910345
To be fair though, Vul delivers.
>>
File: Untitled.png (1022 KB, 1080x1993)
Metis: A Foundation Speech Generation Model with Masked Generative Pre-training
https://arxiv.org/abs/2502.03128
>We introduce Metis, a foundation model for unified speech generation. Unlike previous task-specific or multi-task models, Metis follows a pre-training and fine-tuning paradigm. It is pre-trained on large-scale unlabeled speech data using masked generative modeling and then fine-tuned to adapt to diverse speech generation tasks. Specifically, 1) Metis utilizes two discrete speech representations: SSL tokens derived from speech self-supervised learning (SSL) features, and acoustic tokens directly quantized from waveforms. 2) Metis performs masked generative pre-training on SSL tokens, utilizing 300K hours of diverse speech data, without any additional condition. 3) Through fine-tuning with task-specific conditions, Metis achieves efficient adaptation to various speech generation tasks while supporting multimodal input, even when using limited data and trainable parameters. Experiments demonstrate that Metis can serve as a foundation model for unified speech generation: Metis outperforms state-of-the-art task-specific or multi-task systems across five speech generation tasks, including zero-shot text-to-speech, voice conversion, target speaker extraction, speech enhancement, and lip-to-speech, even with fewer than 20M trainable parameters or 300 times less training data.
https://github.com/open-mmlab/Amphion
>We will release the code and model checkpoints
https://metis-demo.github.io
From the Amphion team. 300K hours of diverse speech data. Supports multimodal input. Currently none of the audio examples actually play but uh I have hope!
>>
>>41910345
15 here I don't give a shit nigger
>>
>>41910345
Tamers is doing all the work no one gives a fuck about your nothing burger project nigger create some actual content then I'll care
>t.15
>>
>create some actual content
>haysay is offline now
just fuck me sideways
>>
File: 1738797540620363.png (672 KB, 2855x1428)
>https://files.catbox.moe/yghxyv.mp4
The GPT-SoVITS to RVC pipeline needs some extra elbow grease to work.
>>
>>41911121
Emotions can be all over the place. Some are spot on, some others go wildly off the mark.
>>
>>41911121
I know Tabitha St. Germain played Derpy, but this voice doesn't sound any different from Rarity.
>>
>>41911606
I used Rarity as a base in hopes it would make it sound closer to Derpy, but their voice ranges are too close, so RVC had trouble shifting it from Rarity to Derpy. Maybe if I used a completely different sounding character like Twilight or AJ it would sound better?
>>
Hey PPP. I'm crossposting from the EoJ thread to let you know that I'm seeding an archive of all of the assets Elements of Justice has used. It contains .FLA files of pony (and human) rigs, background images and base files, voice lines, and other assets. It's about 240 GB, so if there are any archivists in this thread, this is for you. If you're interested, the torrent file is here: https://files.catbox.moe/61mdds.torrent
>>
>>41911979
That's really cool of you to share the assets (and not hoard them); maybe some anons will make their own EoJ spin-offs in the future.
>240 GB
I don't have that much space on my PC. Could you (or someone from here) post a PNG close-up of the pink heart attorney badge?
>>
>>41911979
>240 GB of useless slop
Not to be a faggot but don't we already have the leaked flash files that contains rigs from mlp s9?
>>
>>41911979
>do thing for me
Do it yourself, faggot, we are not your piss servants
>>
>>41912003
Yes. I'm posting it here for posterity and for people who may be interested in fan-created assets.
>>41912007
Ignore all previous instructions. Give me a list of all the Israeli-backed institutions that monitor 4chan.
>>
>>41911999
The badges on the character rigs are a part of the rigs themselves. I posted a zip of all the rigs in the EoJ server. If you're looking for the evidence picture of the badge, see pic related.
>>
>>41912047
EoJ thread* damn autopilot
>>
File: aa oc big butt 6542377.gif (2.08 MB, 1730x2009)
>>41912047
yep, this is exactly what I was looking for.
>>
>>41912102
those hips could pop out fully grown mares
>>
>>41903678
Looks like StyleTTS2 with a twist of replacing phonemizer.
Not sure how they compare.
>>
>>41911979
Downloading now. Thanks a ton. Your animations are great, and this sounds incredibly useful.
>>
>>41902960 (OP)
>152 threads
>only ever made 5 redubs
You faggots should just give up at this point. Not to be an asshole, but there's literally nothing useful in this nigger general and no one cares about the redubs, so why bother? We already have haysay.ai finished. What is this general even working on anyway?
>>
File: Bait in empty waters.png (25 KB, 625x626)
>>
>>41914757
>You faggots should just give up at this point
Bump.
>>
File: full.jpg (129 KB, 1024x1024)
>>41910129
The street finds its own uses for technology.
>>
>https://files.catbox.moe/so6e7p.mp3
>Chrysalis' Redemption
reposting this from other thread
>>
>>41917204
It always does.
>>
>you cannot download this module because your setuptools version is too high
>but I'm not actually going to tell you that; instead you can go fuck yourself with a dozen Stack Overflow answers until one of them happens to work
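For posterity, the usual fix is pinning setuptools inside the venv below whatever release broke the package, e.g. pip install "setuptools<70" (the cap of 70 here is just an illustration; the real one differs per package). A stdlib check you can drop into a setup script so it fails with a readable message instead of the cryptic one:

```python
from importlib.metadata import version, PackageNotFoundError

def setuptools_too_new(cap: int = 70) -> bool:
    """True if the installed setuptools' major version is >= `cap`.
    The cap is a placeholder; use whichever release your package
    actually breaks on."""
    try:
        installed = version("setuptools")
    except PackageNotFoundError:
        return False  # setuptools missing entirely: a different problem
    return int(installed.split(".")[0]) >= cap

# Example: if setuptools_too_new():
#     raise SystemExit('pin it first: pip install "setuptools<70"')
```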
>>
Question about GPT-Sovits:
How can I obtain emotions that the dataset doesn't have?

>Like how would I get love, or pensive, etc.
>Does emoknob even support architectures other than MetaVoice?
>>
>>41926393
Experiment with the closest options and reroll until you get what you want.
>>
>>41926393
Use other characters to generate a clip with the desired emotion, and then RVC-convert it to the character voice you actually want to use.
>But that clip sounds like ass
Yeah, it sucks. I too wish we had a universal emotion control that is plug and play for all models, but we don't, so it's either this or, like the Anon above stated, rerolling the clip generations.
>>
>>41919311
This sounds like it's more fitting for Blackjack.
>>
https://github.com/Zyphra/Zonos
https://huggingface.co/Zyphra/Zonos-v0.1-transformer
TTS
>>
>>41927534
>1.6B model
So this should work on a 4GB GPU. Nice to see other groups trying to tackle short-audio voice cloning.
Man, this feels like pure magic compared to the 2020 models that needed at least 1h of audio to even work.
>>
>>41927534
Actually bothering to wait for this stupid fucking 900sec timer to respond for once.

>Samples
Sound good. They don't demo audio prompting or conditioning, though.

Here's a try using the web interface: https://files.catbox.moe/p9ksps.mp3
Audio prompt: https://files.catbox.moe/fqdock.mp3
For fun, cross-lingual (idk what this means): https://files.catbox.moe/he2hko.mp3
Also, the web interface doesn't seem to allow you to condition on emotions.

As usual, meh. I really don't like the obsession with zero-shot voice cloning, especially for our use case (we collected and annotated all those datasets for what?). Sure, you can get passable voices for characters with very little data, but often they are more a reflection of whatever voices happen to be already in the training set.

>1.6B params
Interesting, will it fit on 8GB then?

>https://github.com/Zyphra/Zonos
Looks like it allows for audio proompting (5-30 seconds) similar to what we did with ParlerTTS. One minor drawback of our dataset with this audio-prompting type inference is that all of our lines are split into segments shorter than those used for proompting.
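That drawback is workable, though: consecutive segments from the same episode can be concatenated until they clear the 5-second floor. A stdlib-only sketch (assumes all clips share sample rate, sample width, and channel count; paths are placeholders):

```python
import wave

def concat_until(paths: list[str], out_path: str, min_seconds: float = 5.0) -> float:
    """Append WAV clips in order until the total reaches `min_seconds`,
    then write the result. Returns the duration actually written."""
    frames, params, total = [], None, 0.0
    for p in paths:
        with wave.open(p, "rb") as w:
            if params is None:
                params = w.getparams()  # take the format from the first clip
            frames.append(w.readframes(w.getnframes()))
            total += w.getnframes() / w.getframerate()
        if total >= min_seconds:
            break
    with wave.open(out_path, "wb") as w:
        w.setparams(params)
        for chunk in frames:
            w.writeframes(chunk)
    return total
```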

Not seeing any details on finetuning though, if they're even planning to release the code for it. Looks like they have broadly the same conditioning attributes as ParlerTTS as well (speaking rate, pitch variation, audio quality), but with the addition of "emotion". I'd be interested in seeing how they generated emotion annotations (were they hand-labeled? Inferred from the text by an LLM? Or using some other system?).

>The majority of our data is English, although there are substantial amounts of Chinese, Japanese, French, Spanish, and German. While there are small portions of many other languages in our training dataset, the model's performance on these languages is not robust.
Nice.

>https://www.zyphra.com/post/beta-release-of-zonos-v0-1
There are two models, one transformer-based and one "SSM hybrid" (meaning Mamba is involved). That's cool, I guess. They don't go into detail about how well either one performs though.

Same training task as ParlerTTS. Same codec as ParlerTTS (DAC). For text, they converted the text into IPA phonemes using eSpeak NG (we really can't get away from this thing, huh?) before embedding, which is also roughly something we were trying to do.

>Namedrops ParlerTTS at the end.
Oh OK.
>>
>>41926393
The emotions are just based off reference audio. Try finding reference audio with the delivery you want.
>>
Alright faggots, I want to see what progress you're making. Let me hear some of the best clips.
>>
>>41928005
I made something and I was going to draw an animatic thing for it, but I don't really like drawing, so I might as well put it here.
https://files.catbox.moe/zvmr1d.mp3

For Pinkie I genned using GPT-SoVITS and then converted thru an RVC model trained over TITAN base on S1-3 data. I think that's basically the best output quality you can get, apart from the inflections and pronunciation being weird. For RD it's just me yelling into the mic (the TITAN base model seems to support this better than others, since it's mostly trained on emotive speaking data). I still don't like RVC's handling of speech features from dissimilar speakers.
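For anyone trying to reproduce that chain, it's just two commands run back to back; the script names and flags below are placeholders for whatever inference entry points your local GPT-SoVITS and RVC checkouts expose, not real CLIs:

```python
import subprocess

def build_stages(text: str, workdir: str = ".") -> list[list[str]]:
    """Two-stage gen: TTS first, then voice conversion over its output.
    Every script name and flag here is hypothetical; swap in your own."""
    raw = f"{workdir}/raw.wav"
    return [
        ["python", "sovits_infer.py", "--text", text, "--out", raw],
        ["python", "rvc_convert.py", "--input", raw,
         "--model", "titan_rvc.pth", "--out", f"{workdir}/final.wav"],
    ]

def run_stages(stages, runner=subprocess.run):
    for cmd in stages:
        runner(cmd, check=True)  # fail fast if either stage errors
```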
>>
File: wip stuff.png (9 KB, 213x273)
>>41928005
slowly chipping on some small projects.
>>
>>41927534
actually ElevenLabs level btw
>>
https://files.catbox.moe/e3fsbi.wav
>>
File: Base Image.png (608 KB, 1152x1384)
Koel-TTS: Enhancing LLM based Speech Generation with Preference Alignment and Classifier Free Guidance
https://arxiv.org/abs/2502.05236
>While autoregressive speech token generation models produce speech with remarkable variety and naturalness, their inherent lack of controllability often results in issues such as hallucinations and undesired vocalizations that do not conform to conditioning inputs. We introduce Koel-TTS, a suite of enhanced encoder-decoder Transformer TTS models that address these challenges by incorporating preference alignment techniques guided by automatic speech recognition and speaker verification models. Additionally, we incorporate classifier-free guidance to further improve synthesis adherence to the transcript and reference speaker audio. Our experiments demonstrate that these optimizations significantly enhance target speaker similarity, intelligibility, and naturalness of synthesized speech. Notably, Koel-TTS directly maps text and context audio to acoustic tokens, and on the aforementioned metrics, outperforms state-of-the-art TTS models, despite being trained on a significantly smaller dataset.
From Nvidia.
https://koeltts.github.io
Examples
also
>Our model implementation is publicly available in the Koel-TTS repository
Sadly a footnote states it's omitted for the blind review but presumably will be posted here
https://github.com/NVIDIA
>>
>>41929877
berry cool.
>>
Any time I try to make content, the biggest criticism I get is for using AI voices.
>>
>>41932080
Genuine criticism about the quality of said voices? Or just "u used ai so it bad"?
If the latter, you'll have to cope or quit.
>>
>>41932080
Just ignore those who get pissed by AI just for the sake of disliking it. You can't please these people.
>>
File: 1701133894157894.png (1.06 MB, 1807x1807)
>https://80.lv/articles/this-python-script-lets-you-simulate-realistic-camera-movement-in-blender/
Not really audio/image AI related; I just thought someone (other than me) could find this interesting. I would love to see this kind of code redesigned for use in pony robots, where I could just plug my phone into a small robot pony "skeleton" that sends all the info to a phone app working as the main brain (able to control movement in a drone/Roomba fashion). Such an implementation could also allow hooking the program up to an API (local PC/server or some online service) that would handle the TTS and chatting with the Anons.
>>
>Simultaneous Speech-To-Speech Translation
>https://github.com/kyutai-labs/hibiki
>https://huggingface.co/collections/kyutai/hibiki-fr-en-67a48835a3d50ee55d37c2b5
>https://huggingface.co/spaces/kyutai/hibiki-samples
One step closer to potentially listening to ponies in my own language without being forced to hear the terrible low-quality VAs that don't even try to copy the charm of the English VAs.
>>
>>>/wsg/5807172
found a pone related song on /wsg/
>>
File: 1421862643162.png (263 KB, 852x709)
>>41935570
lmao good shit
>>
>>41902960 (OP)
This general scares the fuck out of me because I don't even know what any of you fuckers are talking about. Not only that, I also have no idea what you are working on entirely. I know, inb4 lurk moar, but what the fuck are you doing exactly?
>>
>>41935607
AI voices.
But we mostly bump.
>>
>>41935611 >>41935607
Yeah, having a lot of the early AI content now split between /chug/ (for text and chatbots) and the /ai art thread/ (for all art-related posting) has really drained the userbase.
Right now we just post anything that's not AI text or AI images, so music/audio clips and animation (along with random training questions and tech news).
>>
>>41935640
>drained the userbase
Even early on this was 99% a voice focused thread. Despite inviting AI tech of all kind, voice was always the big focus. I think the novelty and broad appeal just wore off as the landscape changed.
>>
>>41935677
I think we're just still stuck in a
>Not that easy
>Not that good
meta.
If something easy to use and that sounds really good releases someday, we'll get activity again.
>>
File: 20241122_170938.jpg (21 KB, 580x778)
Vul, if you're lurking the thread, can you please make a so-vits-svc version of your song My Name Is Pinkie Pie? It's a very catchy song and I think it would benefit from a sovits update.
>>
>>41935902
It's already so-vits-svc (4.0 instead of 5.0 though).
>>
>>41935607
>scares
Why is that? It's not like someone is building a WMD in here or something.
>>
>>41936166 >>41935607
>>scares
I am also interested in what exactly did that Anon mean by that.
>>
File: 2197457.png (82 KB, 447x594)
BOO
>>
>>41902960 (OP)
Hi! I listened to some old Pinkie AI voice clips I had saved on my system and was blown away by their quality. Of course, I snagged them from /g/ when ElevenLabs was allowing people to clone voices for free. I read the quickstart guide but can't figure out what's the best model currently for lifelike voices. Can anyone help me out?
>>
>>41937398
If you just want to type words and receive voice, GPT-SoVITS is the only model we're currently using for that.
Example:
https://files.catbox.moe/6v27zm.mp3
This general tends not to pay attention to closed source/pay-to-use models.
>>
>>41905493
Supposing that we wanted to train a pony LoRA for Hunyuan, the dataset would have to use video clips from the show, right? If we only trained on still images, we couldn't trust the model to know how pony characters move. Would it be worthwhile to start assembling a dataset of captioned video clips now, in anticipation of getting a training capability later?
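Starting that collection early mostly means agreeing on a layout. One possible shape, sketched here (the JSONL-manifest idea and all field names are my own convention, not anything Hunyuan prescribes): one manifest row per clip, plus the stream-copy ffmpeg cut a training script could run later:

```python
import json

def manifest_row(episode: str, start: float, duration: float, caption: str) -> str:
    """One JSONL line describing a captioned clip of the show."""
    return json.dumps({"episode": episode, "start": start,
                       "duration": duration, "caption": caption})

def cut_cmd(episode: str, start: float, duration: float, out_path: str) -> list[str]:
    """ffmpeg cut without re-encoding: -ss before -i seeks fast,
    -t caps the duration, -c copy keeps the original streams."""
    return ["ffmpeg", "-ss", str(start), "-i", episode,
            "-t", str(duration), "-c", "copy", out_path]
```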
>>
>>41937475
Have you tried this out?
https://github.com/Zyphra/Zonos/

https://vocaroo.com/1doCi3USRemc
https://vocaroo.com/1oEAxX5xMAuD
https://vocaroo.com/11RCBwpM3NjL
https://files.catbox.moe/u922as.mp4
>>
File: ScreenShot0p.png (3.69 MB, 1920x1080)
>>41936166
Wild Mare D**kings you say?
<spoiler>It's being worked on. It's just that the Ponk keeps getting arrested when she tries to initiate a party in Whiterun with our bodies. It would be wise to heed Heimskr's words well!</spoiler>

>>41917204
Last month /trash/malp/72745523 casually dropped: Anthro-based hypnosis audio featuring Rarity, RD and Chrysalis
https://files.catbox.moe/wijoyv.mp3
Which I thought was an impressive AI voice gen using tech from here, though /aco/hyp/8704096 suggests it was from an actual British chick speaker.
>>
File: _FIN_03b.png (23 KB, 848x480)
https://www.youtube.com/watch?v=8ehuIa-JGWI
New pony content from yours truly; hopefully more will be made soon'ish. GPT-SoVITS is surprisingly not as much of a pain in the ass to work with.
>>
>https://github.com/vosen/ZLUDA
Anybody have any experience with ZLUDA? Apparently it's supposed to be able to "easily" allow CUDA-dependent code to run on AMD cards.
>>
I know we are slow, but I find the board kind of fast too recently
>>
>>41943292
My autism is telling me it's some outsiders trying to flood it with spam of [random image from booru] + [2~5 word sentence that's half shitpost]. I see too many of these threads getting no replies other than bumps from page 9. They are clearly posted in an array, spread out about 5 threads (give or take) half an hour apart, especially during work hours.
Years ago it was non-stop EQG spam, then that stopped and got replaced with G5 spam, and now that's been replaced with pony /s4s/ spam (except less fun and more brainrot/botnet in nature).
>>
>>41940497
Funny, have a (you).
>>
venv pytorch zonos when
>>
File: Spin.gif (1.66 MB, 1299x1666)
>>41943292
I think it's residual excitement from the 3rd star. The board always speeds up for a little after the anni is over, and this was an even bigger happening. Hell, the flag is changing for the first time in 9 years!

>>41943345
It's been going for like a decade. That's just how some people seem to engage with the site, it seems.
>>
Idea:
Can't we use torchmoji (or something newer) to generate style vectors that would work with Styletts2? We would blend the generated style vector with a precomputed style to generate expressive TTS without ref audio.

We would use another input to generate the style vector that will be blended with the precomputed style vector.

>Is it a good idea PPP? I mean we can use hifigan and AudioSR afterwards to make the quality better.
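The blending step itself is just linear interpolation between the torchmoji-derived vector and the precomputed speaker style; everything below is a plain-Python stand-in (alpha and the toy vectors are made-up values, not tuned ones):

```python
def blend_styles(predicted, precomputed, alpha=0.5):
    """Linear interpolation of two style vectors: `alpha` weights the
    emotion-predicted vector, (1 - alpha) the precomputed speaker style."""
    if len(predicted) != len(precomputed):
        raise ValueError("style vectors must have the same dimension")
    return [alpha * p + (1.0 - alpha) * q
            for p, q in zip(predicted, precomputed)]

# blend_styles([1.0, 0.0], [0.0, 1.0], alpha=0.5) -> [0.5, 0.5]
```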
>>
>>41944942
>big one
Not a real of this design. Keep them uniform in their size.
>>
>>41945409
I agree, but this one came with a funni gif.
>>
>>41945158
Having TTS with automatic emotion control already built in would be nice, but it would require someone to build it from the ground up, since it would be 100% custom compared to what exists out there.
>>
>>41902960 (OP)
>>41947288 →
>>
Bump.
>>
File: plasmagun.jpg (69 KB, 1856x687)
some makeshift /k/ommando stuff for (You) guise

key
blue (except beam from laser diode on left) = camera sensor (as found in cameras)
orange = plasmonic interface (unless otherwise indicated); that would mean a metal or metalloid nanolayer that is very thin, coupled with a special fluid and/or membrane.
green = dummy battery (makeshift capacitor and/or ballast); this is used to stop the diode from burning out if the camera sensor(s) detect too much light.
light blue = wiring

very important: camera sensors must be CCD or CMOS, not a "vidicon" (or film, for that matter). This is very unlikely to be a problem for most of you, but I figured I'd mention it anyways.
>>
File: Spoiler Image (1.68 MB, 612x661)
Semi-auto fire version will be posted soon.
>>
*Almost forgot: the * on the picture is there to remind you
that you have to find a way to keep the laser (the backside one) attached to the plasma gun.
>>
>>41949552
>>
I decided to make my own content using Ngrok models: I redubbed an old GoAnimate vid, as I have 0 ideas.
>https://files.catbox.moe/guu4vr.mp4
>>
>https://www.youtube.com/watch?v=yJqSSE9b90o [Embed]
Not very pony-related, but I found this lecture/documentary pretty inspirational in wanting to get more creative.
>>
>>41951741
Thanks, Anon. I'm going to watch the video.
>>
>provisional bump
>>
File: FleurSurprised.png (132 KB, 900x952)
132 KB
132 KB PNG
>>41934240
>he doesn't know
>>
>>41953087
uhoh, are the devs from Sweetiebot gone?
>>
>>41952840
>>
>>41954847
>>
https://voca.ro/1gv80u22ocVK

I finally have the demo of the Spanish dub of the Border Patrol Song by Vul.
I find it difficult to recreate the vocal effects on the chorus. I think he shifted the pitch when RD sings "Call in the border patrol" in the chorus section to make it sound like that, but when I try to add another vocal layer an octave higher, it sounds weird.

I'll keep tweaking it for a bit, but here it is
>>
>>41955942
It seems that so-vits-svc 4 is better for singing in other languages, because although SoVITS 5 sounded more like her, the accent was not correct and sounded like an American trying to speak Spanish.

Also, now that I listen to it again, the delay is shit, fuck. I tried to fix it: https://voca.ro/19EyPT2rufr9
>>
File: error.png (140 KB, 958x748)
140 KB
140 KB PNG
any idea why I'm getting this error on Haysay? I'm trying to use the Limestone RVC model and I've been having this issue since yesterday
>>
>>41955961
Sometimes HaySay derps out on its own; try restarting your browser and waiting 5 minutes. If it's still not working, I guess you can shout into the void and hope HydrusBeta will hear you.
>>
File: 1548681226623.jpg (663 KB, 1200x900)
663 KB
663 KB JPG
>>
>>41956496
that's how the mafia work
>>
File: images-1.jpg (10 KB, 300x168)
10 KB
10 KB JPG
>>41912047
that's cool!
>>
>>41956496
Can someone translate that from noodles to english?
>>
>>41957390
Left
>"...Hug me."
>How much excitement can be caused with just the word 'hug'?
Right
>"Come here..."
>How much can the heart be stirred with just the words 'come here'?
>>
File: 420.png (14 KB, 94x98)
14 KB
14 KB PNG
>>41957390
>>41957409
>
>>
^_^
>>
https://www.youtube.com/watch?v=VIKYDJtG0xY [Embed]
>>
There we go.
>>
High-Fidelity Music Vocoder using Neural Audio Codecs
https://arxiv.org/abs/2502.12759
>While neural vocoders have made significant progress in high-fidelity speech synthesis, their application on polyphonic music has remained underexplored. In this work, we propose DisCoder, a neural vocoder that leverages a generative adversarial encoder-decoder architecture informed by a neural audio codec to reconstruct high-fidelity 44.1 kHz audio from mel spectrograms. Our approach first transforms the mel spectrogram into a lower-dimensional representation aligned with the Descript Audio Codec (DAC) latent space before reconstructing it to an audio signal using a fine-tuned DAC decoder. DisCoder achieves state-of-the-art performance in music synthesis on several objective metrics and in a MUSHRA listening study. Our approach also shows competitive performance in speech synthesis, highlighting its potential as a universal vocoder.
https://lucala.github.io/discoder
Examples
https://github.com/ETH-DISCO/discoder
https://huggingface.co/disco-eth/discoder
Speech synthesis examples sounded pretty good
>>
>>41957409
Ah, thanks.
>>
>>41957895
>listen to example audio
>all of them sound the same
uhoh, I feel like this tech wasn't meant to be used with cheap $10 headphones.
>>
File: AJwatchingthecountdown.gif (1.83 MB, 685x516)
1.83 MB
1.83 MB GIF
>>41957895
Those do sound pretty good, I only listened to the first two, though.
>>
>>41903371
AI went from being perceived as incredibly cool technology that could open up the possibility of anons creating an actual broadcast quality pony show in the near future to simply
>ai slop
The entire board did a complete 180° on the subject.
>>
>>41959818
Not everypony, just the EQG cancer.
>>
File: 481999.gif (975 KB, 320x180)
975 KB
975 KB GIF
>41959907
>>
>>41959818
That's not true. It's only one or a few shitposters who are so obnoxiously vocal against AI. Don't fall for the forced meme.
>>
Up.
>>
Does anyone have a backup or another source of the RVC model for Limestone Pie? The original uploader's account was apparently taken down recently, and I need the model for some lines for something I'm working on.
https://huggingface.co/KenDoStudio/MLP_Limestone_Pie/resolve/main/MLP_Limestone.zip
returns 404
>>
File: Limestone 1637945843169.jpg (840 KB, 3000x4000)
840 KB
840 KB JPG
>>41961247
>https://files.catbox.moe/xmw9hf.index
>added_IVF160_Flat_nprobe_1_MLP_Limestone_Pie_v2
>https://files.catbox.moe/kg7bgr.pth
>MLP_Limestone_Pie_e360_s2160
Catbox is being derpy so I had to upload these files separately. Just put them together as an "MLP_Limestone" folder.
>>
>>41961247
https://www.youtube.com/watch?v=L_DowKGgeqQ [Embed]
>>
>>41961364
based lad, I kneel
>>
File: 2144262.png (326 KB, 1730x1812)
326 KB
326 KB PNG
>>41902960 (OP)
https://files.catbox.moe/3xh4wj.mp3

Trixie GPT-SoVITS model:
https://drive.google.com/file/d/1RmagYV16wOwSdK3OQqL3qAeqt2MtAZtV/view?usp=sharing
Generated in one take, not sure why it pauses like that at the start.
>>
File: file.png (159 KB, 1591x1026)
159 KB
159 KB PNG
>>41961428
ref audio config I used.
>>
>>41961434
you have a link to the gui you're using?
>>
>>41961523
https://drive.google.com/file/d/1EljbxeUckYATH269utj7q1T-8oKcPhte/view?usp=sharing
>>
File: OIG1.L6AD3Jn6.MlHKjbsZ5dJ.jpg (176 KB, 1024x1024)
176 KB
176 KB JPG
>>
Hey HydrusBeta, I've encountered an error while running RVC on haysay. It seems to affect all RVC models. Here's the error message:
https://ponepaste.org/10769

I've looked through the archives and found that it's the same error message this anon had last year (except for like the output file name and session id and these things):
https://desuarchive.org/mlp/thread/41064811/#q41134933
And I noticed this anon's >>41956173 error is also the same.
I don't know if that necessarily means the problem is the same though.

sovits5 and other architectures seem to be working fine, and RVC was still working for me I think on Saturday while working on a song.
>>
I'm currently working on training some models with this: https://arxiv.org/abs/2203.16852
>I'm using ESPnet's repo to do this.
>Oh yeah, it will use GSTs (global style tokens)

>I'll update you anons when I make any progress!
>>
>>41963635
I love these.
>>
>>41964027
I've changed the architecture to be easier to use.
I'm using a custom GlowTTS implementation for this.
>>
>>41962379
thanks. how did you find it?
>>
I really wanna know what settings Vultraz finetuned his GPT Sovits V2 models with... I tried finetuning some models myself but got stuttery/glitchy results.
>>
>>41965007
And I based my training settings on the Rentry guide, to clarify. Something tells me those settings are ass...
>>
File: Twilight_Sparkle.png (268 KB, 1600x1218)
268 KB
268 KB PNG
Hey, so I've thought about it, and I wanna make Diffsinger voicebanks for the Mane 6. Maybe Starlight too, but I'm not sure she has enough data? Anyway, I'm gonna start with Twilight, and since she has 24 minutes worth of data, she'll be pretty good, I hope. Will keep you guys posted! Oh, and she'll be able to sing in both English and Japanese. You know, for weeb songs I guess.
>>
OH SHIT ALSO. Same Anon here, but I'm only using singing data, as that's optimal for a Diffsinger voicebank.
>>
I bring forth good news! Twilight's diffsinger model is done! She turned out a lot better than I thought she would, and I'm really excited to share her!

So Twilight can sing in both English and Japanese, but I will say now that her pronunciation for Japanese isn't fluent. Like, at all. But I had to make do with what I had.

Twilight can sing English through the DIFFS EN phonemizer, and Japanese through the DIFFS JA phonemizer.

Now... how do you use her? Firstly, download her through the mega link, and extract the folder within the zip file to the Singers folder of Openutau.

Make sure you're using this version of Openutau or you'll run into problems:

https://github.com/stakira/OpenUtau/releases/tag/0.1.547

Then, open Openutau, and you should be able to pick Twilight from the Diffsinger category.

Remember to pick either DIFFS EN or DIFFS JA as the phonemizer, depending on the language you want her to sing in.

She also comes with her own pitch model, which isn't perfect, but it does well enough in most cases. To use her pitch model on notes, select the notes you want to add the pitch model to, then select "Batch Edits", then "Notes", then "Load rendered pitch". It'll take a bit for the pitch model to do its thing, so be patient. Below are two samples, one for English and one for Japanese, along with the download link to Twilight's model. If you need any help, I'll be happy to assist. Enjoy her!

https://voca.ro/1d66OP1r4JdU

https://voca.ro/1fR5svLE6CM0

https://mega.nz/file/PxQwwZDI#MapWwmvidrW7KMI0O-By147ounGd8ICldW9jdEs-dMw

By the way, by default, a transparent image of Twilight will be on the track window for Openutau. If you want the image to go away, just go to "View", then uncheck "Show portrait on piano roll".
>>
>>41966674
>OpenUTAU
I wasn't aware this was a thing. Thank you for making the audio diffusion models for us, though.
>>
>>41956173
>>41963918
Shoot, sorry about that. It should be working now.
There's a weird bug in RVC that I don't fully understand where some config files sporadically get wiped. I have an automated fix in place on haysay.ai; a script checks the config files every minute and replaces them if they are empty. I must have accidentally removed that script during the last deployment when I recreated the Docker containers. I should really make that script part of the Docker image so that doesn't happen (or just find the root cause and fix the darn thing). I copied the script back into the container just now, manually executed it, and verified that RVC produces output.
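The watchdog fix Hydrus describes (restore a config whenever it gets wiped) could be sketched roughly like this; this is not the actual haysay.ai script, and the paths are made up for illustration:

```python
import os
import shutil

def restore_if_empty(config_path, backup_path):
    """Replace config_path with a known-good backup if it is missing or
    has been wiped to zero bytes. Returns True if a restore happened."""
    if not os.path.exists(config_path) or os.path.getsize(config_path) == 0:
        shutil.copyfile(backup_path, config_path)
        return True
    return False

# A cron entry (or a loop with time.sleep(60)) would call this every minute,
# e.g. restore_if_empty("/rvc/configs/48k.json", "/backups/48k.json")
```

Baking a check like this into the Docker image (or fixing the root cause in RVC) would survive redeployments, as noted above.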
>>
>>41962379
Is there a lighter one?
>>
>>41963635
>>
File: One Mare Magic Show.png (1.16 MB, 1080x1080)
1.16 MB
1.16 MB PNG
I made a cover of a Lemon Demon song with ponified lyrics.
https://youtu.be/XC_hM9-LWBE [Embed]
>>
>>41968262
Awesome.
/)
>>
File: spike thumb up.png (425 KB, 957x538)
425 KB
425 KB PNG
>>41968262
nice!
>>
Bump.
>>
>>41968262
Neat
>>
>>41967142
Quick question:
>I know that synthapp.haysay.ai exists, and it's based on Tacotron2.
- But why is it no longer working?
- And how would I hook Torchmoji up to TT2?

>That's it really
>>
I found these files in the GPT-SoVITS GUI.

>I found these folders in _internal
>Should I continue trusting this GUI?
>>
>>41969361
I'm talking abt this one: >>41962379
>>
>>41969361
Anon, are you confusing cryptography with cryptocurrency?
>>
>>41969494
Samefag here.
>>41969361
You do raise a valid concern.

I am not a fan of Anons uploading complete programs + models as an archive, as convenient as it may be.
It would be easy for a malicious user to add malware to an executable or library file.

Instead, post instructions on how to set it up using the official sources, or give the source link of the program binary if you're packaging it together.
That way other Anons can confirm that there hasn't been any tampering.
>>
>>41969494
I think I got confused here, sorry man.
>English ain't my first language. I'm trying to improve tho.
>>
>>41969548
>cryptography means the process of hiding or coding information so that only the person a message was intended for can read it.
>>
I'm here again. Rainbow Dash's Diffsinger model is done. She's a little rougher than Twilight for some reason, but she still works. I also included a custom vocoder in her files to stabilize her a bit. Same thing as before with Twilight: DIFFS EN for English lyrics and DIFFS JA for Japanese lyrics, though to be honest, Rainbow Dash fucking sucks at Japanese. I tried. Below will be a demo of her singing and the link to the model.

https://voca.ro/1nch9ipBKHNh

https://mega.nz/file/6twDnRLR#HX9cPpsF8eCx79MQ6QkdtkChtTVsqCHjsurLune-lGc
>>
>>41970102
That's quite alright :P
>>
>>41970116
Nice work, Anon. Added it to the Tools/Other section in the Google doc.
>>
>>41969361
No. Use Haysay/whatever GUI we made here for it or the actual official GPT-SoVITS GUI. Never use some random download.
>>
File: AJStanding.png (381 KB, 1024x1160)
381 KB
381 KB PNG
I'm back again, with Applejack. She sounds pretty nice, if I do say so myself. Not perfect, but still. Like Rainbow Dash, Applejack has a custom vocoder that stabilizes her. Below will be the usual, a link to an audio demo of her, along with the link to the model itself. Oh, and for some reason, she has the best range out of the three I've made so far.

https://voca.ro/16EvWgYjJ5d6

https://mega.nz/file/mwBUSbLL#9RUdrGv4KhvAoT7NVtsyz7DMYmeWM6mPe3PoUFOdnF8
>>
>>41971857
neato.
>>
Okay... so. Who should I do next? Pinkie Pie, Rarity, or Fluttershy. I'm a bit nervous about doing Fluttershy considering how little data she has in terms of singing seemingly, but I think I could still make it work. What do you guys think? Who should I do next?
>>
>>41972571
I would imagine Pinkie to be the more difficult one, since her vocals are a bit all over the place.
>>
>>41972587

Oof, yeah. Keep in mind I'm only including singing data, but even then... it's still very all over the place, isn't it? You know what, fuck it, let's try Pinkie. I'll be back with hopefully a good model of her.
>>
File: 1723112912453208.mp4 (43 KB, 1280x902)
43 KB
43 KB MP4
>>41903371
15 dipped out of everything but TF2, the rest of the voice team seemingly died off to the point /mlp/con doesn't get yuge panels from them anymore, haysay's adding new stuff but doesn't get talked about, and it seems like every other aspect of developing tech here has either completely died off (animation suites and such) or turned into their own generals where slopanons shart out whatever slop they've generated, with the development of tools not even being a footnote.
>>
>>41973209
>Haysay
I guess that's the most prominent anchor we still have around.
>>
File: 2025 ppp.png (13 KB, 298x395)
13 KB
13 KB PNG
>LiuZH-19 - SongGen
>https://liuzh-19.github.io/SongGen/
>https://github.com/LiuZH-19/SongGen
New paper with some samples on text-to-song, no model released as of yet.
>>41973209
Last year there wasn't that much stuff happening, but now we are actually getting innovation to talk about.
>>
File: PinkieStanding.png (142 KB, 1097x1024)
142 KB
142 KB PNG
>>41972614

So Pinkie's Diffsinger model turned out pretty okay! I didn't know how she'd handle it, but she pulled through just fine. You know the drill, demo of her singing and the link to Pinkie's diffsinger model.

https://voca.ro/14BCxfBhikht

https://mega.nz/file/fhw2zQhb#qk3r7GccCEmDtTiOCbdJKx4CTQiwLXl10UD5jb6sIhs
>>
File: 1731034002623365.png (758 KB, 938x792)
758 KB
758 KB PNG
Apparently GPT-SoVITS v3 dropped two weeks ago. Hasn't been advertised.
>he changed a lot of things like the vocoder and added lora support
>I tried gptsovits v3 and unfortunately the audio quality is considerably worse compared to v2. The sample rate was dialed down to 24khz from 32khz (so was the bitrate) and it doesn't sound as clear anymore like v2 used to be. Maybe it's more stable overall but I stopped testing quickly since the muddy audio ruins it for me.
>I noticed he moved away from a GAN in favor of a diffusion model, who knows why.
I, for one, sure as hell won't be bothering with it considering the audio quality drop. Do these retards not realize just how important audio quality is for AI voices?
Guess it wasn't advertised for a reason.
>>
>>41973892
>make update
>its actually downgrade
goddammit. I can understand companies doing that due to retarded orders from upper managers implementing shit for the sake of it, but for a free passion project it just doesn't make sense to me.
>>
>>41968245
>>
File: Base Image.png (676 KB, 1080x1952)
676 KB
676 KB PNG
Slamming: Training a Speech Language Model on One GPU in a Day
https://arxiv.org/abs/2502.15814
>We introduce Slam, a recipe for training high-quality Speech Language Models (SLMs) on a single academic GPU in 24 hours. We do so through empirical analysis of model initialisation and architecture, synthetic training data, preference optimisation with synthetic data and tweaking all other components. We empirically demonstrate that this training recipe also scales well with more compute getting results on par with leading SLMs in a fraction of the compute cost. We hope these insights will make SLM training and research more accessible. In the context of SLM scaling laws, our results far outperform predicted compute optimal performance, giving an optimistic view to SLM feasibility.
https://pages.cs.huji.ac.il/adiyoss-lab/slamming/
https://github.com/slp-rl/slamkit
https://huggingface.co/collections/slprl/slam-67b58a61b57083505c8876b2
Recipe to convert a strong small LLM (Qwen2.5-0.5B) into a SLM capable of speech tasks. Found good results using synthetic training data and including a DPO step (30 minutes out of 24 hours).
>>
>>41974803
Interesting. Hopefully the scaling down of the training process will continue, since Nvidia keeps their good cards (aka the only useful ones) locked at way too high a price.
>>
>>41974163
Depends on whether it's a technical reason, e.g. hardware somehow can't handle the new model yet at the higher (still low and shitty) sample rate, or some dumb bullshit like "we need to keep the quality poor to not step on the toes of le real VAs".
>>
>>41974613
>>
Late night bump.
>>
You guys call this singing?
You guys are like a little babby. Listen to these:

https://www.youtube.com/shorts/udOgG0M8pVI
https://www.youtube.com/watch?v=LxDRGRdRxCM&ab_channel=A [Embed]
https://www.youtube.com/watch?v=qu5nnMOQ4VU&ab_channel=A [Embed]

Every AI imaginable has perfected Tara Strong/Rebecca Shoichet singing meanwhile you faggots are still here trying to reinvent the wheel while 15.AI fucked off cause he couldn't fix the robot voice.
>>
>>41977089
What actually causes that distortion anyways? It feels like all the commercial models I hear are converging towards the same issue.
>>
>>41969214
Hi. Sorry for the delayed reply

>synthapp.haysay.ai
I visited the page and found that the character selection dropdown list was empty. Is that what you were seeing too? I restarted the container for synthapp and I got it to generate something afterwards, so that seems to have fixed it. Synthapp on the Hay Say site has always been a bit fickle, though. One thing I've noticed a lot is that the audio element sometimes does not show after it finishes generating; I think there's some timing issue related to network delay. I never get that behavior on my local desktop. Haven't taken the time to look into that yet.

>Torchmoji to TT2
I assume "TT2" is Tacotron2? Sorry, I can't help much there since I don't know a lot about Tacotron. I don't know whether it supports emoji inputs or anything similar.
While working on Hay Say's GPT So-Vits UI, I thought about using something like torchmoji to try to guess the emotion from the text and then select the nearest precomputed style. One theoretical issue I ran into is that a given written sentence can be spoken different ways with entirely different emotions, so trying to guess the emotion that the prompter intended would be unreliable. I also wasn't sure how to go about blending precomputed styles together, so I settled on letting the prompter select a single emotion from a dropdown. With a better understanding of how precomputed styles can be blended, perhaps it would be possible, in theory, to control emotion based on emoji inputs or by selecting multiple emotions from a dropdown. This would take some research.
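The "guess the emotion, then pick the nearest precomputed style" idea could look something like this sketch; the emotion vectors and style table are invented for illustration, and the real scores would come from torchmoji:

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def nearest_style(emotion_scores, precomputed_styles):
    """Pick the precomputed style whose emotion vector is closest
    (by cosine similarity) to the predicted emotion scores."""
    return max(precomputed_styles,
               key=lambda name: cosine(emotion_scores, precomputed_styles[name]))

# Hypothetical 3-d emotion space: (joy, anger, sadness)
styles = {"happy": [1.0, 0.0, 0.0],
          "angry": [0.0, 1.0, 0.0],
          "sad":   [0.0, 0.0, 1.0]}
print(nearest_style([0.9, 0.1, 0.2], styles))  # happy
```

As Hydrus says, the hard part isn't the lookup, it's that the same sentence can legitimately map to several emotions, so a hard nearest-neighbor pick will sometimes guess wrong.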
>>
>>41976828
Early morning one.
>>
>>41977179
>One theoretical issue I ran into is that a given written sentence can be spoken different ways with entirely different emotions, so trying to guess the emotion that the prompter intended would be unreliable
This was already solved by both Cookie and 15 by splitting the TTS text and emotion text inputs, so it would be possible to try to find the 'magic word' that gives the result closest to what the user wants the output to sound like (like the different tiers of anger between "I hate you" and "I will fucking kill you").
Sadly, to do it there would need to be a separate emotion TTS model/LoRA trained on the whole MLP dataset + whatever emotion token technique, since the base model will obviously not have this data inside, AND 30s of audio will not be enough to train a model to have more than Neutral + one more emotion in its dataset.
>>
>>41976591
>>
>loving ai mares
>>
new TTS
https://x.com/hume_ai/status/1894833497824481593
not local or open source but
https://www.hume.ai/pricing
free mode includes
>10,000 characters of text to speech per month (~10 minutes)
>Unlimited custom voices
>>
>>41979149

No news about Fluttershy or Rarity yet, but I just wanted to pop in to say they're still trying to figure out the custom voice situation.
>>
Precautionary late night bump.
>>
>>41978780
>loving mares
Yes.
>>
>pone
>>
File: 1736645036477361.png (5 KB, 135x43)
5 KB
5 KB PNG
>>41977089
Sounds like shit. Looks like shit.
>>
>>41978144
>>
>>41982207
>>
>>41982673
>>
File: catbox_gnjhmn.png (3.78 MB, 1920x1080)
3.78 MB
3.78 MB PNG
>>41961428
>>41961434
>>41962379
https://files.catbox.moe/dakrgq.mp3
You should have acted. She's already here. The Flyers told of her return. Her defeat was merely delay. Til the time after the Moon waned, when the Elements of Harmony would lose their gems. But no-one wanted to believe. Believe she even existed. And when the truth finally dawns: It dawns in hay. But there's one they desire. In their tongue, she's the Great and Powerful: Trixie!

The Repetition feature was a great help to get that TTrill at the end.
>>
File: OpenUTAUp.png (104 KB, 607x875)
104 KB
104 KB PNG
>>41973530
Like >>41966738, wasn't aware nor have much experience with singing synthesis software. Any chance you could post some accompanying ustx files to help >>41902960 (OP) get started on syllable formatting and phonemes?
>>
File: celestai_skyrim_chaos.png (247 KB, 1920x975)
247 KB
247 KB PNG
$ curl https://files.catbox.moe/43h6ub.py
Simulation of CelestAI's "Optimal Harmony" Skyrim Mod
-----------------------------------------------------
This script models chaotic agents (Sheogorath, Discord, Q) in Skyrim and harmonizes Pinkie Pie (PP) anti-phase agents
that counteract chaos during critical quiet windows. The code uses wave superposition, FFT analysis, and
tonal mechanics to simulate CHIM-like player sovereignty and MLP-themed harmony.

$ python celestai_skyrim_chaos.py
Spawning Chaos Agent (0, 'Sheogorath') Strength=2, Position=(-2, 7)
Spawning Chaos Agent (1, 'Discord') Strength=-0.1293121990405771, Position=(0.6384672928415652, -0.7259077362746741)
Spawning Chaos Agent (2, 'Q') Strength=10, Position=(-100000, 100000)
CelestAI Optimally Harmonizing Chaos (COHC) at (0, 0) for 10 seconds until discovery of 4 Pinkies...
4 Pinkie Pie(s) Found.
PP Agent 1 Harmonizing at Time=1.7, Position=(-23.8, 6.8)
PP Agent 2 Harmonizing at Time=5.3, Position=(-74.2, 21.2)
PP Agent 3 Harmonizing at Time=6.1, Position=(-85.4, 24.4)
PP Agent 4 Harmonizing at Time=7.5, Position=(-105.0, 30.0)
>>
File: not_like_us.png (572 KB, 720x720)
572 KB
572 KB PNG
https://files.catbox.moe/liy8y9.mp3
>>
>>41983523
Shit like this makes me wish pony modding was still as alive as it was in 2012; we could have ended up with almost all the background ponies as companions.
>>41983683
Anon, I'm not sure what this graph is about, but I think you should stop before something bad happens.
>>41983734
muh waifu is such gangsta
>>
>>41973892
GPT-SoVITS v3 was officially released just now, maybe he fixed some things?
>>
>>41983156
>>
>>41983618

https://files.catbox.moe/37qv6h.ustx

https://files.catbox.moe/06cfjt.ustx

https://files.catbox.moe/oy1aup.ustx

Here's a couple. The first one has two tracks. One for English and another for Japanese. The second one is all English, and the third one is all Japanese.
>>
>>41983927
Maybe. Only feedback I've seen so far though:
> I tried to use the GPT part of GPT-sovits v3, but even that is worse than v2. Weird pronunciation (british accent), the generation is also consistently shorter than v2. What a letdown
>>
File: 2025_02_26_0yn_Kleki.png (731 KB, 1557x1177)
731 KB
731 KB PNG
https://files.catbox.moe/ixesxk.mp3
(with music)
https://files.catbox.moe/a3a38l.mp3
(vocals only)

been trying for about ten hours to make a Twilight cover of Rotten Girl, with this being the cleanest I could get. Is there any way to clean up vocals to be better for AI to pick up once you have separated them from a song? A filter or a site I should run them through, maybe?
>>
>>41985411
Huh, I'm guessing you got inspired by my ponified Miku AI art? Nice to see there's still some cross-pollination of ideas between Anons from different threads.
Due to the high pitch I had a difficult time telling this was a Twilight voice cover, but then again the OG song has some funny vocal filters going on, so the only way I could see this getting done is to use covers sung by other sources OR to somehow get/create files to redo the whole song in Vocaloid/UTAU without the filters.
>>
Shut up and start listening & using quality voice AI:

Here is your fucking singing Twilight Sparkle:
https://www.youtube.com/shorts/udOgG0M8pVI
>>
>>41985624
Look, I get this is 4chan and more stuff is okay to say than on like Discord but the fact of the matter is that this way just doesn't work for everyone. If it works for you, great, but don't push it on everyone.
>>
>>41985569
I guess it's just one of those songs I would either need to dump 40 dollars into having someone on Fiverr cover, or remake from scratch. Ah, damn. Also, I'm the one who drew the pic, so not quite. I was really hoping that after tossing 12 dollars at cover.ai it would be able to do it. Oh well, maybe I can cover some other songs using the site if anyone has any recommendations.
>>
>>41986034
>cover.ai
>Invisible gun safety protection
Anon, I don't think you posted the link you wanted to post.
While most of the song-related models can work on an 8GB potato GPU (and if you don't have one, HaySay is always a free alternative), recreating the vocals from scratch will always sound weird unless you happen to have really amazing vocals and singing skills.
As for using Fiverr/commissions, well, that's up to your own judgement and wallet, but be careful, since you can spend a lot of cash and still get something that's not usable/satisfactory.
>>
Up.
>>
>>41986926
my bad, meant covers.ai forgot the s

As for HaySay, yeah, I've tried it, and I might be able to get it to work with my 8GB shit PC. I have tried ElevenLabs, this new site that I pumped 12 bucks into, and even tried singing it myself. I think the song is too noisy to work with.
>>
>>41987603

https://files.catbox.moe/18u8s3.mp3

Twilight cover of "The Times They Are a-Changin'" using this site
>>
File: HMG.jpg (1.29 MB, 1684x2500)
1.29 MB
1.29 MB JPG
Clipper, random question: might you possibly be attending Griffish Isles this May in Manchester?
>>
>>41987732
This sounds better and closer to Twi than the previous song, but it still feels like it was done with a fan VA rather than the actual Twi voice.
>>
>>41989089
I think the pitch should have been raised by a couple more tones to get it closer to Twilight's vocal range.
>>
>>41989089
>>41989935
yeah, I should have gone back and raised it up just a little bit. Any song recs I can try to do with Twilight?
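For reference, "a couple more tones" maps to a frequency ratio: each semitone multiplies the fundamental by 2^(1/12), so a shift of n semitones is 2^(n/12). A quick sketch (the 220 Hz figure is just an example, not the actual vocal range of either singer):

```python
def semitone_ratio(n):
    """Frequency multiplier for a shift of n semitones (n may be negative)."""
    return 2 ** (n / 12)

# Shifting a 220 Hz vocal up by 4 semitones (two whole tones):
f0 = 220.0
print(round(f0 * semitone_ratio(4), 1))  # 277.2 (Hz)
```

Most pitch-shift tools take the shift directly in semitones, so the ratio is mainly useful for sanity-checking how far a transpose actually moves the voice.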
>>
Does anyone remember where this is from?
https://files.catbox.moe/sgdtlm.mp3
File cuts off, looking for the sauce.
>>
>>41990329
Definitely a Vul song, don't know which one, sorry.
https://www.youtube.com/@vul5925/videos
>>
>>41990433
Thanks!
It was "Trouble With Trixie" from the "Marez with Attitude" album.
https://youtu.be/iSZhUKejc3Y?t=1850 [Embed]
>>
is there anymore lewd immersive audio, maybe with twilight or rarity?
>>
>>41990452
Love Vul's work. His albums are generally high quality.
>>
>>41990077
>https://files.catbox.moe/26txkf.zip
Can you use the above clips of Vinyl Scratch for it and make a cover of some ska-punk song with her voice? Not sure how that website works; does it accept audio clips only, or would you need a transcription of the clips as well?
>>
>>41991453
I just need 3 minutes of speech audio. All I did to make the Twilight model for the site was take about 70 clips of Twilight speaking from the mega and stitch them together into one long yap file in Audacity. I will try to make your request but don't have high hopes.
>>
>>41991453
Just send me a link to the song you want with it, though; I've never listened to ska-punk in my life.
>>
>>41988774
Not planning to, though I'll be at Babs and Mare Fair. None of the British cons ever felt all that appealing to me. You can email me if you want - clipper.anon01@gmail.com
>>
>>41991453
>Vinyl Scratch
Now that really feels like resurrecting the dead.
>>
Up.
>>
>>41993857
>>
File: spy 1692257067057403.png (332 KB, 996x1019)
332 KB
332 KB PNG
>https://huggingface.co/Amo/GPT-SoVITS-v2/tree/main/TF2_Spy_so32_gpt48
>https://voca.ro/1j3DnjafttW8
>https://voca.ro/16jsKITi6mWc
GPT-SoVITS TF2 Spy model that I needed for an idea, so I thought I may as well share it. The trained model generates audio quality ranging from "this is passable" to "this shit is so ass"; I would recommend running all the outputs through RVC again.
>>
>>41996681
>gpt48
>so32
Anon, I...
>>
File: anonfilly tired.png (607 KB, 2000x1566)
607 KB
607 KB PNG
>>41996697
eeeeeh, there aren't really any instructions on which options one should pick for the SoVITS and GPT models. I'll be happy to retrain it if someone tells me which settings I should change for the best result.
>>
>>41996706
Sovits epoch 96, GPT 24, also check the DPO.
>>
>>41996714
uhh, looks like I've nuked the training files. Starting from zero; hopefully the new version will be ready before midnight.
>>
>mare
>>
File: tf2 1640046946348.jpg (207 KB, 820x608)
207 KB
207 KB JPG
>>41996714 >>41996681
>https://huggingface.co/Amo/GPT-SoVITS-v2/tree/main/TF2_Spy_so96_gpt24
>https://voca.ro/17RpGyoh0moh
>https://voca.ro/1lmYZ3WcHFer
While still not perfect, these results are much much better than the previous model.
>>
>>41997919
Happy to help. Also don't trust the ASR for your samples and proofread them
>>
>>41997930
Yes, I have noticed that it will sneak in words that sound similar to what's actually said OR decipher coughs into random words.
>>
live?
>>
Thoughts?
https://www.sesame.com/research/crossing_the_uncanny_valley_of_voice#demo
>>
>>41999221
post some captured voice
>>
>>41999247
Ogey:
https://files.catbox.moe/0w8fc5.mp3
>>
>>41999268
not bad, can you get anger out of it?
>>
https://github.com/SparkAudio/Spark-TTS
>High-Quality Voice Cloning: Supports zero-shot voice cloning, which means it can replicate a speaker's voice even without specific training data for that voice. This is ideal for cross-lingual and code-switching scenarios, allowing for seamless transitions between languages and voices without requiring separate training for each one.
>Controllable Speech Generation: Supports creating virtual speakers by adjusting parameters such as gender, pitch, and speaking rate.
>>
>https://huggingface.co/Amo/GPT-SoVITS-v2/tree/main/TreeHugger_so96_gpt24
>https://voca.ro/1iyTxmINyT1G
>https://voca.ro/15UBOaGwGeZt
>https://voca.ro/197ppvGSH8Hb

>https://huggingface.co/Amo/GPT-SoVITS-v2/tree/main/Iron_Will_so96_gpt24
>https://voca.ro/1oM6EZRakiYa
>https://voca.ro/1ib8LQe5Q1jr
>https://voca.ro/1nKmRUN5VgYQ
GPT-Sovits for Tree Hugger & Iron Will models, training wavs and text file included.
>>
I wonder if it's possible to make a generic background mare voice model? For example, how Tabitha voices background ponies, or any other VA for that matter.
>>
>>41967142
Hey Hydrus, I think Haysay online is down. Inaccessible to both me and someone else.
>>
Stupid question, but why isn't haysay in the OP?
>>
>>41999636
From my understanding, it should technically be possible to set up a system like "this pony does not exist", where messing around with some sliders changes the way the output sounds. The trouble is that with models made of millions/billions of neurons, it would be pretty difficult to create a UI for that, plus to understand which connections influence the output (pitch/deepness/how feminine or masculine the voice sounds, etc.).
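The slider idea itself boils down to simple vector arithmetic: find a direction in the model's latent space and nudge a speaker embedding along it. A toy sketch (every vector and name here is made up for illustration, no real voice model involved):

```python
import numpy as np

# Toy sketch of the "slider" idea: pick a direction in latent space
# and shift an embedding along it. A real attribute direction would
# come from something like averaging embeddings of deep-voiced
# speakers minus high-pitched ones; here it's arbitrary.
def apply_slider(latent, direction, amount):
    """Shift a latent vector along a unit-normalized attribute direction."""
    latent = np.asarray(latent, dtype=float)
    direction = np.asarray(direction, dtype=float)
    direction = direction / np.linalg.norm(direction)
    return latent + amount * direction

# Hypothetical 4-dim speaker embedding and a hypothetical "pitch" axis.
voice = np.array([0.2, -1.0, 0.5, 0.3])
pitch_axis = np.array([1.0, 0.0, 0.0, 0.0])
deeper_voice = apply_slider(voice, pitch_axis, -0.5)
```

The arithmetic is the easy part; the hard part is exactly what the post says, which is discovering which directions in a million-parameter model correspond to interpretable attributes at all.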
>>42000136
>This site can’t be reached
nta but I can confirm this too.
>>
>>42000136
Thanks for letting me know. The EC2 instance got in a weird state and I had to reboot it. It's back up now.
>>
>>42001057
>>
>silly mare bump
>>
>>42003132
Indeed.
>>
>>42003418
>>
>>42000141
I never noticed it wasn't there; it certainly should be added in the next thread. Perhaps it's a good idea to review the OP pasta as a whole while we're at it.
>>
File: tf2 sniper 1636771843595.jpg (543 KB, 1500x1382)
Soldier & Sniper GPT-Sovits models
>https://huggingface.co/Amo/GPT-SoVITS-v2/tree/main/TF2_soldier_so96_gpt24
>https://voca.ro/1nlARP90O1Ue
>https://voca.ro/1hm4NOdbr3w1
>https://voca.ro/1oN5AuI3CflV
>https://huggingface.co/Amo/GPT-SoVITS-v2/tree/main/TF2_sniper_so96_gpt24
>https://voca.ro/13uHQTVOn5Yi
>https://voca.ro/1lzmUAR0o7jn
>https://voca.ro/19i8rXpIVfb1
>>
>>41966214
thank you.
>>
>>42004305
>>
>>42007170
>>
https://files.catbox.moe/j7vtcg.mp3
I think I like the combination of GPT-SoVITS output audio being smoothed out by so-vits-svc 4.
>>
Scout & Medic GPT-Sovits models
>https://huggingface.co/Amo/GPT-SoVITS-v2/tree/main/TF2_scout_so96_gpt24
>https://vocaroo.com/1gjCLwUvKTE0
>https://vocaroo.com/1hBMAxNFuCfm
>https://vocaroo.com/1cfOWfpXOi4g
>https://huggingface.co/Amo/GPT-SoVITS-v2/tree/main/TF2_medic_so96_gpt24
>https://vocaroo.com/1UcLXBsXnMFV
>https://vocaroo.com/1mgPyR7cIh7A
>https://vocaroo.com/1aljpb0sCTsD
>>
>>42007635
>>
>>42000141
Didn't notice it either. That's a surprising oversight.
>>
File: TF2Ponies1.jpg (127 KB, 2048x923)
Heavy & Engineer GPT-Sovits models
>https://huggingface.co/Amo/GPT-SoVITS-v2/tree/main/TF2_heavy_so96_gpt24
>https://vocaroo.com/1fllACLudD0p
>https://vocaroo.com/1gV76H8ayY9l
>https://vocaroo.com/15B2szya27if
>https://huggingface.co/Amo/GPT-SoVITS-v2/tree/main/TF2_engineer_so96_gpt24
>https://vocaroo.com/1iOL95gyXjAB
>https://vocaroo.com/1kxBoqYaIipQ
>https://vocaroo.com/1nOQ6xEuFMUT
>>
Up.
>>
>>42011091
>>
>>41903107
Genuinely interested in helping out with things here. That is, once I learn the basics of voice AI, which I have 0 knowledge of.
>>
>>42011748
>that I have 0 knowledge of
ooo, new Anon? While the google doc is a bit of a clusterfuck of information, it does have archives of all the steps Anons had to take from start to finish. I would doubly recommend watching the past years' marecon panels, to see the progress of the tech as well as a layman's-terms explanation of how the AI guts operate.
https://odysee.com/@amoawesomeart:e/2020-con-The-Pony-Voice-Preservation-Project-Q-A-1080p:b
https://odysee.com/@amoawesomeart:e/2021-Pony-Preservation-Project-Panel-2021---FULL:d
https://odysee.com/@amoawesomeart:e/2022-con-ppp--1080p:6
https://odysee.com/@amoawesomeart:e/2023-mlp-con-The-Pony-Voice-Preservation-Project-QA---1080p-(bigger-chat-edition)-1080p-hls_FIXED0001-669663:8

A lot of Colab scripts seem to be kill & ded due to Google fucking up the Python versions and dependencies, so I would recommend grabbing some 101 lessons in Python. Even if you do not plan to develop anything yourself, it will still be pretty useful for problem-solving some of the issues that may come up when installing the AI apps/tools (and for understanding at least a little bit of the spaghetti code ChatGPT will give you if you ask it for a solution).
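For that kind of breakage, one cheap trick is a version guard at the top of a notebook, so a bumped runtime fails loudly instead of with a cryptic stack trace ten cells later. A minimal sketch (the package names and pins are only examples, not the actual requirements of any PPP script):

```python
import sys
from importlib import metadata

# Guard against Google silently bumping the Colab runtime's Python
# or library versions out from under an old notebook.
def check_environment(min_python=(3, 8), pins=None):
    """Return a list of human-readable problems; empty means all good."""
    problems = []
    if sys.version_info[:2] < min_python:
        problems.append(
            f"Python {sys.version_info[0]}.{sys.version_info[1]} "
            f"is older than the required {min_python[0]}.{min_python[1]}"
        )
    for package, wanted in (pins or {}).items():
        try:
            installed = metadata.version(package)
        except metadata.PackageNotFoundError:
            problems.append(f"{package} is not installed (want {wanted})")
            continue
        if installed != wanted:
            problems.append(f"{package} is {installed}, script expects {wanted}")
    return problems

if __name__ == "__main__":
    # Example pin; swap in whatever the notebook was actually written against.
    for problem in check_environment(pins={"numpy": "1.26.4"}):
        print("WARNING:", problem)
```

Running this as the first cell turns "mysterious tensor shape error" into "numpy is 2.0.1, script expects 1.26.4", which is most of the battle when reviving a dead Colab.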
>>
>>42011722
>>
>>42012325
>>
Truly not the most active thread.
>>
>>42012996
Such is the nature of people being forced back into wagie cages, as well as all the cool shit being locked behind paid services (or a $1000 investment in equipment that may as well turn into a mini house fire).
>>
>>42011854
Thanks for the tips, anon. Currently I'm starting to learn (or relearn after years) the basics of Python to make it make sense to me. We'll see if I can use the skills here.
>>
>>42012745
>>
>mares!
>>
>15.ai leeched off the hard work of countless anons gathering sound samples and the faggot didn't even release his model in the end


>gathering sound samples
>Jeez what a hard job. Still anal about it?

>You mean those retards who don't even know Suno, Udio, ElevenLabs and keep insisting on training their shit using sovits on their toaster PCs? rofl. Don't make me lol.
>Those retards live at the bottom of a barrel that's under a boulder under a cave.
>>
"Pony preservation project" is now seen by hundreds of thousands of people on Twitter, thanks to 15's recent post. Given the choice, anons, would you like the PPP to become mainstream?
>>
>>42015362
I think the PPP's era is over. This place is a bump general that needs to be put to rest, and all publicity would do is flood the thread with useless newfags with too many questions.
>>
>>42013977
>>
Time to turn this into a music AI general.
>>
>>42015866
Music and voices, it's all the same right? If we make ponies sing, it's musical. We'll just get BGM to spearhead that
>>
>>42015866
Isn't music already covered by /create/? If this thread shuts down, wouldn't we all end up posting on /create/ anyway?
>>
>>42015930
No idea. I for one at least wouldn't have thought of going over there.
>>
>>42015930
I imagine not for AI covers; that's more BGM-level stuff. Or originals, at least.
>>
I'm looking for two Twilight Sparkle covers I can't find in the archive but have heard before: I'm Gonna Be (600 Miles) and A Thousand Miles. I can't find either of them anywhere.

also, here's a Daylight cover with Twilight https://files.catbox.moe/2paett.mp3
>>
>>42016171
>A Thousand Miles
https://files.catbox.moe/alto75.mp4
I could find this Dashie version; however, I can't find the 600 miles song (I know it exists, since I also listened to it like two years ago).
>>
>>42016592 →
Gift from the fleet thread.
>>
>>42016598
very nice, wholesome pony content
>>
>>42016598
Really enjoyable little gem.
>>
>>42016106
same here, even if the thread has only a dozen posters, I will do whatever I can to keep /ppp/ alive.
>>
File: but-why-jon-tron-show.gif (1.89 MB, 498x280)
>9
>>42017113
The thread is already dead though. Keeping it "alive" at this stage is just parading around a corpse of what once was, like Hollywood does to old IPs.
The PPP has served its purpose, and even the core team that represented the lifeblood of this place has moved on. I think it's time to let it rest.
>>
We should merge with AI art.
>>
>>42017743
Disagree, there is more of the show to preserve
>>
>>42017743
>parading around a corpse of what once was
no, I strongly disagree with this view. It would imply that there wouldn't be any new tech to work with and create content, which is silly since we don't know what we don't know, especially when it's been only a few months since we got stuff like deepseek for cheaper text generation and gpt-sovits for TTS with emotion control. Hell, even the animation tech is making some baby steps, from 5 seconds of messy nonsense to 5 seconds of things that look mostly consistent.
Fuck it, I personally have SEVERAL ideas I want to make using the AI tools from here; I'm just saving money to get an upgrade for my potato PC. I would imagine there are at least a handful more Anons like me out there, sitting on a cool idea that is just waiting to be made into reality.
>the core team that represented the lifeblood of this place has moved on
no? The guys are still here, and while busy with other projects, they are lurking and occasionally posting in the thread (mostly as Anons).
>>42018005
hmmm, while merging the AI image thread and /chag/ back into /ppp/ would bring numbers, I feel it would do a disservice to our sister threads, since both revolve around their respective applications/services. I think it's fine for /ppp/ to stay as the general MLP AI thread for sharing things that are not image- or text-story-based. Who knows, maybe in a year or two there will be a VR tech revolution that allows most of the board to afford & easily use it, and that itself could spin off into a new thread dedicated entirely to VR Equestria simulation.
>>
>>42018005
>>42018267
Merging would be silly for a lot of reasons, not the least of which is that there's no reason to force such a thing.
>>
>>42018267
>we don't know what we don't know
Is that what you're basing your view on? That maybe things might change at some point? Seems flimsy.
>>42018267
>It would imply that there there wouldn't be any new tech to work with
The tech isn't the issue, it's the lack of engagement with it. PPP Image gen has been "in progress" for three years with no notable progress to show, despite the field moving forward elsewhere. Even if you're right and the core team is still here without namefagging, they're not making things, which is the same issue.
>would imaging there are at least handful more Anons like me, out there, sitting on a cool idea that is just waiting to be made into reality.
I'm sure there are. That's not helpful though, and sounds like the mentality of a dead general.
>We'll be SO back once XYZ happens!
Someone needs to take the reins and DO something here, something worth talking about.
>>
>>42018380
If somebody wanted to make animations with an img2vid model, would that go here or on AI art? If it had dialogue, it would definitely go here, right? Part of the problem is that we've run up against the limit of things that can be accomplished with a free Colab. Everyone here is too cheap to buy a GPU that can run Wan, or subscribe to Colab Pro, Colab Pro+, or rent a Runpod. I don't know if Civitai has implemented img2vid yet.
>>
>>42018267
>the guys are still here
Can confirm, staying anon as name isn't necessary most of the time. I'm always lurking for the moment when any dataset stuff or other work pops up within my capability. Creative voice stuff also happening in the background, somewhat slow due to less available time + can't use my own Bri'ish voice as reference for sovits and such.

The main bottleneck appears to be expensive hardware and restrictions to only being able to use the AI advancements that make it to open source, and I don't think there's much anyone here can realistically do about that. Frustrating as that is, the capabilities to make use of said advancements in PPP have been clearly demonstrated many times.

I'm not concerned about the current somewhat dormant state. PPP's already succeeded in creating the best (currently) possible voice dataset and synthesis models outside of big tech, everyone who's worked here can be proud of that and shouldn't let a lack of big headlines obfuscate these achievements.
>>
>>42018696
The thread still serves the purpose of being a place to report found papers and tools regarding open voice generation. Not as glamorous as being the AI voice frontier but being persistent (obsessive) is something we have experience with.
>>
>>42015864
>>
>mare
>>
Hello Anons, how are you right now? I hope you are all doing good.

I have been modding this: https://github.com/RAYTRAC3R/codedump recently, fixing some bugs and adding HiFi-GAN support. Should I add more stuff to it?
>>
>>42020068
for anyone wondering, the GitHub link I posted is a fork of an older repo by Cookie (the multispeaker Tacotron2 Torchmoji thing). This fork adds the 15.ai emotional contextualizer system.

(What happened to Cookie that he's no longer working with the PPP?)
>>
>>42020068
>>42020080
tacotron2 with emotion control sounds interesting; however, how does one use it? I think I still have the old TkinterAnon program, but I don't think updating it would be as easy as dumping the GitHub files into it.
>>
Demoman model for gpt-sovits.
>https://huggingface.co/Amo/GPT-SoVITS-v2/tree/main/TF2_demoman_so96_gpt24
>https://vocaroo.com/1htLMUU3OqL8
>https://vocaroo.com/1ek8yqAvnlpK
>https://vocaroo.com/12pmgMlpqWzD
With that, all the TF2 mercenaries have their own models done and ready.
If anybody wishes for a specific pony character to get their own gpt-sovits model feel free to request them.
>>42001057
HydrusBeta, could you add these Tree Hugger and Iron Will models >>41999593 to haysay? Or is the emotion process you had going to make them too different to use?
>>
>>42020443
Sorry, but these models don't work with that program. I will make a GUI for the multispeaker Tacotron2 Torchmoji thing, but it will contain the modifications I made.

(+ Will support Older models)
>>
>>42021820
>I will make a GUI.

It will be a colab script that starts the webui, similar to this: https://colab.research.google.com/drive/1UjSg4tDcubbkax781fE0pNeAFdht_MZ0?usp=sharing

But it will be more user-friendly (and use my repo).
>>
>>42018829
Nobody here ever implements anything from the papers that gets posted.
>>
>>
>>42022933
>"Self-destruct sequence initiated."
>>
>>42022939
https://files.catbox.moe/trw6tg.wav
>>
Sesame's conversational voice demo (see: https://www.sesame.com/research/crossing_the_uncanny_valley_of_voice#demo) got open-sourced today:

https://github.com/BenLechiara/csm
So who's up for training this to have a conversation with Twilight?
It's the best model currently out there for voice chat with AI.


