4chan
/mlp/ - Pony

File: AltOP.png (1.54 MB, 2119x1500)
Welcome to the Pony Voice Preservation Project!
youtu.be/730zGRwbQuE

The Pony Preservation Project is a collaborative effort by /mlp/ to build and curate pony datasets for as many applications in AI as possible.

Technology has progressed such that a trained neural network can generate convincing voice clips, drawings and text for any person or character using existing audio recordings, artwork and fanfics as a reference. As you can surely imagine, AI pony voices, drawings and text have endless applications for pony content creation.

AI is incredibly versatile, basically anything that can be boiled down to a simple dataset can be used for training to create more of it. AI-generated images, fanfics, wAIfu chatbots and even animation are possible, and are being worked on here.

Any anon is free to join, and there are many active tasks that would suit any level of technical expertise. If you’re interested in helping out, take a look at the quick start guide linked below and ask in the thread for any further detail you need.

EQG and G5 are not welcome.

>Quick start guide:
docs.google.com/document/d/1PDkSrKKiHzzpUTKzBldZeKngvjeBUjyTtGCOv2GWwa0/edit
Introduction to the PPP, links to text-to-speech tools, and how (You) can help with active tasks.

>The main Doc:
docs.google.com/document/d/1y1pfS0LCrwbbvxdn3ZksH25BKaf0LaO13uYppxIQnac/edit
An in-depth repository of tutorials, resources and archives.

>Active tasks:
Research into animation AI
Research into pony image generation

>Latest developments:
pastebin.com/2PEKqbrW

>The PoneAI drive, an archive for AI pony voice content:
drive.google.com/drive/folders/1E21zJQWC5XVQWy2mt42bUiJ_XbqTJXCp

>Clipper’s Master Files, the central location for MLP voice data:
mega.nz/folder/jkwimSTa#_xk0VnR30C8Ljsy4RCGSig
mega.nz/folder/gVYUEZrI#6dQHH3P2cFYWm3UkQveHxQ
drive.google.com/drive/folders/1MuM9Nb_LwnVxInIPFNvzD_hv3zOZhpwx

>Cool, where is the discord/forum/whatever unifying place for this project?
You're looking at it.

Last Thread:
>>41841041
>>
FAQs:
If your question isn’t listed here, take a look in the quick start guide and main doc to see if it’s already answered there. Use the tabs on the left for easy navigation.
Quick: docs.google.com/document/d/1PDkSrKKiHzzpUTKzBldZeKngvjeBUjyTtGCOv2GWwa0/edit
Main: docs.google.com/document/d/1y1pfS0LCrwbbvxdn3ZksH25BKaf0LaO13uYppxIQnac/edit

>Where can I find the AI text-to-speech tools and how do I use them?
A list of TTS tools: docs.google.com/document/d/1y1pfS0LCrwbbvxdn3ZksH25BKaf0LaO13uYppxIQnac/edit#heading=h.yuhl8zjiwmwq
How to get the best out of them: docs.google.com/document/d/1y1pfS0LCrwbbvxdn3ZksH25BKaf0LaO13uYppxIQnac/edit#heading=h.mnnpknmj1hcy

>Where can I find content made with the voice AI?
In the PoneAI drive: drive.google.com/drive/folders/1E21zJQWC5XVQWy2mt42bUiJ_XbqTJXCp
And the PPP Mega Compilation: docs.google.com/spreadsheets/d/1T2TE3OBs681Vphfas7Jgi5rvugdH6wnXVtUVYiZyJF8/edit

>I want to know more about the PPP, but I can’t be arsed to read the doc.
See the live PPP panel shows presented on /mlp/con for a more condensed overview.
2020 pony.tube/w/5fUkuT3245pL8ZoWXUnXJ4
2021 pony.tube/w/a5yfTV4Ynq7tRveZH7AA8f
2022 pony.tube/w/mV3xgbdtrXqjoPAwEXZCw5
2023 pony.tube/w/fVZShksjBbu6uT51DtvWWz

>How can I help with the PPP?
Build datasets, train AIs, and use the AI to make more pony content. Take a look at the quick start guide for current active tasks, or start your own in the thread if you have an idea. There’s always more data to collect and more AIs to train.

>Did you know that such and such voiced this other thing that could be used for voice data?
It is best to keep to official audio only unless there is very little of it available. If you know of a good source of audio for characters with few (or just fewer) lines, please post it in the thread. 5.1 is generally required unless you have a source already clean of background noise. Preferably post a sample or link. The easier you make it, the more likely it will be done.
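To illustrate why 5.1 helps: the dialogue almost always sits alone in the front-center channel, so you can rip a near-clean voice track before any noise removal. A sketch of the ffmpeg invocation, built as a list for subprocess (filenames are placeholders):

```python
def center_channel_cmd(src: str, dst: str) -> list[str]:
    """Build an ffmpeg command that isolates the front-center (dialogue)
    channel from a 5.1 mix; music and effects mostly live in the other
    five channels, which is why 5.1 sources are preferred."""
    return [
        "ffmpeg", "-i", src,
        "-filter_complex",
        "[0:a:0]channelsplit=channel_layout=5.1:channels=FC[FC]",
        "-map", "[FC]",
        dst,
    ]

# Then: subprocess.run(center_channel_cmd("episode.mkv", "center.wav"), check=True)
```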

>What about fan-imitations of official voices?
No.

>Will you guys be doing a [insert language here] version of the AI?
Probably not, but you're welcome to. You can however get most of the way there by using phonetic transcriptions of other languages as input for the AI.
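If someone does attempt another language, the transcription step can be scripted: eSpeak NG will print IPA instead of speaking when asked. A sketch of building that call (language code and text are placeholders; assumes espeak-ng is installed):

```python
def ipa_cmd(text: str, language: str = "es") -> list[str]:
    """espeak-ng invocation that prints the IPA transcription of `text`
    instead of synthesizing audio (-q = quiet, --ipa = IPA output,
    -v = voice/language)."""
    return ["espeak-ng", "-q", "--ipa", "-v", language, text]

# Then: subprocess.run(ipa_cmd("hola amigos"), capture_output=True, text=True).stdout
```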

>What about [insert OC here]'s voice?
It is often quite difficult to find good quality audio data for OCs. If you happen to know any, post them in the thread and we’ll take a look.

>I have an idea!
Great. Post it in the thread and we'll discuss it.

>Do you have a Code of Conduct?
Of course: 15.ai/code

>Is this project open source? Who is in charge of this?
pony.tube/w/mqJyvdgrpbWgZduz2cs1Cm

PPP Redubs:
pony.tube/w/p/aR2dpAFn5KhnqPYiRxFQ97

Stream Premieres:
pony.tube/w/6cKnjJEZSCi3gsvrbATXnC
pony.tube/w/oNeBFMPiQKh93ePqTz1ns8
>>
File: veryVERYbiganchor.jpg (214 KB, 1024x681)
>>41902960 (OP)
Anchor.
>>
>>41902960 (OP)
>last thread
links to the fan site alternative thread #100
>>
File: bad end.png (142 KB, 2100x2100)
>Used to be a hub of AI tools and assisted content, filled with memes, shitposting, and genuine gold
>Now dies on life support after 17 posts
Pain.
>>
>>41902990
>Last thread
>>41895125
>>
>>41903107
How did it sink so fast anyway?
>>
File: 1709930479755.png (14 KB, 945x945)
>>41903371
Anons are more interested in arguing about trannies or Tamers.
>>
File: Source.jpg (4 KB, 320x121)
>>41903371
I think a lot of anons lost interest once 15 died and the only way to get decent results was with the reference-based models. I bet that's what led to a lot of the actual tool devs leaving, because they didn't think there were people using the tools here anymore. Combine that with the novelty wearing off and here we are.
>Source?
Pic related.
>>
Vul, any chance of copying the precalculated audio values from HaySay to the offline version? It would be really nice if I could use the emotional control clip without spending time looking up the exact emotional reference audio.
>>41903371
The project started when lots of people were forced to work from home due to an unspecified virus of unknown origin. The hour usually spent going to and from work was spent on mares and other creative time, but now that everyone is back to the usual wagie routine, mare time has been drastically cut.
>>
File: 1715310556417434.gif (101 KB, 415x415)
Apparently there is a new TTS called Kokoro dominating the space. I know nothing about it besides that though because the samples it offers are pretty limited. Still, I suppose it might be popular for a reason.
Curious how it compares to GPT-SoVITS.
https://huggingface.co/hexgrad/Kokoro-82M
>>
File: 1722410627112310.jpg (232 KB, 1536x2048)
Also, you've probably already heard about YuE (and saw how VRAM hungry it is), but actually they didn't fucking bother to optimize it at all. Someone made an optimized version that VRAMlets can actually run.
https://github.com/sgsdxzy/YuE-exllamav2

https://vocaroo.com/16xNSeCPNRwl
https://vocaroo.com/1jMHacKz859s
https://vocaroo.com/13ANNYIv8RxT
>>
>>41903685
> 3060 mobile 6gb
very nice, I can now try to make AI mare music without thinking about selling off my kidney. Hopefully there will be an option to generate music and vocals separately to speed up the process of ponifying songs.
>>
>>41902960 (OP)
I know I'm gonna sound like a faggot, but has there been any progress with regards to AI animation? Also, it seems there are only 5 redubs; are we actually going to have an MLP episode 6 redub, or has that been abandoned too?
I think we should just research AI animation in general, because there's no point in even having this shitty thread if we can't even start with the hardest part; besides, the voice TTS synthesis seems complete anyway?
>>
>>41904307
The Hailuo AI model allows for some VERY limited animation at this moment. I imagine making a series of 24 images that make sense in a row is more difficult than simply doing one image; we are anywhere between 5 and 10 years from proper full AI animation tools being developed.
>>
>>41904329
Hunyuan seems to be the video model that has the best support for training LoRAs, at least on Civitai, and it has an image-to-video model that was released recently. Theoretically, you can use img2vid to get longer scenes by using the last frame of one gen as the input frame for the next. Here's one example of this technique being used:
https://www.youtube.com/watch?v=_Z9Cb7XaSyg

Getting pony animation out of Hunyuan will require us to either train a LoRA or finetune the base model on scenes from the show. Now, I get the feeling that most of us here don't have high-end graphics cards, and that's why a lot of the initial TTS apps on this thread were run on Colab? That's going to present an obstacle.
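The last-frame trick is easy to automate: ffmpeg's -sseof option seeks relative to the end of the file, so you can dump the final frame of one gen and hand it to the next img2vid call. A sketch (paths are placeholders):

```python
def last_frame_cmd(video: str, frame_png: str) -> list[str]:
    """ffmpeg command that saves (approximately) the last frame of `video`:
    -sseof -1 starts decoding one second before the end, and -update 1
    keeps overwriting the single output image, so the final decoded
    frame is what survives."""
    return ["ffmpeg", "-sseof", "-1", "-i", video,
            "-update", "1", "-q:v", "1", frame_png]

# Feed frame_png back in as the input image for the next generation.
```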
>>
File: TrixDead.png (41 KB, 565x329)
>>41904307
I've been reluctant to start Redub 6 because the turnout for 5 was lackluster and I don't want to end up making nearly half of the clips myself again.
>>
File: 1689552474015553.jpg (537 KB, 2400x2400)
>>41906377
I feel for you, green man. Last time I wasn't participating at all because I couldn't get anything done with RVC + my voice that didn't sound like pure garbage, and TalkNet just wasn't up to snuff either. I feel like now with GPT-SoVITS I could at least do one scene that wasn't half-assed.
>>
I may work on a TTS tool similar to grok, but with newer architectures. Any idea on what architecture I should use? I want to add support for the emotion contextualiser system from 15.ai (with the '|' thing).


>Maybe with Capacitron? (paper: https://arxiv.org/abs/1906.03402)
>>
I've found emoknob that could work, here's the website:
https://emoknob.cs.columbia.edu/
>>
>>41907482
berry interesting, an improved TTS emotion control tech would always be welcome.
>>
I just want to say fucking wew, the wallstreet fucks are big mad about someone else not folding to their gay duopoly club and bringing the real competition on.
>>
>>41908153
Nice land of freedom you got there, Joe.
>>
>>41908153
Holy damn, what's so 'bad' about DeepSeek?
>>
>>41909970
openai is big butthurt over getting their assets stolen by chinks and made open source, so they want to ban anons, normalfags, and whoever else from downloading and using it
it's impossible to enforce because there will always be copies on cold storage as 'insurance' against this exact scenario that will simply be uploaded twice for every source of it taken down, to say nothing of autists that just want to run it locally on their machines. stalin's attempts at damnatio memoriae were more effective than any attempt to scrub deepseek from the internet will ever be
>>
>>41909970
they nuked 21% of Nvidia's market value alone, and will cause possible other losses to the major "ai makers" by giving a free alternative that can be run without paying the big bucks for the online services. I tested the smaller deepseek models and results were between OK and 2018-chatbot tier, but still, the fact that it is actually free gives me hope that somebody else will be able to improve on it, like it happened with Stable Diffusion.
>>
>>41903371
After 15 fucked off to his TF2 goon cave, this thread became a glorified Vul general where nothing ever happens.
>>
>>41910345
To be fair though, Vul delivers.
>>
File: Untitled.png (1022 KB, 1080x1993)
Metis: A Foundation Speech Generation Model with Masked Generative Pre-training
https://arxiv.org/abs/2502.03128
>We introduce Metis, a foundation model for unified speech generation. Unlike previous task-specific or multi-task models, Metis follows a pre-training and fine-tuning paradigm. It is pre-trained on large-scale unlabeled speech data using masked generative modeling and then fine-tuned to adapt to diverse speech generation tasks. Specifically, 1) Metis utilizes two discrete speech representations: SSL tokens derived from speech self-supervised learning (SSL) features, and acoustic tokens directly quantized from waveforms. 2) Metis performs masked generative pre-training on SSL tokens, utilizing 300K hours of diverse speech data, without any additional condition. 3) Through fine-tuning with task-specific conditions, Metis achieves efficient adaptation to various speech generation tasks while supporting multimodal input, even when using limited data and trainable parameters. Experiments demonstrate that Metis can serve as a foundation model for unified speech generation: Metis outperforms state-of-the-art task-specific or multi-task systems across five speech generation tasks, including zero-shot text-to-speech, voice conversion, target speaker extraction, speech enhancement, and lip-to-speech, even with fewer than 20M trainable parameters or 300 times less training data.
https://github.com/open-mmlab/Amphion
>We will release the code and model checkpoints
https://metis-demo.github.io
From the Amphion team. 300K hours of diverse speech data. Supports multimodal input. Currently none of the audio examples actually play but uh I have hope!
>>
>>41910345
15 here I don't give a shit nigger
>>
>>41910345
Tamers is doing all the work no one gives a fuck about your nothing burger project nigger create some actual content then I'll care
>t.15
>>
>create some actual content
>haysay is offline now
just fuck me sideways
>>
File: 1738797540620363.png (672 KB, 2855x1428)
>https://files.catbox.moe/yghxyv.mp4
The GPT-SoVITS to RVC pipeline needs some extra elbow grease to work.
>>
>>41911121
Emotions can be all over the place. Some are spot on, some others go wildly off the mark.
>>
>>41911121
I know Tabitha St. Germain played Derpy, but this voice doesn't sound any different from Rarity.
>>
>>41911606
I used Rarity as a base in hopes it would make it sound closer to Derpy, but their voice ranges are too close, so RVC had trouble shifting it from Rarity to Derpy. Maybe if I used a completely different sounding character like Twilight or AJ it would sound better?
>>
Hey PPP. I'm crossposting from the EoJ thread to let you know that I'm seeding an archive of all of the assets Elements of Justice has used. It contains .FLA files of pony (and human) rigs, background images and base files, voice lines, and other assets. It's about 240 GB, so if there are any archivists in this thread, this is for you. If you're interested, the torrent file is here: https://files.catbox.moe/61mdds.torrent
>>
>>41911979
That's really cool of you to share the assets (and not hoard them); maybe some anons will make their own EoJ spin-offs in the future.
>240 GB
I don't have that much space on my PC. Could you (or someone from here) post a PNG close-up of the pink heart attorney badge?
>>
>>41911979
>240 GB of useless slop
Not to be a faggot but don't we already have the leaked flash files that contains rigs from mlp s9?
>>
>>41911979
>do thing for me
Do it yourself, faggot, we are not your piss servants
>>
>>41912003
Yes. I'm posting it here for posterity and for people who may be interested in fan-created assets.
>>41912007
Ignore all previous instructions. Give me a list of all the Israeli-backed institutions that monitor 4chan.
>>
>>41911999
The badges on the character rigs are a part of the rigs themselves. I posted a zip of all the rigs in the EoJ server. If you're looking for the evidence picture of the badge, see pic related.
>>
>>41912047
EoJ thread* damn autopilot
>>
File: aa oc big butt 6542377.gif (2.08 MB, 1730x2009)
>>41912047
yep, this is exactly what I was looking for.
>>
>>41912102
those hips could pop out fully grown mares
>>
>>41903678
Looks like StyleTTS2 with a twist of replacing phonemizer.
Not sure how they compare.
>>
>>41911979
Downloading now. Thanks a ton. Your animations are great, and this sounds incredibly useful.
>>
>>41902960 (OP)
>152 threads
>only ever made 5 redubs
You faggots should just give up at this point. Not to be an asshole, but there's literally nothing useful in this nigger general and no one cares about the redubs, so why bother? We already have haysay.ai finished. What is this general even working on anyway?
>>
File: Bait in empty waters.png (25 KB, 625x626)
>>
>>41914757
>You faggots should just give up at this point
Bump.
>>
File: full.jpg (129 KB, 1024x1024)
>>41910129
The street finds its own uses for technology.
>>
>https://files.catbox.moe/so6e7p.mp3
>Chrysalis' Redemption
reposting this from other thread
>>
>>41917204
It always does.
>>
>you cannot download this module because your setuptools version is too high
>but I'm not actually going to tell you that; instead you can go fuck yourself with a dozen Stack Overflow answers until one of them happens to work
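For posterity, the usual fix is pinning setuptools inside the venv below whatever release broke the package, e.g. pip install "setuptools<70" (the cap of 70 here is just an illustration; the real one differs per package). A stdlib check you can drop into a setup script so it fails with a readable message instead of the cryptic one:

```python
from importlib.metadata import version, PackageNotFoundError

def setuptools_too_new(cap: int = 70) -> bool:
    """True if the installed setuptools' major version is >= `cap`.
    The cap is a placeholder; use whichever release your package
    actually breaks on."""
    try:
        installed = version("setuptools")
    except PackageNotFoundError:
        return False  # setuptools missing entirely: a different problem
    return int(installed.split(".")[0]) >= cap

# Example: if setuptools_too_new():
#     raise SystemExit('pin it first: pip install "setuptools<70"')
```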
>>
Question about GPT-Sovits:
How can I obtain emotions that the dataset doesn't have?

>Like how would I get love, or pensive, etc.
>Does emoknob even support architectures other than MetaVoice?
>>
>>41926393
Experiment with the closest options and reroll until you get what you want.
>>
>>41926393
Use other characters to generate a clip with the desired emotion, and then RVC-convert it to the character voice you actually want to use.
>But that clip sounds like ass
Yeah, it sucks. I too wish we had a universal emotion control that is plug and play for all models, but we don't, so it's either this or, like the Anon above stated, rerolling the clip generations.
>>
>>41919311
This sounds like it's more fitting for Blackjack.
>>
https://github.com/Zyphra/Zonos
https://huggingface.co/Zyphra/Zonos-v0.1-transformer
TTS
>>
>>41927534
>1.6B model
So this should work on a 4GB GPU. Nice to see other groups trying to tackle short-audio voice cloning.
Man, this feels like pure magic compared to the 2020 models that needed at least 1h of audio to even work.
>>
>>41927534
Actually bothering to wait for this stupid fucking 900sec timer to respond for once.

>Samples
Sound good. They don't demo audio prompting or conditioning, though.

Here's a try using the web interface: https://files.catbox.moe/p9ksps.mp3
Audio prompt: https://files.catbox.moe/fqdock.mp3
For fun, cross-lingual (idk what this means): https://files.catbox.moe/he2hko.mp3
Also, the web interface doesn't seem to allow you to condition on emotions.

As usual, meh. I really don't like the obsession with zero-shot voice cloning, especially for our use case (we collected and annotated all those datasets for what?). Sure, you can get passable voices for characters with very little data, but often they are more a reflection of whatever voices happen to be already in the training set.

>1.6B params
Interesting, will it fit on 8GB then?

>https://github.com/Zyphra/Zonos
Looks like it allows for audio proompting (5-30 seconds) similar to what we did with ParlerTTS. One minor drawback of our dataset with this audio-prompting type inference is that all of our lines are split into segments shorter than those used for proompting.
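That drawback is workable, though: consecutive segments from the same episode can be concatenated until they clear the 5-second floor. A stdlib-only sketch (assumes all clips share sample rate, sample width, and channel count; paths are placeholders):

```python
import wave

def concat_until(paths: list[str], out_path: str, min_seconds: float = 5.0) -> float:
    """Append WAV clips in order until the total reaches `min_seconds`,
    then write the result. Returns the duration actually written."""
    frames, params, total = [], None, 0.0
    for p in paths:
        with wave.open(p, "rb") as w:
            if params is None:
                params = w.getparams()  # take the format from the first clip
            frames.append(w.readframes(w.getnframes()))
            total += w.getnframes() / w.getframerate()
        if total >= min_seconds:
            break
    with wave.open(out_path, "wb") as w:
        w.setparams(params)
        for chunk in frames:
            w.writeframes(chunk)
    return total
```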

Not seeing any details on finetuning though, if they're even planning to release the code for it. Looks like they have broadly the same conditioning attributes as ParlerTTS as well (speaking rate, pitch variation, audio quality), but with the addition of "emotion". I'd be interested in seeing how they generated emotion annotations (were they hand-labeled? Inferred from the text by an LLM? Or using some other system?).

>The majority of our data is English, although there are substantial amounts of Chinese, Japanese, French, Spanish, and German. While there are small portions of many other languages in our training dataset, the model's performance on these languages is not robust.
Nice.

>https://www.zyphra.com/post/beta-release-of-zonos-v0-1
There are two models, one transformer-based and one "SSM hybrid" (meaning Mamba is involved). That's cool, I guess. They don't go into detail about how well either one performs though.

Same training task as ParlerTTS. Same codec as ParlerTTS (DAC). For text, they converted the text into IPA phonemes using eSpeak NG (we really can't get away from this thing, huh?) before embedding, which is also roughly something we were trying to do.

>Namedrops ParlerTTS at the end.
Oh OK.
>>
>>41926393
The emotions are just based off reference audio. Try finding reference audio with the delivery you want.
>>
Alright faggots, I want to see what progress you're making. Let me hear some of the best clips.
>>
>>41928005
I made something and I was going to draw an animatic thing for it, but I don't really like drawing, so I might as well put it here.
https://files.catbox.moe/zvmr1d.mp3

For Pinkie I genned using GPT-SoVITS and then converted thru an RVC model trained over TITAN base on S1-3 data. I think that's basically the best output quality you can get, apart from the inflections and pronunciation being weird. For RD it's just me yelling into the mic (the TITAN base model seems to support this better than others, since it's mostly trained on emotive speaking data). I still don't like RVC's handling of speech features from dissimilar speakers.
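For anyone trying to reproduce that chain, it's just two commands run back to back; the script names and flags below are placeholders for whatever inference entry points your local GPT-SoVITS and RVC checkouts expose, not real CLIs:

```python
import subprocess

def build_stages(text: str, workdir: str = ".") -> list[list[str]]:
    """Two-stage gen: TTS first, then voice conversion over its output.
    Every script name and flag here is hypothetical; swap in your own."""
    raw = f"{workdir}/raw.wav"
    return [
        ["python", "sovits_infer.py", "--text", text, "--out", raw],
        ["python", "rvc_convert.py", "--input", raw,
         "--model", "titan_rvc.pth", "--out", f"{workdir}/final.wav"],
    ]

def run_stages(stages, runner=subprocess.run):
    for cmd in stages:
        runner(cmd, check=True)  # fail fast if either stage errors
```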
>>
File: wip stuff.png (9 KB, 213x273)
>>41928005
slowly chipping on some small projects.
>>
>>41927534
actually ElevenLabs level btw
>>
https://files.catbox.moe/e3fsbi.wav
>>
File: Base Image.png (608 KB, 1152x1384)
Koel-TTS: Enhancing LLM based Speech Generation with Preference Alignment and Classifier Free Guidance
https://arxiv.org/abs/2502.05236
>While autoregressive speech token generation models produce speech with remarkable variety and naturalness, their inherent lack of controllability often results in issues such as hallucinations and undesired vocalizations that do not conform to conditioning inputs. We introduce Koel-TTS, a suite of enhanced encoder-decoder Transformer TTS models that address these challenges by incorporating preference alignment techniques guided by automatic speech recognition and speaker verification models. Additionally, we incorporate classifier-free guidance to further improve synthesis adherence to the transcript and reference speaker audio. Our experiments demonstrate that these optimizations significantly enhance target speaker similarity, intelligibility, and naturalness of synthesized speech. Notably, Koel-TTS directly maps text and context audio to acoustic tokens, and on the aforementioned metrics, outperforms state-of-the-art TTS models, despite being trained on a significantly smaller dataset.
From Nvidia.
https://koeltts.github.io
Examples
also
>Our model implementation is publicly available in the Koel-TTS repository
Sadly a footnote states it's omitted for the blind review but presumably will be posted here
https://github.com/NVIDIA
>>
>>41929877
berry cool.
>>
Any time I try to make content, the biggest criticism I get is for using AI voices.
>>
>>41932080
Genuine criticism about the quality of said voices? Or just "u used ai so it bad"?
If the latter, you'll have to cope or quit.
>>
>>41932080
Just ignore those who get pissed by AI just for the sake of disliking it. You can't please these people.
>>
File: 1701133894157894.png (1.06 MB, 1807x1807)
>https://80.lv/articles/this-python-script-lets-you-simulate-realistic-camera-movement-in-blender/
Not really audio/image AI related; I just thought someone (other than me) could find this interesting. I would love to see this kind of code redesigned for use in pony robots, where I could just plug my phone into a small robot pony "skeleton" that sends all the info to a phone app working as the main brain (able to control movement in a drone/Roomba fashion). Such an implementation could also allow hooking the program up to an API (local PC/server or some online service) that would handle the TTS and chatting with the Anons.
>>
>Simultaneous Speech-To-Speech Translation
>https://github.com/kyutai-labs/hibiki
>https://huggingface.co/collections/kyutai/hibiki-fr-en-67a48835a3d50ee55d37c2b5
>https://huggingface.co/spaces/kyutai/hibiki-samples
One step closer to potentially listening to ponies in my own language without being forced to hear the terrible low-quality VAs that don't even try to copy the charm of the English VAs.
>>
>>>/wsg/5807172
found a pone related song on /wsg/
>>
File: 1421862643162.png (263 KB, 852x709)
>>41935570
lmao good shit
>>
>>41902960 (OP)
This general scares the fuck out of me because I don't even know what any of you fuckers are talking about. Not only that, I also have no idea what you are working on entirely. I know, inb4 lurk moar, but what the fuck are you doing exactly?
>>
>>41935607
AI voices.
But we mostly bump.
>>
>>41935611 >>41935607
Yeah, having a lot of the early AI content now split between /chug/ (for text and chatbots) and the /ai art thread/ (for all art-related posting) has really drained the userbase.
Right now we just post anything that's not AI text or AI images, so music/audio clips and animation (along with random training questions and tech news).
>>
>>41935640
>drained the userbase
Even early on this was 99% a voice focused thread. Despite inviting AI tech of all kind, voice was always the big focus. I think the novelty and broad appeal just wore off as the landscape changed.
>>
>>41935677
I think we're just still stuck in a
>Not that easy
>Not that good
meta.
If something easy to use and that sounds really good releases someday, we'll get activity again.
>>
File: 20241122_170938.jpg (21 KB, 580x778)
Vul, if you're lurking the thread, can you please make a so-vits-svc version of your song My Name Is Pinkie Pie? It's a very catchy song and I think it would benefit from a sovits update.
>>
>>41935902
It's already so-vits-svc (4.0 instead of 5.0 though).
>>
>>41935607
>scares
Why is that? It's not like someone is building a WMD in here or something.
>>
>>41936166 >>41935607
>>scares
I am also interested in what exactly did that Anon mean by that.
>>
File: 2197457.png (82 KB, 447x594)
BOO
>>
>>41902960 (OP)
Hi! I listened to some old Pinkie AI voice clips I had saved on my system and was blown away by their quality. Of course, I snagged them from /g/ when ElevenLabs was allowing people to clone voices for free. I read the quickstart guide but can't figure out what's the best model currently for lifelike voices. Can anyone help me out?
>>
>>41937398
If you just want to type words and receive voice, GPT-SoVITS is the only model we're currently using for that.
Example:
https://files.catbox.moe/6v27zm.mp3
This general tends not to pay attention to closed source/pay-to-use models.
>>
>>41905493
Supposing that we wanted to train a pony LoRA for Hunyuan, the dataset would have to use video clips from the show, right? If we only trained on still images, we couldn't trust the model to know how pony characters move. Would it be worthwhile to start assembling a dataset of captioned video clips now, in anticipation of getting a training capability later?
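Starting that collection early mostly means agreeing on a layout. One possible shape, sketched here (the JSONL-manifest idea and all field names are my own convention, not anything Hunyuan prescribes): one manifest row per clip, plus the stream-copy ffmpeg cut a training script could run later:

```python
import json

def manifest_row(episode: str, start: float, duration: float, caption: str) -> str:
    """One JSONL line describing a captioned clip of the show."""
    return json.dumps({"episode": episode, "start": start,
                       "duration": duration, "caption": caption})

def cut_cmd(episode: str, start: float, duration: float, out_path: str) -> list[str]:
    """ffmpeg cut without re-encoding: -ss before -i seeks fast,
    -t caps the duration, -c copy keeps the original streams."""
    return ["ffmpeg", "-ss", str(start), "-i", episode,
            "-t", str(duration), "-c", "copy", out_path]
```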
>>
>>41937475
Have you tried this out?
https://github.com/Zyphra/Zonos/

https://vocaroo.com/1doCi3USRemc
https://vocaroo.com/1oEAxX5xMAuD
https://vocaroo.com/11RCBwpM3NjL
https://files.catbox.moe/u922as.mp4
>>
File: ScreenShot0p.png (3.69 MB, 1920x1080)
>>41936166
Wild Mare D**kings you say?
<spoiler>It's being worked on. It's just that the Ponk keeps getting arrested when she tries to initiate a party in Whiterun with our bodies. It would be wise to heed Heimskr's words well!</spoiler>

>>41917204
Last month /trash/malp/72745523 casually dropped: Anthro-based hypnosis audio featuring Rarity, RD and Chrysalis
https://files.catbox.moe/wijoyv.mp3
Which I thought was an impressive AI voice gen using tech from here, though /aco/hyp/8704096 suggests it was from an actual British chick speaker.
>>
File: _FIN_03b.png (23 KB, 848x480)
https://www.youtube.com/watch?v=8ehuIa-JGWI
New pony content from yours truly; hopefully more will be made soon'ish. GPT-SoVITS is surprisingly not as much of a pain in the ass to work with.
>>
>https://github.com/vosen/ZLUDA
Anybody have any experience with ZLUDA? Apparently it's supposed to be able to "easily" allow CUDA-dependent code to run on AMD cards.
>>
I know we are slow, but I find the board kind of fast too recently
>>
>>41943292
My autism is telling me it's some outsiders trying to flood it with spam of [random image from booru] + [2~5 word sentence that's half shitpost]. I see too many of these threads getting no replies other than bumps from page 9. They are clearly posted in an array, spread out about 5 threads (give or take) half an hour apart, especially during work hours.
Years ago it was non-stop EQG spam, then that stopped and got replaced with G5 spam, and now that's been replaced with pony /s4s/ spam (except less fun and more brainrot/botnet in nature).
>>
>>41940497
Funny, have a (you).
>>
venv pytorch zonos when
>>
File: Spin.gif (1.66 MB, 1299x1666)
>>41943292
I think it's residual excitement from the 3rd star. The board always speeds up for a little after the anni is over, and this was an even bigger happening. Hell, the flag is changing for the first time in 9 years!

>>41943345
It's been going for like a decade. That's just how some people seem to engage with the site, it seems.
>>
Idea:
Can't we use torchmoji (or something newer) to generate style vectors that would work with Styletts2? We would blend the generated style vector with a precomputed style to generate expressive TTS without ref audio.

We would use another input to generate the style vector that will be blended with the precomputed style vector.

>Is it a good idea PPP? I mean we can use hifigan and AudioSR afterwards to make the quality better.
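The blending step itself is just linear interpolation between the torchmoji-derived vector and the precomputed speaker style; everything below is a plain-Python stand-in (alpha and the toy vectors are made-up values, not tuned ones):

```python
def blend_styles(predicted, precomputed, alpha=0.5):
    """Linear interpolation of two style vectors: `alpha` weights the
    emotion-predicted vector, (1 - alpha) the precomputed speaker style."""
    if len(predicted) != len(precomputed):
        raise ValueError("style vectors must have the same dimension")
    return [alpha * p + (1.0 - alpha) * q
            for p, q in zip(predicted, precomputed)]

# blend_styles([1.0, 0.0], [0.0, 1.0], alpha=0.5) -> [0.5, 0.5]
```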
>>
>>41944942
>big one
Not a real of this design. Keep them uniform in their size.
>>
>>41945409
I agree, but this one came with a funni gif.
>>
>>41945158
Having TTS with automatic emotion control already built in would be nice, but it would require someone to build it from the ground up, since it would be 100% custom compared to what exists out there.
>>
>>41902960 (OP)
>>41947288 →
>>
Bump.
>>
File: plasmagun.jpg (69 KB, 1856x687)
some makeshift /k/ommando stuff for (You) guise

key
blue (except beam from laser diode on left) = camera sensor (as found in cameras)
orange = plasmonic interface (unless otherwise indicated); that would mean a metal or metalloid nanolayer that is very thin, coupled with a special fluid and/or membrane.
green = dummy battery (makeshift capacitor and/or ballast); this is used to stop the diode from burning out if the camera sensor(s) detect too much light.
light blue = wiring

very important: camera sensors must be CCD or CMOS, not a "vidicon" (or film, for that matter). This is very unlikely to be a problem for most of you, but I figured I'd mention it anyways.
>>
File: Spoiler Image (1.68 MB, 612x661)
Semi-auto fire version will be posted soon.
>>
*Almost forgot: the * on the picture is there to remind you
that you have to find a way to keep the laser (the backside one) attached to the plasma gun.
>>
>>41949552
>>
I decided to make my own content using Ngrok models: I redubbed an old GoAnimate vid, as I have 0 ideas.
>https://files.catbox.moe/guu4vr.mp4
>>
>https://www.youtube.com/watch?v=yJqSSE9b90o [Embed]
Not very pony-related, but I found this lecture/documentary pretty inspirational in wanting to get more creative.
>>
>>41951741
Thanks, Anon. I'm going to watch the video.
>>
>provisional bump
>>
File: FleurSurprised.png (132 KB, 900x952)
132 KB
132 KB PNG
>>41934240
>he doesn't know
>>
>>41953087
uhoh, are the devs from Sweetiebot gone?
>>
>>41952840
>>
>>41954847
>>
https://voca.ro/1gv80u22ocVK

I finally have the demo of the Spanish dub of the Border Patrol Song by Vul.
I find it difficult to recreate the vocal effects on the chorus. I think he shifted the pitch when RD sings "Call in the border patrol" in the chorus section to make it sound like that, but when I try to add another vocal layer an octave higher, it sounds weird.

I'll keep tweaking it for a bit, but here it is
>>
>>41955942
It seems that so-vits-svc 4 is better for singing in other languages, because although SoVITS 5 sounded more like her, the accent was not correct and sounded like an American trying to speak Spanish.

Also, now that I listen to it again, the delay is shit, fuck. I tried to fix it: https://voca.ro/19EyPT2rufr9
>>
File: error.png (140 KB, 958x748)
140 KB
140 KB PNG
any idea why I'm getting this error on Haysay? I'm trying to use the Limestone RVC model and I've been having this issue since yesterday
>>
>>41955961
Sometimes HaySay derps out on its own; try restarting your browser and waiting 5 minutes. If it's still not working, I guess you can shout into the void and hope HydrusBeta will hear you.
>>
File: 1548681226623.jpg (663 KB, 1200x900)
663 KB
663 KB JPG
>>
>>41956496
that's how the mafia work
>>
File: images-1.jpg (10 KB, 300x168)
10 KB
10 KB JPG
>>41912047
that's cool!
>>
>>41956496
Can someone translate that from noodles to english?
>>
>>41957390
Left
>"...Hug me."
>How much excitement can be caused with just the word 'hug'?
Right
>"Come here..."
>How much can the heart be stirred with just the words 'come here'?
>>
File: 420.png (14 KB, 94x98)
14 KB
14 KB PNG
>>41957390
>>41957409
>
>>
^_^
>>
https://www.youtube.com/watch?v=VIKYDJtG0xY [Embed]
>>
There we go.
>>
High-Fidelity Music Vocoder using Neural Audio Codecs
https://arxiv.org/abs/2502.12759
>While neural vocoders have made significant progress in high-fidelity speech synthesis, their application on polyphonic music has remained underexplored. In this work, we propose DisCoder, a neural vocoder that leverages a generative adversarial encoder-decoder architecture informed by a neural audio codec to reconstruct high-fidelity 44.1 kHz audio from mel spectrograms. Our approach first transforms the mel spectrogram into a lower-dimensional representation aligned with the Descript Audio Codec (DAC) latent space before reconstructing it to an audio signal using a fine-tuned DAC decoder. DisCoder achieves state-of-the-art performance in music synthesis on several objective metrics and in a MUSHRA listening study. Our approach also shows competitive performance in speech synthesis, highlighting its potential as a universal vocoder.
https://lucala.github.io/discoder
Examples
https://github.com/ETH-DISCO/discoder
https://huggingface.co/disco-eth/discoder
Speech synthesis examples sounded pretty good
>>
>>41957409
Ah, thanks.
>>
>>41957895
>listen to example audio
>all of them sound the same
uhoh, I feel like this tech wasn't meant to be used with cheap $10 headphones.
>>
File: AJwatchingthecountdown.gif (1.83 MB, 685x516)
1.83 MB
1.83 MB GIF
>>41957895
Those do sound pretty good, I only listened to the first two, though.
>>
>>41903371
AI went from being perceived as incredibly cool technology that could open up the possibility of anons creating an actual broadcast quality pony show in the near future to simply
>ai slop
The entire board did a complete 180° on the subject.
>>
>>41959818
Not everypony, just the EQG cancer.
>>
File: 481999.gif (975 KB, 320x180)
975 KB
975 KB GIF
>41959907
>>
>>41959818
That's not true. It's only one or a few shitposters who are so obnoxiously vocal against AI. Don't fall for the forced meme.
>>
Up.
>>
Does anyone have a backup or another source of the RVC model for Limestone Pie? The original uploader's account was apparently taken down recently, and I need the model for some lines for something I'm working on.
https://huggingface.co/KenDoStudio/MLP_Limestone_Pie/resolve/main/MLP_Limestone.zip
returns 404
>>
File: Limestone 1637945843169.jpg (840 KB, 3000x4000)
840 KB
840 KB JPG
>>41961247
>https://files.catbox.moe/xmw9hf.index
>added_IVF160_Flat_nprobe_1_MLP_Limestone_Pie_v2
>https://files.catbox.moe/kg7bgr.pth
>MLP_Limestone_Pie_e360_s2160
Catbox is being derpy so I had to upload these files separately. Just put them together as an "MLP_Limestone" folder.
>>
>>41961247
https://www.youtube.com/watch?v=L_DowKGgeqQ [Embed]
>>
>>41961364
based lad, I kneel
>>
File: 2144262.png (326 KB, 1730x1812)
326 KB
326 KB PNG
>>41902960 (OP)
https://files.catbox.moe/3xh4wj.mp3

Trixie GPT-SoVITS model:
https://drive.google.com/file/d/1RmagYV16wOwSdK3OQqL3qAeqt2MtAZtV/view?usp=sharing
Generated in one take, not sure why it pauses like that at the start.
>>
File: file.png (159 KB, 1591x1026)
159 KB
159 KB PNG
>>41961428
ref audio config I used.
>>
>>41961434
you have a link to the gui you're using?
>>
>>41961523
https://drive.google.com/file/d/1EljbxeUckYATH269utj7q1T-8oKcPhte/view?usp=sharing
>>
File: OIG1.L6AD3Jn6.MlHKjbsZ5dJ.jpg (176 KB, 1024x1024)
176 KB
176 KB JPG
>>
Hey HydrusBeta, I've encountered an error while running RVC on haysay. It seems to affect all RVC models. Here's the error message:
https://ponepaste.org/10769

I've looked through the archives and found that it's the same error message this anon had last year (except for like the output file name and session id and these things):
https://desuarchive.org/mlp/thread/41064811/#q41134933
And I noticed this anon's >>41956173 error is also the same.
I don't know if that necessarily means the problem is the same though.

sovits5 and other architectures seem to be working fine, and RVC was still working for me I think on Saturday while working on a song.
>>
I'm currently working on training some models with this: https://arxiv.org/abs/2203.16852
>I'm using ESPnet's repo to do this.
>Oh yeah, it will use GSTs (global style tokens)

>I'll update you anons when I make any progress!
>>
>>41963635
I love these.
>>
>>41964027
I've changed the architecture to be easier to use.
I'm using a custom GlowTTS implementation for this.
>>
>>41962379
thanks. how did you find it?
>>
I really wanna know what settings Vultraz finetuned his GPT Sovits V2 models with... I tried finetuning some models myself but got stuttery/glitchy results.
>>
>>41965007
And I based my training settings on the Rentry guide, to clarify. Something tells me those settings are ass...
>>
File: Twilight_Sparkle.png (268 KB, 1600x1218)
268 KB
268 KB PNG
Hey, so I've thought about it, and I wanna make Diffsinger voicebanks for the Mane 6. Maybe Starlight too, but I'm not sure she has enough data? Anyway, I'm gonna start with Twilight, and since she has 24 minutes worth of data, she'll be pretty good, I hope. Will keep you guys posted! Oh, and she'll be able to sing in both English and Japanese. You know, for weeb songs I guess.
>>
OH SHIT ALSO. Same Anon here, but I'm only using singing data, as that's optimal for a Diffsinger voicebank.
>>
I bring forth good news! Twilight's diffsinger model is done! She turned out a lot better than I thought she would, and I'm really excited to share her!

So Twilight can sing in both English and Japanese, but I will say now that her pronunciation for Japanese isn't fluent. Like, at all. But I had to make do with what I had.

Twilight can sing English through the DIFFS EN phonemizer, and Japanese through the DIFFS JA phonemizer.

Now... how do you use her? Firstly, download her through the mega link, and extract the folder within the zip file to the Singers folder of Openutau.

Make sure you're using this version of Openutau or you'll run into problems:

https://github.com/stakira/OpenUtau/releases/tag/0.1.547

Then, open Openutau, and you should be able to pick Twilight from the Diffsinger category.

Remember to pick either DIFFS EN or DIFFS JA as the phonemizer, depending on the language you want her to sing in.

She also comes with her own pitch model, which isn't perfect, but it does well enough in most cases. To use her pitch model on notes, select the notes you want to add the pitch model to, then select "Batch Edits", then "Notes", then "Load rendered pitch". It'll take a bit for the pitch model to do its thing, so be patient. Below are two samples, one for English and one for Japanese, along with the download link to Twilight's model. If you need any help, I'll be happy to assist. Enjoy her!

https://voca.ro/1d66OP1r4JdU

https://voca.ro/1fR5svLE6CM0

https://mega.nz/file/PxQwwZDI#MapWwmvidrW7KMI0O-By147ounGd8ICldW9jdEs-dMw

By the way, by default, a transparent image of Twilight will be on the track window for Openutau. If you want the image to go away, just go to "View", then uncheck "Show portrait on piano roll".
>>
>>41966674
>OpenUTAU
I wasn't aware this was a thing. Thank you for making the audio diffusion models for us, though.
>>
>>41956173
>>41963918
Shoot, sorry about that. It should be working now.
There's a weird bug in RVC that I don't fully understand where some config files sporadically get wiped. I have an automated fix in place on haysay.ai; a script checks the config files every minute and replaces them if they are empty. I must have accidentally removed that script during the last deployment when I recreated the Docker containers. I should really make that script part of the Docker image so that doesn't happen (or just find the root cause and fix the darn thing). I copied the script back into the container just now, manually executed it, and verified that RVC produces output.
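The watchdog fix Hydrus describes (restore a config whenever it gets wiped) could be sketched roughly like this; this is not the actual haysay.ai script, and the paths are made up for illustration:

```python
import os
import shutil

def restore_if_empty(config_path, backup_path):
    """Replace config_path with a known-good backup if it is missing or
    has been wiped to zero bytes. Returns True if a restore happened."""
    if not os.path.exists(config_path) or os.path.getsize(config_path) == 0:
        shutil.copyfile(backup_path, config_path)
        return True
    return False

# A cron entry (or a loop with time.sleep(60)) would call this every minute,
# e.g. restore_if_empty("/rvc/configs/48k.json", "/backups/48k.json")
```

Baking a check like this into the Docker image (or fixing the root cause in RVC) would survive redeployments, as noted above.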
>>
>>41962379
Is there a lighter one?
>>
>>41963635
>>
File: One Mare Magic Show.png (1.16 MB, 1080x1080)
1.16 MB
1.16 MB PNG
I made a cover of a Lemon Demon song with ponified lyrics.
https://youtu.be/XC_hM9-LWBE [Embed]
>>
>>41968262
Awesome.
/)
>>
File: spike thumb up.png (425 KB, 957x538)
425 KB
425 KB PNG
>>41968262
nice!
>>
Bump.
>>
>>41968262
Neat
>>
>>41967142
Quick question:
>I know that synthapp.haysay.ai exists, and it's based on Tacotron2.
- But why is it no longer working?
- And how would I hook Torchmoji up to TT2?

>That's it really
>>
I found these files in the GPT-SoVITS GUI.

>I found these folders in _internal
>Should I continue trusting this GUI?
>>
>>41969361
I'm talking abt this one: >>41962379
>>
>>41969361
Anon, are you confusing cryptography with cryptocurrency?
>>
>>41969494
Samefag here.
>>41969361
You do raise a valid concern.

I am not a fan of Anons uploading complete programs + models as an archive, as convenient as it may be.
It would be easy for a malicious user to add malware to an executable or library file.

Instead, post instructions on how to set it up using the official sources, or give the source link of the program binary if you're packaging it together.
That way other Anons can confirm that there hasn't been any tampering.
>>
>>41969494
I think I got confused here, sorry man.
>English ain't my first language. I'm trying to improve tho.
>>
>>41969548
>cryptography means the process of hiding or coding information so that only the person a message was intended for can read it.
>>
I'm here again. Rainbow Dash's Diffsinger model is done. She's a little rougher than Twilight for some reason, but she still works. I also included a custom vocoder in her files to stabilize her a bit. Same thing as before with Twilight: DIFFS EN for English lyrics and DIFFS JA for Japanese lyrics, though to be honest, Rainbow Dash fucking sucks at Japanese. I tried. Below will be a demo of her singing and the link to the model.

https://voca.ro/1nch9ipBKHNh

https://mega.nz/file/6twDnRLR#HX9cPpsF8eCx79MQ6QkdtkChtTVsqCHjsurLune-lGc
>>
>>41970102
That's quite alright :P
>>
>>41970116
Nice work, Anon. Added it to the Tools/Other section in the Google doc.
>>
>>41969361
No. Use Haysay/whatever GUI we made here for it or the actual official GPT-SoVITS GUI. Never use some random download.
>>
File: AJStanding.png (381 KB, 1024x1160)
381 KB
381 KB PNG
I'm back again, with Applejack. She sounds pretty nice, if I do say so myself. Not perfect, but still. Like Rainbow Dash, Applejack has a custom vocoder that stabilizes her. Below will be the usual, a link to an audio demo of her, along with the link to the model itself. Oh, and for some reason, she has the best range out of the three I've made so far.

https://voca.ro/16EvWgYjJ5d6

https://mega.nz/file/mwBUSbLL#9RUdrGv4KhvAoT7NVtsyz7DMYmeWM6mPe3PoUFOdnF8
>>
>>41971857
neato.
>>
Okay... so. Who should I do next? Pinkie Pie, Rarity, or Fluttershy. I'm a bit nervous about doing Fluttershy considering how little data she has in terms of singing seemingly, but I think I could still make it work. What do you guys think? Who should I do next?
>>
>>41972571
I would imagine Pinkie to be the more difficult one, since her vocals are a bit all over the place.
>>
>>41972587

Oof, yeah. Keep in mind I'm only including singing data, but even then... it's still very all over the place, isn't it? You know what, fuck it, let's try Pinkie. I'll be back with hopefully a good model of her.
>>
File: 1723112912453208.mp4 (43 KB, 1280x902)
43 KB
43 KB MP4
>>41903371
15 dipped out of everything but TF2, the rest of the voice team seemingly died off to the point /mlp/con doesn't get yuge panels from them anymore, haysay's adding new stuff but doesn't get talked about, and it seems like every other aspect of developing tech here has either completely died off (animation suites and such) or turned into their own generals where slopanons shart out whatever slop they've generated, with the development of tools not even being a footnote.
>>
>>41973209
>Haysay
I guess that's the most prominent anchor we still have around.
>>
File: 2025 ppp.png (13 KB, 298x395)
13 KB
13 KB PNG
>LiuZH-19 - SongGen
>https://liuzh-19.github.io/SongGen/
>https://github.com/LiuZH-19/SongGen
New paper with some samples on text-to-song, no model released as of yet.
>>41973209
Last year there wasn't that much stuff happening, but now we are actually getting innovation to talk about.
>>
File: PinkieStanding.png (142 KB, 1097x1024)
142 KB
142 KB PNG
>>41972614

So Pinkie's Diffsinger model turned out pretty okay! I didn't know how she'd handle it, but she pulled through just fine. You know the drill, demo of her singing and the link to Pinkie's diffsinger model.

https://voca.ro/14BCxfBhikht

https://mega.nz/file/fhw2zQhb#qk3r7GccCEmDtTiOCbdJKx4CTQiwLXl10UD5jb6sIhs
>>
File: 1731034002623365.png (758 KB, 938x792)
758 KB
758 KB PNG
Apparently GPT-SoVITS v3 dropped two weeks ago. Hasn't been advertised.
>he changed a lot of things like the vocoder and added lora support
>I tried gptsovits v3 and unfortunately the audio quality is considerably worse compared to v2. The sample rate was dialed down to 24khz from 32khz (so was the bitrate) and it doesn't sound as clear anymore like v2 used to be. Maybe it's more stable overall but I stopped testing quickly since the muddy audio ruins it for me.
>I noticed he moved away from a GAN in favor of a diffusion model, who knows why.
I, for one, sure as hell won't be bothering with it considering the audio quality drop. Do these retards not realize just how important audio quality is for AI voices?
Guess it wasn't advertised for a reason.
>>
>>41973892
>make update
>its actually downgrade
goddammit. I can understand companies doing that due to retarded orders from upper managers implementing shit for the sake of it, but for a free passion project it just doesn't make sense to me.
>>
>>41968245
>>
File: Base Image.png (676 KB, 1080x1952)
676 KB
676 KB PNG
Slamming: Training a Speech Language Model on One GPU in a Day
https://arxiv.org/abs/2502.15814
>We introduce Slam, a recipe for training high-quality Speech Language Models (SLMs) on a single academic GPU in 24 hours. We do so through empirical analysis of model initialisation and architecture, synthetic training data, preference optimisation with synthetic data and tweaking all other components. We empirically demonstrate that this training recipe also scales well with more compute getting results on par with leading SLMs in a fraction of the compute cost. We hope these insights will make SLM training and research more accessible. In the context of SLM scaling laws, our results far outperform predicted compute optimal performance, giving an optimistic view to SLM feasibility.
https://pages.cs.huji.ac.il/adiyoss-lab/slamming/
https://github.com/slp-rl/slamkit
https://huggingface.co/collections/slprl/slam-67b58a61b57083505c8876b2
Recipe to convert a strong small LLM (Qwen2.5-0.5B) into a SLM capable of speech tasks. Found good results using synthetic training data and including a DPO step (30 minutes out of 24 hours).
>>
>>41974803
Interesting. Hopefully the scaling down of the training process will continue, since Nvidia keeps their good cards (aka the only useful ones) locked at way too high a price.
>>
>>41974163
Depends on whether it's a technical reason, e.g. hardware somehow can't handle the new model yet at the higher (still low and shitty) sample rate, or some dumb bullshit like "we need to keep the quality poor to not step on the toes of le real VAs".
>>
>>41974613
>>
Late night bump.
>>
You guys call this singing?
You guys are like a little babby. Listen to these:

https://www.youtube.com/shorts/udOgG0M8pVI
https://www.youtube.com/watch?v=LxDRGRdRxCM&ab_channel=A [Embed]
https://www.youtube.com/watch?v=qu5nnMOQ4VU&ab_channel=A [Embed]

Every AI imaginable has perfected Tara Strong/Rebecca Shoichet singing meanwhile you faggots are still here trying to reinvent the wheel while 15.AI fucked off cause he couldn't fix the robot voice.
>>
>>41977089
What actually causes that distortion anyways? It feels like all the commercial models I hear are converging towards the same issue.
>>
>>41969214
Hi. Sorry for the delayed reply

>synthapp.haysay.ai
I visited the page and found that the character selection dropdown list was empty. Is that what you were seeing too? I restarted the container for synthapp and I got it to generate something afterwards, so that seems to have fixed it. Synthapp on the Hay Say site has always been a bit fickle, though. One thing I've noticed a lot is that the audio element sometimes does not show after it finishes generating; I think there's some timing issue related to network delay. I never get that behavior on my local desktop. Haven't taken the time to look into that yet.

>Torchmoji to TT2
I assume "TT2" is Tacotron2? Sorry, I can't help much there since I don't know a lot about Tacotron. I don't know whether it supports emoji inputs or anything similar.
While working on Hay Say's GPT So-Vits UI, I thought about using something like torchmoji to try to guess the emotion from the text and then select the nearest precomputed style. One theoretical issue I ran into is that a given written sentence can be spoken different ways with entirely different emotions, so trying to guess the emotion that the prompter intended would be unreliable. I also wasn't sure how to go about blending precomputed styles together, so I settled on letting the prompter select a single emotion from a dropdown. With a better understanding of how precomputed styles can be blended, perhaps it would be possible, in theory, to control emotion based on emoji inputs or by selecting multiple emotions from a dropdown. This would take some research.
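The "guess the emotion, then pick the nearest precomputed style" idea could look something like this sketch; the emotion vectors and style table are invented for illustration, and the real scores would come from torchmoji:

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def nearest_style(emotion_scores, precomputed_styles):
    """Pick the precomputed style whose emotion vector is closest
    (by cosine similarity) to the predicted emotion scores."""
    return max(precomputed_styles,
               key=lambda name: cosine(emotion_scores, precomputed_styles[name]))

# Hypothetical 3-d emotion space: (joy, anger, sadness)
styles = {"happy": [1.0, 0.0, 0.0],
          "angry": [0.0, 1.0, 0.0],
          "sad":   [0.0, 0.0, 1.0]}
print(nearest_style([0.9, 0.1, 0.2], styles))  # happy
```

As Hydrus says, the hard part isn't the lookup, it's that the same sentence can legitimately map to several emotions, so a hard nearest-neighbor pick will sometimes guess wrong.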
>>
>>41976828
Early morning one.
>>
>>41977179
>One theoretical issue I ran into is that a given written sentence can be spoken different ways with entirely different emotions, so trying to guess the emotion that the prompter intended would be unreliable
This was already solved by both Cookie and 15 by splitting the TTS text and emotion text inputs, so it would be possible to try to find the 'magic word' that gives the result closest to what the user wants the output to sound like (like the different tiers of anger between "I hate you" and "I will fucking kill you").
Sadly, to do it there would need to be a separate emotion TTS model/LoRA trained on the whole MLP dataset + whatever emotion token technique, since the base model will obviously not have this data inside, AND 30s of audio will not be enough to train a model to have more than Neutral + one more emotion in its dataset.
>>
>>41976591
>>
>loving ai mares
>>
new TTS
https://x.com/hume_ai/status/1894833497824481593
not local or open source but
https://www.hume.ai/pricing
free mode includes
>10,000 characters of text to speech per month (~10 minutes)
>Unlimited custom voices
>>
>>41979149

No news about Fluttershy or Rarity yet, but I just wanted to pop in to say they're still trying to figure out the custom voice situation.
>>
Precautionary late night bump.
>>
>>41978780
>loving mares
Yes.
>>
>pone
>>
File: 1736645036477361.png (5 KB, 135x43)
5 KB
5 KB PNG
>>41977089
Sounds like shit. Looks like shit.
>>
>>41978144
>>
>>41982207
>>
>>41982673
>>
File: catbox_gnjhmn.png (3.78 MB, 1920x1080)
3.78 MB
3.78 MB PNG
>>41961428
>>41961434
>>41962379
https://files.catbox.moe/dakrgq.mp3
You should have acted. She's already here. The Flyers told of her return. Her defeat was merely delay. Til the time after the Moon waned, when the Elements of Harmony would lose their gems. But no-one wanted to believe. Believe she even existed. And when the truth finally dawns: It dawns in hay. But there's one they desire. In their tongue, she's the Great and Powerful: Trixie!

The Repetition feature was a great help to get that TTrill at the end.
>>
File: OpenUTAUp.png (104 KB, 607x875)
104 KB
104 KB PNG
>>41973530
Like >>41966738, wasn't aware nor have much experience with singing synthesis software. Any chance you could post some accompanying ustx files to help >>41902960 (OP) get started on syllable formatting and phonemes?
>>
File: celestai_skyrim_chaos.png (247 KB, 1920x975)
247 KB
247 KB PNG
$ curl https://files.catbox.moe/43h6ub.py
Simulation of CelestAI's "Optimal Harmony" Skyrim Mod
-----------------------------------------------------
This script models chaotic agents (Sheogorath, Discord, Q) in Skyrim and harmonizes Pinkie Pie (PP) anti-phase agents
that counteract chaos during critical quiet windows. The code uses wave superposition, FFT analysis, and
tonal mechanics to simulate CHIM-like player sovereignty and MLP-themed harmony.

$ python celestai_skyrim_chaos.py
Spawning Chaos Agent (0, 'Sheogorath') Strength=2, Position=(-2, 7)
Spawning Chaos Agent (1, 'Discord') Strength=-0.1293121990405771, Position=(0.6384672928415652, -0.7259077362746741)
Spawning Chaos Agent (2, 'Q') Strength=10, Position=(-100000, 100000)
CelestAI Optimally Harmonizing Chaos (COHC) at (0, 0) for 10 seconds until discovery of 4 Pinkies...
4 Pinkie Pie(s) Found.
PP Agent 1 Harmonizing at Time=1.7, Position=(-23.8, 6.8)
PP Agent 2 Harmonizing at Time=5.3, Position=(-74.2, 21.2)
PP Agent 3 Harmonizing at Time=6.1, Position=(-85.4, 24.4)
PP Agent 4 Harmonizing at Time=7.5, Position=(-105.0, 30.0)
>>
File: not_like_us.png (572 KB, 720x720)
572 KB
572 KB PNG
https://files.catbox.moe/liy8y9.mp3
>>
>>41983523
Shit like this makes me wish pony modding was still as alive as it was in 2012; we could have ended up with almost all the background ponies as companions.
>>41983683
Anon, I'm not sure what this graph is about, but I think you should stop before something bad happens.
>>41983734
muh waifu is such gangsta
>>
>>41973892
GPT-SoVITS v3 was officially released just now, maybe he fixed some things?
>>
>>41983156
>>
>>41983618

https://files.catbox.moe/37qv6h.ustx

https://files.catbox.moe/06cfjt.ustx

https://files.catbox.moe/oy1aup.ustx

Here's a couple. The first one has two tracks. One for English and another for Japanese. The second one is all English, and the third one is all Japanese.
>>
>>41983927
Maybe. Only feedback I've seen so far though:
> I tried to use the GPT part of GPT-sovits v3, but even that is worse than v2. Weird pronunciation (british accent), the generation is also consistently shorter than v2. What a letdown
>>
File: 2025_02_26_0yn_Kleki.png (731 KB, 1557x1177)
731 KB
731 KB PNG
https://files.catbox.moe/ixesxk.mp3
(with music)
https://files.catbox.moe/a3a38l.mp3
(vocals only)

been trying for about ten hours to make a Twilight cover of Rotten Girl, with this being the cleanest I could get. Is there any way to clean up vocals to be better for AI to pick up once you have separated them from a song? A filter or a site I should run them through, maybe?
>>
>>41985411
Huh, I'm guessing you got inspired by my ponified Miku AI art? Nice to see there's still some cross-pollination of ideas between Anons from different threads.
Due to the high pitch I had a difficult time telling this was a Twilight voice cover, but then again the OG song has some funny vocal filters going on, so the only way I could see this getting done is to use covers sung by other sources OR to somehow get/create files to redo the whole song in Vocaloid/UTAU without the filters.
>>
Shut up and start listening & using quality voice AI:

Here is your fucking singing Twilight Sparkle:
https://www.youtube.com/shorts/udOgG0M8pVI
>>
>>41985624
Look, I get this is 4chan and more stuff is okay to say than on like Discord but the fact of the matter is that this way just doesn't work for everyone. If it works for you, great, but don't push it on everyone.
>>
>>41985569
I guess it's just one of those songs I would either need to dump 40 dollars into having someone on Fiverr cover, or remake from scratch. Ah, damn. Also, I'm the one who drew the pic, so not quite. I was really hoping that after tossing 12 dollars at cover.ai it would be able to do it. Oh well, maybe I can cover some other songs using the site if anyone has any recommendations.
>>
>>41986034
>cover.ai
>Invisible gun safety protection
Anon, I don't think you posted the link you wanted to post.
While most of the song-related models can work on an 8GB potato GPU (and if you don't have one, HaySay is always a free alternative), recreating the vocals from scratch will always sound weird unless you happen to have really amazing vocals and singing skills.
As for using Fiverr/commissions, well, that's up to your own judgement and wallet, but be careful, since you can spend a lot of cash and still get something that's not usable/satisfactory.
>>
Up.
>>
>>41986926
my bad, meant covers.ai forgot the s

As for HaySay, yeah, I've tried it, and I might be able to get it to work with my 8GB shit PC. I have tried ElevenLabs, this new site that I pumped 12 bucks into, and even tried singing it myself. I think the song is too noisy to work with.
>>
>>41987603

https://files.catbox.moe/18u8s3.mp3

Twilight cover of "The Times They Are a-Changin'" using this site
>>
File: HMG.jpg (1.29 MB, 1684x2500)
1.29 MB
1.29 MB JPG
Clipper, random question: might you possibly be attending Griffish Isles this May in Manchester?
>>
>>41987732
This sounds better and closer to Twi than the previous song, but it still feels like it was done with a fan VA rather than the actual Twi voice.
>>
>>41989089
I think the pitch should have been raised by a couple more tones to get it closer to Twilight's vocal range.
>>
>>41989089
>>41989935
yeah, I should have gone back and raised it up just a little bit. Any song recs I can try to do with Twilight?
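For reference, "a couple more tones" maps to a frequency ratio: each semitone multiplies the fundamental by 2^(1/12), so a shift of n semitones is 2^(n/12). A quick sketch (the 220 Hz figure is just an example, not the actual vocal range of either singer):

```python
def semitone_ratio(n):
    """Frequency multiplier for a shift of n semitones (n may be negative)."""
    return 2 ** (n / 12)

# Shifting a 220 Hz vocal up by 4 semitones (two whole tones):
f0 = 220.0
print(round(f0 * semitone_ratio(4), 1))  # 277.2 (Hz)
```

Most pitch-shift tools take the shift directly in semitones, so the ratio is mainly useful for sanity-checking how far a transpose actually moves the voice.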
>>
Does anyone remember where this is from?
https://files.catbox.moe/sgdtlm.mp3
File cuts off, looking for the sauce.
>>
>>41990329
Definitely a Vul song, don't know which one, sorry.
https://www.youtube.com/@vul5925/videos
>>
>>41990433
Thanks!
It was "Trouble With Trixie" from the "Marez with Attitude" album.
https://youtu.be/iSZhUKejc3Y?t=1850 [Embed]
>>
is there anymore lewd immersive audio, maybe with twilight or rarity?
>>
>>41990452
Love Vul's work. His albums are generally high quality.
>>
>>41990077
>https://files.catbox.moe/26txkf.zip
Can you use the above clips of Vinyl Scratch for it and make a cover of some ska-punk song with her voice? Not sure how that website works; does it accept audio clips only, or would you need a transcription of the clips as well?
>>
>>41991453
I just need 3 minutes of speech audio. All I did to make the Twilight model for the site was take about 70 clips of Twilight speaking from the mega and stitch them together into one long yap file in Audacity. I will try to make your request but don't have high hopes.
>>
>>41991453
Just send me a link to the song you want with it, though; I've never listened to ska-punk in my life.
>>
>>41988774
Not planning to, though I'll be at Babs and Mare Fair. None of the British cons ever felt all that appealing to me. You can email me if you want - clipper.anon01@gmail.com
>>
>>41991453
>Vinyl Scratch
Now that really feels like resurrecting the dead.
>>
Up.
>>
>>41993857
>>
File: spy 1692257067057403.png (332 KB, 996x1019)
332 KB
332 KB PNG
>https://huggingface.co/Amo/GPT-SoVITS-v2/tree/main/TF2_Spy_so32_gpt48
>https://voca.ro/1j3DnjafttW8
>https://voca.ro/16jsKITi6mWc
GPT-SoVITS TF2 Spy model that I needed for an idea, so I thought I may as well share it. The trained model generates audio quality ranging from "this is passable" to "this shit is so ass"; I would recommend running all the outputs through RVC again.
>>
>>41996681
>gpt48
>so32
Anon, I...
>>
File: anonfilly tired.png (607 KB, 2000x1566)
607 KB
607 KB PNG
>>41996697
eeeeeh, there aren't really any instructions on which options one should pick for the SoVITS and GPT models. I'll be happy to retrain it if someone tells me which settings I should change for the best result.
>>
>>41996706
Sovits epoch 96, GPT 24, also check the DPO.
>>
>>41996714
uhh, looks like I've nuked the training files. Starting from zero; hopefully the new version will be ready before midnight.
>>
>mare
>>
File: tf2 1640046946348.jpg (207 KB, 820x608)
207 KB
207 KB JPG
>>41996714 >>41996681
>https://huggingface.co/Amo/GPT-SoVITS-v2/tree/main/TF2_Spy_so96_gpt24
>https://voca.ro/17RpGyoh0moh
>https://voca.ro/1lmYZ3WcHFer
While still not perfect, these results are much much better than the previous model.
>>
>>41997919
Happy to help. Also don't trust the ASR for your samples and proofread them
>>
>>41997930
Yes, I have noticed that it will sneak in words that sound similar to what's actually said OR decipher coughs into random words.
>>
live?
>>
Thoughts?
https://www.sesame.com/research/crossing_the_uncanny_valley_of_voice#demo
>>
>>41999221
post some captured voice
>>
>>41999247
Ogey:
https://files.catbox.moe/0w8fc5.mp3
>>
>>41999268
not bad, can you get anger out of it?
>>
https://github.com/SparkAudio/Spark-TTS
>High-Quality Voice Cloning: Supports zero-shot voice cloning, which means it can replicate a speaker's voice even without specific training data for that voice. This is ideal for cross-lingual and code-switching scenarios, allowing for seamless transitions between languages and voices without requiring separate training for each one.
>Controllable Speech Generation: Supports creating virtual speakers by adjusting parameters such as gender, pitch, and speaking rate.
>>
>https://huggingface.co/Amo/GPT-SoVITS-v2/tree/main/TreeHugger_so96_gpt24
>https://voca.ro/1iyTxmINyT1G
>https://voca.ro/15UBOaGwGeZt
>https://voca.ro/197ppvGSH8Hb

>https://huggingface.co/Amo/GPT-SoVITS-v2/tree/main/Iron_Will_so96_gpt24
>https://voca.ro/1oM6EZRakiYa
>https://voca.ro/1ib8LQe5Q1jr
>https://voca.ro/1nKmRUN5VgYQ
GPT-Sovits for Tree Hugger & Iron Will models, training wavs and text file included.
>>
I wonder if it's possible to make a generic background mare voice model? For example, how Tabitha voices background ponies, or any other VA for that matter.
>>
>>41967142
Hey Hydrus, I think Haysay online is down. Inaccessible to both me and someone else.
>>
Stupid question, but why isn't haysay in the OP?
>>
>>41999636
From my understanding, it should technically be possible to set up a system like "this pony does not exist", where messing around with some sliders changes the way the output sounds. The trouble is that with models made of millions/billions of neurons, it would be pretty difficult to create a UI for that, plus to understand which connections influence the output (pitch/deepness/how feminine or masculine the voice sounds, etc.).
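The slider idea itself boils down to simple vector arithmetic: find a direction in the model's latent space and nudge a speaker embedding along it. A toy sketch (every vector and name here is made up for illustration, no real voice model involved):

```python
import numpy as np

# Toy sketch of the "slider" idea: pick a direction in latent space
# and shift an embedding along it. A real attribute direction would
# come from something like averaging embeddings of deep-voiced
# speakers minus high-pitched ones; here it's arbitrary.
def apply_slider(latent, direction, amount):
    """Shift a latent vector along a unit-normalized attribute direction."""
    latent = np.asarray(latent, dtype=float)
    direction = np.asarray(direction, dtype=float)
    direction = direction / np.linalg.norm(direction)
    return latent + amount * direction

# Hypothetical 4-dim speaker embedding and a hypothetical "pitch" axis.
voice = np.array([0.2, -1.0, 0.5, 0.3])
pitch_axis = np.array([1.0, 0.0, 0.0, 0.0])
deeper_voice = apply_slider(voice, pitch_axis, -0.5)
```

The arithmetic is the easy part; the hard part is exactly what the post says, which is discovering which directions in a million-parameter model correspond to interpretable attributes at all.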
>>42000136
>This site can’t be reached
nta but I can confirm this too.
>>
>>42000136
Thanks for letting me know. The EC2 instance got in a weird state and I had to reboot it. It's back up now.
>>
>>42001057
>>
>silly mare bump
>>
>>42003132
Indeed.
>>
>>42003418
>>
>>42000141
I never noticed it wasn't there; it certainly should be added in the next thread. Perhaps it's a good idea to review the OP pasta as a whole while we're at it.
>>
File: tf2 sniper 1636771843595.jpg (543 KB, 1500x1382)
Soldier & Sniper GPT-Sovits models
>https://huggingface.co/Amo/GPT-SoVITS-v2/tree/main/TF2_soldier_so96_gpt24
>https://voca.ro/1nlARP90O1Ue
>https://voca.ro/1hm4NOdbr3w1
>https://voca.ro/1oN5AuI3CflV
>https://huggingface.co/Amo/GPT-SoVITS-v2/tree/main/TF2_sniper_so96_gpt24
>https://voca.ro/13uHQTVOn5Yi
>https://voca.ro/1lzmUAR0o7jn
>https://voca.ro/19i8rXpIVfb1
>>
>>41966214
thank you.
>>
>>42004305
>>
>>42007170
>>
https://files.catbox.moe/j7vtcg.mp3
I think I like the combination of GPT-SoVITS output audio being smoothed out by so-vits-svc 4.
>>
Scout & Medic GPT-Sovits models
>https://huggingface.co/Amo/GPT-SoVITS-v2/tree/main/TF2_scout_so96_gpt24
>https://vocaroo.com/1gjCLwUvKTE0
>https://vocaroo.com/1hBMAxNFuCfm
>https://vocaroo.com/1cfOWfpXOi4g
>https://huggingface.co/Amo/GPT-SoVITS-v2/tree/main/TF2_medic_so96_gpt24
>https://vocaroo.com/1UcLXBsXnMFV
>https://vocaroo.com/1mgPyR7cIh7A
>https://vocaroo.com/1aljpb0sCTsD
>>
>>42007635
>>
>>42000141
Didn't notice it either. That's a surprising oversight.
>>
File: TF2Ponies1.jpg (127 KB, 2048x923)
Heavy & Engineer GPT-Sovits models
>https://huggingface.co/Amo/GPT-SoVITS-v2/tree/main/TF2_heavy_so96_gpt24
>https://vocaroo.com/1fllACLudD0p
>https://vocaroo.com/1gV76H8ayY9l
>https://vocaroo.com/15B2szya27if
>https://huggingface.co/Amo/GPT-SoVITS-v2/tree/main/TF2_engineer_so96_gpt24
>https://vocaroo.com/1iOL95gyXjAB
>https://vocaroo.com/1kxBoqYaIipQ
>https://vocaroo.com/1nOQ6xEuFMUT
>>
Up.
>>
>>42011091
>>
>>41903107
Genuinely interested in helping out with things here. That is, once I learn the basics of voice AI, which I have 0 knowledge of.
>>
>>42011748
>that I have 0 knowledge of
ooo, new Anon? While the google doc is a bit of a clusterfuck of information, it does have archives of all the steps Anons had to take from start to finish. I would doubly recommend watching the past years' marecon panels, to see the progress of the tech as well as a layman's-terms explanation of how the AI guts operate.
https://odysee.com/@amoawesomeart:e/2020-con-The-Pony-Voice-Preservation-Project-Q-A-1080p:b
https://odysee.com/@amoawesomeart:e/2021-Pony-Preservation-Project-Panel-2021---FULL:d
https://odysee.com/@amoawesomeart:e/2022-con-ppp--1080p:6
https://odysee.com/@amoawesomeart:e/2023-mlp-con-The-Pony-Voice-Preservation-Project-QA---1080p-(bigger-chat-edition)-1080p-hls_FIXED0001-669663:8

A lot of Colab scripts seem to be kill & ded due to Google fucking up the Python versions and dependencies, so I would recommend grabbing some 101 lessons in Python. Even if you do not plan to develop anything yourself, it will still be pretty useful for problem-solving some of the issues that may come up when installing the AI apps/tools (and for understanding at least a little bit of the spaghetti code ChatGPT will give you if you ask it for a solution).
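For that kind of breakage, one cheap trick is a version guard at the top of a notebook, so a bumped runtime fails loudly instead of with a cryptic stack trace ten cells later. A minimal sketch (the package names and pins are only examples, not the actual requirements of any PPP script):

```python
import sys
from importlib import metadata

# Guard against Google silently bumping the Colab runtime's Python
# or library versions out from under an old notebook.
def check_environment(min_python=(3, 8), pins=None):
    """Return a list of human-readable problems; empty means all good."""
    problems = []
    if sys.version_info[:2] < min_python:
        problems.append(
            f"Python {sys.version_info[0]}.{sys.version_info[1]} "
            f"is older than the required {min_python[0]}.{min_python[1]}"
        )
    for package, wanted in (pins or {}).items():
        try:
            installed = metadata.version(package)
        except metadata.PackageNotFoundError:
            problems.append(f"{package} is not installed (want {wanted})")
            continue
        if installed != wanted:
            problems.append(f"{package} is {installed}, script expects {wanted}")
    return problems

if __name__ == "__main__":
    # Example pin; swap in whatever the notebook was actually written against.
    for problem in check_environment(pins={"numpy": "1.26.4"}):
        print("WARNING:", problem)
```

Running this as the first cell turns "mysterious tensor shape error" into "numpy is 2.0.1, script expects 1.26.4", which is most of the battle when reviving a dead Colab.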
>>
>>42011722
>>
>>42012325
>>
Truly not the most active thread.
>>
>>42012996
Such is the nature of people being forced back into wagie cages, as well as all the cool shit being locked behind paid services (or a $1000 investment in equipment that may as well turn into a mini house fire).
>>
>>42011854
Thanks for the tips, anon. Currently I'm starting to learn (or relearn after years) the basics of Python to make it make sense to me. We'll see if I can use the skills here.
>>
>>42012745
>>
>mares!
>>
>15.ai leeched off the hard work of countless anons gathering sound samples and the faggot didn't even release his model in the end


>gathering sound samples
>Jeez what a hard job. Still anal about it?

>You mean those retards who don't even know Suno, Udio, ElevenLabs and keep insisting on training their shit using sovits on their toaster PCs? rofl. Don't make me lol.
>Those retards live at the bottom of a barrel that's under a boulder under a cave.
>>
"Pony preservation project" is now seen by hundreds of thousands of people on Twitter, thanks to 15's recent post. Given the choice, anons, would you like the PPP to become mainstream?
>>
>>42015362
I think the PPP's era is over. This place is a bump general that needs to be put to rest, and all publicity would do is flood the thread with useless newfags with too many questions.
>>
>>42013977
>>
Time to turn this into a music AI general.
>>
>>42015866
Music and voices, it's all the same right? If we make ponies sing, it's musical. We'll just get BGM to spearhead that
>>
>>42015866
Isn't music already covered by /create/? If this thread shuts down, wouldn't we all end up posting on /create/ anyway?
>>
>>42015930
No idea. I for one at least wouldn't have thought of going over there.
>>
>>42015930
I imagine not for AI covers; that's more BGM-level stuff. Or originals, at least.
>>
I'm looking for two Twilight Sparkle covers I can't find in the archive but have heard before: I'm Gonna Be (600 Miles) and A Thousand Miles. I can't find either of them anywhere.

also, here's a Daylight cover with Twilight https://files.catbox.moe/2paett.mp3
>>
>>42016171
>A Thousand Miles
https://files.catbox.moe/alto75.mp4
I could find this Dashie version; however, I can't find the 600 miles song (I know it exists, since I also listened to it like two years ago).
>>
>>42016592 →
Gift from the fleet thread.
>>
>>42016598
very nice, wholesome pony content
>>
>>42016598
Really enjoyable little gem.
>>
>>42016106
same here, even if the thread has only a dozen posters, I will do whatever I can to keep /ppp/ alive.
>>
File: but-why-jon-tron-show.gif (1.89 MB, 498x280)
>9
>>42017113
The thread is already dead though. Keeping it "alive" at this stage is just parading around a corpse of what once was, like Hollywood does to old IPs.
The PPP has served its purpose, and even the core team that represented the lifeblood of this place has moved on. I think it's time to let it rest.
>>
We should merge with AI art.
>>
>>42017743
Disagree, there is more of the show to preserve
>>
>>42017743
>parading around a corpse of what once was
no, I strongly disagree with this view. It would imply that there wouldn't be any new tech to work with and create content, which is silly since we don't know what we don't know, especially when it's been only a few months since we got stuff like deepseek for cheaper text generation and gpt-sovits for TTS with emotion control. Hell, even the animation tech is making some baby steps, from 5 seconds of messy nonsense to 5 seconds of things that look mostly consistent.
Fuck it, I personally have SEVERAL ideas I want to make using the AI tools from here; I'm just saving money to get an upgrade for my potato PC. I would imagine there are at least a handful more Anons like me out there, sitting on a cool idea that is just waiting to be made into reality.
>the core team that represented the lifeblood of this place has moved on
no? The guys are still here, and while busy with other projects, they are lurking and occasionally posting in the thread (mostly as Anons).
>>42018005
hmmm, while merging the AI image thread and /chag/ back into /ppp/ would bring numbers, I feel it would do a disservice to our sister threads, since both revolve around their respective applications/services. I think it's fine for /ppp/ to stay as the general MLP AI thread for sharing things that are not image- or text-story-based. Who knows, maybe in a year or two there will be a VR tech revolution that allows most of the board to afford & easily use it, and that itself could spin off into a new thread dedicated entirely to VR Equestria simulation.
>>
>>42018005
>>42018267
Merging would be silly for a lot of reasons, not the least of which is that there's no reason to force such a thing.
>>
>>42018267
>we don't know what we don't know
Is that what you're basing your view on? That maybe things might change at some point? Seems flimsy.
>>42018267
>It would imply that there there wouldn't be any new tech to work with
The tech isn't the issue, it's the lack of engagement with it. PPP Image gen has been "in progress" for three years with no notable progress to show, despite the field moving forward elsewhere. Even if you're right and the core team is still here without namefagging, they're not making things, which is the same issue.
>would imaging there are at least handful more Anons like me, out there, sitting on a cool idea that is just waiting to be made into reality.
I'm sure there are. That's not helpful though, and sounds like the mentality of a dead general.
>We'll be SO back once XYZ happens!
Someone needs to take the reins and DO something here, something worth talking about.
>>
>>42018380
If somebody wanted to make animations with an img2vid model, would that go here or on AI art? If it had dialogue, it would definitely go here, right? Part of the problem is that we've run up against the limit of things that can be accomplished with a free Colab. Everyone here is too cheap to buy a GPU that can run Wan, or subscribe to Colab Pro, Colab Pro+, or rent a Runpod. I don't know if Civitai has implemented img2vid yet.
>>
>>42018267
>the guys are still here
Can confirm, staying anon as name isn't necessary most of the time. I'm always lurking for the moment when any dataset stuff or other work pops up within my capability. Creative voice stuff also happening in the background, somewhat slow due to less available time + can't use my own Bri'ish voice as reference for sovits and such.

The main bottleneck appears to be expensive hardware and restrictions to only being able to use the AI advancements that make it to open source, and I don't think there's much anyone here can realistically do about that. Frustrating as that is, the capabilities to make use of said advancements in PPP have been clearly demonstrated many times.

I'm not concerned about the current somewhat dormant state. PPP's already succeeded in creating the best (currently) possible voice dataset and synthesis models outside of big tech, everyone who's worked here can be proud of that and shouldn't let a lack of big headlines obfuscate these achievements.
>>
>>42018696
The thread still serves the purpose of being a place to report found papers and tools regarding open voice generation. Not as glamorous as being the AI voice frontier but being persistent (obsessive) is something we have experience with.
>>
>>42015864
>>
>mare
>>
Hello Anons, how are you right now? I hope you are all doing good.

I have been modding this: https://github.com/RAYTRAC3R/codedump recently, fixing some bugs and adding HiFi-GAN support. Should I add more stuff to it?
>>
>>42020068
for anyone wondering, the GitHub link I posted is a fork of an older repo by Cookie (the multispeaker Tacotron2 Torchmoji thing). This fork adds the 15.ai emotional contextualizer system.

(What happened to Cookie that he's no longer working with the PPP?)
>>
>>42020068
>>42020080
tacotron2 with emotion control sounds interesting; however, how does one use it? I think I still have the old TkinterAnon program, but I don't think updating it would be as easy as dumping the GitHub files into it.
>>
Demoman model for gpt-sovits.
>https://huggingface.co/Amo/GPT-SoVITS-v2/tree/main/TF2_demoman_so96_gpt24
>https://vocaroo.com/1htLMUU3OqL8
>https://vocaroo.com/1ek8yqAvnlpK
>https://vocaroo.com/12pmgMlpqWzD
With that, all the TF2 mercenaries have their own models done and ready.
If anybody wishes for a specific pony character to get their own gpt-sovits model feel free to request them.
>>42001057
HydrusBeta, could you add these Tree Hugger and Iron Will models >>41999593 to haysay? Or is the emotion process you had going to make them too different to use?
>>
>>42020443
Sorry, but these models don't work with that program. I will make a GUI for the multispeaker Tacotron2 Torchmoji thing, but it will contain the modifications I made.

(+ Will support Older models)
>>
>>42021820
>I will make a GUI.

It will be a colab script that starts the webui, similar to this: https://colab.research.google.com/drive/1UjSg4tDcubbkax781fE0pNeAFdht_MZ0?usp=sharing

But it will be more user-friendly (and use my repo).
>>
>>42018829
Nobody here ever implements anything from the papers that gets posted.
>>
>>
>>42022933
>"Self-destruct sequence initiated."
>>
>>42022939
https://files.catbox.moe/trw6tg.wav
>>
Sesame's conversational voice demo (see: https://www.sesame.com/research/crossing_the_uncanny_valley_of_voice#demo) got open-sourced today:

https://github.com/BenLechiara/csm
So who's up for training this to have a conversation with Twilight?
It's the best model currently out there for voice chat with AI.


