/mlp/ - Pony

File: AltOPp.png (1.54 MB, 2119x1500)
Welcome to the Pony Voice Preservation Project!
youtu.be/730zGRwbQuE

The Pony Preservation Project is a collaborative effort by /mlp/ to build and curate pony datasets for as many applications in AI as possible.

Technology has progressed such that a trained neural network can generate convincing voice clips, drawings and text for any person or character using existing audio recordings, artwork and fanfics as a reference. As you can surely imagine, AI pony voices, drawings and text have endless applications for pony content creation.

AI is incredibly versatile, basically anything that can be boiled down to a simple dataset can be used for training to create more of it. AI-generated images, fanfics, wAIfu chatbots and even animation are possible, and are being worked on here.

Any anon is free to join, and there are many active tasks that would suit any level of technical expertise. If you’re interested in helping out, take a look at the quick start guide linked below and ask in the thread for any further detail you need.

EQG and G5 are not welcome.

>Quick start guide:
docs.google.com/document/d/1PDkSrKKiHzzpUTKzBldZeKngvjeBUjyTtGCOv2GWwa0/edit
Introduction to the PPP, links to text-to-speech tools, and how (You) can help with active tasks.

>The main Doc:
docs.google.com/document/d/1y1pfS0LCrwbbvxdn3ZksH25BKaf0LaO13uYppxIQnac/edit
An in-depth repository of tutorials, resources and archives.

>Active tasks:
Research into animation AI
Research into pony image generation

>Latest developments:
GDrive clone of Master File now available >>37159549
SortAnon releases script to run TalkNet on Windows >>37299594
TalkNet training script >>37374942
GPT-J downloadable model >>37646318
FiMmicroSoL model >>38027533
Delta GPT-J notebook + tutorial >>38018428
New FiMfic GPT model >>38308297 >>38347556 >>38301248
FimFic dataset release >>38391839
Offline GPT-PNY >>38821349
FiMfic dataset >>38934474
SD weights >>38959367
SD low vram >>38959447
Huggingface SD: >>38979677
Colab SD >>38981735
NSFW Pony Model >>39114433
New DeltaVox >>39678806
so-vits-svt 4.0 >>39683876
so-vits-svt tutorial >>39692758
Hay Say >>39920556
Text generation colab >>40271923 >>40276284
/mlp/ image dataset >>40393331
Haysay on the web! >>40391443
SFX separator >>40786997 >>40790270
Cream Heart model + Alltalk-tts >>40836410
Clipper investigates further data cleaning >>40860022 >>40872222 >>40890799 >>40902356
HydrusBeta working on HaySay 2.0 >>40840723
Blueblood RVC model >>40887151
AI Redub 5 Releases >>40871923

>The PoneAI drive, an archive for AI pony voice content:
drive.google.com/drive/folders/1E21zJQWC5XVQWy2mt42bUiJ_XbqTJXCp

>Clipper’s Master Files, the central location for MLP voice data:
mega.nz/folder/jkwimSTa#_xk0VnR30C8Ljsy4RCGSig
mega.nz/folder/gVYUEZrI#6dQHH3P2cFYWm3UkQveHxQ
drive.google.com/drive/folders/1MuM9Nb_LwnVxInIPFNvzD_hv3zOZhpwx

>Cool, where is the discord/forum/whatever unifying place for this project?
You're looking at it.

Last Thread:
>>40829408
>>
FAQs:
If your question isn’t listed here, take a look in the quick start guide and main doc to see if it’s already answered there. Use the tabs on the left for easy navigation.
Quick: docs.google.com/document/d/1PDkSrKKiHzzpUTKzBldZeKngvjeBUjyTtGCOv2GWwa0/edit
Main: docs.google.com/document/d/1y1pfS0LCrwbbvxdn3ZksH25BKaf0LaO13uYppxIQnac/edit

>Where can I find the AI text-to-speech tools and how do I use them?
A list of TTS tools: docs.google.com/document/d/1y1pfS0LCrwbbvxdn3ZksH25BKaf0LaO13uYppxIQnac/edit#heading=h.yuhl8zjiwmwq
How to get the best out of them: docs.google.com/document/d/1y1pfS0LCrwbbvxdn3ZksH25BKaf0LaO13uYppxIQnac/edit#heading=h.mnnpknmj1hcy

>Where can I find content made with the voice AI?
In the PoneAI drive: drive.google.com/drive/folders/1E21zJQWC5XVQWy2mt42bUiJ_XbqTJXCp
And the PPP Mega Compilation: docs.google.com/spreadsheets/d/1T2TE3OBs681Vphfas7Jgi5rvugdH6wnXVtUVYiZyJF8/edit

>I want to know more about the PPP, but I can’t be arsed to read the doc.
See the live PPP panel shows presented at /mlp/con for a more condensed overview.
2020 pony.tube/w/5fUkuT3245pL8ZoWXUnXJ4
2021 pony.tube/w/a5yfTV4Ynq7tRveZH7AA8f
2022 pony.tube/w/mV3xgbdtrXqjoPAwEXZCw5
2023 pony.tube/w/fVZShksjBbu6uT51DtvWWz

>How can I help with the PPP?
Build datasets, train AIs, and use the AI to make more pony content. Take a look at the quick start guide for current active tasks, or start your own in the thread if you have an idea. There’s always more data to collect and more AIs to train.

>Did you know that such and such voiced this other thing that could be used for voice data?
It is best to keep to official audio only unless there is very little of it available. If you know of a good source of audio for characters with few (or just fewer) lines, please post it in the thread. 5.1 is generally required unless you have a source already clean of background noise. Preferably post a sample or link. The easier you make it, the more likely it will be done.
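For context, 5.1 is asked for because a surround mix usually carries the dialogue on its own front-center channel, which makes cleanup far easier. A minimal sketch of pulling that channel out with ffmpeg, assuming the source really has a 5.1 track (file names are placeholders):

import subprocess
# Hypothetical example: extract the front-center (dialogue) channel from a 5.1 source.
subprocess.run([
    "ffmpeg", "-i", "episode_51_source.mkv",
    "-filter_complex", "channelsplit=channel_layout=5.1:channels=FC[dialog]",
    "-map", "[dialog]",
    "episode_center_channel.wav",
], check=True)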

>What about fan-imitations of official voices?
No.

>Will you guys be doing a [insert language here] version of the AI?
Probably not, but you're welcome to. You can however get most of the way there by using phonetic transcriptions of other languages as input for the AI.

>What about [insert OC here]'s voice?
It is often quite difficult to find good quality audio data for OCs. If you happen to know any, post them in the thread and we’ll take a look.

>I have an idea!
Great. Post it in the thread and we'll discuss it.

>Do you have a Code of Conduct?
Of course: 15.ai/code

>Is this project open source? Who is in charge of this?
pony.tube/w/mqJyvdgrpbWgZduz2cs1Cm

PPP Redubs:
pony.tube/w/p/aR2dpAFn5KhnqPYiRxFQ97

Stream Premieres:
pony.tube/w/6cKnjJEZSCi3gsvrbATXnC
pony.tube/w/oNeBFMPiQKh93ePqTz1ns8
>>
File: biganchor.jpg (161 KB, 640x640)
>>40921071
>>
>page 9
>>
>>40920442
so excited to try this shit out!
>>
>that last pic in the previous edition
Kek
>>
>>40921702
we salute the fallen
>>
>>40921702
Sneaky.
>>
Up again.
>>
>>40921562
Glad you're excited! Now I can share something of Pinkie! Don't worry, according to my friend the program can do English too, Japanese just works really well.

Pinkie(With instru): https://pomf2.lain.la/f/ny70wit.wav

Pinkie(Without instru): https://pomf2.lain.la/f/v5lzgzo1.wav
>>
Along with Pinkie, there's also an English thing of Applejack singing the same song that the first ever AI voice sang... how touching. It's not perfect, but it's still really good. Oh, and good news, my friend is planning on adding Spike and Discord to the fray! Not too thrilled about Spike, but Discord could be fun!

https://pomf2.lain.la/f/mcs9g4x5.wav
>>
>>40922574
How about 'go fuck yourself'?
>>
Rarity time! You know the drill, one with the instrumental, and one without! By the way, how does MareLoid sound for the program? My friend wants to keep things chill for it, but I think the name could work!

Rarity (With instru): https://pomf2.lain.la/f/124xilny.wav


Rarity (Without Instru): https://pomf2.lain.la/f/ynuagi48.wav
>>
Bump.
>>
File: upsies.jpg (95 KB, 1024x1024)
upsies
>>
File: horse.png (113 KB, 256x256)
Musicians are dead. suno ai
https://files.catbox.moe/fxcjwn.mp4
>>
>>40924457
What is the point of an AI that is incapable of generating fetish erotica?
>>
The mare's a freak. The mare never misses a beat.
>>
File: pinkmetal.jpg (99 KB, 516x516)
>>40924457
https://files.catbox.moe/6s2yv7.mp4
>>
File: G-1.png (128 KB, 256x256)
>>40925098
https://files.catbox.moe/uistc5.mp4
>>
File: download (1).png (514 KB, 1584x836)
Hello again! As of now, Spike and Discord have been added into what we now call MareLoid! I hope I'm not becoming annoying with all my updates, but my friend and I are really passionate about this. To give a reminder to those who don't know what I'm talking about, MareLoid, if everything goes well, is a way to make your favorite mares sing songs without reference audio, similar to how one uses Synthesizer V or Vocaloid. Allow me to share what my friend cooked up with Spike and Discord!

Discord (With instru): https://pomf2.lain.la/f/gmmapcn5.wav

Discord (Without Instru): https://pomf2.lain.la/f/oyen6178.wav

Spike (With instru): https://pomf2.lain.la/f/8j4qfieo.wav

Spike (Without instru): https://pomf2.lain.la/f/sgoqfpjp.wav

Any suggestions on who to add next? We're thinking Trixie perhaps?
>>
>>40925332
Granny Smith just for shits and giggles

Besides that, Celestia, Luna, and Cadance would be neat.
>>
>>40925418
You. You do not know how GOOD she is... I know you meant it for shits and giggles but DAMN she can sing!

Granny Smith (With instru): https://pomf2.lain.la/f/5ud769h6.wav

Granny Smith (Without instru): https://pomf2.lain.la/f/qsmyx7s0.wav

As for the princesses, they're on the list!
>>
>>40924457
can it do pony in the style of dragonforce?
>>
>>40925332
will there soon be a way for us to try this ourselves?
>>
>>40925866
Unfortunately, it's not that simple. I don't know programming as well as my friend does, but according to them, it's not exactly at a state where they feel comfortable sharing it with anyone yet. They're... kind of like 15 in terms of perfectionism, so I can't really say if anyone but them will be able to use it. But if all goes well, and they're able to find it perfect, it may have a chance of release. We can only hope.
>>
Up.
>>
VOICECRAFT: Zero-Shot Speech Editing and Text-to-Speech in the Wild
https://jasonppy.github.io/assets/pdfs/VoiceCraft.pdf
>We introduce VOICECRAFT, a token infilling neural codec language model, that achieves state-of-the-art performance on both speech editing and zero-shot text-to-speech (TTS) on audiobooks, internet videos, and podcasts. VOICECRAFT employs a Transformer decoder architecture and introduces a token rearrangement procedure that combines causal masking and delayed stacking to enable generation within an existing sequence. On speech editing tasks, VOICECRAFT produces edited speech that is nearly indistinguishable from unedited recordings in terms of naturalness, as evaluated by humans; for zero-shot TTS, our model outperforms prior SotA models including VALL-E and the popular commercial model XTTS v2. Crucially, the models are evaluated on challenging and realistic datasets, that consist of diverse accents, speaking styles, recording conditions, and background noise and music, and our model performs consistently well compared to other models and real recordings. In particular, for speech editing evaluation, we introduce a high quality, challenging, and realistic dataset named REALEDIT.
https://jasonppy.github.io/VoiceCraft_web/
https://github.com/jasonppy/VoiceCraft
>[] Upload model weights
weights not up yet but soon maybe
>>
>>40926585
Better TTS tools would be nice.
>>
>>40926585
>The training of the 830M VOICECRAFT model took about 2 weeks on 4 NVIDIA A40 GPUs.
The cheapest 4x A40 listing on vast.ai is $1.803/hr, so ~$605.80 for the two weeks.
>>
>>40927049
That listing seems a bit fucked though, so $2/hr is probably a better estimate, i.e. ~$672. 4x 4090s might actually be a better deal: $2.40 an hour, and if the TFLOPS ratio carries over to training time it'd be ~$300.
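For anyone double-checking the napkin math, a quick sketch; the hourly rates are the vast.ai listings quoted above, and the 4090 speedup is back-solved from the ~$300 guess rather than a measured number:

# Rough rental-cost check for the VoiceCraft run (2 weeks on 4x A40).
hours = 14 * 24                         # ~336 hours of training
print(hours * 1.803)                    # cheapest 4x A40 listing -> ~$606
print(hours * 2.00)                     # rounded-up estimate -> ~$672
assumed_speedup = 2.7                   # assumption only: how much faster 4x 4090s might train
print(hours / assumed_speedup * 2.40)   # -> ~$299 if that assumption holds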
>>
Page 10 bump.
>>
Hey BGM, I remember you mentioning half a year ago that you were planning to make a Linky song but it's been quite a bit since then so I wanted to ask if you've changed your mind by any chance.
>>
>>40920442
>>40923239
I only check this thread once a week, but this is a super impressive project, anon!
>>
Update: The princesses have been added! Here's a preview of them singing that one new years eve song in Japanese!

Celestia: https://pomf2.lain.la/f/4mcnfwdg.wav

Luna: https://pomf2.lain.la/f/weq0rkl7.wav

Cadence: https://pomf2.lain.la/f/r8hfkabz.wav
>>
File: lyrics.png (217 KB, 959x1012)
>>40924457
This is very cool. Literal first attempt which took me less than a minute to "create":
https://files.catbox.moe/17calb.mp3
Lyrics generated with GPT, copypasted into this thing, and that's it. It's not perfect-perfect, but it's 10x better than previous AI tools for making songs, and I intentionally went with the first lyrics and the first song gen: no edits (those damn 'hands' in the chorus, ree) and no rerolls. The 2-minute trial ate the outro, though.

>>40925864
It's unfortunately not trained to mimic styles of specific bands/artists, you'd have to describe it and probably generate a bunch.
>>
Surprise! The CMC are in MareLoid! This will probably be the last update for a bit, but I'll still be lurking here!

Applebloom: https://pomf2.lain.la/f/t2hcxnzz.wav

Sweetie Belle: https://pomf2.lain.la/f/exs4e1q0.wav

Scootaloo: https://pomf2.lain.la/f/xqpiatdx.wav
>>
bump
>>
File: twiggles.png (174 KB, 977x1024)
>>40925105
https://files.catbox.moe/fczgrs.mp3

I'm having fun with this. Prompt was "Sad song about a girl named Twilight in the style of lo-fi hip-hop in the late 1990s."

I wonder if there's a way for me to continue/redo a prompt with this specific beat.
>>
File: Pinkie 53.jpg (110 KB, 667x800)
>>40929685
This is crazy.
https://files.catbox.moe/b73nte.mp3
https://files.catbox.moe/lr859w.mp3

prompt: a emo, industrial, dream pop, psychedelic, shoegaze, trip hop, vaporwave song about pinkie pie the pony being depressed that her friends don't love her anymore.
>>
File: aimare.png (96 KB, 256x256)
>>40924457
"Made" a sequel song using the new v3 version. I used Suno AI just a couple of days ago and you had to pay for v3; now v3 is free, so v4 must be around the corner already.
https://files.catbox.moe/00wefs.mp3
prompt:
a sad country breakup song about a man's pony mare named applejack, being taken away from him. belonging the days where he can pet his mare once again.
>>
Mare between the lines
>>
>>40903672
Well this didn't exactly work. I ended up trial and erroring a whole bunch of different things and I think the predictor_encoder and style_encoder have the biggest impact on F0. Unfortunately characters were no longer recognizable even though the F0 was closer to desired.
>>
File: Maudgen.png (936 KB, 1024x1024)
>>40924457
Very fun stuff, reminds me of Jukebox AI when it released on Google Colab. This is so much easier to use, I just had to make something after genning these lyrics.
https://youtu.be/gRNP3lBEHNQ
>>
Hey marrs
>>
Bumping the mares.
>>
>>40928808
is it possible to make the mane six (especially fluttershy) do black metal growls with this program?
>>
>>40929685
https://files.catbox.moe/7via22.mp3 Stitched together a full version.
>>
Bump.
>>
I have some news to share, it's not... great.

Due to circumstances involving my friend's perfectionism, they deem the quality of the voices within MareLoid to be not up to their standards, and rather than try another AI algorithm which could give "better" results... they've decided to shelve MareLoid for the time being, meaning no new voices will be added and no new demonstrations will be made. I know some of you really wanted to try this out, myself included, but I can't just badger my friend into reconsidering; they're dead-set on shelving it for now. I'm really sorry about all of this. Please don't be too mad or blame my friend too much, they just have... really high standards. That's all I can say. I'll still lurk here for stuff relating to the mares, but unless MareLoid starts up again, this is the last time you'll hear of it for now... I apologize greatly.
>>
File: Drink.png (1.83 MB, 2000x1500)
>>40932952
>Perfectionist hides something great from the world because it isn't perfect
Don't worry, we're used to it.
>>
>>40932952
inb4 mareloid anon’s friend is 15
>>
>>40932952
>We never actually made anything, we just used so-vits-svc, you got fucking trolled, faggots
Every time.
>>
>>40933744
You called? :)
>>
>>40933757
https://files.catbox.moe/6sls9v.flac
>>
>>40933757
>>40933838
>Post removed
...What did Jannie mean by this?

Like, genuinely, what on earth could the rationale have been for that? Dafuq?
>>
>>40934086
>...What did Jannie mean by this?
Maybe anon deleted because he was too ashamed of himself?
>>
>>40934108
No, I'm the anon. Still ashamed of myself, but now also confused.
>>
>>40934086
>>40934108
>>40934114
Same. The comment was just talking about how he liked my RD cuck audio and joked about wanting another one. Mods, are you okay?
>>
>>40925955
>They're... kind of like 15 in terms of perfectionism
Lol, 15 does it because he wants to scam his gullible patreon paypigs.
>>
>>40934129
>Jannie, are you okay?
>So, jannie, are you okay?
>Are you okay, jannie?
>Jannie, are you okay?
>So, jannie, are you okay?
>Are you okay, jannie?
>Jannie, are you okay?
>So, jannie, are you okay?
>Are you okay, jannie?
>Jannie, are you okay?
>So, jannie, are you okay?
>Are you okay, jannie?
>Jannie, are you okay?
>Will you tell us that you're okay?
>There's a post on the image board
>That he cucked you - a greentext, jannie
>He came into your thread
>He left cumstains, on the carpet
>Then you ran into the basement
>You struck it down
>It was your doing
>Jannie, are you okay?
>So, jannie, are you okay?
>Are you okay, jannie?
>You've been hit by
>You've been hit by, a rogue janitor
>>
>Page 9
>>
>>40933757
>>40934129
>>>/trash/
>>
>>40934750
>four hours to reply
NGMI
>>
>>40934656
+1
>>
>>40934194
That thing is still going?
>>
Mare yourself out
>>
The big mare gak
>>
>>40935551
No, 15ai shut down last year, and 15 has been completely silent since February last year. Even if he did come back, he's already been outpaced by everyone and their mother. His work is amateurish now compared to shit like ElevenLabs, so-vits, and even that Suno song software that dropped recently.
>>
File: rainbow dense.gif (1.77 MB, 926x864)
>>40936557
The benefit of 15.ai was that tech-illiterate people like me could use the funny pony noise machine to create or enhance content. As neat as the shit you people do is, it's kind of like visiting a zoo and watching the monkeys in the monkey pen bang rocks against each other. That's why these threads are so inactive.
>>
>>40936584
I mean, most of us used 15ai because it was much easier and simpler to create content, and we didn't need to resort to recording ourselves pretending to be a cute mare getting fucked and have our roommates or family overhear our degenerate shit.
>>
File: Noise mix pre and post.png (60 KB, 4717x667)
>>40921076
Review of S5 is done.
https://files.catbox.moe/wofd6s.json
>>
File: 949yug.png (274 KB, 1920x1080)
>>40933513
>>40934964
It's getting to be that time of year again, /mlp/con is (probably) soon to be confirmed for late June which means I'm starting to have preliminary thoughts about another PPP panel. I don't think it's quite time to start properly planning anything just yet, just time for anyone who's got any ideas or would like to be involved to start thinking about it. I'd be happy to be the main host/organiser again if no one else wants to take the reins, and assuming that happens I'll probably put out the first proper call to action once I've finished with the remaining dataset review work and /mlp/con finally actually confirms a proper date.
>>
Not sure if this is the right place to ask, but I'm wondering if there's a more efficient process in Audacity for extracting vocals from a song. Right now I'm clipping the original vocals, getting the generated clip from TalkNet/SVC, and then syncing it back to the instrumental track.
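Not an Audacity answer, but if the goal is just to automate the vocal/instrumental split, a sketch using demucs' two-stem mode might help (assumes demucs is pip-installed; the path is a placeholder):

import subprocess
# Hypothetical split into vocals + everything else via the demucs CLI.
subprocess.run(["demucs", "--two-stems=vocals", "my_song.mp3"], check=True)
# Output lands under ./separated/<model>/my_song/ as vocals.wav and no_vocals.wav;
# the converted TalkNet/SVC vocals can then be laid back over no_vocals.wav.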
>>
File: 3326844.jpg (381 KB, 2000x2000)
>>40937252
>/ppp/ panel again
Do we even have much to talk about besides spending 10 minutes shilling HaySay and the newer Pony Diffusion, plus 2 hours of pointless minigames?
>>
Is there some LLM angled towards pony stuff? I would like to generate pony related text locally but the uncensored model that I have doesn't really know much more than surface level.
>>
>>40937252
>>40937342
I'd like to do a little bit of shilling for the process of using AI-cloned clips to create lengthy datasets for other models, and to help out in the background with whatever needs assistance.
>>
>>40936911
>have our roommates or family overhear our degenerate shit
But that's part of the flavor!
>>
before you go to bed
>>
VoiceCraft: Zero-Shot Speech Editing and Text-to-Speech in the Wild
https://arxiv.org/abs/2403.16973
https://github.com/jasonppy/VoiceCraft
> Upload model weights (encodec weights are up)
Previously posted, but they've put their paper on arXiv now. Oh, and they've posted the encodec weights.
https://github.com/jasonppy/VoiceCraft/issues/12
lol
>>
>>40938392
Also in the middle of the day.
>>
Mare anyway
>>
>>40940052
But not on page 9.
>>
File: Spike being useful.gif (254 KB, 423x460)
haysay.ai JUST went down. Anyone else?
>>
>>40940484
I've restarted the Docker containers and it's back up now.
>>
I am rarted. I'm only now remembering to do this. YEARS LATER. But better late than never!

__________

Here is a collection of all the moans and sighs I've generated over the years with 15ai when it was still active. Altogether, this is 538MB. It is currently split into three separate zip files due to Catbox's 200MB limit. Many of these audio clips were made for audios that I either never got around to or couldn't make with the shutdown of 15ai. If I can find a good alternative, they'll be finished someday.

Part 1 contains:
>Daring Do
>Derpy Hooves
>Fluttershy 2
>Gilda
>Pinkie Pie
>Sci-Twi (For consistency's sake, but it's just Twilight)
>Scootaloo
>Spike
>Trixie
>Twilight Sparkle 2
>Twilight Sparkle

Part 2 contains:
>Applejack
>Fluttershy
>Rarity

Part 3 is Rainbow Dash. Some of it was generated for the cuck audio, but can be cut and spliced at will. unless you're a cuck

PART 1 -----> https://files.catbox.moe/nfwu5p.zip
PART 2 -----> https://files.catbox.moe/kd1jvv.zip
RAINBOW DASH 2 -----> https://files.catbox.moe/w5stv4.zip
>>
Precautionary prior to work bump.
>>
>10
>>
>Page 8 in 90 minutes
I see there's some shit going on in the catalog again.
>>
>>40941866
>Page 8 in 35 minutes
What the fuck is going on today?
>>
>>40942427
>several threads with random "let's talk about mare" with less than a dozen posts
I'm not saying it's some raid, but it sure stinks like one, designed to push other threads off such a slow board.
>>
>>40941013
>>>/trash/
>>
>>40942541
Stop advertising yourself, Goku.
>>
File: newspaper_rarity.jpg (357 KB, 1600x1604)
>>40942500
Why the fuck would that be a raid? That's how boards are supposed to work. Are you saying a board where half the content is people page9 bumping their inactive generals is a healthy situation?
>>
>>40942604
it's called pattern recognition. There clearly wasn't anything big happening (e.g. Hasbro ending G5, rule 15 getting removed, or Faust making her own pony series with blackjack and hookers). If it was the weekend this would maybe make sense, since people have time off and shitpost as usual, but the board going into abnormal overdrive in the middle of a workday is just a bit weird.
>>
File: 1696239849878580.png (221 KB, 686x714)
>>40942604
Generally, new threads are supposed to have some form of viable topic to discuss. A flood of random screencaps with less than a full sentence for an OP is just a bunch of shit.
>>
>>40942931
Whatever it was, it seems to be over for now.
>>
File: CeliCake.png (333 KB, 521x512)
Alright, suno is pretty fun.
https://app.suno.ai/song/5b3abea5-7ad6-48f3-ace5-1f11f1444c15
>>
>>40943827
>5 short songs per day or pay up
fug, does anyone know of alternatives that can create similar outputs from text only? (I know there is Bark, but so far the results from anons seem just so-so)
>>
>>40943827
I suppose the next step would be to take a suno output, separate the vocals, transform the vocals into a character voice, then mix them back together with the instrumental.
>>
https://github.com/DoMusic/Hybrid-Net
>Real-time audio source separation, generate lyrics, chords, beat. A transformer-based hybrid multimodal model, various transformer models address different problems in the field of music information retrieval, these models generate corresponding information dependencies that mutually influence each other.
if anyone could test to see if this is superior to UVR (I'm not into audio stuff) that would be appreciated
>>
>>40944463
>no requirements.txt file
this is a bit of an 'uh oh' moment.
>>
>>40944406
I wouldn't be surprised if they offer isolated vocals as an option eventually. I assume they're generating both in separate models and then mixing, so it should be trivial.

>>40944314
I mean no, that's why everyone's hot for teacher over suno, it's the first one to do music gen from nothing that doesn't sound like trash (earlier attempts were getting there, but always kinda lost the plot, never remained coherent for a full 2 minutes as far as I know.) Personally, I can't wait till we can get something like this going privately, so we can pirate musical styles and specifically request chord progressions if we want. Impossible with current memory/processor limitations, but who knows what the future holds.
>>
Bump.
>>
>>40944762
How so?
>>
>>40938470
>VoiceCraft is a token infilling neural codec language model, that achieves state-of-the-art performance on both speech editing and zero-shot text-to-speech (TTS) on in-the-wild data including audiobooks, internet videos, and podcasts.
> 03/28/2024: Model weights are up on HuggingFace here!

weights are up, should we try to fine tune it?
>>
bmup
>>
>>40946869
https://github.com/jasonppy/VoiceCraft
> 03/28/2024: Model weights are up on HuggingFace
https://huggingface.co/pyp1/VoiceCraft/tree/main
>>
AniPortrait: Audio-Driven Synthesis of Photorealistic Portrait Animation
https://arxiv.org/abs/2403.17694
>In this study, we propose AniPortrait, a novel framework for generating high-quality animation driven by audio and a reference portrait image. Our methodology is divided into two stages. Initially, we extract 3D intermediate representations from audio and project them into a sequence of 2D facial landmarks. Subsequently, we employ a robust diffusion model, coupled with a motion module, to convert the landmark sequence into photorealistic and temporally consistent portrait animation. Experimental results demonstrate the superiority of AniPortrait in terms of facial naturalness, pose diversity, and visual quality, thereby offering an enhanced perceptual experience. Moreover, our methodology exhibits considerable potential in terms of flexibility and controllability, which can be effectively applied in areas such as facial motion editing or face reenactment.
https://github.com/Zejun-Yang/AniPortrait
> Update the code to generate pose_temp.npy for head pose control.
>We will release audio2pose pre-trained weight for audio2video after futher optimization. You can choose head pose template in ./configs/inference/head_pose_temp as substitution.
lip sync for images to audio. has face reenactment ability too. github has more videos. it's pretty rough but lots of room to improve. they used 4 days with 4 A100s to train it so if using vast.ai with the 4x A100 80GB offering it would cost $336 to recreate (using their datasets which might not be optimal). also still hilarious we can't use webms with audio outside of /gif/
specifically for you guys it would probably require a new model to be trained with ponies or w/e for it to work well
>>
>>40947581
Not only that, it would have to go human --->equine--->cartoon equine. It would have to be capable of hopping 3 categories and fundamentally different skull structures, vs 0. Probably still possible eventually, but a much harder problem.
>>
>>40947581
>Pony facial animations
Man, that would be damn cool.
>>
an anon tested the voiceclone capability of voicecraft
>new tts model dropped and here is the source file
https://voca.ro/157IzI9y4YZ6
>and here is the generated voice.
https://voca.ro/1ojHkZ87XRVL
>>
https://openai.com/blog/navigating-the-challenges-and-opportunities-of-synthetic-voices
>Today we are sharing preliminary insights and results from a small-scale preview of a model called Voice Engine, which uses text input and a single 15-second audio sample to generate natural-sounding speech that closely resembles the original speaker.
>It is notable that a small model with a single 15-second sample can create emotive and realistic voices.
>>
>>40948519
this is sounding pretty good, little bit too good. If this isn't someone trolling by recording themselves twice, it's a pretty nice indication that the voice cloning tech is going from being kind of shit to pretty decent.
>>
>>40948519
How on earth are you supposed to train a voice on this? I'm so confused by the instructions. Am I just supposed to wait until a colab or huggingface demo appears?
>>
File: Twilight think.jpg (94 KB, 825x1080)
>>40948519
Eh? That is impressive. However, all I've heard so far are low quality inputs.
Can we get some high quality results? Preferably pone? It should be much easier to tell if there's any flaws.
>>
hey does anyone have a link to that pinkie pie american pie cover with vul style voices?
>>
>>40949398
https://files.catbox.moe/g3ahk9.flac
I got you senpai
>>
>>40949524
I LOVE YOU ANON!!!!
>>
>>40949398
Isn't that on Youtube too?
>>
>Page 9
>>
>suno ai songs are all saved to mp3 and the link is easily accessible with standard webpage inspection keybinding
Not sure if I should feel happy that there is no need to fuck around with 3rd party software to download songs, or disappointed that apparently one of the more advanced musical AI tools is being handled by some IT intern guy.
>>
>mare
>>
>>>/wsg/5497184
>I think the main barrier is good local tts. Stability AI actually trained a SOTA tts model back in February based on enabling Audiocraft to generate text. They pussied out though and didn't release the model. Here's the paper if a tech savvy anon wants to spend $20,000 and 3 months of dev time to make the best open source tts solution that can also do music.
>https://arxiv.org/abs/2402.01912

reee
>>
>>40951924
>$20,000
maybe not SOTA but voicecraft (>>40926585) took 2 weeks with 4xA40s to train. Vast.ai current pricing has that work out to be around $672.
>>
Bump.
>>
>>40941013
>no replies for this high effort post
anon, I am appalled. How and why the fuck did you do this?
Why so many? What were the uses for it?
>>
File: 1417744044882.gif (60 KB, 720x1050)
>https://youtu.be/f8DKD78BrQA?si=a1ejJszeKdC4idq3
those of you who watched this may have noticed a little something where they used a simulation (3D models trying and failing at a certain activity until they get it right) in order to train AI, and it just fucking hit me how close we are to pony bots
>but muh computing power
well yeah, but hypothetically if we had the funds for training and that was taken care of, as long as we had some sort of model and card (think /CHAG/ because we could totally get those guys to whip something up for us if we needed to just use a basic model as the base for the robo-pony) and then all that's left would be
>the hardware
now the hardware would also be expensive since you have essentially a full blown PC (to run the local model instead of a proxy)
>inb4 the nightmare of having your waifus proxy getting taken down
and then on top of the PC parts like the GPU you also have the mechanical movements, which maybe require some knowledge of hydraulics (maybe that's way too powerful for a small pony but the point stands), and the actual robotic parts would be expensive as fuck. I feel this would end up being the hardest part, but I guess the nice thing about having this general thread is we can always come up with hypothetical stuff that could end up as the actual basis for robo-pones once we have the ability to do so
you know what they say (I looked it up and couldn't find it so I'll improvise): an ounce of planning is worth a pound of action
having these ideas and plans on hand could give us an opening to actually do stuff the second we have the funds or the tech
besides, by breaking it down into actionable steps (in a similar way to how anons claimed individual scenes for the Redub projects, in part due to the way the google spreadsheet was set up) we could totally have a plan and get our waifus out ASAP
and with growing promise I'm sure funds would start pouring in from anons on here. I mean, look what happened to 'Fallen Oak Sanctuary' at Mare Fair; if they can get 50k, I'm sure a project like this, which directly affects all the anons on this board, is sure to get funding the second it shows some promise
and the only way to show promise is to show solid planning and theory to back it up and generate excitement from anons who can donate
>>
File: 1599526626979.jpg (403 KB, 1237x1019)
>>40953149
another thing would be (I guess you could phrase it as) contextual context, where instead of the AI taking a certain amount of the most recent messages (more of a /CHAG/ thing with SillyTavern and context token limits) you could probably just use keyword-related context (which I'm confused as to why this isn't a thing in SillyTavern yet)
but anyway, ignore that fucking slop above this sentence; the point I'm trying to make is we would have to figure out how to maintain proper context so our little robo-pones aren't overwhelmed every time they want to speak
I'm sure you could have both keyword context as well as (I guess you could call it) similarity context, where previous conversations which are similar but not identical to the current conversation can still be used as context, but how that would work is beyond me

at this point in time we are pretty much just leeching off what tech companies give us and adapting it to pony stuff. I doubt we could compete with these tech giants in terms of actual AI advancement, but still, if we are adapting their AI we could at least plan for the actual hardware of robo-pones, since we all know an NVIDIA robo-Twilight ain't happening anytime soon, so it's up to us to get designs and plans rolling in preparation for funding (maybe Mare Fair could help with this one day, or even /mlp/con)

all that matters is that momentum and actual visible plans are being shown, and surely funding would roll in from other excited anons
>>
>>40953111
I wanted to create a multitude of moans and splice individual ones together in specific ways in order to get a more "natural" sounding series of moans, something that sounded genuine and realistic, like the character is actually breathing and doing it. Sometimes one generation will have a good gasp and then moan like they're going "huh?", while another one derps out when it gasps but they do this little shiver afterwards, and it fits just right.

For example, with the Fluttershy audio I did two years ago, there are a few generations I was able to make that straight up sounded like Fluttershy was cumming hard, and it worked so well for the context of the story I had set up.

Many of the others were for audios I had planned but either never got around to, or couldn't do them because 15ai shut down and never came back up. But either way, the sounds I generated were just sitting on my hard drive collecting dust, so I wanted to post and share them for folks to use either in audios or animations. They're still a bit noisy because they were generated in early 2022, but with some good filtering effects you can easily remove it.

Some of the audios also have dialogue included in them, but they're meant to be ignored since those generations included some good moans.
>>
>>40953222
>>>/trash/
>>
Gradio port of VoiceCraft if anyone cares.
https://github.com/friendlyFriend4000/VoiceCraft
>>
>>40953388
>windows version maybe
oh jeez. I'm not ready to go full penguinpilled; even though most of the stuff I use is technically available on both OSes, all the little changes just fuck up my productivity.
>>
File: Noise mix pre and post.png (63 KB, 5670x693)
>>40921076
>>40937247
Review of S6 is done.
https://files.catbox.moe/2a6kml.json

vul - do you have the original audio for EQG and the FiM movie? I think it'll be worth running them through demu as well to see if we can get some more clean audio there.
>>
Since Suno is restricting prompts based on certain words and has no idea what the fuck it's supposed to be, I generated these:

Suno! -----> https://files.catbox.moe/c7qk63.mp3
NO FUN ALLOWED -----> https://files.catbox.moe/brdne4.mp3

Here's another version that's not as good, but they say "rock solo" in a cool way: https://files.catbox.moe/uho0fd.mp3
>>
File: space 1664195774480418.jpg (548 KB, 1608x2081)
>https://app.suno.ai/song/a9973479-6cdc-4098-a8d8-cf0787d64943
Alright, it's my first time fucking around with this thing; I threw some half-assed lyrics at it and the outcome is pretty OK-ish.
It's still pretty cucked with the artificial credit system set up to randomly cut off the song just a few seconds before the end (probably by design, to trick people into going full paypig mode); however, if I had an offline version of this tech I would totally spend two days trying to make it perfect.
IF/WHEN my above complaint stops being a problem, I can see this causing another boom of pony music, just as there was one after rvc/sovits got created a year ago.
>>
>>40953388
https://github.com/kijai/ComfyUI-VoiceCraft
>>
File: Velvet Sparkle.jpg (161 KB, 1000x437)
If I could ask a request of someone here: could anyone generate an AI dub based on this image?
>>
>>40954154
>clipper
I LIKE YOUR SNOW PONY VIDEO
also where did u get all that science equipment from snow bro?
>>
doot
>>
>>40954981
I've tried to see if it would be possible to quickly edit this with a pony voice, but the constant change in reverb, as well as randomly going from solo to duet to solo in the same sentence, makes for very cursed-sounding outputs.
So given that 70% of the lines would need redoing, I can see this being a pretty nice prototyping tool for people to then create proper songs, either with raw vocals or audio conversion. To get a "plug voice into rvc/sovits and get mare" future we will need to wait for an uncucked offline version of this tech.
>>
>>40955053
https://github.com/haoheliu/versatile_audio_super_resolution
having some way to upscale the 16kHz voicecraft output in an easy to use manner might be good
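If anyone wants to poke at it, the repo's README documents a pip-installable command-line tool; a sketch under the assumption that interface hasn't changed (the file name is a placeholder and the -i flag is taken on faith from the README):

import subprocess
# Hypothetical upscale of a 16 kHz VoiceCraft output with AudioSR's CLI (pip install audiosr).
subprocess.run(["audiosr", "-i", "voicecraft_output_16k.wav"], check=True)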
>>
15 ai come back
>>
>>40958251
Nah, too busy wiping his ass with his patreon money.
>>
>Page 9
>>
File: OIG3.J_PymwmqP1MCLzx7tV2j.jpg (199 KB, 1024x1024)
>>40959439
>>
>saw this on /g/
Is this where we are heading, a future with fantastical and whimsical sci-fi levels of tech and all the normies going "ehh, whatever dude"?
>>
>10
>>
>9
>>
>>40936557
>ElevenLabs
Doesn't let me use it without upgrading my subscription
>>
>>40960343
>the normies going "ehh, whatever dude"
That's probably for the best. Normies wouldn't grasp what they're dealing with and only attract a larger outrage mob by painting a bigger target on AI.
>>
File: OIG4.CHKkSMqoRY9TOpDxUM9q.jpg (127 KB, 1024x1024)
>>
>>40964097
>teaching robots how to boop
That's a powerful tool you give them.
>>
Since making songs with Suno AI (with some help from ChatGPT) is relatively easy, how would anons feel about making some kind of AI musical album, with the theme of taking an episode of S1 and making a song about it (whether something serious or more comedic)?
Just throwing this idea out here to see if there is any interest.
>>
>>40964432
That sounds like a fun idea.
>>
https://stability.ai/news/stable-audio-2-0
https://stableaudio.com/
no weights, and the website requires a login, but it seems like for now they're making it usable for free
>>
>>40964753
that's nice
>To protect creator copyrights, for audio uploads, we partner with Audible Magic to utilize their content recognition (ACR) technology to power real-time content matching to prevent copyright infringement.
While this is a bit shit, I can understand that they need it for when music companies go full jew on them (we can't have nice things unless one of the five big corpos also takes a pound of flesh for it). Sadly this most likely means they will not publish the model weights; however, if they also publish (or leak) how their model is trained, there is a small chance someone else could pick it up and make a non-cucked version.
>>
File: dashie_glertchy.jpg (437 KB, 1536x1536)
>>40964753
Just tested a bit of a slightly outlandish prompt; and oh my god, the result is just so weird and painful the more you listen.
>"My Little Pony, Pop, catchy, Rainbow Dash vocals, Pony, Horse, Neigh, Mare"
>https://files.catbox.moe/aidi5e.mp3
>>
File: Mare_is_scare.png (31 KB, 278x361)
>Uses Stable Audio to generate horse SFX foleys
>Some trot some sniff but the rest all horror spook
>She growls and make otherworldly noises
>Mare is scare

https://files.catbox.moe/b97srx.mp4
>>
File: unicor piano.jpg (116 KB, 768x768)
https://docs.google.com/document/d/17hP2rkQHlU43nNOdy1iDEJzKbhgm_kjSaCAo6J1plTY/edit?usp=drive_link
>>40964432
>TLDR: Anons come together to make a full album with songs created with whatever AI tools they have access to.

Alright, here is the basic setup for the Pony AI Album. Anyone interested can simply reply to this post with the episode you wish to work on (songs based on characters/settings/ideas are welcome too).
Each song will have a soft deadline of two to three weeks; if a song is not delivered by then, it will become available for choosing once again.
If there is anything that (you) think needs to be added to the main doc, please do tell.
>>
>>40965188
>https://files.catbox.moe/aidi5e.mp3
This sounds like a song from a Pinkamena Party album.
>>
boop
>>
Hello Anons

So...seems like RVC can do laughter but under certain conditions, the biggest one is that it has to be somewhat airy, and a short chuckle or chortle...a soft laugh. I'll post results with various characters soon
>>
>https://app.suno.ai/song/6385f0fa-7bfd-4faf-ac2f-c3f540fec9c5/
just reposting a /g/ banger, it's not as amazing as the '4am' but it's still pretty bloody impressive.
>>
>10
>>
>>40967851
Looking forward to it.
>>
>>40968699
>>
Up.
>>
Thread preservation bump.
>>
>>40972116
yee
>>
>>40972707
Have another.
>>
still no way to make mares do death metal growls?
>>
File: 1712326099625504.jpg (107 KB, 1080x802)
Requesting pic related integrated into an outtake of the episode.
>>
And I need mares more than want mares
And I want mares for all time
And the wichita linemare
Is still on the line
>>
>>40940580
is it open sauce?
>>
File: Noise mix pre and post.png (59 KB, 4717x667)
>>40921076
>>40954154
Review of S7 is done.
https://files.catbox.moe/4vscco.json

That's all the core FiM episodes done. I'd still like to try running demu on our other audio sources to improve the dataset further. vul, would you be able to run it on the EQG audio and the FiM movie?

>>40956515
I'm a chemist irl so was just using the stuff I already had in the lab.
>>
>>40975418
Yes. The main repository for the UI is here:
https://github.com/hydrusbeta/hay_say_ui
Documentation on running a public server is here:
https://github.com/hydrusbeta/hay_say_ui/tree/main/running%20as%20server
Each architecture (RVC, ControllableTalkNet, StyleTTS, SVC) also has 1 or 2 of its own repositories, which are used for building its Docker image. There are about 10 repos involved in the whole project:
https://github.com/hydrusbeta?tab=repositories
>>
>>40975623
Do you have the associated audios for EQG and the movies
>>
>>40975895
>>40975623
Here is the merged labels index, could be useful in the future. If it turns out to be inaccurate it can be recomputed afterwards; the file structure is what's most important
https://files.catbox.moe/tqglrp.json
>>
>>40975895
Should be these, I also included the Best Gift Ever and Rainbow Roadtrip specials:
https://drive.google.com/file/d/12QomZA_D1XiRNciPkIxqw_reU1c64fnm
>>
>>40976188 (checked)
Isolated versions. The current code is not compatible with these audios but you can get a head start on downloading them:
https://drive.google.com/drive/folders/1dw4nYR9PjJk2C81Hzjgkym26hMULNJMO?usp=drive_link

Best Gift Ever demu0:
https://drive.google.com/file/d/1oI_qo8TAkHCzx_waQOcFZhLcNcbnnVFt/view?usp=drive_link
>>
>>40977357
Also here is the labels index for the extra files:
https://files.catbox.moe/cfcvsw.json
>>
>>40977357
>>40977360
FYI something seems to have messed up in the processing for the FiM movie and Rainbow Roadtrip. Not sure what happened other than that the timestamps are off...
>>
>>40977418
Actually there seems to be a discrepancy between the master file annotations and the version provided here
>>40976188
>>
>>40977476
Also unrelated but I still can't believe that is the highest quality audio of Rainbow Roadtrip that was ever released. I wonder if the Netflix version would've been better.
>>
boop
>>
>>40978197
again
>>
https://files.catbox.moe/0j8902.mp3
>>
>>40979460
I like the vibes you are going for in there.
>>
>>40977418
>>40977476
>>40977553
The audio for the specials and movie has always been somewhat tricky; there've been multiple sources used over the years, so it's entirely possible the version I have locally is different. I wouldn't worry too much about it if it's unduly difficult, we already have plenty of other data.
>>
File: full.png (1.8 MB, 2048x2048)
>>40979460
AWAKEN, MY MASTERS!
>>
>>40980260
I manually tried to align a few clips, and there does not appear to be a constant offset for Rainbow Roadtrip or the FiM movie. I imagine some clever data finagling could be done to re-align the clips but it's beyond me at the moment. Also, the Rainbow Roadtrip voice clips in the master file seem to be polarity inverted relative to the copy we have.

Going to focus on modifying the code to work with the EQG data.
>>
>>40980260
>>40980649
Updated instructions:
0. Download new release https://github.com/effusiveperiscope/PonySorter-B/releases/tag/20240407updated
1. Copy the labels index from >>40977360 into the ponysorter directory and rename it to extras_labels_index.json
2. Copy extra_process_dumps from here >>40977357 into in_audio
3. In config.yaml, change index_file to point to extras_labels_index.json. Point master_file_1 and master_file_2 at the correct paths (master file 2 I think is not used atm but it would be if we got the movie audio working in the future). A sketch of the resulting file is below.
Lmk if there are issues
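For clarity, a sketch of what config.yaml might look like after step 3; only the keys named above are shown, and the paths are placeholders for wherever your files actually live:

# hypothetical config.yaml after step 3
index_file: extras_labels_index.json
master_file_1: /path/to/master_file_1
master_file_2: /path/to/master_file_2   # not used at the moment, per step 3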
>>
Casualties of mare
>>
Up.
>>
Pass the mare grenade
>>
>>40980710
Got an error.
>>
>>40982445
That's odd. I can't find anything substantial about this online. Try restarting? Have you made any modifications to audio hardware/drivers/system updates?
>>
>>40982570
>Redownload PonySorter several times
>Painstakingly triple-check all steps of the process
>Waste over an hour troubleshooting mystery issue
>"Just restart PC lmao"
>It works now
REEEEEEEEEEEE
>>
>>40982634
many such cases
>>
>>40982634
The magic of technology.
>>
File: 1703695423326963.gif (785 KB, 507x508)
>>
bump
>>
>>40982185
The holy mare grenade.
>>
>>40980649
I poked at this a little bit and the situation is even more odd for the movie--while I can get the Rainbow Roadtrip lines close enough to null out, I can't seem to get any of the audio clips for the movie in phase and I'm kind of suspecting that they are actually at very slightly different playback rates.

At this point I'm seriously considering using an AI STT with timestamps and just matching those against the manual annotations. Might also be helpful for Best Gift Ever?
>>
HyperTTS: Parameter Efficient Adaptation in Text to Speech using Hypernetworks
https://arxiv.org/abs/2404.04645
>Neural speech synthesis, or text-to-speech (TTS), aims to transform a signal from the text domain to the speech domain. While developing TTS architectures that train and test on the same set of speakers has seen significant improvements, out-of-domain speaker performance still faces enormous limitations. Domain adaptation on a new set of speakers can be achieved by fine-tuning the whole model for each new domain, thus making it parameter-inefficient. This problem can be solved by Adapters that provide a parameter-efficient alternative to domain adaptation. Although famous in NLP, speech synthesis has not seen much improvement from Adapters. In this work, we present HyperTTS, which comprises a small learnable network, "hypernetwork", that generates parameters of the Adapter blocks, allowing us to condition Adapters on speaker representations and making them dynamic. Extensive evaluations of two domain adaptation settings demonstrate its effectiveness in achieving state-of-the-art performance in the parameter-efficient regime. We also compare different variants of HyperTTS, comparing them with baselines in different studies. Promising results on the dynamic adaptation of adapter parameters using hypernetworks open up new avenues for domain-generic multi-speaker TTS systems.
https://github.com/declare-lab/HyperTTS
code was posted 10 months ago but the arxiv paper was just posted. hope the guy who does finetunes tries it out to see if it somehow is actually useful
>>
>>40984892
So unfortunately, a good 30% of the time Whisper doesn't seem to recognize Pinkie as an actual voice (understandable, since I don't think there are any voices like Pinkie Pie in open training datasets). Going to keep plugging and see how much of the data this actually affects.
>>
>>40985101
>whisper
might want to try out different models
https://huggingface.co/spaces/hf-audio/open_asr_leaderboard
>>
>>40985101
also a paper that while not directly applicable could actually be the path forward for unique voice acting asr
Transducers with Pronunciation-aware Embeddings for Automatic Speech Recognition
https://arxiv.org/abs/2404.04295
>This paper proposes Transducers with Pronunciation-aware Embeddings (PET). Unlike conventional Transducers where the decoder embeddings for different tokens are trained independently, the PET model's decoder embedding incorporates shared components for text tokens with the same or similar pronunciations. With experiments conducted in multiple datasets in Mandarin Chinese and Korean, we show that PET models consistently improve speech recognition accuracy compared to conventional Transducers. Our investigation also uncovers a phenomenon that we call error chain reactions. Instead of recognition errors being evenly spread throughout an utterance, they tend to group together, with subsequent errors often following earlier ones. Our analysis shows that PET models effectively mitigate this issue by substantially reducing the likelihood of the model generating additional errors following a prior one. Our implementation will be open-sourced with the NeMo toolkit.
https://github.com/NVIDIA/NeMo
>>
>>40985180
I'm not sure if there are word-level timestamps for the NVIDIA ones.
>>
Page 9 bump.
>>
>>40985372
>>40985101
I managed to get 94% of the lines "aligned" with a transcription on Rainbow Roadtrip with this method but the timestamps are all wrong because whisper_timestamped has a tendency to drift without voice activity detection (VAD). Unfortunately with VAD enabled it basically does not recognize Pinkie Pie as a speaking voice. So I ended up going back to using cross correlation, but with windowing around the expected time offset, and that seems to be promising.
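For anyone curious what windowed cross-correlation looks like in practice, a minimal sketch; this is not the actual PPPDataset code, and the names and window size are made up:

import numpy as np
from scipy.signal import correlate

def find_offset(episode, clip, sr, expected_start, search_s=5.0):
    # Search only a few seconds around where the master file says the line should be.
    lo = max(0, int((expected_start - search_s) * sr))
    hi = min(len(episode), lo + len(clip) + int(2 * search_s * sr))
    window = episode[lo:hi]
    corr = correlate(window, clip, mode="valid")
    best = int(np.argmax(np.abs(corr)))   # abs() also catches polarity-inverted copies
    return (lo + best) / sr               # best-matching start time in seconds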
>>
>>40965450
The end reminds me of a car hitting hard on the brakes.
>>
>>40986757
>>40982634
Okay, here is an updated extra_labels_index.json with new timestamps that match the provided audio: https://files.catbox.moe/p57n0j.json

There is one caveat in that all of the HH_MM_SS timestamps in the file names become inaccurate. In this new release (remember to copy your config over):
https://github.com/effusiveperiscope/PonySorter-B/releases/tag/20241004updated

The timestamps for the -exported- file names (as well as for the label file) are based off the "actual" timestamp.

For anyone who cares, here is the code I used for alignment:
https://github.com/effusiveperiscope/PPPDataset/blob/main/data_realigner.ipynb
And pic related is a time offset graph for the FiM movie (you can see the graph for Rainbow Roadtrip in the notebook). Seems to be pretty consistent with a different playback rate (the x axis is line index, so it's not exactly linear with respect to time but it would be pretty close). OTOH, the Rainbow Roadtrip time offset graph is very clearly stepwise, probably due to different commercial breaks.
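As an aside, if the offset really does grow linearly with position, the playback-rate mismatch can be recovered from a straight-line fit; a rough sketch under that assumption, with hypothetical names and toy data:

import numpy as np

def estimate_rate(reference_times, offsets):
    # If a line at reference time t lands at t/r + b in the other copy, then
    # offset(t) = t*(1/r - 1) + b, so the slope of a linear fit gives r.
    slope, intercept = np.polyfit(reference_times, offsets, 1)
    return 1.0 / (1.0 + slope), intercept

t = np.linspace(0, 5000, 200)             # toy reference times (seconds)
fake_offsets = t * (1 / 1.001 - 1) + 2.0  # copy running 0.1% fast with a 2 s head start
print(estimate_rate(t, fake_offsets))     # ~ (1.001, 2.0)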
>>
everyone is a fucking rPHetard o2vwn this forum
>>
>>40987180
everyone is a fucking retard here including me
>>
>>40987180
Can you translate that to english?
>>
>>40987180
how fat are your fingers dude
i cannot possibly conceive fingers fat enough to hit the keys that you did
at this point i just feel bad for you
>>
>>40987618
it might be that ogre poster with a caged onion from a decade ago
>>
>>40987642
it completely blows my fucking mind. it honestly does. i can't believe it.
>rPHetard
>o2vwn
>>40987180
please, please, think about this, please. i'm begging you to think about this. please.
>>
save
>>
>>40951924
There is now an open source reproduction of that paper:
https://github.com/huggingface/parler-tts
https://huggingface.co/spaces/parler-tts/parler_tts_mini

It takes two text inputs: the text to say and a prompt describing the speaker, speaking rate, recording environment, audio quality, etc.
All datasets, code (including train/finetune), and models will be released under a permissive license. So far, they've released a 600M model trained on 10.5k hours. They're working on scaling it to 50k hours.
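For anyone who wants to try it locally, a sketch following the repo's documented quickstart at the time of posting; the checkpoint name and exact API are taken from the README and should be treated as assumptions that may have changed:

import torch
import soundfile as sf
from transformers import AutoTokenizer
from parler_tts import ParlerTTSForConditionalGeneration

device = "cuda:0" if torch.cuda.is_available() else "cpu"
repo = "parler-tts/parler_tts_mini_v0.1"   # assumed checkpoint name

model = ParlerTTSForConditionalGeneration.from_pretrained(repo).to(device)
tokenizer = AutoTokenizer.from_pretrained(repo)

text = "The Pony Preservation Project is still going."
description = "A cheerful, high-pitched female voice speaking quickly with very clear audio."

input_ids = tokenizer(description, return_tensors="pt").input_ids.to(device)
prompt_ids = tokenizer(text, return_tensors="pt").input_ids.to(device)

audio = model.generate(input_ids=input_ids, prompt_input_ids=prompt_ids)
sf.write("parler_out.wav", audio.cpu().numpy().squeeze(), model.config.sampling_rate)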
>>
>>40988141
hmm, it's not exactly the same thing; it's more like a nicer TTS tool minus the option to choose an exact specific voice
>ask for angry female
>it just talk faster
Other than that, it's nice to see alternatives. Sadly, "singing" and "musical" are not tags that affect the output in any way; HOWEVER, there is always hope that someone out there can train a new, upgraded model.
I still think it will take a year for someone to recreate Suno AI, and once that happens we will see entire mare albums being made each month.
>>
>>40965026
>entire point of software is to generate something using something as a base of inspiration
>somehow this is a copyright issue
Damn, I guess I can't start an Italian symphonic power metal band for fear of it being a "copyright infringement" on Rhapsody of Fire's existence.
>>
File: 110424 aud.png (101 KB, 810x437)
I can't wait for better tools to be made soon, since fixing the strange filter effect is not possible and the lines need to be re-sung.
>>
>>40987618
Or maybe he was drunk as fuck.
>>
>amre at 10
>>
>>40990187
Shamefur dispray.
>>
>>40988141
Interesting. I wonder how far the natural language description can be pushed. Spitballing here: one unique feature of using audio from a TV show is that if we had a show pseudo-script for the episodes (i.e. describing what is going on between the characters) it might be possible to generate synthetic descriptions for each speaking line that could give much more granular control over delivery. I'm also interested if anyone comes up with a solution for long-form inference with each line conditioned on the last, since that would be a natural fit for show audio too.
>>
>>40990187
Almost again.
>>
CoVoMix: Advancing Zero-Shot Speech Generation for Human-like Multi-talker Conversations
https://arxiv.org/abs/2404.06690
>Recent advancements in zero-shot text-to-speech (TTS) modeling have led to significant strides in generating high-fidelity and diverse speech. However, dialogue generation, along with achieving human-like naturalness in speech, continues to be a challenge in the field. In this paper, we introduce CoVoMix: Conversational Voice Mixture Generation, a novel model for zero-shot, human-like, multi-speaker, multi-round dialogue speech generation. CoVoMix is capable of first converting dialogue text into multiple streams of discrete tokens, with each token stream representing semantic information for individual talkers. These token streams are then fed into a flow-matching based acoustic model to generate mixed mel-spectrograms. Finally, the speech waveforms are produced using a HiFi-GAN model. Furthermore, we devise a comprehensive set of metrics for measuring the effectiveness of dialogue modeling and generation. Our experimental results show that CoVoMix can generate dialogues that are not only human-like in their naturalness and coherence but also involve multiple talkers engaging in multiple rounds of conversation. These dialogues, generated within a single channel, are characterized by seamless speech transitions, including overlapping speech, and appropriate paralinguistic behaviors such as laughter.
https://www.microsoft.com/en-us/research/project/covomix/
Obviously from Microsoft, so as usual for their voicegen stuff, no weights. Pretty good though, judging from the examples.
>>
Posting this from wsg

https://github.com/huggingface/parler-tts
>>/wsg/5503058
I discovered an AI program called Resemble Enhance. I used it to clean the vocals I downloaded from YouTube, then uploaded the cleaned vocals to ElevenLabs. Here are the results.
Franklin D. Roosevelt
https://vocaroo.com/1gSsiKrYtjkA
Harry S. Truman
https://vocaroo.com/1nEqj1b4XOaV
Jackson Beck (Superman announcer)
https://vocaroo.com/1kgRi6ICDNCd
Mr. Delicious
https://vocaroo.com/1ggEgkhUPRRk
I overlaid the FDR, Harry Truman, and Jackson Beck audio files with a song called Spidey Meets His Girl via kdenlive
https://m.soundcloud.com/udi-harpaz-composer/spidy-meets-his-girl?in=udi-harpaz-composer%2Fsets%2Fspiderman-by-udi-harpaz
Then I uploaded the kdenlive output WAV files to Resemble Enhance to remove the hissing and scratching sounds. It can also remove reverb this way. Here's an example.
Before:
https://vocaroo.com/1kvyT6Wh8A2v
After:
https://vocaroo.com/1i7q7woh25jt
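For anyone wanting to run the cleanup step offline: Resemble Enhance is pip-installable and ships a command-line tool that processes a whole folder at once. I'm going purely from memory of the repo README here, so double-check the exact command name and flags there before running anything.

pip install resemble-enhance
# denoise + enhance every file in input_dir, write results to output_dir
resemble-enhance input_dir output_dir
# denoise only (skip the enhancement stage), if that flag is still supported
resemble-enhance input_dir output_dir --denoise_only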
>>
>>40991829
>https://github.com/resemble-ai/resemble-enhance
shit, wrong github link
>>
So there is now a copycat of 'suno', called 'udio', that does exactly the same thing.
>free beta
I'm guessing it will go the full AI Dungeon Mormon route and try to get people hooked, then slam them with a $15+ subscription or some other bullshit. Additionally, there are still no model weights included, so it goes straight into the gay and fake category like the rest of those subscription services.
>>
File: OIG1.H8Cktot.hP.ItFdkf55i.jpg (165 KB, 1024x1024)
165 KB
165 KB JPG
>>
>>40993044
Artificial lewd is still lewd.
>>
>>40993206
But what if he generated artificial consent along with the artificial mare?
>>
I wanna make an RVC of a vocaloid, but the rawest form of audio I have is WAV samples of all the phonemes for that vocaloid. Chopped up. In other words, it's not really singing, more just chopped-up samples. I've tried to put them all together in one audio file, but the end result is choppy. When I use them separately from each other, the result is a hot mess. If only I could just... have the voice sing naturally for the input audio, but it's a vocaloid, and the raw samples are very short, like the longest is one second long. I'm at a loss and don't know what to do. Any help to do this right would be massively appreciated.
>>
>>40994089
This may not be the best approach, but wouldn't it make sense to look for songs featuring that Vocaloid and run one of the vocal remover programs on them to get as close to "raw" audio output as possible, then place those clips in Audacity both to chop them up into workable ~10-second clips and to pick out the better-sounding ones (to separate the wheat from the chaff)?
I am not as familiar with the gaming side of Vocaloid, but I would imagine there are at least some games that use the officially approved voices, so maybe it would be worth scouting out some forums for raw audio extracted from them?
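If chopping things up in Audacity by hand gets tedious, the rough 10-second split can be automated; a minimal pydub sketch, assuming WAV input and made-up file names (you'd still audition the clips and throw out the bad ones yourself):

from pydub import AudioSegment

vocals = AudioSegment.from_file("separated_vocals.wav")  # output of the vocal remover
chunk_ms = 10_000  # ~10-second clips

for i, start in enumerate(range(0, len(vocals), chunk_ms)):
    clip = vocals[start:start + chunk_ms]
    clip.export(f"clip_{i:03d}.wav", format="wav")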
>>
>>40994089
>>40994108
I wonder if the artists would be willing to share the raw vocaloid data for their songs or other projects? Would probably be the best way to get clean speaking/singing audio if you're not skilled at making your own.
>>
>>40994195
On some rare occasions there are people willing to share raw project files, so that's not completely impossible. It all depends on how much time you are willing to spend getting the best-quality dataset versus quickly grabbing lower-quality clips and starting training immediately.
>>
>>40987060
Got an error while trying to load the fim movie:
>Signature fim_movie_demu1 from audio but no corresponding label
Not sure what the "label" is supposed to be.

>timestamps in the file names become inaccurate
Not a big deal, they're mostly there for human readability and to prevent duplicate filenames. Shouldn't have any impact on the AI side of things.
>>
>>40994282
Did you preserve the config settings (e.g. it has to point at extra_labels_index.json instead of episodes_labels_index.json)?
>>
>>40994312
Yeah, config looks correct.
>>
>>40960343
https://www.youtube.com/watch?v=13IW_KDCkJo
It concerns me how one of my favorite songs from my teen years has come to describe 4chan and society as a whole when it comes to anything but contemporary politics and drama of any variety.
It's impossible, but I'd find it interesting to have a mare-voiced cover of this.
>>
Same anon as the one who asked about the RVC stuff here.

I remember there was someone who was able to make a unique sound to a vocaloid and made them sound like a human singer through RVC. All I know is that they used the vocaloid samples for it, and I know nothing else of the process. They have since deleted the video showcasing the results, but it still intrigued me. I want to make a unique sound for the vocaloid to sound more human, not just a vocaloid ported to RVC. Are my ambitions too high here? Is this even possible?
>>
>>40994335
I don't know how this occurred. My only guess is that somehow I managed to upload a non-updated version of the executable here >>40987060. Deleted the old release and trying this again.
https://github.com/effusiveperiscope/PonySorter-B/releases/tag/20241204updated
>>
>>40994365
Got the same error again. Could you tell me what "label" it's looking for? Perhaps I've just got something in the wrong place.
>>
>>40994363
Same anon, and I think I was misunderstood. I don't mean the raw samples from a vocaloid singing in the editor... I mean the literal raw samples that were used for recording the vocaloid, which are, surprisingly enough, VERY easy to extract from the vocaloid itself.

https://voca.ro/1djZ7LBMphZ5

Above is all of Miku's raw samples compiled together. This is what I'm talking about when I mean "samples".
>>
>>40994416
I can't reproduce this error. The "label" is supposed to be from the labels index. However the signature should not even be "fim_movie_demu1". How is your in_audio structured?
>>
>>40994509
>>40994416
Actually, can you post the full traceback + your current save file? I think I might know what is happening.
>>
>>40994521
https://files.catbox.moe/q017lo.json

Full traceback:
pygame 2.5.2 (SDL 2.28.3, Python 3.10.13)
Hello from the pygame community. https://www.pygame.org/contribute.html
[2024-04-12 21:54:59,786] [INFO] Loading project P:/PVPP/Tools/Pony Sorter 5c/EQG.json
[2024-04-12 21:55:00,095] [INFO] Loading fim_rainbow roadtrip
orig 3632.7363125
demu0 3632.7363125
demu1 3632.7363125
[2024-04-12 21:55:18,574] [INFO] Loaded fim_rainbow roadtrip
[2024-04-12 21:55:20,353] [INFO] Loaded P:/PVPP/Tools/Pony Sorter 5c/EQG.json
[2024-04-12 21:55:26,361] [INFO] Loading fim_movie_demu1
[2024-04-12 21:55:26,437] [WARNING] Signature fim_movie_demu1 from audio but no corresponding label
Traceback (most recent call last):
File "gui.py", line 496, in load_selection
File "gui.py", line 428, in load_selection
File "core.py", line 140, in load_sig
TypeError: 'NoneType' object is not iterable
>>
>>40994533
Why is fim_movie only 418.0 MB?
>>
File: 3098740.jpg (22 KB, 290x292)
22 KB
22 KB JPG
>>40994545
Because it only has fim_movie_demu1 in it. That's probably the issue.
>>
>>40994545
>>40994561
I put the original audio in there and it works now. Got a "hash mismatch" warning, not sure how significant that is. Audio seems to play fine.
>>
>>40994585
A hash mismatch warning may indicate that a file was only partially downloaded/uploaded
>>
>10
>>
File: 1159369.png (501 KB, 1279x717)
501 KB
501 KB PNG
>>40995125
>10, mares edition
>>
>>40996134
>Not showing 10 mares
1 job
>>
>>40994597
Finished all the reviewing, but now got a new error on trying to export:

Savefile - https://files.catbox.moe/2a39wk.json

Terminal:
pygame 2.5.2 (SDL 2.28.3, Python 3.10.13)
Hello from the pygame community. https://www.pygame.org/contribute.html
[2024-04-13 13:36:11,278] [INFO] Loading project P:/PVPP/Tools/Pony Sorter 5c/EQG.json
[2024-04-13 13:36:11,576] [INFO] Loading eqg_better together_s02e04
orig 177.6775
demu0 177.6775
demu1 177.6775
[2024-04-13 13:36:12,584] [INFO] Loaded eqg_better together_s02e04
[2024-04-13 13:36:14,319] [INFO] Loaded P:/PVPP/Tools/Pony Sorter 5c/EQG.json
['eqg_better together_s02e04', 'eqg_better together_s02e05', 'eqg_better together_s02e06', 'eqg_better together_s02e07', 'eqg_dance magic', 'eqg_forgotten friendship', 'eqg_friendship_games', 'eqg_legend_of_everfree', 'eqg_mirror magic', 'eqg_movie magic', 'eqg_rollercoaster of friendship', 'fim_rainbow roadtrip', 'fim_movie']
[2024-04-13 13:36:31,004] [INFO] Processing eqg_better together_s02e04
Traceback (most recent call last):
File "gui.py", line 104, in export_all_audio
File "core.py", line 277, in export_audio
File "utils.py", line 55, in path_reparse
File "utils.py", line 44, in label_reparse
File "utils.py", line 37, in convert_decimal_seconds_to_hh_mm_ss
TypeError: unsupported operand type(s) for /: 'str' and 'int'
>>
>>40996631
Fixed
https://github.com/effusiveperiscope/PonySorter-B/releases/tag/20241304updated
View stats should also work now
>>
bmup
>>
File: OIG1.vy9OgD8DBpp9zsAjQoVV.jpg (193 KB, 1024x1024)
193 KB
193 KB JPG
>>
https://youtu.be/Tk8tDpweJB4?si=lbIcDfJw1hE3ABk2
Just got recommended this. Has anyone made an AI version yet?
>>
Up.
>>
File: check.png (290 KB, 3000x3000)
290 KB
290 KB PNG
>>40921076
The master file has now been updated with the newly cleaned data. If you use the data in the master file, you should re-download the whole thing. Also give it a quick look-over to ensure nothing's missing.
https://mega.nz/folder/jkwimSTa#_xk0VnR30C8Ljsy4RCGSig (same link as before)

I still have a local copy of the old version that I'll hold onto for a while, just in case something goes wrong with the new version.

>>40997261
episodes_labels_index_updated:
FiM - https://files.catbox.moe/4vscco.json
EQG + movie - https://files.catbox.moe/8kp84w.json
These should cover all of the reviewed data. Can you use them to make a graph showing the overall changes in clean/noisy/very noisy?
>>
>>40999872
Oh boy, there is just enough data to train an Iron Will RVC model on the clean + noisy files.
>>
>>40994436
Same anon here. Sorry if I'm being intrusive, I just thought that with this being an audio centered thing, people could help. If I can't be helped here, where could I go to get the assistance I need for my issue?
>>
>>41000255
How have the previous answers to your questions not been able to address your issue?
>>40994195
>>40994213
>>
>>41000300
Because they're talking about samples as in RENDERS of the vocaloids, and not the actual extracted vocal samples used to record the vocaloid in the first place. If that makes sense.
>>
>>41000466
Why do you want to train off of extracted vocal samples rather than renders?
>>
>>40999872
There seem to be a few gaps in the data, so I am going to have to check things over.
>>
>>41000507
First gap: s1e1 to s1e16 (consistent with the "two different save files" problem we had earlier)
Second gap: s5 (the labels index you posted for s5 seems to only contain modified data for s4). However, this does not seem to affect the actual audio files. In any case I will be downloading the actual master file so I can check against that as well.
>>
>>41000466
I think that's a lot harder (impossible?) to achieve with just a model. The model would have no real context on how each sample should flow into each other to sound natural.
>>
>>41000220
https://files.catbox.moe/aokhto.mp3
https://huggingface.co/Amo/RVC_v2_GA/tree/main/models/MLP_Iron_Will_GA
2m8s of audio (all of the clean lines and most of the noisy ones) was used to train this Iron Will RVC model. I pushed the training to run for 350, as I feel the extra steps help when the dataset is this short.
Now it's time to lurk for other obscure background ponies that may have gained additional clean lines for producing training datasets.
>>
>>41000730
Harder/impossible? Is there any way to make them smoother somehow? Or anything like that? Something that makes them less choppy? The reason I insist on using the samples is that they're the audio files with no vocaloid engine noise in them. No matter what render I use, it'll always have vocaloid engine noise in at least some capacity.
>>
Did this masterpiece ever get remade with better models?
https://vocaroo.com/kwLL9EyTbAQ
>>
>>41000960
Sorry, not really sure how you'd accomplish that with AI. Maybe you could manually do something with the audio files, map them onto the vocaloid instructions, and train on those outputs?
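If the choppiness comes from hard cuts between the one-second samples, a short crossfade when stitching them together might at least soften the seams before feeding the result to RVC. A minimal pydub sketch, with hypothetical file paths; it won't make the result sound like natural singing, but it should remove the clicks between samples:

import glob
from pydub import AudioSegment

paths = sorted(glob.glob("miku_samples/*.wav"))  # the extracted phoneme samples
stitched = AudioSegment.from_wav(paths[0])
for path in paths[1:]:
    # 30 ms crossfade between consecutive samples (must be shorter than the clips themselves)
    stitched = stitched.append(AudioSegment.from_wav(path), crossfade=30)
stitched.export("stitched_samples.wav", format="wav")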
>>
>>41001070
I don’t know if that would work, nor where to start. The weirdest thing is that some people have done it before, and done it considerably well, to the point there’s a very human-like and dynamic nature of the voice not present in the samples for the RVC models. The only thing that’s wrong here is that unfortunately, they don’t make their methods public. So anyone interested in replicating it is essentially scrambling around for answers and leads.
>>
>>40966045 Since there has been no response to this post for over a week, I guess the idea can be considered ded.
https://suno.com/song/7a5c06b1-dafa-4717-967f-6416a3de2b0a
https://vocaroo.com/1iNd7HL85PoL
Here is a song inspired by the greentext (37557739) of Anon's dream about a mare with the stage name Black Sunset singing a power metal song in some indie Equestria band.
>catbox is temporarily ded
Other than that, I can confirm that the Suno devs are starting to behave a little more kosher: last week one could easily generate a 2m+ song, and now the song generator has only given me a 1m20s output.
>>
>>41000966
Not as far as I know.
>>
File: final cns split.png (117 KB, 4800x3000)
117 KB
117 KB PNG
>>41000580
>>41000507
>>40999872
Here is the "corrected" version of the episodes_labels_index.json for the FiM files: https://files.catbox.moe/g0mg81.json

From analysis of the master file there were no skipped FiM episodes aside from the ones that are supposed to be skipped (special source/outtakes), at least according to the generated labels, and all lines inside the FiM label files correspond to an existing file in the master file. *However*, the current Label Files directory does not contain label files for EQG/movies/specials, nor the older label files for audio we didn't handle, like songs.txt, mobile game, and Other.

Over the entire observed data: Of 58578 total lines, 14927 noise ratings were modified (~25%). The clean/noisy/very noisy split was 17353/19629/21596 prior, 21599/22475/14504 post.
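For anyone who wants to re-plot that shift themselves, the numbers above drop straight into a grouped bar chart; a minimal matplotlib sketch:

import matplotlib.pyplot as plt

categories = ["Clean", "Noisy", "Very Noisy"]
before = [17353, 19629, 21596]  # split prior to the review pass
after = [21599, 22475, 14504]   # split after the review pass

x = range(len(categories))
width = 0.35
plt.bar([i - width / 2 for i in x], before, width, label="Before")
plt.bar([i + width / 2 for i in x], after, width, label="After")
plt.xticks(list(x), categories)
plt.ylabel("Number of lines")
plt.title("Noise ratings before/after review (58578 lines)")
plt.legend()
plt.savefig("cns_split.png", dpi=150)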
>>
File: 1702557325855805.gif (71 KB, 216x195)
71 KB
71 KB GIF
>>
>>41001932
Not sure if this is possible, but could you create a text file that lists a comparison of all characters' lines in hours/minutes/seconds? E.g.:
Twilight C:3 hours N:2 hours VN: 1 hour
Fluttershy C:2 hours N: 45 minutes VN: 20 minutes
Minuette C: 1 minute N: 25 second VN: 10 second
>>
Page 10 bump.
>>
>>41002579
Would a CSV be better? Here is a CSV of the Clean/Noisy/Very Noisy split in seconds, ordered by Clean+Noisy, calculated across the FiM episodes + EQG data + movie and specials. Duration is calculated from the label timestamps.
https://files.catbox.moe/xu0age.csv
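For anyone who wants to regenerate or tweak this kind of breakdown, here's a rough sketch of how the durations can be summed per character from the label files. The label-file directory, the whitespace layout (start, end, label), and the assumption that an empty noise field means "clean" are all guesses based on the examples posted in this thread, so adjust them to match the actual Master File.

import csv
import glob
from collections import defaultdict

totals = defaultdict(lambda: {"Clean": 0.0, "Noisy": 0.0, "Very Noisy": 0.0})

for path in glob.glob("Sliced Dialogue/Label files/*.txt"):
    with open(path, encoding="utf-8") as f:
        for line in f:
            fields = line.rstrip("\n").split(None, 2)  # start, end, label
            if len(fields) < 3:
                continue
            start, end, label = fields
            # label layout: HH_MM_SS_Character_Emotion_Noise_transcript
            parts = label.split("_", 6)
            if len(parts) < 7:
                continue
            character = parts[3]
            noise = parts[5] or "Clean"  # assumption: empty noise field = clean
            if noise in totals[character]:
                totals[character][noise] += float(end) - float(start)

with open("character_durations.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    writer.writerow(["Character", "Clean (s)", "Noisy (s)", "Very Noisy (s)"])
    for character, t in sorted(totals.items(), key=lambda kv: -(kv[1]["Clean"] + kv[1]["Noisy"])):
        writer.writerow([character, round(t["Clean"], 1), round(t["Noisy"], 1), round(t["Very Noisy"], 1)])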
>>
>>41003547
cheers, this format is much better
>all of FiM has 250 spoken characters
Huh, that's about the same number as I would expect of NPCs in a pretty decent RPG.
>>
>>41001932
>all lines inside the FiM label files correspond to an existing file in the master file.
Does that mean the missing stuff from >>41000580 can be disregarded, or is there something still to fix there?

>the current Label Files directory does not contain label files
Fixed. Forgot to upload the old unchanged labels earlier.

>pic
Looks like a worthwhile improvement, especially considering that a lot of the "noisy" lines are now less noisy.
>>
>>41003813
Some one-offs were not annotated I think.
>>41003960
>Does that mean the missing stuff from >>41000580 can be disregarded, or is there something still to fix there?
Yes, that was just inside the label index.
>>
Up from page 9.
>>
>>41004630
from 10
>>
>>41005291
Not so good.
>>
File: 6d42371089b9046c.png (484 KB, 720x720)
484 KB
484 KB PNG
So what is everyone working on? I'm cracking my head trying to figure out some punk song lyrics from the perspective of Octavia.
>>
https://huggingface.co/spaces/Xenova/musicgen-web
Not new, but here's a web version for that one guy who seems to like MusicGen.
>>
>>41006687
Trying to generate 20s of music makes the generator shit itself, and the 10s samples are a bit meh (I remember using the offline version a year ago and it worked much better then).
>>
>>41001527
need more pony power metal
>>
File: OIG4.0KTmUS14y4ixYljSmBNN.jpg (143 KB, 1024x1024)
143 KB
143 KB JPG
>>
>>41007990
A lot of artificial booping happening here lately.
Not that I'm complaining.
>>
>>40999872
I uploaded a copy of the Master File here: https://drive.google.com/drive/u/2/folders/1ho2qhjUTfKtYUXwDPArTmHuTJCaODQyQ
Sort of. I fixed one typo: in Special Source, one of the folders says "Freindship" instead of "Friendship". It should technically also be "Rollercoaster" rather than "Roller Coaster". I plan to do a more thorough check tomorrow for typos and potentially other errors.

The Special Source/Luster Dawn folder is empty. Is it supposed to be?
>>
>>41007304
That would be nice, and to tell the truth I look forward to something that isn't just another wubstep remix.
Now this makes me wonder: what is the rarest type of pony musical genre? Metal, jazz and ska all seem pretty rare.
>>
Long-form music generation with latent diffusion
https://arxiv.org/abs/2404.10301
>Audio-based generative models for music have seen great strides recently, but so far have not managed to produce full-length music tracks with coherent musical structure. We show that by training a generative model on long temporal contexts it is possible to produce long-form music of up to 4m45s. Our model consists of a diffusion-transformer operating on a highly downsampled continuous latent representation (latent rate of 21.5Hz). It obtains state-of-the-art generations according to metrics on audio quality and prompt alignment, and subjective tests reveal that it produces full-length music with coherent structure.
https://stability-ai.github.io/stable-audio-2-demo/
https://github.com/Stability-AI/stable-audio-tools/
stable audio 2 paper. still no weights
>>
>>41009282
Basically anything that isn't EDM, hyperpop, or dubstep. Swing music that's actual swing music, and not "electroswing" ("yeah, let's just take this 1940s music and ruin it with bassy electric drums, then ruin it further by chopping it up so it repeats the same little bit, and to top it off let's master it so the electric drum beat is louder than the actual music!"), is also quite rare.
>>
>>41009637
out of curiosity what is your opinion on https://youtu.be/N7wTBGP4UFs?si=W4w_mSo6vQj9QpwD?
>>
https://huggingface.co/spaces/pyp1/VoiceCraft_gradio
webui for voicecraft
>>
>>41009954
input
https://vocaroo.com/1nItsHLVEtqp
output
https://vocaroo.com/18qK1E6F1CnJ
Now this may create some possibilities in the future; however, they would need to improve the output quality, since 16000Hz sucks ass pretty hard.
>>
>>41008753
>Freindship
>Roller Coaster
Fixed. Just simple typos.

>Special Source/Luster Dawn folder is empty
My records show the same empty folder existed in the previous master file version, so not a new issue that's suddenly been introduced. Likely a temp folder from the first pass at processing the S9 leaks that I forgot to delete.
>>
File: OIG3.f9imj0ENUK7Fg6HBW5Za.jpg (104 KB, 1024x1024)
104 KB
104 KB JPG
>>
File: 48823546.21_image.png (386 KB, 637x637)
386 KB
386 KB PNG
Messing around with sovits4 again; how feasible would it be to try and recreate all the vocal effects Taylor uses in Blank Space? Here is a small example of what I've got so far, using Dash's voice to replace the vocals.
https://files.catbox.moe/0etyeh.mp3
I don't even know what vocal effects are used in Blank Space.
>>
>>41010125
>Sliced Dialogue/Label files/s01e07_Music.txt
Everything else uses a lowercase '_music', but this one uses an uppercase M.
>Sliced Dialogue/EQG/EQG Roller Coaster of Friendship
Should be "EQG Rollercoaster of Friendship" to match the updated name in Special Source.

Some transcripts use "Flurry Heart", some use "Flurry".
>./fim_s09e01_original.txt:940.685169 941.573000 00_15_41_Flurry Heart_Anxious_Very Noisy_mama!
>./s09e01.txt:940.685169 941.573000 00_15_41_Flurry Heart_Anxious_Noisy_mama!
>./fim_s09e01.txt:940.685169 941.573000 00_15_41_Flurry Heart_Anxious_Very Noisy_mama!
>./s09e01_demu0.txt:940.685169 941.573000 00_15_41_Flurry Heart_Anxious_Noisy_mama!
>./fim_s07e03.txt:683.600000 684.812919 00_11_24_Flurry_Happy_Very Noisy_ta da!
>./s07e03.txt:683.600000 684.812919 00_11_24_Flurry_Happy_Very Noisy_ta da!
>./fim_s07e03_original.txt:683.600000 684.812919 00_11_24_Flurry_Happy_Very Noisy_ta da!
>./s07e03_master_ver.txt:683.600000 684.812919 00_11_24_Flurry_Happy_Very Noisy_ta da!

Typo (Applejack223 should be Applejack):
>./s01e18.txt:5.484631 7.017492 00_00_05_Applejack223_Neutral_Noisy_we're almost there, youngins!
>./s01e18_demu1.txt:5.484631 7.017492 00_00_05_Applejack223_Neutral_Noisy_we're almost there, youngins!

Typo (Mrs.Cake should be Mrs. Cake, with a space for consistency with other labels)
>s02e10.txt:366.058770 370.764964 00_06_06_Mrs.Cake_Happy_Noisy_When we found out it was your birthday, we couldn't resist trying out a new recipe!
>fim_s02e10_original.txt:366.058770 370.764964 00_06_06_Mrs.Cake_Happy_Noisy_When we found out it was your birthday, we couldn't resist trying out a new recipe!
>fim_s02e10.txt:366.058770 370.764964 00_06_06_Mrs.Cake_Happy_Noisy_When we found out it was your birthday, we couldn't resist trying out a new recipe!
>s02e10_demu0.txt:366.058770 370.764964 00_06_06_Mrs.Cake_Happy_Noisy_When we found out it was your birthday, we couldn't resist trying out a new recipe!

Typo (Jet Set has an extra space at the end of his name)
>./fim_s02e09_original.txt:144.263221 147.820380 00_02_24_Jet Set _Neutral__We saw you from across the cafe and just had to find out.
>./s02e09_master_ver.txt:144.263221 147.820380 00_02_24_Jet Set _Neutral__We saw you from across the cafe and just had to find out.
>./s02e09.txt:144.263221 147.820380 00_02_24_Jet Set _Neutral__We saw you from across the cafe and just had to find out.
>./fim_s02e09.txt:144.263221 147.820380 00_02_24_Jet Set _Neutral__We saw you from across the cafe and just had to find out.
>>
>>41010125
>>41011157
In Rainbow Roadtrip, there are some weird "Si" emotion tags.
>00_22_01_Mayor Sunny Skies_Sad Si_Noisy_then fences went #up#, we lost track of our neighbors, each year passing dimming spirits all #around#.txt
>00_22_09_Mayor Sunny Skies_Sad Si_Very Noisy_the happy days came to an #end#, and nopony had time to spend to gather in the #town#.txt
>00_22_18_Mayor Sunny Skies_Sad Si_Very Noisy_i thought i knew exactly what the #festival# #needed#.txt
>00_22_26_Mayor Sunny Skies_Sad Si_Very Noisy_a bigger better #rainbow#, would surely make them see #it#.txt
>00_22_35_Mayor Sunny Skies_Sad Si_Very Noisy_but the extra magic was too much for the #rainbow# #generator#.txt
>00_22_42_Mayor Sunny Skies_Sad Si_Very Noisy_and i'm the one who brought the #rainbow#, to an end.txt
>00_22_54_Mayor Sunny Skies_Sad Si_Noisy_that's how our #town#, our little pony #town#, that's how our town saw the #end#, of the #rainbow#.txt

For consistency, the character name should be "Petunia Petals" rather than "Petunia" in Rainbow Roadtrip:
>00_42_27_Petunia_Neutral_Very Noisy_not all of it was.txt

And "Mr. Moody Root" should be "Moody Root":
>00_38_40_Mr. Moody Root_Annoyed_Very Noisy_who wants to know_.txt
>>
I pushed the code I'm using to validate the Master File here:
https://github.com/synthbot-anon/horsefm-lib
>>
>Page 10
>>
>9 bump
>>
VASA-1: Lifelike Audio-Driven Talking Faces Generated in Real Time
https://arxiv.org/abs/2404.10667
>We introduce VASA, a framework for generating lifelike talking faces with appealing visual affective skills (VAS) given a single static image and a speech audio clip. Our premiere model, VASA-1, is capable of not only producing lip movements that are exquisitely synchronized with the audio, but also capturing a large spectrum of facial nuances and natural head motions that contribute to the perception of authenticity and liveliness. The core innovations include a holistic facial dynamics and head movement generation model that works in a face latent space, and the development of such an expressive and disentangled face latent space using videos. Through extensive experiments including evaluation on a set of new metrics, we show that our method significantly outperforms previous methods along various dimensions comprehensively. Our method not only delivers high video quality with realistic facial and head dynamics but also supports the online generation of 512x512 videos at up to 40 FPS with negligible starting latency. It paves the way for real-time engagements with lifelike avatars that emulate human conversational behaviors.
https://www.microsoft.com/en-us/research/project/vasa-1/
From Microsoft, so no weights ever, I'm sure. Obviously it's human faces, but just posting for the future potential of using such tech for animated faces.
>>
Had a thought yesterday and figured I'd ask here.
Most AI work seems to be done on either Windows or Linux, and seems to be 'shackled', for lack of a better word, to the limitations of those operating systems.
What would have to be done to build a brand new OS from the ground up specifically for AI development/training/operation, and would a dedicated OS result in any noticeable or worthwhile improvement?
>>
File: terri 1665282602622.png (809 KB, 2310x2147)
809 KB
809 KB PNG
>>41012620
>making custom OS for pony ai development
Technically doable, but I do not think that even with all the tulpafags on the board we have enough schizo power to make that a reality.
>>
>>41011157
Changed "Flurry Heart" to just "Flurry" for s9e1. Left s7e3 version as-is.
Fixed the "Applejack223" typo.
Fixed the "Mrs.Cake" typo.
Fixed the "Jet Set _" typo.

>>41011159
Changed all "Si"s to "Singing".
Fixed "Petunia" typo.
Fixed "Mr. Moody Root" typo.

That should cover everything.
>>
Late night preservation post.
>>
>>41012620
Compatibility with all other AI research is the highest priority for speeding up development, and creating a new OS would make that essentially impossible.
The OS is mostly a pass-through for all of the software that runs on it. It's not much of a bottleneck, especially on Linux, and especially with docker.
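To illustrate the "Linux + Docker" point: spinning up a throwaway GPU training environment is a one-liner once the NVIDIA Container Toolkit is installed, which is a big part of why nobody bothers with a dedicated OS. The image name here is just an example; any CUDA-enabled image works the same way.

# mount the current directory and get an interactive shell with GPU access
docker run --gpus all -it --rm -v "$PWD":/workspace -w /workspace pytorch/pytorch:latest bash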
>>
>>41012699
The FiM folder seems to have a copy of the Special Source EQG Short folder. Otherwise, that seems to be everything for now.
The updated Gdrive clone is here: https://drive.google.com/drive/folders/1ho2qhjUTfKtYUXwDPArTmHuTJCaODQyQ?usp=drive_link
I uploaded a copy of the dialogue dataset to HuggingFace as well: https://huggingface.co/datasets/synthbot/pony-speech
I updated horsefm-lib with code for pushing the data to HuggingFace: https://github.com/synthbot-anon/horsefm-lib/blob/main/notebooks/export_data.ipynb

I realized that my clone of Master File 2 is outdated. I'm fixing that now. I'll check that for errors too.
>>
>>41012699
In Rainbow Roadtrip, the Master File has some singing lines from 00_21_05 to 00_22_54. Some of these seem to be missing from the Songs folder in Master File 2. For example, this is a singing line that only exists in Master File 1:
>00_21_05_Mayor Sunny Skies_Happy Singing_Very Noisy_they planned for weeks, cooked for days, celebrated fifty #ways#, so everypony would gather here, in our town at the end of the #rainbow#..flac

Typo (tag should be Singing rather than Singingnging):
>00_20_00_Mayor Sunny Skies_Happy Singingnging_Noisy_next-door neighbors chatting, over white wood fences, stopping on the street to say #hello#..flac
>00_20_00_Mayor Sunny Skies_Happy Singingnging_Noisy_next-door neighbors chatting, over white wood fences, stopping on the street to say #hello#.txt

>>41013398
The Master File 2 gdrive clone is here: https://drive.google.com/drive/folders/1ho2qhjUTfKtYUXwDPArTmHuTJCaODQyQ?usp=drive_link
The singing data is on HuggingFace here: https://huggingface.co/datasets/synthbot/pony-singing
>>
File: OIG1.UrRXIjP4TpSLMAN.FwAJ.jpg (162 KB, 1024x1024)
162 KB
162 KB JPG
>>41012943
>>
>>41012699
Errors in the SFX & Music labels:
https://ponepaste.org/9979

I'm not done finding all of them yet. There's a large list of tags at the bottom of the paste that I still need to read through.
>>
>>41013861
That's a weapons-grade cute robo Flutters.
>>
File: OIG2.EpDP2bHi1OlMqfHAQCNg.jpg (154 KB, 1024x1024)
154 KB
154 KB JPG
>>
File: 1700452279968348.jpg (153 KB, 1280x960)
153 KB
153 KB JPG
>>40954981
>>40957582
https://www.youtube.com/watch?v=TWcajuXywHA
So from my experimentation, this is what I've learned. Trying to run the extracted vocals directly through rvc/sovits creates weird results, because the model applies random reverb, audio effects, and duo vocals layered on top of each other, and using TalkNet to create a somewhat uniform effect is difficult as the pitch levels jump all over the place.
I tried to fix the above by redoing the more derped and unsalvageable lines in RVC on my own, but the pitch difference between them was pretty noticeable, so as a final attempt at saving it I ran the end result through an autotune filter, which seems to calm down the random pitch ups and downs.
The more frustrating part was trying to get Pinkie's voice to sing a tune that is slow, calm and a bit sad, all three things that are the very opposite of what would be expected from her type of voice.
The best solution here would be to have this kind of tech offline and just train the singing part of the model to understand what Pinkie sounds like, to get the exact result from the get-go, but that most likely won't happen until next year.
>>
File: 1709595004589913.gif (1.83 MB, 306x306)
1.83 MB
1.83 MB GIF
>>41011079
ayo, this shit slaps. do you got any more dash covers? :)
>>
>>41013398
Removed duplicate folder.

>>41013574
>Missing singing lines
Copied over all lines from master file to master file 2 between 21:05 - 22:54, that should fix it.

>Singingnging
Corrected typo.

>>41013995
SFX and Music.
I know there will be a load of typos/inconsistencies in there; that was all done by hand, and I pretty much just made up the tagging system as I went along. I might pass on fixing most of those if they're not too bad, since going through all that is gonna be very tedious.
>>
>>41011079
I know nothing about vocal effects, but this is already pretty nice for a sovits cover.
>>
>>41015066
>singing lines
I'll update my gdrive later today. The HF repos should already be up-to-date since they treat the two Master Files as one pile of data.
>SFX and Music
>I might pass on fixing most of those if they're not too bad, going through all that's gonna be very tedious.
Agreed, and I've only checked a small fraction of the tags so far.
If any code anons want to try automating some of these fixes, see the paste linked in >>41013995. These are all either mismatches, typos, or tag consistency issues between the sfx.txt/music.txt label files in Master File 1 and the .flac files in Master File 2's Music and SFX folder. Fixing them means updating labels in .txt files and changing filenames for .flac files.
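A minimal sketch of what such an automated fix could look like, assuming someone hand-builds a rename map from the paste. The example entries below are taken from typos already reported in this thread, and the directory paths are placeholders; run it on a copy of the files first.

import glob
import os

# Example rename map; extend it from the ponepaste list.
FIXES = {
    "Applejack223": "Applejack",
    "Mrs.Cake": "Mrs. Cake",
    "Singingnging": "Singing",
}

def apply_fixes(text: str) -> str:
    for bad, good in FIXES.items():
        text = text.replace(bad, good)
    return text

# 1) Patch the label .txt files (Master File 1).
for path in glob.glob("Label Files/**/*.txt", recursive=True):
    with open(path, encoding="utf-8") as f:
        original = f.read()
    fixed = apply_fixes(original)
    if fixed != original:
        with open(path, "w", encoding="utf-8") as f:
            f.write(fixed)

# 2) Rename the .flac files (Master File 2's Music and SFX folders) to match.
for path in glob.glob("Music/**/*.flac", recursive=True) + glob.glob("SFX/**/*.flac", recursive=True):
    new_name = apply_fixes(os.path.basename(path))
    if new_name != os.path.basename(path):
        os.rename(path, os.path.join(os.path.dirname(path), new_name))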
>>
>>41015066
I realized that the mismatches between label files and audio files would be the hardest to programmatically fix, and I think you can fix all of those by re-exporting the data. It's these episodes that have issues:
>FiM_s01e01
>FiM_s01e02
>FiM_s01e04
>FiM_s01e05
>FiM_s01e25
>FiM_s02e01
>FiM_s02e03
>FiM_s02e05
>FiM_s02e07
>FiM_s02e14
>FiM_s02e17
>FiM_s02e20
If you can re-export those, it would be much easier for code anons to fix the rest of the errors.
>>
>>40925332
1 month late but Fluttershy and Pinkie
>>
>>41015066
The Master File 2 link in Master File 1 is broken, which might explain the mismatches. The link in the Master File mega points to:
>https://mega.nz/folder/0UhSmYAB#WBrB-qCprQTofkAhwMp5CQ
Which doesn't seem to be available. I'm using the link from the OP:
>https://mega.nz/folder/gVYUEZrI#6dQHH3P2cFYWm3UkQveHxQ
Which works.
>>
>>41014860
This looks like it got a fair share of inspiration from E.T.
>>
>Page 9
>>
Up.
>>
>>40921076
My gdrive now has the updated Master File 2 plus the latest Master File updates. Same link.
https://drive.google.com/drive/u/2/folders/1ho2qhjUTfKtYUXwDPArTmHuTJCaODQyQ
And I updated my horsefm-lib to support Master File 2 for the singing lines.
https://github.com/synthbot-anon/horsefm-lib
The export_data.ipynb notebook shows how to separate the dialogue & singing lines.
https://github.com/synthbot-anon/horsefm-lib/blob/main/notebooks/export_data.ipynb
>>
>10
>>
Does anyone know what format Stable Video Diffusion needs for fine-tuning? Is there a preferred frame rate, resolution, image format? Is transparency okay? Anything else to consider?
>>
>>41020076
>>
>>41021498
-1, but still bad.
>>
File: OIG1.kb5PGzwgnigEAK.UhDWD.jpg (157 KB, 1024x1024)
157 KB
157 KB JPG
>>
>>41022450
Damn, that's a captivating eye she has here.
>>
>>41020716
NTA, but I would love to know more about this too.
>>
>>41022450
Another boop.
>>
File: teaser.png (672 KB, 1296x665)
672 KB
672 KB PNG
Learn2Talk: 3D Talking Face Learns from 2D Talking Face
https://arxiv.org/abs/2404.12888
>Speech-driven facial animation methods usually contain two main classes, 3D and 2D talking face, both of which attract considerable research attention in recent years. However, to the best of our knowledge, the research on 3D talking face does not go deeper as 2D talking face, in the aspect of lip-synchronization (lip-sync) and speech perception. To mind the gap between the two sub-fields, we propose a learning framework named Learn2Talk, which can construct a better 3D talking face network by exploiting two expertise points from the field of 2D talking face. Firstly, inspired by the audio-video sync network, a 3D sync-lip expert model is devised for the pursuit of lip-sync between audio and 3D facial motion. Secondly, a teacher model selected from 2D talking face methods is used to guide the training of the audio-to-3D motions regression network to yield more 3D vertex accuracy. Extensive experiments show the advantages of the proposed framework in terms of lip-sync, vertex accuracy and speech perception, compared with state-of-the-arts. Finally, we show two applications of the proposed framework: audio-visual speech recognition and speech-driven 3D Gaussian Splatting based avatar animation.
https://lkjkjoiuiu.github.io/Learn2Talk/
No weights, but pretty cool. Relevant here, I think, though obviously you'd need to do a big fine-tune or even train a new model for pony/animation to pony 3D models.
>>
File: NAI v3 mares.jpg (769 KB, 2432x1664)
769 KB
769 KB JPG
Huh, wasn't expecting NAI to ever update their ancient furry model to V3, but turns out they did.

Some pretty decent mares can result, this time without as much of the washed-out coloring the old one had. It can also kind of do cutie marks, interestingly. I've still got a bunch of Anlas from months ago, so I'll do more testing. Not expecting this to be a competitor to Pony Diffusion, but it's a comparable alternative worth noting.
>>
>>41023564
Indeed.
>>
# espeak and festival TTS engines plus their dev headers
apt-get install -y espeak espeak-data libespeak1 libespeak-dev
apt-get install -y festival*
# compiler toolchain
apt-get install -y build-essential
# audio libraries/codecs (ALSA, libsndfile, FLAC, Vorbis)
apt-get install -y flac libasound2-dev libsndfile1-dev vorbis-tools
# XML and compression dev headers
apt-get install -y libxml2-dev libxslt-dev zlib1g-dev
# Python dependencies for the gradio UI
pip install -r gradio_requirements.txt

Does anybody know how to install this Linux bullshit on Windows?
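(For reference, the simplest route is usually WSL2 rather than trying to port the packages: install Ubuntu from an admin PowerShell, then run the apt-get lines above inside it with sudo. Something like the following; recent NVIDIA drivers also pass the GPU through to WSL2, so CUDA stuff generally works too.)

wsl --install -d Ubuntu     # admin PowerShell; reboot when prompted
# then, inside the Ubuntu shell:
sudo apt-get update
sudo apt-get install -y python3-pip python3-venv
# ...followed by the apt-get / pip lines from the post above, prefixed with sudo.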
>>41024137
Those are some nice-looking amres; even the cutie marks are decently shaped and not a bunch of colorful nonsense.
>>
File: file.png (9 KB, 1011x56)
9 KB
9 KB PNG
I'm getting this error when trying to download Talknet models with Haysay.

Any advice?
>>
>>41024413
Yes.
>>
LVNS-RAVE: Diversified audio generation with RAVE and Latent Vector Novelty Search
https://arxiv.org/abs/2404.14063
>Evolutionary Algorithms and Generative Deep Learning have been two of the most powerful tools for sound generation tasks. However, they have limitations: Evolutionary Algorithms require complicated designs, posing challenges in control and achieving realistic sound generation. Generative Deep Learning models often copy from the dataset and lack creativity. In this paper, we propose LVNS-RAVE, a method to combine Evolutionary Algorithms and Generative Deep Learning to produce realistic and novel sounds. We use the RAVE model as the sound generator and the VGGish model as a novelty evaluator in the Latent Vector Novelty Search (LVNS) algorithm. The reported experiments show that the method can successfully generate diversified, novel audio samples under different mutation setups using different pre-trained RAVE models. The characteristics of the generation process can be easily controlled with the mutation parameters. The proposed algorithm can be a creative tool for sound artists and musicians.
https://github.com/fisheggg/LVNS-RAVE
https://huggingface.co/Intelligent-Instruments-Lab/rave-models/tree/main
Audiogen stuff; examples are on their GitHub. Short paper, but the models were trained 6 months ago? Guess they really wanted their paper in some specific conference.
>>
Trying to generate audio with RVC using Starlight but I get this error...
>>
>>41025282
Try again. It happened to me sometimes too, but it worked on a later attempt.
>>
File: 3349451.png (1.1 MB, 1593x1494)
1.1 MB
1.1 MB PNG
https://voca.ro/1k7cZmIGmTU9
>>
Up.
>>
>>41025913
Typing "voca.ro" in desuarchive's search box was definitely the best decision I've made today.
AŁA KURWA GRYZIE (Polish: "OW, FUCK, IT BITES")
>>
>>41025771
Still doesn't work... Do file size and audio duration matter? The file I'm trying to convert is around 10 and a half minutes long.
>>
FlashSpeech: Efficient Zero-Shot Speech Synthesis
https://arxiv.org/abs/2404.14700
>Recent progress in large-scale zero-shot speech synthesis has been significantly advanced by language models and diffusion models. However, the generation process of both methods is slow and computationally intensive. Efficient speech synthesis using a lower computing budget to achieve quality on par with previous work remains a significant challenge. In this paper, we present FlashSpeech, a large-scale zero-shot speech synthesis system with approximately 5\% of the inference time compared with previous work. FlashSpeech is built on the latent consistency model and applies a novel adversarial consistency training approach that can train from scratch without the need for a pre-trained diffusion model as the teacher. Furthermore, a new prosody generator module enhances the diversity of prosody, making the rhythm of the speech sound more natural. The generation processes of FlashSpeech can be achieved efficiently with one or two sampling steps while maintaining high audio quality and high similarity to the audio prompt for zero-shot speech generation. Our experimental results demonstrate the superior performance of FlashSpeech. Notably, FlashSpeech can be about 20 times faster than other zero-shot speech synthesis systems while maintaining comparable performance in terms of voice quality and similarity. Furthermore, FlashSpeech demonstrates its versatility by efficiently performing tasks like voice conversion, speech editing, and diverse speech sampling.
https://flashspeech.github.io/
New voice model. No weights or code, but they're a Chinese team, so maybe. They described their training method, so it might even be possible to recreate it or make a custom one (like for ponies). Hard to say how long it took, but they used 8x H800s, which IIRC have half the memory bandwidth of H100s.
>>
>>40921076
I uploaded the fimfarchive data here: https://huggingface.co/datasets/synthbot/fimfarchive
After pip install --upgrade datasets, you can load it with:
>from datasets import load_dataset
>dataset = load_dataset("synthbot/fimfarchive")

Here's the code for completing, converting, and pushing the dataset from the fimfarchive:
https://github.com/synthbot-anon/horsewords-lib
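If you just want to see what's in there before writing anything against it, a quick inspection works like this (assuming the usual default "train" split; the field names depend on how the export was structured):

from datasets import load_dataset

dataset = load_dataset("synthbot/fimfarchive")
print(dataset)                 # splits and row counts
row = dataset["train"][0]      # first record of the (assumed) train split
print(list(row.keys()))        # inspect the available fields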
>>
>>40971560
The PPP turned 5 years old on April 5, happy belated birthday
>>
>>40921076
https://huggingface.co/Amo/RVC_v2_GA/tree/main/models/MLP_Sweetie_Belle_Squeeky
https://vocaroo.com/18BYdlV0bMSa
Sweetie Belle, trained on her S1 and S2 lines, mostly the squeaky ones. The transpose will need some playing around with, as some input audio will need it set to 12 and other audio to 24.
>>
File: 3034661.jpg (233 KB, 1640x1640)
233 KB
233 KB JPG
>>41028148
>>
>>41028354
Kek.
>>
>>40921071
1111 = 15
>>
File: 1708127702548213.gif (618 KB, 473x472)
618 KB
618 KB GIF
>>
>>41028148
> 5 years
It has been this long, huh...
>>
>>41028148
We are gathered here today, 5 years since the inception of a dream, to celebrate the union of spirits in the desperate pursuit of waifus. We honor our horsefuckers, those that clip, that build, that produce, that preach, that bicker. We honor those that inspire us, that give direction to our lives, and that walk us always forward toward our beloved mares.

Happy birthday, PPP.
>>
>>41028148
>>41029227
Jesus, it's that old already?
How did time fly so fast?
>>
File: 1777884.png (460 KB, 900x900)
460 KB
460 KB PNG
queen piccalis
https://files.catbox.moe/0guftr.mp3
>>
>>41029834
catbox borked the file
https://voca.ro/13fiEnurSQRQ
>>
>>41029227
>it's been 5 years and I still haven't summoned the courage to be retarded trying to get a local tool running
>>
Good evening.

I think it was about three years ago that I asked about the possibility of vocal capture, in which an AI-generated voice maintains the timbre of the character it trained on, but follows the pitch and inflections of another recording. Has that been explored in the last three years, or is it still too far off?
>>
>>41030231
Was your post around Jan 2021? For example: https://desuarchive.org/mlp/thread/36432529/#36456897
That was just 5 months before TalkNet, which does exactly what you're talking about. Then around the start of last year there was so-vits-svc and RVC, which improved on TalkNet. So it's been explored quite a bit
>>
>>41030133
To be fair, it's just regular retardation for me. Half the thread reads like a different language for my techlet pea of a brain.
>>
>Page nein
>>
I need you to generate this with Discord's voice.
>O-OH MY CELESTIA!! I-IS THAT PRINCESS TWILIGHT SPARKLE?
>THAT'S MY FAVORITE SNOWPITY, JAZZY, SLOW BURN, BONE CHILLING, ATMOSPHERE-OOZING, TROPE-SUBVERTING, GENRE-REDEFINING, GUT-WRENCHING, SPINE-TINGLING, EMOTIONALLY TAXING, PARANOIA-INDUCING, JAW-CLENCHING, NERVE-WRACKING, CHARACTER-DEVELOPMENT DRIVEN, SOUL-SHAKING, NAIL-BITING, ANXIETY-WRITTEN, KAFKAESQUE, POST-LYNCHIAN, QUESTION-ASKING, SOCIALLY-AWARE, ETHNICALLY-DIVERSE, POLITICALLY-COGNIZANT, CULTURALLY RELEVANT, SOCIALLY-PRESCIENT, THOUGHT-PROVOKING, ARTISANALLY-CRAFTED PONY IN THE SERIES!!!
>>
>>41030864
+1
>>
>>41031700
+0
>>
>>41032035
Once more.
>>
>>40924154
>>
File: full.gif (602 KB, 600x1000)
602 KB
602 KB GIF
>>41031350
https://files.catbox.moe/ybp1gz.wav
>>
>>41029834
>>41029876
She sounds kinda eastern here. Couldn't say why though.
>>
https://github.com/jasonppy/VoiceCraft
new models
https://huggingface.co/pyp1/VoiceCraft/tree/main

Month old release 330M weights:
https://vocaroo.com/17v80p9NQi6A

Three weeks old 330M weights:
https://vocaroo.com/1aMwxaZb1jgp

Newest 330M weights:
https://vocaroo.com/1h2sj2e9Zp8Z

Newest upsampled with audiosr:
https://vocaroo.com/17Jx0xDoXz05
>>
>>41032035
-1
>>
File: OIG3.87p6TpX7HUzfOLVJ96Xa.jpg (155 KB, 1024x1024)
155 KB
155 KB JPG
>>
>>41034179
>>
File: 1712532987613611.gif (1.53 MB, 1280x720)
1.53 MB
1.53 MB GIF
>>40937305
I feel like if anyone can answer this, it's Clipper.
Hope he responds to you, anon.
>>
>>40937342
>pointless minigames
I have a fucking BLAST every time we have minigames, they're so fun.
The thing I look forward to most is actually the minigames.
>>
>>41034888
>>41034891
>looked at the dates of those posts
I realize I'm retarded, I'm sorry bros.
>>
>>40937342
I liked the minigames. I'd rather have that than an hour of autists doing a very technical, in-depth presentation that is really difficult to follow.
>>
Hydrus, if haysay.ai has multispeaker StyleTTS2 models that were trained on more characters than the Mane 6, why do precomputed styles only allow me to select from the Mane 6?
>>
Is there a torrent link for fimfiction archive?
>>
>>41035656
Pretty sure there are links to that somewhere in the scripts of the FiM tool in the main Google Doc.
Otherwise, there is this user-made torrent: https://www.fimfiction.net/user/116950/Fimfarchive
>>
>>41035496
Partly because I never got around to creating more precomputed styles and partly because I was struggling with getting good results for the characters I did attempt. I tried to make precomputed styles for Starlight Glimmer and Gilda, but either the emotion/trait I was targeting wasn't coming across very strong or the generated audio didn't sound much like the character, especially for Gilda. The output would oftentimes sound more like Twilight Sparkle. I did manage to get a few OK ones for Starlight Glimmer; I'll add those soon.
>>
>>41035908
thanks
>>
>>41036171
NTA.
This may be a silly question, but is there a nice and simple offline UI for StyleTTS2?
>>
>>41036205
You can install and run Hay Say locally:
https://github.com/hydrusbeta/hay_say_ui?tab=readme-ov-file#installation-instructions

There's also an online colab here:
https://colab.research.google.com/drive/1ys8SkP-VW7CkhnwVveEGINaszG1kRaYl?usp=sharing#scrollTo=pGArrru8BpEe
It should be possible to download its python notebook file (.ipynb) and then run it on a local Jupyter environment.
>>
>>41036313
Oh, that colab link in my previous post is for the one that comes with an epub downloader. Here's the link to another colab without epub:
https://colab.research.google.com/drive/1dDwKPYc2daS3MZxpinlfyIHd2jmGiHLh
>>
>>41036205
https://github.com/effusiveperiscope/StyleTTS2_GUI
>>
>>41036322
>>41036331
So it generates the waveform directly? How much slower is it than Tacotron+vocoder?


