Welcome to the Pony Voice Preservation Project!
youtu.be/730zGRwbQuE

The Pony Preservation Project is a collaborative effort by /mlp/ to build and curate pony datasets for as many applications in AI as possible.

Technology has progressed such that a trained neural network can generate convincing voice clips, drawings and text for any person or character using existing audio recordings, artwork and fanfics as a reference. As you can surely imagine, AI pony voices, drawings and text have endless applications for pony content creation.

AI is incredibly versatile; basically anything that can be boiled down to a simple dataset can be used for training to create more of it. AI-generated images, fanfics, wAIfu chatbots and even animation are possible, and are being worked on here.

Any anon is free to join, and there are many active tasks that would suit any level of technical expertise. If you're interested in helping out, take a look at the quick start guide linked below and ask in the thread for any further detail you need.

EQG and G5 are not welcome.

>Quick start guide:
docs.google.com/document/d/1PDkSrKKiHzzpUTKzBldZeKngvjeBUjyTtGCOv2GWwa0/edit
Introduction to the PPP, links to text-to-speech tools, and how (You) can help with active tasks.

>The main Doc:
docs.google.com/document/d/1y1pfS0LCrwbbvxdn3ZksH25BKaf0LaO13uYppxIQnac/edit
An in-depth repository of tutorials, resources and archives.

>Active tasks:
Research into animation AI
Research into pony image generation

>Latest developments:
GDrive clone of Master File now available >>37159549
SortAnon releases script to run TalkNet on Windows >>37299594
TalkNet training script >>37374942
GPT-J downloadable model >>37646318
FiMmicroSoL model >>38027533
Delta GPT-J notebook + tutorial >>38018428
New FiMfic GPT model >>38308297 >>38347556 >>38301248
FimFic dataset release >>38391839
Offline GPT-PNY >>38821349
FiMfic dataset >>38934474
SD weights >>38959367
SD low vram >>38959447
Huggingface SD: >>38979677
Colab SD >>38981735
NSFW Pony Model >>39114433
New DeltaVox >>39678806
so-vits-svc 4.0 >>39683876
so-vits-svc tutorial >>39692758
Hay Say >>39920556
Text generation colab >>40271923 >>40276284
/mlp/ image dataset >>40393331
Haysay on the web! >>40391443
SFX separator >>40786997 >>40790270
Cream Heart model + Alltalk-tts >>40836410
Clipper investigates further data cleaning >>40860022 >>40872222 >>40890799 >>40902356
HydrusBeta working on HaySay 2.0 >>40840723
Blueblood RVC model >>40887151
AI Redub 5 Releases >>40871923

>The PoneAI drive, an archive for AI pony voice content:
drive.google.com/drive/folders/1E21zJQWC5XVQWy2mt42bUiJ_XbqTJXCp

>Clipper's Master Files, the central location for MLP voice data:
mega.nz/folder/jkwimSTa#_xk0VnR30C8Ljsy4RCGSig
mega.nz/folder/gVYUEZrI#6dQHH3P2cFYWm3UkQveHxQ
drive.google.com/drive/folders/1MuM9Nb_LwnVxInIPFNvzD_hv3zOZhpwx

>Cool, where is the discord/forum/whatever unifying place for this project?
You're looking at it.

Last Thread:
>>40829408
FAQs:
If your question isn't listed here, take a look in the quick start guide and main doc to see if it's already answered there. Use the tabs on the left for easy navigation.
Quick: docs.google.com/document/d/1PDkSrKKiHzzpUTKzBldZeKngvjeBUjyTtGCOv2GWwa0/edit
Main: docs.google.com/document/d/1y1pfS0LCrwbbvxdn3ZksH25BKaf0LaO13uYppxIQnac/edit

>Where can I find the AI text-to-speech tools and how do I use them?
A list of TTS tools: docs.google.com/document/d/1y1pfS0LCrwbbvxdn3ZksH25BKaf0LaO13uYppxIQnac/edit#heading=h.yuhl8zjiwmwq
How to get the best out of them: docs.google.com/document/d/1y1pfS0LCrwbbvxdn3ZksH25BKaf0LaO13uYppxIQnac/edit#heading=h.mnnpknmj1hcy

>Where can I find content made with the voice AI?
In the PoneAI drive: drive.google.com/drive/folders/1E21zJQWC5XVQWy2mt42bUiJ_XbqTJXCp
And the PPP Mega Compilation: docs.google.com/spreadsheets/d/1T2TE3OBs681Vphfas7Jgi5rvugdH6wnXVtUVYiZyJF8/edit

>I want to know more about the PPP, but I can't be arsed to read the doc.
See the live PPP panel shows presented at /mlp/con for a more condensed overview.
2020: pony.tube/w/5fUkuT3245pL8ZoWXUnXJ4
2021: pony.tube/w/a5yfTV4Ynq7tRveZH7AA8f
2022: pony.tube/w/mV3xgbdtrXqjoPAwEXZCw5
2023: pony.tube/w/fVZShksjBbu6uT51DtvWWz

>How can I help with the PPP?
Build datasets, train AIs, and use the AI to make more pony content. Take a look at the quick start guide for current active tasks, or start your own in the thread if you have an idea. There's always more data to collect and more AIs to train.

>Did you know that such and such voiced this other thing that could be used for voice data?
It is best to keep to official audio only unless there is very little of it available. If you know of a good source of audio for characters with few (or just fewer) lines, please post it in the thread. 5.1 audio is generally required unless you have a source already clean of background noise. Preferably post a sample or link. The easier you make it, the more likely it is to be done.

>What about fan-imitations of official voices?
No.

>Will you guys be doing a [insert language here] version of the AI?
Probably not, but you're welcome to try. You can get most of the way there by using phonetic transcriptions of other languages as input for the AI.

>What about [insert OC here]'s voice?
It is often quite difficult to find good quality audio data for OCs. If you happen to know any, post them in the thread and we'll take a look.

>I have an idea!
Great. Post it in the thread and we'll discuss it.

>Do you have a Code of Conduct?
Of course: 15.ai/code

>Is this project open source? Who is in charge of this?
pony.tube/w/mqJyvdgrpbWgZduz2cs1Cm

PPP Redubs:
pony.tube/w/p/aR2dpAFn5KhnqPYiRxFQ97

Stream Premieres:
pony.tube/w/6cKnjJEZSCi3gsvrbATXnC
pony.tube/w/oNeBFMPiQKh93ePqTz1ns8
>>40921071
>page 9
>>40920442
so excited to try this shit out!
>that last pic in the previous edition
Kek
>>40921702
we salute the fallen
>>40921702
Sneaky.
Up again.
>>40921562
Glad you're excited! Now I can share something of Pinkie! Don't worry, according to my friend the program can do English too, Japanese just works really well.
Pinkie (With instru): https://pomf2.lain.la/f/ny70wit.wav
Pinkie (Without instru): https://pomf2.lain.la/f/v5lzgzo1.wav
Along with Pinkie, there's also an English thing of Applejack singing the same song that the first ever AI voice sang... how touching. It's not perfect, but it's still really good. Oh, and good news, my friend is planning on adding Spike and Discord to the fray! Not too thrilled about Spike, but Discord could be fun!
https://pomf2.lain.la/f/mcs9g4x5.wav
>>40922574
How about 'go fuck yourself'?
Rarity time! You know the drill, one with the instrumental, and one without! By the way, how does MareLoid sound as a name for the program? My friend wants to keep things chill for it, but I think the name could work!
Rarity (With instru): https://pomf2.lain.la/f/124xilny.wav
Rarity (Without instru): https://pomf2.lain.la/f/ynuagi48.wav
Bump.
upsies
Musicians are dead. Suno AI:
https://files.catbox.moe/fxcjwn.mp4
>>40924457
What is the point of an AI that is incapable of generating fetish erotica?
The mare's a freak. The mare never misses a beat.
>>40924457
https://files.catbox.moe/6s2yv7.mp4
>>40925098
https://files.catbox.moe/uistc5.mp4
Hello again! As of now, Spike and Discord have been added into what we now call MareLoid! I hope I'm not becoming annoying with all my updates, but my friend and I are really passionate about this. As a reminder for those who don't know what I'm talking about: MareLoid, if everything goes well, is a way to make your favorite mares sing songs without reference audio, similar to how one uses Synthesizer V or Vocaloid. Allow me to share what my friend cooked up with Spike and Discord!
Discord (With instru): https://pomf2.lain.la/f/gmmapcn5.wav
Discord (Without instru): https://pomf2.lain.la/f/oyen6178.wav
Spike (With instru): https://pomf2.lain.la/f/8j4qfieo.wav
Spike (Without instru): https://pomf2.lain.la/f/sgoqfpjp.wav
Any suggestions on who to add next? We're thinking Trixie, perhaps?
>>40925332
Granny Smith, just for shits and giggles.
Besides that, Celestia, Luna, and Cadance would be neat.
>>40925418
You. You do not know how GOOD she is... I know you meant it for shits and giggles, but DAMN she can sing!
Granny Smith (With instru): https://pomf2.lain.la/f/5ud769h6.wav
Granny Smith (Without instru): https://pomf2.lain.la/f/qsmyx7s0.wav
As for the princesses, they're on the list!
>>40924457
can it do pony in the style of dragonforce?
>>40925332
will there soon be a way for us to try this ourselves?
>>40925866
Unfortunately, it's not that simple. I don't know programming as well as my friend does, but according to them, it's not exactly at a state where they feel comfortable sharing it with anyone yet. They're... kind of like 15 in terms of perfectionism, so I can't really say if anyone but them will ever be able to use it. But if all goes well, and they find it perfect, it may have a chance of release. We can only hope.
Up.
VOICECRAFT: Zero-Shot Speech Editing and Text-to-Speech in the Wild
https://jasonppy.github.io/assets/pdfs/VoiceCraft.pdf
>We introduce VOICECRAFT, a token infilling neural codec language model that achieves state-of-the-art performance on both speech editing and zero-shot text-to-speech (TTS) on audiobooks, internet videos, and podcasts. VOICECRAFT employs a Transformer decoder architecture and introduces a token rearrangement procedure that combines causal masking and delayed stacking to enable generation within an existing sequence. On speech editing tasks, VOICECRAFT produces edited speech that is nearly indistinguishable from unedited recordings in terms of naturalness, as evaluated by humans; for zero-shot TTS, our model outperforms prior SotA models including VALL-E and the popular commercial model XTTS v2. Crucially, the models are evaluated on challenging and realistic datasets that consist of diverse accents, speaking styles, recording conditions, and background noise and music, and our model performs consistently well compared to other models and real recordings. In particular, for speech editing evaluation, we introduce a high quality, challenging, and realistic dataset named REALEDIT.
https://jasonppy.github.io/VoiceCraft_web/
https://github.com/jasonppy/VoiceCraft
>[ ] Upload model weights
Weights not up yet, but soon maybe.
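For anyone wondering what "delayed stacking" actually does: codec models emit several parallel codebook streams per audio frame, and the delay pattern shifts codebook k right by k steps so earlier codebooks are generated before the ones that refine them. A toy sketch of the idea (not the paper's code; the function names and pad token are made up here):

```python
# Toy illustration of the "delayed stacking" token rearrangement used by
# codec LMs (MusicGen-style delay pattern; VoiceCraft combines this with
# causal masking). NOT the paper's actual code.

PAD = -1  # placeholder token for positions introduced by the delay


def delay_stack(codes):
    """codes: list of K codebook streams, each a list of T ints.
    Returns K streams of length T + K - 1, with codebook k shifted right by k."""
    K = len(codes)
    return [[PAD] * k + list(stream) + [PAD] * (K - 1 - k)
            for k, stream in enumerate(codes)]


def undo_delay(stacked):
    """Invert the shift to recover the original aligned streams."""
    K = len(stacked)
    T = len(stacked[0]) - (K - 1)
    return [stream[k:k + T] for k, stream in enumerate(stacked)]
```

With 3 codebooks, frame t of codebook 2 is only predicted two steps after frame t of codebook 0, so each prediction can condition on the coarser codes for the same frame.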
>>40926585
A better TTS tool would be nice.
>>40926585
>The training of the 830M VOICECRAFT model took about 2 weeks on 4 NVIDIA A40 GPUs.
Cheapest 4x A40s on vast.ai is $1.803/hr, so ~$605.8.
>>40927049
Though that one seems a bit fucked, so $2/hr is probably a better estimate, making it $672. 4x 4090s might actually be a better deal: $2.4 an hour, and if the TFLOPS ratio works out in training then it'd be ~$300.
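The arithmetic in those two posts, spelled out (the hourly rates are the vast.ai spot prices quoted above and will drift; the ~$300 4090 figure additionally assumes the raw TFLOPS advantage translates into proportionally shorter training time):

```python
# Back-of-envelope VoiceCraft training cost from the quoted vast.ai rates.

hours = 2 * 7 * 24  # "about 2 weeks" of wall-clock time = 336 h

cost_a40_listed = hours * 1.803  # cheapest 4x A40 listing at the time
cost_a40_safe = hours * 2.00     # rounder, more pessimistic estimate

print(f"{cost_a40_listed:.1f}")  # ~605.8
print(f"{cost_a40_safe:.0f}")    # 672
```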
Page 10 bump.
Hey BGM, I remember you mentioning half a year ago that you were planning to make a Linky song but it's been quite a bit since then so I wanted to ask if you've changed your mind by any chance.
>>40920442
>>40923239
I only check this thread once a week, but this is a super impressive project, anon!
Update: The princesses have been added! Here's a preview of them singing that one New Year's Eve song in Japanese!
Celestia: https://pomf2.lain.la/f/4mcnfwdg.wav
Luna: https://pomf2.lain.la/f/weq0rkl7.wav
Cadence: https://pomf2.lain.la/f/r8hfkabz.wav
>>40924457
This is very cool. Literal first attempt, which took me less than a minute to "create":
https://files.catbox.moe/17calb.mp3
Lyrics generated with GPT, copypasted into this thing, and that's it. It's not perfect-perfect, but it's 10x better than the previous AI tools for making songs, and for this I intentionally went with the first lyrics and the first song gen, no edits (those damn 'hands' in the chorus, ree) or rerolls. The 2-minute trial ate the outro, though.
>>40925864
It's unfortunately not trained to mimic the styles of specific bands/artists; you'd have to describe the style and probably generate a bunch.
Surprise! The CMC are in MareLoid! This will probably be the last update for a bit, but I'll still be lurking here!
Applebloom: https://pomf2.lain.la/f/t2hcxnzz.wav
Sweetie Belle: https://pomf2.lain.la/f/exs4e1q0.wav
Scootaloo: https://pomf2.lain.la/f/xqpiatdx.wav
bump
>>40925105
https://files.catbox.moe/fczgrs.mp3
I'm having fun with this. Prompt was "Sad song about a girl named Twilight in the style of lo-fi hip-hop in the late 1990s."
I wonder if there's a way for me to continue/redo a prompt with this specific beat.
>>40929685
This is crazy.
https://files.catbox.moe/b73nte.mp3
https://files.catbox.moe/lr859w.mp3
prompt: a emo, industrial, dream pop, psychedelic, shoegaze, trip hop, vaporwave song about pinkie pie the pony being depressed that her friends don't love her anymore.
>>40924457
"Made" a sequel song using the new v3 version. I used Suno AI just a couple of days ago and you had to pay for v3; now v3 is free, so v4 must be around the corner already.
https://files.catbox.moe/00wefs.mp3
prompt: a sad country breakup song about a man's pony mare named applejack, being taken away from him. belonging the days where he can pet his mare once again.
Mare between the lines
>>40903672
Well, this didn't exactly work. I ended up trial-and-erroring a whole bunch of different things, and I think the predictor_encoder and style_encoder have the biggest impact on F0. Unfortunately, the characters were no longer recognizable even though the F0 was closer to desired.
>>40924457
Very fun stuff, reminds me of Jukebox AI when it released on Google Colab. This is so much easier to use; I just had to make something after genning these lyrics.
https://youtu.be/gRNP3lBEHNQ
Hey marrs
Bumping the mares.
>>40928808
is it possible to make the mane six (especially fluttershy) do black metal growls with this program?
>>40929685
https://files.catbox.moe/7via22.mp3
Stitched together a full version.
I have some news to share, and it's not... great.
Due to circumstances involving my friend's perfectionism, they deem the quality of the voices within MareLoid to be not up to their standards, and rather than try another AI algorithm which could give "better" results... they've decided to shelve MareLoid for the time being, meaning no new voices will be added and no new demonstrations will be made for now. I know some of you really wanted to try this out, myself included, but I can't just badger my friend about reconsidering; they're dead-set on shelving it. I'm really sorry about all of this. Please don't be too mad or blame my friend too much, they just have... really high standards. That's all I can say. I'll still lurk here for stuff relating to the mares, but unless MareLoid starts up again, this is the last time you'll hear of it... I apologize greatly.
>>40932952
>Perfectionist hides something great from the world because it isn't perfect
Don't worry, we're used to it.
>>40932952
inb4 mareloid anon's friend is 15
>>40932952
>We never actually made anything, we just used so-vits-svc, you got fucking trolled, faggots
Every time.
>>40933744
You called? :)
>>40933757
https://files.catbox.moe/6sls9v.flac
>>40933757
>>40933838
>Post removed
...What did Jannie mean by this?
Like, genuinely, what on earth could the rationale have been for that? Dafuq?
>>40934086
>...What did Jannie mean by this?
Maybe anon deleted it because he was too ashamed of himself?
>>40934108
No, I'm the anon. Still ashamed of myself, but now also confused.
>>40934086
>>40934108
>>40934114
Same. The comment was just talking about how he liked my RD cuck audio and joked about wanting another one. Mods, are you okay?
>>40925955
>They're... kind of like 15 in terms of perfectionism
Lol, 15 does it because he wants to scam his gullible patreon paypigs.
>>40934129
>Jannie, are you okay?
>So, jannie, are you okay?
>Are you okay, jannie?
>Jannie, are you okay?
>So, jannie, are you okay?
>Are you okay, jannie?
>Jannie, are you okay?
>So, jannie, are you okay?
>Are you okay, jannie?
>Jannie, are you okay?
>So, jannie, are you okay?
>Are you okay, jannie?
>Jannie, are you okay?
>Will you tell us that you're okay?
>There's a post on the image board
>That he cucked you - a greentext, jannie
>He came into your thread
>He left cumstains, on the carpet
>Then you ran into the basement
>You struck it down
>It was your doing
>Jannie, are you okay?
>So, jannie, are you okay?
>Are you okay, jannie?
>You've been hit by
>You've been hit by, a rogue janitor
>Page 9
>>40933757
>>40934129
>>>/trash/
>>40934750
>four hours to reply
NGMI
>>40934656
+1
>>40934194
That thing is still going?
Mare yourself out
The big mare gak
>>40935551
No, 15ai shut down last year, and 15 has been completely silent since February of last year. Even if he did come back, he's already been outpaced by everyone and their mother. His work looks amateurish now compared to shit like ElevenLabs, so-vits, and even that Suno song software that dropped recently.
>>40936557
The benefit of 15.ai was that tech-illiterate people like me could use the funny pony noise machine to create or enhance content. As neat as the shit you people do is, it's kind of like visiting a zoo and watching the monkeys in the monkey pen bang rocks against each other. That's why these threads are so inactive.
>>40936584
I mean, most of us used 15ai because it was much easier and simpler to create content with, and we didn't need to resort to recording ourselves pretending to be a cute mare getting fucked and having our roommates or family overhear our degenerate shit.
>>40921076
Review of S5 is done.
https://files.catbox.moe/wofd6s.json
>>40933513
>>40934964
It's getting to be that time of year again. /mlp/con is (probably) soon to be confirmed for late June, which means I'm starting to have preliminary thoughts about another PPP panel. I don't think it's quite time to start properly planning anything just yet, just time for anyone who's got ideas or would like to be involved to start thinking about it. I'd be happy to be the main host/organiser again if no one else wants to take the reins, and assuming that happens I'll probably put out the first proper call to action once I've finished the remaining dataset review work and /mlp/con finally confirms a proper date.
Not sure if this is the right place to ask, but I'm wondering if there's a more efficient process in Audacity for extracting vocals from a song. Right now I'm clipping the original vocals, getting the generated clip from TalkNet/SVC, and then syncing it back to the instrumental track.
>>40937252
>/ppp/ panel again
Do we even have much to talk about besides spending 10 minutes shilling Haysay and the newer Pony Diffusion, and 2 hours of pointless minigames?
Is there some LLM angled towards pony stuff? I would like to generate pony-related text locally, but the uncensored model I have doesn't really know much more than surface-level stuff.
>>40937252
>>40937342
I'd like to do a little bit of shilling for using that process of AI-cloned clips to create lengthy datasets for other models. And helping out in the background with whatever needs assistance.
>>40936911
>have our roommates or family overhear our degenerate shit
But that's part of the flavor!
before you go to bed
VoiceCraft: Zero-Shot Speech Editing and Text-to-Speech in the Wild
https://arxiv.org/abs/2403.16973
https://github.com/jasonppy/VoiceCraft
>Upload model weights (encodec weights are up)
Previously posted, but they've put their paper on arxiv now. Oh, and they've posted the encodec weights.
https://github.com/jasonppy/VoiceCraft/issues/12
lol
>>40938392
Also in the middle of the day.
Mare anyway
>>40940052
But not on page 9.
haysay.ai JUST went down. Anyone else?
>>40940484
I've restarted the Docker containers and it's back up now.
I am rarted. I'm only now remembering to do this. YEARS LATER. But better late than never!
__________
Here is a collection of all the moans and sighs I've generated over the years with 15ai when it was still active. Altogether, this is 538MB. It is currently split into three separate zip files due to Catbox's 200MB limit. Many of these audio clips were made for audios that I either never got around to or couldn't make with the shutdown of 15ai. If I can find a good alternative, they'll be finished someday.

Part 1 contains:
>Daring Do
>Derpy Hooves
>Fluttershy 2
>Gilda
>Pinkie Pie
>Sci-Twi (For consistency's sake, but it's just Twilight)
>Scootaloo
>Spike
>Trixie
>Twilight Sparkle 2
>Twilight Sparkle

Part 2 contains:
>Applejack
>Fluttershy
>Rarity

Part 3 is Rainbow Dash. Some of it was generated for the cuck audio, but can be cut and spliced at will. unless you're a cuck

PART 1 -----> https://files.catbox.moe/nfwu5p.zip
PART 2 -----> https://files.catbox.moe/kd1jvv.zip
RAINBOW DASH 2 -----> https://files.catbox.moe/w5stv4.zip
Precautionary prior to work bump.
>10
>Page 8 in 90 minutes
I see there's some shit going on in the catalog again.
>>40941866
>Page 8 in 35 minutes
What the fuck is going on today?
>>40942427
>several threads with random "lets talk about mare" with less than a dozen posts
I'm not saying it's some raid, but it sure stinks like one, designed to push threads off such a slow board.
>>40941013
>>>/trash/
>>40942541
Stop advertising yourself, Goku.
>>40942604
Why the fuck would that be a raid? That's how boards are supposed to work. Are you saying a board where half the content is people page-9 bumping their inactive generals is a healthy situation?
>>40942604
It's called pattern recognition. There clearly wasn't anything big happening (e.g. Hasbro ending G5, rule 15 getting removed, or Faust making her own pony series with blackjack and hookers). If it were the weekend this would maybe make sense, since people have time off and shitpost as usual, but having the board go into abnormal overload posting in the middle of the work day is just a bit weird.
>>40942604
Generally, new threads are supposed to have some form of viable topic to discuss. A flood of random screencaps with less than a full sentence for an OP is just a bunch of shit.
>>40942931
Whatever it was, it seems to be over for now.
Alright, Suno is pretty fun.
https://app.suno.ai/song/5b3abea5-7ad6-48f3-ace5-1f11f1444c15
>>40943827
>5 short songs per day or pay up
Fug. Does anyone know of any alternatives that can create similar outputs from just text? (I know there is Bark, but so far the results from anons seem just so-so.)
>>40943827
I suppose the next step would be to take a Suno output, separate the vocals, transform the vocals into a character voice, then mix them back together with the instrumental.
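The last step of that pipeline (mixing the converted vocals back over the instrumental) is simple enough to script without any audio libraries. A stdlib-only sketch, assuming two 16-bit PCM mono WAVs at the same sample rate (the file names are placeholders, and real stems would usually be stereo, so treat this as illustrative):

```python
import struct
import wave


def mix_wavs(vocals_path, instr_path, out_path, vocal_gain=1.0):
    """Sum two 16-bit PCM mono WAVs sample by sample, clipping to int16 range."""
    with wave.open(vocals_path, "rb") as v, wave.open(instr_path, "rb") as i:
        assert v.getframerate() == i.getframerate(), "sample rates must match"
        assert v.getnchannels() == i.getnchannels() == 1, "sketch assumes mono"
        rate = v.getframerate()
        n = min(v.getnframes(), i.getnframes())
        voc = struct.unpack(f"<{n}h", v.readframes(n))
        ins = struct.unpack(f"<{n}h", i.readframes(n))
    # sum, applying gain to the vocal track, and hard-clip to the int16 range
    mixed = [max(-32768, min(32767, int(a * vocal_gain) + b))
             for a, b in zip(voc, ins)]
    with wave.open(out_path, "wb") as out:
        out.setnchannels(1)
        out.setsampwidth(2)
        out.setframerate(rate)
        out.writeframes(struct.pack(f"<{len(mixed)}h", *mixed))
```

The separation and voice-conversion steps still need a real model (UVR/demucs-style separator, then TalkNet/so-vits/RVC); this only covers the remix at the end.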
https://github.com/DoMusic/Hybrid-Net
>Real-time audio source separation, generate lyrics, chords, beat. A transformer-based hybrid multimodal model; various transformer models address different problems in the field of music information retrieval, and these models generate corresponding information dependencies that mutually influence each other.
If anyone could test to see if this is superior to UVR (I'm not into audio stuff), that would be appreciated.
>>40944463
>no requirements.txt file
This is a bit of an 'uhoh' moment.
>>40944762
I wouldn't be surprised if they offer isolated vocals as an option eventually. I assume they're generating both in separate models and then mixing, so it should be trivial.
>>40944314
I mean, no; that's why everyone's hot for teacher over Suno. It's the first one to do music gen from nothing that doesn't sound like trash (earlier attempts were getting there, but always kinda lost the plot, never remaining coherent for a full 2 minutes as far as I know). Personally, I can't wait till we can get something like this going privately, so we can pirate musical styles and specifically request chord progressions if we want. Impossible with current memory/processor limitations, but who knows what the future holds.
>>40944762
How so?
>>40938470
>VoiceCraft is a token infilling neural codec language model, that achieves state-of-the-art performance on both speech editing and zero-shot text-to-speech (TTS) on in-the-wild data including audiobooks, internet videos, and podcasts.
>03/28/2024: Model weights are up on HuggingFace here!
The weights are up; should we try to fine-tune it?
bmup
>>40946869
https://github.com/jasonppy/VoiceCraft
>03/28/2024: Model weights are up on HuggingFace
https://huggingface.co/pyp1/VoiceCraft/tree/main
AniPortrait: Audio-Driven Synthesis of Photorealistic Portrait Animation
https://arxiv.org/abs/2403.17694
>In this study, we propose AniPortrait, a novel framework for generating high-quality animation driven by audio and a reference portrait image. Our methodology is divided into two stages. Initially, we extract 3D intermediate representations from audio and project them into a sequence of 2D facial landmarks. Subsequently, we employ a robust diffusion model, coupled with a motion module, to convert the landmark sequence into photorealistic and temporally consistent portrait animation. Experimental results demonstrate the superiority of AniPortrait in terms of facial naturalness, pose diversity, and visual quality, thereby offering an enhanced perceptual experience. Moreover, our methodology exhibits considerable potential in terms of flexibility and controllability, which can be effectively applied in areas such as facial motion editing or face reenactment.
https://github.com/Zejun-Yang/AniPortrait
>Update the code to generate pose_temp.npy for head pose control.
>We will release audio2pose pre-trained weight for audio2video after further optimization. You can choose head pose template in ./configs/inference/head_pose_temp as substitution.
Lip sync for images to audio. It has face reenactment ability too; the github has more videos. It's pretty rough, but there's lots of room to improve. They used 4 days with 4 A100s to train it, so using vast.ai's 4x A100 80GB offering it would cost ~$336 to recreate (using their datasets, which might not be optimal). Also still hilarious that we can't use webms with audio outside of /gif/.
Specifically for you guys, it would probably require a new model trained with ponies or w/e for it to work well.
>>40947581
Not only that, it would have to go human ---> equine ---> cartoon equine. It would have to be capable of hopping 3 categories and fundamentally different skull structures, vs 0. Probably still possible eventually, but a much harder problem.
>>40947581
>Pony facial animations
Man, that would be damn cool.
An anon tested the voice cloning capability of VoiceCraft.
>new tts model dropped and here is the source file
https://voca.ro/157IzI9y4YZ6
>and here is the generated voice.
https://voca.ro/1ojHkZ87XRVL
https://openai.com/blog/navigating-the-challenges-and-opportunities-of-synthetic-voices
>Today we are sharing preliminary insights and results from a small-scale preview of a model called Voice Engine, which uses text input and a single 15-second audio sample to generate natural-sounding speech that closely resembles the original speaker.
It is notable that a small model with a single 15-second sample can create emotive and realistic voices.
>>40948519
This is sounding pretty good; a little bit too good, even. If this isn't someone trolling by recording themselves twice, it's a pretty nice indication that voice cloning tech is going from kind of shit to pretty decent.
>>40948519
How on earth are you supposed to train a voice on this? I'm so confused by the instructions. Am I just supposed to wait until a colab or huggingface demo appears?
>>40948519
Eh? That is impressive. However, all I've heard so far are low quality inputs.
Can we get some high quality results? Preferably pone? It would be much easier to tell if there are any flaws.
hey does anyone have a link to that pinkie pie american pie cover with vul style voices?
>>40949398
https://files.catbox.moe/g3ahk9.flac
I got you senpai
>>40949524
I LOVE YOU ANON!!!!
>>40949398
Isn't that on Youtube too?
>suno ai songs are all saved to mp3 and the link is easily accessible with standard webpage inspection keybindings
Not sure if I should feel happy that there is no need to fuck around with 3rd party software to download songs, or disappointed that apparently one of the more advanced musical AI tools is being handled by some IT intern guy.
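What that anon describes (the mp3 URL sitting in plain sight in the page source) can be automated with nothing but the stdlib. A sketch, assuming the links really do appear as literal `.mp3` URLs in the HTML you save; the regex and function name are made up for illustration:

```python
import re

# Hypothetical helper: pull every .mp3 URL out of a saved page source.
# Assumes the links appear verbatim in the HTML, as the anon observed.
MP3_RE = re.compile(r'https?://[^\s"\'<>]+\.mp3')


def find_mp3_links(html):
    """Return de-duplicated .mp3 URLs in page order."""
    seen, out = set(), []
    for url in MP3_RE.findall(html):
        if url not in seen:
            seen.add(url)
            out.append(url)
    return out
```

If the site ever moves the audio behind a JSON API or signed URLs, this breaks; it's only as robust as the "IT intern" setup being described.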
>mare
>>>/wsg/5497184
>I think the main barrier is good local tts. Stability AI actually trained a SOTA tts model back in February based on enabling Audiocraft to generate text. They pussied out though and didn't release the model. Here's the paper if a tech savvy anon wants to spend $20,000 and 3 months of dev time to make the best open source tts solution that can also do music.
>https://arxiv.org/abs/2402.01912
reee
>>40951924
>$20,000
Maybe not SOTA, but VoiceCraft (>>40926585) took 2 weeks with 4x A40s to train. Vast.ai's current pricing has that work out to around $672.
>>40941013
>no replies for this high effort post
Anon, I am appalled. How and why the fuck did you do this? Why so many? What were the uses for it?
>https://youtu.be/f8DKD78BrQA?si=a1ejJszeKdC4idq3
Those of you who watched this may have noticed a little something where they used a simulation (3D models trying and failing at a certain activity until they get it right) in order to train AI, and it just fucking hit me how close we are to pony bots.
>but muh computing power
Well yeah, but hypothetically if we had the funds for training and that was taken care of, as long as we had some sort of model and card (think /CHAG/, because we could totally get those guys to whip something up for us if we just needed a basic model as the base for the robo-pony), then all that's left would be
>the hardware
Now the hardware would also be expensive, since you essentially have a full-blown PC (to run the local model instead of a proxy).
>inb4 the nightmare of having your waifu's proxy getting taken down
And then on top of the PC parts like the GPU, you also have the mechanical movements, which maybe require some knowledge of hydraulics (maybe that's way too powerful for a small pony, but the point stands), and the actual robotic parts would be expensive as fuck. I feel this would end up being the hardest part, but I guess the nice thing about having this general is we can always come up with hypothetical stuff that could end up as the actual basis for robo-pones once we have the ability to do so.
You know what they say (I looked it up and couldn't find it, so I'll improvise): an ounce of planning is worth a pound of action.
Having these ideas and plans ready could give us an opening to actually do stuff the second we have the funds or the tech. Besides, by breaking it down into actionable steps (in a similar way to how anons claimed singular scenes for the Redub projects, in part due to the way the google spreadsheet was set up), we could totally have a plan and get our waifus out ASAP, and with growing promise I'm sure funds would start pouring in from anons on here. I mean, look what happened to 'Fallen Oak Sanctuary' at Mare Fair: if they can get 50k, I'm sure a project like this, which directly affects all the anons on this board, is sure to get funding the second it shows some promise. And the only way to show promise is to show solid planning and theory to back it up and generate excitement from anons who can donate.
>>40953149
Another thing would be (I guess you could phrase it as) contextual context, where instead of the AI taking a certain number of the most recent messages (more of a /CHAG/ thing with SillyTavern and context token limits), you could probably just use keyword-related context (which I'm confused as to why this isn't a thing in SillyTavern yet).
But anyways, ignore that fucking slop above this sentence. The point I'm trying to make is that we would have to figure out how to maintain proper context so our little robo-pones aren't overwhelmed every time they want to speak.
I'm sure you could have both keyword context as well as (I guess you could call it) simile context, where previous conversations which are similar but not exact to a current conversation can still be used as context, but how that would work is beyond me.
At this point in time we are pretty much just leeching off of what tech companies give us and adapting it to pony stuff. I doubt we could compete with these tech giants in terms of actual AI advancement, but still, if we are adapting their AI we could at least plan for the actual hardware of robo-pones, since we all know an NVIDIA-robo-Twilight ain't happening anytime soon, so it's up to us to get designs and plans rolling out in preparation for funding (maybe Mare Fair could help with this one day, or even /mlp/con). All that matters is that momentum and actual visible plans are being shown, and surely funding would roll in from other excited anons.
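The "keyword context" idea above is basically naive retrieval. A toy sketch of what that could look like (purely illustrative; the names and stopword list are made up, and a real frontend would use embeddings rather than raw word overlap):

```python
# Toy "keyword context" retrieval: score past messages by word overlap with
# the current prompt and keep the best ones as extra context.

STOPWORDS = {"the", "a", "an", "is", "to", "and", "of", "i", "you", "it"}


def keywords(text):
    """Lowercased word set minus a tiny stopword list."""
    return {w for w in text.lower().split() if w not in STOPWORDS}


def retrieve_context(history, prompt, top_k=2):
    """Return up to top_k past messages sharing the most keywords with the prompt."""
    kw = keywords(prompt)
    scored = [(len(kw & keywords(msg)), idx, msg)
              for idx, msg in enumerate(history)]
    scored = [s for s in scored if s[0] > 0]          # drop zero-overlap messages
    scored.sort(key=lambda s: (-s[0], s[1]))          # most overlap first, earliest wins ties
    return [msg for _, _, msg in scored[:top_k]]
```

The "simile context" the anon wants is the harder half: fuzzy similarity between whole conversations, which is exactly what embedding-based retrieval does and word overlap doesn't.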
>>40953111I wanted to create a multitude of moans and splice individual ones together in specific ways in order to get a more "natural" sounding series of moans, something that sounded genuine and realistic, like the character is actually breathing and doing it. Sometimes one generation will have a good gasp and then moan like they're going "huh?", while another one derps out when it gasps but they do this little shiver afterwards, and it fits just right.For example, with the Fluttershy audio I did two years ago, there are a few generations I was able to make that straight up sounded like Fluttershy was cumming hard, and it worked so well for the context of the story I had set up.Many of the others were for audios I had planned but either never got around to, or couldn't do them because 15ai shut down and never came back up. But either way, the sounds I generated were just sitting on my hard drive collecting dust, so I wanted to post and share them for folks to use either in audios or animations. They're still a bit noisy because they were generated in early 2022, but with some good filtering effects you can easily remove it.Some of the audios also have dialogue included in them, but they're meant to be ignored since those generations included some good moans.
>>40953222>>>/trash/
Gradio port of VoiceCraft if anyone cares.https://github.com/friendlyFriend4000/VoiceCraft
>>40953388
>windows version maybe
Oh jeez. I'm not ready to go full penguinpilled; although most of the stuff I use is technically available on both OSes, all the little changes just fuck up my productivity.
>>40921076>>40937247Review of S6 is done.https://files.catbox.moe/2a6kml.jsonvul - do you have the original audio for EQG and the FiM movie? I think it'll be worth running them through demu as well to see if we can get some more clean audio there.
Since Suno is restricting prompts based on certain words and has no idea what the fuck it's supposed to be, I generated these:Suno! -----> https://files.catbox.moe/c7qk63.mp3NO FUN ALLOWED -----> https://files.catbox.moe/brdne4.mp3Here's another version that's not as good, but they say "rock solo" in a cool way: https://files.catbox.moe/uho0fd.mp3
>https://app.suno.ai/song/a9973479-6cdc-4098-a8d8-cf0787d64943
Alright, it's my first time fucking around with this thing; I threw some half-assed lyrics at it and the outcome is pretty OK-ish. It's still pretty cucked with the artificial credit system that randomly cuts off the song just a few seconds before the end (probably by design, to trick people into going full paypig mode), but if I had an offline version of this tech I would totally spend two days trying to make it perfect. IF/WHEN my above complaint stops being a problem, I can see this causing another boom of pony music, just as there was a boom after RVC/so-vits got created a year ago.
>>40953388https://github.com/kijai/ComfyUI-VoiceCraft
If I could make a request here: could anyone generate an AI dub based on this image?
>>40954154>clipperI LIKE YOUR SNOW PONY VIDEOalso where did u get all that science equipment from snow bro?
doot
>>40954981
I tried to see if it would be possible to quickly edit this with a pony voice, but the constant change in reverb, as well as randomly going from solo to duet to solo in the same sentence, makes for very cursed-sounding outputs. Given that about 70% of the lines would need redoing, I can see this being a pretty nice prototyping tool for people to then create proper songs, either with raw vocals or audio conversion. To get a "plug voice into RVC/so-vits and get mare" future we will need to wait for an uncucked offline version of this tech.
>>40955053https://github.com/haoheliu/versatile_audio_super_resolutionhaving some way to upscale the 16kHz voicecraft output in an easy to use manner might be good
15 ai come back
>>40958251Nah, too busy wiping his ass with his patreon money.
>>40959439
>saw this on /g/
Is this where we are heading, a future with fantastical and whimsical sci-fi levels of tech and all the normies going "ehh, whatever dude"?
>9
>>40936557
>ElevenLabs
Doesn't let me use it without upgrading my subscription.
>>40960343>the normies going "ehh, whatever dude"That's probably for the best. Normies wouldn't grasp what they're dealing with and only attract a larger outrage mob by painting a bigger target on AI.
>>40964097>teaching robots how to boopThat's a powerful tool you give them.
Since making songs with Suno AI (with some help from ChatGPT) is relatively easy, how would anons feel about making some kind of AI musical album themed around taking an episode of S1 and making a song about it (whether something serious or more comedic)? Just throwing this idea out here to see if there is any interest.
>>40964432That sounds like a fun idea.
https://stability.ai/news/stable-audio-2-0
https://stableaudio.com/
No weights, and the website requires a login, but it seems like for now they're making it usable for free.
>>40964753
That's nice.
>To protect creator copyrights, for audio uploads, we partner with Audible Magic to utilize their content recognition (ACR) technology to power real-time content matching to prevent copyright infringement.
While this is a bit shit, I can understand that they need it for when music companies go full jew on them (we can't have nice things unless one of the five big corpos also takes a pound of flesh for it). Sadly this most likely means they will not publish the weights; however, if they also publish (or leak) how their model is trained, there is a small chance someone else could pick it up and make a non-cucked version.
>>40964753
Just tested a slightly outlandish prompt, and oh my god, the result is just so weird and painful the more you listen.
>"My Little Pony, Pop, catchy, Rainbow Dash vocals, Pony, Horse, Neigh, Mare"
>https://files.catbox.moe/aidi5e.mp3
>Uses Stable Audio to generate horse SFX foleys>Some trot some sniff but the rest all horror spook>She growls and make otherworldly noises>Mare is scarehttps://files.catbox.moe/b97srx.mp4
https://docs.google.com/document/d/17hP2rkQHlU43nNOdy1iDEJzKbhgm_kjSaCAo6J1plTY/edit?usp=drive_link
>>40964432
>TLDR: Anons come together to make a full album with songs created with whatever AI tools they have access to.
Alright, here is the basic setup for the Pony AI Album. Anyone interested can simply reply to this post with the episode they wish to work on (songs based on characters/settings/ideas are welcome too). Each song will have a soft deadline of two to three weeks; if a song is not delivered by then, it will become available for choosing once again. If there is anything (you) think needs to be added to the main doc, please do tell.
>>40965188>https://files.catbox.moe/aidi5e.mp3This sounds like a song from a Pinkamena Party album.
boop
Hello Anons. So... it seems like RVC can do laughter, but under certain conditions: the biggest one is that it has to be somewhat airy, a short chuckle or chortle... a soft laugh. I'll post results with various characters soon.
>https://app.suno.ai/song/6385f0fa-7bfd-4faf-ac2f-c3f540fec9c5/just reposting a /g/ banger, it's not as amazing as the '4am' but it's still pretty bloody impressive.
>>40967851Looking forward to it.
>>40968699
Thread preservation bump.
>>40972116yee
>>40972707Have another.
still no way to make mares do death metal growls?
Requesting pic related integrated into an outtake of the episode.
And I need mares more than want maresAnd I want mares for all timeAnd the wichita linemareIs still on the line
>>40940580is it open sauce?
>>40921076>>40954154Review of S7 is done. https://files.catbox.moe/4vscco.jsonThat's all the core FiM episodes done. I'd still like to try running demu on our other audio sources to improve the dataset further. vul, would you be able to run it on the EQG audio and the FiM movie? >>40956515I'm a chemist irl so was just using the stuff I already had in the lab.
>>40975418Yes. The main repository for the UI is here:https://github.com/hydrusbeta/hay_say_uiDocumentation on running a public server is here:https://github.com/hydrusbeta/hay_say_ui/tree/main/running%20as%20serverEach architecture (RVC, ControllableTalkNet, StyleTTS, SVC) also has 1 or 2 of its own repositories, which are used for building its Docker image. There are about 10 repos involved in the whole project:https://github.com/hydrusbeta?tab=repositories
>>40975623
Do you have the associated audio for EQG and the movies?
>>40975895>>40975623Here is the merged labels index, could be useful in the future. If it turns out to be inaccurate it can be recomputed afterwards; the file structure is what's most importanthttps://files.catbox.moe/tqglrp.json
>>40975895Should be these, I also included the Best Gift Ever and Rainbow Roadtrip specials:https://drive.google.com/file/d/12QomZA_D1XiRNciPkIxqw_reU1c64fnm
>>40976188 (checked)Isolated versions. The current code is not compatible with these audios but you can get a head start on downloading them:https://drive.google.com/drive/folders/1dw4nYR9PjJk2C81Hzjgkym26hMULNJMO?usp=drive_linkBest Gift Ever demu0:https://drive.google.com/file/d/1oI_qo8TAkHCzx_waQOcFZhLcNcbnnVFt/view?usp=drive_link
>>40977357Also here is the labels index for the extra files:https://files.catbox.moe/cfcvsw.json
>>40977357>>40977360FYI something seems to have messed up in the processing for the FiM movie and Rainbow Roadtrip. Not sure what happened other than that the timestamps are off...
>>40977418Actually there seems to be a discrepancy between the master file annotations and the version provided here>>40976188
>>40977476Also unrelated but I still can't believe that is the highest quality audio of Rainbow Roadtrip that was ever released. I wonder if the Netflix version would've been better.
>>40978197again
https://files.catbox.moe/0j8902.mp3
>>40979460I like the vibes you are going for in there.
>>40977418>>40977476>>40977553The audio for the specials and movie have always been somewhat tricky, there've been multiple sources used over the years so entirely possible the version I have locally is different. I wouldn't worry too much about it if it's unduly difficult, we already have plenty of other data.
>>40979460AWAKEN, MY MASTERS!
>>40980260I manually tried to align a few clips, and there does not appear to be a constant offset for Rainbow Roadtrip or the FiM movie. I imagine some clever data finagling could be done to re-align the clips but it's beyond me at the moment. Also, the Rainbow Roadtrip voice clips in the master file seem to be polarity inverted relative to the copy we have.Going to focus on modifying the code to work with the EQG data.
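A side note on the polarity inversion mentioned above: it can be checked automatically, since a polarity-inverted copy has a normalized correlation near -1 with the original. A minimal numpy sketch (function name and threshold are my own, not from PonySorter):

```python
import numpy as np

def polarity_flipped(a, b):
    """Return True if b appears to be a polarity-inverted copy of a.

    Compares normalized correlation over the overlapping samples;
    a value near -1 means the waveforms null out when one is inverted.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    n = min(len(a), len(b))
    a, b = a[:n], b[:n]
    corr = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12)
    return corr < -0.9
```

If this returns True, multiplying one copy by -1 before differencing should let the clips null out as expected.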
>>40980260
>>40980649
Updated instructions:
0. Download the new release: https://github.com/effusiveperiscope/PonySorter-B/releases/tag/20240407updated
1. Copy the labels index from >>40977360 into the PonySorter directory and rename it to extras_labels_index.json
2. Copy extra_process_dumps from >>40977357 into in_audio
3. In config.yaml, change index_file to point to extras_labels_index.json. Point master_file_1 and master_file_2 at the correct paths (master file 2 I think is not used atm, but it would be if we got the movie audio working in the future)
Lmk if there are issues
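For step 3, the relevant config.yaml entries might look something like this (paths are placeholders for your local setup; the key names are taken from the instructions above):

```yaml
index_file: extras_labels_index.json
master_file_1: /path/to/Master File
master_file_2: /path/to/Master File 2  # not used atm; reserved for future movie audio
```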
Casualties of mare
Pass the mare grenade
>>40980710Got an error.
>>40982445That's odd. I can't find anything substantial about this online. Try restarting? Have you made any modifications to audio hardware/drivers/system updates?
>>40982570>Redownload PonySorter several times>Painstakingly triple-check all steps of the process>Waste over an hour troubleshooting mystery issue>"Just restart PC lmao">It works nowREEEEEEEEEEEE
>>40982634many such cases
>>40982634
The magic of technology.
>>40982185The holy mare grenade.
>>40980649
I poked at this a little bit, and the situation is even more odd for the movie: while I can get the Rainbow Roadtrip lines close enough to null out, I can't seem to get any of the audio clips for the movie in phase, and I'm kind of suspecting that they are actually at very slightly different playback rates. At this point I'm seriously considering using an AI STT with timestamps and just matching those against the manual annotations. Might also be helpful for Best Gift Ever?
HyperTTS: Parameter Efficient Adaptation in Text to Speech using Hypernetworks
https://arxiv.org/abs/2404.04645
>Neural speech synthesis, or text-to-speech (TTS), aims to transform a signal from the text domain to the speech domain. While developing TTS architectures that train and test on the same set of speakers has seen significant improvements, out-of-domain speaker performance still faces enormous limitations. Domain adaptation on a new set of speakers can be achieved by fine-tuning the whole model for each new domain, thus making it parameter-inefficient. This problem can be solved by Adapters that provide a parameter-efficient alternative to domain adaptation. Although famous in NLP, speech synthesis has not seen much improvement from Adapters. In this work, we present HyperTTS, which comprises a small learnable network, "hypernetwork", that generates parameters of the Adapter blocks, allowing us to condition Adapters on speaker representations and making them dynamic. Extensive evaluations of two domain adaptation settings demonstrate its effectiveness in achieving state-of-the-art performance in the parameter-efficient regime. We also compare different variants of HyperTTS, comparing them with baselines in different studies. Promising results on the dynamic adaptation of adapter parameters using hypernetworks open up new avenues for domain-generic multi-speaker TTS systems.
https://github.com/declare-lab/HyperTTS
The code was posted 10 months ago, but the arxiv paper was just posted. Hope the guy who does finetunes tries it out to see if it's actually useful.
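The hypernetwork idea from the abstract can be illustrated with a toy example: a small network maps a speaker embedding to the weights of an adapter block, so the adapter is speaker-conditioned without fine-tuning the base model. This numpy sketch is purely illustrative; the dimensions, names, and the single linear hypernetwork are all invented, not the paper's architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions: speaker embedding size and adapter hidden size.
SPK_DIM, HID_DIM = 8, 16

# The "hypernetwork" here is just one learned matrix: it maps a speaker
# embedding to the flattened weights of a HID_DIM x HID_DIM adapter.
hyper_w = rng.normal(size=(SPK_DIM, HID_DIM * HID_DIM)) * 0.01

def adapter_forward(hidden, speaker_emb):
    """Apply a speaker-conditioned adapter: the adapter weights are
    generated from the speaker embedding, then used as a residual
    projection on the hidden states."""
    w = (speaker_emb @ hyper_w).reshape(HID_DIM, HID_DIM)
    return hidden + hidden @ w  # residual adapter
```

The point is that only `hyper_w` (and the speaker embeddings) would be trained per deployment; the base TTS weights stay frozen, which is where the parameter efficiency comes from.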
>>40984892So unfortunately, a good 30% of the time Whisper doesn't seem to recognize Pinkie as an actual voice (understandable, since I don't think there are any voices like Pinkie Pie in open training datasets). Going to keep plugging and see how much of the data this actually affects.
>>40985101>whispermight want to try out different models https://huggingface.co/spaces/hf-audio/open_asr_leaderboard
>>40985101
Also, a paper that, while not directly applicable, could actually be the path forward for ASR on unique voice acting:
Transducers with Pronunciation-aware Embeddings for Automatic Speech Recognition
https://arxiv.org/abs/2404.04295
>This paper proposes Transducers with Pronunciation-aware Embeddings (PET). Unlike conventional Transducers where the decoder embeddings for different tokens are trained independently, the PET model's decoder embedding incorporates shared components for text tokens with the same or similar pronunciations. With experiments conducted in multiple datasets in Mandarin Chinese and Korean, we show that PET models consistently improve speech recognition accuracy compared to conventional Transducers. Our investigation also uncovers a phenomenon that we call error chain reactions. Instead of recognition errors being evenly spread throughout an utterance, they tend to group together, with subsequent errors often following earlier ones. Our analysis shows that PET models effectively mitigate this issue by substantially reducing the likelihood of the model generating additional errors following a prior one. Our implementation will be open-sourced with the NeMo toolkit.
https://github.com/NVIDIA/NeMo
>>40985180I'm not sure if there are word-level timestamps for the NVIDIA ones.
Page 9 bump.
>>40985372>>40985101I managed to get 94% of the lines "aligned" with a transcription on Rainbow Roadtrip with this method but the timestamps are all wrong because whisper_timestamped has a tendency to drift without voice activity detection (VAD). Unfortunately with VAD enabled it basically does not recognize Pinkie Pie as a speaking voice. So I ended up going back to using cross correlation, but with windowing around the expected time offset, and that seems to be promising.
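The windowed cross-correlation trick is simple enough to sketch: instead of searching all possible lags, only score lags near the expected offset, which keeps noise from pulling the estimate far away. A minimal numpy version (function name and window size are illustrative, not the actual PPP alignment code):

```python
import numpy as np

def estimate_offset(ref, target, expected_lag=0, window=200):
    """Estimate the lag (in samples) of target relative to ref,
    searching only lags in [expected_lag - window, expected_lag + window]."""
    best_lag, best_score = expected_lag, -np.inf
    for lag in range(expected_lag - window, expected_lag + window + 1):
        if lag >= 0:
            a, b = ref[lag:], target[:len(ref) - lag]
        else:
            a, b = ref[:lag], target[-lag:]
        n = min(len(a), len(b))
        if n == 0:
            continue
        # raw dot product as the correlation score at this lag
        score = float(np.dot(a[:n], b[:n]))
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag
```

Constraining the search window is what makes this robust when the global cross-correlation surface is noisy, at the cost of needing a decent initial guess for each clip.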
>>40965450The end reminds me of a car hitting hard on the brakes.
>>40986757
>>40982634
Okay, here is an updated extra_labels_index.json with new timestamps that match the provided audio: https://files.catbox.moe/p57n0j.json
There is one caveat: all of the HH_MM_SS timestamps in the file names become inaccurate. In this new release (remember to copy your config over): https://github.com/effusiveperiscope/PonySorter-B/releases/tag/20241004updated
the timestamps for the -exported- file names (as well as for the label file) are based on the "actual" timestamp.
For anyone who cares, here is the code I used for alignment:
https://github.com/effusiveperiscope/PPPDataset/blob/main/data_realigner.ipynb
And pic related is a time offset graph for the FiM movie (you can see the graph for Rainbow Roadtrip in the notebook). It seems pretty consistent with a different playback rate (the x axis is line index, so it's not exactly linear with respect to time, but it would be pretty close). OTOH, the Rainbow Roadtrip time offset graph is very clearly stepwise, probably due to different commercial breaks.
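For what it's worth, a playback-rate mismatch like the one the offset graph suggests can be quantified by fitting a line to (position, measured offset) pairs: a nonzero slope approximates the rate difference, while a stepwise pattern (like Rainbow Roadtrip's) would fit poorly. A hedged sketch with synthetic numbers, not the actual movie data:

```python
import numpy as np

def fit_drift(positions, offsets):
    """Fit offset ~= slope * position + intercept.

    A nonzero slope suggests a playback-rate mismatch
    (rate ratio roughly 1 + slope); the intercept is the
    constant part of the offset.
    """
    slope, intercept = np.polyfit(positions, offsets, 1)
    return slope, intercept
```

With the slope in hand, each clip's expected offset can be predicted from its position, which is exactly the kind of initial guess a windowed search needs.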
everyone is a fucking rPHetard o2vwn this forum
>>40987180everyone is a fucking retard here including me
>>40987180Can you translate that to english?
>>40987180how fat are your fingers dudei cannot possibly conceive fingers fat enough to hit the keys that you didat this point i just feel bad for you
>>40987618it might be that ogre poster with a caged onion from a decade ago
>>40987642it completely blows my fucking mind. it honestly does. i can't believe it.>rPHetard>o2vwn>>40987180please, please, think about this, please. i'm begging you to think about this. please.
save
>>40951924There is now an open source reproduction of that paper:https://github.com/huggingface/parler-ttshttps://huggingface.co/spaces/parler-tts/parler_tts_miniIt takes two text inputs: the text to say and a prompt describing the speaker, speaking rate, recording environment, audio quality, etc.All datasets, code (including train/finetune), and models will be released under a permissive license. So far, they've released a 600M model trained on 10.5k hours. They're working on scaling it to 50k hours.
>>40988141
Hmm, it's not exactly the same thing; it's more like a nicer TTS tool minus the option to choose an exact specific voice.
>ask for angry female
>it just talks faster
Other than that, it's nice to see alternatives. Sadly "singing" or "musical" are not tags that affect the output in any way; HOWEVER, there is always hope that someone out there can train a new upgraded model. I still think it will take a year for someone to recreate Suno AI, and once that happens we will see entire mare albums being made each month.
>>40965026>entire point of software is to generate something using something as a base of inspiration>somehow this is a copyright issueDamn, I guess I can't start an italian symphonic power metal band for fear of it being a "copyright infringement" on Rhapsody of Fire's existence.
I can't wait for better tools to be made, since fixing the strange filter effect is not possible and the lines need to be re-sung.
>>40987618Or maybe he was drunk as fuck.
>amre at 10
>>40990187Shamefur dispray.
>>40988141Interesting. I wonder how far the natural language description can be pushed. Spitballing here: one unique feature of using audio from a TV show is that if we had a show pseudo-script for the episodes (i.e. describing what is going on between the characters) it might be possible to generate synthetic descriptions for each speaking line that could give much more granular control over delivery. I'm also interested if anyone comes up with a solution for long-form inference with each line conditioned on the last, since that would be a natural fit for show audio too.
>>40990187Almost again.
CoVoMix: Advancing Zero-Shot Speech Generation for Human-like Multi-talker Conversations
https://arxiv.org/abs/2404.06690
>Recent advancements in zero-shot text-to-speech (TTS) modeling have led to significant strides in generating high-fidelity and diverse speech. However, dialogue generation, along with achieving human-like naturalness in speech, continues to be a challenge in the field. In this paper, we introduce CoVoMix: Conversational Voice Mixture Generation, a novel model for zero-shot, human-like, multi-speaker, multi-round dialogue speech generation. CoVoMix is capable of first converting dialogue text into multiple streams of discrete tokens, with each token stream representing semantic information for individual talkers. These token streams are then fed into a flow-matching based acoustic model to generate mixed mel-spectrograms. Finally, the speech waveforms are produced using a HiFi-GAN model. Furthermore, we devise a comprehensive set of metrics for measuring the effectiveness of dialogue modeling and generation. Our experimental results show that CoVoMix can generate dialogues that are not only human-like in their naturalness and coherence but also involve multiple talkers engaging in multiple rounds of conversation. These dialogues, generated within a single channel, are characterized by seamless speech transitions, including overlapping speech, and appropriate paralinguistic behaviors such as laughter.
https://www.microsoft.com/en-us/research/project/covomix/
Obviously from Microsoft, so as usual for their voicegen stuff there are no weights. Pretty good from the examples, though.
Posting this from wsg:
https://github.com/huggingface/parler-tts
>>/wsg/5503058
I discovered an AI program called Resemble Enhance. I used it to clean the vocals I downloaded from YouTube. I uploaded the clean vocals to ElevenLabs. Here are the results.
Franklin D. Roosevelt
https://vocaroo.com/1gSsiKrYtjkA
Harry S. Truman
https://vocaroo.com/1nEqj1b4XOaV
Jackson Beck (Superman announcer)
https://vocaroo.com/1kgRi6ICDNCd
Mr. Delicious
https://vocaroo.com/1ggEgkhUPRRk
I overlaid the FDR, Harry Truman, and Jackson Beck audio files with a song called Spidey Meets His Girl via kdenlive:
https://m.soundcloud.com/udi-harpaz-composer/spidy-meets-his-girl?in=udi-harpaz-composer%2Fsets%2Fspiderman-by-udi-harpaz
Then I uploaded the kdenlive output WAV files to Resemble Enhance to remove the hissing and scratching sound. It can also remove reverb this way. Here's an example.
Before:
https://vocaroo.com/1kvyT6Wh8A2v
After:
https://vocaroo.com/1i7q7woh25jt
>>40991829>https://github.com/resemble-ai/resemble-enhanceshit, wrong github link
So there is now a copycat of 'Suno', called 'Udio', that does exactly the same thing.
>free beta
I'm guessing it will go the full AI Dungeon Mormon way and try to get people hooked, then slam them with a $15+ subscription or some other bullshit. Additionally, there are still no model weights included, so it goes straight into the gay and fake category like the rest of those subscription services.
>>40993044Artificial lewd is still lewd.
>>40993206But what if he generated artificial consent along with the artificial mare?
I wanna make an RVC of a vocaloid, but the rawest form of audio I have is WAV samples of all the phonemes for that vocaloid, chopped up. In other words, it's not really singing so much as chopped-up samples. I've tried to put them all together in one audio file, but the end result is choppy. When I use them separately from each other, the result is a hot mess. If only I could just... have the voice sing naturally for the input audio, but it's a vocaloid, and the raw samples are very short, the longest being about one second. I'm at a loss and don't know what to do. Any help doing this right would be massively appreciated.
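One generic trick for the choppiness when butting short samples together is a short crossfade at each join; it won't make the result sing naturally, but it removes the hard clicks at sample boundaries. A minimal numpy sketch (the fade length is arbitrary and would need tuning per sample set):

```python
import numpy as np

def crossfade_concat(clips, fade=64):
    """Join 1-D audio arrays with a linear crossfade of `fade` samples
    at each boundary, to soften hard transitions between samples."""
    out = np.array(clips[0], dtype=float)
    ramp = np.linspace(0.0, 1.0, fade)
    for clip in clips[1:]:
        clip = np.asarray(clip, dtype=float)
        # blend the tail of what we have with the head of the next clip
        out[-fade:] = out[-fade:] * (1.0 - ramp) + clip[:fade] * ramp
        out = np.concatenate([out, clip[fade:]])
    return out
```

A real concatenative synth also needs pitch and timing control across joins, which is well beyond a crossfade, but this at least makes a stitched-together training clip less clicky.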
>>40994089
This may not be the best approach, but wouldn't it make sense to look for songs featuring that vocaloid and apply one of the vocal-remover programs to get as close to "raw" audio output as possible, then place those clips in Audacity both to chop them into workable ~10-second clips and to pick out the better-sounding ones (separating the wheat from the chaff)? I am not as familiar with the gaming side of Vocaloid, but I would imagine there are at least some games using the officially approved voices, so maybe it would be worth scouting some forums for raw audio extracted from them?
>>40994089>>40994108I wonder if the artists would be willing to share the raw vocaloid data for their songs or other projects? Would probably be the best way to get clean speaking/singing audio if you're not skilled at making your own.
>>40994195
On some rare occasions there are people willing to share raw project files, so that's not completely impossible; it all depends on how much time you are willing to spend getting the best quality dataset VS quickly grabbing lower-quality clips and starting training immediately.
>>40987060Got an error while trying to load the fim movie:>Signature fim_movie_demu1 from audio but no corresponding labelNot sure what the "label" is supposed to be.>timestamps in the file names become inaccurateNot a big deal, they're mostly there for human readability and to prevent duplicate filenames. Shouldn't have any impact on the AI side of things.
>>40994282Did you preserve the config settings (e.g. it has to point at extra_labels_index.json instead of episodes_labels_index.json)
>>40994312Yeah, config looks correct.
>>40960343
https://www.youtube.com/watch?v=13IW_KDCkJo
It concerns me how one of my favorite songs from my teen years has come to describe 4chan and society as a whole when it comes to anything but contemporary politics and drama of any variety. It's probably impossible, but I'd find it interesting to have a mare-voiced cover of this.
Same anon as the one who asked about the RVC stuff here.I remember there was someone who was able to make a unique sound to a vocaloid and made them sound like a human singer through RVC. All I know is that they used the vocaloid samples for it, and I know nothing else of the process. They have since deleted the video showcasing the results, but it still intrigued me. I want to make a unique sound for the vocaloid to sound more human, not just a vocaloid ported to RVC. Are my ambitions too high here? Is this even possible?
>>40994335I don't know how this occurred. My only guess is that somehow I managed to upload a non-updated version of the executable here >>40987060. Deleted the old release and trying this again.https://github.com/effusiveperiscope/PonySorter-B/releases/tag/20241204updated
>>40994365Got the same error again. Could you tell me what the "label" is it's looking for? Perhaps I've just got something in the wrong place.
>>40994363Same anon, and I think I was misunderstood. I don't mean the raw samples from a vocaloid singing in the editor... I mean the literal raw samples that were used for recording the vocaloid, which are, surprisingly enough, VERY easy to extract from the vocaloid itself.https://voca.ro/1djZ7LBMphZ5Above is all of Miku's raw samples compiled together. This is what I'm talking about when I mean "samples".
>>40994416I can't reproduce this error. The "label" is supposed to be from the labels index. However the signature should not even be "fim_movie_demu1". How is your in_audio structured?
>>40994509>>40994416Actually, can you post the full traceback + your current save file? I think I might know what is happening.
>>40994521
https://files.catbox.moe/q017lo.json
Full traceback:
pygame 2.5.2 (SDL 2.28.3, Python 3.10.13)
Hello from the pygame community. https://www.pygame.org/contribute.html
[2024-04-12 21:54:59,786] [INFO] Loading project P:/PVPP/Tools/Pony Sorter 5c/EQG.json
[2024-04-12 21:55:00,095] [INFO] Loading fim_rainbow roadtrip
orig 3632.7363125
demu0 3632.7363125
demu1 3632.7363125
[2024-04-12 21:55:18,574] [INFO] Loaded fim_rainbow roadtrip
[2024-04-12 21:55:20,353] [INFO] Loaded P:/PVPP/Tools/Pony Sorter 5c/EQG.json
[2024-04-12 21:55:26,361] [INFO] Loading fim_movie_demu1
[2024-04-12 21:55:26,437] [WARNING] Signature fim_movie_demu1 from audio but no corresponding label
Traceback (most recent call last):
  File "gui.py", line 496, in load_selection
  File "gui.py", line 428, in load_selection
  File "core.py", line 140, in load_sig
TypeError: 'NoneType' object is not iterable
>>40994533Why is fim_movie only 418.0 MB?
>>40994545Because it only has fim_movie_demu1 in it. That's probably the issue.
>>40994545>>40994561I put the original audio in there and it works now. Got a "hash mismatch" warning, not sure how significant that is. Audio seems to play fine.
>>40994585A hash mismatch warning may indicate that a file was only partially downloaded/uploaded
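For reference, a partial download like that can be caught by hashing the file and comparing against a known digest; something like this (SHA-256 here is my own choice for illustration, PonySorter may well use a different algorithm):

```python
import hashlib

def file_sha256(path, chunk_size=1 << 20):
    """SHA-256 hex digest of a file, read in 1 MB chunks so that
    large audio files don't have to fit in memory at once."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()
```

Comparing the digest of the local copy against one computed from the uploader's copy settles whether the file was truncated in transit.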
>>40995125>10, mares edition
>>40996134
>Not showing 10 mares
1 job
>>40994597
Finished all the reviewing, but now got a new error on trying to export:
Savefile - https://files.catbox.moe/2a39wk.json
Terminal:
pygame 2.5.2 (SDL 2.28.3, Python 3.10.13)
Hello from the pygame community. https://www.pygame.org/contribute.html
[2024-04-13 13:36:11,278] [INFO] Loading project P:/PVPP/Tools/Pony Sorter 5c/EQG.json
[2024-04-13 13:36:11,576] [INFO] Loading eqg_better together_s02e04
orig 177.6775
demu0 177.6775
demu1 177.6775
[2024-04-13 13:36:12,584] [INFO] Loaded eqg_better together_s02e04
[2024-04-13 13:36:14,319] [INFO] Loaded P:/PVPP/Tools/Pony Sorter 5c/EQG.json
['eqg_better together_s02e04', 'eqg_better together_s02e05', 'eqg_better together_s02e06', 'eqg_better together_s02e07', 'eqg_dance magic', 'eqg_forgotten friendship', 'eqg_friendship_games', 'eqg_legend_of_everfree', 'eqg_mirror magic', 'eqg_movie magic', 'eqg_rollercoaster of friendship', 'fim_rainbow roadtrip', 'fim_movie']
[2024-04-13 13:36:31,004] [INFO] Processing eqg_better together_s02e04
Traceback (most recent call last):
  File "gui.py", line 104, in export_all_audio
  File "core.py", line 277, in export_audio
  File "utils.py", line 55, in path_reparse
  File "utils.py", line 44, in label_reparse
  File "utils.py", line 37, in convert_decimal_seconds_to_hh_mm_ss
TypeError: unsupported operand type(s) for /: 'str' and 'int'
>>40996631
Fixed: https://github.com/effusiveperiscope/PonySorter-B/releases/tag/20241304updated
View stats should also work now.
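For anyone curious about that 'str' / 'int' TypeError: it's the classic case of a timestamp arriving as a string (e.g. straight out of a JSON label file) and hitting arithmetic. A hedged sketch of what a fixed converter might look like; the real convert_decimal_seconds_to_hh_mm_ss in PonySorter-B may differ:

```python
def seconds_to_hh_mm_ss(seconds):
    """Format decimal seconds as HH_MM_SS, accepting str or float input."""
    total = float(seconds)  # coerce: values loaded from JSON may be strings
    hours, rem = divmod(int(total), 3600)
    minutes, secs = divmod(rem, 60)
    return f"{hours:02d}_{minutes:02d}_{secs:02d}"
```

Coercing at the boundary (one `float()` call) is usually cleaner than sprinkling type checks through every caller.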
https://youtu.be/Tk8tDpweJB4?si=lbIcDfJw1hE3ABk2
Just got recommended this; has anyone made an AI version yet?
>>40921076
The master file has now been updated with the newly cleaned data. If you use the data in the master file, you should re-download the whole thing. Also give it a quick look-over to ensure nothing's missing.
https://mega.nz/folder/jkwimSTa#_xk0VnR30C8Ljsy4RCGSig (same link as before)
I still have a local copy of the old version that I'll hold onto for a while, just in case something goes wrong with the new version.
>>40997261
episodes_labels_index_updated:
FiM - https://files.catbox.moe/4vscco.json
EQG + movie - https://files.catbox.moe/8kp84w.json
These should cover all of the reviewed data; can you use these to make a graph showing the overall changes of clean/noisy/very noisy?
>>40999872
Oh boy, there is just enough data to train an Iron Will RVC model on clean + noisy files.
>>40994436Same anon here. Sorry if I'm being intrusive, I just thought that with this being an audio centered thing, people could help. If I can't be helped here, where could I go to get the assistance I need for my issue?
>>41000255How have the previous answers to your questions not been able to address your issue?>>40994195>>40994213
>>41000300Because they're talking about samples as in RENDERS of the vocaloids, and not the actual extracted vocal samples used to record the vocaloid in the first place. If that makes sense.
>>41000466Why do you want to train off of extracted vocal samples rather than renders?
>>40999872There seem to be a few gaps in the data, so I am going to have to check things over.
>>41000507First gap: s1e1 to s1e16 (consistent with the "two different save files" problem we had earlier)Second gap: s5 (the labels index you posted for s5 seems to only contain modified data for s4.) However this does not seem to affect the actual audio files. In any case I will be downloading the actual master file so I can check against that as well.
>>41000466
I think that's a lot harder (impossible?) to achieve with just a model. The model would have no real context on how each sample should flow into each other to sound natural.
>>41000220
https://files.catbox.moe/aokhto.mp3
https://huggingface.co/Amo/RVC_v2_GA/tree/main/models/MLP_Iron_Will_GA
2m8s of audio (all of the clean lines and most of the noisy lines) was used to train this RVC Iron Will model. I pushed the training to run for 350, as I feel the extra steps do help when the dataset is this short.
Now it's time to lurk for other obscure background ponies that may have gained additional clean lines for producing training datasets.
>>41000730
Harder/impossible? Is there any way to make them smoother somehow, or anything that makes them less choppy? The reason I insist on using the samples is that they're the audio files with no vocaloid engine noise in them. No matter what render I use, it'll always have vocaloid engine noise in at least some capacity.
Did this masterpiece ever get remade with better models?
https://vocaroo.com/kwLL9EyTbAQ
>>41000960
Sorry, not really sure how you'd accomplish that with AI. Maybe you could manually do something with the audio files, map them onto the vocaloid instructions, and train on those outputs?
>>41001070
I don't know if that would work, nor where to start. The weirdest thing is that some people have done it before, and done it considerably well, to the point that there's a very human-like and dynamic quality to the voice not present in the samples for the RVC models. The only problem is that, unfortunately, they don't make their methods public, so anyone interested in replicating it is essentially scrambling around for answers and leads.
>>40966045
Since there has been no response to this post for over a week, I guess the idea can be considered ded.
https://suno.com/song/7a5c06b1-dafa-4717-967f-6416a3de2b0a
https://vocaroo.com/1iNd7HL85PoL
Here is a song inspired by the greentext (37557739) of Anon's dream about a mare with the stage name Black Sunset singing a power metal song in some indie Equestria band.
>catbox is temporarily ded
Other than that, I can confirm that the suno devs are starting to behave a little bit more kosher: last week one could easily generate a 2m+ song, and now the song generator has given me a 1m20s output.
>>41000966
Not as far as I know.
>>41000580
>>41000507
>>40999872
Here is the "corrected" version of the episodes_labels_index.json for the FiM files: https://files.catbox.moe/g0mg81.json
From analysis of the master file, there were no skipped FiM episodes aside from the ones that are supposed to be skipped (special source/outtakes), at least according to the generated labels, and all lines inside the FiM label files correspond to an existing file in the master file. *However*, the current Label Files directory does not contain label files for EQG/movies/specials, nor the older label files for audio we didn't handle, like songs.txt, mobile game, and Other.
Over the entire observed data: of 58578 total lines, 14927 noise ratings were modified (~25%). The clean/noisy/very noisy split was 17353/19629/21596 prior, 21599/22475/14504 post.
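For anyone wanting to reproduce numbers like these, the comparison can be sketched in a few lines of Python. This is illustrative only: it assumes the label format visible elsewhere in the thread ("start end HH_MM_SS_Character_Emotion_Noise_transcript", with an empty noise field treated as clean), not the structure of the actual index JSONs or the script actually used.

```python
def parse_label(line):
    # One label line looks like:
    # 5.484631 7.017492 00_00_05_Applejack_Neutral_Noisy_we're almost there!
    # The timestamps identify the line across versions; field 5 after
    # splitting the remainder on "_" is the noise rating (empty = clean,
    # by assumption).
    start, end, rest = line.rstrip("\n").split(" ", 2)
    noise = rest.split("_")[5] or "Clean"
    return (start, end), noise

def compare_versions(old_lines, new_lines):
    # Count how many noise ratings changed between the two versions,
    # and tally the clean/noisy/very noisy split of the new version.
    old = dict(parse_label(l) for l in old_lines)
    new = dict(parse_label(l) for l in new_lines)
    modified = sum(1 for k in old if k in new and old[k] != new[k])
    split = {}
    for noise in new.values():
        split[noise] = split.get(noise, 0) + 1
    return modified, split
```

Running this over the old and new label files for every episode would give the modified-rating count and the prior/post splits directly.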
>>41001932
Not sure if this is possible, but could you create a text file that lists a comparison of all characters' lines in hours/minutes/seconds? E.g.:
Twilight C: 3 hours N: 2 hours VN: 1 hour
Fluttershy C: 2 hours N: 45 minutes VN: 20 minutes
Minuette C: 1 minute N: 25 seconds VN: 10 seconds
>>41002579
Would a CSV be better? Here is a CSV of the Clean/Noisy/Very Noisy split in seconds, ordered by Clean+Noisy, calculated across FiM episodes + EQG data + movie and specials. Duration is calculated from the label timestamps.
https://files.catbox.moe/xu0age.csv
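The aggregation described here is straightforward to sketch. This is an illustrative reimplementation, not the script actually used: it assumes the label format "start end HH_MM_SS_Character_Emotion_Noise_transcript", with an empty noise field counted as clean.

```python
import csv
from collections import defaultdict

def character_durations(label_lines):
    # Sum seconds of audio per (character, noise rating).
    # Each clip's duration is end - start from the label timestamps.
    totals = defaultdict(lambda: defaultdict(float))
    for line in label_lines:
        start, end, rest = line.rstrip("\n").split(" ", 2)
        fields = rest.split("_")
        character, noise = fields[3], fields[5] or "Clean"
        totals[character][noise] += float(end) - float(start)
    return totals

def write_split_csv(totals, path):
    # One row per character, ordered by Clean+Noisy as in the posted CSV.
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["character", "clean_s", "noisy_s", "very_noisy_s"])
        ranked = sorted(totals,
                        key=lambda c: -(totals[c]["Clean"] + totals[c]["Noisy"]))
        for char in ranked:
            t = totals[char]
            writer.writerow([char, round(t["Clean"], 3), round(t["Noisy"], 3),
                             round(t["Very Noisy"], 3)])
```

Feeding it every label file across FiM + EQG + movie/specials would produce a per-character duration table like the one linked above.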
>>41003547
cheers, this format is much better
>all of fim has 250 spoken characters
huh, that's about the same number as I would expect of NPCs in a pretty decent RPG.
>>41001932
>all lines inside the FiM label files correspond to an existing file in the master file.
Does that mean the missing stuff from >>41000580 can be disregarded, or is there something still to fix there?
>the current Label Files directory does not contain label files
Fixed. Forgot to upload the old unchanged labels earlier.
>pic
Looks like a worthwhile improvement, especially considering that a lot of the "noisy" lines are now less noisy.
>>41003813
Some one-offs were not annotated, I think.
>>41003960
>Does that mean the missing stuff from >>41000580 can be disregarded, or is there something still to fix there?
Yes, that was just inside the label index.
Up from page 9.
>>41004630
from 10
>>41005291
Not so good.
So what is everyone working on? I'm cracking my head trying to figure out some punk song lyrics from the perspective of Octavia.
https://huggingface.co/spaces/Xenova/musicgen-web
not new, but a web version for that one guy who seems to like musicgen
>>41006687
trying to generate 20s of music makes the generator shit itself, and the 10s samples are a bit meh (I remember using the offline version a year ago and it worked much better then).
>>41001527
need more pony power metal
>>41007990
A lot of artificial booping happening here lately.
Not that I'm complaining.
>>40999872
I uploaded a copy of the Master File here: https://drive.google.com/drive/u/2/folders/1ho2qhjUTfKtYUXwDPArTmHuTJCaODQyQ
Sort of. I fixed one typo: in Special Source, one of the folders says "Freindship" instead of "Friendship". It should technically also be "Rollercoaster" rather than "Roller Coaster". I plan to do a more thorough check tomorrow for typos and potentially other errors.
The Special Source/Luster Dawn folder is empty. Is it supposed to be?
>>41007304
That would be nice, and to tell the truth I'm looking forward to something that isn't just another wubstep remix. Now this makes me wonder: what is the rarest type of pony musical genre? Metal, jazz and ska seem pretty rare.
Long-form music generation with latent diffusion
https://arxiv.org/abs/2404.10301
>Audio-based generative models for music have seen great strides recently, but so far have not managed to produce full-length music tracks with coherent musical structure. We show that by training a generative model on long temporal contexts it is possible to produce long-form music of up to 4m45s. Our model consists of a diffusion-transformer operating on a highly downsampled continuous latent representation (latent rate of 21.5Hz). It obtains state-of-the-art generations according to metrics on audio quality and prompt alignment, and subjective tests reveal that it produces full-length music with coherent structure.
https://stability-ai.github.io/stable-audio-2-demo/
https://github.com/Stability-AI/stable-audio-tools/
stable audio 2 paper. still no weights
>>41009282
Basically anything that isn't EDM, hyperpop, or dubstep. Swing music that's actual swing music and not "electroswing" ("yeah, let's just take this 1940s music and ruin it with electric bassy drums, then ruin it further by chopping it up so it's repeating the same little bit, and to top it off let's master it by making the electric drum beat louder than the actual music!") is also quite rare.
>>41009637
out of curiosity, what is your opinion on https://youtu.be/N7wTBGP4UFs?si=W4w_mSo6vQj9QpwD ?
https://huggingface.co/spaces/pyp1/VoiceCraft_gradio
webui for voicecraft
>>41009954
input: https://vocaroo.com/1nItsHLVEtqp
output: https://vocaroo.com/18qK1E6F1CnJ
Now this may create some possibilities in the future, but they would need to improve the output quality; the 16000Hz sucks ass pretty hard.
>>41010125
>Freindship
>Roller Coaster
Fixed. Just simple typos.
>Special Source/Luster Dawn folder is empty
My records show the same empty folder existed in the previous master file version, so it's not a new issue that's suddenly been introduced. Likely a temp folder from the first pass at processing the S9 leaks that I forgot to delete.
messing around with sovits4 again. how feasible would it be to try and recreate all the vocal effects Taylor uses in Blank Space? here is a small example of what I've got so far, using Dash's voice to replace the vocals:
https://files.catbox.moe/0etyeh.mp3
don't even know what vocal effects are used in Blank Space
>>41010125
>Sliced Dialogue/Label files/s01e07_Music.txt
Everything else uses a lowercase '_music', but this one uses an uppercase M.
>Sliced Dialogue/EQG/EQG Roller Coaster of Friendship
Should be "EQG Rollercoaster of Friendship" to match the updated name in Special Source.
Some transcripts use "Flurry Heart", some use "Flurry":
>./fim_s09e01_original.txt:940.685169 941.573000 00_15_41_Flurry Heart_Anxious_Very Noisy_mama!
>./s09e01.txt:940.685169 941.573000 00_15_41_Flurry Heart_Anxious_Noisy_mama!
>./fim_s09e01.txt:940.685169 941.573000 00_15_41_Flurry Heart_Anxious_Very Noisy_mama!
>./s09e01_demu0.txt:940.685169 941.573000 00_15_41_Flurry Heart_Anxious_Noisy_mama!
>./fim_s07e03.txt:683.600000 684.812919 00_11_24_Flurry_Happy_Very Noisy_ta da!
>./s07e03.txt:683.600000 684.812919 00_11_24_Flurry_Happy_Very Noisy_ta da!
>./fim_s07e03_original.txt:683.600000 684.812919 00_11_24_Flurry_Happy_Very Noisy_ta da!
>./s07e03_master_ver.txt:683.600000 684.812919 00_11_24_Flurry_Happy_Very Noisy_ta da!
Typo (Applejack223 should be Applejack):
>./s01e18.txt:5.484631 7.017492 00_00_05_Applejack223_Neutral_Noisy_we're almost there, youngins!
>./s01e18_demu1.txt:5.484631 7.017492 00_00_05_Applejack223_Neutral_Noisy_we're almost there, youngins!
Typo (Mrs.Cake should be Mrs. Cake, with a space, for consistency with other labels):
>s02e10.txt:366.058770 370.764964 00_06_06_Mrs.Cake_Happy_Noisy_When we found out it was your birthday, we couldn't resist trying out a new recipe!
>fim_s02e10_original.txt:366.058770 370.764964 00_06_06_Mrs.Cake_Happy_Noisy_When we found out it was your birthday, we couldn't resist trying out a new recipe!
>fim_s02e10.txt:366.058770 370.764964 00_06_06_Mrs.Cake_Happy_Noisy_When we found out it was your birthday, we couldn't resist trying out a new recipe!
>s02e10_demu0.txt:366.058770 370.764964 00_06_06_Mrs.Cake_Happy_Noisy_When we found out it was your birthday, we couldn't resist trying out a new recipe!
Typo (Jet Set has an extra space at the end of his name):
>./fim_s02e09_original.txt:144.263221 147.820380 00_02_24_Jet Set _Neutral__We saw you from across the cafe and just had to find out.
>./s02e09_master_ver.txt:144.263221 147.820380 00_02_24_Jet Set _Neutral__We saw you from across the cafe and just had to find out.
>./s02e09.txt:144.263221 147.820380 00_02_24_Jet Set _Neutral__We saw you from across the cafe and just had to find out.
>./fim_s02e09.txt:144.263221 147.820380 00_02_24_Jet Set _Neutral__We saw you from across the cafe and just had to find out.
>>41010125
>>41011157
In Rainbow Roadtrip, there are some weird "Si" emotion tags:
>00_22_01_Mayor Sunny Skies_Sad Si_Noisy_then fences went #up#, we lost track of our neighbors, each year passing dimming spirits all #around#.txt
>00_22_09_Mayor Sunny Skies_Sad Si_Very Noisy_the happy days came to an #end#, and nopony had time to spend to gather in the #town#.txt
>00_22_18_Mayor Sunny Skies_Sad Si_Very Noisy_i thought i knew exactly what the #festival# #needed#.txt
>00_22_26_Mayor Sunny Skies_Sad Si_Very Noisy_a bigger better #rainbow#, would surely make them see #it#.txt
>00_22_35_Mayor Sunny Skies_Sad Si_Very Noisy_but the extra magic was too much for the #rainbow# #generator#.txt
>00_22_42_Mayor Sunny Skies_Sad Si_Very Noisy_and i'm the one who brought the #rainbow#, to an end.txt
>00_22_54_Mayor Sunny Skies_Sad Si_Noisy_that's how our #town#, our little pony #town#, that's how our town saw the #end#, of the #rainbow#.txt
For consistency, the character name should be "Petunia Petals" rather than "Petunia" in Rainbow Roadtrip:
>00_42_27_Petunia_Neutral_Very Noisy_not all of it was.txt
And "Mr. Moody Root" should be "Moody Root":
>00_38_40_Mr. Moody Root_Annoyed_Very Noisy_who wants to know_.txt
I pushed the code I'm using to validate the Master File here:
https://github.com/synthbot-anon/horsefm-lib
>Page 10
>9 bump
VASA-1: Lifelike Audio-Driven Talking Faces Generated in Real Time
https://arxiv.org/abs/2404.10667
>We introduce VASA, a framework for generating lifelike talking faces with appealing visual affective skills (VAS) given a single static image and a speech audio clip. Our premiere model, VASA-1, is capable of not only producing lip movements that are exquisitely synchronized with the audio, but also capturing a large spectrum of facial nuances and natural head motions that contribute to the perception of authenticity and liveliness. The core innovations include a holistic facial dynamics and head movement generation model that works in a face latent space, and the development of such an expressive and disentangled face latent space using videos. Through extensive experiments including evaluation on a set of new metrics, we show that our method significantly outperforms previous methods along various dimensions comprehensively. Our method not only delivers high video quality with realistic facial and head dynamics but also supports the online generation of 512x512 videos at up to 40 FPS with negligible starting latency. It paves the way for real-time engagements with lifelike avatars that emulate human conversational behaviors.
https://www.microsoft.com/en-us/research/project/vasa-1/
from microsoft, so no weights ever I'm sure. obviously human faces, but just posting for the potential of using such tech in the future for animated faces
had a thought yesterday and figured i'd ask here
most AI work seems to be done on either windows or linux, and seems to be 'shackled', for lack of a better word, to the limitations of those operating systems
what would have to be done to build a brand new OS from the ground up specifically for AI development/training/operation, and would a dedicated OS result in any noticeable or worthwhile improvement?
>>41012620
>making a custom OS for pony AI development
Technically doable, but I do not think even with all the Tulpafags on the board we have enough schizo power to make that a reality.
>>41011157
Changed "Flurry Heart" to just "Flurry" for s9e1. Left the s7e3 version as-is.
Fixed the "Applejack223" typo.
Fixed the "Mrs.Cake" typo.
Fixed the "Jet Set _" typo.
>>41011159
Changed all "Si"s to "Singing".
Fixed the "Petunia" typo.
Fixed the "Mr. Moody Root" typo.
That should cover everything.
Late night preservation post.
>>41012620
Compatibility with all other AI research is the highest priority for speeding up development, and creating a new OS would make that essentially impossible.
The OS is mostly a pass-through for all of the software that runs on it. It's not much of a bottleneck, especially on Linux, and especially with docker.
>>41012699
The FiM folder seems to have a copy of the Special Source EQG Short folder. Otherwise, that seems to be everything for now.
The updated Gdrive clone is here: https://drive.google.com/drive/folders/1ho2qhjUTfKtYUXwDPArTmHuTJCaODQyQ?usp=drive_link
I uploaded a copy of the dialogue dataset to HuggingFace as well: https://huggingface.co/datasets/synthbot/pony-speech
I updated horsefm-lib with code for pushing the data to HuggingFace: https://github.com/synthbot-anon/horsefm-lib/blob/main/notebooks/export_data.ipynb
I realized that my clone of Master File 2 is outdated. I'm fixing that now. I'll check that for errors too.
>>41012699
In Rainbow Roadtrip, the Master File has some singing lines from 00_21_05 to 00_22_54. Some of these seem to be missing from the Songs folder in Master File 2. For example, this is a singing line that only exists in Master File 1:
>00_21_05_Mayor Sunny Skies_Happy Singing_Very Noisy_they planned for weeks, cooked for days, celebrated fifty #ways#, so everypony would gather here, in our town at the end of the #rainbow#..flac
Typo (tag should be Singing rather than Singingnging):
>00_20_00_Mayor Sunny Skies_Happy Singingnging_Noisy_next-door neighbors chatting, over white wood fences, stopping on the street to say #hello#..flac
>00_20_00_Mayor Sunny Skies_Happy Singingnging_Noisy_next-door neighbors chatting, over white wood fences, stopping on the street to say #hello#.txt
>>41013398
The Master File 2 gdrive clone is here: https://drive.google.com/drive/folders/1ho2qhjUTfKtYUXwDPArTmHuTJCaODQyQ?usp=drive_link
The singing data is on HuggingFace here: https://huggingface.co/datasets/synthbot/pony-singing
>>41012943
>>41012699
Errors in the SFX & Music labels:
https://ponepaste.org/9979
I'm not done finding all of them yet. There's a large list of tags at the bottom of the paste that I still need to read through.
>>41013861
That's a weapons-grade cute robo Flutters.
>>40954981
>>40957582
https://www.youtube.com/watch?v=TWcajuXywHA
So from my experimentation, this is what I've learned. Trying to run the extracted vocals directly through rvc/sovits creates weird results, because the model applies random reverb, audio effects, and duo vocals layered on top of each other, and using talknet to create a somewhat uniform effect is difficult as the pitch levels jump all over the place.
I tried to fix the above by redoing the more derped and un-salvageable lines within rvc on my own, but the pitch difference between them was pretty noticeable, so as a final attempt at saving it I ran the end result through an autotune filter, and that seems to calm down the random pitch ups and downs.
The more frustrating part was trying to get Pinkie's voice to sing a tune that is slow, calm and a bit sad, all three things that are the very opposite of what would be expected from her type of voice.
The best solution here would be to have this kind of tech offline and just train the singing part of the model to understand what Pinkie sounds like, to get the exact result from the get-go, but that most likely won't happen until next year.
>>41011079
ayo, this shit slaps. do you got any more dash covers? :)
>>41013398
Removed duplicate folder.
>>41013574
>Missing singing lines
Copied over all lines from master file to master file 2 between 21:05 - 22:54, that should fix it.
>Singingnging
Corrected typo.
>>41013995
SFX and Music.
I know there will be a load of typos/inconsistencies in there; that was all done by hand and I pretty much just made up the tagging system as I went along. I might pass on fixing most of those if they're not too bad, going through all that is gonna be very tedious.
>>41011079
I know nothing about vocal effects, but this is already pretty nice for a sovits cover.
>>41015066
>singing lines
I'll update my gdrive later today. The HF repos should already be up-to-date since they treat the two Master Files as one pile of data.
>SFX and Music
>I might pass on fixing most of those if they're not too bad, going through all that's gonna be very tedious.
Agreed, and I've only checked a small fraction of the tags so far.
If any code anons want to try automating some of these fixes, see the paste linked in >>41013995. These are all either mismatches, typos, or tag consistency issues between the sfx.txt/music.txt label files in Master File 1 and the .flac files in Master File 2's Music and SFX folder. Fixing them means updating labels in .txt files and changing filenames for .flac files.
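As a starting point for said code anons, the relabel-and-rename half can be sketched like this. The FIXES table below is hypothetical example data rather than the real error list from the paste, and apply_fixes/fix_tree are made-up names for illustration:

```python
import os

# Hypothetical substitution table; the real entries would be derived
# from the error list in the ponepaste.
FIXES = {
    "Applejack223": "Applejack",
    "Mrs.Cake": "Mrs. Cake",
    "Singingnging": "Singing",
}

def apply_fixes(text):
    for bad, good in FIXES.items():
        text = text.replace(bad, good)
    return text

def fix_tree(root, dry_run=True):
    # Rewrite label .txt contents and rename .flac files in place.
    # With dry_run=True, only report what would change.
    for dirpath, _, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if name.endswith(".txt"):
                with open(path, encoding="utf-8") as f:
                    text = f.read()
                fixed = apply_fixes(text)
                if fixed != text:
                    print("rewrite:", path)
                    if not dry_run:
                        with open(path, "w", encoding="utf-8") as f:
                            f.write(fixed)
            elif name.endswith(".flac"):
                fixed_name = apply_fixes(name)
                if fixed_name != name:
                    print("rename:", name, "->", fixed_name)
                    if not dry_run:
                        os.rename(path, os.path.join(dirpath, fixed_name))
```

Running with dry_run=True first makes it easy to eyeball the planned changes before touching any of the actual master file data.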
>>41015066
I realized that the mismatches between label files and audio files would be the hardest to programmatically fix, and I think you can fix all of those by re-exporting the data. These are the episodes that have issues:
>FiM_s01e01
>FiM_s01e02
>FiM_s01e04
>FiM_s01e05
>FiM_s01e25
>FiM_s02e01
>FiM_s02e03
>FiM_s02e05
>FiM_s02e07
>FiM_s02e14
>FiM_s02e17
>FiM_s02e20
If you can re-export those, it would be much easier for code anons to fix the rest of the errors.
>>40925332
1 month late, but Fluttershy and Pinkie
>>41015066
The Master File 2 link in Master File 1 is broken, which might explain the mismatches. The link in the Master File mega points to:
>https://mega.nz/folder/0UhSmYAB#WBrB-qCprQTofkAhwMp5CQ
Which doesn't seem to be available. I'm using the link from the OP:
>https://mega.nz/folder/gVYUEZrI#6dQHH3P2cFYWm3UkQveHxQ
Which works.
>>41014860
This looks like it got a fair share of inspiration from ET.
>>40921076
My gdrive now has the updated Master File 2 plus the latest Master File updates. Same link:
https://drive.google.com/drive/u/2/folders/1ho2qhjUTfKtYUXwDPArTmHuTJCaODQyQ
And I updated my horsefm-lib to support Master File 2 for the singing lines:
https://github.com/synthbot-anon/horsefm-lib
The export_data.ipynb notebook shows how to separate the dialogue & singing lines:
https://github.com/synthbot-anon/horsefm-lib/blob/main/notebooks/export_data.ipynb
Does anyone know what format Stable Video Diffusion needs for fine-tuning? Is there a preferred frame rate, resolution, image format? Is transparency okay? Anything else to consider?
>>41020076
>>41021498
-1, but still bad.
>>41022450
Damn, that's a captivating eye she has here.
>>41020716
NTA but i would love to know more about this too
>>41022450
Another boop.
Learn2Talk: 3D Talking Face Learns from 2D Talking Face
https://arxiv.org/abs/2404.12888
>Speech-driven facial animation methods usually contain two main classes, 3D and 2D talking face, both of which attract considerable research attention in recent years. However, to the best of our knowledge, the research on 3D talking face does not go deeper as 2D talking face, in the aspect of lip-synchronization (lip-sync) and speech perception. To mind the gap between the two sub-fields, we propose a learning framework named Learn2Talk, which can construct a better 3D talking face network by exploiting two expertise points from the field of 2D talking face. Firstly, inspired by the audio-video sync network, a 3D sync-lip expert model is devised for the pursuit of lip-sync between audio and 3D facial motion. Secondly, a teacher model selected from 2D talking face methods is used to guide the training of the audio-to-3D motions regression network to yield more 3D vertex accuracy. Extensive experiments show the advantages of the proposed framework in terms of lip-sync, vertex accuracy and speech perception, compared with state-of-the-arts. Finally, we show two applications of the proposed framework: audio-visual speech recognition and speech-driven 3D Gaussian Splatting based avatar animation.
https://lkjkjoiuiu.github.io/Learn2Talk/
no weights, but pretty cool. relevant here I think, though obviously you'd need to do a big fine-tune or even train a new model for pony/animation to pony 3d models
Huh, wasn't expecting NAI to ever update their ancient furry model to V3, but it turns out they did. Some pretty decent mares can result, this time without as much of the washed-out coloring the old one had. It can also kind of do cutie marks, interestingly. I've still got a bunch of Anlas from months ago, so I'll do more testing. Not expecting this to be a competitor to Pony Diffusion, but it's a comparable alternative worth noting.
>>41023564
Indeed.
apt-get install -y espeak espeak-data libespeak1 libespeak-dev
apt-get install -y festival*
apt-get install -y build-essential
apt-get install -y flac libasound2-dev libsndfile1-dev vorbis-tools
apt-get install -y libxml2-dev libxslt-dev zlib1g-dev
pip install -r gradio_requirements.txt
Does anybody know how to install this linux bullshit on windows?
>>41024137
those are some nice looking mares, even the cutie marks are decently shaped and not a bunch of colorful nonsense
I'm getting this error when trying to download Talknet models with Haysay. Any advice?
>>41024413
Yes.
LVNS-RAVE: Diversified audio generation with RAVE and Latent Vector Novelty Search
https://arxiv.org/abs/2404.14063
>Evolutionary Algorithms and Generative Deep Learning have been two of the most powerful tools for sound generation tasks. However, they have limitations: Evolutionary Algorithms require complicated designs, posing challenges in control and achieving realistic sound generation. Generative Deep Learning models often copy from the dataset and lack creativity. In this paper, we propose LVNS-RAVE, a method to combine Evolutionary Algorithms and Generative Deep Learning to produce realistic and novel sounds. We use the RAVE model as the sound generator and the VGGish model as a novelty evaluator in the Latent Vector Novelty Search (LVNS) algorithm. The reported experiments show that the method can successfully generate diversified, novel audio samples under different mutation setups using different pre-trained RAVE models. The characteristics of the generation process can be easily controlled with the mutation parameters. The proposed algorithm can be a creative tool for sound artists and musicians.
https://github.com/fisheggg/LVNS-RAVE
https://huggingface.co/Intelligent-Instruments-Lab/rave-models/tree/main
audiogen stuff. examples on their github. short paper, but the models were trained 6 months ago? guess they really wanted their paper in some specific conference
Trying to generate audio with RVC using Starlight but I get this error...
>>41025282
Try again. It happened to me a few times too, but it worked on a later attempt.
https://voca.ro/1k7cZmIGmTU9
>>41025913
Typing "voca.ro" in desuarchive's search box was definitely the best decision I've made today.
AŁA KURWA GRYZIE
>>41025771
Still doesn't work... do file size and audio duration matter? The file I'm trying to convert is around 10 and a half minutes.
FlashSpeech: Efficient Zero-Shot Speech Synthesis
https://arxiv.org/abs/2404.14700
>Recent progress in large-scale zero-shot speech synthesis has been significantly advanced by language models and diffusion models. However, the generation process of both methods is slow and computationally intensive. Efficient speech synthesis using a lower computing budget to achieve quality on par with previous work remains a significant challenge. In this paper, we present FlashSpeech, a large-scale zero-shot speech synthesis system with approximately 5% of the inference time compared with previous work. FlashSpeech is built on the latent consistency model and applies a novel adversarial consistency training approach that can train from scratch without the need for a pre-trained diffusion model as the teacher. Furthermore, a new prosody generator module enhances the diversity of prosody, making the rhythm of the speech sound more natural. The generation processes of FlashSpeech can be achieved efficiently with one or two sampling steps while maintaining high audio quality and high similarity to the audio prompt for zero-shot speech generation. Our experimental results demonstrate the superior performance of FlashSpeech. Notably, FlashSpeech can be about 20 times faster than other zero-shot speech synthesis systems while maintaining comparable performance in terms of voice quality and similarity. Furthermore, FlashSpeech demonstrates its versatility by efficiently performing tasks like voice conversion, speech editing, and diverse speech sampling.
https://flashspeech.github.io/
new voice model. no weights or code, but they're a chinese team so maybe. they described their training method, so it might even be possible to recreate or make a custom one (like for ponies). hard to say how long training took, but they used 8x H800s, which iirc have half the memory bandwidth of H100s
>>40921076
I uploaded the fimfarchive data here: https://huggingface.co/datasets/synthbot/fimfarchive
After running pip install --upgrade datasets, you can load it with:
>from datasets import load_dataset
>dataset = load_dataset("synthbot/fimfarchive")
Here's the code for completing, converting, and pushing the dataset from the fimfarchive:
https://github.com/synthbot-anon/horsewords-lib
>>40971560
The PPP turned 5 years old on April 5. Happy belated birthday!
>>40921076
https://huggingface.co/Amo/RVC_v2_GA/tree/main/models/MLP_Sweetie_Belle_Squeeky
https://vocaroo.com/18BYdlV0bMSa
Sweetie Belle trained on her S1 and S2 lines, mostly the squeaky ones. The transpose will need some playing around, as with some input audio you will need to set it to 12 and with others to 24.
>>41028148
>>41028354
Kek.
>>40921071
1111 = 15
>>41028148
>5 years
it's been this long, huh...
>>41028148
We are gathered here today, 5 years since the inception of a dream, to celebrate the union of spirits in the desperate pursuit of waifus. We honor our horsefuckers, those that clip, that build, that produce, that preach, that bicker. We honor those that inspire us, that give direction to our lives, and that walk us always forward toward our beloved mares.
Happy birthday, PPP.
>>41028148
>>41029227
Jesus, is it that old already? How did time fly so fast?
queen piccalis
https://files.catbox.moe/0guftr.mp3
>>41029834
catbox borked the file
https://voca.ro/13fiEnurSQRQ
>>41029227
>it's been 5 years and I still haven't summoned the courage to be retarded trying to get a local tool running
Good evening.
I think it was about three years ago that I asked about the possibility of vocal capture, in which an AI-generated voice maintains the timbre of the character it trained on, but follows the pitch and inflections of another recording. Has that been explored in the last three years, or is it still too far off?
>>41030231
Was your post around Jan 2021? For example: https://desuarchive.org/mlp/thread/36432529/#36456897
That was just 5 months before TalkNet, which does exactly what you're talking about. Then around the start of last year there were so-vits-svc and RVC, which improved on TalkNet. So it's been explored quite a bit.
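For the curious: the common thread in TalkNet, so-vits-svc, and RVC is that they condition synthesis on a pitch (F0) track extracted from the reference audio, which is what lets the output follow the reference's inflections while keeping the target character's timbre. As a toy illustration of that pitch-extraction step (not what any of those tools actually use; they rely on more robust estimators), a bare-bones autocorrelation pitch tracker over one frame looks like:

```python
import numpy as np

def estimate_f0(frame, sr, fmin=50.0, fmax=600.0):
    # Estimate the fundamental frequency of one audio frame by
    # autocorrelation: find the lag (within the plausible pitch range)
    # where the signal is most similar to a shifted copy of itself.
    frame = frame - frame.mean()
    corr = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lo, hi = int(sr / fmax), int(sr / fmin)
    lag = lo + np.argmax(corr[lo:hi])
    return sr / lag
```

Real systems run an estimator like this (or a learned one) frame by frame over the whole clip and feed the resulting F0 contour to the synthesis model alongside the content features; on a pure 220 Hz sine the sketch above lands within a fraction of a semitone of the true pitch.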
>>41030133
To be fair, it's just regular retardation for me. Half the thread reads like a different language to my techlet pea of a brain.
>Page nein
I need you to generate this with Discord's voice.
>O-OH MY CELESTIA!! I-IS THAT PRINCESS TWILIGHT SPARKLE?
>THAT'S MY FAVORITE SNOWPITY, JAZZY, SLOW BURN, BONE CHILLING, ATMOSPHERE-OOZING, TROPE-SUBVERTING, GENRE-REDEFINING, GUT-WRENCHING, SPINE-TINGLING, EMOTIONALLY TAXING, PARANOIA-INDUCING, JAW-CLENCHING, NERVE-WRACKING, CHARACTER-DEVELOPMENT DRIVEN, SOUL-SHAKING, NAIL-BITING, ANXIETY-WRITTEN, KAFKAESQUE, POST-LYNCHIAN, QUESTION-ASKING, SOCIALLY-AWARE, ETHNICALLY-DIVERSE, POLITICALLY-COGNIZANT, CULTURALLY RELEVANT, SOCIALLY-PRESCIENT, THOUGHT-PROVOKING, ARTISANALLY-CRAFTED PONY IN THE SERIES!!!
>>41030864
+1
>>41031700
+0
>>41032035
Once more.
>>40924154
>>41031350
https://files.catbox.moe/ybp1gz.wav
>>41029834
>>41029876
She sounds kinda eastern here. Couldn't say why though.
https://github.com/jasonppy/VoiceCraft
new models: https://huggingface.co/pyp1/VoiceCraft/tree/main
Month old release, 330M weights: https://vocaroo.com/17v80p9NQi6A
Three weeks old, 330M weights: https://vocaroo.com/1aMwxaZb1jgp
Newest 330M weights: https://vocaroo.com/1h2sj2e9Zp8Z
Newest upsampled with audiosr: https://vocaroo.com/17Jx0xDoXz05
>>41032035
-1
>>41034179
>>40937305
I feel like if anyone can answer this it's Clipper. hope he responds to you, anon
>>40937342
>pointless minigames
i have a fucking BLAST every time we have minigames, they're so fun
the thing i look forward to most is actually the minigames
>>41034888
>>41034891
>looked at the dates of those posts
i realize im retarded, im sorry bros
>>40937342
I liked the minigames. I'd rather have that than an hour of autists doing a very technical, in-depth presentation that is really difficult to follow.
Hydrus, if haysay.ai has multispeaker StyleTTS2 models that were trained on more characters than the Mane 6, why do precomputed styles only allow me to select from the Mane 6?
Is there a torrent link for the fimfiction archive?
>>41035656
Pretty sure there's a link to that somewhere in the scripts of the fim tool in the main google doc.
Otherwise, there is this user-made torrent: https://www.fimfiction.net/user/116950/Fimfarchive
>>41035496
Partly because I never got around to creating more precomputed styles, and partly because I was struggling to get good results for the characters I did attempt. I tried to make precomputed styles for Starlight Glimmer and Gilda, but either the emotion/trait I was targeting wasn't coming across very strongly or the generated audio didn't sound much like the character, especially for Gilda. The output would oftentimes sound more like Twilight Sparkle. I did manage to get a few OK ones for Starlight Glimmer; I'll add those soon.
>>41035908thanks
>>41036171NTA. This may be a silly question but is there a nice and simple offline UI for StyleTTS2 ?
>>41036205
You can install and run Hay Say locally:
https://github.com/hydrusbeta/hay_say_ui?tab=readme-ov-file#installation-instructions
There's also an online colab here:
https://colab.research.google.com/drive/1ys8SkP-VW7CkhnwVveEGINaszG1kRaYl?usp=sharing#scrollTo=pGArrru8BpEe
It should be possible to download its python notebook file (.ipynb) and then run it on a local Jupyter environment.
>>41036322
Oh, that colab link in my previous post is for the one that comes with an epub downloader. Here's the link to another colab without epub:
https://colab.research.google.com/drive/1dDwKPYc2daS3MZxpinlfyIHd2jmGiHLh
>>41036205https://github.com/effusiveperiscope/StyleTTS2_GUI
>>41036322
>>41036331
So it generates the waveform directly? How much slower is it than Tacotron+vocoder?