Welcome to the Pony Voice Preservation Project!
youtu.be/730zGRwbQuE

The Pony Preservation Project is a collaborative effort by /mlp/ to build and curate pony datasets for as many applications in AI as possible.

Technology has progressed such that a trained neural network can generate convincing voice clips, drawings and text for any person or character using existing audio recordings, artwork and fanfics as a reference. As you can surely imagine, AI pony voices, drawings and text have endless applications for pony content creation.

AI is incredibly versatile; basically anything that can be boiled down to a simple dataset can be used for training to create more of it. AI-generated images, fanfics, wAIfu chatbots and even animation are possible, and are being worked on here.

Any anon is free to join, and there are many active tasks that would suit any level of technical expertise. If you're interested in helping out, take a look at the quick start guide linked below and ask in the thread for any further detail you need.

EQG and G5 are not welcome.

>Quick start guide:
docs.google.com/document/d/1PDkSrKKiHzzpUTKzBldZeKngvjeBUjyTtGCOv2GWwa0/edit
Introduction to the PPP, links to text-to-speech tools, and how (You) can help with active tasks.

>The main Doc:
docs.google.com/document/d/1y1pfS0LCrwbbvxdn3ZksH25BKaf0LaO13uYppxIQnac/edit
An in-depth repository of tutorials, resources and archives.

>Active tasks:
Research into animation AI
Research into pony image generation

>Latest developments:
GDrive clone of Master File now available >>37159549
SortAnon releases script to run TalkNet on Windows >>37299594
TalkNet training script >>37374942
GPT-J downloadable model >>37646318
FiMmicroSoL model >>38027533
Delta GPT-J notebook + tutorial >>38018428
New FiMfic GPT model >>38308297 >>38347556 >>38301248
FimFic dataset release >>38391839
Offline GPT-PNY >>38821349
FiMfic dataset >>38934474
SD weights >>38959367
SD low vram >>38959447
Huggingface SD: >>38979677
Colab SD >>38981735
NSFW Pony Model >>39114433
New DeltaVox >>39678806
so-vits-svc 4.0 >>39683876
so-vits-svc tutorial >>39692758
Hay Say >>39920556
Text generation colab >>40271923 >>40276284
/mlp/ image dataset >>40393331
Haysay on the web! >>40391443
SFX separator >>40786997 >>40790270
Cream Heart model + Alltalk-tts >>40836410
Clipper investigates further data cleaning >>40860022 >>40872222 >>40890799 >>40902356
HydrusBeta working on HaySay 2.0 >>40840723
Blueblood RVC model >>40887151
AI Redub 5 Releases >>40871923

>The PoneAI drive, an archive for AI pony voice content:
drive.google.com/drive/folders/1E21zJQWC5XVQWy2mt42bUiJ_XbqTJXCp

>Clipper's Master Files, the central location for MLP voice data:
mega.nz/folder/jkwimSTa#_xk0VnR30C8Ljsy4RCGSig
mega.nz/folder/gVYUEZrI#6dQHH3P2cFYWm3UkQveHxQ
drive.google.com/drive/folders/1MuM9Nb_LwnVxInIPFNvzD_hv3zOZhpwx

>Cool, where is the discord/forum/whatever unifying place for this project?
You're looking at it.

Last Thread:
>>40829408
FAQs:
If your question isn't listed here, take a look in the quick start guide and main doc to see if it's already answered there. Use the tabs on the left for easy navigation.
Quick: docs.google.com/document/d/1PDkSrKKiHzzpUTKzBldZeKngvjeBUjyTtGCOv2GWwa0/edit
Main: docs.google.com/document/d/1y1pfS0LCrwbbvxdn3ZksH25BKaf0LaO13uYppxIQnac/edit

>Where can I find the AI text-to-speech tools and how do I use them?
A list of TTS tools: docs.google.com/document/d/1y1pfS0LCrwbbvxdn3ZksH25BKaf0LaO13uYppxIQnac/edit#heading=h.yuhl8zjiwmwq
How to get the best out of them: docs.google.com/document/d/1y1pfS0LCrwbbvxdn3ZksH25BKaf0LaO13uYppxIQnac/edit#heading=h.mnnpknmj1hcy

>Where can I find content made with the voice AI?
In the PoneAI drive: drive.google.com/drive/folders/1E21zJQWC5XVQWy2mt42bUiJ_XbqTJXCp
And the PPP Mega Compilation: docs.google.com/spreadsheets/d/1T2TE3OBs681Vphfas7Jgi5rvugdH6wnXVtUVYiZyJF8/edit

>I want to know more about the PPP, but I can't be arsed to read the doc.
See the live PPP panel shows presented at /mlp/con for a more condensed overview.
2020: pony.tube/w/5fUkuT3245pL8ZoWXUnXJ4
2021: pony.tube/w/a5yfTV4Ynq7tRveZH7AA8f
2022: pony.tube/w/mV3xgbdtrXqjoPAwEXZCw5
2023: pony.tube/w/fVZShksjBbu6uT51DtvWWz

>How can I help with the PPP?
Build datasets, train AIs, and use the AI to make more pony content. Take a look at the quick start guide for current active tasks, or start your own in the thread if you have an idea. There's always more data to collect and more AIs to train.

>Did you know that such and such voiced this other thing that could be used for voice data?
It is best to keep to official audio only unless there is very little of it available. If you know of a good source of audio for characters with few (or just fewer) lines, please post it in the thread. 5.1 audio is generally required unless you have a source already clean of background noise. Preferably post a sample or link. The easier you make it, the more likely it is to be done.

>What about fan-imitations of official voices?
No.

>Will you guys be doing a [insert language here] version of the AI?
Probably not, but you're welcome to try. You can get most of the way there by using phonetic transcriptions of other languages as input for the AI.

>What about [insert OC here]'s voice?
It is often quite difficult to find good quality audio data for OCs. If you happen to know any, post them in the thread and we'll take a look.

>I have an idea!
Great. Post it in the thread and we'll discuss it.

>Do you have a Code of Conduct?
Of course: 15.ai/code

>Is this project open source? Who is in charge of this?
pony.tube/w/mqJyvdgrpbWgZduz2cs1Cm

PPP Redubs:
pony.tube/w/p/aR2dpAFn5KhnqPYiRxFQ97

Stream Premieres:
pony.tube/w/6cKnjJEZSCi3gsvrbATXnC
pony.tube/w/oNeBFMPiQKh93ePqTz1ns8
>>40921071
>page 9
>>40920442
so excited to try this shit out!
>that last pic in the previous edition
Kek
>>40921702
we salute the fallen
>>40921702
Sneaky.
Up again.
>>40921562
Glad you're excited! Now I can share something of Pinkie! Don't worry, according to my friend the program can do English too, Japanese just works really well.
Pinkie (With instru): https://pomf2.lain.la/f/ny70wit.wav
Pinkie (Without instru): https://pomf2.lain.la/f/v5lzgzo1.wav
Along with Pinkie, there's also an English thing of Applejack singing the same song that the first ever AI voice sang... how touching. It's not perfect, but it's still really good. Oh, and good news, my friend is planning on adding Spike and Discord to the fray! Not too thrilled about Spike, but Discord could be fun!
https://pomf2.lain.la/f/mcs9g4x5.wav
>>40922574
How about 'go fuck yourself'?
Rarity time! You know the drill, one with the instrumental, and one without! By the way, how does MareLoid sound as a name for the program? My friend wants to keep things chill for it, but I think the name could work!
Rarity (With instru): https://pomf2.lain.la/f/124xilny.wav
Rarity (Without instru): https://pomf2.lain.la/f/ynuagi48.wav
Bump.
upsies
Musicians are dead. Suno AI:
https://files.catbox.moe/fxcjwn.mp4
>>40924457
What is the point of an AI that is incapable of generating fetish erotica?
The mare's a freak. The mare never misses a beat.
>>40924457
https://files.catbox.moe/6s2yv7.mp4
>>40925098
https://files.catbox.moe/uistc5.mp4
Hello again! As of now, Spike and Discord have been added into what we now call MareLoid! I hope I'm not becoming annoying with all my updates, but my friend and I are really passionate about this. As a reminder for those who don't know what I'm talking about: MareLoid, if everything goes well, is a way to make your favorite mares sing songs without reference audio, similar to how one uses Synthesizer V or Vocaloid. Allow me to share what my friend cooked up with Spike and Discord!
Discord (With instru): https://pomf2.lain.la/f/gmmapcn5.wav
Discord (Without instru): https://pomf2.lain.la/f/oyen6178.wav
Spike (With instru): https://pomf2.lain.la/f/8j4qfieo.wav
Spike (Without instru): https://pomf2.lain.la/f/sgoqfpjp.wav
Any suggestions on who to add next? We're thinking Trixie, perhaps?
>>40925332
Granny Smith, just for shits and giggles.
Besides that, Celestia, Luna, and Cadance would be neat.
>>40925418
You. You do not know how GOOD she is... I know you meant it for shits and giggles, but DAMN she can sing!
Granny Smith (With instru): https://pomf2.lain.la/f/5ud769h6.wav
Granny Smith (Without instru): https://pomf2.lain.la/f/qsmyx7s0.wav
As for the princesses, they're on the list!
>>40924457
can it do pony in the style of dragonforce?
>>40925332
will there soon be a way for us to try this ourselves?
>>40925866
Unfortunately, it's not that simple. I don't know programming as well as my friend does, but according to them, it's not exactly at a state where they feel comfortable sharing it with anyone yet. They're... kind of like 15 in terms of perfectionism, so I can't really say if anyone but them will ever be able to use it. But if all goes well, and they find it perfect, it may have a chance of release. We can only hope.
Up.
VOICECRAFT: Zero-Shot Speech Editing and Text-to-Speech in the Wild
https://jasonppy.github.io/assets/pdfs/VoiceCraft.pdf
>We introduce VOICECRAFT, a token infilling neural codec language model that achieves state-of-the-art performance on both speech editing and zero-shot text-to-speech (TTS) on audiobooks, internet videos, and podcasts. VOICECRAFT employs a Transformer decoder architecture and introduces a token rearrangement procedure that combines causal masking and delayed stacking to enable generation within an existing sequence. On speech editing tasks, VOICECRAFT produces edited speech that is nearly indistinguishable from unedited recordings in terms of naturalness, as evaluated by humans; for zero-shot TTS, our model outperforms prior SotA models including VALL-E and the popular commercial model XTTS v2. Crucially, the models are evaluated on challenging and realistic datasets that consist of diverse accents, speaking styles, recording conditions, and background noise and music, and our model performs consistently well compared to other models and real recordings. In particular, for speech editing evaluation, we introduce a high quality, challenging, and realistic dataset named REALEDIT.
https://jasonppy.github.io/VoiceCraft_web/
https://github.com/jasonppy/VoiceCraft
>[ ] Upload model weights
Weights not up yet, but soon maybe.
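For anyone wondering what "delayed stacking" actually does: codec models emit several parallel codebook streams per audio frame, and the delay pattern shifts codebook k right by k steps so earlier codebooks are generated before the ones that refine them. A toy sketch of the idea (not the paper's code; the function names and pad token are made up here):

```python
# Toy illustration of the "delayed stacking" token rearrangement used by
# codec LMs (MusicGen-style delay pattern; VoiceCraft combines this with
# causal masking). NOT the paper's actual code.

PAD = -1  # placeholder token for positions introduced by the delay


def delay_stack(codes):
    """codes: list of K codebook streams, each a list of T ints.
    Returns K streams of length T + K - 1, with codebook k shifted right by k."""
    K = len(codes)
    return [[PAD] * k + list(stream) + [PAD] * (K - 1 - k)
            for k, stream in enumerate(codes)]


def undo_delay(stacked):
    """Invert the shift to recover the original aligned streams."""
    K = len(stacked)
    T = len(stacked[0]) - (K - 1)
    return [stream[k:k + T] for k, stream in enumerate(stacked)]
```

With 3 codebooks, frame t of codebook 2 is only predicted two steps after frame t of codebook 0, so each prediction can condition on the coarser codes for the same frame.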
>>40926585
A better TTS tool would be nice.
>>40926585
>The training of the 830M VOICECRAFT model took about 2 weeks on 4 NVIDIA A40 GPUs.
Cheapest 4x A40s on vast.ai is $1.803/hr, so ~$605.8.
>>40927049
Though that one seems a bit fucked, so $2/hr is probably a better estimate, making it $672. 4x 4090s might actually be a better deal: $2.4 an hour, and if the TFLOPS ratio works out in training then it'd be ~$300.
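The arithmetic in those two posts, spelled out (the hourly rates are the vast.ai spot prices quoted above and will drift; the ~$300 4090 figure additionally assumes the raw TFLOPS advantage translates into proportionally shorter training time):

```python
# Back-of-envelope VoiceCraft training cost from the quoted vast.ai rates.

hours = 2 * 7 * 24  # "about 2 weeks" of wall-clock time = 336 h

cost_a40_listed = hours * 1.803  # cheapest 4x A40 listing at the time
cost_a40_safe = hours * 2.00     # rounder, more pessimistic estimate

print(f"{cost_a40_listed:.1f}")  # ~605.8
print(f"{cost_a40_safe:.0f}")    # 672
```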
Page 10 bump.
Hey BGM, I remember you mentioning half a year ago that you were planning to make a Linky song but it's been quite a bit since then so I wanted to ask if you've changed your mind by any chance.
>>40920442
>>40923239
I only check this thread once a week, but this is a super impressive project, anon!
Update: The princesses have been added! Here's a preview of them singing that one New Year's Eve song in Japanese!
Celestia: https://pomf2.lain.la/f/4mcnfwdg.wav
Luna: https://pomf2.lain.la/f/weq0rkl7.wav
Cadence: https://pomf2.lain.la/f/r8hfkabz.wav
>>40924457
This is very cool. Literal first attempt, which took me less than a minute to "create":
https://files.catbox.moe/17calb.mp3
Lyrics generated with GPT, copypasted into this thing, and that's it. It's not perfect-perfect, but it's 10x better than the previous AI tools for making songs, and for this I intentionally went with the first lyrics and the first song gen, no edits (those damn 'hands' in the chorus, ree) or rerolls. The 2-minute trial ate the outro, though.
>>40925864
It's unfortunately not trained to mimic the styles of specific bands/artists; you'd have to describe the style and probably generate a bunch.
Surprise! The CMC are in MareLoid! This will probably be the last update for a bit, but I'll still be lurking here!
Applebloom: https://pomf2.lain.la/f/t2hcxnzz.wav
Sweetie Belle: https://pomf2.lain.la/f/exs4e1q0.wav
Scootaloo: https://pomf2.lain.la/f/xqpiatdx.wav
bump
>>40925105
https://files.catbox.moe/fczgrs.mp3
I'm having fun with this. Prompt was "Sad song about a girl named Twilight in the style of lo-fi hip-hop in the late 1990s."
I wonder if there's a way for me to continue/redo a prompt with this specific beat.
>>40929685
This is crazy.
https://files.catbox.moe/b73nte.mp3
https://files.catbox.moe/lr859w.mp3
prompt: a emo, industrial, dream pop, psychedelic, shoegaze, trip hop, vaporwave song about pinkie pie the pony being depressed that her friends don't love her anymore.
>>40924457
"Made" a sequel song using the new v3 version. I used Suno AI just a couple of days ago and you had to pay for v3; now v3 is free, so v4 must be around the corner already.
https://files.catbox.moe/00wefs.mp3
prompt: a sad country breakup song about a man's pony mare named applejack, being taken away from him. belonging the days where he can pet his mare once again.
Mare between the lines
>>40903672
Well, this didn't exactly work. I ended up trial-and-erroring a whole bunch of different things, and I think the predictor_encoder and style_encoder have the biggest impact on F0. Unfortunately, the characters were no longer recognizable even though the F0 was closer to desired.
>>40924457
Very fun stuff, reminds me of Jukebox AI when it released on Google Colab. This is so much easier to use; I just had to make something after genning these lyrics.
https://youtu.be/gRNP3lBEHNQ
Hey marrs
Bumping the mares.
>>40928808
is it possible to make the mane six (especially fluttershy) do black metal growls with this program?
>>40929685
https://files.catbox.moe/7via22.mp3
Stitched together a full version.
I have some news to share, and it's not... great.
Due to circumstances involving my friend's perfectionism, they deem the quality of the voices within MareLoid to be not up to their standards, and rather than try another AI algorithm which could give "better" results... they've decided to shelve MareLoid for the time being, meaning no new voices will be added and no new demonstrations will be made for now. I know some of you really wanted to try this out, myself included, but I can't just badger my friend about reconsidering; they're dead-set on shelving it. I'm really sorry about all of this. Please don't be too mad or blame my friend too much, they just have... really high standards. That's all I can say. I'll still lurk here for stuff relating to the mares, but unless MareLoid starts up again, this is the last time you'll hear of it... I apologize greatly.
>>40932952
>Perfectionist hides something great from the world because it isn't perfect
Don't worry, we're used to it.
>>40932952
inb4 mareloid anon's friend is 15
>>40932952
>We never actually made anything, we just used so-vits-svc, you got fucking trolled, faggots
Every time.
>>40933744
You called? :)
>>40933757
https://files.catbox.moe/6sls9v.flac
>>40933757
>>40933838
>Post removed
...What did Jannie mean by this?
Like, genuinely, what on earth could the rationale have been for that? Dafuq?
>>40934086
>...What did Jannie mean by this?
Maybe anon deleted it because he was too ashamed of himself?
>>40934108
No, I'm the anon. Still ashamed of myself, but now also confused.
>>40934086
>>40934108
>>40934114
Same. The comment was just talking about how he liked my RD cuck audio and joked about wanting another one. Mods, are you okay?
>>40925955
>They're... kind of like 15 in terms of perfectionism
Lol, 15 does it because he wants to scam his gullible patreon paypigs.
>>40934129
>Jannie, are you okay?
>So, jannie, are you okay?
>Are you okay, jannie?
>Jannie, are you okay?
>So, jannie, are you okay?
>Are you okay, jannie?
>Jannie, are you okay?
>So, jannie, are you okay?
>Are you okay, jannie?
>Jannie, are you okay?
>So, jannie, are you okay?
>Are you okay, jannie?
>Jannie, are you okay?
>Will you tell us that you're okay?
>There's a post on the image board
>That he cucked you - a greentext, jannie
>He came into your thread
>He left cumstains, on the carpet
>Then you ran into the basement
>You struck it down
>It was your doing
>Jannie, are you okay?
>So, jannie, are you okay?
>Are you okay, jannie?
>You've been hit by
>You've been hit by, a rogue janitor
>Page 9
>>40933757
>>40934129
>>>/trash/
>>40934750
>four hours to reply
NGMI
>>40934656
+1
>>40934194
That thing is still going?
Mare yourself out
The big mare gak
>>40935551
No, 15ai shut down last year, and 15 has been completely silent since February of last year. Even if he did come back, he's already been outpaced by everyone and their mother. His work looks amateurish now compared to shit like ElevenLabs, so-vits, and even that Suno song software that dropped recently.
>>40936557
The benefit of 15.ai was that tech-illiterate people like me could use the funny pony noise machine to create or enhance content. As neat as the shit you people do is, it's kind of like visiting a zoo and watching the monkeys in the monkey pen bang rocks against each other. That's why these threads are so inactive.
>>40936584
I mean, most of us used 15ai because it was much easier and simpler to create content with, and we didn't need to resort to recording ourselves pretending to be a cute mare getting fucked and having our roommates or family overhear our degenerate shit.
>>40921076
Review of S5 is done.
https://files.catbox.moe/wofd6s.json
>>40933513
>>40934964
It's getting to be that time of year again. /mlp/con is (probably) soon to be confirmed for late June, which means I'm starting to have preliminary thoughts about another PPP panel. I don't think it's quite time to start properly planning anything just yet, just time for anyone who's got ideas or would like to be involved to start thinking about it. I'd be happy to be the main host/organiser again if no one else wants to take the reins, and assuming that happens I'll probably put out the first proper call to action once I've finished the remaining dataset review work and /mlp/con finally confirms a proper date.
Not sure if this is the right place to ask, but I'm wondering if there's a more efficient process in Audacity for extracting vocals from a song. Right now I'm clipping the original vocals, getting the generated clip from TalkNet/SVC, and then syncing it back to the instrumental track.
>>40937252
>/ppp/ panel again
Do we even have much to talk about besides spending 10 minutes shilling Haysay and the newer Pony Diffusion, and 2 hours of pointless minigames?
Is there some LLM angled towards pony stuff? I would like to generate pony-related text locally, but the uncensored model I have doesn't really know much more than surface-level stuff.
>>40937252
>>40937342
I'd like to do a little bit of shilling for using that process of AI-cloned clips to create lengthy datasets for other models. And helping out in the background with whatever needs assistance.
>>40936911
>have our roommates or family overhear our degenerate shit
But that's part of the flavor!
before you go to bed
VoiceCraft: Zero-Shot Speech Editing and Text-to-Speech in the Wild
https://arxiv.org/abs/2403.16973
https://github.com/jasonppy/VoiceCraft
>Upload model weights (encodec weights are up)
Previously posted, but they've put their paper on arxiv now. Oh, and they've posted the encodec weights.
https://github.com/jasonppy/VoiceCraft/issues/12
lol
>>40938392
Also in the middle of the day.
Mare anyway
>>40940052
But not on page 9.
haysay.ai JUST went down. Anyone else?
>>40940484
I've restarted the Docker containers and it's back up now.
I am rarted. I'm only now remembering to do this. YEARS LATER. But better late than never!
__________
Here is a collection of all the moans and sighs I've generated over the years with 15ai when it was still active. Altogether, this is 538MB. It is currently split into three separate zip files due to Catbox's 200MB limit. Many of these audio clips were made for audios that I either never got around to or couldn't make with the shutdown of 15ai. If I can find a good alternative, they'll be finished someday.

Part 1 contains:
>Daring Do
>Derpy Hooves
>Fluttershy 2
>Gilda
>Pinkie Pie
>Sci-Twi (For consistency's sake, but it's just Twilight)
>Scootaloo
>Spike
>Trixie
>Twilight Sparkle 2
>Twilight Sparkle

Part 2 contains:
>Applejack
>Fluttershy
>Rarity

Part 3 is Rainbow Dash. Some of it was generated for the cuck audio, but can be cut and spliced at will. unless you're a cuck

PART 1 -----> https://files.catbox.moe/nfwu5p.zip
PART 2 -----> https://files.catbox.moe/kd1jvv.zip
RAINBOW DASH 2 -----> https://files.catbox.moe/w5stv4.zip
Precautionary prior to work bump.
>10
>Page 8 in 90 minutes
I see there's some shit going on in the catalog again.
>>40941866
>Page 8 in 35 minutes
What the fuck is going on today?
>>40942427
>several threads with random "lets talk about mare" with less than a dozen posts
I'm not saying it's some raid, but it sure stinks like one, designed to push threads off such a slow board.
>>40941013
>>>/trash/
>>40942541
Stop advertising yourself, Goku.
>>40942604
Why the fuck would that be a raid? That's how boards are supposed to work. Are you saying a board where half the content is people page-9 bumping their inactive generals is a healthy situation?
>>40942604
It's called pattern recognition. There clearly wasn't anything big happening (e.g. Hasbro ending G5, rule 15 getting removed, or Faust making her own pony series with blackjack and hookers). If it were the weekend this would maybe make sense, since people have time off and shitpost as usual, but having the board go into abnormal overload posting in the middle of the work day is just a bit weird.
>>40942604
Generally, new threads are supposed to have some form of viable topic to discuss. A flood of random screencaps with less than a full sentence for an OP is just a bunch of shit.
>>40942931
Whatever it was, it seems to be over for now.
Alright, Suno is pretty fun.
https://app.suno.ai/song/5b3abea5-7ad6-48f3-ace5-1f11f1444c15
>>40943827
>5 short songs per day or pay up
Fug. Does anyone know of any alternatives that can create similar outputs from just text? (I know there is Bark, but so far the results from anons seem just so-so.)
>>40943827
I suppose the next step would be to take a Suno output, separate the vocals, transform the vocals into a character voice, then mix them back together with the instrumental.
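The last step of that pipeline (mixing the converted vocals back over the instrumental) is simple enough to script without any audio libraries. A stdlib-only sketch, assuming two 16-bit PCM mono WAVs at the same sample rate (the file names are placeholders, and real stems would usually be stereo, so treat this as illustrative):

```python
import struct
import wave


def mix_wavs(vocals_path, instr_path, out_path, vocal_gain=1.0):
    """Sum two 16-bit PCM mono WAVs sample by sample, clipping to int16 range."""
    with wave.open(vocals_path, "rb") as v, wave.open(instr_path, "rb") as i:
        assert v.getframerate() == i.getframerate(), "sample rates must match"
        assert v.getnchannels() == i.getnchannels() == 1, "sketch assumes mono"
        rate = v.getframerate()
        n = min(v.getnframes(), i.getnframes())
        voc = struct.unpack(f"<{n}h", v.readframes(n))
        ins = struct.unpack(f"<{n}h", i.readframes(n))
    # sum, applying gain to the vocal track, and hard-clip to the int16 range
    mixed = [max(-32768, min(32767, int(a * vocal_gain) + b))
             for a, b in zip(voc, ins)]
    with wave.open(out_path, "wb") as out:
        out.setnchannels(1)
        out.setsampwidth(2)
        out.setframerate(rate)
        out.writeframes(struct.pack(f"<{len(mixed)}h", *mixed))
```

The separation and voice-conversion steps still need a real model (UVR/demucs-style separator, then TalkNet/so-vits/RVC); this only covers the remix at the end.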
https://github.com/DoMusic/Hybrid-Net
>Real-time audio source separation, generate lyrics, chords, beat. A transformer-based hybrid multimodal model; various transformer models address different problems in the field of music information retrieval, and these models generate corresponding information dependencies that mutually influence each other.
If anyone could test to see if this is superior to UVR (I'm not into audio stuff), that would be appreciated.
>>40944463
>no requirements.txt file
This is a bit of an 'uhoh' moment.
>>40944762
I wouldn't be surprised if they offer isolated vocals as an option eventually. I assume they're generating both in separate models and then mixing, so it should be trivial.
>>40944314
I mean, no; that's why everyone's hot for teacher over Suno. It's the first one to do music gen from nothing that doesn't sound like trash (earlier attempts were getting there, but always kinda lost the plot, never remaining coherent for a full 2 minutes as far as I know). Personally, I can't wait till we can get something like this going privately, so we can pirate musical styles and specifically request chord progressions if we want. Impossible with current memory/processor limitations, but who knows what the future holds.
>>40944762
How so?
>>40938470
>VoiceCraft is a token infilling neural codec language model, that achieves state-of-the-art performance on both speech editing and zero-shot text-to-speech (TTS) on in-the-wild data including audiobooks, internet videos, and podcasts.
>03/28/2024: Model weights are up on HuggingFace here!
The weights are up; should we try to fine-tune it?
bmup
>>40946869
https://github.com/jasonppy/VoiceCraft
>03/28/2024: Model weights are up on HuggingFace
https://huggingface.co/pyp1/VoiceCraft/tree/main
AniPortrait: Audio-Driven Synthesis of Photorealistic Portrait Animation
https://arxiv.org/abs/2403.17694
>In this study, we propose AniPortrait, a novel framework for generating high-quality animation driven by audio and a reference portrait image. Our methodology is divided into two stages. Initially, we extract 3D intermediate representations from audio and project them into a sequence of 2D facial landmarks. Subsequently, we employ a robust diffusion model, coupled with a motion module, to convert the landmark sequence into photorealistic and temporally consistent portrait animation. Experimental results demonstrate the superiority of AniPortrait in terms of facial naturalness, pose diversity, and visual quality, thereby offering an enhanced perceptual experience. Moreover, our methodology exhibits considerable potential in terms of flexibility and controllability, which can be effectively applied in areas such as facial motion editing or face reenactment.
https://github.com/Zejun-Yang/AniPortrait
>Update the code to generate pose_temp.npy for head pose control.
>We will release audio2pose pre-trained weight for audio2video after further optimization. You can choose head pose template in ./configs/inference/head_pose_temp as substitution.
Lip sync for images to audio. It has face reenactment ability too; the github has more videos. It's pretty rough, but there's lots of room to improve. They used 4 days with 4 A100s to train it, so using vast.ai's 4x A100 80GB offering it would cost ~$336 to recreate (using their datasets, which might not be optimal). Also still hilarious that we can't use webms with audio outside of /gif/.
Specifically for you guys, it would probably require a new model trained with ponies or w/e for it to work well.
>>40947581
Not only that, it would have to go human ---> equine ---> cartoon equine. It would have to be capable of hopping 3 categories and fundamentally different skull structures, vs 0. Probably still possible eventually, but a much harder problem.
>>40947581
>Pony facial animations
Man, that would be damn cool.
An anon tested the voice cloning capability of VoiceCraft.
>new tts model dropped and here is the source file
https://voca.ro/157IzI9y4YZ6
>and here is the generated voice.
https://voca.ro/1ojHkZ87XRVL
https://openai.com/blog/navigating-the-challenges-and-opportunities-of-synthetic-voices
>Today we are sharing preliminary insights and results from a small-scale preview of a model called Voice Engine, which uses text input and a single 15-second audio sample to generate natural-sounding speech that closely resembles the original speaker.
It is notable that a small model with a single 15-second sample can create emotive and realistic voices.
>>40948519
This is sounding pretty good; a little bit too good, even. If this isn't someone trolling by recording themselves twice, it's a pretty nice indication that voice cloning tech is going from kind of shit to pretty decent.
>>40948519
How on earth are you supposed to train a voice on this? I'm so confused by the instructions. Am I just supposed to wait until a colab or huggingface demo appears?
>>40948519
Eh? That is impressive. However, all I've heard so far are low quality inputs.
Can we get some high quality results? Preferably pone? It would be much easier to tell if there are any flaws.
hey does anyone have a link to that pinkie pie american pie cover with vul style voices?
>>40949398
https://files.catbox.moe/g3ahk9.flac
I got you senpai
>>40949524
I LOVE YOU ANON!!!!
>>40949398
Isn't that on Youtube too?
>suno ai songs are all saved to mp3 and the link is easily accessible with standard webpage inspection keybindings
Not sure if I should feel happy that there is no need to fuck around with 3rd party software to download songs, or disappointed that apparently one of the more advanced musical AI tools is being handled by some IT intern guy.
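What that anon describes (the mp3 URL sitting in plain sight in the page source) can be automated with nothing but the stdlib. A sketch, assuming the links really do appear as literal `.mp3` URLs in the HTML you save; the regex and function name are made up for illustration:

```python
import re

# Hypothetical helper: pull every .mp3 URL out of a saved page source.
# Assumes the links appear verbatim in the HTML, as the anon observed.
MP3_RE = re.compile(r'https?://[^\s"\'<>]+\.mp3')


def find_mp3_links(html):
    """Return de-duplicated .mp3 URLs in page order."""
    seen, out = set(), []
    for url in MP3_RE.findall(html):
        if url not in seen:
            seen.add(url)
            out.append(url)
    return out
```

If the site ever moves the audio behind a JSON API or signed URLs, this breaks; it's only as robust as the "IT intern" setup being described.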
>mare
>>>/wsg/5497184
>I think the main barrier is good local tts. Stability AI actually trained a SOTA tts model back in February based on enabling Audiocraft to generate text. They pussied out though and didn't release the model. Here's the paper if a tech savvy anon wants to spend $20,000 and 3 months of dev time to make the best open source tts solution that can also do music.
>https://arxiv.org/abs/2402.01912
reee
>>40951924
>$20,000
Maybe not SOTA, but VoiceCraft (>>40926585) took 2 weeks with 4x A40s to train. Vast.ai's current pricing has that work out to around $672.
>>40941013
>no replies for this high effort post
Anon, I am appalled. How and why the fuck did you do this? Why so many? What were the uses for it?
>https://youtu.be/f8DKD78BrQA?si=a1ejJszeKdC4idq3
Those of you who watched this may have noticed a little something where they used a simulation (3D models trying and failing at a certain activity until they get it right) in order to train AI, and it just fucking hit me how close we are to pony bots.
>but muh computing power
Well yeah, but hypothetically if we had the funds for training and that was taken care of, as long as we had some sort of model and card (think /CHAG/, because we could totally get those guys to whip something up for us if we just needed a basic model as the base for the robo-pony), then all that's left would be
>the hardware
Now the hardware would also be expensive, since you essentially have a full-blown PC (to run the local model instead of a proxy).
>inb4 the nightmare of having your waifu's proxy getting taken down
And then on top of the PC parts like the GPU, you also have the mechanical movements, which maybe require some knowledge of hydraulics (maybe that's way too powerful for a small pony, but the point stands), and the actual robotic parts would be expensive as fuck. I feel this would end up being the hardest part, but I guess the nice thing about having this general is we can always come up with hypothetical stuff that could end up as the actual basis for robo-pones once we have the ability to do so.
You know what they say (I looked it up and couldn't find it, so I'll improvise): an ounce of planning is worth a pound of action.
Having these ideas and plans ready could give us an opening to actually do stuff the second we have the funds or the tech. Besides, by breaking it down into actionable steps (in a similar way to how anons claimed singular scenes for the Redub projects, in part due to the way the google spreadsheet was set up), we could totally have a plan and get our waifus out ASAP, and with growing promise I'm sure funds would start pouring in from anons on here. I mean, look what happened to 'Fallen Oak Sanctuary' at Mare Fair: if they can get 50k, I'm sure a project like this, which directly affects all the anons on this board, is sure to get funding the second it shows some promise. And the only way to show promise is to show solid planning and theory to back it up and generate excitement from anons who can donate.
>>40953149
Another thing would be (I guess you could phrase it as) contextual context, where instead of the AI taking a certain number of the most recent messages (more of a /CHAG/ thing with SillyTavern and context token limits), you could probably just use keyword-related context (which I'm confused as to why this isn't a thing in SillyTavern yet).
But anyways, ignore that fucking slop above this sentence. The point I'm trying to make is that we would have to figure out how to maintain proper context so our little robo-pones aren't overwhelmed every time they want to speak.
I'm sure you could have both keyword context as well as (I guess you could call it) simile context, where previous conversations which are similar but not exact to a current conversation can still be used as context, but how that would work is beyond me.
At this point in time we are pretty much just leeching off of what tech companies give us and adapting it to pony stuff. I doubt we could compete with these tech giants in terms of actual AI advancement, but still, if we are adapting their AI we could at least plan for the actual hardware of robo-pones, since we all know an NVIDIA-robo-Twilight ain't happening anytime soon, so it's up to us to get designs and plans rolling out in preparation for funding (maybe Mare Fair could help with this one day, or even /mlp/con). All that matters is that momentum and actual visible plans are being shown, and surely funding would roll in from other excited anons.
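The "keyword context" idea above is basically naive retrieval. A toy sketch of what that could look like (purely illustrative; the names and stopword list are made up, and a real frontend would use embeddings rather than raw word overlap):

```python
# Toy "keyword context" retrieval: score past messages by word overlap with
# the current prompt and keep the best ones as extra context.

STOPWORDS = {"the", "a", "an", "is", "to", "and", "of", "i", "you", "it"}


def keywords(text):
    """Lowercased word set minus a tiny stopword list."""
    return {w for w in text.lower().split() if w not in STOPWORDS}


def retrieve_context(history, prompt, top_k=2):
    """Return up to top_k past messages sharing the most keywords with the prompt."""
    kw = keywords(prompt)
    scored = [(len(kw & keywords(msg)), idx, msg)
              for idx, msg in enumerate(history)]
    scored = [s for s in scored if s[0] > 0]          # drop zero-overlap messages
    scored.sort(key=lambda s: (-s[0], s[1]))          # most overlap first, earliest wins ties
    return [msg for _, _, msg in scored[:top_k]]
```

The "simile context" the anon wants is the harder half: fuzzy similarity between whole conversations, which is exactly what embedding-based retrieval does and word overlap doesn't.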
>>40953111I wanted to create a multitude of moans and splice individual ones together in specific ways in order to get a more "natural" sounding series of moans, something that sounded genuine and realistic, like the character is actually breathing and doing it. Sometimes one generation will have a good gasp and then moan like they're going "huh?", while another one derps out when it gasps but they do this little shiver afterwards, and it fits just right.For example, with the Fluttershy audio I did two years ago, there are a few generations I was able to make that straight up sounded like Fluttershy was cumming hard, and it worked so well for the context of the story I had set up.Many of the others were for audios I had planned but either never got around to, or couldn't do them because 15ai shut down and never came back up. But either way, the sounds I generated were just sitting on my hard drive collecting dust, so I wanted to post and share them for folks to use either in audios or animations. They're still a bit noisy because they were generated in early 2022, but with some good filtering effects you can easily remove it.Some of the audios also have dialogue included in them, but they're meant to be ignored since those generations included some good moans.
>>40953222>>>/trash/
Gradio port of VoiceCraft if anyone cares.https://github.com/friendlyFriend4000/VoiceCraft
>>40953388
>windows version maybe
Oh jeez. I'm not ready to go full penguinpilled; although most of the stuff I use is technically available on both OSes, all the little changes just fuck up my productivity.
>>40921076>>40937247Review of S6 is done.https://files.catbox.moe/2a6kml.jsonvul - do you have the original audio for EQG and the FiM movie? I think it'll be worth running them through demu as well to see if we can get some more clean audio there.
Since Suno is restricting prompts based on certain words and has no idea what the fuck it's supposed to be, I generated these:Suno! -----> https://files.catbox.moe/c7qk63.mp3NO FUN ALLOWED -----> https://files.catbox.moe/brdne4.mp3Here's another version that's not as good, but they say "rock solo" in a cool way: https://files.catbox.moe/uho0fd.mp3
>https://app.suno.ai/song/a9973479-6cdc-4098-a8d8-cf0787d64943
Alright, it's my first time fucking around with this thing; I threw some half-assed lyrics at it and the outcome is pretty OK-ish. It's still pretty cucked with the artificial credit system that randomly cuts off the song just a few seconds before the end (probably by design, to trick people into going full paypig mode), but if I had an offline version of this tech I would totally spend two days trying to make it perfect. IF/WHEN my above complaint stops being a problem, I can see this causing another boom of pony music, just as there was a boom after RVC/so-vits got created a year ago.
>>40953388https://github.com/kijai/ComfyUI-VoiceCraft
If I could make a request here: could anyone generate an AI dub based on this image?
>>40954154>clipperI LIKE YOUR SNOW PONY VIDEOalso where did u get all that science equipment from snow bro?
doot
>>40954981
I tried to see if it would be possible to quickly edit this with a pony voice, but the constant change in reverb, as well as randomly going from solo to duet to solo in the same sentence, makes for very cursed-sounding outputs. Given that about 70% of the lines would need redoing, I can see this being a pretty nice prototyping tool for people to then create proper songs, either with raw vocals or audio conversion. To get a "plug voice into RVC/so-vits and get mare" future we will need to wait for an uncucked offline version of this tech.
>>40955053https://github.com/haoheliu/versatile_audio_super_resolutionhaving some way to upscale the 16kHz voicecraft output in an easy to use manner might be good
15 ai come back
>>40958251Nah, too busy wiping his ass with his patreon money.
>>40959439
>saw this on /g/
Is this where we are heading, a future with fantastical and whimsical sci-fi levels of tech and all the normies going "ehh, whatever dude"?
>9
>>40936557
>ElevenLabs
Doesn't let me use it without upgrading my subscription.
>>40960343>the normies going "ehh, whatever dude"That's probably for the best. Normies wouldn't grasp what they're dealing with and only attract a larger outrage mob by painting a bigger target on AI.
>>40964097>teaching robots how to boopThat's a powerful tool you give them.
Since making songs with Suno AI (with some help from ChatGPT) is relatively easy, how would anons feel about making some kind of AI musical album themed around taking an episode of S1 and making a song about it (whether something serious or more comedic)? Just throwing this idea out here to see if there is any interest.
>>40964432That sounds like a fun idea.
https://stability.ai/news/stable-audio-2-0
https://stableaudio.com/
No weights, and the website requires a login, but it seems like for now they're making it usable for free.
>>40964753
That's nice.
>To protect creator copyrights, for audio uploads, we partner with Audible Magic to utilize their content recognition (ACR) technology to power real-time content matching to prevent copyright infringement.
While this is a bit shit, I can understand that they need it for when music companies go full jew on them (we can't have nice things unless one of the five big corpos also takes a pound of flesh for it). Sadly this most likely means they will not publish the weights; however, if they also publish (or leak) how their model is trained, there is a small chance someone else could pick it up and make a non-cucked version.
>>40964753
Just tested a slightly outlandish prompt, and oh my god, the result is just so weird and painful the more you listen.
>"My Little Pony, Pop, catchy, Rainbow Dash vocals, Pony, Horse, Neigh, Mare"
>https://files.catbox.moe/aidi5e.mp3
>Uses Stable Audio to generate horse SFX foleys>Some trot some sniff but the rest all horror spook>She growls and make otherworldly noises>Mare is scarehttps://files.catbox.moe/b97srx.mp4
https://docs.google.com/document/d/17hP2rkQHlU43nNOdy1iDEJzKbhgm_kjSaCAo6J1plTY/edit?usp=drive_link
>>40964432
>TLDR: Anons come together to make a full album with songs created with whatever AI tools they have access to.
Alright, here is the basic setup for the Pony AI Album. Anyone interested can simply reply to this post with the episode they wish to work on (songs based on characters/settings/ideas are welcome too). Each song will have a soft deadline of two to three weeks; if a song is not delivered by then, it will become available for choosing once again. If there is anything (you) think needs to be added to the main doc, please do tell.
>>40965188>https://files.catbox.moe/aidi5e.mp3This sounds like a song from a Pinkamena Party album.
boop
Hello Anons. So... it seems like RVC can do laughter, but under certain conditions: the biggest one is that it has to be somewhat airy, a short chuckle or chortle... a soft laugh. I'll post results with various characters soon.
>https://app.suno.ai/song/6385f0fa-7bfd-4faf-ac2f-c3f540fec9c5/just reposting a /g/ banger, it's not as amazing as the '4am' but it's still pretty bloody impressive.
>>40967851Looking forward to it.
>>40968699
Thread preservation bump.
>>40972116yee
>>40972707Have another.
still no way to make mares do death metal growls?
Requesting pic related integrated into an outtake of the episode.
And I need mares more than want maresAnd I want mares for all timeAnd the wichita linemareIs still on the line
>>40940580is it open sauce?
>>40921076>>40954154Review of S7 is done. https://files.catbox.moe/4vscco.jsonThat's all the core FiM episodes done. I'd still like to try running demu on our other audio sources to improve the dataset further. vul, would you be able to run it on the EQG audio and the FiM movie? >>40956515I'm a chemist irl so was just using the stuff I already had in the lab.
>>40975418Yes. The main repository for the UI is here:https://github.com/hydrusbeta/hay_say_uiDocumentation on running a public server is here:https://github.com/hydrusbeta/hay_say_ui/tree/main/running%20as%20serverEach architecture (RVC, ControllableTalkNet, StyleTTS, SVC) also has 1 or 2 of its own repositories, which are used for building its Docker image. There are about 10 repos involved in the whole project:https://github.com/hydrusbeta?tab=repositories
>>40975623
Do you have the associated audio for EQG and the movies?
>>40975895>>40975623Here is the merged labels index, could be useful in the future. If it turns out to be inaccurate it can be recomputed afterwards; the file structure is what's most importanthttps://files.catbox.moe/tqglrp.json
>>40975895Should be these, I also included the Best Gift Ever and Rainbow Roadtrip specials:https://drive.google.com/file/d/12QomZA_D1XiRNciPkIxqw_reU1c64fnm
>>40976188 (checked)Isolated versions. The current code is not compatible with these audios but you can get a head start on downloading them:https://drive.google.com/drive/folders/1dw4nYR9PjJk2C81Hzjgkym26hMULNJMO?usp=drive_linkBest Gift Ever demu0:https://drive.google.com/file/d/1oI_qo8TAkHCzx_waQOcFZhLcNcbnnVFt/view?usp=drive_link
>>40977357Also here is the labels index for the extra files:https://files.catbox.moe/cfcvsw.json
>>40977357>>40977360FYI something seems to have messed up in the processing for the FiM movie and Rainbow Roadtrip. Not sure what happened other than that the timestamps are off...
>>40977418Actually there seems to be a discrepancy between the master file annotations and the version provided here>>40976188
>>40977476Also unrelated but I still can't believe that is the highest quality audio of Rainbow Roadtrip that was ever released. I wonder if the Netflix version would've been better.
>>40978197again
https://files.catbox.moe/0j8902.mp3
>>40979460I like the vibes you are going for in there.
>>40977418>>40977476>>40977553The audio for the specials and movie have always been somewhat tricky, there've been multiple sources used over the years so entirely possible the version I have locally is different. I wouldn't worry too much about it if it's unduly difficult, we already have plenty of other data.
>>40979460AWAKEN, MY MASTERS!
>>40980260I manually tried to align a few clips, and there does not appear to be a constant offset for Rainbow Roadtrip or the FiM movie. I imagine some clever data finagling could be done to re-align the clips but it's beyond me at the moment. Also, the Rainbow Roadtrip voice clips in the master file seem to be polarity inverted relative to the copy we have.Going to focus on modifying the code to work with the EQG data.
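A side note on the polarity inversion mentioned above: it can be checked automatically, since a polarity-inverted copy has a normalized correlation near -1 with the original. A minimal numpy sketch (function name and threshold are my own, not from PonySorter):

```python
import numpy as np

def polarity_flipped(a, b):
    """Return True if b appears to be a polarity-inverted copy of a.

    Compares normalized correlation over the overlapping samples;
    a value near -1 means the waveforms null out when one is inverted.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    n = min(len(a), len(b))
    a, b = a[:n], b[:n]
    corr = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12)
    return corr < -0.9
```

If this returns True, multiplying one copy by -1 before differencing should let the clips null out as expected.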
>>40980260
>>40980649
Updated instructions:
0. Download the new release: https://github.com/effusiveperiscope/PonySorter-B/releases/tag/20240407updated
1. Copy the labels index from >>40977360 into the PonySorter directory and rename it to extras_labels_index.json
2. Copy extra_process_dumps from >>40977357 into in_audio
3. In config.yaml, change index_file to point to extras_labels_index.json. Point master_file_1 and master_file_2 at the correct paths (master file 2 I think is not used atm, but it would be if we got the movie audio working in the future)
Lmk if there are issues
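For step 3, the relevant config.yaml entries might look something like this (paths are placeholders for your local setup; the key names are taken from the instructions above):

```yaml
index_file: extras_labels_index.json
master_file_1: /path/to/Master File
master_file_2: /path/to/Master File 2  # not used atm; reserved for future movie audio
```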
Casualties of mare
Pass the mare grenade
>>40980710Got an error.
>>40982445That's odd. I can't find anything substantial about this online. Try restarting? Have you made any modifications to audio hardware/drivers/system updates?
>>40982570>Redownload PonySorter several times>Painstakingly triple-check all steps of the process>Waste over an hour troubleshooting mystery issue>"Just restart PC lmao">It works nowREEEEEEEEEEEE
>>40982634many such cases
>>40982634
The magic of technology.
>>40982185The holy mare grenade.
>>40980649
I poked at this a little bit, and the situation is even more odd for the movie: while I can get the Rainbow Roadtrip lines close enough to null out, I can't seem to get any of the audio clips for the movie in phase, and I'm kind of suspecting that they are actually at very slightly different playback rates. At this point I'm seriously considering using an AI STT with timestamps and just matching those against the manual annotations. Might also be helpful for Best Gift Ever?
HyperTTS: Parameter Efficient Adaptation in Text to Speech using Hypernetworks
https://arxiv.org/abs/2404.04645
>Neural speech synthesis, or text-to-speech (TTS), aims to transform a signal from the text domain to the speech domain. While developing TTS architectures that train and test on the same set of speakers has seen significant improvements, out-of-domain speaker performance still faces enormous limitations. Domain adaptation on a new set of speakers can be achieved by fine-tuning the whole model for each new domain, thus making it parameter-inefficient. This problem can be solved by Adapters that provide a parameter-efficient alternative to domain adaptation. Although famous in NLP, speech synthesis has not seen much improvement from Adapters. In this work, we present HyperTTS, which comprises a small learnable network, "hypernetwork", that generates parameters of the Adapter blocks, allowing us to condition Adapters on speaker representations and making them dynamic. Extensive evaluations of two domain adaptation settings demonstrate its effectiveness in achieving state-of-the-art performance in the parameter-efficient regime. We also compare different variants of HyperTTS, comparing them with baselines in different studies. Promising results on the dynamic adaptation of adapter parameters using hypernetworks open up new avenues for domain-generic multi-speaker TTS systems.
https://github.com/declare-lab/HyperTTS
The code was posted 10 months ago, but the arxiv paper was just posted. Hope the guy who does finetunes tries it out to see if it's actually useful.
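The hypernetwork idea from the abstract can be illustrated with a toy example: a small network maps a speaker embedding to the weights of an adapter block, so the adapter is speaker-conditioned without fine-tuning the base model. This numpy sketch is purely illustrative; the dimensions, names, and the single linear hypernetwork are all invented, not the paper's architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions: speaker embedding size and adapter hidden size.
SPK_DIM, HID_DIM = 8, 16

# The "hypernetwork" here is just one learned matrix: it maps a speaker
# embedding to the flattened weights of a HID_DIM x HID_DIM adapter.
hyper_w = rng.normal(size=(SPK_DIM, HID_DIM * HID_DIM)) * 0.01

def adapter_forward(hidden, speaker_emb):
    """Apply a speaker-conditioned adapter: the adapter weights are
    generated from the speaker embedding, then used as a residual
    projection on the hidden states."""
    w = (speaker_emb @ hyper_w).reshape(HID_DIM, HID_DIM)
    return hidden + hidden @ w  # residual adapter
```

The point is that only `hyper_w` (and the speaker embeddings) would be trained per deployment; the base TTS weights stay frozen, which is where the parameter efficiency comes from.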
>>40984892So unfortunately, a good 30% of the time Whisper doesn't seem to recognize Pinkie as an actual voice (understandable, since I don't think there are any voices like Pinkie Pie in open training datasets). Going to keep plugging and see how much of the data this actually affects.
>>40985101>whispermight want to try out different models https://huggingface.co/spaces/hf-audio/open_asr_leaderboard
>>40985101
Also, a paper that, while not directly applicable, could actually be the path forward for ASR on unique voice acting:
Transducers with Pronunciation-aware Embeddings for Automatic Speech Recognition
https://arxiv.org/abs/2404.04295
>This paper proposes Transducers with Pronunciation-aware Embeddings (PET). Unlike conventional Transducers where the decoder embeddings for different tokens are trained independently, the PET model's decoder embedding incorporates shared components for text tokens with the same or similar pronunciations. With experiments conducted in multiple datasets in Mandarin Chinese and Korean, we show that PET models consistently improve speech recognition accuracy compared to conventional Transducers. Our investigation also uncovers a phenomenon that we call error chain reactions. Instead of recognition errors being evenly spread throughout an utterance, they tend to group together, with subsequent errors often following earlier ones. Our analysis shows that PET models effectively mitigate this issue by substantially reducing the likelihood of the model generating additional errors following a prior one. Our implementation will be open-sourced with the NeMo toolkit.
https://github.com/NVIDIA/NeMo
>>40985180I'm not sure if there are word-level timestamps for the NVIDIA ones.
Page 9 bump.
>>40985372>>40985101I managed to get 94% of the lines "aligned" with a transcription on Rainbow Roadtrip with this method but the timestamps are all wrong because whisper_timestamped has a tendency to drift without voice activity detection (VAD). Unfortunately with VAD enabled it basically does not recognize Pinkie Pie as a speaking voice. So I ended up going back to using cross correlation, but with windowing around the expected time offset, and that seems to be promising.
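The windowed cross-correlation trick is simple enough to sketch: instead of searching all possible lags, only score lags near the expected offset, which keeps noise from pulling the estimate far away. A minimal numpy version (function name and window size are illustrative, not the actual PPP alignment code):

```python
import numpy as np

def estimate_offset(ref, target, expected_lag=0, window=200):
    """Estimate the lag (in samples) of target relative to ref,
    searching only lags in [expected_lag - window, expected_lag + window]."""
    best_lag, best_score = expected_lag, -np.inf
    for lag in range(expected_lag - window, expected_lag + window + 1):
        if lag >= 0:
            a, b = ref[lag:], target[:len(ref) - lag]
        else:
            a, b = ref[:lag], target[-lag:]
        n = min(len(a), len(b))
        if n == 0:
            continue
        # raw dot product as the correlation score at this lag
        score = float(np.dot(a[:n], b[:n]))
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag
```

Constraining the search window is what makes this robust when the global cross-correlation surface is noisy, at the cost of needing a decent initial guess for each clip.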
>>40965450The end reminds me of a car hitting hard on the brakes.
>>40986757
>>40982634
Okay, here is an updated extra_labels_index.json with new timestamps that match the provided audio: https://files.catbox.moe/p57n0j.json
There is one caveat: all of the HH_MM_SS timestamps in the file names become inaccurate. In this new release (remember to copy your config over): https://github.com/effusiveperiscope/PonySorter-B/releases/tag/20241004updated
the timestamps for the -exported- file names (as well as for the label file) are based on the "actual" timestamp.
For anyone who cares, here is the code I used for alignment:
https://github.com/effusiveperiscope/PPPDataset/blob/main/data_realigner.ipynb
And pic related is a time offset graph for the FiM movie (you can see the graph for Rainbow Roadtrip in the notebook). It seems pretty consistent with a different playback rate (the x axis is line index, so it's not exactly linear with respect to time, but it would be pretty close). OTOH, the Rainbow Roadtrip time offset graph is very clearly stepwise, probably due to different commercial breaks.
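For what it's worth, a playback-rate mismatch like the one the offset graph suggests can be quantified by fitting a line to (position, measured offset) pairs: a nonzero slope approximates the rate difference, while a stepwise pattern (like Rainbow Roadtrip's) would fit poorly. A hedged sketch with synthetic numbers, not the actual movie data:

```python
import numpy as np

def fit_drift(positions, offsets):
    """Fit offset ~= slope * position + intercept.

    A nonzero slope suggests a playback-rate mismatch
    (rate ratio roughly 1 + slope); the intercept is the
    constant part of the offset.
    """
    slope, intercept = np.polyfit(positions, offsets, 1)
    return slope, intercept
```

With the slope in hand, each clip's expected offset can be predicted from its position, which is exactly the kind of initial guess a windowed search needs.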
everyone is a fucking rPHetard o2vwn this forum
>>40987180everyone is a fucking retard here including me
>>40987180Can you translate that to english?
>>40987180how fat are your fingers dudei cannot possibly conceive fingers fat enough to hit the keys that you didat this point i just feel bad for you
>>40987618it might be that ogre poster with a caged onion from a decade ago
>>40987642it completely blows my fucking mind. it honestly does. i can't believe it.>rPHetard>o2vwn>>40987180please, please, think about this, please. i'm begging you to think about this. please.
save
>>40951924There is now an open source reproduction of that paper:https://github.com/huggingface/parler-ttshttps://huggingface.co/spaces/parler-tts/parler_tts_miniIt takes two text inputs: the text to say and a prompt describing the speaker, speaking rate, recording environment, audio quality, etc.All datasets, code (including train/finetune), and models will be released under a permissive license. So far, they've released a 600M model trained on 10.5k hours. They're working on scaling it to 50k hours.
>>40988141
Hmm, it's not exactly the same thing; it's more like a nicer TTS tool minus the option to choose an exact specific voice.
>ask for angry female
>it just talks faster
Other than that, it's nice to see alternatives. Sadly "singing" or "musical" are not tags that affect the output in any way; HOWEVER, there is always hope that someone out there can train a new upgraded model. I still think it will take a year for someone to recreate Suno AI, and once that happens we will see entire mare albums being made each month.
>>40965026>entire point of software is to generate something using something as a base of inspiration>somehow this is a copyright issueDamn, I guess I can't start an italian symphonic power metal band for fear of it being a "copyright infringement" on Rhapsody of Fire's existence.
I can't wait for better tools to be made, since fixing the strange filter effect is not possible and the lines need to be re-sung.
>>40987618Or maybe he was drunk as fuck.
>amre at 10
>>40990187Shamefur dispray.
>>40988141Interesting. I wonder how far the natural language description can be pushed. Spitballing here: one unique feature of using audio from a TV show is that if we had a show pseudo-script for the episodes (i.e. describing what is going on between the characters) it might be possible to generate synthetic descriptions for each speaking line that could give much more granular control over delivery. I'm also interested if anyone comes up with a solution for long-form inference with each line conditioned on the last, since that would be a natural fit for show audio too.
>>40990187Almost again.
CoVoMix: Advancing Zero-Shot Speech Generation for Human-like Multi-talker Conversations
https://arxiv.org/abs/2404.06690
>Recent advancements in zero-shot text-to-speech (TTS) modeling have led to significant strides in generating high-fidelity and diverse speech. However, dialogue generation, along with achieving human-like naturalness in speech, continues to be a challenge in the field. In this paper, we introduce CoVoMix: Conversational Voice Mixture Generation, a novel model for zero-shot, human-like, multi-speaker, multi-round dialogue speech generation. CoVoMix is capable of first converting dialogue text into multiple streams of discrete tokens, with each token stream representing semantic information for individual talkers. These token streams are then fed into a flow-matching based acoustic model to generate mixed mel-spectrograms. Finally, the speech waveforms are produced using a HiFi-GAN model. Furthermore, we devise a comprehensive set of metrics for measuring the effectiveness of dialogue modeling and generation. Our experimental results show that CoVoMix can generate dialogues that are not only human-like in their naturalness and coherence but also involve multiple talkers engaging in multiple rounds of conversation. These dialogues, generated within a single channel, are characterized by seamless speech transitions, including overlapping speech, and appropriate paralinguistic behaviors such as laughter.
https://www.microsoft.com/en-us/research/project/covomix/
Obviously from Microsoft, so as usual for their voicegen stuff there are no weights. Pretty good from the examples, though.
Posting this from wsg:
https://github.com/huggingface/parler-tts
>>/wsg/5503058
I discovered an AI program called Resemble Enhance. I used it to clean the vocals I downloaded from YouTube. I uploaded the clean vocals to ElevenLabs. Here are the results.
Franklin D. Roosevelt
https://vocaroo.com/1gSsiKrYtjkA
Harry S. Truman
https://vocaroo.com/1nEqj1b4XOaV
Jackson Beck (Superman announcer)
https://vocaroo.com/1kgRi6ICDNCd
Mr. Delicious
https://vocaroo.com/1ggEgkhUPRRk
I overlaid the FDR, Harry Truman, and Jackson Beck audio files with a song called Spidey Meets His Girl via kdenlive:
https://m.soundcloud.com/udi-harpaz-composer/spidy-meets-his-girl?in=udi-harpaz-composer%2Fsets%2Fspiderman-by-udi-harpaz
Then I uploaded the kdenlive output WAV files to Resemble Enhance to remove the hissing and scratching sound. It can also remove reverb this way. Here's an example.
Before:
https://vocaroo.com/1kvyT6Wh8A2v
After:
https://vocaroo.com/1i7q7woh25jt
>>40991829>https://github.com/resemble-ai/resemble-enhanceshit, wrong github link
So there is now a copycat of 'Suno', called 'Udio', that does exactly the same thing.
>free beta
I'm guessing it will go the full AI Dungeon Mormon way and try to get people hooked, then slam them with a $15+ subscription or some other bullshit. Additionally, there are still no model weights included, so it goes straight into the gay and fake category like the rest of those subscription services.
>>40993044Artificial lewd is still lewd.
>>40993206But what if he generated artificial consent along with the artificial mare?
I wanna make an RVC of a vocaloid, but the rawest form of audio I have is WAV samples of all the phonemes for that vocaloid, chopped up. In other words, it's not really singing so much as chopped-up samples. I've tried to put them all together in one audio file, but the end result is choppy. When I use them separately from each other, the result is a hot mess. If only I could just... have the voice sing naturally for the input audio, but it's a vocaloid, and the raw samples are very short, the longest being about one second. I'm at a loss and don't know what to do. Any help doing this right would be massively appreciated.
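One generic trick for the choppiness when butting short samples together is a short crossfade at each join; it won't make the result sing naturally, but it removes the hard clicks at sample boundaries. A minimal numpy sketch (the fade length is arbitrary and would need tuning per sample set):

```python
import numpy as np

def crossfade_concat(clips, fade=64):
    """Join 1-D audio arrays with a linear crossfade of `fade` samples
    at each boundary, to soften hard transitions between samples."""
    out = np.array(clips[0], dtype=float)
    ramp = np.linspace(0.0, 1.0, fade)
    for clip in clips[1:]:
        clip = np.asarray(clip, dtype=float)
        # blend the tail of what we have with the head of the next clip
        out[-fade:] = out[-fade:] * (1.0 - ramp) + clip[:fade] * ramp
        out = np.concatenate([out, clip[fade:]])
    return out
```

A real concatenative synth also needs pitch and timing control across joins, which is well beyond a crossfade, but this at least makes a stitched-together training clip less clicky.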
>>40994089
This may not be the best approach, but wouldn't it make sense to look for songs featuring that vocaloid and apply one of the vocal-remover programs to get as close to "raw" audio output as possible, then place those clips in Audacity both to chop them into workable ~10-second clips and to pick out the better-sounding ones (separating the wheat from the chaff)? I am not as familiar with the gaming side of Vocaloid, but I would imagine there are at least some games using the officially approved voices, so maybe it would be worth scouting some forums for raw audio extracted from them?
>>40994089>>40994108I wonder if the artists would be willing to share the raw vocaloid data for their songs or other projects? Would probably be the best way to get clean speaking/singing audio if you're not skilled at making your own.
>>40994195
On some rare occasions there are people willing to share raw project files, so that's not completely impossible; it all depends on how much time you are willing to spend getting the best quality dataset VS quickly grabbing lower-quality clips and starting training immediately.
>>40987060Got an error while trying to load the fim movie:>Signature fim_movie_demu1 from audio but no corresponding labelNot sure what the "label" is supposed to be.>timestamps in the file names become inaccurateNot a big deal, they're mostly there for human readability and to prevent duplicate filenames. Shouldn't have any impact on the AI side of things.
>>40994282Did you preserve the config settings (e.g. it has to point at extra_labels_index.json instead of episodes_labels_index.json)
>>40994312Yeah, config looks correct.
>>40960343
https://www.youtube.com/watch?v=13IW_KDCkJo
It concerns me how one of my favorite songs from my teen years has come to describe 4chan and society as a whole when it comes to anything but contemporary politics and drama of any variety. It's probably impossible, but I'd find it interesting to have a mare-voiced cover of this.
Same anon as the one who asked about the RVC stuff here.I remember there was someone who was able to make a unique sound to a vocaloid and made them sound like a human singer through RVC. All I know is that they used the vocaloid samples for it, and I know nothing else of the process. They have since deleted the video showcasing the results, but it still intrigued me. I want to make a unique sound for the vocaloid to sound more human, not just a vocaloid ported to RVC. Are my ambitions too high here? Is this even possible?
>>40994335I don't know how this occurred. My only guess is that somehow I managed to upload a non-updated version of the executable here >>40987060. Deleted the old release and trying this again.https://github.com/effusiveperiscope/PonySorter-B/releases/tag/20241204updated
>>40994365Got the same error again. Could you tell me what the "label" is it's looking for? Perhaps I've just got something in the wrong place.
>>40994363Same anon, and I think I was misunderstood. I don't mean the raw samples from a vocaloid singing in the editor... I mean the literal raw samples that were used for recording the vocaloid, which are, surprisingly enough, VERY easy to extract from the vocaloid itself.https://voca.ro/1djZ7LBMphZ5Above is all of Miku's raw samples compiled together. This is what I'm talking about when I mean "samples".
>>40994416I can't reproduce this error. The "label" is supposed to be from the labels index. However the signature should not even be "fim_movie_demu1". How is your in_audio structured?
>>40994509>>40994416Actually, can you post the full traceback + your current save file? I think I might know what is happening.
>>40994521
https://files.catbox.moe/q017lo.json
Full traceback:
pygame 2.5.2 (SDL 2.28.3, Python 3.10.13)
Hello from the pygame community. https://www.pygame.org/contribute.html
[2024-04-12 21:54:59,786] [INFO] Loading project P:/PVPP/Tools/Pony Sorter 5c/EQG.json
[2024-04-12 21:55:00,095] [INFO] Loading fim_rainbow roadtrip
orig 3632.7363125
demu0 3632.7363125
demu1 3632.7363125
[2024-04-12 21:55:18,574] [INFO] Loaded fim_rainbow roadtrip
[2024-04-12 21:55:20,353] [INFO] Loaded P:/PVPP/Tools/Pony Sorter 5c/EQG.json
[2024-04-12 21:55:26,361] [INFO] Loading fim_movie_demu1
[2024-04-12 21:55:26,437] [WARNING] Signature fim_movie_demu1 from audio but no corresponding label
Traceback (most recent call last):
  File "gui.py", line 496, in load_selection
  File "gui.py", line 428, in load_selection
  File "core.py", line 140, in load_sig
TypeError: 'NoneType' object is not iterable
>>40994533Why is fim_movie only 418.0 MB?
>>40994545Because it only has fim_movie_demu1 in it. That's probably the issue.
>>40994545>>40994561I put the original audio in there and it works now. Got a "hash mismatch" warning, not sure how significant that is. Audio seems to play fine.
>>40994585A hash mismatch warning may indicate that a file was only partially downloaded/uploaded
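For reference, a partial download like that can be caught by hashing the file and comparing against a known digest; something like this (SHA-256 here is my own choice for illustration, PonySorter may well use a different algorithm):

```python
import hashlib

def file_sha256(path, chunk_size=1 << 20):
    """SHA-256 hex digest of a file, read in 1 MB chunks so that
    large audio files don't have to fit in memory at once."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()
```

Comparing the digest of the local copy against one computed from the uploader's copy settles whether the file was truncated in transit.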
>>40995125>10, mares edition
>>40996134
>Not showing 10 mares
1 job
>>40994597
Finished all the reviewing, but now got a new error on trying to export:
Savefile - https://files.catbox.moe/2a39wk.json
Terminal:
pygame 2.5.2 (SDL 2.28.3, Python 3.10.13)
Hello from the pygame community. https://www.pygame.org/contribute.html
[2024-04-13 13:36:11,278] [INFO] Loading project P:/PVPP/Tools/Pony Sorter 5c/EQG.json
[2024-04-13 13:36:11,576] [INFO] Loading eqg_better together_s02e04
orig 177.6775
demu0 177.6775
demu1 177.6775
[2024-04-13 13:36:12,584] [INFO] Loaded eqg_better together_s02e04
[2024-04-13 13:36:14,319] [INFO] Loaded P:/PVPP/Tools/Pony Sorter 5c/EQG.json
['eqg_better together_s02e04', 'eqg_better together_s02e05', 'eqg_better together_s02e06', 'eqg_better together_s02e07', 'eqg_dance magic', 'eqg_forgotten friendship', 'eqg_friendship_games', 'eqg_legend_of_everfree', 'eqg_mirror magic', 'eqg_movie magic', 'eqg_rollercoaster of friendship', 'fim_rainbow roadtrip', 'fim_movie']
[2024-04-13 13:36:31,004] [INFO] Processing eqg_better together_s02e04
Traceback (most recent call last):
  File "gui.py", line 104, in export_all_audio
  File "core.py", line 277, in export_audio
  File "utils.py", line 55, in path_reparse
  File "utils.py", line 44, in label_reparse
  File "utils.py", line 37, in convert_decimal_seconds_to_hh_mm_ss
TypeError: unsupported operand type(s) for /: 'str' and 'int'
>>40996631
Fixed: https://github.com/effusiveperiscope/PonySorter-B/releases/tag/20241304updated
View stats should also work now.
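For anyone curious about that 'str' / 'int' TypeError: it's the classic case of a timestamp arriving as a string (e.g. straight out of a JSON label file) and hitting arithmetic. A hedged sketch of what a fixed converter might look like; the real convert_decimal_seconds_to_hh_mm_ss in PonySorter-B may differ:

```python
def seconds_to_hh_mm_ss(seconds):
    """Format decimal seconds as HH_MM_SS, accepting str or float input."""
    total = float(seconds)  # coerce: values loaded from JSON may be strings
    hours, rem = divmod(int(total), 3600)
    minutes, secs = divmod(rem, 60)
    return f"{hours:02d}_{minutes:02d}_{secs:02d}"
```

Coercing at the boundary (one `float()` call) is usually cleaner than sprinkling type checks through every caller.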
https://youtu.be/Tk8tDpweJB4?si=lbIcDfJw1hE3ABk2
Just got recommended this; has anyone made an AI version yet?
>>40921076
The master file has now been updated with the newly cleaned data. If you use the data in the master file, you should re-download the whole thing. Also give it a quick look-over to ensure nothing's missing.
https://mega.nz/folder/jkwimSTa#_xk0VnR30C8Ljsy4RCGSig (same link as before)
I still have a local copy of the old version that I'll hold onto for a while, just in case something goes wrong with the new version.
>>40997261
episodes_labels_index_updated:
FiM - https://files.catbox.moe/4vscco.json
EQG + movie - https://files.catbox.moe/8kp84w.json
These should cover all of the reviewed data; can you use these to make a graph showing the overall changes of clean/noisy/very noisy?
>>40999872
Oh boy, there is just enough data to train an Iron Will RVC model on clean + noisy files.
>>40994436Same anon here. Sorry if I'm being intrusive, I just thought that with this being an audio centered thing, people could help. If I can't be helped here, where could I go to get the assistance I need for my issue?
>>41000255How have the previous answers to your questions not been able to address your issue?>>40994195>>40994213
>>41000300Because they're talking about samples as in RENDERS of the vocaloids, and not the actual extracted vocal samples used to record the vocaloid in the first place. If that makes sense.
>>41000466Why do you want to train off of extracted vocal samples rather than renders?
>>40999872There seem to be a few gaps in the data, so I am going to have to check things over.
>>41000507First gap: s1e1 to s1e16 (consistent with the "two different save files" problem we had earlier)Second gap: s5 (the labels index you posted for s5 seems to only contain modified data for s4.) However this does not seem to affect the actual audio files. In any case I will be downloading the actual master file so I can check against that as well.
>>41000466
I think that's a lot harder (impossible?) to achieve with just a model. The model would have no real context on how each sample should flow into each other to sound natural.
>>41000220
https://files.catbox.moe/aokhto.mp3
https://huggingface.co/Amo/RVC_v2_GA/tree/main/models/MLP_Iron_Will_GA
2m8s of audio (all of the clean lines and most of the noisy lines) was used to train this RVC Iron Will model. I pushed the training to run for 350, as I feel the extra steps do help when the dataset is this short.
Now it's time to lurk for other obscure background ponies that may have gained additional clean lines for producing training datasets.
>>41000730
Harder/impossible? Is there any way to make them smoother somehow, or anything that makes them less choppy? The reason I insist on using the samples is that they're the audio files with no vocaloid engine noise in them. No matter what render I use, it'll always have vocaloid engine noise in at least some capacity.
Did this masterpiece ever get remade with better models?
https://vocaroo.com/kwLL9EyTbAQ
>>41000960
Sorry, not really sure how you'd accomplish that with AI. Maybe you could manually do something with the audio files, map them onto the vocaloid instructions, and train on those outputs?
>>41001070
I don't know if that would work, nor where to start. The weirdest thing is that some people have done it before, and done it considerably well, to the point that there's a very human-like and dynamic quality to the voice not present in the samples for the RVC models. The only problem is that, unfortunately, they don't make their methods public, so anyone interested in replicating it is essentially scrambling around for answers and leads.
>>40966045
Since there has been no response to this post for over a week, I guess the idea can be considered ded.
https://suno.com/song/7a5c06b1-dafa-4717-967f-6416a3de2b0a
https://vocaroo.com/1iNd7HL85PoL
Here is a song inspired by the greentext (37557739) of Anon's dream about a mare with the stage name Black Sunset singing a power metal song in some indie Equestria band.
>catbox is temporarily ded
Other than that, I can confirm that the suno devs are starting to behave a little bit more kosher: last week one could easily generate a 2m+ song, and now the song generator has given me a 1m20s output.
>>41000966
Not as far as I know.
>>41000580
>>41000507
>>40999872
Here is the "corrected" version of the episodes_labels_index.json for the FiM files: https://files.catbox.moe/g0mg81.json
From analysis of the master file, there were no skipped FiM episodes aside from the ones that are supposed to be skipped (special source/outtakes), at least according to the generated labels, and all lines inside the FiM label files correspond to an existing file in the master file. *However*, the current Label Files directory does not contain label files for EQG/movies/specials, nor the older label files for audio we didn't handle, like songs.txt, mobile game, and Other.
Over the entire observed data: of 58578 total lines, 14927 noise ratings were modified (~25%). The clean/noisy/very noisy split was 17353/19629/21596 prior, 21599/22475/14504 post.
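For anyone wanting to reproduce numbers like these, the comparison can be sketched in a few lines of Python. This is illustrative only: it assumes the label format visible elsewhere in the thread ("start end HH_MM_SS_Character_Emotion_Noise_transcript", with an empty noise field treated as clean), not the structure of the actual index JSONs or the script actually used.

```python
def parse_label(line):
    # One label line looks like:
    # 5.484631 7.017492 00_00_05_Applejack_Neutral_Noisy_we're almost there!
    # The timestamps identify the line across versions; field 5 after
    # splitting the remainder on "_" is the noise rating (empty = clean,
    # by assumption).
    start, end, rest = line.rstrip("\n").split(" ", 2)
    noise = rest.split("_")[5] or "Clean"
    return (start, end), noise

def compare_versions(old_lines, new_lines):
    # Count how many noise ratings changed between the two versions,
    # and tally the clean/noisy/very noisy split of the new version.
    old = dict(parse_label(l) for l in old_lines)
    new = dict(parse_label(l) for l in new_lines)
    modified = sum(1 for k in old if k in new and old[k] != new[k])
    split = {}
    for noise in new.values():
        split[noise] = split.get(noise, 0) + 1
    return modified, split
```

Running this over the old and new label files for every episode would give the modified-rating count and the prior/post splits directly.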
>>41001932
Not sure if this is possible, but could you create a text file that lists a comparison of all characters' lines in hours/minutes/seconds? E.g.:
Twilight C: 3 hours N: 2 hours VN: 1 hour
Fluttershy C: 2 hours N: 45 minutes VN: 20 minutes
Minuette C: 1 minute N: 25 seconds VN: 10 seconds
>>41002579
Would a CSV be better? Here is a CSV of the Clean/Noisy/Very Noisy split in seconds, ordered by Clean+Noisy, calculated across FiM episodes + EQG data + movie and specials. Duration is calculated from the label timestamps.
https://files.catbox.moe/xu0age.csv
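The aggregation described here is straightforward to sketch. This is an illustrative reimplementation, not the script actually used: it assumes the label format "start end HH_MM_SS_Character_Emotion_Noise_transcript", with an empty noise field counted as clean.

```python
import csv
from collections import defaultdict

def character_durations(label_lines):
    # Sum seconds of audio per (character, noise rating).
    # Each clip's duration is end - start from the label timestamps.
    totals = defaultdict(lambda: defaultdict(float))
    for line in label_lines:
        start, end, rest = line.rstrip("\n").split(" ", 2)
        fields = rest.split("_")
        character, noise = fields[3], fields[5] or "Clean"
        totals[character][noise] += float(end) - float(start)
    return totals

def write_split_csv(totals, path):
    # One row per character, ordered by Clean+Noisy as in the posted CSV.
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["character", "clean_s", "noisy_s", "very_noisy_s"])
        ranked = sorted(totals,
                        key=lambda c: -(totals[c]["Clean"] + totals[c]["Noisy"]))
        for char in ranked:
            t = totals[char]
            writer.writerow([char, round(t["Clean"], 3), round(t["Noisy"], 3),
                             round(t["Very Noisy"], 3)])
```

Feeding it every label file across FiM + EQG + movie/specials would produce a per-character duration table like the one linked above.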
>>41003547
cheers, this format is much better
>all of fim has 250 spoken characters
huh, that's about the same number as I would expect of NPCs in a pretty decent RPG.
>>41001932
>all lines inside the FiM label files correspond to an existing file in the master file.
Does that mean the missing stuff from >>41000580 can be disregarded, or is there something still to fix there?
>the current Label Files directory does not contain label files
Fixed. Forgot to upload the old unchanged labels earlier.
>pic
Looks like a worthwhile improvement, especially considering that a lot of the "noisy" lines are now less noisy.
>>41003813
Some one-offs were not annotated, I think.
>>41003960
>Does that mean the missing stuff from >>41000580 can be disregarded, or is there something still to fix there?
Yes, that was just inside the label index.
Up from page 9.
>>41004630
from 10
>>41005291
Not so good.
So what is everyone working on? I'm cracking my head trying to figure out some punk song lyrics from the perspective of Octavia.
https://huggingface.co/spaces/Xenova/musicgen-web
not new, but a web version for that one guy who seems to like musicgen
>>41006687
trying to generate 20s of music makes the generator shit itself, and the 10s samples are a bit meh (I remember using the offline version a year ago and it worked much better then).
>>41001527
need more pony power metal
>>41007990
A lot of artificial booping happening here lately.
Not that I'm complaining.
>>40999872
I uploaded a copy of the Master File here: https://drive.google.com/drive/u/2/folders/1ho2qhjUTfKtYUXwDPArTmHuTJCaODQyQ
Sort of. I fixed one typo: in Special Source, one of the folders says "Freindship" instead of "Friendship". It should technically also be "Rollercoaster" rather than "Roller Coaster". I plan to do a more thorough check tomorrow for typos and potentially other errors.
The Special Source/Luster Dawn folder is empty. Is it supposed to be?
>>41007304
That would be nice, and to tell the truth I'm looking forward to something that isn't just another wubstep remix. Now this makes me wonder: what is the rarest type of pony musical genre? Metal, jazz and ska seem pretty rare.
Long-form music generation with latent diffusion
https://arxiv.org/abs/2404.10301
>Audio-based generative models for music have seen great strides recently, but so far have not managed to produce full-length music tracks with coherent musical structure. We show that by training a generative model on long temporal contexts it is possible to produce long-form music of up to 4m45s. Our model consists of a diffusion-transformer operating on a highly downsampled continuous latent representation (latent rate of 21.5Hz). It obtains state-of-the-art generations according to metrics on audio quality and prompt alignment, and subjective tests reveal that it produces full-length music with coherent structure.
https://stability-ai.github.io/stable-audio-2-demo/
https://github.com/Stability-AI/stable-audio-tools/
stable audio 2 paper. still no weights
>>41009282
Basically anything that isn't EDM, hyperpop, or dubstep. Swing music that's actual swing music and not "electroswing" ("yeah, let's just take this 1940s music and ruin it with electric bassy drums, then ruin it further by chopping it up so it's repeating the same little bit, and to top it off let's master it by making the electric drum beat louder than the actual music!") is also quite rare.
>>41009637
out of curiosity, what is your opinion on https://youtu.be/N7wTBGP4UFs?si=W4w_mSo6vQj9QpwD ?
https://huggingface.co/spaces/pyp1/VoiceCraft_gradio
webui for voicecraft
>>41009954
input: https://vocaroo.com/1nItsHLVEtqp
output: https://vocaroo.com/18qK1E6F1CnJ
Now this may create some possibilities in the future, but they would need to improve the output quality; the 16000Hz sucks ass pretty hard.
>>41010125
>Freindship
>Roller Coaster
Fixed. Just simple typos.
>Special Source/Luster Dawn folder is empty
My records show the same empty folder existed in the previous master file version, so it's not a new issue that's suddenly been introduced. Likely a temp folder from the first pass at processing the S9 leaks that I forgot to delete.
messing around with sovits4 again. how feasible would it be to try and recreate all the vocal effects Taylor uses in Blank Space? here is a small example of what I've got so far, using Dash's voice to replace the vocals:
https://files.catbox.moe/0etyeh.mp3
don't even know what vocal effects are used in Blank Space
>>41010125
>Sliced Dialogue/Label files/s01e07_Music.txt
Everything else uses a lowercase '_music', but this one uses an uppercase M.
>Sliced Dialogue/EQG/EQG Roller Coaster of Friendship
Should be "EQG Rollercoaster of Friendship" to match the updated name in Special Source.
Some transcripts use "Flurry Heart", some use "Flurry":
>./fim_s09e01_original.txt:940.685169 941.573000 00_15_41_Flurry Heart_Anxious_Very Noisy_mama!
>./s09e01.txt:940.685169 941.573000 00_15_41_Flurry Heart_Anxious_Noisy_mama!
>./fim_s09e01.txt:940.685169 941.573000 00_15_41_Flurry Heart_Anxious_Very Noisy_mama!
>./s09e01_demu0.txt:940.685169 941.573000 00_15_41_Flurry Heart_Anxious_Noisy_mama!
>./fim_s07e03.txt:683.600000 684.812919 00_11_24_Flurry_Happy_Very Noisy_ta da!
>./s07e03.txt:683.600000 684.812919 00_11_24_Flurry_Happy_Very Noisy_ta da!
>./fim_s07e03_original.txt:683.600000 684.812919 00_11_24_Flurry_Happy_Very Noisy_ta da!
>./s07e03_master_ver.txt:683.600000 684.812919 00_11_24_Flurry_Happy_Very Noisy_ta da!
Typo (Applejack223 should be Applejack):
>./s01e18.txt:5.484631 7.017492 00_00_05_Applejack223_Neutral_Noisy_we're almost there, youngins!
>./s01e18_demu1.txt:5.484631 7.017492 00_00_05_Applejack223_Neutral_Noisy_we're almost there, youngins!
Typo (Mrs.Cake should be Mrs. Cake, with a space, for consistency with other labels):
>s02e10.txt:366.058770 370.764964 00_06_06_Mrs.Cake_Happy_Noisy_When we found out it was your birthday, we couldn't resist trying out a new recipe!
>fim_s02e10_original.txt:366.058770 370.764964 00_06_06_Mrs.Cake_Happy_Noisy_When we found out it was your birthday, we couldn't resist trying out a new recipe!
>fim_s02e10.txt:366.058770 370.764964 00_06_06_Mrs.Cake_Happy_Noisy_When we found out it was your birthday, we couldn't resist trying out a new recipe!
>s02e10_demu0.txt:366.058770 370.764964 00_06_06_Mrs.Cake_Happy_Noisy_When we found out it was your birthday, we couldn't resist trying out a new recipe!
Typo (Jet Set has an extra space at the end of his name):
>./fim_s02e09_original.txt:144.263221 147.820380 00_02_24_Jet Set _Neutral__We saw you from across the cafe and just had to find out.
>./s02e09_master_ver.txt:144.263221 147.820380 00_02_24_Jet Set _Neutral__We saw you from across the cafe and just had to find out.
>./s02e09.txt:144.263221 147.820380 00_02_24_Jet Set _Neutral__We saw you from across the cafe and just had to find out.
>./fim_s02e09.txt:144.263221 147.820380 00_02_24_Jet Set _Neutral__We saw you from across the cafe and just had to find out.
>>41010125
>>41011157
In Rainbow Roadtrip, there are some weird "Si" emotion tags:
>00_22_01_Mayor Sunny Skies_Sad Si_Noisy_then fences went #up#, we lost track of our neighbors, each year passing dimming spirits all #around#.txt
>00_22_09_Mayor Sunny Skies_Sad Si_Very Noisy_the happy days came to an #end#, and nopony had time to spend to gather in the #town#.txt
>00_22_18_Mayor Sunny Skies_Sad Si_Very Noisy_i thought i knew exactly what the #festival# #needed#.txt
>00_22_26_Mayor Sunny Skies_Sad Si_Very Noisy_a bigger better #rainbow#, would surely make them see #it#.txt
>00_22_35_Mayor Sunny Skies_Sad Si_Very Noisy_but the extra magic was too much for the #rainbow# #generator#.txt
>00_22_42_Mayor Sunny Skies_Sad Si_Very Noisy_and i'm the one who brought the #rainbow#, to an end.txt
>00_22_54_Mayor Sunny Skies_Sad Si_Noisy_that's how our #town#, our little pony #town#, that's how our town saw the #end#, of the #rainbow#.txt
For consistency, the character name should be "Petunia Petals" rather than "Petunia" in Rainbow Roadtrip:
>00_42_27_Petunia_Neutral_Very Noisy_not all of it was.txt
And "Mr. Moody Root" should be "Moody Root":
>00_38_40_Mr. Moody Root_Annoyed_Very Noisy_who wants to know_.txt
I pushed the code I'm using to validate the Master File here:
https://github.com/synthbot-anon/horsefm-lib
>Page 10
>9 bump
VASA-1: Lifelike Audio-Driven Talking Faces Generated in Real Time
https://arxiv.org/abs/2404.10667
>We introduce VASA, a framework for generating lifelike talking faces with appealing visual affective skills (VAS) given a single static image and a speech audio clip. Our premiere model, VASA-1, is capable of not only producing lip movements that are exquisitely synchronized with the audio, but also capturing a large spectrum of facial nuances and natural head motions that contribute to the perception of authenticity and liveliness. The core innovations include a holistic facial dynamics and head movement generation model that works in a face latent space, and the development of such an expressive and disentangled face latent space using videos. Through extensive experiments including evaluation on a set of new metrics, we show that our method significantly outperforms previous methods along various dimensions comprehensively. Our method not only delivers high video quality with realistic facial and head dynamics but also supports the online generation of 512x512 videos at up to 40 FPS with negligible starting latency. It paves the way for real-time engagements with lifelike avatars that emulate human conversational behaviors.
https://www.microsoft.com/en-us/research/project/vasa-1/
from microsoft, so no weights ever I'm sure. obviously human faces, but just posting for the potential of using such tech in the future for animated faces
had a thought yesterday and figured i'd ask here
most AI work seems to be done on either windows or linux, and seems to be 'shackled', for lack of a better word, to the limitations of those operating systems
what would have to be done to build a brand new OS from the ground up specifically for AI development/training/operation, and would a dedicated OS result in any noticeable or worthwhile improvement?
>>41012620
>making a custom OS for pony AI development
Technically doable, but I do not think even with all the Tulpafags on the board we have enough schizo power to make that a reality.
>>41011157
Changed "Flurry Heart" to just "Flurry" for s9e1. Left the s7e3 version as-is.
Fixed the "Applejack223" typo.
Fixed the "Mrs.Cake" typo.
Fixed the "Jet Set _" typo.
>>41011159
Changed all "Si"s to "Singing".
Fixed the "Petunia" typo.
Fixed the "Mr. Moody Root" typo.
That should cover everything.
Late night preservation post.
>>41012620
Compatibility with all other AI research is the highest priority for speeding up development, and creating a new OS would make that essentially impossible.
The OS is mostly a pass-through for all of the software that runs on it. It's not much of a bottleneck, especially on Linux, and especially with docker.
>>41012699
The FiM folder seems to have a copy of the Special Source EQG Short folder. Otherwise, that seems to be everything for now.
The updated Gdrive clone is here: https://drive.google.com/drive/folders/1ho2qhjUTfKtYUXwDPArTmHuTJCaODQyQ?usp=drive_link
I uploaded a copy of the dialogue dataset to HuggingFace as well: https://huggingface.co/datasets/synthbot/pony-speech
I updated horsefm-lib with code for pushing the data to HuggingFace: https://github.com/synthbot-anon/horsefm-lib/blob/main/notebooks/export_data.ipynb
I realized that my clone of Master File 2 is outdated. I'm fixing that now. I'll check that for errors too.
>>41012699
In Rainbow Roadtrip, the Master File has some singing lines from 00_21_05 to 00_22_54. Some of these seem to be missing from the Songs folder in Master File 2. For example, this is a singing line that only exists in Master File 1:
>00_21_05_Mayor Sunny Skies_Happy Singing_Very Noisy_they planned for weeks, cooked for days, celebrated fifty #ways#, so everypony would gather here, in our town at the end of the #rainbow#..flac
Typo (tag should be Singing rather than Singingnging):
>00_20_00_Mayor Sunny Skies_Happy Singingnging_Noisy_next-door neighbors chatting, over white wood fences, stopping on the street to say #hello#..flac
>00_20_00_Mayor Sunny Skies_Happy Singingnging_Noisy_next-door neighbors chatting, over white wood fences, stopping on the street to say #hello#.txt
>>41013398
The Master File 2 gdrive clone is here: https://drive.google.com/drive/folders/1ho2qhjUTfKtYUXwDPArTmHuTJCaODQyQ?usp=drive_link
The singing data is on HuggingFace here: https://huggingface.co/datasets/synthbot/pony-singing
>>41012943
>>41012699
Errors in the SFX & Music labels:
https://ponepaste.org/9979
I'm not done finding all of them yet. There's a large list of tags at the bottom of the paste that I still need to read through.
>>41013861
That's a weapons-grade cute robo Flutters.
>>40954981
>>40957582
https://www.youtube.com/watch?v=TWcajuXywHA
So from my experimentation, this is what I've learned. Trying to run the extracted vocals directly through rvc/sovits creates weird results, because the model applies random reverb, audio effects, and duo vocals layered on top of each other, and using talknet to create a somewhat uniform effect is difficult as the pitch levels jump all over the place.
I tried to fix the above by redoing the more derped and un-salvageable lines within rvc on my own, but the pitch difference between them was pretty noticeable, so as a final attempt at saving it I ran the end result through an autotune filter, and that seems to calm down the random pitch ups and downs.
The more frustrating part was trying to get Pinkie's voice to sing a tune that is slow, calm and a bit sad, all three things that are the very opposite of what would be expected from her type of voice.
The best solution here would be to have this kind of tech offline and just train the singing part of the model to understand what Pinkie sounds like, to get the exact result from the get-go, but that most likely won't happen until next year.
>>41011079
ayo, this shit slaps. do you got any more dash covers? :)
>>41013398
Removed duplicate folder.
>>41013574
>Missing singing lines
Copied over all lines from master file to master file 2 between 21:05 - 22:54, that should fix it.
>Singingnging
Corrected typo.
>>41013995
SFX and Music.
I know there will be a load of typos/inconsistencies in there; that was all done by hand and I pretty much just made up the tagging system as I went along. I might pass on fixing most of those if they're not too bad, going through all that is gonna be very tedious.
>>41011079
I know nothing about vocal effects, but this is already pretty nice for a sovits cover.
>>41015066
>singing lines
I'll update my gdrive later today. The HF repos should already be up-to-date since they treat the two Master Files as one pile of data.
>SFX and Music
>I might pass on fixing most of those if they're not too bad, going through all that's gonna be very tedious.
Agreed, and I've only checked a small fraction of the tags so far.
If any code anons want to try automating some of these fixes, see the paste linked in >>41013995. These are all either mismatches, typos, or tag consistency issues between the sfx.txt/music.txt label files in Master File 1 and the .flac files in Master File 2's Music and SFX folder. Fixing them means updating labels in .txt files and changing filenames for .flac files.
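As a starting point for said code anons, the relabel-and-rename half can be sketched like this. The FIXES table below is hypothetical example data rather than the real error list from the paste, and apply_fixes/fix_tree are made-up names for illustration:

```python
import os

# Hypothetical substitution table; the real entries would be derived
# from the error list in the ponepaste.
FIXES = {
    "Applejack223": "Applejack",
    "Mrs.Cake": "Mrs. Cake",
    "Singingnging": "Singing",
}

def apply_fixes(text):
    for bad, good in FIXES.items():
        text = text.replace(bad, good)
    return text

def fix_tree(root, dry_run=True):
    # Rewrite label .txt contents and rename .flac files in place.
    # With dry_run=True, only report what would change.
    for dirpath, _, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if name.endswith(".txt"):
                with open(path, encoding="utf-8") as f:
                    text = f.read()
                fixed = apply_fixes(text)
                if fixed != text:
                    print("rewrite:", path)
                    if not dry_run:
                        with open(path, "w", encoding="utf-8") as f:
                            f.write(fixed)
            elif name.endswith(".flac"):
                fixed_name = apply_fixes(name)
                if fixed_name != name:
                    print("rename:", name, "->", fixed_name)
                    if not dry_run:
                        os.rename(path, os.path.join(dirpath, fixed_name))
```

Running with dry_run=True first makes it easy to eyeball the planned changes before touching any of the actual master file data.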
>>41015066
I realized that the mismatches between label files and audio files would be the hardest to programmatically fix, and I think you can fix all of those by re-exporting the data. These are the episodes that have issues:
>FiM_s01e01
>FiM_s01e02
>FiM_s01e04
>FiM_s01e05
>FiM_s01e25
>FiM_s02e01
>FiM_s02e03
>FiM_s02e05
>FiM_s02e07
>FiM_s02e14
>FiM_s02e17
>FiM_s02e20
If you can re-export those, it would be much easier for code anons to fix the rest of the errors.
>>40925332
1 month late, but Fluttershy and Pinkie
>>41015066
The Master File 2 link in Master File 1 is broken, which might explain the mismatches. The link in the Master File mega points to:
>https://mega.nz/folder/0UhSmYAB#WBrB-qCprQTofkAhwMp5CQ
Which doesn't seem to be available. I'm using the link from the OP:
>https://mega.nz/folder/gVYUEZrI#6dQHH3P2cFYWm3UkQveHxQ
Which works.
>>41014860
This looks like it got a fair share of inspiration from ET.
>>40921076
My gdrive now has the updated Master File 2 plus the latest Master File updates. Same link:
https://drive.google.com/drive/u/2/folders/1ho2qhjUTfKtYUXwDPArTmHuTJCaODQyQ
And I updated my horsefm-lib to support Master File 2 for the singing lines:
https://github.com/synthbot-anon/horsefm-lib
The export_data.ipynb notebook shows how to separate the dialogue & singing lines:
https://github.com/synthbot-anon/horsefm-lib/blob/main/notebooks/export_data.ipynb
Does anyone know what format Stable Video Diffusion needs for fine-tuning? Is there a preferred frame rate, resolution, image format? Is transparency okay? Anything else to consider?
>>41020076
>>41021498
-1, but still bad.
>>41022450
Damn, that's a captivating eye she has here.
>>41020716
NTA but i would love to know more about this too
>>41022450
Another boop.
Learn2Talk: 3D Talking Face Learns from 2D Talking Face
https://arxiv.org/abs/2404.12888
>Speech-driven facial animation methods usually contain two main classes, 3D and 2D talking face, both of which attract considerable research attention in recent years. However, to the best of our knowledge, the research on 3D talking face does not go deeper as 2D talking face, in the aspect of lip-synchronization (lip-sync) and speech perception. To mind the gap between the two sub-fields, we propose a learning framework named Learn2Talk, which can construct a better 3D talking face network by exploiting two expertise points from the field of 2D talking face. Firstly, inspired by the audio-video sync network, a 3D sync-lip expert model is devised for the pursuit of lip-sync between audio and 3D facial motion. Secondly, a teacher model selected from 2D talking face methods is used to guide the training of the audio-to-3D motions regression network to yield more 3D vertex accuracy. Extensive experiments show the advantages of the proposed framework in terms of lip-sync, vertex accuracy and speech perception, compared with state-of-the-arts. Finally, we show two applications of the proposed framework: audio-visual speech recognition and speech-driven 3D Gaussian Splatting based avatar animation.
https://lkjkjoiuiu.github.io/Learn2Talk/
no weights, but pretty cool. relevant here I think, though obviously you'd need to do a big fine-tune or even train a new model for pony/animation to pony 3d models
Huh, wasn't expecting NAI to ever update their ancient furry model to V3, but it turns out they did. Some pretty decent mares can result, this time without as much of the washed-out coloring the old one had. It can also kind of do cutie marks, interestingly. I've still got a bunch of Anlas from months ago, so I'll do more testing. Not expecting this to be a competitor to Pony Diffusion, but it's a comparable alternative worth noting.
>>41023564
Indeed.
apt-get install -y espeak espeak-data libespeak1 libespeak-dev
apt-get install -y festival*
apt-get install -y build-essential
apt-get install -y flac libasound2-dev libsndfile1-dev vorbis-tools
apt-get install -y libxml2-dev libxslt-dev zlib1g-dev
pip install -r gradio_requirements.txt
Does anybody know how to install this linux bullshit on windows?
>>41024137
those are some nice looking mares, even the cutie marks are decently shaped and not a bunch of colorful nonsense
I'm getting this error when trying to download Talknet models with Haysay. Any advice?
>>41024413
Yes.
LVNS-RAVE: Diversified audio generation with RAVE and Latent Vector Novelty Search
https://arxiv.org/abs/2404.14063
>Evolutionary Algorithms and Generative Deep Learning have been two of the most powerful tools for sound generation tasks. However, they have limitations: Evolutionary Algorithms require complicated designs, posing challenges in control and achieving realistic sound generation. Generative Deep Learning models often copy from the dataset and lack creativity. In this paper, we propose LVNS-RAVE, a method to combine Evolutionary Algorithms and Generative Deep Learning to produce realistic and novel sounds. We use the RAVE model as the sound generator and the VGGish model as a novelty evaluator in the Latent Vector Novelty Search (LVNS) algorithm. The reported experiments show that the method can successfully generate diversified, novel audio samples under different mutation setups using different pre-trained RAVE models. The characteristics of the generation process can be easily controlled with the mutation parameters. The proposed algorithm can be a creative tool for sound artists and musicians.
https://github.com/fisheggg/LVNS-RAVE
https://huggingface.co/Intelligent-Instruments-Lab/rave-models/tree/main
audiogen stuff. examples on their github. short paper, but the models were trained 6 months ago? guess they really wanted their paper in some specific conference
Trying to generate audio with RVC using Starlight but I get this error...
>>41025282
Try again. It happened to me a few times too, but it worked on a later attempt.
https://voca.ro/1k7cZmIGmTU9
>>41025913
Typing "voca.ro" in desuarchive's search box was definitely the best decision I've made today.
AŁA KURWA GRYZIE
>>41025771
Still doesn't work... do file size and audio duration matter? The file I'm trying to convert is around 10 and a half minutes.
FlashSpeech: Efficient Zero-Shot Speech Synthesis
https://arxiv.org/abs/2404.14700
>Recent progress in large-scale zero-shot speech synthesis has been significantly advanced by language models and diffusion models. However, the generation process of both methods is slow and computationally intensive. Efficient speech synthesis using a lower computing budget to achieve quality on par with previous work remains a significant challenge. In this paper, we present FlashSpeech, a large-scale zero-shot speech synthesis system with approximately 5% of the inference time compared with previous work. FlashSpeech is built on the latent consistency model and applies a novel adversarial consistency training approach that can train from scratch without the need for a pre-trained diffusion model as the teacher. Furthermore, a new prosody generator module enhances the diversity of prosody, making the rhythm of the speech sound more natural. The generation processes of FlashSpeech can be achieved efficiently with one or two sampling steps while maintaining high audio quality and high similarity to the audio prompt for zero-shot speech generation. Our experimental results demonstrate the superior performance of FlashSpeech. Notably, FlashSpeech can be about 20 times faster than other zero-shot speech synthesis systems while maintaining comparable performance in terms of voice quality and similarity. Furthermore, FlashSpeech demonstrates its versatility by efficiently performing tasks like voice conversion, speech editing, and diverse speech sampling.
https://flashspeech.github.io/
new voice model. no weights or code, but they're a chinese team so maybe. they described their training method, so it might even be possible to recreate or make a custom one (like for ponies). hard to say how long training took, but they used 8x H800s, which iirc have half the memory bandwidth of H100s
>>40921076
I uploaded the fimfarchive data here: https://huggingface.co/datasets/synthbot/fimfarchive
After running pip install --upgrade datasets, you can load it with:
>from datasets import load_dataset
>dataset = load_dataset("synthbot/fimfarchive")
Here's the code for completing, converting, and pushing the dataset from the fimfarchive:
https://github.com/synthbot-anon/horsewords-lib
>>40971560
The PPP turned 5 years old on April 5. Happy belated birthday!
>>40921076
https://huggingface.co/Amo/RVC_v2_GA/tree/main/models/MLP_Sweetie_Belle_Squeeky
https://vocaroo.com/18BYdlV0bMSa
Sweetie Belle trained on her S1 and S2 lines, mostly the squeaky ones. The transpose will need some playing around, as with some input audio you will need to set it to 12 and with others to 24.
>>41028148
>>41028354
Kek.
>>40921071
1111 = 15
>>41028148
>5 years
it's been this long, huh...
>>41028148
We are gathered here today, 5 years since the inception of a dream, to celebrate the union of spirits in the desperate pursuit of waifus. We honor our horsefuckers, those that clip, that build, that produce, that preach, that bicker. We honor those that inspire us, that give direction to our lives, and that walk us always forward toward our beloved mares.
Happy birthday, PPP.
>>41028148
>>41029227
Jesus, is it that old already? How did time fly so fast?
queen piccalis
https://files.catbox.moe/0guftr.mp3
>>41029834
catbox borked the file
https://voca.ro/13fiEnurSQRQ
>>41029227
>it's been 5 years and I still haven't summoned the courage to be retarded trying to get a local tool running
Good evening.
I think it was about three years ago that I asked about the possibility of vocal capture, in which an AI-generated voice maintains the timbre of the character it trained on, but follows the pitch and inflections of another recording. Has that been explored in the last three years, or is it still too far off?
>>41030231
Was your post around Jan 2021? For example: https://desuarchive.org/mlp/thread/36432529/#36456897
That was just 5 months before TalkNet, which does exactly what you're talking about. Then around the start of last year there were so-vits-svc and RVC, which improved on TalkNet. So it's been explored quite a bit.
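For the curious: the common thread in TalkNet, so-vits-svc, and RVC is that they condition synthesis on a pitch (F0) track extracted from the reference audio, which is what lets the output follow the reference's inflections while keeping the target character's timbre. As a toy illustration of that pitch-extraction step (not what any of those tools actually use; they rely on more robust estimators), a bare-bones autocorrelation pitch tracker over one frame looks like:

```python
import numpy as np

def estimate_f0(frame, sr, fmin=50.0, fmax=600.0):
    # Estimate the fundamental frequency of one audio frame by
    # autocorrelation: find the lag (within the plausible pitch range)
    # where the signal is most similar to a shifted copy of itself.
    frame = frame - frame.mean()
    corr = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lo, hi = int(sr / fmax), int(sr / fmin)
    lag = lo + np.argmax(corr[lo:hi])
    return sr / lag
```

Real systems run an estimator like this (or a learned one) frame by frame over the whole clip and feed the resulting F0 contour to the synthesis model alongside the content features; on a pure 220 Hz sine the sketch above lands within a fraction of a semitone of the true pitch.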
>>41030133
To be fair, it's just regular retardation for me. Half the thread reads like a different language to my techlet pea of a brain.
>Page nein
I need you to generate this with Discord's voice.
>O-OH MY CELESTIA!! I-IS THAT PRINCESS TWILIGHT SPARKLE?
>THAT'S MY FAVORITE SNOWPITY, JAZZY, SLOW BURN, BONE CHILLING, ATMOSPHERE-OOZING, TROPE-SUBVERTING, GENRE-REDEFINING, GUT-WRENCHING, SPINE-TINGLING, EMOTIONALLY TAXING, PARANOIA-INDUCING, JAW-CLENCHING, NERVE-WRACKING, CHARACTER-DEVELOPMENT DRIVEN, SOUL-SHAKING, NAIL-BITING, ANXIETY-WRITTEN, KAFKAESQUE, POST-LYNCHIAN, QUESTION-ASKING, SOCIALLY-AWARE, ETHNICALLY-DIVERSE, POLITICALLY-COGNIZANT, CULTURALLY RELEVANT, SOCIALLY-PRESCIENT, THOUGHT-PROVOKING, ARTISANALLY-CRAFTED PONY IN THE SERIES!!!
>>41030864
+1
>>41031700
+0
>>41032035
Once more.
>>40924154
>>41031350
https://files.catbox.moe/ybp1gz.wav
>>41029834
>>41029876
She sounds kinda eastern here. Couldn't say why though.
https://github.com/jasonppy/VoiceCraft
new models: https://huggingface.co/pyp1/VoiceCraft/tree/main
Month old release, 330M weights: https://vocaroo.com/17v80p9NQi6A
Three weeks old, 330M weights: https://vocaroo.com/1aMwxaZb1jgp
Newest 330M weights: https://vocaroo.com/1h2sj2e9Zp8Z
Newest upsampled with audiosr: https://vocaroo.com/17Jx0xDoXz05
>>41032035
-1
>>41034179
>>40937305
I feel like if anyone can answer this it's Clipper. hope he responds to you, anon
>>40937342
>pointless minigames
i have a fucking BLAST every time we have minigames, they're so fun
the thing i look forward to most is actually the minigames
>>41034888
>>41034891
>looked at the dates of those posts
i realize im retarded, im sorry bros
>>40937342
I liked the minigames. I'd rather have that than an hour of autists doing a very technical, in-depth presentation that is really difficult to follow.
Hydrus, if haysay.ai has multispeaker StyleTTS2 models that were trained on more characters than the Mane 6, why do precomputed styles only allow me to select from the Mane 6?
Is there a torrent link for the fimfiction archive?
>>41035656
Pretty sure there's a link to that somewhere in the scripts of the fim tool in the main google doc.
Otherwise, there is this user-made torrent: https://www.fimfiction.net/user/116950/Fimfarchive
>>41035496
Partly because I never got around to creating more precomputed styles, and partly because I was struggling to get good results for the characters I did attempt. I tried to make precomputed styles for Starlight Glimmer and Gilda, but either the emotion/trait I was targeting wasn't coming across very strongly or the generated audio didn't sound much like the character, especially for Gilda. The output would oftentimes sound more like Twilight Sparkle. I did manage to get a few OK ones for Starlight Glimmer; I'll add those soon.
>>41035908thanks
>>41036171NTA. This may be a silly question but is there a nice and simple offline UI for StyleTTS2 ?
>>41036205
You can install and run Hay Say locally:
https://github.com/hydrusbeta/hay_say_ui?tab=readme-ov-file#installation-instructions
There's also an online colab here:
https://colab.research.google.com/drive/1ys8SkP-VW7CkhnwVveEGINaszG1kRaYl?usp=sharing#scrollTo=pGArrru8BpEe
It should be possible to download its python notebook file (.ipynb) and then run it on a local Jupyter environment.
>>41036322
Oh, that colab link in my previous post is for the one that comes with an epub downloader. Here's the link to another colab without epub:
https://colab.research.google.com/drive/1dDwKPYc2daS3MZxpinlfyIHd2jmGiHLh
>>41036205https://github.com/effusiveperiscope/StyleTTS2_GUI
>>41036322
>>41036331
So it generates the waveform directly? How much slower is it than Tacotron+vocoder?