Welcome to the Pony Voice Preservation Project!
youtu.be/730zGRwbQuE

The Pony Preservation Project is a collaborative effort by /mlp/ to build and curate pony datasets for as many applications in AI as possible.

Technology has progressed such that a trained neural network can generate convincing voice clips, drawings and text for any person or character using existing audio recordings, artwork and fanfics as a reference. As you can surely imagine, AI pony voices, drawings and text have endless applications for pony content creation.

AI is incredibly versatile; basically anything that can be boiled down to a simple dataset can be used for training to create more of it. AI-generated images, fanfics, wAIfu chatbots and even animation are possible, and are being worked on here.

Any anon is free to join, and there are many active tasks that would suit any level of technical expertise. If you're interested in helping out, take a look at the quick start guide linked below and ask in the thread for any further detail you need.

EQG and G5 are not welcome.

>Quick start guide:
docs.google.com/document/d/1PDkSrKKiHzzpUTKzBldZeKngvjeBUjyTtGCOv2GWwa0/edit
Introduction to the PPP, links to text-to-speech tools, and how (You) can help with active tasks.

>The main Doc:
docs.google.com/document/d/1y1pfS0LCrwbbvxdn3ZksH25BKaf0LaO13uYppxIQnac/edit
An in-depth repository of tutorials, resources and archives.

>Active tasks:
Research into animation AI
Research into pony image generation

>Latest developments:
GDrive clone of Master File now available >>37159549
SortAnon releases script to run TalkNet on Windows >>37299594
TalkNet training script >>37374942
GPT-J downloadable model >>37646318
FiMmicroSoL model >>38027533
Delta GPT-J notebook + tutorial >>38018428
New FiMfic GPT model >>38308297 >>38347556 >>38301248
FimFic dataset release >>38391839
Offline GPT-PNY >>38821349
FiMfic dataset >>38934474
SD weights >>38959367
SD low vram >>38959447
Huggingface SD: >>38979677
Colab SD >>38981735
NSFW Pony Model >>39114433
New DeltaVox >>39678806
so-vits-svc 4.0 >>39683876
so-vits-svc tutorial >>39692758
Hay Say >>39920556
Haysay on the web! >>40391443
SFX separator >>40786997 >>40790270
Clipper finishes re-reviewing audio >>40999872
Synthbot updates GDrive >>41019588
Private "MareLoid" project >>40925332 >>40928583 >>40932952
VoiceCraft >>40938470 >>40953388
Fimfarch dataset >>41027971
5 years of PPP >>41029227
Audio re-up >>41100938

>The PoneAI drive, an archive for AI pony voice content:
drive.google.com/drive/folders/1E21zJQWC5XVQWy2mt42bUiJ_XbqTJXCp

>Clipper's Master Files, the central location for MLP voice data:
mega.nz/folder/jkwimSTa#_xk0VnR30C8Ljsy4RCGSig
mega.nz/folder/gVYUEZrI#6dQHH3P2cFYWm3UkQveHxQ
drive.google.com/drive/folders/1MuM9Nb_LwnVxInIPFNvzD_hv3zOZhpwx

>Cool, where is the discord/forum/whatever unifying place for this project?
You're looking at it.

Last Thread:
>>41064811
FAQs:
If your question isn't listed here, take a look in the quick start guide and main doc to see if it's already answered there. Use the tabs on the left for easy navigation.
Quick: docs.google.com/document/d/1PDkSrKKiHzzpUTKzBldZeKngvjeBUjyTtGCOv2GWwa0/edit
Main: docs.google.com/document/d/1y1pfS0LCrwbbvxdn3ZksH25BKaf0LaO13uYppxIQnac/edit

>Where can I find the AI text-to-speech tools and how do I use them?
A list of TTS tools: docs.google.com/document/d/1y1pfS0LCrwbbvxdn3ZksH25BKaf0LaO13uYppxIQnac/edit#heading=h.yuhl8zjiwmwq
How to get the best out of them: docs.google.com/document/d/1y1pfS0LCrwbbvxdn3ZksH25BKaf0LaO13uYppxIQnac/edit#heading=h.mnnpknmj1hcy

>Where can I find content made with the voice AI?
In the PoneAI drive: drive.google.com/drive/folders/1E21zJQWC5XVQWy2mt42bUiJ_XbqTJXCp
And the PPP Mega Compilation: docs.google.com/spreadsheets/d/1T2TE3OBs681Vphfas7Jgi5rvugdH6wnXVtUVYiZyJF8/edit

>I want to know more about the PPP, but I can't be arsed to read the doc.
See the live PPP panel shows presented at /mlp/con for a more condensed overview.
2020 pony.tube/w/5fUkuT3245pL8ZoWXUnXJ4
2021 pony.tube/w/a5yfTV4Ynq7tRveZH7AA8f
2022 pony.tube/w/mV3xgbdtrXqjoPAwEXZCw5
2023 pony.tube/w/fVZShksjBbu6uT51DtvWWz

>How can I help with the PPP?
Build datasets, train AIs, and use the AI to make more pony content. Take a look at the quick start guide for current active tasks, or start your own in the thread if you have an idea. There's always more data to collect and more AIs to train.

>Did you know that such and such voiced this other thing that could be used for voice data?
It is best to keep to official audio only unless there is very little of it available. If you know of a good source of audio for characters with few (or just fewer) lines, please post it in the thread. 5.1 is generally required unless you have a source already clean of background noise. Preferably post a sample or link. The easier you make it, the more likely it will be done.

>What about fan-imitations of official voices?
No.

>Will you guys be doing a [insert language here] version of the AI?
Probably not, but you're welcome to. You can however get most of the way there by using phonetic transcriptions of other languages as input for the AI.

>What about [insert OC here]'s voice?
It is often quite difficult to find good quality audio data for OCs. If you happen to know any, post them in the thread and we'll take a look.

>I have an idea!
Great. Post it in the thread and we'll discuss it.

>Do you have a Code of Conduct?
Of course: 15.ai/code

>Is this project open source? Who is in charge of this?
pony.tube/w/mqJyvdgrpbWgZduz2cs1Cm

PPP Redubs:
pony.tube/w/p/aR2dpAFn5KhnqPYiRxFQ97

Stream Premieres:
pony.tube/w/6cKnjJEZSCi3gsvrbATXnC
pony.tube/w/oNeBFMPiQKh93ePqTz1ns8
>>41137243
Anchor.
>>41137243
Pre-director OP image is a bad omen.
I think mares are kinda cool
>>41138126
I would love to own a cute robot mare maid.
>>41138285
>>41138042
AI will help porny games.
Style Mixture of Experts for Expressive Text-To-Speech Synthesis
https://arxiv.org/abs/2406.03637
>Recent advances in style transfer text-to-speech (TTS) have improved the expressiveness of synthesized speech. Despite these advancements, encoding stylistic information from diverse and unseen reference speech remains challenging. This paper introduces StyleMoE, an approach that divides the embedding space, modeled by the style encoder, into tractable subsets handled by style experts. The proposed method replaces the style encoder in a TTS system with a Mixture of Experts (MoE) layer. By utilizing a gating network to route reference speeches to different style experts, each expert specializes in aspects of the style space during optimization. Our experiments objectively and subjectively demonstrate the effectiveness of our proposed method in increasing the coverage of the style space for diverse and unseen styles. This approach can enhance the performance of existing state-of-the-art style transfer TTS models, marking the first study of MoE in style transfer TTS to our knowledge.
>We trained the StyleMoE framework on the "100-clean" subset of the LibriTTS dataset [24], comprising 100 hours of multispeaker speech data. This data was downsampled to 16 kHz from its original 24 kHz.
https://stylemoe.github.io/styleMoE/
No model, but no one would want it anyway since it's a small academic proof of concept that is useless for anything neat. To the point where it's hard to tell if the method is even worth scaling, but an interesting idea nonetheless.
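The core trick is small enough to sketch in a few lines of PyTorch. This is just an illustration of the gating idea (soft routing of a reference embedding over a few expert projections), not the paper's actual code:

import torch
import torch.nn as nn

class StyleMoELayer(nn.Module):
    def __init__(self, dim, n_experts=4):
        super().__init__()
        # each expert is its own projection of the style embedding
        self.experts = nn.ModuleList(nn.Linear(dim, dim) for _ in range(n_experts))
        self.gate = nn.Linear(dim, n_experts)

    def forward(self, ref_emb):
        weights = torch.softmax(self.gate(ref_emb), dim=-1)              # routing weights per reference
        outs = torch.stack([e(ref_emb) for e in self.experts], dim=-2)   # (..., n_experts, dim)
        return (weights.unsqueeze(-1) * outs).sum(dim=-2)                # weighted mix of experts

In the paper this layer stands in for the TTS system's style encoder output, so each expert ends up specializing in a region of the style space.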
>>41135966
>Tears for Fears - Everybody Wants to Rule the World (Rarity AI cover)
>https://www.youtube.com/watch?v=w6817OqPfYU
Finally finished it. I'm learning as I go, but I'm pretty content with this one. My goal is to make an AI cover for each of the mane six characters, so AJ and Rarara down, four more to go!
>>41138517
Nice work! However, the AI outputs have randomly changing reverb (I'm not talking about the two voices overlaying each other) that comes in harder at some moments than at others.
>>41138967
The reverb settings are constant throughout the song. What you're hearing is two voice tracks overlapping: a main/lead voice and a falsetto voice with a slightly different melody and a bit lower volume. During some parts of the song, only the lead voice plays and the falsetto is silent. Most noticeable during the bridge.
>>41137243
>They changed the fucking OP pic
And the downfall of the PPP begins.
>>41139105
This was the original OP pic. It was originally changed to commemorate 15.ai going live, but since it's dead now, why not change it back?
>ywn sing your son and daughter a lullaby with your horsewife
why live?
>>41139847
Yeah, that sucks, but if you think about it, we're all kind of lucky that we were born at just the right time to experience Friendship is Magic when we could most appreciate it. Plus, we get to enjoy the rise of AI during its most fun times: the early years when it's still a free-for-all technology available to anyone, and it's not yet regulated to hell and back and monopolized by three or four big corporations who will sterilize the fun out of it for profitability.
Bump.
Would you all be okay if I posted a link to this thread in a Discord server I might make?
I am just trying to say something instead of bump.
>>41143349
>>41064967
>With a laughter's ring (aha!), Pinkie's in line
>Throws a party for two, with Anon, it's divine
Huh. https://www.youtube.com/watch?v=ux8XqsIA9s8
>>41144337
Other than Hay Say, there really isn't a good way to use it. Or train it.
https://youtu.be/ORiIsU2D0Sk?si=zNuCQ_SVYiRO5APZ
So... this is a thing. Before you rush to get it though, it's only available through a closed beta, and they have to check who you are before you get a copy. Unfortunately, I never got a response back, but eh, maybe someone here will have better luck?
>>41144723
This looks neat. If this ever becomes open to the public, I could easily see a great way to train with the emotion tags as their own vertex bits to better control the outputs, in a way that no other speech tools have ever worked before.
"Smile" song (a different one)
https://files.catbox.moe/sr2rr0.mp3
>>41144976
Based and World War I-pilled.
I can't believe I never thought to post about my stuff here.
I know /CHAG/ deals with text gen models and all that, but I'm doing more technical stuff with local text gen models.
See, I'm trying to get qlora.py to work, but all the tutorials I've watched go through it with ease while I'm getting an error thrown at me, or they just flat out don't even give a tutorial.
Currently I have all the dependencies needed to run it, but I'm stuck on pic related.
Although my instinct was to search for a solution online using the actual error code, I've come up with nothing to actually use in my situation.
I'm hoping that once I figure this shit out, I can do some training involving the chat logs, where I augment them in multiple ways to prevent overfitting. My goal is to attempt to make a permanent memory of sorts. An anon on /g/ already tried this, but with regular LoRAs and completely raw chat logs, and it overfit like crazy.
But since augmenting (meaning replacing words with synonyms, or rewording something while retaining the same exact meaning) prevents this, I'm wondering if I could pull off some sort of long-term memory with this thing.
Hey guys, does anyone know the author? Did VUL really do this? I couldn't find this track anywhere. Gimme a link, pls?
https://files.catbox.moe/0twb8w.flac
>>41145405
If I understand you correctly, you want to achieve a form of long-term memory by training a LoRA on a lower-precision model, which is the definition of QLoRA, and keep training that QLoRA as the conversation goes.
Long story short, this is unlikely to work because:
1. It's unlikely to bring up actual facts "from the past" unless the LoRA is very high rank, but even then it will indeed overfit and ruin the model's performance.
2. This is a very obvious approach and nobody has published research about it as of today, which means that no one has managed to get it to work.
3. LoRA training takes quite a lot of time; from personal experience, an 8-billion-parameter q5 model needed about 6 hours on a 3090 for my use case.
If you still want to train a LoRA but aren't experienced in tech, try the oobabooga webui; it has a LoRA training option.
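For anyone following along, the "rank" being argued about here is just a knob in the LoRA config. A minimal QLoRA-style sketch with HuggingFace transformers + peft (the model name is a placeholder, not a recommendation):

import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

# load the base model quantized to 4-bit (the "Q" in QLoRA)
bnb = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_quant_type="nf4",
                         bnb_4bit_compute_dtype=torch.bfloat16)
model = AutoModelForCausalLM.from_pretrained("your-base-model", quantization_config=bnb)

# r is the rank: higher recalls more from the data, but overfits harder
lora = LoraConfig(r=8, lora_alpha=16, lora_dropout=0.05,
                  target_modules=["q_proj", "v_proj"], task_type="CAUSAL_LM")
model = get_peft_model(model, lora)
model.print_trainable_parameters()

From there you train it like any causal LM (e.g. with the HF Trainer) on your augmented chat logs.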
>>41146004
P.S. /lmg/ and /chag/ are coomers; never take advice from them or assume they have any understanding beyond basedboy youtube tutorials.
>>41145405
It seems like your GPU is not detected.
Double-check that you have installed PyTorch with CUDA enabled and not the CPU version.
And I disagree with >>41146009, you can get basic answers like >>41146004 on /lmg/.
>>41146053
Ok, maybe. I haven't visited /lmg/ in a while.
>>41146053
>pytorch with cuda
I specifically remember downloading the non-CUDA version. Thanks anon, I'll do it when I get home.
https://vocaroo.com/1oyZgeUTaR6l
https://pomf2.lain.la/f/m4is43un.mp3
Udio
Journey to Dodge Junction
Driving through the desert, under twilight stars
On a quest for relief, it's takin' us so far
Through the tumbleweeds, Dodge Junction's close
No more rest stops, where'll this story go?
It's the only toilet, in Equestria's land
In Dodge Junction, where the cowboys stand
>>41145926
The song is "Chant of Selflessness" by 4everfreebrony. This is a remix of that song with a Rarity cover. I don't know who created the remix.
>>41136413
At this point I'd settle for just the TTS-to-RVC aspect of it. Freedom of mare speech without the restrictions of existing vocal bases as reference should always be sought, especially with RVC remaining as powerful as it is.
Speaking of, have there been any recent developments in technologies rivaling or surpassing RVC?
>>41146009
It's just a bunch of avatarfags and nothing more.
>>41146373
Cursed numget.
>>41146501
/lmg/ is for debating the existence of Hatsune Miku's penis, not developing and discussing local language models.
Other AI generals bad. This AI general good.
>>41146566
It's an actual cross-posting schizo.
You can see him right now trying this shit on /lmg/:
>>>/g/100898357
Here are all the generals he admitted to baiting in.
>>41146004
>3 Lora training takes quite a lot of time, from personal experience a 8 billion q5 model needed about 6 hours on a 3090 for my use case.
How many tokens was this? I'm getting about 3M tokens per hour for QLoRA training on a 4090, which is supposed to be 2x the flops of a 3090.
>>41146670
My dataset was ~20 MB. But it's not just the length that determines the time. The LoRA's affected layers, rank, sequence length and desired loss affect the convergence time much more. And the right stopping loss is hard to determine, because there is no criterion for a "trained" LoRA; you want to find the balance between knowledge and overfitting.
Prompt-guided Precise Audio Editing with Diffusion Models
https://arxiv.org/abs/2406.04350
>Aligning large language models (LLMs) with human preferences becomes a key component to obtaining state-of-the-art performance, but it yields a huge cost to construct a large human-annotated preference dataset. To tackle this problem, we propose a new framework that boosts the alignment of LLMs through Self-generated Preference data (Selfie) using only a very small amount of human-annotated preference data. Our key idea is leveraging the human prior knowledge within the small (seed) data and progressively improving the alignment of LLM, by iteratively generating the responses and learning from them with the self-annotated preference data. To be specific, we propose to derive the preference label from the logits of LLM to explicitly extract the model's inherent preference. Compared to the previous approaches using external reward models or implicit in-context learning, we observe that the proposed approach is significantly more effective. In addition, we introduce a noise-aware preference learning algorithm to mitigate the risk of low quality within generated preference data. Our experimental results demonstrate that the proposed framework significantly boosts the alignment of LLMs. For example, we achieve superior alignment performance on AlpacaEval 2.0 with only 3.3% of the ground-truth preference labels in the Ultrafeedback data compared to the cases using the entire data or state-of-the-art baselines.
No weights, but they explained how they built off of Tango in the appendix.
>>41146004
First off, I went to QLoRA because I had so much difficulty with the oobabooga webui LoRA training; at least this way I could see the code and use it to search for a solution.
>>41146053
ACTUALLY, the issue was not pytorch, but rather that bitsandbytes had the CPU version installed instead of the CUDA one.
This did fix another side error, but I still get the original n_gpus error from my first post here >>41145405
>>41146187
And when it comes to this post, I misremembered bitsandbytes as pytorch.
I'm wondering if there's a variable I have to change or whatever. Fucking wish there were coherent QLoRA guides out there; then maybe I would learn this and be able to apply that knowledge elsewhere once I do more stuff like this.
>>41146884
So you do have pytorch + CUDA?
>>41146921
I can tell I'm being retarded, so I'm gonna try to spell out exactly what is going on so there's no confusion.
>do you have pytorch + cuda
No, I only have REGULAR, non-CUDA torch (NOT pytorch).
I feel like the main reason I'm having difficulty with this is that I could see exactly which packages were imported in the code, so all I needed to do was import the packages explicitly mentioned in the code after the word 'import'. torch is mentioned, but not pytorch. After you mentioned trying a CUDA version of pytorch, I searched the PyCharm package search thing; I could not find any variation of pytorch+cuda. bitsandbytes had a CUDA version, but I could not find one for pytorch.
>pic related
The program requires torch, not pytorch, and yes, I tried finding a CUDA version of torch as well; didn't get anything.
Sorry if this is confusing. If you need me to answer anything else, I can. I was just attempting to get as much info out as I could so you have a better idea of what's going on.
Pissed and confused, because pytorch doesn't get imported in the code, but it's mentioned 9 times within the code. I'm not sure what to make of that.
Here's the actual code:
>https://github.com/artidoro/qlora
I'm looking at the literal individual qlora.py file from this github page.
>>41144830
I think this works more like a way to immediately get results without training. So for example, if one fed a 10-30 second clip of Twilight to Vocoflex, it would convert the voice to data and allow it to be used as a singing voice... similar to RVC. It seems like their website says that it's optimized for speaking voices, but we easily have singing voices of mares, amongst others. I also noticed an EULA thing on their site, which you can read here:
https://dreamtonics.com/vocoflex-eula/
Anything to note?
>>41147007
In the code:
>n_gpus = torch.cuda.device_count()
You need pytorch with cuda for that.
>>41147007
>No, I only have REGULAR, non-cuda, torch (NOT pytorch)
"PyTorch" and "torch" are the same thing; it's just that PyTorch is imported as torch.
>the program requires torch not pytorch, and yes, i tried finding a cuda version of torch as well, didn't get anything
Whenever you install torch, you should follow the instructions on the PyTorch website https://pytorch.org/get-started/locally/ because the CUDA builds of PyTorch come from a special package index.
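For reference, a minimal sketch of what that usually looks like; the cu121 tag here is just an example, pick whichever CUDA version the PyTorch site tells you for your setup:

pip install torch --index-url https://download.pytorch.org/whl/cu121

Then sanity-check it from Python:

python -c "import torch; print(torch.__version__, torch.cuda.is_available(), torch.cuda.device_count())"

If that prints a version ending in +cpu, or False/0, you've still got the CPU build installed.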
>>41147022
Well, since I can't get it directly through PyCharm, would...
>>41147046
>link
Well, I was typing this out right as the thread refreshed. Yeah, if that's the case, then all I need to do is somehow get those files and throw them into the right interpreter on PyCharm, and I'm golden, I hope.
>>41145926
Yes, I did. It's not "officially released" right now (aside from being posted on /create/).
>>41144337
Expensive to train.
>>41147019
Correction: optimized for SINGING voices, not speaking.
>>41147051
You may want to start using anaconda for envs.
>>41146884
Try FSDP QDoRA:
https://archive.is/IbOaf
>>41147327
>using anaconda for envs
How would I go about doing this? I feel like the interface of PyCharm actually allows me to click stuff, and it's especially helpful for dealing with the dependencies/'imports', since it allows me to search for them through its little search engine thing.
Also:
>anaconda
Had some good experiences with anaconda. Forgot what they were, but all I remember is that anaconda saved my ass many times a few years ago for project shit.
>>41147565
God, that is fucking cool. Once I figure out how to do shit on my own, I'll probably try using it.
>>41147614
>nvidia container
TF is that?
Why posts gone
>https://writingpeterkaptein.wordpress.com/2024/04/18/blender-stable-diffusion-workflow/
I see that there are people slowly making attempts at integrating SD with Blender for some nicer img2img tools. I don't have a powerful enough GPU, but I have a feeling that one could run both SD and Blender, render some basic low-poly models and convert those images into whatever style one would wish to create (FiM style, a specific pony artist, oil paints or whatever art style one fancies).
I'm guessing the consistency between the renders is something that will need to be fixed to prevent objects randomly morphing in and out of existence in the background but, fuck it, I feel like we are getting closer to a real "computer made" animation than in all those years since the project started.
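The per-frame img2img step on its own is already simple with the diffusers library. A minimal sketch, with the checkpoint name only as an example (swap in whatever pony model you actually use):

import torch
from PIL import Image
from diffusers import StableDiffusionImg2ImgPipeline

pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

init = Image.open("blender_render.png").convert("RGB")  # your low-poly Blender frame
out = pipe(prompt="mare, show style screencap", image=init, strength=0.55).images[0]
out.save("stylized_frame.png")

strength controls how much of the render gets repainted, and since each frame is repainted independently, that's exactly where the background flicker comes from.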
Anyone got a download for curated voice samples for the characters?
I'm training my own local voice model with Tortoise TTS and recalled that at some point there was some sort of effort to collect voice samples here.
>>41149908
>curated voice samples
Do you mean the mega links in the OP?
>>41149937
Oh shoot, don't know how I missed that. Thanks, anon.
>>41149993
Yeah, it happens. Btw, how's the training going?
>>41149886
You can just sketch some shapes in paint and get a similar result, unless you want to use stuff like depth map conditioning instead of image-to-image, for which you would need a 3D scene.
>>41149908
Post samples. I pay like $100 a month for ElevenLabs. I mean, it's great quality, and only a few minutes of audio needed for each voice, but I HATE being tied to the cloud. But Tortoise just wouldn't work for me.
>>41151231
Yes; however, instead of drawing everything by hand, the above way would allow one to just grab 24fps low-quality images and turn them into actual mare animation.
>>41151602
From what I've seen, the current checkpoints that can do temporally coherent img2img are only acceptable for real-life footage, and are sloppy. So you can't use ponydiffusion for it, for example.
>>41151869
>only acceptable for real life footage and are sloppy
It's a start. Even if it looks like shit, there is always the possibility that someone will come along and either fix it up or get annoyed enough to make new tech from scratch just to dab on the coders before him. Don't make me dig out the clips from way back when there was only Tacotron2 around.
bmp
>>41145405
Still experiencing the issue I posted in this post here.
Even though I installed pytorch with CUDA (pic related as proof), I'm still getting the same error as the picture in >>41145405
I can't imagine what I'm doing wrong, considering I now have all the dependencies I can think of. Although I will mention the instance of (+cpu) is concerning; I did triple-check to make sure that I installed the actual CUDA pytorch.
I'm losing my mind at this shit. IDK if my computer files are so messy that something along the line is getting fucked, but I have everything in there; I can't imagine what I'm doing wrong.
>>41137243
What are currently the best free options we have if I want an entire book read to me (in German)?
>>41152990
In terms of TTS? I would imagine local Hay Say.
Just take the entire book (assuming it's already in German), then drop it into the speech thing and wait a while for it to load (it's gonna be fackin huge since it's a full book), then boom, download, and you have der deutchen flutten shagen audio booken.
>>41152843
Just make a new environment and run pip install -r requirements.txt for whatever repo you want to use. You can ask ChatGPT about all this and it will get you through it faster than this thread.
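If you go the anaconda route mentioned earlier, the minimal version is something like this (the env name is arbitrary):

conda create -n qlora python=3.10
conda activate qlora
pip install -r requirements.txt

Then install the CUDA build of torch from the PyTorch index on top, since requirements files often pull in the CPU wheel by default.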
>>41152843
You probably still have the CPU version installed, then? You should uninstall it.
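Concretely, that would be something like:

pip uninstall -y torch torchvision torchaudio

followed by a reinstall using the CUDA index URL from the PyTorch site, as described earlier in the thread.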
>>41153301
+1, there is always some combo of pytorch, CUDA tools and other dependency hell you need to look out for.
>>41153034
Big thank, anon.
LibriTTS-P: A Corpus with Speaking Style and Speaker Identity Prompts for Text-to-Speech and Style Captioning
https://arxiv.org/abs/2406.07969
>We introduce LibriTTS-P, a new corpus based on LibriTTS-R that includes utterance-level descriptions (i.e., prompts) of speaking style and speaker-level prompts of speaker characteristics. We employ a hybrid approach to construct prompt annotations: (1) manual annotations that capture human perceptions of speaker characteristics and (2) synthetic annotations on speaking style. Compared to existing English prompt datasets, our corpus provides more diverse prompt annotations for all speakers of LibriTTS-R. Experimental results for prompt-based controllable TTS demonstrate that the TTS model trained with LibriTTS-P achieves higher naturalness than the model using the conventional dataset. Furthermore, the results for style captioning tasks show that the model utilizing LibriTTS-P generates 2.5 times more accurate words than the model using a conventional dataset.
https://github.com/line/LibriTTS-P
Might give some guidance if anyone here tries to caption the existing voiced data.
>>41153953
Could be useful. If there was a way to get this custom-trained, then even a bad emotional audio output could be smoothed out in RVC/so-vits at 10% likeness with a character model trained on the same clips.
>>41153740
Updates on results?
Up.
>>41155310
>need to learn how to build a python wheel to solve the dependency problem
fug
>>41157006
>Lewding the AI
Oh dear.
>>41157006
boop
>>41157006
Touching AI mares without permission.
>>41157006
Holy shit, Anon. Spoiler this image!
https://files.catbox.moe/utlyfw.wav
I miss the days when everyone was shitposting in these threads with whatever voice AIs were available, making full-on audio skits or getting the characters to say funny fucked-up shit.
>>41162218
I have started writing my ideas in a notebook, so one day, when I'm not feeling like having my brain liquified from waging all day long, I would love to sit down and return to the shitpost roots.
>>41162218
>>41162218
cure rara
>>41162218
There is so much cool AI shit popping out, but I still haven't got a chance to upgrade my old PC, so it's a bit of an abstract kind of pain to know what could be made but have no means to do it.
>>41163720
I feel ya.
https://vocaroo.com/1ewMMKjktre2
Udio
Celestia's Light
Never thought I'd see, oh, such a sight
Sunbutt shining in the day and night
Equestria, you make me want to cheer
Celestia's light, oh dear
Every pony, dance with glee tonight
That's bright, bright, bright, bright
>>41060498
>https://vocaroo.com/1ib4ysUKOcv8
Udio anon, please come the fuck back and finish this, I implore you.
>>41060678
>https://voca.ro/16cfcyvh3dzA
Also more of this.
>>41162218
I know that feel.
https://u.smutty.horse/lywizqhkwgv.wav
https://u.smutty.horse/lybxxbvphoc.wav
I would be making more, but it's not quite as easy these days as it once was with 15.ai. The models on HaySay are capable, but only really if you have a voice that works well as a reference, which I don't. Still possible though with perseverance; I've nearly got a new thing finished.
I tried to write a song for my green with Suno and haysay. I don't think I will end up using it, but it was fun.
https://files.catbox.moe/zsyiic.mp3
>A lifetime spent jacking it to ponies,
>But only loneliness truly knows me.
>Empty laps where a mare should be,
>Waifu pillow my sole company.
>Copium injected, straight to the vein -
>"Your pony waifu will come!" - Insane!
>Fate is a bitch with a donkey kick,
>She laughs as wizards die loveless pricks.
>But wait - what light through window breaks?
>KA-FUCKING-BOOM, the sky it quakes!
>A motherfucking SONIC RAINBOOM,
>Wiping its ass with my cynical gloom!
>A bridge of glory spans to the stars,
>Powered by autism from /mlp/ tards.
>Down they prance with heavenly glow,
>I'm tripping balls, this can't be so!
>Then I see her - oh fuck me blind!
>Perfection of plot and grace combined.
>Marshmallow fur and indigo locks,
>T H I C C thighs to crush my cock.
>"Well I say, this realm seems a tad mundane."
>Posh ASMR floods my brain.
>It's her - RARITY, waifu supreme!
>Element of Generosity and wet dreams!
>The prophecies true, the shitposts real,
>Best pony arrived, my heart to steal.
>"Yo Rares, check out that hooman dude!"
>Rainbow Dash keeping shit rude.
>"Hush, the poor dear's overwhelmed, I fear."
>Rarity drifts over, poise crystal clear.
>Sweet Celestia, dat swaying hip,
>Class, sass and ass make the perfect ship.
>"Are you quite alright, darling? So sorry for the fright."
>Spaghetti spills forth, a pathetic sight.
>"i want snugle ur fluf" True autism prevails.
>But she smiles! "But of course, darling! Tis destiny's tale!"
Up from 10.
MARS5: A novel speech model for insane prosody
https://github.com/Camb-ai/MARS5-TTS
>This is the repo for the MARS5 English speech model (TTS) from CAMB.AI. The model follows a two-stage AR-NAR pipeline with a distinctively novel NAR component (see more info in the Architecture). With just 5 seconds of audio and a snippet of text, MARS5 can generate speech even for prosodically hard and diverse scenarios like sports commentary, anime and more.
DiTTo-TTS: Efficient and Scalable Zero-Shot Text-to-Speech with Diffusion Transformer
https://arxiv.org/abs/2406.11427
>Large-scale diffusion models have shown outstanding generative abilities across multiple modalities including images, videos, and audio. However, text-to-speech (TTS) systems typically involve domain-specific modeling factors (e.g., phonemes and phoneme-level durations) to ensure precise temporal alignments between text and speech, which hinders the efficiency and scalability of diffusion models for TTS. In this work, we present an efficient and scalable Diffusion Transformer (DiT) that utilizes off-the-shelf pre-trained text and speech encoders. Our approach addresses the challenge of text-speech alignment via cross-attention mechanisms with the prediction of the total length of speech representations. To achieve this, we enhance the DiT architecture to suit TTS and improve the alignment by incorporating semantic guidance into the latent space of speech. We scale the training dataset and the model size to 82K hours and 790M parameters, respectively. Our extensive experiments demonstrate that the large-scale diffusion model for TTS without domain-specific modeling not only simplifies the training pipeline but also yields superior or comparable zero-shot performance to state-of-the-art TTS models in terms of naturalness, intelligibility, and speaker similarity.
https://ditto-tts.github.io/
They have celeb clone examples on their site that sound pretty good. No weights, but the paper has some good info on how they trained it. By KRAFTON, which turns out to be the PUBG devs, so that probably explains why.
Articulatory Phonetics Informed Controllable Expressive Speech Synthesis
https://arxiv.org/abs/2406.10514
>Expressive speech synthesis aims to generate speech that captures a wide range of para-linguistic features, including emotion and articulation, though current research primarily emphasizes emotional aspects over the nuanced articulatory features mastered by professional voice actors. Inspired by this, we explore expressive speech synthesis through the lens of articulatory phonetics. Specifically, we define a framework with three dimensions: Glottalization, Tenseness, and Resonance (GTR), to guide the synthesis at the voice production level. With this framework, we record a high-quality speech dataset named GTR-Voice, featuring 20 Chinese sentences articulated by a professional voice actor across 125 distinct GTR combinations. We verify the framework and GTR annotations through automatic classification and listening tests, and demonstrate precise controllability along the GTR dimensions on two fine-tuned expressive TTS models. We open-source the dataset and TTS models.
https://demo.gtr-voice.com/
The website (and thus the links to the code/models) doesn't work yet. Chinese-language model, but the idea was interesting.
>>41165374
Minus one.
Hey guys, I have this clip for the Antithology that will serve as the intro. It's a parody of Top Gear, but I need it to be narrated by Jeremy Clarkson instead. I have no idea how to train voice data. Is there anyone willing to lend a helping hand to get this done?
https://files.catbox.moe/tlm61a.mp4
>>41165965
>https://applio.org/models/1165806674146230313
>>41165995
That's great! Thank you! And so I just extract the archive into the model folder for so-vits?
>>41164790
I love it
>>41166007
NTA, but it looks like it's an RVC v2 model.
https://vocaroo.com/1Mccz5BcQpeU
From the sound of it, it was trained on some YT clips from Top Gear, judging by all that artificial engine noise going on in the background.
If you want/need, I could retrain a new model for you.
>>41166074
I would very much appreciate that, Anon.
>Tonoight on Bottom Gear
>https://vocaroo.com/11P8h43QdEpY
>>41166074
Is this the same model?
https://voice-models.com/model/1n44tDNY1Ve
>>41166077
Alright, training started. It should be ready in 6 hours (or 12, depending on whether I fall asleep in front of my PC).
>>41166216
>BelAir1603
Yep, it's directing to the exact same link to the same model.
https://vocaroo.com/1lvLqvEHxbMr
https://pomf2.lain.la/f/7d0mt6y.mp3
https://www.udio.com/songs/tgF73cds6Tx2i2DGwehvdz
Udio
Taste of Friendship
[Verse 1]
Do you like bananas? (Do you like bananas?)
Tell me, tell me true, don't be shy, do you?
Magic all around me, ponies let's be clear
Celestia's here to hear what's dear
[Bridge]
Friendship, friendship, oh my dear subjects (Friendship, friendship)
But first things first, do you like the taste? (Friendship, friendship)
>>41165965 >>41166007
>>41166077 >>41166216
>https://huggingface.co/Amo/RVC_v2_GA/tree/main/models/VA_JeremyClarkson_C02
>https://vocaroo.com/1j4EOZla9qD8
The quality of the model is a bit lower than what I would normally be happy with (as it tries to replicate the thin studio echo that is heard in the OG clips), but this is the best I could do within a one-day limit.
It's still 110% better than the other one. I'm really glad to put in some support for the Anti this year; put it to good use, Anon.
>https://files.catbox.moe/oo2f1c.mp3
For the Anon that wished for some AI voice shitposting, I hope this is shitposting enough for you. Thanks, Hay Say Anon, for adding StyleTTS2 to it.
Also, I find it funny that SD decided that Jeremy's waifu is Apple Fritter when I was trying to get something that resembles Rainbow Dash.
>>41166778
You're a lifesaver, thank you!
backgrounds safetensor mirror
https://drive.filen.io/d/56e3cd3c-51ce-4ec2-a059-f40a384bd0c3#laGTVsncEs1g5dtWV3TiHyDsanSDiClL
https://sonycslparis.github.io/diffariff-companion/
This seems nice, though for now they're not releasing a model due to a few circumstances...
>>41167472
Man, it's so frustrating seeing all the new cool tech and being told that nobody can play with it.
>>41169555
Tell me your secret, purple horse.
>>41169555
boop
>>41169555
>555
Is she oscillating?
Damn, no one will get the reference.
https://vocaroo.com/1b0ZgDIGOQXJ
Running voice conversion on instrumental clips makes the AI generate weird pony beatboxing.
>>41164348
I love these! Especially that second one. I can just see Twilight getting paranoid about the thread reaching page 10. Must have taken a lot of effort to generate all of that and include sound effects. Looking forward to the new thing you are finishing.
>>41171220
She's on an astable multivibrator.
https://files.catbox.moe/reqmb9.mp3
Here, some musical AI slop made with Udio and Hay Say. Zecora - Yellow Pegasus (reggae)
>>41173400
>https://files.catbox.moe/reqmb9.mp3
Sounds good, actually.
Not as good as >>41164251, though.
Derpy Whooves sings When I'm Up (I Can't Get Down) by Great Big Sea.
https://files.catbox.moe/d7jfyq.mp3
>>41174009
I don't think it's the proper Derpy voice model you used; you would also need to go 12 (or even 23) semitones down to get the pitch right. But hey, it's an attempt.
>>41174019
>I don't think it's the proper Derpy voice model you used, you would also need to go 12 (or even 23) semitones down to get the pitch right . But hey, it's an attempt.
>
>>41172819
You too are a man of culture, I see.
>>41174019
https://files.catbox.moe/mpukv9.mp3
Fixed, though!
Thanks fer the feedback.
(Unfortunately, when I decided to use the -12 transposition feature, the audio became grainy, much like the way it sounds at the end of this track. It's like a horse whisper. A "hoarseness" that obscures the voice of the pone that we are trying to captivate. I didn't even DARE to try -24 (or 23 or whatever!)
Necessarily, I used the so-vits-svc 4.0 model for this one, as haysay.ai does not have a 5.0 so-vits-svc Derpy Whooves model available (that is, assuming that we even know of one or even have one). Hopefully you don't mind the lighter or higher pitch tone of Derpy's voice. I still think it sounds quite a bit like her, especially since she only got like one singing role in the entire show (if even that).)
I was almost about to use Rainbow Dash's voice when singing this song, but ahhh... it just didn't sound quite... "Bubbly" enough.
Just out of curiosity, when you referred to the "proper" Derpy Whooves model, were you referring to one hosted on haysay.ai (the website), or was it something someone uploaded here, a model that still has to be downloaded or linked through a Custom Model URL through Controllable TalkNet?
Besides that, I do have this: the attempt that was made at covering the lower octave shift of Derpy's voice. I don't think these files will be of much help, but here are the "scratchy" voice files that were NOT used, because the cause of the "hoarseness" turned out to be the 'transpose' option slider being utilized in Hay Say (I know, really, it crushes me a little though, since I thought Derpy's voice was lower, too). The second option is to just transpose it down afterwards, and it works, but it sounds a bit like... Derpy is using a voice changer to sound more like a man. So I guess that's it for that, then.
>Then... But it seems the only alternative would be... that would mean we would have to... sing with a lower voice.
And that obviously only works to the extent that the character was using their 'deep' voice in the first place, as it pertains to the information stored in the dataset in general.
But yes, in case you were wondering, I /did/ in fact use my "deep voice" on this one.
>-12 semitones really caused that bad of an effect?
Yes. Maybe 'tis a glitch with the Derpy model.
But it's not of concern to moi.
>How do you know that
Because I tried at '0' transposed again, and the voice was once again discernible as Derpy Whooves, but without the whispering.
>Proof???
Delivery in the next post.
>>41175354
>"I still think it quite sounds a bit like her, especially since she only got like one singing role in the entire show (if even that.))"
I guess she got none; I was wrong. So sorry, everybody. I thought this counted: https://www.youtube.com/watch?v=L6zodtgljFE
...But I guess not... eh.
>You are STALLING!!!
Oh, right! Right. The files: sorry, I tried. If anyone knows anything about this (or how to go about fixing this issue, whether within Hay Say or otherwise), please let me know!!!
HERE THEY ARE:
Nope, never mind. Catbox responded with a zero code; pic related. You'll just have to take my word for it THAT THEY DIDN'T WORK.
Maybe someday soon smutty.horse will make a comeback!!!
>>41175384
>pic related
>forgothtepicture.jpg
HAHAaha-hue hue
>>41145926
>https://files.catbox.moe/0twb8w.flac
That's fantastic!
>>41175354
Some songs just aren't really fit for making pony AI covers, because the singer's voice is too high or too low pitched to fit any of the poners. Heavy chorus effects or background singers who sound too much like the main singer can also make it pretty much impossible to get a clear enough vocal extract to work with.
I've been dabbling in AI covers for a bit lately. Using Hay Say, I generally turn the character likeness all the way down for AI covers. Character likeness comes from the singer/vocals you're using.
Overall, making AI covers is pretty easy once you get used to the tools you're working with. The real difficulty is finding the right song to cover.
Actually, I'm wrong about that. The real work (besides the actual original singer's) is put in by the people who trained the character models in the first place. We wouldn't be anywhere without those toppest of lads. Unfortunately, I have neither the know-how nor the resources to get into that, so I have to satisfy myself by picking the fruits of their labors.
This is a shoutout to all of you based horsebros that make this possible in the first place.
I'm still learning, but I think the Rarity Tears for Fears one I made a few weeks ago came out well:
>https://files.catbox.moe/p6tfhy.mp3
This thread is full of school shooters anonbros...
>>41175758
Fuck, how was I found out so easily?
>>41175758
Are you saying this is a banger of a thread?
>>41175758
What makes you say that?
>>41145926
Nice
>>41175758
https://files.catbox.moe/kgkhxr.mp3
By the way, haysay.ai seems to be down. I did this with voice-models and easyaivoice.
>>41177948
>By the way, haysay.ai seems to be down.
Yep, noticed that too. The site is still not loading.
>>41177948 >>41178396
haysay.ai is back up now, and I renewed the TLS certificate while I was at it. Thanks for bringing it to my attention. The server somehow got into a bad state again and was completely unreachable, so I had to reboot it.
>>41178595
Thanks! Glad to see it's back up!
down ^>:(
>>41180437
Konami.
Huh. Have any of the VAs done audiobooks?
>>41181350
>>41180876
Even if they did, the voices they'd use for an audiobook aren't in character for pony, and we don't need audiobook recordings to train pony models. Emily Blunt recorded one chapter of an audiobook, but it was in her normal British accent instead of her Tempest voice. Also, Nicole Oliver narrated that documentary about fungi, but it's unlikely that adding that recording would improve the quality of Celestia models.
>https://files.catbox.moe/5wk6ba.mp3
>>41181670
I feel like I should know what this is referencing, but for some reason my brain can't remember what it is.
>>41181670
Holy fuck, that got intense. Trixie's voice only made it more attention-grabbing.
pwmn
>>41181670
>https://files.catbox.moe/5wk6ba.mp3
Why so wide
Improving Text-To-Audio Models with Synthetic Captions
https://arxiv.org/abs/2406.15487
>It is an open challenge to obtain high quality training data, especially captions, for text-to-audio models. Although prior methods have leveraged text-only language models to augment and improve captions, such methods have limitations related to scale and coherence between audio and captions. In this work, we propose an audio captioning pipeline that uses an audio language model to synthesize accurate and diverse captions for audio at scale. We leverage this pipeline to produce a dataset of synthetic captions for AudioSet, named AF-AudioSet, and then evaluate the benefit of pre-training text-to-audio models on these synthetic captions. Through systematic evaluations on AudioCaps and MusicCaps, we find leveraging our pipeline and synthetic captions leads to significant improvements on audio generation quality, achieving a new state-of-the-art.
NVIDIA paper, so no weights, but maybe this will give some insights.
>>41181417
Ah, I was thinking you could take the VA's voice, then voice-clone it to whoever they voiced, to build a larger dataset.
>>41181670
This is so fucking funny.
>Don't have a functional computer
>Found only one online tool for AI voice
>Wanted to be part of PPP so bad by creating infinite pony content
Damn. Maybe next year.
>>41183555
Hay Say is all you really need? https://haysay.ai
>>41183557
>Found only one online tool for AI voice
Actually, I was talking about Hay Say, which is great for the character I'm aiming for, but the voice-input-only part (RVC) is a downgrade for me, since I can't speak English properly.
>https://arxiv.org/abs/2406.07803
>Despite rapid advances in the field of emotional text-to-speech (TTS), recent studies primarily focus on mimicking the average style of a particular emotion. As a result, the ability to manipulate speech emotion remains constrained to several predefined labels, compromising the ability to reflect the nuanced variations of emotion. In this paper, we propose EmoSphere-TTS, which synthesizes expressive emotional speech by using a spherical emotion vector to control the emotional style and intensity of the synthetic speech. Without any human annotation, we use the arousal, valence, and dominance pseudo-labels to model the complex nature of emotion via a Cartesian-spherical transformation. Furthermore, we propose a dual conditional adversarial network to improve the quality of generated speech by reflecting the multi-aspect characteristics. The experimental results demonstrate the model's ability to control emotional style and intensity with high-quality expressive speech.
>https://EmoSphere-TTS.github.io/
No code for this one, but they describe their process well enough that I feel like someone advanced, with knowledge in Python and training, could possibly recreate it. I like the idea they propose: controlling the emotion not just through text, but by dragging the values of "arousal, valence, and dominance" extracted with a modified wav2vec 2.0.
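For reference, the Cartesian-spherical transformation they mention is basically just mapping an (arousal, valence, dominance) point to an intensity radius plus two style angles. A minimal sketch; the neutral-center value here is an assumption of mine, not from the paper:

import numpy as np

def avd_to_spherical(arousal, valence, dominance, center=0.5):
    # shift so the assumed neutral point sits at the origin
    x, y, z = arousal - center, valence - center, dominance - center
    r = np.sqrt(x * x + y * y + z * z)               # emotion intensity
    theta = np.arccos(z / r) if r > 0 else 0.0       # inclination angle (style)
    phi = np.arctan2(y, x)                           # azimuth angle (style)
    return r, theta, phi

The appeal is that r becomes a single "how intense" slider, while the two angles pick which emotion you're steering toward.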
General question to the Anons in the thread: how many of you keep backups of the training datasets (be it voice, audio clips, images or other)? Like, let's say there is a scenario where the Google servers and Mega totally fuck themselves sideways, following up with accidental deletion of all online AI mare content. Would there be enough Anons with backups and know-how to get back to the current state of the tools we are using right now?
>>41184281
I have a copy of the master files
>>41184281
I have backups, and the label files can be used to re-generate the dataset if everything else somehow gets lost.
https://github.com/DigitalPhonetics/IMS-Toucan
Has instructions on how to train with your own dataset.
Bumpen from page 10
>>41182845
What do you mean?
>Been a while since I used Hay Say
>Load the local version to make a pony cover
>Error
Temporary failure in name resolution?
>>41186129
Is the error sticking around, or did it go away after retrying? If it is persistent, try stopping and then restarting Hay Say:
sudo docker compose stop
sudo docker compose up
I'm not sure why exactly you got this error, but to understand what's going on, refer to the diagram in the Readme under "The Technical Design of Hay Say":
https://github.com/hydrusbeta/hay_say_ui?tab=readme-ov-file#the-technical-design-of-hay-say
Hay Say is comprised of several "containers", which are like little servers running locally on your machine. When you click the Generate button, the container running the UI makes a webservice call to another container. Specifically, it sends a POST request to the "/generate" method on the container responsible for generating audio for the architecture you had selected (each architecture has its own container). For some reason, the Docker instance on your machine lost its ability to map container names to their local IP addresses (a process called "name resolution"), which is a required step before one container can make a webservice call to another. This webservice call happens locally, by the way, and does not reach out across the internet.
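If it keeps happening, a couple of generic Docker checks can help narrow it down (standard Docker commands, nothing Hay Say specific; the network name placeholder is whatever docker network ls shows for the compose project):

sudo docker compose ps                      # are all the containers actually up?
sudo docker network ls                      # does the compose network exist?
sudo docker network inspect <network_name>  # are the containers attached to it?

Name resolution between containers is handled by Docker's embedded DNS on that shared network, so a container missing from the network is the usual culprit.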
Bumo.
>>41186871
Excellente.
>>41186453
I've tried restarting both Hay Say in the way you mentioned and my entire PC; unfortunately, the error persists. I have a feeling I somehow broke something over time and should maybe do a fresh reinstall.
>>41185830
>>41187003
NTA, but recently I was updating Hay Say and it also stopped working, so I had to clean out the old containers and old models. After that, almost everything worked except CUDA (which had worked previously). But I blame the beta driver update and am using the CPU for now.
Could someone tell me what TTS was used for the Aryanne voice in the last redub?
>>41188202
Wait, there's an Aryanne redub?
>>41188202
I can't recall exactly, but I either used Google TTS (German) or I just typed up German TTS.
>>41188274
I meant Google Translate (German) *
Not sure if anybody needs this, but after a few days of struggling to get torch working on an old system, the following lines managed to get it rolling on my old GPU (in a conda environment with python=3.10.3):
pip3 install --upgrade setuptools==70.1.1 wheel==0.43.0
pip install torch==1.13.1+cu116 torchvision==0.14.1+cu116 torchaudio==0.13.1 'tensorflow[and-cuda]' --extra-index-url https://download.pytorch.org/whl/ --no-cache-dir --force-reinstall
>>41164790
Someone should make a metal song using these lyrics.
>>41188202 >>41188274
Anyone want to suggest a voice source for Aryanne? I could easily whip up a dataset to give her a proper voice; ~5-10 mins of clean audio should be plenty.
>>41189164
>https://files.catbox.moe/3gcygh.mp4
There was an unspoken agreement among mlpol Anons that the voice from this video would be the best fit for her.
>>41181670
I am called Guillaume, and that was fucking uncanny.
But I laughed my ass off!
>>41189186
>Ich liebe Panzer
Damn, that's cute.
Unfortunately, I think the quality is too bad to be useful...
>>41189186
Don't think I can do anything with that; there's not much of the actual voice, and what is there has a lot of noise. Surely there must be some similar voices out there, though?
>>41189186
That sounds like shit, though.
>>41189435
Since she is an OC without any media that would dictate the proper direction of her voice (as opposed to the popular ones like Rainbow Dash Presents or SweetieBot), it's a free-for-all choice.
But if I were going to make something, I would probably go with Neco Arc, as a mixture of a high-pitched Japanese cartoony voice with strong German would be the peak of shitposting power level.
>>41189677
>That sounds like shit though.
It does, but you are forgetting it's all about sovl.
Theoretically, and I mean that, would there be a way to make some sort of model that would accept as little as 10 seconds of voice data in order to make a coherent voice? I know there are some proprietary things out there that do that, but I mean open source?
>>41189795
As you noted in your post, they already exist. So theoretically, sure.
>>41189795
If you look in the archives for the Blueblood RVC model, that is possible with some voice cloning tech, but even if you give it a 100% clean 10s reference line, the result it gives you would be 1 good clip for every 30 terrible ones.
E2 TTS: Embarrassingly Easy Fully Non-Autoregressive Zero-Shot TTS
https://arxiv.org/abs/2406.18009
>This paper introduces Embarrassingly Easy Text-to-Speech (E2 TTS), a fully non-autoregressive zero-shot text-to-speech system that offers human-level naturalness and state-of-the-art speaker similarity and intelligibility. In the E2 TTS framework, the text input is converted into a character sequence with filler tokens. The flow-matching-based mel spectrogram generator is then trained based on the audio infilling task. Unlike many previous works, it does not require additional components (e.g., duration model, grapheme-to-phoneme) or complex techniques (e.g., monotonic alignment search). Despite its simplicity, E2 TTS achieves state-of-the-art zero-shot TTS capabilities that are comparable to or surpass previous works, including Voicebox and NaturalSpeech 3. The simplicity of E2 TTS also allows for flexibility in the input representation. We propose several variants of E2 TTS to improve usability during inference.
https://www.microsoft.com/en-us/research/project/e2-tts/
From Microsoft, so no weights, obviously. Very cool: emotion control, speed control, and, more interestingly, explicit phoneme pronunciation. See the examples, since it's pretty impressive. Some related stuff I found on that:
https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/tts/g2p.html
https://huggingface.co/docs/transformers/en/model_doc/wav2vec2_phoneme
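For anyone wanting to play with the phoneme angle, that wav2vec2 phoneme model on HF can already transcribe audio to phonemes out of the box. A minimal sketch, assuming you've loaded 16 kHz mono audio into a float numpy array:

import torch
from transformers import Wav2Vec2Processor, Wav2Vec2ForCTC

processor = Wav2Vec2Processor.from_pretrained("facebook/wav2vec2-lv-60-espeak-cv-ft")
model = Wav2Vec2ForCTC.from_pretrained("facebook/wav2vec2-lv-60-espeak-cv-ft")

inputs = processor(audio, sampling_rate=16000, return_tensors="pt")  # audio: float32 numpy array
with torch.no_grad():
    logits = model(**inputs).logits
ids = torch.argmax(logits, dim=-1)
print(processor.batch_decode(ids))  # espeak-style phoneme string

Something like this could be one way to build phoneme-level labels for the pony dataset without hand transcription.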
>>41137243
Stuff I should have posted here:
>>41190109
>>41190310
>>41190431
>>41190161
I'm curious, where do you find your papers? Do you just scroll through the newest arxiv submissions every day?
>>41190446
Does Suno give you the option to separate the vocals from the instrumentals, or do you have another way to do it?
>>41190539
I had to use vocalremover.org.
>>41190522
For arxiv cs.LG, yeah. I had to set up a Twitter account focused on ML/AI stuff to catch papers/gits I miss. I also have a few other places I check that sometimes have relevant stuff pop up.
>>41190801
Personally, I prefer using Ultimate Vocal Remover with the Kim_Vocal_2 model and then following up with the reverb removal model (I find the end result sounds much better when adding my own reverb/echo than trying to work with whatever the original source gives you).
>9
>>41166778
>Special thanks in post-anti panel
Very nice; it was a pleasure to help fellow horse fuckers.
>>41190954
I updated the songs; I used UVR MDX-Net Kim_Vocal_2 and Reverb_HQ_by_Foxjoy.
Pinkie Pies Dance Club
https://files.catbox.moe/eckq4h.mp3
RAINBUTT V2
https://files.catbox.moe/1bti1o.mp3
Applejack burnt her pie V2
https://files.catbox.moe/ju4ig8.mp3
>>41189077
>posts a solution to his problem
>gets no thank yous
I'm here to tell you that I, specifically, am writing this out of gratitude, anon. You're an unsung hero; hope you know that.
>>41191684
Quite so.
>>41193237
Nice songs!
So, you made the music with lyrics using Suno, split the vocals from the instrumentals using Ultimate Vocal Remover, fed the vocals to an AI pony voice, and mixed them back with the music and some echo to hide the AI a bit?
>>41195438
Yep. The workflow was Suno, Ultimate Vocal Remover, Ultimate Vocal Remover again (for the reverb), so-vits for the vocals, then combining the instrumentals and so-vits vocals in Audacity.
>>41193237
Dude. Keep doing this. The AIs sound so clean here, plus the songs ain't half bad either.
>>41193237
Applejack starts to sound pretty solid at this point.
Reminder that StyleTTS is pretty neat.
https://files.catbox.moe/h95xtr.mp3
>>41197696
This is a line from a fic I'm writing, to clarify.
>>41196733
>>41188060
I see. Well, I'm a complete Docker novice. How would I go about cleaning the containers? Also, I attempted to pull the latest versions to fix the issue, and now it errors without even running.
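In case it helps, the generic Docker way to clean things out is something like the following (standard Docker CLI, nothing Hay Say specific; prune is destructive, so read what it lists before confirming):

sudo docker compose down   # stop and remove the compose containers
sudo docker system prune   # remove stopped containers and dangling images
sudo docker image prune -a # also remove unused images to reclaim disk space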
Is it worth making a new SD safetensor trained only on every canon screenshot?
Luma seems to do much better when given an image as guidance in addition to the prompt, even if the image in question is AI generated.
>Derpy Hooves (MLP:FiM) looks towards the camera and tilts head in a smile with closed eyes
https://files.catbox.moe/l6ybfj.webm
https://files.catbox.moe/j7cbi6.webm
>>41199744
Both of those were ass, anon.
Not saying you should stop posting stuff like this, because it's still informative, especially considering how most anons got very different Luma videos compared to what their prompt implied, but this needs more work; it just ain't there yet.
Again, A for effort though; cool concept, figuring out better Luma prompting.
>>41199732
Why? If by "only" you mean from scratch, then that's the most retarded idea ever; it won't generalize at all.
You can get show-style output in current models by adding "screencap" to the prompt; the boorus the training images were scraped from already had a bunch of screencaps.
>>41199749
Yeah, but still much better than the garbage Stable Video can do lately, especially given how you can only request an image ref OR a text prompt, never both.
Google should debut their Veo/VideoFX/DeepMind already. It's already great at horses, and their Imagen counterpart is already great at show-accurate pony.
>>41199845
Yeah, SD animation suffers from constantly re-rendering the background, causing a weird flickering effect. I feel like this is the kind of problem that could be solved with some decent image separators + composition scripts, but the trouble is that SD edits the whole image in one go.
>>41162218
Oh fucking McBoohoo then, fuck off to the afterlife then, 'cause everyone on here was a bitch. A lot of the creators don't wanna do it no more because this fucking forum chases them away, because you cunts don't have any fucking common decency and haven't practiced the friendship lessons Twilight taught us. You guys are a bunch of fake bitches. Kiss my arse and hope to die, you bunch of cave-dwelling gigachad motherfuckers. Practice the lessons of friendship, then come back to me.
>>41200183
Hi Thunder"Anon". I wasn't talking about you; your shit always sucked. I was referring to the earlier days when people were posting audio episodes and memes with 15.ai and such. That was before your time.
lol who is that retard? ur bitchin and complaining like a fucktard, get in line you whinny ass bitch
>>41200203 >>41200183
lol I wasn't talking about him, or anyone. People have been here and gone 'cause you are all a bunch of fucktards anyway.
>>41200203
Your shit sucks. Eat my hoursecock, munch on it like it's peanut butter n jelly.
>>41200183
B-based?
>>41200304
Yes, indeed based.
>>41184281
I can see it now: two centuries from now, after the Google-Amazon War of Independence, Twilight STCs are seen as more valuable than breathing air itself.
>>41200386 >>41162218
>https://files.catbox.moe/l6x2rd.mp3
Here is your silly AI shitpost content, sir.
Pony?
Uppity.
boops
Lightweight Zero-shot Text-to-Speech with Mixture of Adapters
https://arxiv.org/abs/2407.01291
>The advancements in zero-shot text-to-speech (TTS) methods, based on large-scale models, have demonstrated high fidelity in reproducing speaker characteristics. However, these models are too large for practical daily use. We propose a lightweight zero-shot TTS method using a mixture of adapters (MoA). Our proposed method incorporates MoA modules into the decoder and the variance adapter of a non-autoregressive TTS model. These modules enhance the ability to adapt a wide variety of speakers in a zero-shot manner by selecting appropriate adapters associated with speaker characteristics on the basis of speaker embeddings. Our method achieves high-quality speech synthesis with minimal additional parameters. Through objective and subjective evaluations, we confirmed that our method achieves better performance than the baseline with less than 40% of parameters at 1.9 times faster inference speed.
https://ntt-hilab-gensp.github.io/is2024lightweightTTS/
No weights (small model trained on Japanese anyway), but interesting.
Papez: Resource-Efficient Speech Separation with Auditory Working Memory
https://arxiv.org/abs/2407.00888
>Transformer-based models recently reached state-of-the-art single-channel speech separation accuracy; however, their extreme computational load makes it difficult to deploy them in resource-constrained mobile or IoT devices. We thus present Papez, a lightweight and computation-efficient single-channel speech separation model. Papez is based on three key techniques. We first replace the inter-chunk Transformer with small-sized auditory working memory. Second, we adaptively prune the input tokens that do not need further processing. Finally, we reduce the number of parameters through the recurrent transformer. Our extensive evaluation shows that Papez achieves the best resource and accuracy tradeoffs with a large margin.
https://github.com/snuhcs/Papez
Might be useful.
https://huggingface.co/fishaudio/fish-speech-1.2
>trained on 300k hours of English, Chinese, and Japanese audio data.
https://github.com/fishaudio/fish-speech
https://speech.fish.audio/en/samples/
Might be cool. The English samples only have the generated audio, lol.
Hey Anons, I'm just throwing an idea out here to see how people feel about it. There are some folks who wish to create songs/audio shitposts, and on the other side there are some contentfags that are a little bit burnt out from whatever they are doing and could use a small side project to test and bounce around new concepts.
How is that sounding? I feel like, at the very least, it would be a bit less blue than keeping on posting a bump every three to five hours?
>>41204065
What's the idea?
>>41204146
Dunno, like people could post a greentext/pastebin of a song or story that would make for a fun audio/song?
In a nutshell, how does text-to-music/AudioCraft work? Do we have models of Daniel Ingram's style to create pony-like OST/BGM?
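Roughly: MusicGen (the main model in AudioCraft) autoregressively generates compressed audio tokens conditioned on a text description, then decodes them back into a waveform. No Daniel Ingram finetune that I know of, but the stock usage looks like this (musicgen-small is just the smallest public checkpoint):

from audiocraft.models import MusicGen
from audiocraft.data.audio import audio_write

model = MusicGen.get_pretrained("facebook/musicgen-small")
model.set_generation_params(duration=15)  # seconds of audio to generate
wavs = model.generate(["cheerful orchestral background music in the style of a cartoon musical"])
for i, wav in enumerate(wavs):
    # writes a loudness-normalized wav next to the script
    audio_write(f"bgm_{i}", wav.cpu(), model.sample_rate, strategy="loudness")

Getting an actual Ingram-style model would mean finetuning something like that on show OST, which nobody here has done yet as far as I know.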