Welcome to the Pony Voice Preservation Project!
youtu.be/730zGRwbQuE

The Pony Preservation Project is a collaborative effort by /mlp/ to build and curate pony datasets for as many applications in AI as possible.

Technology has progressed such that a trained neural network can generate convincing voice clips, drawings and text for any person or character using existing audio recordings, artwork and fanfics as a reference. As you can surely imagine, AI pony voices, drawings and text have endless applications for pony content creation.

AI is incredibly versatile, basically anything that can be boiled down to a simple dataset can be used for training to create more of it. AI-generated images, fanfics, wAIfu chatbots and even animation are possible, and are being worked on here.

Any anon is free to join, and there are many active tasks that would suit any level of technical expertise. If you’re interested in helping out, take a look at the quick start guide linked below and ask in the thread for any further detail you need.

EQG and G5 are not welcome.

>Quick start guide:
docs.google.com/document/d/1PDkSrKKiHzzpUTKzBldZeKngvjeBUjyTtGCOv2GWwa0/edit
Introduction to the PPP, links to text-to-speech tools, and how (You) can help with active tasks.

>The main Doc:
docs.google.com/document/d/1y1pfS0LCrwbbvxdn3ZksH25BKaf0LaO13uYppxIQnac/edit
An in-depth repository of tutorials, resources and archives.

>Online speech generation
haysay.ai

>Active tasks:
Research into animation AI
Research into pony image generation

>Latest developments:
pastebin.com/4p00iUZM

>The PoneAI drive, an archive for AI pony voice content:
drive.google.com/drive/folders/1E21zJQWC5XVQWy2mt42bUiJ_XbqTJXCp

>Clipper’s Master Files, the central location for MLP voice data:
mega.nz/folder/jkwimSTa#_xk0VnR30C8Ljsy4RCGSig
mega.nz/folder/gVYUEZrI#6dQHH3P2cFYWm3UkQveHxQ
drive.google.com/drive/folders/1MuM9Nb_LwnVxInIPFNvzD_hv3zOZhpwx

>Cool, where is the discord/forum/whatever unifying place for this project?
You're looking at it.

Last Thread:
https://desuarchive.org/mlp/thread/43127073/#43127073
FAQs:
If your question isn’t listed here, take a look in the quick start guide and main doc to see if it’s already answered there. Use the tabs on the left for easy navigation.
Quick: docs.google.com/document/d/1PDkSrKKiHzzpUTKzBldZeKngvjeBUjyTtGCOv2GWwa0/edit
Main: docs.google.com/document/d/1y1pfS0LCrwbbvxdn3ZksH25BKaf0LaO13uYppxIQnac/edit

>Where can I find the AI text-to-speech tools and how do I use them?
A list of TTS tools: docs.google.com/document/d/1y1pfS0LCrwbbvxdn3ZksH25BKaf0LaO13uYppxIQnac/edit#heading=h.yuhl8zjiwmwq
How to get the best out of them: docs.google.com/document/d/1y1pfS0LCrwbbvxdn3ZksH25BKaf0LaO13uYppxIQnac/edit#heading=h.mnnpknmj1hcy

>Where can I find content made with the voice AI?
In the PoneAI drive: drive.google.com/drive/folders/1E21zJQWC5XVQWy2mt42bUiJ_XbqTJXCp
And the PPP Mega Compilation: docs.google.com/spreadsheets/d/1T2TE3OBs681Vphfas7Jgi5rvugdH6wnXVtUVYiZyJF8/edit

>I want to know more about the PPP, but I can’t be arsed to read the doc.
See the live PPP panel shows presented on /mlp/con for a more condensed overview.
2020 pony.tube/w/5fUkuT3245pL8ZoWXUnXJ4
2021 pony.tube/w/a5yfTV4Ynq7tRveZH7AA8f
2022 pony.tube/w/mV3xgbdtrXqjoPAwEXZCw5
2023 pony.tube/w/fVZShksjBbu6uT51DtvWWz

>How can I help with the PPP?
Build datasets, train AIs, and use the AI to make more pony content. Take a look at the quick start guide for current active tasks, or start your own in the thread if you have an idea. There’s always more data to collect and more AIs to train.

>Did you know that such and such voiced this other thing that could be used for voice data?
It is best to keep to official audio only unless there is very little of it available. If you know of a good source of audio for characters with few (or just fewer) lines, please post it in the thread. 5.1 is generally required unless you have a source already clean of background noise. Preferably post a sample or link. The easier you make it, the more likely it will be done.

>What about fan-imitations of official voices?
No.

>Will you guys be doing a [insert language here] version of the AI?
Probably not, but you're welcome to. You can however get most of the way there by using phonetic transcriptions of other languages as input for the AI.

>What about [insert OC here]'s voice?
It is often quite difficult to find good quality audio data for OCs. If you happen to know any, post them in the thread and we’ll take a look.

>I have an idea!
Great. Post it in the thread and we'll discuss it.

>Do you have a Code of Conduct?
Of course: 15.ai/code

>Is this project open source? Who is in charge of this?
pony.tube/w/mqJyvdgrpbWgZduz2cs1Cm

PPP Redubs:
pony.tube/w/p/aR2dpAFn5KhnqPYiRxFQ97

Stream Premieres:
pony.tube/w/6cKnjJEZSCi3gsvrbATXnC
pony.tube/w/oNeBFMPiQKh93ePqTz1ns8
Quick thread to see the state of things.
15.ai is now dead forever.
https://x.com/fifteenai/status/2060098921582772567
Is there a dataset describing the scenes of the show, kinda like a transcript but with detailed information on who does what?
>>43289213
I don't think there is such a thing for MLP (otherwise it would have been scraped for data years ago). I guess the closest would be to rip Netflix and other TV shows that have Audio Description/Video Description tracks for blind people, and see if that could somehow be used as a dataset in whatever project you are planning.
>>43289085
This MF just abandons every project of his, doesn't he
He'll abandon that marketplace soon enough and then announce a brand new project that he totally won't abandon, bros!
>>43289238
I doubt it would be detailed enough. I heard multimodal LLMs can take in video, even small ones like the ones I can run on my machine, so one could theoretically tag the entire show like this.
That aside, picrel is a compilation of experiments I did a while ago while playing with show frame compression. I trained a hierarchical VQ autoencoder that encodes 256x384 frames into two maps of codebook indices, 16x24 and 8x12, where each codebook has 1024 entries, and then tried to generate the larger map given the smaller map using discrete diffusion. It's quite undertrained but I got bored of it. Just thought you guys would appreciate the abominations, some of them are even cute.
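A quick back-of-the-envelope check on the numbers in the post above, as a minimal sketch. The frame size, map sizes and codebook size come from the post; treating the raw frame as 8-bit RGB (3 bytes per pixel) and storing each index in exactly log2(1024) bits are assumptions made here for illustration, not details the anon stated.

```python
import math

# Figures from the post above.
FRAME_H, FRAME_W = 256, 384
INDEX_MAPS = [(16, 24), (8, 12)]  # two maps of codebook indices per frame
CODEBOOK_SIZE = 1024

# Assumption: the uncompressed frame is 8-bit RGB.
BYTES_PER_PIXEL = 3


def compressed_bits(index_maps, codebook_size):
    """Bits needed to store every codebook index of one frame."""
    bits_per_index = math.ceil(math.log2(codebook_size))
    return sum(h * w for h, w in index_maps) * bits_per_index


raw_bits = FRAME_H * FRAME_W * BYTES_PER_PIXEL * 8
code_bits = compressed_bits(INDEX_MAPS, CODEBOOK_SIZE)
ratio = raw_bits / code_bits

print(f"{code_bits // 8} bytes per frame, about {ratio:.1f}x smaller than raw RGB")
```

Under those assumptions each frame is 480 indices at 10 bits, so 600 bytes, which goes some way to explaining why the reconstructions come out as abominations.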
>>43289447
I think I saw some models months ago that were able to watch a 10s video and describe what was happening in a scene along with any interaction people had with the setting. I wish I could remember what the name of it was, since that sounds like something you could potentially reuse for your stuff.
Now that I type all of this, I kind of wish there was a program that offered an easy way to create automated audiobook from a fic, but that would require a TTS model to be combined with some LLM to understand which characters are included in the story and automatically swap the voices of character/narrator, as well as add any relevant background music and sound effects.
>>43289481
>easy way to create automated audiobook from a fic
I made such an app a while ago
https://files.catbox.moe/cwj64u.mp4
https://drive.google.com/drive/folders/14zMbURz1SuYNMoewX88EjR8sHEkcaKXa
>>43289508
>if yours only supports cuda 11.x but you still want to run on gpu, run the following inside PVT folder: runtime/python.exe -m pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
Could it be possible to make it work with cu116? My system+gpu can't get python to work with anything above that, and I'm not going to be upgrading my pc for at least the next two/three years.
>>43289524Replace cu118 with cu116 and try it, should work.
Oh hey, the lights are back on here?
does anyone know if there was an upgrade in openly available musical ai models or is the yue music model from last/two years ago still the "best" option?
What would people think about trying to make some kind of collaborative "album", with Anons posting their pony songs made with whatever ai model/service of their choice every Friday?
>>43291450The idea sounds interesting. I'm not sure if there are enough Anons who are actively genning to make that system work though.
Up.
Hello! Maybe someone here could help me find a certain audio file I'm looking for. It's Princess Celestia saying "And so, as punishment for your insolence and treachery, I hereby sentence you to death by decapitation! May the blade fall swiftly and your head be raised high as a warning to all those who would dare to defy the will of the crown." I can't remember whether it was AI generated or audio from Nicole Oliver herself or a fan dub, but I remember that the quality was very good. I know I listened to it in November 2024, but it may have been posted way before that. I scoured the PPP threads on desuarchive and Clipper's master files and did not find it.
>>43293091
>The PoneAI drive, an archive for AI pony voice content:
>drive.google.com/drive/folders/1E21zJQWC5XVQWy2mt42bUiJ_XbqTJXCp
Try here, if it was genned in a thread then it'll be found in here.
>>43293096Thank you, Anon. There's lots of good audio in there but, alas, I could not find the file.
>>43293091
I found it! I can't believe I missed it the first time; it's right there on desu in Nov 2024 just like I said, >>41653178
Direct link to the audio, for anyone interested:
https://vocaroo.com/14eyuFuDu0Zs
>>43293091>>43293256That sounds pretty OOC for Celly.
nine
>>43293971plus one
>cant use raw audio clip because singing reverb fucks with the voice conversion
>cant use the DeReverbed audio because random bits of words are just nuked from the clip
dear song makers, plz stop using aggressive autotune and fake ass reverb, its making finding a good song for ai pony covers way more difficult than it needs to be.
>https://u.pone.rs/basikmdv.mp3
>technobridge523 - BLACK SUNSET (Cover)
What a strange feeling, bumping into a brand new ai cover of old original ai song made in ppp thread from two years ago.
What's the latest and greatest in local voice gen?
we are so dead
>>43289079We drove this deep into the heart of the Archives. We thought it was dead. We were wrong.
>>43297041we are so alive
Gpt sovits on haysay is fucked and keeps giving an error. The only decent one that allows text to speech and has all the good characters.
>>43297664
Hmm, I'm guessing HydrusBeta's amazon server decided to crash out on that specific tts code, which sucks balls. I'm guessing it wouldn't be helpful to say there is a local UI installation of GPT-SoVITS that can run pretty OK on 8gb vram? If worst comes to worst, you could always request audio clips from anons in the thread here; as long as it's pony related it would be fine.
>>43297664
There was a JSON file that got into a corrupted state. I've regenerated it, and I think that fixed the issue. Let me know if the problem persists.
>>43297705
I've been working around it by using the characters on the tts one and converting them into the other voices with mixed since they require an audio input.
>>43297961
This worked, thank you anon.
>>43297961Thanks for the fix.
>>43297961Thanks anon!
>>43297961noice one, m8!
>>43289085Didn't he say he was going to open source this stuff eventually? Now would be a good time.
>>43299651I dont think he ever has, and even posting on ppp was very sporadic in 2019/2020. It would be cool if he had, or at least maybe create the documents showing exact steps of how it was made so people with the know-how would be able to recreate it with the modern tech.
>>43299651>>43300095I'm certain he at least stated several times that he was gonna release a paper about his research, which he never did, and likely never will.
>>43300117Damn
>>43289079
hey check out my fuckass tts. I'll release model and stuff for realsies soon
samples:
https://files.catbox.moe/b5asjw.wav
https://files.catbox.moe/2vk68m.wav
demo:
http://198.53.64.194:35029/
>>43300755
>>43300755
https://files.catbox.moe/hpg3m1.wav
https://files.catbox.moe/xrrt16.wav
https://files.catbox.moe/uz1fyc.wav
>>43300755Is this your own arch or did you tune something?
>>43301171Both. I'll explain later.
>>43300755
>https://files.catbox.moe/x2bt7n.wav
hey, that's neat. I can do tf2 audio shitposting again.
>>43300755This is really, really good - at least, from the small samples I've generated. And I bet it sounds even better if you know what the hell you're doing with top_p, temperature, and those other autistic sliders. Is there a release window?
>>43301579>>43300755Also, just my intuition, but did you use S1 voice clips for your model? If so, bravo.
>>43301579Tomorrow at latest, tonight at earliest.
>>43301712
Awesome. Tested it a bit more and it'll probably be my daily driver for AI TTS. Will you be posting guides and whatnot, especially for things like interjections (hm, huh, wow, etc.)? Doing those just results in long silences; makes me think it might be bugged or something. And is the 30s audio output limit for the testing phase only?
>>43301727
>Will you be posting guides and whatnot, especially for things like interjections (hm, huh, wow, etc.)
Not even I know how to get those out of the model, that will require further research
>And is the 30s audio output limit for the testing phase only?
Yeah, plus a rate limit so everyone gets to test it. When running it by yourself you can do whatever you want.
>>43301515https://files.catbox.moe/chaxe4.wav
>>43301745kek
>>43300755
Heyo, any chance of an option to automatically convert the wav to MP3 files when generating them? (I'm just too lazy to convert every single file by hand)
>>43300755
https://u.pone.rs/hyboomqt.wav
Alright, I've been pretty jaded towards voice AI as of late, but I gotta admit this is pretty impressive.
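Until something like that is built in, a small script can batch-convert a folder of generated clips. This is a minimal sketch that assumes `ffmpeg` (built with the `libmp3lame` encoder) is installed and on your PATH; the helper names `build_ffmpeg_cmd` and `convert_folder` are made up for this example and are not part of the demo or the model repo.

```python
import shutil
import subprocess
from pathlib import Path


def build_ffmpeg_cmd(wav_path: Path) -> list[str]:
    """Build the ffmpeg command that turns one .wav into an .mp3 next to it."""
    mp3_path = wav_path.with_suffix(".mp3")
    return [
        "ffmpeg", "-y",          # overwrite an existing .mp3 without asking
        "-i", str(wav_path),
        "-codec:a", "libmp3lame",
        "-qscale:a", "2",        # VBR quality, lower is better (roughly 190 kbps)
        str(mp3_path),
    ]


def convert_folder(folder: str) -> int:
    """Convert every .wav in `folder`; returns how many files were converted."""
    if shutil.which("ffmpeg") is None:
        raise RuntimeError("ffmpeg not found on PATH")
    count = 0
    for wav in sorted(Path(folder).glob("*.wav")):
        subprocess.run(build_ffmpeg_cmd(wav), check=True)
        count += 1
    return count
```

Calling `convert_folder("gens")` would then leave an .mp3 beside each .wav in a hypothetical `gens` folder; the originals are kept so you can still splice from the lossless files.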
>>43302597Yup. It'll still require some audio splicing between multiple generations in some cases for the best possible take, but the quality floor has a notable bump to it compared to past TTS software I've dabbled in.
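For the splicing step mentioned above, the Python standard library is enough when all takes come from the same model and share a format. A minimal sketch, assuming plain PCM WAV files of 16-bit depth or wider (the `wave` module does not read float WAVs, and zero bytes are only silence for signed PCM); `concat_wavs` is a hypothetical helper name, not something shipped with the TTS.

```python
import wave


def concat_wavs(inputs, output, gap_seconds=0.0):
    """Join WAV files end to end, optionally with silence between takes."""
    params = None
    chunks = []
    for path in inputs:
        with wave.open(path, "rb") as w:
            p = (w.getnchannels(), w.getsampwidth(), w.getframerate())
            if params is None:
                params = p
            elif p != params:
                raise ValueError(f"{path} has a different format: {p} vs {params}")
            chunks.append(w.readframes(w.getnframes()))
    if params is None:
        raise ValueError("no input files given")

    nchannels, sampwidth, framerate = params
    # Zero-valued samples are silence for signed PCM (16-bit and up).
    gap_frames = int(gap_seconds * framerate)
    silence = b"\x00" * (gap_frames * nchannels * sampwidth)

    with wave.open(output, "wb") as out:
        out.setnchannels(nchannels)
        out.setsampwidth(sampwidth)
        out.setframerate(framerate)
        out.writeframes(silence.join(chunks))
```

For example, `concat_wavs(["line1.wav", "line2.wav"], "scene.wav", gap_seconds=0.3)` would put a 0.3 second pause between two takes. Anything fancier (crossfades, trimming breaths) is better done in an audio editor.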
>>43300755
MOSS-TTS-1.7B-PNY v0.1: a finetune of MOSS-TTS + my custom vocoder for 48kHz audio
HuggingFace: https://huggingface.co/ZDisket/MOSS-TTS-PNY
Colab Notebook: https://colab.research.google.com/drive/1tDIYCMumcW5w3JWnQ0tBGyAr-ZpaaXBB
Public demo: http://198.53.64.194:35029/
See pic related on how to run on Google Colaboratory. For local setup on your own hardware, you want at least 13GB of VRAM. Model runs ~1.5x realtime on a single RTX 5090 with the optimized runner. Download from HF and ask Claude Code to set it up for you.
>>43301171
From a technical perspective, this consists of two models:
1. A finetune of MOSS-TTS with fixed speaker conditioning, and
2. A very custom iSTFTNet2 vocoder that turns hidden states of the MOSS Audio tokenizer into 48kHz audio (which can also be repurposed for singing voice conversion).
>>43301515
The TF2 speakers are a bit lower quality because they were thrown in as an afterthought. Next version will include better emotion control and quality.
>>43303129Neat, thanks anon, gonna check it out
>>43303129
It works! Very expressive, though it underpronounces some words.
https://litter.catbox.moe/inza6f99g9ioti7z.wav
https://litter.catbox.moe/epx5r2ssfynizrzv.wav
I think it would benefit from running the lm quantized with ggml. Its current ~12 gb vram footprint makes it impractical for most tasks, and it's pretty slow too.
>>43303129>Delta>Google ColabThis feels like one hell of a throwback to the very early days of PPP. Welcome back and godspeed.
>>43303320
Known issue, probably due to not much data. Try lowering the temperature for now, something like 0.6. Temperature basically controls how "creative" the model is. Higher temperature is more chaotic.
>Its current ~12 gb vram footprint makes it impractical for most tasks, and it's pretty slow too
Very unoptimized, a 1.7B transformer should run much faster. I should figure something out for that soon
>>43303339
Thank you. I'm only getting started. Need to improve this and figure out singing voice conversion and LLM finetunes
in 2024 I finetuned Llama on a desuarchive dump of /mlp/ and found out I could ERP with it in greentext
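For anons wondering what the temperature slider actually does, here is a minimal sketch of the standard technique: the model's scores are divided by the temperature before being turned into sampling probabilities. The three logits are made-up numbers for illustration, and this is the generic textbook form, not code taken from the MOSS-TTS-PNY runner.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities; lower temperature sharpens them."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Made-up scores for three candidate tokens.
logits = [2.0, 1.0, 0.0]
for t in (1.0, 0.6):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

At 0.6 more of the probability mass lands on the top-scoring token than at 1.0, so the model takes fewer unlikely branches. That is why lowering it tends to cut down on slurred or skipped words at the cost of some expressiveness.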
>>43303129Cool, also Celestia is best pony.
>>43303320
>~12 gb vram footprint makes it impractical for most tasks
Sadly this, the reason why rvc exploded as an ai voice tool was that it was poorfag friendly, as it could run in 4gb vram and train any voice on an 8gb vram card.
>>43303320>litterboxDamn
>>43303129noice
Bump.
>>43303129
I was watching family guy reruns on cytube the other day, and I thought what the hell, I'll test the TTS using the show's dialogue. This tool actually rocks. The audio quality is INSANELY crisp and practically artifact-free (at least for the M6 and Trixie - haven't tested anybody else yet). I think once you release better emotion control, it will truly be head and shoulders above 15.ai, because as it stands, I'm not sure if there's a way to force a certain emotion into the line's delivery unless the dialogue itself clearly betrays that particular emotion.
https://files.catbox.moe/u3hjsj.wav
https://files.catbox.moe/ob9myx.wav
https://files.catbox.moe/yurj0p.wav
https://files.catbox.moe/o5olvb.wav
https://files.catbox.moe/bogptq.wav
https://files.catbox.moe/75ttqf.wav
https://files.catbox.moe/fx1zry.wav
https://files.catbox.moe/ifr120.wav
https://files.catbox.moe/isd7iz.wav
https://files.catbox.moe/k9espp.wav
https://files.catbox.moe/vsw445.wav
https://files.catbox.moe/l1ahoj.wav
Godspeed, fren. Looking forward to those updates.
>>43303475
I played around with optimization. q4 quantization doesn't speed it up; the bottleneck is n_vq_for_inference. Halving it (16 instead of 32) produces audio in half the time but it's not as good
https://litter.catbox.moe/uqzjohq2d3i03o9k.wav
but I assume you already know all this. I wonder if 32 codebook channels is overkill for this task since it only has a limited set of voices to represent.
>>43301515Good luck, anon - I made the Heavy say>We are going to destroy Israel AND generate ponies!
>>43305634
Emotion control is in the plans. I'll stick with Cookie's BERT conditioning technique. Audio still has artifacts on the TF2 speakers as those represent only a minority in its training set.
>>43305746
I've got it up to ~3x realtime with some cudagraph and torch compile witchcraft
>I wonder if 32 codebook channels is overkill for this task since it only has a limited set of voices to represent
That I was wondering too. It would probably require training a vocoder specifically for that.
>>43303611
Is RVC still the go-to for singing voice conversion? I can swap out the vocoder in it for mine to improve its audio quality
>>43306298>Is RVC still the go-to for singing voice conversion?Yep.
>>43303129These models sound amazing! What are the limits? Are you able to train a derpy model or is there not enough data?
>>43306298I'll be waiting for updates
>>43306298>Emotion control is in the plans.noice
>fucking around interdimensionally
>https://files.catbox.moe/bbmjgz.mp3
An old audio conversion/remix of the panel of the same name from last year (or two years ago?). Reposting to see if other anons would enjoy this meditation thing.
Did Clipper's 1st master file MEGA get nuked? Whenever I try to access it, it just loads indefinitely.
>>43307177Seems all there to me.
>>43305634kek these lines
>>43307177Nvm its working again
>>43294893
snowpity
Im glad all of you Anons are all well.
>>43307109comfy audio
>>43309365Sunbeam
>>43309693
thanks, I had an early version where I converted the og audio with just rvc but the audio outcome was very meh; luckily rvc-sovits came out some time around it so I could gen Luna talking in a proper calm yet reassuring emotional style.
>>43310789ai mare
>>43307079
>>43305634
Working on emotion control. More complicated than I expected because this big ass transformer model is less responsive to conditioning than models of old. Regardless, I'm toying around with an emotion category + an adjustable energy value (from 0 to 100).
For example, Twilight, same sentence, same Happy emotion:
95% energy: https://files.catbox.moe/smt36d.wav
15% energy: https://files.catbox.moe/nl0opy.wav
>>43306329
Not enough data for Derpy, unless we start using fan VAs (and if they have enough clean data for that)
>>43311332Dang, nice work anon! Thanks for the swift turnaround on the feedback! If you're able to implement this successfully, it would help out a ton for the projects I have planned. Keep it up!
>>43311332We are so back
>>43311332Very nice, but is there any reason why she sounds more enthusiastic on the 15% than the 95% one?
>>43312305
Woops, I flipped the labels. They are in the wrong order; 95% is the 15% and 15% is the 95%.
On another angle, I found some NSFW voice pack: https://opennsfw.carrd.co/#vo2 ; I'm getting it transcribed so I can throw this into the training set of the next model. The gist of it is that it will teach the model this stuff and transfer capabilities to the pony speakers too.
>>43311332
>emotions controlled by percentage
fuck yes, this is exactly what I wanted since forever. Please tell me, do you have plans to add an option to mix two emotions together as well?
>>43311332
anon... i literally cannot think of a way it gets better than this
genuinely what else is there beyond this? it's functionally everything we could need, it's 1:1 like the voices i hear, nothing robotic, it has the emotion, and all the little things that make it the pony they are
the one time i keep up with /ppp/ and it's already at the end game
>>43312608>Woops, I flipped the labels.Ah yeah I figured.Good work though! She sounds really happy in the higher energy sample.
>>43312608>horny 100%aw yiss
>>43303129>>43313194awesome
>>43313228
Now, this is all I could have asked for and more
>>43303129if hydrus is still active in this thread, can we get this on haysay?
>>43312608mister delta please please oh please tell me this emotions update will be coming soon all of dash's lines are screaming and i can't get her to settle the fuck down for emotional lines
>>43311332amazing work, can't wait to see more updates!so happy that the ppp is back, it motivates me to gen again.https://files.catbox.moe/8g084q.wav
hi ponys whats up guess whos back
>>43314144poopikins thought u were gone
>>43312766
First version will be just select from a set of emotions + energy slider, as it works decently.
>>43312779
There's still a lot of work to be done, like improving voice conversion and LLMs
>>43314139
Yes. Technically everything's in place but I'm trying to bundle NSFW too, which is proving a bit complicated. If I can't figure it out by the end of week, I'll just release as it is.
>>43314222i love you thank you nonny this is fucking awesome
>>43314144sunbeam
>>43314222As someone in the minority who preferred using reference audio, do you plan to add an option for that? For some things, it's faster for me to just speak the lines and get it right once compared to the ritual of spamming generate for half an hour with 15. This certainly outputs faster, but I prefer to fine tune with different takes instead of adjusting emotions with sliders, especially because things like stammering and fluctuating tone of voice can't be changed easily with TTS. But so-vits and RVC are certainly showing their age compared to this.
>>43314207Don't call it a comeback.
>>43314222>I'm trying to bundle NSFW toolikefucking lewd noises and a "horny" meter or some shit for the pony voices?oh boy
>>43312608>On another angle, I found some NSFW voice pack: https://opennsfw.carrd.co/#vo2 ; I'm getting it transcribed so I can throw this into the training set of the next model. The gist of it is that it will teach the model this stuff and transfer capabilities to the pony speakers too.BASED BASED BASED BASED BASED
>>43315015>>43315021i'm so fucking excited to put it in my vn mod you anons have no idea
>>43314851i will call it a comeback if i want to, the reason why i left this forumn was because of idiots like you, and you know ultimately happened the thread died ?, im here to at least provide content and keep this thing alive so no go fuck yourself with a cucumber degenerate fuck face no one fucking likes you
>>43314222
>>43315015
>>43315021
NSFW kinda works but is leading me down a research black hole at da moment (current iteration is extremely unstable), so I am dropping it for now in favor of focusing on emotion control.
Like, these are the only decent-ish samples I could squeeze out of it: https://files.catbox.moe/ehxmzw.wav ; https://files.catbox.moe/ebwxit.wav
>>43314572
Oh yeah, I plan on getting voice conversion upgraded soon.
>>43315106>dat dashie audio >cum for me anonUUUUUUUUUUUUUNFFFF!!!!!!!!!!!!!!!>dropping itFuck my nigger existence
>>43315106I wouldn't mind some beta unstable testing version being on the page for the time being, but also if that's going to be a giant fucking hassle to have then don't bother yeah. Cool work though.
>>43315158Also, adding onto this, I think NSFW is less of a priority than enabling further general control anyways. Being able to, yeah, control emotions and general sentiment etc is great, but being able to control intonation by placing some sort of emphasis on certain words etc would do a ton for making there be less gacha in getting what you want.That being said, NSFW is still peak and something I'd love to see at some point. Some kind of ASMR toggle to go with it to make it sound like it's being whispered in your ear would likely do a lot for some anons too kek.
>>43315106I hope you will make it a docker image with all the optimizations and shit
>>43315106Btw, what's the current vram and cuda requirements to run this?
mare bump
>>43316060Yeah I'm definitely excited to see what's cooking with the new TTS. It's worth keeping the thread boomped for it. The /chag/ usage of it was so damn cool already.
>https://u.pone.rs/gykynemn.mp3
>>43317647and the inside of a horse
>>43316730>by end of weekjust two days away just two days away its like christmas fucking morning
>>43318088 (checked)Man I completely spaced that. God I'm so excited. I've still been using the demo like daily. I'm holding onto some lines I generated for potential pony shitpost projects and I'm really excited about it. This stuff is an OC inspiration goldmine.
>>43317647Truth
>https://www.youtube.com/watch?v=45-GDaNgfM4new BGM kino its mostly an instrumental but there are still some ai voice bits in there so it counts
>>43318783>pony zone was 5 years agoi can feel my bones crumbling to dust
>>43318794so do i bro, so do i
>>43315106
Oh, forgot to ask: is there a way to remove the 30 second output limit? It's still present, even when run locally.
Pump seed inside mares
These big transformer models are pretty hard to condition on emotion; mine was ignoring the labels, so I had to devise an output head which runs emotion classification, which forces the model to actually pay attention to the emotion labels. It works but is a lil' bit more subtle than in other models. For example, Rainbow Dash:
Angry, 95% energy: https://u.pone.rs/evxtaazv.wav
Neutral, 20% energy: https://u.pone.rs/oquscjwf.wav
But it should get stronger with more training. I'll release this in the next 2-4 days.
So much technical stuff going on, maybe I should see if I can do a university-style lecture at a future Mare Fair.
Pic unrelated, Lucky Roll makes me hard
>>43319862
Whoopsie. There's a config knob in the code somewhere; it should be easy to find, or just ask your favorite coding agent to do so--I'm fully sloppilled and don't read nor write code manually anymore. If you don't have any subscriptions, OpenCode offers free usage of good enough models.
>>43315474
13GB VRAM, any CUDA that can run modern pytorch will do
>>43315122
NSFW will be in some future version.
>>43320423
>13GB VRAM,
Fug
>any CUDA that can run modern pytorch will do
Double fug
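The post above doesn't say how the emotion classification head is wired up, so what follows is only a guess at the general idea: a generic auxiliary-classifier loss, where the model is also asked to predict the emotion label from its own outputs and gets penalised when it can't. All names, the toy logits and the `aux_weight` value are assumptions for illustration, not the author's code.

```python
import math


def cross_entropy(logits, target_index):
    """Negative log-probability of the target class under softmax(logits)."""
    peak = max(logits)
    log_sum_exp = peak + math.log(sum(math.exp(x - peak) for x in logits))
    return log_sum_exp - logits[target_index]


def combined_loss(token_logits, token_target,
                  emotion_logits, emotion_target, aux_weight=0.5):
    """Main next-token loss plus a weighted emotion-classification loss."""
    main = cross_entropy(token_logits, token_target)
    aux = cross_entropy(emotion_logits, emotion_target)
    return main + aux_weight * aux


# Toy example: identical token predictions, but in one case the emotion head
# recovers the requested label (index 0) and in the other it does not.
token_logits = [1.5, 0.2, -0.3]
heard = combined_loss(token_logits, 0, [3.0, 0.0, 0.0], 0)
ignored = combined_loss(token_logits, 0, [0.0, 3.0, 0.0], 0)
print(heard < ignored)
```

The point of the extra term is that a model which ignores the emotion label produces output the classifier head cannot label correctly, so the loss stays high until the conditioning actually shows up in the audio tokens.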
>>43320423Oh shit, was just making a post about how excited I was for the next version. Fucking awesome. I'm really looking forward to the chance to have Ponk actually speak softly in some lines. I was putting in cutesy comfy waifu stuff for her to say, but she would always yell it, haha. Twilight's been the most consistently tonally good voice for me so far from my experimenting.
>>43320427Yeah I wish it were more compact so that I could load it with a language model
>>43320468
yeah, would be nice if there were voice and text models that weighed under 1GB, to make it possible to have a "discussion" with multiple characters at the same time that can't actually read each other's minds like the current llms do.
>>43318794>pony zone was 5 years agoGood God, how did that happen?
>>43314144
what I've made so far over the last few days, all with MOSS.
Awkward Dash and Trix:
https://files.catbox.moe/8g084q.wav
https://youtu.be/9NO_LqQfNpA?t=10
Smart Dash:
https://files.catbox.moe/p3l2om.wav
https://www.youtube.com/watch?v=SBiXajenKrg
Why would anon do that:
https://files.catbox.moe/nbalhb.wav
https://www.youtube.com/watch?v=m3MaTuv6QHI
>>43322236kek, nice work.woundn't mind Trixie sucking on my nose ifkwim
>>43322236Man, this TTS really is something else. Bravo, anon! Have you ever considered making some audiobooks for a short fimfiction story, perchance? I feel like that would be a great use-case; the TTS is just that good.
>>43322792
>Have you ever considered making some audiobooks for a short fimfiction story,
Maybe, I've thought about it before. I'd rather see if I could animate a short story rather than creating an audiobook, though animation takes forever. There's a few fics that come to mind that I'd like to adapt someday (with the authors' permission, if they're still contactable)
>>43322830> I'd rather see if I could animate a short story rather than creating an audiobook, though animation takes forever.For your own good, anon, I would advise against that heavily. Start small. Make an animated skit inspired by something under a minute or so, like one of those youtube shorts. The feeling of achieving those small goals consistently will give you the motivation to do something longer. And by animation, I hope you don't mean hand-drawn or flash animation, lol; even just PNGs sliding across the screen like the tax breaks animation would suffice. Don't burn yourself out
Why does dsv4 have such niggerishly slow prompt processing?
>>43323079wrong board sorry
Not pony related but I just wanna say I am a big fan of vibecoding small scripts that I could have written in a day or two but can get an LLM to make in under a minute
>>43303129
>>43320423
Updated the model with first iteration of emotion control. Links remain the same:
HuggingFace: https://huggingface.co/ZDisket/MOSS-TTS-PNY
Colab Notebook: https://colab.research.google.com/drive/1tDIYCMumcW5w3JWnQ0tBGyAr-ZpaaXBB
Public demo: http://198.53.64.194:35029/
You now have 12 emotion classes to choose from, plus an energy slider. They do influence the output, but it's more of a nudge than a demand. You still have to craft your prompts. Regardless, I hope this makes it easier to get what you're looking for. Nonverbal is an emotion class reserved for NSFW mode sometime in the future.
Pic unrelated. Also, rate limits on the public demo have been doubled as I've now got an optimized runner that does single batch inference at 3.5x realtime.
>>43323889excite
>>43323889Thanks for yet another release! Quick question: how do I enable the optimized runner? Just paste the code from HF and then run the gradio? Or do I have to stick to powershell?
>>43323889i was wondering if you guys could also try adding more ponys such as the student six and the cmc's
oh and thorax
>>43324263The code from the HF repo and Gradio already has all the optimizations. I think some are turned off by default, because this speed is achieved by using TorchInductor and compile witchcraft to turn the whole inference flow into one big kernel to reduce overhead, but takes 5-10 minutes to startup, which is no problem for a long-running server.Also, expect the demo to go from a Gradio app to something more refined UI-wise. Maybe I'll give it a real name. Taking suggestions
>>43323889Oh fucking awesome. I was just using the demo in bed and then I woke up to see the new settings and checked the thread then. Was so damn hype. Only responding now but yeah this is sick.As a Ponkfag I am especially pleased, as beforehand she would yell 99% of her lines whereas Twilight was fantastically pitched most of the time. Putting "calm" on "0% energy" has led me with a lot of softer lower pitch Pinkie speech compared to before, which I love a ton. Her voice is so cute when it's chill. Extra rate limit is also really appreciated as I love playing with this.
>>43324798
>Putting "calm" on "0% energy" has led me with a lot of softer lower pitch Pinkie
>https://u.pone.rs/krthqfzm.wav
holy fug, yeah this works great, finally a tts Ponk that doesn't talk like she just chugged an entire barrel of energy drink.
>>43323889
I tried to use the Colab and I got this error
>>43324798>softer lower pitch Pinkie speechI may be a ponkfag now.
Yes it can be done result attempt #0https://files.catbox.moe/qjzfpe.mp3>Can we help?Yes, anyone can. Just have a google account or any account. Use Claude.AI for prompts since it understands better than the average person how music AI works.https://www.msong.aiUse msong.ai since it's the best one available that beats the crap out of Suno & Udio ... that's why they only allow you 2 tries every 24h. (it's also why you have to record it manually which is sad, but Youtube already compresses the shit out of everything, that's why Youtube is only for popularity, not for hosting)This line should be changed else we turn the poor girl into a tranny. Let's not.( and these are the boys, Stab and Jab.)>Yes, female deer can grow antlers, but it depends heavily on the species and the individual animal. Could change it from boys to girls if we want to keep the antler lines. It will still be weird as fuck that female Ronno is flirting with Faline but eh we can just pretend Ronno is a sarcastic bitch to everyone.>See, man's got this stick that can make his voice sound like one of us.Kek.Ronno's lines and prompt inside:https://pastebin.com/KARtLG4q
>>43324899
Yeah exactly. She sounds so fuckin' cute dude. Pinkie's always been the roughest one in TTS in my experience, but this is finally starting to deliver something really nice.
I'm glad you found that combo useful too. I've been experimenting with the style text shit and have gotten some interesting results. If I come across good sentences that have really nice results for prompting I'll try and share them here.
Here, before posting this I agonizingly fucked around and managed to get a couple cute ones.
>ponk on the inside
https://u.pone.rs/lubytuhc.wav
>ponk gf (despite her being a pony) (this one required genning it piecemeal and stitching together because it was kind of long, but I hope you guys like it, it took me a minute but I think it's really cute)
https://u.pone.rs/sbdyftft.wav
(also >>43324959 extremely based, hope you like the audio clips I made for this post, you're my bro now if that's true)
>>43323669Examples?
I'm going to see if OAI Codex can port this model to C++/GGML if I just leave it on a loop.
>>43324950
Fixed. The new optimized path switched to TorchScript instead of ONNX for the vocoder and the Colab demo didn't download that artifact. Also, since Gradio is being weird with share links, the demo uses a Cloudflare tunnel.
>>43325014
Cool stuff anon. If you find interesting ways of using the model, do share them.
>>43325014non-screechy ponka is best ponka. both are fine, of course, but I like it better when she's less histrionic. extremely cute gens btw