Welcome to the Pony Voice Preservation Project!
youtu.be/730zGRwbQuE

The Pony Preservation Project is a collaborative effort by /mlp/ to build and curate pony datasets for as many applications in AI as possible.

Technology has progressed such that a trained neural network can generate convincing voice clips, drawings and text for any person or character using existing audio recordings, artwork and fanfics as a reference. As you can surely imagine, AI pony voices, drawings and text have endless applications for pony content creation.

AI is incredibly versatile; basically anything that can be boiled down to a simple dataset can be used for training to create more of it. AI-generated images, fanfics, wAIfu chatbots and even animation are possible, and are being worked on here.

Any anon is free to join, and there are many active tasks that would suit any level of technical expertise. If you're interested in helping out, take a look at the quick start guide linked below and ask in the thread for any further detail you need.

EQG and G5 are not welcome.

>Quick start guide:
docs.google.com/document/d/1PDkSrKKiHzzpUTKzBldZeKngvjeBUjyTtGCOv2GWwa0/edit
Introduction to the PPP, links to text-to-speech tools, and how (You) can help with active tasks.

>The main Doc:
docs.google.com/document/d/1y1pfS0LCrwbbvxdn3ZksH25BKaf0LaO13uYppxIQnac/edit
An in-depth repository of tutorials, resources and archives.

>Active tasks:
Research into animation AI
Research into pony image generation

>Latest developments:
ponepaste.org/10569

>The PoneAI drive, an archive for AI pony voice content:
drive.google.com/drive/folders/1E21zJQWC5XVQWy2mt42bUiJ_XbqTJXCp

>Clipper's Master Files, the central location for MLP voice data:
mega.nz/folder/jkwimSTa#_xk0VnR30C8Ljsy4RCGSig
mega.nz/folder/gVYUEZrI#6dQHH3P2cFYWm3UkQveHxQ
drive.google.com/drive/folders/1MuM9Nb_LwnVxInIPFNvzD_hv3zOZhpwx

>Cool, where is the discord/forum/whatever unifying place for this project?
You're looking at it.

Last Thread:
>>41571795
FAQs:
If your question isn't listed here, take a look in the quick start guide and main doc to see if it's already answered there. Use the tabs on the left for easy navigation.
Quick: docs.google.com/document/d/1PDkSrKKiHzzpUTKzBldZeKngvjeBUjyTtGCOv2GWwa0/edit
Main: docs.google.com/document/d/1y1pfS0LCrwbbvxdn3ZksH25BKaf0LaO13uYppxIQnac/edit

>Where can I find the AI text-to-speech tools and how do I use them?
A list of TTS tools: docs.google.com/document/d/1y1pfS0LCrwbbvxdn3ZksH25BKaf0LaO13uYppxIQnac/edit#heading=h.yuhl8zjiwmwq
How to get the best out of them: docs.google.com/document/d/1y1pfS0LCrwbbvxdn3ZksH25BKaf0LaO13uYppxIQnac/edit#heading=h.mnnpknmj1hcy

>Where can I find content made with the voice AI?
In the PoneAI drive: drive.google.com/drive/folders/1E21zJQWC5XVQWy2mt42bUiJ_XbqTJXCp
And the PPP Mega Compilation: docs.google.com/spreadsheets/d/1T2TE3OBs681Vphfas7Jgi5rvugdH6wnXVtUVYiZyJF8/edit

>I want to know more about the PPP, but I can't be arsed to read the doc.
See the live PPP panel shows presented at /mlp/con for a more condensed overview.
2020 pony.tube/w/5fUkuT3245pL8ZoWXUnXJ4
2021 pony.tube/w/a5yfTV4Ynq7tRveZH7AA8f
2022 pony.tube/w/mV3xgbdtrXqjoPAwEXZCw5
2023 pony.tube/w/fVZShksjBbu6uT51DtvWWz

>How can I help with the PPP?
Build datasets, train AIs, and use the AI to make more pony content. Take a look at the quick start guide for current active tasks, or start your own in the thread if you have an idea. There's always more data to collect and more AIs to train.

>Did you know that such and such voiced this other thing that could be used for voice data?
It is best to keep to official audio only unless there is very little of it available. If you know of a good source of audio for characters with few (or just fewer) lines, please post it in the thread. 5.1 is generally required unless you have a source already clean of background noise. Preferably post a sample or link. The easier you make it, the more likely it will be done.

>What about fan-imitations of official voices?
No.

>Will you guys be doing a [insert language here] version of the AI?
Probably not, but you're welcome to. You can however get most of the way there by using phonetic transcriptions of other languages as input for the AI.

>What about [insert OC here]'s voice?
It is often quite difficult to find good quality audio data for OCs. If you happen to know any, post them in the thread and we'll take a look.

>I have an idea!
Great. Post it in the thread and we'll discuss it.

>Do you have a Code of Conduct?
Of course: 15.ai/code

>Is this project open source? Who is in charge of this?
pony.tube/w/mqJyvdgrpbWgZduz2cs1Cm

PPP Redubs:
pony.tube/w/p/aR2dpAFn5KhnqPYiRxFQ97

Stream Premieres:
pony.tube/w/6cKnjJEZSCi3gsvrbATXnC
pony.tube/w/oNeBFMPiQKh93ePqTz1ns8
>>41706417
Anchor.
>>41691563
Am I gonna need the pro version of SynthV to do this? >>41690198
If so, any way around that whopping $90 price tag?
>>41706747
So I made the first few lines in SynthV and I can already tell this isn't going to work. There's absolutely no way I'm going to get the timing down anywhere close enough to line up with a karaoke track of the song. I can have very tidy, robotic timing, but the song has all sorts of fermatas and tempo variations that just don't play nicely with the hard timing of a MIDI-like note generator. The only way this is going to work is if there exists an AI tool that can take the existing audio track and imitate the melody and timing itself. I've seen people turn existing songs into the same song but sung by a pony, so clearly THAT is possible, but does there exist a similar tool which also allows you to change the words but keep the same pitch and phrasing?
>>41706831
Not unless you sing the song yourself.
>>41706840
...That's not completely out of the question if AI could take my voice and turn it into a pony's.
Slow start.
Just shooting a general question here: does anybody know of a program/GitHub project that can take 22050Hz audio and bump it up to 48000Hz, with AI predictions filling in the missing high-frequency content?
>>41707278
https://audioldm.github.io/audiosr/
https://github.com/haoheliu/versatile_audio_super_resolution/
>>41706747
>need the pro version of SynthV?
Nope, most of my covers that have used SynthV were done via the free/basic version. You are limited to 3 channels/tracks, but you only really need that many for simple covers.
>>41706831
If you note pic related, there's a 3-digit number next to 4/4; this is your tempo. You'll need to adjust this to suit the BPM of your desired song, which can be found via a quick lookup. You can also right-click anywhere on that line to create a marker that will change the tempo once the song reaches that point. Above that, there's also a snap amount in the piano roll (1/8, quarter, etc.) that you can adjust to be smaller than the default. Pressing Alt+Ctrl while dragging the note start/end ignores snapping. This method may still work for you, but there's a bit of a learning curve involved.
>>41707571
I understand how the program works, but fermatas and various smooth transitions between tempos are extraordinarily difficult to work with in a system like that. Yeah, you can make an arbitrary tempo transition, even a smooth one, but good luck trying to time that to an existing piece of music without being off on your timing. I used to work with Logic and FL Studio a lot, so it's not like I'm new to this process, just rusty.
>>41707278
What do you mean by un-cropped? Regular resampling can be done with ffmpeg, e.g. the one-liner below. Or do you want to somehow extend the spectrum of the audio?
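For reference, a plain resample just changes the sample rate; it adds no spectral content above the original ~11kHz Nyquist limit (file names here are placeholders):

ffmpeg -i input_22050.wav -ar 48000 output_48000.wav

Anything above that frequency ceiling has to be hallucinated by a model like the AudioSR one linked above.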
>>41708621
The thing from that anon's first link: to add back the information that got cropped out of the low-quality audio.
Late night bump.
>>41709527
later night bump
>>41710355
G4 instrumental album might be useful for isolating vocals?
>>41710492
Most of the G4 instrumentals and vocals were leaked a long time ago. See the 2019 leak. It even has vocals at different stages of sound processing.
>>41710706
Except at least Glass of Water. I guess it is likely to help extract better vocals for some songs.
>>41710492
>>41710706
>>41710711
True, True Friend is not what is in the show
>https://files.catbox.moe/tvkahf.mp3
>Rags To Riches - Tony Bennett - cover with Vinyl Scratch
I was thinking of using the Rarara voice, but then again I feel like VS isn't used often enough. Some of the instrumental and vocal segments did not separate correctly, causing the RVC to derp out. I did my best trying to correct a few words with the steps GPT-SoVITS -> TalkNet -> RVC.
https://x.com/fifteenai/status/1865439846744871044
>The past and future of 15.ai
>The plan was always to make the backend open source when the time was right — sadly, much before that time could come, I got hit with a notice saying that I couldn’t do that at all
What a bizarre lie
>>41711504
>The plan was always to make the backend open source
Ah yes, that's why he was as vague and secretive as possible for years since the inception of the project and didn't open-source a single thing.
> 2 more weeks
>>41710938
I wonder if she's seldom used because of her lack of voice in official media. Anyway, here are the two song covers I did using her.
https://files.catbox.moe/64h78h.mp3
https://files.catbox.moe/3tux3i.mp3
Dumping full Twitter post since there's no other way to see it (if this retarded site will let me)
>The past and future of 15.ai
>I’ve been meaning to write this for a long time, but I’ve never been good at writing things on social media. I know it’s been a while since I posted anything, but I want to reflect on how 15.ai came to be, share some of my thoughts, and talk about the future of the project.
>The idea of 15.ai started as early as 2016 when I stumbled upon a paper written by DeepMind called “WaveNet: A Generative Model for Raw Audio” as an undergrad at MIT – I was 18 years old. That paper lit a fire under me. I didn’t just want to learn about AI voice generation – I wanted to push it further, to see what was really possible. I dove headfirst into it, fully convinced I could refine this technology and explore its potential like no one else had.
>For three years, I worked on this project alongside my undergraduate studies, and in 2017, the famous Tacotron2 paper was released. In 2019, I gave a lecture and presentation about my findings, as I was able to replicate the results of WaveNet and Tacotron but with about 25% of the data they claimed was necessary (shoutouts to Dr. Edelman at MIT, he’s a great guy and none of this would have happened without him). I had originally planned to base my PhD dissertation on this work and bring that percentage down even lower (my extremely audacious prediction at the time was that you only needed 15 seconds of data to replicate a person’s voice; hence, the name 15), but when the startup I was working on with friends was accepted into the Y Combinator incubator on the very day I had to decide, I chose to enter the industry instead.
>Fast forward about a year and a half, I left the startup in 2020 for various reasons. While I had made a good amount of money working in the industry, my exit wasn’t exactly on the best terms. I felt pretty angry that I had given up my dream of pursuing a PhD and becoming a professor for something that ultimately left me feeling unfulfilled. So, I threw myself back into research. I wanted to prove that the ideas I had once planned for my dissertation weren’t just credible – they were groundbreaking. But the grad school application cycle had already passed, and I wasn’t about to wait a year to apply.
>Instead, I decided to take matters into my own hands. The best way to get my work noticed was to show it off. No gatekeeping, no barriers – just a free, accessible tool for anyone to use. I wanted to democratize AI research. I wanted to give people something that didn’t require coding skills or expensive hardware, something they could just use and be amazed by.
>>41711970
>I got to work right away. I hacked together a functional frontend and backend for the website while scouring the Internet for interesting data sources, since well-known speech corpora like LJSpeech were boring. The whole point of the project was to prove that it was possible to replicate speech accurately with as little data as possible. Cloning a monotone voice that enunciates syllables slowly and coherently wasn’t all that impressive; real speech has complex undertones and nuances, and I wanted to capture that challenge.
>That’s when I found the goldmine: My Little Pony: Friendship Is Magic. I was familiar with the show – I had watched it when I was in middle school, but I hadn’t engaged with the fandom in years because of my studies. What truly impressed me was the dedication of the show’s fans from the “Pony Preservation Project”, who had compiled an extensive speech corpus unlike anything I’d seen before. Every single line from the show had been meticulously trimmed, denoised, transcribed, and even emotion-tagged. This was work that no other fandom had ever achieved at the time. (This was 2020 before any of that could be automated – this had all been done by hand.)
>With this newfound data source, I found myself at a turning point. I realized that with this, I could not only push the boundaries of my research but also demonstrate the true potential of what this technology could achieve. I extracted the data from the PPP along with multiple other data sources that I had to manually transcribe (like the voices for GLaDOS, Wheatley, SpongeBob, the Narrator from The Stanley Parable, etc.), trained separate models on the data, and hosted them on the website. The design of the website was intentional – while it was supposed to be very easy to use, I didn’t want my research to go unnoticed. That was why I had included a bunch of relevant numbers, graphs, etc. next to the generated audio files.
>As I added more voices to the website, I realized it was possible to encode all the speakers into a single embedding, which would allow me to train all of the voices simultaneously instead of sequentially, saving me a huge amount of time on research and development. Near the end of 2020, I released a version of the website that added over 50 character voices to the website at once – a huge step up from the 7 or 8 or so I had previously.
>>41711975
>Then, 2021 happened. The website exploded. It was all over Twitter, YouTube, and eventually, news outlets. Before I knew it, I was getting slammed by millions of requests every day. Autoscaling on AWS quickly turned into a nightmare, and as I watched the charges rack up, I realized I was in for a long ride. At its absolute peak, I was charged $12K for a single month (yes, you read that right), which included costs for training, inference, hosting, and everything else needed to keep the site running. But honestly? I was too stubborn to stop. I knew what I was getting into, and as a 23-year-old living alone, it was terrifying – but also kind of thrilling.
>The attention came with offers – job interviews, acquisition proposals, you name it. I turned them all down. In hindsight, maybe not the smartest move, but I didn’t want to monetize the project or turn it into a job. I was afraid that would kill the joy I had for it. I just wanted to build something cool and keep improving it. So, I kept quiet and decided to focus entirely on expanding the list of characters and improving the underlying technology.
>In early 2022, the whole Voiceverse NFT plagiarism thing happened, which pissed me off, but ultimately it didn’t do anything in the long run. So there’s that.
>Then, in the middle of 2022, things started to go wrong. I received multiple complaints of copyright violations, and I received a cease-and-desist letter. I dismissed it as unimportant and chose to disregard it, since, technically, copyright law surrounding generative AI at the time was on my side. But due to certain other details that I can’t share here, I was effectively forced into stopping operations of the website immediately without warning or preparation.
>I wanted to bring back the original website as quickly as possible, but my only option was to pivot to something that steered clear of copyright issues. That was easier said than done. I had built my reputation on doing things differently, on showing that I could take on challenges others wouldn’t touch, and now I was in a position where I had to tread carefully. It was frustrating as hell, but I knew I wasn’t going to let this project die – not when I’d come that far.
>Looking back, I’ll admit I was a bit egotistical during this time. I thought I could handle everything on my own: the scaling issues, the legal headaches, the insane costs. I thought I was untouchable because, honestly, I believed I was doing something no one else could. And maybe I still believe that to some extent, because even now, I’m proud of what I built. But I can also see now that my stubbornness might’ve cost me. Maybe if I’d accepted a few offers or reached out for help, things could’ve been different.
>>41711980
>Even so, I don’t really regret the core decisions I made. I wanted to create something that mattered, something that made people think, “Wow, someone really built this and gave it away for free?” And I like to think that I succeeded. 15.ai wasn’t just a tool; it was proof that cutting-edge technology and AI doesn’t have to be locked behind paywalls or reserved for corporations. It was a challenge to the status quo, and it was also a little bit of me flexing.
>As for what’s next, I’m still figuring that out. The copyright issues, the shutdown – it all sucked, but it didn’t break me. I’m still working, still thinking about how to bring this back in a way that’s better, smarter, and maybe just a bit more sustainable. I have some ideas, but if I’ve learned anything from all this, it’s that nothing goes exactly as planned. So, I’m going to keep pushing, keep experimenting, and keep doing things my way. Because if nothing else, that’s what got me here in the first place.
>Thanks to everyone who stuck around during the highs, the lows, and everything in between. Whether you loved the site, hated it, or just thought it was interesting, you’re part of what made this whole thing worth it.
>- 15
>P.S.: For journalists, researchers, or anyone else with questions about the project, feel free to reach out to me at 15@15.ai. I’m always open to discussing the journey, the tech, or whatever else you’re curious about.
https://x.com/fifteenai/status/1865439846744871044
https://nitter.poast.org/fifteenai/with_replies
>>41711985
This is proof that you can be some genius obsessed with academics, yet still completely and utterly fail if you don't have the balls to do or say anything. He never even vaguely stated the truth at any point - it has taken him multiple years to do even that bare minimum. His work was absolutely impressive, but after things went downhill, all he ever did was his annual report of "it's coming back soon, I swear."
>The plan was always to make the backend open source when the time was right — sadly, much before that time could come, I got hit with a notice saying that I couldn’t do that at all
>I never had a Patreon because I wanted it — people felt like they had some obligation to donate money (even though I wrote that I wanted no handouts) so I asked someone else to make the account for me. I never wanted to make money off this project.
>As for “keeping people in the dark”, as much as I wanted to tell people what happened, I didn’t even know what was going to happen at the time given the volatility of the space. If I said anything, I’d probably open myself up to something else. If you want a refund, let me know.
>Making a Patreon because people told me to was a huge mistake, honestly. It saddens me to see so many people think my project was a scam because I didn’t notify people even though I never wanted to make money to begin with.
Right.
Every time, this guy does something and then abruptly walks away with no explanation. This happened with Mare Fair early on, before the venue incident. Luckily they sorted it out without much trouble, but again, >15. Then later he comes out to cry about how hard he's had it. Please, spare us the theatrics. Either deliver on your promises, or don't. Just say it, man.
>>41711985
The thing is, if he had just released the code to the public, then no company would have been able to put the cat back in the bag. It would still be available.
>>41711975
>That’s when I found the goldmine: My Little Pony: Friendship Is Magic. I was familiar with the show – I had watched it when I was in middle school, but I hadn’t engaged with the fandom in years because of my studies. What truly impressed me was the dedication of the show’s fans from the “Pony Preservation Project”, who had compiled an extensive speech corpus unlike anything I’d seen before. Every single line from the show had been meticulously trimmed, denoised, transcribed, and even emotion-tagged. This was work that no other fandom had ever achieved at the time. (This was 2020 before any of that could be automated – this had all been done by hand.)
Coward.
Ah yes, there was this obscure (by this point) TV show, and I just so happened to randomly stumble across this even more obscure thread in a hated corner of the internet, and they just so happened to have really good manually trimmed sets.
I'm also just baffled by how totally contrary this is to the show's lessons. He refused to get any help, never stood by his alleged principles through open-sourcing his work (you can always do it somehow no matter what threats you got, let's not bullshit ourselves here), and ultimately wanted all of the fame and glory for himself even if he tries to downplay it. This is literally S0 Twilight.
I'm just... disappointed, I guess. I hoped for more from this guy.
>>41712185
People who act like him are either geniuses or fucking retards. The outcome in this case is unfortunate.
>>41711970
Anybody want to update 15's Wikipedia article citing these tweets?
So did anyone do an auto pony image tagger yet?
>>41713908
Last year I was trying out some of the auto1111 addons that did that, but from what I remember, sadly most of them were a bit sucky in the quality of the descriptions they produced.
>>41712487
Wasn't that article marked for deletion?
I wonder: can a music-lyrics separator, voice recognition and TTS all be trained at once on the instrumental+vocal tracks from the leak? And how would it perform?
>>41713908
Not a good one to my knowledge. Using the Pony Diffusion encoder + kmeans might be an easy way to do it.
I would like to raise a toast to keeping the HaySay AI alive, so that even I, with my potato PC, can make pony voice stuff.
>>41711970
His project will be buried with him. Good riddance.
>>41687414
Is there some generation speed-up when using precomputed values compared to passing the reference audio and reference text?
It's sad to see that you're all angry, jealous and bitter about 15. Just because he's better than all of you put together.
>>41717165
Should I put that on your epitaph, 15?
>>41717165
It's... not that we're jealous, we're just... kinda exhausted. Exhausted of him making promise after promise and not being straightforward with us. Is he talented? Absolutely! But his hubris is holding him back.
15.ai still had the closest inflection to the show, and the best controllability, of any TTS model. The project needs to return in some form or we will definitely be taking a big step backwards in those respects.
>>41717393
Nah, sovits is good enough
>>41717393
>will
It seems you missed the news, but 15's TTS has been unavailable for a very long time now. In fact, it has been so long that we now have an open source competitor on the same level, minus some control.
so does this actually work? can i get an ai of my waifu?
>>41717159
I haven't run any benchmark or anything, but based on the limited manual testing I've done so far, there is negligible impact on the generation speed.
>>41717165
Holy missed point, Batman.
>>41718749
SillyTavern has had some TTS options for a while, and even a module for speech-to-text, so yeah, you can chat with your waifu if your GPU is chunky enough to run all these tools together.
>>41711970
>my extremely audacious prediction at the time was that you only needed 15 seconds of data to replicate a person’s voice; hence, the name 15
I thought it was a reference to GR15.
So 15.ai is dead. At least we finished playing catch-up.
>>41719881
Really would have been easier if we had certainty months ago.
ElevenLabs won, unfortunately. They're also about to release a singing/music AI that might make Udio look like a baby.
Time to put ElevenLabs to good use, cause using your local machine to generate good audio is a pipedream unless you got 30k-90k to spare.
>>41722084
GPT-SoVITS works on my machine just fine
>>41722084lollmaonigger
>>41722084
>Time to put ElevenLabs to good use
Lol. Roflmao, even.
>page 10
This is sad. I offer you guys a project to work on so you stay afloat... Redub Tamers' videos. I'll even pick it for you, cause obviously you'll bitch about which video to pick.
https://www.youtube.com/watch?v=UubZA0fNiYg&ab_channel=Tamers12345
>>41722986
>Tamers
Inb4 drama.
>>41723024
+1, agreed. It's not really PPP related, and there's still the whole of FiM season 1 of the redub to be done.
>>41723036
>filename
Uh, what?
>>41722400
Okay, what's the beef with ElevenLabs? I don't like that it's not open source, but is that the reason everyone hates it here?
>>41723226
ElevenLabs are grifters that steal other people's code and turn it into a paid service
>>41723415
And yet they btfo everyone else?
>>41723451
>>41723451
okay okay, but seriously, why the fuck are you here if you want to use ElevenLabs?
>>41712185
>I'm also just baffled by how totally contrary this is to the show's lessons. He refused to get any help, never stood by his alleged principles through open sourcing his work (you can always do it somehow no matter what threats you got, let's not bullshit ourselves here), and ultimately wanted all of the fame and glory for himself even if he tries to downplay it. This is literally S0 Twilight.
Funny you say that when he's a Twifag.
Didn't he say he's never seen an episode? Or has that changed?
I'd also like to know how many board projects were killed by ego. This is getting out of hand.
Not sure where to ask this but this seems like a good place. There's a project I'm working on and I'll need archive scrapes of /mlp/. Are there any I can download, even if they aren't up-to-date by over a year? Or will I have to start over?
>>41723830
I swear I remember someone talking about it in the threads... in like 2022 or something like that? Hopefully one of the codefags will know more about this.
>>41723415
Other people's code? As in... there was tech out there that could've been open source but ElevenLabs got to it first? Or how does it work?
anyone member cookie? I member.
>>41711970
>>39874017
I wasn't sure whether to be upset or understanding. Probably upset, since nobody would have been able to stop this if he had just made it open-source instead of protecting his ego by insisting he gets all the credit. Now it's kill. And we will never have anything like it again. I don't know how well the claim of C&Ds killing the project holds up, but I heard pony voices aren't on uber*uck anymore, supposedly because of the same legal issues 15 claims to have, though I refuse to make an account to verify this.
I see a lot of disappointment expressed towards ignoring the classic /mlp/ advice of ignoring C&Ds, like nothing will ever happen. Perhaps composing these posts is why 15 won't be going to Mare Fair next year. I know /mlp/ has a tendency to just turn on people once they stop being le based (although Imalou got what was coming to her), so that's probably all he can do. If there's anything to learn from this, never let your ego run the show.
>>41723679
>Didn't he say he's never seen an episode?
>I had watched it when I was in middle school
nice reading comprehension, retard
>>41726107
Wait, what happened to Imalou?
>>41726334
caved into pressure from twitter and denounced /mlp/
>>41726334
An anon fucked her so silly she turned bi
any update on gpt-so-vits coming to haysay.ai?
Does anyone know if it's possible to make a Twilight AI cover of this song? If so, could someone try?
https://files.catbox.moe/o60yol.mp3
Zero-Shot Mono-to-Binaural Speech Synthesis
https://arxiv.org/abs/2412.08356
>We present ZeroBAS, a neural method to synthesize binaural audio from monaural audio recordings and positional information without training on any binaural data. To our knowledge, this is the first published zero-shot neural approach to mono-to-binaural audio synthesis. Specifically, we show that a parameter-free geometric time warping and amplitude scaling based on source location suffices to get an initial binaural synthesis that can be refined by iteratively applying a pretrained denoising vocoder. Furthermore, we find this leads to generalization across room conditions, which we measure by introducing a new dataset, TUT Mono-to-Binaural, to evaluate state-of-the-art monaural-to-binaural synthesis methods on unseen conditions. Our zero-shot method is perceptually on-par with the performance of supervised methods on the standard mono-to-binaural dataset, and even surpasses them on our out-of-distribution TUT Mono-to-Binaural dataset. Our results highlight the potential of pretrained generative audio models and zero-shot learning to unlock robust binaural audio synthesis.
https://github.com/google-research/google-research
Might be posted here. Downstream will augment AR and VR experiences.
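Not the paper's code, but the parameter-free first stage it describes (geometric time warping + amplitude scaling) is simple enough to sketch from basic acoustics. A toy version in Python/numpy, assuming ears 9cm either side of the origin and free-field 1/r attenuation:

import numpy as np

SPEED_OF_SOUND = 343.0  # m/s

def naive_binauralize(mono, sr, src_pos, ear_offset=0.09):
    # One delayed, scaled copy of the mono signal per ear:
    # the delay encodes the interaural time difference, the gain the level difference.
    mono = np.asarray(mono, dtype=float)
    ears = np.array([[-ear_offset, 0.0, 0.0], [ear_offset, 0.0, 0.0]])
    out = np.zeros((2, len(mono)))
    for ch, ear in enumerate(ears):
        dist = np.linalg.norm(np.asarray(src_pos, dtype=float) - ear)
        delay = int(round(dist / SPEED_OF_SOUND * sr))  # delay in samples
        gain = 1.0 / max(dist, 1e-3)                    # 1/r amplitude falloff
        out[ch, delay:] = gain * mono[:len(mono) - delay]
    return out

# e.g. a source 1 m away, 45 degrees to the right:
# stereo = naive_binauralize(mono, 48000, (0.7, 0.7, 0.0))

The paper's actual contribution is refining this crude estimate by iteratively running it through a pretrained denoising vocoder; the geometric warp alone sounds quite artificial.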
LatentSpeech: Latent Diffusion for Text-To-Speech Generation
https://arxiv.org/abs/2412.08117
>Diffusion-based Generative AI gains significant attention for its superior performance over other generative techniques like Generative Adversarial Networks and Variational Autoencoders. While it has achieved notable advancements in fields such as computer vision and natural language processing, their application in speech generation remains under-explored. Mainstream Text-to-Speech systems primarily map outputs to Mel-Spectrograms in the spectral space, leading to high computational loads due to the sparsity of MelSpecs. To address these limitations, we propose LatentSpeech, a novel TTS generation approach utilizing latent diffusion models. By using latent embeddings as the intermediate representation, LatentSpeech reduces the target dimension to 5% of what is required for MelSpecs, simplifying the processing for the TTS encoder and vocoder and enabling efficient high-quality speech generation. This study marks the first integration of latent diffusion models in TTS, enhancing the accuracy and naturalness of generated speech. Experimental results on benchmark datasets demonstrate that LatentSpeech achieves a 25% improvement in Word Error Rate and a 24% improvement in Mel Cepstral Distortion compared to existing models, with further improvements rising to 49.5% and 26%, respectively, with additional training data. These findings highlight the potential of LatentSpeech to advance the state-of-the-art in TTS technology.
https://github.com/haoweilou/LatentSpeech
Code is up. Might actually be useful.
Multimodal Latent Language Modeling with Next-Token Diffusion
https://arxiv.org/abs/2412.08635
>Multimodal generative models require a unified approach to handle both discrete data (e.g., text and code) and continuous data (e.g., image, audio, video). In this work, we propose Latent Language Modeling (LatentLM), which seamlessly integrates continuous and discrete data using causal Transformers. Specifically, we employ a variational autoencoder (VAE) to represent continuous data as latent vectors and introduce next-token diffusion for autoregressive generation of these vectors. Additionally, we develop σ-VAE to address the challenges of variance collapse, which is crucial for autoregressive modeling. Extensive experiments demonstrate the effectiveness of LatentLM across various modalities. In image generation, LatentLM surpasses Diffusion Transformers in both performance and scalability. When integrated into multimodal large language models, LatentLM provides a general-purpose interface that unifies multimodal generation and understanding. Experimental results show that LatentLM achieves favorable performance compared to Transfusion and vector quantized models in the setting of scaling up training tokens. In text-to-speech synthesis, LatentLM outperforms the state-of-the-art VALL-E 2 model in speaker similarity and robustness, while requiring 10x fewer decoding steps. The results establish LatentLM as a highly effective and scalable approach to advance large multimodal models.
https://github.com/microsoft/unilm/tree/master/LatentLM
Code is up. Outperforms the VALL-E 2 model in speaker similarity and robustness.
>>41726334
TDS struck her badly.
>>41726524
Sadly, the audio separator is struggling a lot with it. The recording is very low quality, and the vocals are too quiet to cut through the sound of the guitar.
Bump.
Does anyone have a Minuette/Colgate voice file? In an .onnx format. I am using Piper
>>41726497
It's coming very soon. I pushed images to Docker Hub last night but discovered a couple of bugs after some additional testing, so I do NOT recommend updating Hay Say yet if you have it installed locally. I plan to fix the bugs tonight and deploy to haysay.ai either tonight or tomorrow evening.
I have a bunch of voices downloaded from Hugging Face; they are .pth and .index files. I don't know what uses them, but I would like to know if anyone here knows where I can use them. I tried alltalktts2 but it sounds like garbage; even with a 5-minute voice clip that I painstakingly transcribed, it still sounds like an early 15.ai voice.
>>41727805
>.pth and .index files
Uhh, could it be RVC? If you have a 5-minute dataset, you can easily train RVC + sovits and GPT-SoVITS with it.
>>41726576
Why?.. DSP methods are more controllable and physically accurate.
>>41726596
Looks interesting. Maybe somepony will generate test audio.
>>41726641
This might become one model for STS and TTS. And STT if you need it.
>>41728480
STS? Speech-to-speech? What, like changing from one voice to another?
>>41728482
Yes. Like what so-vits-svc does.
I noticed noise at the end of audio in https://mega.nz/folder/jkwimSTa#_xk0VnR30C8Ljsy4RCGSig/folder/Kwp33AQA
So where are the Colgate ai voice files?
Don't know how important this is, but I got one generation out of this and then the next one shit the bed.
>first gen = "Hnng..."
>second gen (the one that broke) = HEEEEH!!!!!
just fucking around with it of course
>>41729142
I think StyleTTS2 is tripping over the extra exclamation marks. "Heeeeh!" seems to work.
>>41729157
Cold starting my glimmy
https://files.catbox.moe/i2imfe.mp3
>>41729769
That's a shame. What about this song? Does it have the same issues as well?
https://files.catbox.moe/d9q6s3.mp3
>>41729193
Glimmy on the autobahn
https://files.catbox.moe/pubqox.mp3
>>41729193
>>41729234
>sensiblechuckle.jif
>>41729234
Got this with Rarity once.
https://files.catbox.moe/5p3rki.mp3
>>41729194
This is more doable. There is still some instrumental leakage into the vocals, but that should be easy to clean up.
>>41729769
That's good to know. Curious, are you actually doing the whole AI cover? Since I'm not incapable of doing it myself
>>41730121
i am incapable*
>>41730121
Ehh, please wait a bit as I am kind of busy with way too many holiday-related projects.
>incapable of doing it myself
It's not really that difficult, even if all you have is a potato PC/laptop. You just need to be able to use some kind of vocal separator (either Ultimate Vocal Remover for offline use, or one of the few online ones), then clean up the vocals a bit with Audacity and chop them up into 3~10 second segments (see the sketch below). After that, just use haysay's RVC and sovits 5. This process (when I can actually sit down and only focus on it) takes less than two hours to render all the clips and put them together into a new cover.
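If you'd rather script the chopping step than do it by hand in Audacity, here's a minimal pydub sketch (file names are placeholders): it splits the separated vocal track on silences and keeps only segments in the 3-10 second range.

from pydub import AudioSegment
from pydub.silence import split_on_silence

vocals = AudioSegment.from_file("vocals.wav")
chunks = split_on_silence(
    vocals,
    min_silence_len=400,              # ms of quiet that counts as a break
    silence_thresh=vocals.dBFS - 16,  # relative to the track's average level
    keep_silence=100,                 # pad each chunk a little
)
for i, chunk in enumerate(chunks):
    if 3_000 <= len(chunk) <= 10_000:  # pydub lengths are in ms
        chunk.export(f"chunk_{i:03d}.wav", format="wav")

Overly long chunks would need the silence threshold loosened or a second pass to split them further; this is just the lazy version.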
>>41730144
I don't have a laptop or a PC, and I'm fine waiting, so take your time, mate.
>>41730224
Well, Mr. Phoneposter, I have two versions for you then:
https://files.catbox.moe/awanqk.mp3
>Solo Twi
https://files.catbox.moe/0ym3y9.mp3
>Duo with reference vocals
I can feel in my bones how much RVC and sovits were struggling with the non-English words, so I think the duo version is the better of the two.
>>41730297
Sounds better than I thought it would, thank you so much, mate.
>>41729194
Just for the record, when you make an AI cover, the singer needs to be in the same vocal range as the pony you're trying to replace them with. If you have a song where the original vocals are sung by a guy with a deep voice, trying to replace it with Twilight won't work.
>>41730472
My dude, that's what the pitch change settings are for.
>one of the most active generals on this board was reduced to a rotting corpse because of one man
>15 single-handedly crippled the PPP and put it on life support
kinda based, ngl
>>41730607
Unless you can change the vocals by a full octave, you can't really change the pitch, as that would make the vocals sound out of tune with the music.
>>41730656
>change the vocals by a full octave
...Dude, I'm no musical expert, but this has been pretty well explained in past threads: one octave = 12 semitone shifts. That's what happens when you use RVC/sovits and change the pitch by (plus/minus) 12 or 24 (or other multiples of 12) to get the correct octave (quick arithmetic below). Yeah, the edited vocals will not sound 100% as good as if the original VA had sung it, but it's still way better than using TalkNet or re-editing TTS clips.
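The math behind that, for anyone curious: each semitone multiplies frequency by 2^(1/12), so a +12 shift exactly doubles the frequency (one octave up) and keeps everything in key. A quick sketch for picking the shift value (the 110/220 Hz figures are just illustrative):

import math

def semitone_ratio(n):
    # +12 semitones -> 2.0 (one octave up); -12 -> 0.5 (one octave down)
    return 2.0 ** (n / 12.0)

def semitones_between(f_source, f_target):
    # e.g. a male singer around 110 Hz vs a mare around 220 Hz -> +12.0
    return 12.0 * math.log2(f_target / f_source)

print(semitone_ratio(12))           # 2.0
print(semitones_between(110, 220))  # 12.0

Any shift that isn't a multiple of 12 transposes the melody into a different key, which is why a one-semitone tweak sounds off against the original instrumental.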
Up.
GPT-SoVITS v2 has (finally!) been added to Hay Say. You have the choice of either uploading reference audio of the character speaking or selecting an emotion from a dropdown list. If you select from the dropdown list, then Hay Say will randomly select from a set of precomputed embeddings for that emotion. This update also includes a security enhancement that restricts network access for most of the Docker containers. That ensures that if there are any malicious packages somewhere in the software dependency tree, they can't phone home.

Note: The way you select the reference audio for GPT-SoVITS in Hay Say is different from how you do it for StyleTTS2. I think this way is a little better. I hope this inconsistency in the UI isn't too confusing.

I'm a little disappointed at the server's performance. Generating one or two sentences with GPT-SoVITS on haysay.ai takes about 40 seconds. I have a couple of projects on the horizon that should improve performance across the board, but I'm going to take a break from Hay Say development for a little while to focus on another (but still pony-related) project.
>>41731435
>>41731435
>Generating one or two sentences with gpt so-vits on haysay.ai takes about 40 seconds
On CPU? That's already fast enough.
>>41730886
I have made several AI covers, and for the average vocals, transposing a full octave is way too much, unless you like Alvin and the Chipmunks or something. Here's an example of something I was working on. The first minute or so is normal, then the clip repeats with the pitch of Cadance's vocals raised as if to mimic what it would sound like if the original vocals were a bit too low for her range and needed some minor pitch work. This is one single semitone, and you can notice right away how off-key it is.
>https://files.catbox.moe/b2wcon.mp3
>>41731435
That sounds cool. I'll have to try that out sometime. Sounds like it would be a nice tool for doing fanfic readings/radio plays or something like that.
>>41732320
>change of pitch mid song
Yep, that will fuck up any conversion if you try to use the exact same settings across the clips. In this case you could try to run the clips as a different voice to "normalize" the pitch and then apply the desired pony voice to the output, or find some alternative cover of the song and take the vocal audio from that instead.
>>41730297
It has such a heavy accent it's almost unintelligible at times, but I like the idea.
Why does so-vits-svc 5.0 sound terrible compared to 4.0?
Did anypony try https://github.com/yl4579/StyleTTS-VC and was it good?
There is a new paper:
https://arxiv.org/abs/2409.10058
https://styletts-zs.github.io/
>>41723830
There's a 2019 DB dump from Desuarchive. It's the most recent /mlp/ archive dump as far as I can tell.
https://archive.org/details/desuarchive_db_201909
GPT-SoVITS doesn't handle Pinkie well. I used a line from one fanfic as a test:
>Why would you say that, Sunset? I mean, this is super exciting! Here, I read the backside on the DVD yesterday. Did you know that Rainbow Dash and Applejack have the same voice actress? The same goes for me and Fluttershy!
>>41734872
I think the model has problems handling an input string that long. Try breaking it up into chunks of no more than 1-2 sentences, e.g. with the throwaway script below.
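A minimal sketch of that chunking in Python (splitting on sentence-ending punctuation and grouping two sentences per request):

import re

def chunk_sentences(text, per_chunk=2):
    # Split after ., ! or ? followed by whitespace.
    sentences = re.split(r'(?<=[.!?])\s+', text.strip())
    return [' '.join(sentences[i:i + per_chunk])
            for i in range(0, len(sentences), per_chunk)]

line = ("Why would you say that, Sunset? I mean, this is super exciting! "
        "Here, I read the backside on the DVD yesterday. Did you know that "
        "Rainbow Dash and Applejack have the same voice actress? The same "
        "goes for me and Fluttershy!")
for chunk in chunk_sentences(line):
    print(chunk)

Generate each chunk separately and concatenate the audio afterwards.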
>>41734872
Can you upload your result so we can compare later?
>>41735885
How many of those have you made?
>>41736373
I go to Bing and generate a new one every time I want to boop the thread. This is the ultimate consequence of being able to create endless slop with no effort.
I wish there were more covers of original MLP songs made with Suno, Udio, etc. I think these sites are very well suited for this, much better than most original songs people make with them.
They have a free trial of V4 with the mobile app, and I put in a low-effort WWU in metal style:
https://suno.com/song/5b6b728a-b027-437f-8e58-6a0bf1596b83
>>41737527
I do try, but getting inspiration for writing good quality lyrics is pretty difficult (even with AI text models, since those have been poisoned with a dataset of lowbrow pop-music lyrics).
>>41736962
It does have its merits though.
Somewhere within this decade or the next, an 18 year old is going to build a robot pony waifu with advanced 2030s AI and grow old with it, like Sweetie Bot but real. And well, I think about the possible repercussions of having a partner that cannot and will never physically age, at least the way humans do. Imagine your great grandnephew or grandniece powering up your rusting 80 year old robot pony waifu long after you've passed away, I say niece or nephew because it's already implied you won't be having kids but your siblings did. What would they do with it? Would the pony rather be dead knowing that its creator died nearly 20 years ago? What exactly would she want to exist for, having outlived her purpose, to make (You) happy? I think about this stuff way too much. It's definitely going to happen.
CosyVoice 2: Scalable Streaming Speech Synthesis with Large Language Models
https://arxiv.org/abs/2412.10117
>In our previous work, we introduced CosyVoice, a multilingual speech synthesis model based on supervised discrete speech tokens. By employing progressive semantic decoding with two popular generative models, language models (LMs) and Flow Matching, CosyVoice demonstrated high prosody naturalness, content consistency, and speaker similarity in speech in-context learning. Recently, significant progress has been made in multi-modal large language models (LLMs), where the response latency and real-time factor of speech synthesis play a crucial role in the interactive experience. Therefore, in this report, we present an improved streaming speech synthesis model, CosyVoice 2, which incorporates comprehensive and systematic optimizations. Specifically, we introduce finite-scalar quantization to improve the codebook utilization of speech tokens. For the text-speech LM, we streamline the model architecture to allow direct use of a pre-trained LLM as the backbone. In addition, we develop a chunk-aware causal flow matching model to support various synthesis scenarios, enabling both streaming and non-streaming synthesis within a single model. By training on a large-scale multilingual dataset, CosyVoice 2 achieves human-parity naturalness, minimal response latency, and virtually lossless synthesis quality in the streaming mode.
https://funaudiollm.github.io/cosyvoice2
https://github.com/FunAudioLLM/CosyVoice
https://www.modelscope.cn/studios/iic/CosyVoice2-0.5B
https://huggingface.co/FunAudioLLM
Code is up. ModelScope has a demo with a Chinese UI. No weights uploaded to HF yet.
Multilingual, though the majority of the voice data was Chinese, with English second (some Japanese/Korean). Can voice clone after a fine-tune. The example page has a good one of Elon.
>>41739401
>Emotions, speaking rate, dialect, 'role playing'
This is the kind of TTS control I fucking wish we had all along.
>>41739401
https://www.modelscope.cn/models/iic/CosyVoice2-0.5B/summary
Weights
Really enjoying GPT-SoVITS. It does Rarity really well. This was done with the anxious preset.
https://files.catbox.moe/prycko.mp3
I did try to upload reference audio and it failed out. Is there a format or length requirement?
>>41741296
>length required
I think the OG training files note to only use 3~10s clips, so I'm guessing that would apply to the UI TTS as well?
WAV + OGG are missing as export formats, and the mp3 format doesn't work when download is attempted. I'm assuming something changed recently; it was working fine ~a week ago.
>>41741520
On the website, I should clarify.
>>41741296
Glad to hear you're enjoying it! That's a nice generated Rarity clip. I find it interesting that it generated little gasps/breaths in the right places too. I played around with using reference audio for a bit but was unable to generate any errors. Could you provide details about the reference file you uploaded, or post the error message if you are able to reproduce the issue again? There is no length requirement; I commented out the code that throws an error if the reference is too short or too long (https://github.com/hydrusbeta/GPT-SoVITS/blob/main/GPT_SoVITS/inference_webui.py#L454). Any format that can be read by Librosa should work (which covers a ton of formats). Internally, Hay Say converts the file to .wav and tells GPT-SoVITS to use that file.
>>41741520
Ah, shoot. The code for saving to different file formats looks old; I think I never committed my updated code to git. I'll work on fixing that now. Thanks for bringing it to my attention.
>>41741850
I'm stuck.
>>41741860
Sorry to see you're getting that error. Strange that it's reporting an unexpected end-of-file. Can you pull the individual images one at a time? Try this:
docker pull hydrusbeta/hay_say:hay_say_ui
If that works, then you can do the same for the rest of them:
docker pull hydrusbeta/hay_say:so_vits_svc_3_server
docker pull hydrusbeta/hay_say:so_vits_svc_4_server
docker pull hydrusbeta/hay_say:so_vits_svc_5_server
docker pull hydrusbeta/hay_say:rvc_server
docker pull hydrusbeta/hay_say:styletts2_server
docker pull hydrusbeta/hay_say:gpt_so_vits_server
docker pull hydrusbeta/hay_say:controllable_talknet_server
If it fails consistently on one particular image, that could indicate that there's a corrupt layer on one of them, somehow. Please let me know if that's the case. If you successfully pull all images, then try docker compose up again. In the meantime, I'll try installing Hay Say on a system I haven't installed it on before, to see if I can reproduce the issue. I also see a TLS handshake timeout at the top of your screenshot. Any nonstandard network stuff going on? For example, are you on a VPN or using a proxy?
>>41741520
The file format dropdown should be fixed now on haysay.ai. There are a lot of new options; it now supports all file formats that soundfile supports. I'll push an updated image to Docker tonight for local installs too.
>>41741860
>>41742076
I was able to install Hay Say on a Windows system that hasn't seen it before, so I think that rules out a corrupt image layer. Seeing as how your system was able to partially pull the images, I suspect it was some network glitch that randomly happened partway through the download. Try "docker compose up" again or see if you can pull the images individually.
>>41742076
>>41742304
>For example, are you on a VPN or using a proxy?
I'm not doing anything like that, but it worked after I spammed docker compose up several times until it completed, so it works now. I did a clean Windows install a few months ago and didn't have this issue, and I haven't changed any router settings. Thanks for checking though.
>>41742343
Great! Glad to hear you got it working.
>>41739000
She'll make a husbando modeled on you.
>>41742392
A colony of pony bots that multiply themselves would be an improvement.
>>41744445
>A colony of pony bots that multiply themselves
https://www.youtube.com/watch?v=dwG6MO92xtI
>>41745048
At least they won't eat your stockpiles, since they are robots.
>>41745264
>Gray goo scenario but it's cute mares
what a time to be alive
>>41742274
For GPT-SoVITS, are we also able to do other characters' voices not listed on haysay by using our own reference audio of said characters, including non-pony?
>>41745695
NTA, but could somebody test that? I'm kind of stuck phoneposting until after New Year.
>>41745695
Also, can we maybe get some more voice emotion options for the Mane Six and maybe other ponies, such as horny, seductive, hypnotized, etc.?
>>41745630
It's one of the better ways to go, I guess.
>10
Precautionary late night bump.
I don't like GPT-SoVITS; MaskGCT for now seems to be the best open source option, a shame it takes like 20GB of VRAM.
Is there any new exciting TTS tech coming up to look out for?
>>41748123
>takes like 20gb of vram
Christ, it's staggering how hard this stuff can go on hardware.
>>41748123
>maskgct
Uhh, I don't think that's included in standard GPT-SoVITS, since I can run that on my old 8GB of VRAM.
I know the OP post says G5 is not welcome, but it feels like a waste not to do anything with the raw voice actor files that have been leaking.
>>41749695
>Sparky
Is it just farting sounds?
>>41749702
His VA reads what the lines are, then does them.
https://vocaroo.com/16axVp3nhrTx
https://vocaroo.com/1oMVVJ019jMt
>>41749695
G5 is a clusterfuck VA-wise though. It starts all the way down with the change in the voice cast for the main characters.
What does Top K, Top P and Temperature do in GPT SoVITS?
>>41751186
+1 to that question.
>>41734531
I'll see what I can do with this.
>>41731435
>Meadowbrook model
>even sounds recognizable
https://voca.ro/16eMFsLvpesI
Fucking incredible, I can die happy now.
>>41749695
I guess if no one else is going to do it, I will.
This seems like it could be useful for workflows.
https://github.com/intel/openvino-plugins-ai-audacity/tree/main
>A set of AI-enabled effects, generators, and analyzers for Audacity®. These AI features run 100% locally on your PC -- no internet connection necessary! OpenVINO™ is used to run AI models on supported accelerators found on the user's system such as CPU, GPU, and NPU.
>Music Separation -- Separate a mono or stereo track into individual stems -- Drums, Bass, Vocals, & Other Instruments.
>Noise Suppression -- Removes background noise from an audio sample.
>Music Generation & Continuation -- Uses MusicGen LLM to generate snippets of music, or to generate a continuation of an existing snippet of music.
>Whisper Transcription -- Uses whisper.cpp to generate a label track containing the transcription or translation for a given selection of spoken audio or vocals.
>Super Resolution -- Upscales and enriches audio for improved clarity and detail.
NovelAI's new V4 anime model is actually pretty decent at mares, even though it's not directly intended. Could be useful for more artsy styles.
>>41754075
Can v4 do voices now?
>>41754165
NovelAI has pretty much only updated their TTS once, like... 3 years ago or something. They likely forgot it even exists.
>>41754678
So why exactly are you posting here instead of the AI art thread?
>>41754075
those are some neat twiggles
>>41754687
>Forgor
>More mares
>Limited time during break
>Improvement of existing tech highlighted here in the past
>Since when is the PPP thread voice only
Rainbow face went wurbwap
>>41754687
NTA, but I feel like it's a good idea to share AI news between the AI threads just to keep people updated.
>>41755173
Yeah, it's a good idea when it makes you money. Too bad you don't share news about anything else.
>>41755632
meds
https://civitai.com/models/833294?modelVersionId=1190596
The final version of NoobAI v-pred is out.
>>41757931
Almost there.
>>41759472
>>41757931
>>41706417
EqG voices for ponified EqGirls, so I can make more FiM ponies.
>>41759488
Mares forever!
>>41762138
Indeed.
>slow day to day free man!
>>41764480
Yeah, that can happen during Christmas.
Happy Hearth's Warming, everypreservationist!
>>41764743
You too, fellow preservationist!
>>41764743
So once again it's the cheery time of the year!
>>41764480
True.
>9
Nice bump thread, faggots
>>41751186
Top K is Top Kek, it measures how funny you want the output to be.
Top P is Top Pony, the higher it is the more the voice will resemble best pony.
Temperature is how hot AKA sexy you want it to sound.
>>41768042
yeah
>>41768042
Bump for the bump thread! Sage for the sage throne!
>>41768329
Top K
>>41751186
This is a little hard to explain if you don't understand how inference works. When generating outputs, the model generates one "token" at a time, and it assigns a probability to each possible token. Top K limits the per-token output selection to the k best matches. E.g., with topk=5, it finds the 5 most likely tokens and picks from those. Top P limits the per-token output selection based on cumulative probability. With topp=0.5, it finds the most likely tokens up to 50% probability, then cuts off the rest. Higher temperature makes all the tokens more equally likely to be selected; lower temperature skews the probabilities in favor of the more likely tokens. You generally only want to use one of [topk, topp, temperature] for any given inference.
topk, topp, and low temperature all accomplish similar things in slightly different ways. If you want the model to pick the single best output every time (which will give you the same result every time you run it), you want temperature=0 or topk=1 (same thing). If you want to prevent very bad tokens from being selected, you want to use topp <= say 0.8. (Lower topp means more tokens are considered "very bad".) Other than that, the values get hard to interpret.
That's how it works for LLMs. I'm guessing it's the same for GPT-SoVITS.
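To make that concrete, here's a toy per-token sampler in numpy showing where the three knobs enter. This is the standard LLM-style sampling scheme; I'm assuming GPT-SoVITS's autoregressive stage applies them the same way:

import numpy as np

def sample_token(logits, temperature=1.0, top_k=0, top_p=1.0):
    logits = np.asarray(logits, dtype=np.float64)
    # Temperature rescales the logits: >1 flattens the distribution, <1 sharpens it.
    logits = logits / max(temperature, 1e-8)
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    order = np.argsort(probs)[::-1]      # token ids, most likely first
    if top_k > 0:
        probs[order[top_k:]] = 0.0       # keep only the k most likely tokens
    if top_p < 1.0:
        cum = np.cumsum(probs[order])
        # Drop tokens that begin after top_p of the probability mass is
        # already covered (the single most likely token always survives).
        probs[order[cum - probs[order] >= top_p]] = 0.0
    probs /= probs.sum()                 # renormalize what's left
    return np.random.choice(len(probs), p=probs)

# temperature near 0 or top_k=1 -> effectively greedy, same output every run.
# top_p=0.8 -> cuts the unlikely tail while keeping some variety.

One caveat I haven't verified against the actual GPT-SoVITS code: whether it applies top-k before or after top-p. The toy above applies top-k first.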
>>41770942
Let the catalog burn!
>>41754075
AI art now has more soul than human-made art. It's over, join us!
AI is here to save the fandom, and the artists against it or trying to stop AI projects are the threat, not AI.
>>41771952
Don't bring this shit here.
>>41771957
It's just true. The fandom will not be saved by creeps trying to sell both MLP art and furry or loli porn, but by absolute freedom of creativity; this is important.
>>41771952
Fuck off with this divisive bullshit; go back to the discord you crawled into the AI art general from.
>>41754075
Most important question: what can publicly available models (weights, code, training and tuning procedures) achieve, and how can they be improved?
As a demo of AI potential this is fine, so now we need to get some idea of how to get that and better.
We might try to develop MAGIC: Multi-Agent Generative Image Converter.
I've seen some MAS LLM research, but are there any MAS image-generator papers?
>>41771952
No drama, just pony.
Here's all the leaked Tell Your Tale lines so far:
https://mega.nz/folder/pWczEYKY#T19kpTbI7haPnw63G2msoA
>>41772991
>G5
Big old meh.
>>41772991
If audio from the celebrity VAs in ANG had leaked, that might be worthwhile. Otherwise, I sleep.
>>41726596
Looks interesting. I think this is the first paper on voice conversion that I have read in its entirety, looking up everything I didn't know.
I'm comparing it with StyleTTS2, and the new thing this paper seems to propose is an alternative to the decoder, plus a slight alteration to the training process.
1. It does not use convolution. Not a fancy upsampler.
2. Instead of using a spectrogram like iSTFTNet or HifiGAN, it uses a latent diffusion model to generate multiple waveforms on different frequency bands directly from the embedding. LatentThroating, in other words.
3. As a result, we don't need two models for embedding->mel and mel->waveform (or STFT).
4. It is not a GAN. (Yay training?)
It is still possible to make an upsampler by giving the encoder decimated input (or giving it fewer bands) and comparing the loss relative to the true audio.
In StyleTTS2 this would be a replacement of the vocoder and would require replacing the decoder, maybe merging them into one model.
>>41774700
StyleTTS-ZS looks like the decoder+vocoder combo I mentioned. But the audio does not go through PQMF.
Ultra-lightweight Neural Differential DSP Vocoder For High Quality Speech Synthesis
https://arxiv.org/abs/2401.10460v1
>Neural vocoders model the raw audio waveform and synthesize high-quality audio, but even the highly efficient ones, like MB-MelGAN and LPCNet, fail to run real-time on a low-end device like a smartglass. A pure digital signal processing (DSP) based vocoder can be implemented via lightweight fast Fourier transforms (FFT), and therefore, is a magnitude faster than any neural vocoder. A DSP vocoder often gets a lower audio quality due to consuming over-smoothed acoustic model predictions of approximate representations for the vocal tract. In this paper, we propose an ultra-lightweight differential DSP (DDSP) vocoder that uses a jointly optimized acoustic model with a DSP vocoder, and learns without an extracted spectral feature for the vocal tract. The model achieves audio quality comparable to neural vocoders with a high average MOS of 4.36 while being efficient as a DSP vocoder. Our C++ implementation, without any hardware-specific optimization, is at 15 MFLOPS, surpasses MB-MelGAN by 340 times in terms of FLOPS, and achieves a vocoder-only RTF of 0.003 and overall RTF of 0.044 while running single-threaded on a 2GHz Intel Xeon CPU.
>>41774700Correction: there is still convolution
>mare
>>41731435Been a hot while since I've been to these threads. This is extremely impressive, definitely better than what I remember 15 being, give or take. What's the latest development on producing non-horse audio? Are there any actually solid services that sound halfway good?
>>41775543
Just train the same AI models on different voices.
LDM-SVC: Latent Diffusion Model Based Zero-Shot Any-to-Any Singing Voice Conversion with Singer Guidance
https://arxiv.org/abs/2406.05325v1
>Any-to-any singing voice conversion (SVC) is an interesting audio editing technique, aiming to convert the singing voice of one singer into that of another, given only a few seconds of singing data. However, during the conversion process, the issue of timbre leakage is inevitable: the converted singing voice still sounds like the original singer's voice. To tackle this, we propose a latent diffusion model for SVC (LDM-SVC) in this work, which attempts to perform SVC in the latent space using an LDM. We pretrain a variational autoencoder structure using the noted open-source So-VITS-SVC project based on the VITS framework, which is then used for the LDM training. Besides, we propose a singer guidance training method based on classifier-free guidance to further suppress the timbre of the original singer. Experimental results show the superiority of the proposed method over previous works in both subjective and objective evaluations of timbre similarity.
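The "singer guidance" part builds on classifier-free guidance, which is simple enough to show inline. A generic sketch (function and argument names are placeholders, not the paper's code): run the denoiser with and without the conditioning and extrapolate toward the conditioned prediction; pushing w above 1 is what suppresses leftover source-singer timbre:

def cfg_noise_pred(denoiser, x_t, t, cond, w=2.0):
    # w=1 is plain conditional sampling; w>1 pushes the sample
    # harder toward the condition (here, the target singer)
    eps_uncond = denoiser(x_t, t, None)  # conditioning dropped
    eps_cond = denoiser(x_t, t, cond)    # e.g. target-singer embedding
    return eps_uncond + w * (eps_cond - eps_uncond)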
LHQ-SVC: Lightweight and High Quality Singing Voice Conversion Modeling
https://arxiv.org/abs/2409.08583
>Singing Voice Conversion (SVC) has emerged as a significant subfield of Voice Conversion (VC), enabling the transformation of one singer's voice into another while preserving musical elements such as melody, rhythm, and timbre. Traditional SVC methods have limitations in terms of audio quality, data requirements, and computational complexity. In this paper, we propose LHQ-SVC, a lightweight, CPU-compatible model based on the SVC framework and diffusion model, designed to reduce model size and computational demand without sacrificing performance. We incorporate features to improve inference quality, and optimize for CPU execution by using performance tuning tools and parallel computing frameworks. Our experiments demonstrate that LHQ-SVC maintains competitive performance, with significant improvements in processing speed and efficiency across different devices. The results suggest that LHQ-SVC can meet
>>41389084
Interesting. Now I'm thinking about a voice-only variation of SESD.
1. Train Decoder->Encoder like LatentSpeech does
2. Freeze the codec and train an embedding denoiser
The goal here is to deal better with speaker differences. Or maybe something else, but the idea is to make the encoder more stable.
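That two-stage recipe as a PyTorch-style skeleton, just to pin down the ordering. Every module, loss, and the no-timestep denoiser here are placeholders I made up, not anything from the papers:

import torch
import torch.nn.functional as F

def train_two_stage(codec, denoiser, loader, epochs=10):
    # Stage 1: fit the codec (Decoder->Encoder reconstruction)
    opt1 = torch.optim.Adam(codec.parameters())
    for _ in range(epochs):
        for audio in loader:
            recon = codec.decode(codec.encode(audio))
            loss = F.l1_loss(recon, audio)
            opt1.zero_grad(); loss.backward(); opt1.step()

    # Stage 2: freeze the codec, train a denoiser in its latent space
    for p in codec.parameters():
        p.requires_grad_(False)
    opt2 = torch.optim.Adam(denoiser.parameters())
    for _ in range(epochs):
        for audio in loader:
            z = codec.encode(audio)
            noise = torch.randn_like(z)
            loss = F.mse_loss(denoiser(z + noise), noise)
            opt2.zero_grad(); loss.backward(); opt2.step()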
>>41776814
I found the LDM-SVC-but-faster paper.
LCM-SVC: Latent Diffusion Model Based Singing Voice Conversion with Inference Acceleration via Latent Consistency Distillation
https://arxiv.org/abs/2408.12354
>Any-to-any singing voice conversion (SVC) aims to transfer a target singer's timbre to other songs using a short voice sample. However many diffusion model based any-to-any SVC methods, which have achieved impressive results, usually suffered from low efficiency caused by a mass of inference steps. In this paper, we propose LCM-SVC, a latent consistency distillation (LCD) based latent diffusion model (LDM) to accelerate inference speed. We achieved one-step or few-step inference while maintaining the high performance by distilling a pre-trained LDM based SVC model, which had the advantages of timbre decoupling and sound quality. Experimental results show that our proposed method can significantly reduce the inference time and largely preserve the sound quality and timbre similarity comparing with other state-of-the-art SVC models.
https://www.youtube.com/watch?v=WPUVxX734iw
Is technology good enough to finally make it sung in Ponk's voice?
Bonk.
>>41778261
Yes it is!
https://files.catbox.moe/wa6e8d.wav
>>41779092
is it just me or is haysay.ai down?
REDUB TAMERS' VIDEOSREDUB TAMERS' VIDEOSREDUB TAMERS' VIDEOS
>>41780436
https://files.catbox.moe/ej3uqm.mp3
As a rule of thumb, I usually wait 5 minutes and refresh the website to see if it un-derps itself.
>bump
how does one train a gpt-sovits-v2 model?
>>41783125Tenderly yet firmly.
>>41783125
https://rentry.co/GPT-SoVITS-guide
https://huggingface.co/Delik/gsvlite/resolve/main/GPT-SoVITS-Lite.7z?download=true
Lite version (v1). I remember some files needed to be messed around with due to Python being retarded and not connecting to the correct elements.
>>41783016
>>41783762
>>41784445
>mares
>>41779092BOINK
>>41785304>stallions
>>41785941
>>41787642
>>41788232
REDUB TAMERS' VIDEOSREDUB TAMERS' VIDEOSREDUB TAMERS' VIDEOS!!!
>>41788969nah, I would prefer anons getting into ai music.
How do we stop being dead?
>>41789149
AI in its current state is a fad. Just like early phonographs were.
>oh, we can record and play back sounds now? that's cool I guess.
Most people have gotten over AI. It's not amazing anymore. The technology won't be very interesting to most people unless it has
>Ease of use
>Low cost
>Maturity
Ease of use was delivered by 15.ai, and it was free and mature compared to the ngroks, but it's gone now. Haysay isn't quite the same, and that's because of maturity.
It's going to be current year tomorrow and we still have robotic AI voices. Paid AI models might be better, but 1. nobody here wants to pay for AI, and 2. doxing yourself just to have all your inputs restricted and lobotomized isn't worth it.
And nobody here cares about voice-to-voice; this has always been a TTS thread, as the most active periods have been when 15.ai was alive. Why would anyone want to sing like an absolute fag just to make songs with robotic pony voices?
So yeah, we're gonna be dead for a while. Nobody's going to care that much until some new technology comes along and reignites interest again.
>>41789219
We do have non-robotic voices; they're called Udio and ElevenLabs. Not our problem you Sonicfags and Bronies are so uncultured and live under a rock. I think 15.ai got bought or joined a bigger AI project; the guy was always secretive.
>>41789425>can't read and immediately begins seething at nothing
>>41789425
15 said he shut his site down because he received a cease & desist letter. Interestingly, he never said who sent the letter: was it Hasbro or the Screen Actors' Guild? Originally, 15 cited the Google Books precedent as a way to justify why his site wasn't illegal, but after SAG went on strike and received concessions giving actors control over how their likenesses can be used in AI training, they'd have a stronger case to get 15 shut down.
>>41789828The same cease and desist Jan got, right?
>>41789843That was definitely Hasbro.
>>41789854he could've just ignored it
>>41789859If he ignored the C&D, he'd have to prepare to defend his position in case Hasbro decides to sue him. Since he produced animations that a casual observer can mistake for legitimate MLP cartoons, that's clearly a violation of Hasbro's trademark. He could claim he's producing a parody, but it'll still be very hard for him to win the lawsuit. I don't think ForgaLorga/Agrol has that problem simply because his animations don't have any dialogue.
>>41790053Hasbro would never go after him, nobody wants to waste money.
happy new year for mares!
>>41790708New year? NEW MARES!
>>41789425
>ElevenFags
Kill yourself.
Happy New Year!
>>41791927Happy New Year to you too!
>>41789828
>The good ol' reliable "we got a C&D, guys!!"
HAHAHA AND YOU BELIEVED IT? Tiarawhy came out and told me he lied about it to save face; they fabricated the email themselves.
>>41793517
>>41789219
StyleTTS2 does pretty mare voices from robotic input. We can try doing StyleTTS2-VC like StyleTTS-VC, or wait for the StyleTTS-ZS models to come out. Or do our own models.
Is it still impossible to do voice AI without Nvidia GPU?
>>41796493
Sadly all current tech depends on (py)torch, and that is almost completely dependent on Nvidia GPUs (there are some workarounds to make AMD GPUs work with it on Linux, but from what I heard it's very tricky to get working).
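For reference, checking what a given torch install can actually use is a couple of lines. Note the AMD ROCm builds of PyTorch reuse the CUDA API surface, so the same torch.cuda check applies there (assuming a ROCm wheel is installed):

import torch

if torch.cuda.is_available():  # NVIDIA, or AMD via a ROCm build
    device = torch.device("cuda")
elif getattr(torch.backends, "mps", None) and torch.backends.mps.is_available():
    device = torch.device("mps")  # Apple Silicon
else:
    device = torch.device("cpu")

print(torch.__version__, device)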
>>41795152Almost again.
What do you think ponies would call machine learning?
>>41797986Ugh, what has happened this time?
>>41798255Nothing. I think that's the point, this place is kill.
>>41798302
Hey now, there is a pretty good possibility we will get some kind of MLP animation generator based on anons posting from the AI image threads.
>>41797878 >>41797864
It will take a while to see a smaller-scale version; hopefully some other nerds will join in making new models/architectures, since right now everything is limited to OpenAI vs Chinese stuff.
>>41798302In this case, bump.
Hydrusbeta, could you post a link for the RVC models repository (especially the ones for singing)? I'm asking as huggingface.co/hydrusbeta seems to be missing a lot of singing models, and I kind of prefer to be able to run this offline, as uploading and running voice conversion online takes ages.
https://files.catbox.moe/3oavqv.mp3
Luna song. A few times RVC misunderstood loud = high pitch.
Any chance one of the training wizards will be able to use the 3-second "I missed you, big sister" clip to create an S1 Luna TTS?
>>41711970
>>41711975
>>41711980
>>41711985
I love you 1111 aka 15, but you sound like a Cali.
t. the anon formerly known as the IFOWONAIO anon
>>41799855
Vul trained a ton of singing and non-singing voices for RVC:
https://huggingface.co/therealvul/RVCv2/tree/main
For a list of all the RVC models that Hay Say knows about, along with links to the model files, see this JSON file:
https://github.com/hydrusbeta/hay_say_ui/blob/main/architectures/rvc/character_models.json
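If you want a quick inventory before downloading, something like this should pull that JSON and print a shallow view of it. A sketch: I'm converting the blob link to its raw.githubusercontent.com form and assuming nothing about the schema beyond it being valid JSON:

import json
import urllib.request

# raw equivalent of the github blob URL above
URL = ("https://raw.githubusercontent.com/hydrusbeta/hay_say_ui/"
       "main/architectures/rvc/character_models.json")

with urllib.request.urlopen(URL) as resp:
    data = json.load(resp)

# print a shallow view of whatever structure comes back
if isinstance(data, dict):
    for key in data:
        print(key)
else:
    for entry in data:
        print(entry if not isinstance(entry, dict) else list(entry)[:3])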
>>41706417I can't find the other thread so I'm sticking this here. Also this generation turned out unusually good, my other ones were not nearly as cool.
>>41797864says file is corrupt? Is it just my Firefox?
What would ponies name their GPUs?
>>41801634My best guess would be workhorse related terms. Something referring to heavy lifting, probably.
Is Vul the author of that song about jannies by Ponka? I don't remember where I found it, but did he post it here?
>>41802539
>about jannies by ponka
Not really ringing any bells with this one. If you could post the file + filename, maybe I could compare to my playlist (so far I can't really find anything related). Is it possible it was posted in the /create/ thread?
>>41802627
https://pomf2.lain.la/f/kg95divo.mp3
/create/ is down right now...
>>41802657
Uhhh, sorry dude, I had not heard this one before, so it must have been posted somewhere outside the PPP.
>11
Something straight from tardland, I mean, /sci/:
https://github.com/odin-loki/Cell-AI
>>41806798
I'm happy to see people trying alternative methods of running/creating AI stuff, but I would also love it if they actually gave some demonstration of how it would work in practice.
>>41802539
Yes, the original was linked at the bottom of the lyrics for "Word of the Nightmare"
https://ponepaste.org/10467
>>41805738
Nice try. The board hasn't had a sticky in a while.
>>41807602
Does he announce/post new songs here or in /create/?
>>41731435
Kinda bummed that with all those East Asian language options there's no Autumn Blaze voice, so I could make her say
>"Konnichiwa dude!"
But then, she only has three minutes of dialogue minus the song.
>0hymn4
>>41804850
>>41810436
>>41808634I've seen posts from him in both places in the past, but I don't know if he prefers one thread over the other
>>41801022Marecelium vibes intensify.
>>41808634
He posts in both from what I've seen, though afaik he wasn't able to get as much out in 2024 as he wanted, hence why there weren't a ton of new Vul tracks to find.
>>41807602
The link to that catbox song is broken. Is catbox down?
>>41811597
Thank you for the (You). I actually really like Marecelium, but this OC existed before Marecelium did, since it was originally created for a D&D campaign. Not by much time, though; they were created in the same year.
>run out of credits before I could get the music AI model to generate the correct type of song after just an hour of prompting
The music AI services are really fucking gay. Can somebody please make an offline model that works on a 6GB GPU?
>>41811132
>>41813383I hate this credit nonsense. There are too many artificial 'currencies' like that out there.
>>41815912
Indeed, it's all too kosher. Lots of times I end up losing the credits because their model fucked up and either swapped sentences around, changed the pitch/vocalist mid-song, or straight up started to pump out house/techno background noises when I specifically asked for classical 18th-century piano tunes.
>https://files.catbox.moe/z98zyl.mp3
Redoing the classic meme with GPT-SoVITS.
99
>>41817019
I need to figure out if this is worth paying $10 a month for.
https://suno.com/invite/@enrapturingelevatormusic755
Does anypony have advice for generating yelling/screaming voices? I'm trying to make a mod for Helldivers 2 where the helldivers' voices are replaced with MLP characters. I can get the calm voice lines to sound fine with just a couple of passes through the models on haysay, but the screaming voice lines take all sorts of passes/retries/edits to even get close to "just ok." As examples:
Calm line original: https://voca.ro/13diHsoRGyQ8
Calm line Rarity: https://voca.ro/1mH7Ei7a5K84
Yelling line original: https://voca.ro/1ljbWIVgWlhQ
Yelling line Rarity: https://voca.ro/1jk038XmYqHW
>>41817774
If there's a secret, I haven't found it yet. The best I could figure out is to just spam it with multiple (sometimes dozens of) takes, and splice together whatever parts sound good enough.
I will say that SoVitsSVC 4.0 can sometimes translate certain parts of a performance better; I've gotten okay results out of it with basic yelling. However, I doubt you're ever gonna get the gritty, gravelly kind of deliveries you'd expect in a warzone out of the MLP characters, cause the data for it ain't there.
>>41817690
Ehh, kind of, but not really. When you get struck by inspiration (and don't mind spending half a day proompting) you can make pure gold like the /g/ anon who made "4am" in the first few days the server was online:
>https://files.catbox.moe/0yeais.mp3
Then there are times when even your credits run out and you end up like this >>41813383, or worse, have entire days/weeks/months where the creative spell is broken and you just get your money siphoned, just like with all the other modern services. But then again, it's just $10, so as long as you aren't struggling IRL you most likely won't notice it.
>>41817821
I figured that might be the case. Although splicing together final takes didn't occur to me, thanks! Mostly my concern was getting lines to have the proper cadence and hit the notes. They just end up being either super short or sounding flat (or sounding like shit). It took me multiple edits to get her to say "Earth" for any good length of time in that last one.
I may not go through with this after all, since that's hundreds of voice lines across 4 characters and the proof of concept took too much effort on its own. My last hail mary would be making a script to automate batch-generation of lines so I can check 50 or so at once and use the best ones; a rough sketch of that idea below.
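That hail-mary script is simple enough to sketch. synthesize() below is a placeholder for whatever you drive (Hay Say locally, a CLI call, etc.), NOT a real Hay Say function, and the lines/characters are example values; the loop just organizes N takes per line per character for triage:

from pathlib import Path

def synthesize(text: str, character: str) -> bytes:
    # placeholder: wire this to your actual TTS/VC pipeline
    raise NotImplementedError

lines = ["FOR SUPER EARTH!", "Reloading!"]  # your extracted voice lines
characters = ["Rarity", "Twilight Sparkle", "Applejack", "Pinkie Pie"]
TAKES = 50

for char in characters:
    for i, line in enumerate(lines):
        outdir = Path("takes") / char / f"line_{i:03d}"
        outdir.mkdir(parents=True, exist_ok=True)
        for take in range(TAKES):
            audio = synthesize(line, char)
            (outdir / f"take_{take:02d}.wav").write_bytes(audio)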
Goddamnit Twilight Sparkle, get out of my unrelated songs:
https://voca.ro/1d4GeS8UkFJV
https://voca.ro/13LOt6zYasm1
https://www.youtube.com/watch?v=isMwV-EO1tI&list=PLXplGAZHGThcgC0USArWosEuvPY8InUu0&ab_channel=Domibombs
Does this count as Tara Strong or Rebecca Shoichet singing?
https://youtu.be/N6Piou4oYx8
https://openreview.net/forum?id=AL1fq05o7H
Mare, this is interesting. MAMBA-SoVITS, anypony?
This horsie wonders if MAMBA can be used in StyleTTS2 - the best and most stable TTS so far.
Did anycreature experiment with smashing GPT and StyleTTS together? If there are positive results, then maybe experiment with MAMBA and StyleTTS.
>stealing news from /g/
>VLC automatic subtitles generation and translation based on local and open source AI models running on your machine working offline, and supporting numerous languages!
>Demo can be found on our #CES2025 booth in Eureka Park.
VLChads can't stop winning:
https://x.com/videolan/status/1877072497146781946?t=jcarCV_7wCs11kDPWunwvg&s=19
This would be pretty cool if there was an API option to hook it up to TTS and have mare voices translate random Japanese vtubers.