/mlp/ - Pony

File: Classic.png (996 KB, 2119x1500)
Welcome to the Pony Voice Preservation Project!
youtu.be/730zGRwbQuE

The Pony Preservation Project is a collaborative effort by /mlp/ to build and curate pony datasets for as many applications in AI as possible.

Technology has progressed such that a trained neural network can generate convincing voice clips, drawings and text for any person or character using existing audio recordings, artwork and fanfics as a reference. As you can surely imagine, AI pony voices, drawings and text have endless applications for pony content creation.

AI is incredibly versatile, basically anything that can be boiled down to a simple dataset can be used for training to create more of it. AI-generated images, fanfics, wAIfu chatbots and even animation are possible, and are being worked on here.

Any anon is free to join, and there are many active tasks that would suit any level of technical expertise. If you’re interested in helping out, take a look at the quick start guide linked below and ask in the thread for any further detail you need.

EQG and G5 are not welcome.

>Quick start guide:
docs.google.com/document/d/1PDkSrKKiHzzpUTKzBldZeKngvjeBUjyTtGCOv2GWwa0/edit
Introduction to the PPP, links to text-to-speech tools, and how (You) can help with active tasks.

>The main Doc:
docs.google.com/document/d/1y1pfS0LCrwbbvxdn3ZksH25BKaf0LaO13uYppxIQnac/edit
An in-depth repository of tutorials, resources and archives.

>Active tasks:
Research into animation AI
Research into pony image generation

>Latest developments:
GDrive clone of Master File now available >>37159549
SortAnon releases script to run TalkNet on Windows >>37299594
TalkNet training script >>37374942
GPT-J downloadable model >>37646318
FiMmicroSoL model >>38027533
Delta GPT-J notebook + tutorial >>38018428
New FiMfic GPT model >>38308297 >>38347556 >>38301248
FimFic dataset release >>38391839
Offline GPT-PNY >>38821349
FiMfic dataset >>38934474
SD weights >>38959367
SD low vram >>38959447
Huggingface SD: >>38979677
Colab SD >>38981735
NSFW Pony Model >>39114433
New DeltaVox >>39678806
so-vits-svt 4.0 >>39683876
so-vits-svt tutorial >>39692758
Hay Say >>39920556
Haysay on the web! >>40391443
SFX separator >>40786997 >>40790270
Clipper finishes re-reviewing audio >>40999872
Synthbot updates GDrive >>41019588
Private "MareLoid" project >>40925332 >>40928583 >>40932952
VoiceCraft >>40938470 >>40953388
Fimfarch dataset >>41027971
5 years of PPP >>41029227
Audio re-up >>41100938

>The PoneAI drive, an archive for AI pony voice content:
drive.google.com/drive/folders/1E21zJQWC5XVQWy2mt42bUiJ_XbqTJXCp

>Clipper’s Master Files, the central location for MLP voice data:
mega.nz/folder/jkwimSTa#_xk0VnR30C8Ljsy4RCGSig
mega.nz/folder/gVYUEZrI#6dQHH3P2cFYWm3UkQveHxQ
drive.google.com/drive/folders/1MuM9Nb_LwnVxInIPFNvzD_hv3zOZhpwx

>Cool, where is the discord/forum/whatever unifying place for this project?
You're looking at it.

Last Thread:
>>41064811
>>
FAQs:
If your question isn’t listed here, take a look in the quick start guide and main doc to see if it’s already answered there. Use the tabs on the left for easy navigation.
Quick: docs.google.com/document/d/1PDkSrKKiHzzpUTKzBldZeKngvjeBUjyTtGCOv2GWwa0/edit
Main: docs.google.com/document/d/1y1pfS0LCrwbbvxdn3ZksH25BKaf0LaO13uYppxIQnac/edit

>Where can I find the AI text-to-speech tools and how do I use them?
A list of TTS tools: docs.google.com/document/d/1y1pfS0LCrwbbvxdn3ZksH25BKaf0LaO13uYppxIQnac/edit#heading=h.yuhl8zjiwmwq
How to get the best out of them: docs.google.com/document/d/1y1pfS0LCrwbbvxdn3ZksH25BKaf0LaO13uYppxIQnac/edit#heading=h.mnnpknmj1hcy

>Where can I find content made with the voice AI?
In the PoneAI drive: drive.google.com/drive/folders/1E21zJQWC5XVQWy2mt42bUiJ_XbqTJXCp
And the PPP Mega Compilation: docs.google.com/spreadsheets/d/1T2TE3OBs681Vphfas7Jgi5rvugdH6wnXVtUVYiZyJF8/edit

>I want to know more about the PPP, but I can’t be arsed to read the doc.
See the live PPP panel shows presented on /mlp/con for a more condensed overview.
2020 pony.tube/w/5fUkuT3245pL8ZoWXUnXJ4
2021 pony.tube/w/a5yfTV4Ynq7tRveZH7AA8f
2022 pony.tube/w/mV3xgbdtrXqjoPAwEXZCw5
2023 pony.tube/w/fVZShksjBbu6uT51DtvWWz

>How can I help with the PPP?
Build datasets, train AIs, and use the AI to make more pony content. Take a look at the quick start guide for current active tasks, or start your own in the thread if you have an idea. There’s always more data to collect and more AIs to train.

>Did you know that such and such voiced this other thing that could be used for voice data?
It is best to keep to official audio only unless there is very little of it available. If you know of a good source of audio for characters with few (or just fewer) lines, please post it in the thread. 5.1 is generally required unless you have a source already clean of background noise. Preferably post a sample or link. The easier you make it, the more likely it will be done.

>What about fan-imitations of official voices?
No.

>Will you guys be doing a [insert language here] version of the AI?
Probably not, but you're welcome to. You can however get most of the way there by using phonetic transcriptions of other languages as input for the AI.

>What about [insert OC here]'s voice?
It is often quite difficult to find good quality audio data for OCs. If you happen to know any, post them in the thread and we’ll take a look.

>I have an idea!
Great. Post it in the thread and we'll discuss it.

>Do you have a Code of Conduct?
Of course: 15.ai/code

>Is this project open source? Who is in charge of this?
pony.tube/w/mqJyvdgrpbWgZduz2cs1Cm

PPP Redubs:
pony.tube/w/p/aR2dpAFn5KhnqPYiRxFQ97

Stream Premieres:
pony.tube/w/6cKnjJEZSCi3gsvrbATXnC
pony.tube/w/oNeBFMPiQKh93ePqTz1ns8
>>
File: biganchor.jpg (161 KB, 640x640)
>>41137243
Anchor.
>>
>>41137243
Pre-director OP image is a bad omen
>>
I think mares are kinda cool
>>
>>41138126
I would love to own a cute robot mare maid.
>>
File: cocker spaniel dad.jpg (62 KB, 569x720)
>>41138285
>>
>>41138042

AI will help porny games.
>>
Style Mixture of Experts for Expressive Text-To-Speech Synthesis
https://arxiv.org/abs/2406.03637
>Recent advances in style transfer text-to-speech (TTS) have improved the expressiveness of synthesized speech. Despite these advancements, encoding stylistic information from diverse and unseen reference speech remains challenging. This paper introduces StyleMoE, an approach that divides the embedding space, modeled by the style encoder, into tractable subsets handled by style experts. The proposed method replaces the style encoder in a TTS system with a Mixture of Experts (MoE) layer. By utilizing a gating network to route reference speeches to different style experts, each expert specializes in aspects of the style space during optimization. Our experiments objectively and subjectively demonstrate the effectiveness of our proposed method in increasing the coverage of the style space for diverse and unseen styles. This approach can enhance the performance of existing state-of-the-art style transfer TTS models, marking the first study of MoE in style transfer TTS to our knowledge.
>We trained the StyleMoE framework on the ”100-clean” subset of the LibriTTS dataset [24], comprising 100 hours of multispeaker speech data. This data was downsampled to 16 kHz from its original 24 kHz.
https://stylemoe.github.io/styleMoE/
No model, but no one would want it anyway; it's a small academic proof of concept that's useless for anything neat, to the point where it's hard to tell if the method is even worth scaling. Interesting idea, though.
>>
File: rarara.png (302 KB, 1280x720)
>>41135966
>Tears for Fears - Everybody Wants to Rule the World (Rarity AI cover)
>https://www.youtube.com/watch?v=w6817OqPfYU

Finally finished it. I'm learning as I go, but I'm pretty content with this one. My goal is to make an AI cover for each of the mane six characters, so AJ and Rarara down, four more to go!
>>
>>41138517
Nice work. However, the AI outputs have randomly changing reverb (I'm not talking about two voices overlaying each other) that comes in harder at some moments than others.
>>
File: rara smile.jpg (445 KB, 1600x1600)
>>41138967
The reverb settings are constant throughout the song. What you're hearing is two voice tracks overlapping: a main/lead voice and a falsetto voice with a slightly different melody at a bit lower volume. During some parts of the song, only the lead voice plays and the falsetto is silent. It's most noticeable during the bridge.
>>
>>41137243
>They changed the fucking OP pic
And the downfall of the PPP begins.
>>
>>41139105
This was the original OP pic. It was originally changed to commemorate 15.ai going live, but since it's dead now, why not change it back?
>>
>ywn sing your son and daughter a lullaby with your horsewife
why live?
>>
>>41139847
Yeah, that sucks, but if you think about it, we're all kind of lucky that we were born at just the right time to experience Friendship is Magic when we could most appreciate it. Plus, we get to enjoy the rise of AI during its most fun times: the early years, when it's still a free-for-all technology available to anyone, not yet regulated to hell and back and monopolized by three or four big corporations who will sterilize the fun out of it for profitability.
>>
Bump.
>>
Would you all be okay if I posted a link to this thread in a Discord server I might make?

I am just trying to say something instead of bump.
>>
File: no.png (173 KB, 800x450)
>>41143349
>>
>>41064967
>With a laughter's ring (aha!), Pinkie's in line
>Throws a party for two, with Anon, it's divine
Huh. https://www.youtube.com/watch?v=ux8XqsIA9s8
>>
>>41144337
Other than Hay Say there really isn't a good way to use it, or to train it.
>>
https://youtu.be/ORiIsU2D0Sk?si=zNuCQ_SVYiRO5APZ

So... this is a thing. Before you rush to get it, though, it's only available through a closed beta, and they have to check who you are before you get a copy. Unfortunately, I never got a response back, but eh, maybe someone here will have better luck?
>>
>>41144723
This looks neat. If it ever becomes open to the public, I could easily see training with the emotion tags as their own vector bits to better control the outputs in a way that no other speech tool has ever managed before.
>>
File: Instrumental Ponk.png (134 KB, 415x480)
"Smile" song (a different one)
https://files.catbox.moe/sr2rr0.mp3
>>
File: large.png (337 KB, 1280x861)
>>41144976
Based and World War I-pilled.
>>
File: errerr.jpg (85 KB, 1274x393)
I can't believe I never thought to post about my stuff here. I know /CHAG/ deals with text gen models and all that, but I'm doing more technical stuff with local text gen models.
See, I'm trying to get qlora.py to work, but all the tutorials I've watched either go through it with ease while I'm getting an error thrown at me, or just flat out don't give a tutorial at all.
Currently I have all the dependencies needed to run it, but I'm stuck on pic related.
Although my instinct was to search for a solution online using the actual error code, I've come up with nothing that applies to my situation.
I'm hoping that once I figure this out, I can do some training involving the chat logs, where I augment them in multiple ways to prevent overfitting. My goal is to attempt to make a permanent memory of sorts. An anon on /g/ already tried this, but with regular LoRAs and completely raw chat logs, and it overfit like crazy.
But since augmenting (meaning replacing words with synonyms, or rewording something while retaining the exact same meaning) prevents this, I'm wondering if I could pull off some sort of long-term memory with this thing.
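The augmentation idea above can be sketched in a few lines. This is a toy example: the synonym table is made up, and a real run would use a proper thesaurus (e.g. WordNet) instead.

```python
import random

# Toy synonym table (made-up example); a real run would use a real thesaurus.
SYNONYMS = {
    "happy": ["glad", "cheerful"],
    "small": ["little", "tiny"],
}

def augment(text, synonyms, p=1.0, seed=0):
    """Replace each known word with a random synonym with probability p."""
    rng = random.Random(seed)
    out = []
    for word in text.split():
        alts = synonyms.get(word.lower())
        if alts and rng.random() < p:
            out.append(rng.choice(alts))
        else:
            out.append(word)
    return " ".join(out)

print(augment("the small pony is happy", SYNONYMS))
```

Generating several variants per log line (different seeds, partial replacement probability) is what spreads the same facts across differently worded samples, which is the point of the anti-overfitting trick described above.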
>>
Hey guys, does anyone know the author? Did VUL really do this? I couldn't find this track anywhere. Gimme a link, pls?
https://files.catbox.moe/0twb8w.flac
>>
>>41145405
If I understand you correctly, you want to achieve a form of long-term memory by training a LoRA on a lower-precision model (which is the definition of QLoRA), and keep training that QLoRA as the conversation goes.
Long story short, this is unlikely to work because:
1. It's unlikely to bring up actual facts "from the past" unless the LoRA is very high rank, but even then it will indeed overfit and ruin the model's performance.
2. This is a very obvious approach and nobody has published research on it as of today, which suggests that no one has managed to get it to work.
3. LoRA training takes quite a lot of time; from personal experience, an 8-billion-parameter q5 model needed about 6 hours on a 3090 for my use case.

If you still want to train a LoRA but aren't experienced in tech, try the oobabooga webui; it has a LoRA training option.
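For intuition on why rank matters in point 1: a LoRA replaces the update to a d×k weight with two factors of rank r, so trainable parameters grow linearly with r. A quick back-of-the-envelope helper (the 4096×4096 layer shape below is illustrative, not from any specific model):

```python
# A d x k frozen weight W gets the update B @ A, with A: r x k and B: d x r,
# so a LoRA adds r * (d + k) trainable parameters per adapted matrix.
def lora_params(d, k, r):
    return r * (d + k)

full = 4096 * 4096  # illustrative attention projection size
for r in (8, 64, 256):
    added = lora_params(4096, 4096, r)
    print(f"rank {r}: {added} params ({added / full:.2%} of the frozen weight)")
```

Even rank 256 is a few percent of the frozen matrix, which is why low-rank adapters struggle to store many specific facts, and cranking the rank up to compensate is what pushes them toward overfitting.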
>>
>>41146004
P.S. /lmg/ and /chag/ are coomers, never take advice from them or assume they have any understanding beyond basedboy youtube tutorials.
>>
>>41145405
It seems like your GPU is not detected.
Double-check that you have installed PyTorch with CUDA enabled and not the CPU version.
And I disagree with >>41146009, you can get basic answers like >>41146004 on /lmg/.
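One quick way to check which build is active, from the same interpreter the script runs under. This is a sketch: the `+cpu` / `+cuXXX` suffix convention applies to pip wheels of PyTorch (some other builds carry no suffix), and the import is guarded in case torch isn't installed at all.

```python
def is_cpu_build(version: str) -> bool:
    # Pip wheels tag the backend in the local version segment,
    # e.g. "2.3.1+cpu" vs "2.3.1+cu121" (some builds have no suffix).
    return version.partition("+")[2].startswith("cpu")

try:
    import torch
    print("torch version:", torch.__version__)
    print("cpu-only wheel:", is_cpu_build(torch.__version__))
    print("cuda available:", torch.cuda.is_available())
except ImportError:
    print("torch is not installed in this environment")
```

If `cuda available` prints False on a machine with an NVIDIA GPU, the usual culprits are a CPU-only wheel or a driver/CUDA mismatch.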
>>
>>41146053
Ok, maybe; I haven't visited /lmg/ in a while.
>>
>>41146053
>pytorch with cuda
i specifically remember downloading the non cuda version
thanks anon ill do it when i get home
>>
https://vocaroo.com/1oyZgeUTaR6l
https://pomf2.lain.la/f/m4is43un.mp3

Udio
Journey to Dodge Junction

Driving through the desert, under twilight stars
On a quest for relief, it's takin' us so far
Through the tumbleweeds, Dodge Junction's close
No more rest stops, where’ll this story go?
It's the only toilet, in Equestria's land
In Dodge Junction, where the cowboys stand
>>
>>41145926
The song is "Chant of Selflessness" by 4everfreebrony. This is a remix of that song with a Rarity cover. I don't know who created the remix.
>>
>>41136413
At this point I'd settle for just the TTS-to-RVC aspect of it. Freedom of mare speech without the restrictions of existing vocal bases as reference should always be sought, especially with RVC remaining as powerful as it is.

Speaking of, have there been any recent developments in technologies rivaling or surpassing RVC?
>>
>>41146009
It's just a bunch of avatarfags and nothing more.
>>
>>41146373
cursed numget
>>
>>41146501
/lmg/ is for debating the existence of Hatsune Miku's penis, not developing and discussing Local Language Models.
>>
Other AI generals bad. This AI general good.
>>
File: 1718034693358596.png (52 KB, 305x672)
>>41146566
It's an actual cross-posting schizo.
You can see him right now trying this shit on /lmg/
>>>/g/100898357
Here are all the generals he admitted to baiting in.
>>
>>41146004
>3 Lora training takes quite a lot of time, from personal experience a 8 billion q5 model needed about 6 hours on a 3090 for my use case.
How many tokens was this? I'm getting about 3M tokens per hour for qlora training on a 4090, which is supposed to be 2x the flops of a 3090
>>
>>41146670
My dataset was ~20 MB. But it's not just the dataset length that determines the time: the LoRA's affected layers, rank, sequence length and desired loss affect the convergence time much more. And the right stopping loss is hard to determine because there are no criteria for a "trained" LoRA; you want to find the balance between knowledge and overfitting.
>>
Prompt-guided Precise Audio Editing with Diffusion Models
https://arxiv.org/abs/2406.04350
>Aligning large language models (LLMs) with human preferences becomes a key component to obtaining state-of-the-art performance, but it yields a huge cost to construct a large human-annotated preference dataset. To tackle this problem, we propose a new framework that boosts the alignment of LLMs through Self-generated Preference data (Selfie) using only a very small amount of human-annotated preference data. Our key idea is leveraging the human prior knowledge within the small (seed) data and progressively improving the alignment of LLM, by iteratively generating the responses and learning from them with the self-annotated preference data. To be specific, we propose to derive the preference label from the logits of LLM to explicitly extract the model's inherent preference. Compared to the previous approaches using external reward models or implicit in-context learning, we observe that the proposed approach is significantly more effective. In addition, we introduce a noise-aware preference learning algorithm to mitigate the risk of low quality within generated preference data. Our experimental results demonstrate that the proposed framework significantly boosts the alignment of LLMs. For example, we achieve superior alignment performance on AlpacaEval 2.0 with only 3.3\% of the ground-truth preference labels in the Ultrafeedback data compared to the cases using the entire data or state-of-the-art baselines.
no weights but they explained how they built off of Tango in the appendix
>>
File: 1472317353606.png (943 KB, 6000x8209)
>>41146004
First off, I went to QLoRA because I had so much difficulty with oobabooga webui LoRA training; at least this way I could see the code and use it to search for a solution.
>>41146053
ACTUALLY, the issue was not PyTorch but rather that bitsandbytes had the CPU version installed instead of the CUDA one.
This did fix another side error, but I still get the original n_gpus error from my first post here >>41145405
>>41146187
And when it comes to this post, I misremembered bitsandbytes as PyTorch.

I'm wondering if there's a variable I have to change or whatever. I fucking wish there were coherent QLoRA guides out there; then maybe I would learn this and be able to apply that knowledge elsewhere once I do more stuff like this.
>>
>>41146884
So you do have pytorch + Cuda?
>>
>>41146921
I can tell I'm being retarded, so I'm going to try to spell out exactly what is going on so there's no confusion.
>do you have pytorch + cuda
No, I only have the REGULAR, non-CUDA torch (NOT pytorch).

I feel like the main reason I'm having difficulty with this is that I could see exactly which packages were imported in the code, so all I needed to do was install the packages explicitly mentioned in the code after the word 'import'.
torch is mentioned but not pytorch. After you mentioned trying a CUDA version of pytorch, I searched the PyCharm package search thing; I could not find any variation of pytorch+cuda. bitsandbytes had a CUDA version, but I could not find one for pytorch.
>pic related
The program requires torch, not pytorch, and yes, I tried finding a CUDA version of torch as well; didn't get anything.
Sorry if this is confusing; if you need me to answer anything else, I can.
I was just attempting to get as much info out as I could so you have a better idea of what's going on.
Pissed and confused because pytorch doesn't get imported in the code, but it's mentioned 9 times within the code.
I'm not sure what to make of that.
Here's the actual code:
>https://github.com/artidoro/qlora
I'm looking at the literal, individual qlora.py file from this GitHub page.
>>
>>41144830
I think this works more like a way to immediately get results without training. For example, if one fed a 10-30 second clip of Twilight to Vocoflex, it would convert the voice to data and allow it to be used as a singing voice, similar to RVC. Their website says it's optimized for speaking voices, but we easily have singing voices of mares, amongst others. I also noticed an EULA on their site, which you can read here:

https://dreamtonics.com/vocoflex-eula/

Anything to note?
>>
>>41147007
In the code:
>n_gpus = torch.cuda.device_count()
You need pytorch with cuda for that.
>>
>>41147007
>No, I only have REGULAR, non-cuda, torch (NOT pytorch)
"PyTorch" and "torch" are the same thing, it's just that pytorch is imported as torch.
>the program requires torch not pytorch, and yes, i tried finding a cuda version of torch as well, didn't get anything
Whenever you install torch you should follow the instructions on the pytorch website https://pytorch.org/get-started/locally/ because the CUDA builds of pytorch are from a special package index.
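For reference, the commands that selector generates follow a fixed index-URL pattern; a small helper to illustrate it. Confirm the CUDA tag against your driver on the pytorch.org page rather than trusting this sketch, since the right tag changes over time.

```python
from typing import Optional

def torch_install_cmd(cuda_tag: Optional[str]) -> str:
    """Build the pip command the pytorch.org selector generates.
    cuda_tag is e.g. "cu121"; None means the CPU-only build."""
    tag = cuda_tag or "cpu"
    return f"pip install torch --index-url https://download.pytorch.org/whl/{tag}"

print(torch_install_cmd("cu121"))
```

Running the "cpu" variant is exactly how you end up with a `+cpu` build, so the fix here is to uninstall torch and rerun with the CUDA tag.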
>>
>>41147022
Well, since I can't get it directly through PyCharm, would
>>41147046
>link
Well, I was typing this out right as the thread refreshed. Yeah, if that's the case, then all I need to do is somehow get those files and throw them into the right interpreter in PyCharm and I'm golden, I hope.
>>
>>41145926
Yes, I did. It's not "officially released" right now (aside from being posted on /create/).
>>41144337
Expensive to train.
>>
>>41147019
Correction, optimized for SINGING voices, not speaking.
>>
>>41147051
You may want to start using anaconda for envs
>>
>>41146884
Try fsdp qdora
https://archive.is/IbOaf
>>
File: 1681604662915387.gif (330 KB, 500x281)
>>41147327
>using anaconda for environments
How would I go about doing this? I feel like the interface of PyCharm actually lets me click stuff, and it's especially helpful for dealing with the dependencies/'imports' since it lets me search for them through its little search engine thing.
Also:
>anaconda
I've had some good experiences with Anaconda. I forget what they were, but all I remember is that Anaconda saved my ass many times a few years ago for project shit.
>>41147565
God, that is fucking cool.
Once I figure out how to do shit on my own, I'll probably try using it.
>>41147614
>nvidia container
TF is that?
>>
Why posts gone
>>
>https://writingpeterkaptein.wordpress.com/2024/04/18/blender-stable-diffusion-workflow/
I see that people are slowly making attempts at integrating SD with Blender for some nicer img2img tools. I don't have a powerful enough GPU, but I have a feeling one could run both SD and Blender, render some basic low-poly models, and convert those images into whatever style one wished to create (FiM style, a specific pony artist, oil paints, or whatever art style one fancies).
I'm guessing the consistency between renders is something that will need to be fixed to prevent objects randomly morphing in and out of existence in the background, but fuck it, I feel like we are getting closer to a real "computer-made" animation than in all the years since the project started.
>>
Anyone got a download for curated voice samples for the characters?
I'm training my own local voice model with Tortoise TTS
recalled that at some point there was some sort of effort to get voice samples here
>>
>>41149908
>curated voice samples
do you mean the mega links in OP?
>>
>>41149937
oh shoot, don't know how I missed that, thanks anon
>>
>>41149993
Yeah, it happens. Btw, how's the training going?
>>
>>41149886
You can just sketch some shapes in paint and get a similar result, unless you want to use stuff like depth map conditioning instead of image to image, for which you would need a 3d scene.
>>
>>41149908
Post samples. I pay like $100 a month for ElevenLabs. I mean, it's great quality, and it needs only a few minutes of audio for each voice, but I HATE being tied to the cloud, and Tortoise just wouldn't work for me.
>>
>>41151231
Yes, however instead of drawing everything by hand, the above way would allow you to just grab 24fps low-quality images and turn them into actual mare animation.
>>
>>41151602
From what I've seen the current checkpoints that can do temporally coherent img to img are only acceptable for real life footage and are sloppy. So you can't use ponydiffusion for it, for example.
>>
>>41151869
>only acceptable for real life footage and are sloppy
It's a start. Even if it looks like shit, there is always the possibility that someone will come along and either fix it up or get annoyed enough to make new tech from scratch just to dab on the coders before him.
Don't make me dig out the clips from way back when there was only Tacotron 2 around.
>>
bmp
>>
File: 9864174090-23864.jpg (74 KB, 1810x837)
>>41145405
Still experiencing the issue I posted about here. Even though I installed PyTorch with CUDA (pic related as proof), I'm still getting the same error as the picture in >>41145405.
I can't imagine what I'm doing wrong, considering I now have all the dependencies I can think of.
Although, I will mention the instance of (+cpu) is concerning; I did triple-check to make sure that I installed the actual CUDA PyTorch.
I'm losing my mind at this shit. IDK if my computer files are so messy that something along the line is getting fucked, but I have everything in there; I can't imagine what I'm doing wrong.
>>
>>41137243
What are currently the best free options we have if I want an entire book read to me (in German)?
>>
>>41152990
In terms of TTS?
I would imagine local Hay Say.
Just take the entire book (assuming it's already in German), then drop it into the speech thing and wait a while for it to load (it's gonna be fuckin' huge since it's a full book), then boom, download, and you have der deutchen flutten shagen audio booken.
>>
>>41152843
Just make a new environment and run pip install -r requirements.txt for whatever repo you want to use. You can ask ChatGPT about all of this, and it will get you through it faster than this thread.
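The "new environment" step can be done with nothing but the standard library; a minimal sketch (the env name is a placeholder, and `with_pip` is left off here to keep the sketch fast, but you'd want it on a real setup so the env gets its own pip):

```python
import sys
import venv
from pathlib import Path

env_dir = Path("qlora-env")  # placeholder name
# Pass with_pip=True on a real setup so the env bootstraps its own pip.
venv.create(env_dir, with_pip=False)

# The env's interpreter lives under bin/ (Scripts\ on Windows):
bindir = "Scripts" if sys.platform == "win32" else "bin"
print("interpreter:", env_dir / bindir / "python")
print("created:", (env_dir / "pyvenv.cfg").exists())

# Then, from inside the cloned repo, using that interpreter:
#   qlora-env/bin/python -m pip install -r requirements.txt
```

Keeping one env per repo is what stops the CPU/CUDA torch mix-ups earlier in the thread from bleeding between projects.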
>>
>>41152843
You probably still have the CPU version installed, then. You should uninstall it first.
>>
>>41153301
+1, there is always some combo of pytorch, CUDA tools and other dependency hell you need to look out for.
>>
>>41153034
Big thank, anon
>>
LibriTTS-P: A Corpus with Speaking Style and Speaker Identity Prompts for Text-to-Speech and Style Captioning
https://arxiv.org/abs/2406.07969
>We introduce LibriTTS-P, a new corpus based on LibriTTS-R that includes utterance-level descriptions (i.e., prompts) of speaking style and speaker-level prompts of speaker characteristics. We employ a hybrid approach to construct prompt annotations: (1) manual annotations that capture human perceptions of speaker characteristics and (2) synthetic annotations on speaking style. Compared to existing English prompt datasets, our corpus provides more diverse prompt annotations for all speakers of LibriTTS-R. Experimental results for prompt-based controllable TTS demonstrate that the TTS model trained with LibriTTS-P achieves higher naturalness than the model using the conventional dataset. Furthermore, the results for style captioning tasks show that the model utilizing LibriTTS-P generates 2.5 times more accurate words than the model using a conventional dataset.
https://github.com/line/LibriTTS-P
might give some guidance if anyone here tries to caption the existing voiced data
>>
>>41153953
Could be useful. If there were a way to get this custom-trained, then even a bad emotional audio output could be smoothed out in RVC/so-vits at 10% likeness of the character model trained on the same clips.
>>
>>41153740
updates on results?
>>
Up.
>>
>>41155310
>>
>need to learn how to build python wheel to solve the dependency problem
fug
>>
>>
>>41157006
>Lewding the ai
Oh dear.
>>
>>41157006
boop
>>
>>41157006
Touching AI mares without permission.
>>
>>41157006
Holy shit, Anon. Spoiler this image!
>>
File: RariBump 1.jpg (122 KB, 1024x1024)
https://files.catbox.moe/utlyfw.wav
I miss the days when everyone was shitposting in these threads with whatever voice AIs were available, making full on audio skits or getting the characters to say funny fucked up shit
>>
>>41162218
I have started writing my ideas in a notebook, so one day, when I'm not feeling like having my brain liquefied from wageslaving all day long, I would love to sit down and return to the shitpost roots.
>>
>>41162218
>>
>>41162218
cure rara
>>
File: 1694811827493421.png (141 KB, 444x465)
>>
>>41162218
There is so much cool AI shit popping up, but I still haven't had a chance to upgrade my old PC, so it's a bit of an abstract kind of pain to know what could be made but have no means to do it.
>>41163720
I feel ya.
>>
File: 1578359950666.jpg (182 KB, 1920x1080)
https://vocaroo.com/1ewMMKjktre2

Udio
Celestia's Light

Never thought I'd see, oh, such a sight
Sunbutt shining in the day and night
Equestria, you make me want to cheer
Celestia's light, oh dear
Every pony, dance with glee tonight
That's bright, bright, bright, bright
>>
File: Sad Rarity Hours.png (330 KB, 550x650)
>>41060498
>https://vocaroo.com/1ib4ysUKOcv8
Udio anon please come the fuck back and finish this I implore you
>>41060678
>https://voca.ro/16cfcyvh3dzA
Also more of this
>>
>>41162218
I know that feel.
https://u.smutty.horse/lywizqhkwgv.wav
https://u.smutty.horse/lybxxbvphoc.wav

I would be making more, but it's not quite as easy these days as it once was with 15.ai. The models on Hay Say are capable, but only really if you have a voice that works well as a reference, which I don't. Still possible with perseverance, though; I've nearly got a new thing finished.
>>
I tried to write a song for my green with Suno and haysay. I don't think I will end up using it, but it was fun.

https://files.catbox.moe/zsyiic.mp3

>A lifetime spent jacking it to ponies,
>But only loneliness truly knows me.
>Empty laps where a mare should be,
>Waifu pillow my sole company.
>Copium injected, straight to the vein -
>"Your pony waifu will come!" - Insane!

>Fate is a bitch with a donkey kick,
>She laughs as wizards die loveless pricks.
>But wait - what light through window breaks?
>KA-FUCKING-BOOM, the sky it quakes!
>A motherfucking SONIC RAINBOOM,
>Wiping its ass with my cynical gloom!

>A bridge of glory spans to the stars,
>Powered by autism from /mlp/ tards.
>Down they prance with heavenly glow,
>I'm tripping balls, this can't be so!
>Then I see her - oh fuck me blind!
>Perfection of plot and grace combined.

>Marshmallow fur and indigo locks,
>T H I C C thighs to crush my cock.
>"Well I say, this realm seems a tad mundane."
>Posh ASMR floods my brain.
>It's her - RARITY, waifu supreme!
>Element of Generosity and wet dreams!

>The prophecies true, the shitposts real,
>Best pony arrived, my heart to steal.
>"Yo Rares, check out that hooman dude!"
>Rainbow Dash keeping shit rude.
>"Hush, the poor dear's overwhelmed, I fear."
>Rarity drifts over, poise crystal clear.

>Sweet Celestia, dat swaying hip,
>Class, sass and ass make the perfect ship.
>"Are you quite alright, darling? So sorry for the fright."
>Spaghetti spills forth, a pathetic sight.
"i want snugle ur fluf" True autism prevails.
>But she smiles! "But of course, darling! Tis destiny's tale!"
>>
Up from 10.
>>
MARS5: A novel speech model for insane prosody
https://github.com/Camb-ai/MARS5-TTS
>This is the repo for the MARS5 English speech model (TTS) from CAMB.AI. The model follows a two-stage AR-NAR pipeline with a distinctively novel NAR component (see more info in the Architecture). With just 5 seconds of audio and a snippet of text, MARS5 can generate speech even for prosodically hard and diverse scenarios like sports commentary, anime and more.
>>
DiTTo-TTS: Efficient and Scalable Zero-Shot Text-to-Speech with Diffusion Transformer
https://arxiv.org/abs/2406.11427
>Large-scale diffusion models have shown outstanding generative abilities across multiple modalities including images, videos, and audio. However, text-to-speech (TTS) systems typically involve domain-specific modeling factors (e.g., phonemes and phoneme-level durations) to ensure precise temporal alignments between text and speech, which hinders the efficiency and scalability of diffusion models for TTS. In this work, we present an efficient and scalable Diffusion Transformer (DiT) that utilizes off-the-shelf pre-trained text and speech encoders. Our approach addresses the challenge of text-speech alignment via cross-attention mechanisms with the prediction of the total length of speech representations. To achieve this, we enhance the DiT architecture to suit TTS and improve the alignment by incorporating semantic guidance into the latent space of speech. We scale the training dataset and the model size to 82K hours and 790M parameters, respectively. Our extensive experiments demonstrate that the large-scale diffusion model for TTS without domain-specific modeling not only simplifies the training pipeline but also yields superior or comparable zero-shot performance to state-of-the-art TTS models in terms of naturalness, intelligibility, and speaker similarity.
https://ditto-tts.github.io/
they have celeb clone examples on their site that sound pretty good. no weights, but the paper has some good info on how they trained it. by KRAFTON, which turns out to be the PUBG devs, so that probably explains it
>>
Articulatory Phonetics Informed Controllable Expressive Speech Synthesis
https://arxiv.org/abs/2406.10514
>Expressive speech synthesis aims to generate speech that captures a wide range of para-linguistic features, including emotion and articulation, though current research primarily emphasizes emotional aspects over the nuanced articulatory features mastered by professional voice actors. Inspired by this, we explore expressive speech synthesis through the lens of articulatory phonetics. Specifically, we define a framework with three dimensions: Glottalization, Tenseness, and Resonance (GTR), to guide the synthesis at the voice production level. With this framework, we record a high-quality speech dataset named GTR-Voice, featuring 20 Chinese sentences articulated by a professional voice actor across 125 distinct GTR combinations. We verify the framework and GTR annotations through automatic classification and listening tests, and demonstrate precise controllability along the GTR dimensions on two fine-tuned expressive TTS models. We open-source the dataset and TTS models.
https://demo.gtr-voice.com/
website (and thus the links to the code/models) doesn't work yet. Chinese-language model, but the idea was interesting
>>
>>41165374
Minus one.
>>
Hey guys, I have this clip for the Antithology that will serve as the intro. It's a parody of Top Gear, but I need it to be narrated instead by Jeremy Clarkson. I have no idea how to train voice data. Is there anyone willing to lend a helping hand to get this done?

https://files.catbox.moe/tlm61a.mp4
>>
>>41165965
>https://applio.org/models/1165806674146230313
>>
>>41165995
That's great! Thank you! And so I just extract the archive into the model folder for so-vits?
>>
>>41164790
I love it
>>
>>41166007
nta, it looks like it's an RVC v2 model.
https://vocaroo.com/1Mccz5BcQpeU
From the sound of it, it was trained on some YT clips of Top Gear, judging by all that artificial engine noise going on in the background.
If you want/need, I could retrain a new model for you.
>>
>>41166074
I would very much appreciate that, Anon.
>>
File: rara wheels.png (339 KB, 912x896)
339 KB
339 KB PNG
>Tonoight on Bottom Gear
>https://vocaroo.com/11P8h43QdEpY
>>
>>41166074
Is this the same model?
https://voice-models.com/model/1n44tDNY1Ve
>>
>>41166077
alright, training started, it should be ready in 6 hours (or 12, depending on whether I fall asleep in front of my pc).
>>41166216
>BelAir1603
yep, it's pointing to the exact same link to the same model.
>>
https://vocaroo.com/1lvLqvEHxbMr
https://pomf2.lain.la/f/7d0mt6y.mp3
https://www.udio.com/songs/tgF73cds6Tx2i2DGwehvdz

Udio
Taste of Friendship

[Verse 1]
Do you like bananas? (Do you like bananas?)
Tell me, tell me true, don't be shy, do you?
Magic all around me, ponies let's be clear
Celestia's here to hear what's dear

[Bridge]
Friendship, friendship, oh my dear subjects (Friendship, friendship)
But first things first, do you like the taste? (Friendship, friendship)
>>
>>41165965 >>41166007
>>41166077 >>41166216
>https://huggingface.co/Amo/RVC_v2_GA/tree/main/models/VA_JeremyClarkson_C02
>https://vocaroo.com/1j4EOZla9qD8
The quality of the model is a bit below what I would normally be happy with (it tries to replicate the thin studio echo heard in the OG clips), but this is the best I could do within a one-day limit.
It's still 110% better than the other one. I'm really glad to lend some support to the Anti this year, put it to good use Anon.
>>
>https://files.catbox.moe/oo2f1c.mp3
For the Anon that wished for some ai voice shitposting, I hope this is shitposting enough for you. Thanks haysay Anon for adding StyleTTS2 to it. Also I find it funny that SD decided that Jeremy's waifu is Apple Fritter when I was trying to get something that resembles Rainbow Dash
>>
>>41166778
You're a life saver, thank you!
>>
backgrounds safetensor mirror
https://drive.filen.io/d/56e3cd3c-51ce-4ec2-a059-f40a384bd0c3#laGTVsncEs1g5dtWV3TiHyDsanSDiClL
>>
https://sonycslparis.github.io/diffariff-companion/

This seems nice, though for now they're not releasing a model due to a few circumstances...
>>
>>41167472
Man, it's so frustrating seeing all the new cool tech and being told that nobody can play with it.
>>
>>
>>41169555
Tell me your secret, purple horse.
>>
>>41169555
boop
>>
>>41169555
>555
Is she oscillating?
Damn, no one will get the reference
>>
File: applejack weird.png (188 KB, 446x627)
188 KB
188 KB PNG
https://vocaroo.com/1b0ZgDIGOQXJ

Running voice conversion on instrumental clips makes the AI generate weird pony beatboxing.
>>
>>41164348
I love these! Especially that second one. I can just see Twilight getting paranoid about the thread reaching page 10. Must have taken a lot of effort to generate all of that and include sound effects.
Looking forward to the new thing you are finishing.
>>
>>41171220
She's on an astable multivibrator.
>>
Bump.
>>
File: weed mares.png (597 KB, 1620x1394)
597 KB
597 KB PNG
https://files.catbox.moe/reqmb9.mp3
Here, some musical ai slop made with udio and haysay.
Zecora - Yellow Pegasus (reggae)
>>
>>41173400
>https://files.catbox.moe/reqmb9.mp3
Sounds good, actually.
Not as good as >>41164251 , though
>>
File: scrunch2.gif (18 KB, 96x96)
18 KB
18 KB GIF
Derpy Whooves sings When I'm Up (I Can't Get Down) by 'Great Big Sea'.
https://files.catbox.moe/d7jfyq.mp3
>>
File: derpflip.gif (15 KB, 100x90)
15 KB
15 KB GIF
>>
File: star ce26529c81e72417.jpg (70 KB, 630x630)
70 KB
70 KB JPG
>>41174009
I don't think it's the proper Derpy voice model you used; you would also need to go 12 (or even 23) semitones down to get the pitch right. But hey, it's an attempt.
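For reference, a semitone transposition maps onto a simple frequency ratio. A minimal sketch (`semitone_ratio` is a hypothetical helper, not part of haysay or any tool mentioned here):

```python
# Equal temperament: each semitone scales frequency by 2**(1/12),
# so -12 semitones is exactly one octave down (half the frequency).
def semitone_ratio(semitones: float) -> float:
    return 2.0 ** (semitones / 12.0)

print(semitone_ratio(-12))  # 0.5
print(semitone_ratio(-24))  # 0.25 (two octaves down)
```

So a -12 transpose asks the model to sing at half the reference frequency, which is a big ask if the training data never went that low.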
>>
File: beegderp.jpg (2.25 MB, 8192x8192)
2.25 MB
2.25 MB JPG
>>41174019
>I don't think it's the proper Derpy voice model you used, you would also need to go 12 (or even 23) semitones down to get the pitch right . But hey, it's an attempt.
>
>>
File: boots.png (13 KB, 100x100)
13 KB
13 KB PNG
>>
>>41172819
You too are a man of culture I see.
>>
File: DerpyWithGuitar.png (57 KB, 215x234)
57 KB
57 KB PNG
>>41174019
https://files.catbox.moe/mpukv9.mp3
Fixed, though!
Thannks fer the feedback.
(Unfortunately, when I decided to use the -12 transposition feature the audio became grainy, much like the way it sounds at the end of this track. It's like a horse whisper. A "hoarseness" that obscures the voice of the pone that we are trying to captivate. I didn't even DARE TO try -24 (or 23 or whatever!)
Out of necessity, I used the so-vits-svc 4.0 model for this one, as haysay.ai does not have a 5.0 so-vits-svc Derpy Whooves model available (that is, assuming that we even know of one or even have one.) Hopefully you don't mind the lighter, higher-pitched tone of Derpy's voice. I still think it quite sounds a bit like her, especially since she only got like one singing role in the entire show (if even that.))
I almost was about to use Rainbow Dash's voice when singing this song,
but ahhh...
it just didn't sound quite....
"Bubbly" enough.
Just out of curiosity, when you referred to the "proper" Derpy Whooves model, were you referring to one even hosted on haysay.ai? (the website?) or was it something someone uploaded here and it's still a model that has to be downloaded or linked through a Custom Model URL through Controllable TalkNet?
Besides that, I do have this: the attempt that was made at covering the lower octave shift of Derpy's voice. I don't think these files will be of much help, but here are the "scratchy" voice files that were NOT used, because the cause of the "hoarseness" turned out to be the 'transpose' option slider being utilized in Hay Say (I know, really, it crushes me a little though since I thought Derpy's voice was lower, too.) The second option is to just transpose it down afterwards, and it works, but it sounds a bit like... Derpy is using a voice changer to sound more like a man. So I guess that's it for that, then.
>Then... But it seems the only alternative would be... that would mean we would have to... sing with a lower voice.
And that obviously only works to the extent that that character was using their 'deep' voice in the first place as it pertains to the information stored in the dataset in general.
But yes, in case you were wondering, I /did/ in fact use my "deep voice" on this one.
>-12 semitones really caused that bad of an effect?
Yes. Maybe 'tis a glitch with the Derpy model
But it's of no concern to moi
>How do you know that
because I tried at '0' transposed again and the voice was once again discernible as Derpy Whooves but without the whispering.
>Proof???
Delivery on the next post.
>>
>>41175354
>"I still think it quite sounds a bit like her, especially since she only got like one singing role in the entire show (if even that.))"
I guess she got none, I was wrong.
So sorry everybody. I thought this counted: https://www.youtube.com/watch?v=L6zodtgljFE
...
But I guess not... eh.
>You are STALLING!!!
Oh, right!
right.
The files: Sorry, I tried. If anyone knows anything about this (or how to go about fixing this issue, whether within Hay Say or otherwise,) please let me know!!!
HEERE THEY AREE:
Nope, nevermind, Catbox responded with a zero code, pic related. You'll just have to take my word for it THAT THEY DIDN'T WORK.
Maybe someday soon smutty.horse will make a comeback!!!
>>
File: catboxOhSHI---(2).png (66 KB, 602x237)
66 KB
66 KB PNG
>>41175384
>pic related
>forgothtepicture.jpg
HAHAaha-hue hue
>>
File: Attitude.png (141 KB, 841x521)
141 KB
141 KB PNG
>>41145926
>https://files.catbox.moe/0twb8w.flac
That's fantastic!
>>
File: Rarity.png (98 KB, 360x360)
98 KB
98 KB PNG
>>41175354
Some songs just aren't really fit for making pony AI covers, because the singer's voice is too high or too low pitched to fit any of the poners. Heavy chorus effects or background singers who sound too much like the main singer can also make it pretty much impossible to get a clear enough vocal extract to work with.

I've been dabbling in AI covers for a bit lately. Using haysay I generally turn the character likeness all the way down for AI covers. Character likeness comes from the singer/vocals you're using.

Overall making AI covers is pretty easy once you get used to the tools you're working with. The real difficulty is finding the right song to cover.

Actually I'm wrong about that. The real work (besides the actual original singer) is put in by the people who trained the character models in the first place. We wouldn't be anywhere without those toppest of lads. Unfortunately I have neither the know-how nor resources to get into that, so I have to satisfy myself by picking the fruits of their labors.

This is a shoutout to all of you based horsebros that make this possible in the first place.


I'm still learning, but I think the Rarity Tears for Fears one I made a few weeks ago came out well
>https://files.catbox.moe/p6tfhy.mp3
>>
This thread is full of school shooters anonbros...
>>
File: 1706354978348500.jpg (66 KB, 861x963)
66 KB
66 KB JPG
>>41175758
Fuck, how was I found out so easily?
>>
>>41175758
Are you saying this is a banger of a thread?
>>
>>41175758
What makes you say that?
>>
>>41145926
Nice
>>
File: OIG4.1lDzcuHODFUw7Z_XGccI.jpg (251 KB, 1024x1024)
251 KB
251 KB JPG
>>
>>41175758
https://files.catbox.moe/kgkhxr.mp3

By the way, haysay.ai seems to be down. I did this with voice-models and easyaivoice.
>>
>>41177948
>By the way, haysay.ai seems to be down.
Yep, noticed that too. Site is still not loading.
>>
>>41177948
>>41178396
haysay.ai is back up now and I renewed the TLS certificate while I was at it. Thanks for bringing it to my attention. The server somehow got into a bad state again and was completely unreachable so I had to reboot it.
>>
>>41178595
Thanks! Glad to see it's back up!
>>
Up.
>>
down ^>:(
>>
>>41180437
Konami.
>>
Huh. Have any of the VAs done audio books?
>>
>>41181350
>>
>>41180876
Even if they did, the voices they'd use for an audiobook aren't in character for pony, and we don't need audiobook recordings to train pony models. Emily Blunt recorded one chapter of an audiobook, but it was in her normal British accent instead of her Tempest voice. Also, Nicole Oliver narrated that documentary about fungi, but it's unlikely that adding that recording would improve the quality of Celestia models.
>>
File: trixie scared.gif (114 KB, 579x351)
114 KB
114 KB GIF
>https://files.catbox.moe/5wk6ba.mp3
>>
File: Sensible_Chuckle.png (778 KB, 1546x1480)
778 KB
778 KB PNG
>>41181670
I feel like I should know what this is referencing, but for some reason my brain can't remember what it is.
>>
>>41181670
Holy fuck that got intense, Trixie's voice only made it more attention-grabbing
>>
>>41181670
>https://files.catbox.moe/5wk6ba.mp3
Why so wide
>>
Improving Text-To-Audio Models with Synthetic Captions
https://arxiv.org/abs/2406.15487
>It is an open challenge to obtain high quality training data, especially captions, for text-to-audio models. Although prior methods have leveraged text-only language models to augment and improve captions, such methods have limitations related to scale and coherence between audio and captions. In this work, we propose an audio captioning pipeline that uses an audio language model to synthesize accurate and diverse captions for audio at scale. We leverage this pipeline to produce a dataset of synthetic captions for AudioSet, named AF-AudioSet, and then evaluate the benefit of pre-training text-to-audio models on these synthetic captions. Through systematic evaluations on AudioCaps and MusicCaps, we find leveraging our pipeline and synthetic captions leads to significant improvements on audio generation quality, achieving a new state-of-the-art.
nvidia paper so no weights but maybe this will give some insights
>>41181417
Ah, I was thinking you could take the VA's voice and then voice-clone it to whichever character they voiced to build a larger dataset.
>>
>>41181670
This is so fucking funny.
>>
>Don't have a functional computer
>Found only one online tool for AI voice
>Wanted to be part of PPP so bad by creating infinite pony content
Damn. Maybe next year.
>>
>>41183555
Haysay is all you really need?
https://haysay.ai
>>
>>41183557
>Found only one online tool for AI voice
Actually I was talking about Haysay, which is great for the character I'm aiming for, but the voice-input-only option (RVC) is a downgrade for me, since I can't speak English properly
>>
File: EmoSphere-TTS.png (117 KB, 994x454)
117 KB
117 KB PNG
>https://arxiv.org/abs/2406.07803
>Despite rapid advances in the field of emotional text-to-speech (TTS), recent studies primarily focus on mimicking the average style of a particular emotion. As a result, the ability to manipulate speech emotion remains constrained to several predefined labels, compromising the ability to reflect the nuanced variations of emotion. In this paper, we propose EmoSphere-TTS, which synthesizes expressive emotional speech by using a spherical emotion vector to control the emotional style and intensity of the synthetic speech. Without any human annotation, we use the arousal, valence, and dominance pseudo-labels to model the complex nature of emotion via a Cartesian-spherical transformation. Furthermore, we propose a dual conditional adversarial network to improve the quality of generated speech by reflecting the multi-aspect characteristics. The experimental results demonstrate the model ability to control emotional style and intensity with high-quality expressive speech.
>https://EmoSphere-TTS.github.io/
No code for this one, but they describe their process well enough that I feel like someone with advanced knowledge of Python and training could possibly recreate it. I like their idea of controlling the emotion not just through text but by dragging the values of "arousal, valence, and dominance" extracted with a modified wav2vec 2.0.
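A rough sketch of the Cartesian-to-spherical transform the paper describes for the arousal/valence/dominance (AVD) pseudo-labels. The exact centering and normalization EmoSphere-TTS uses may differ; the neutral center and the function itself are assumptions for illustration:

```python
import math

def avd_to_spherical(arousal, valence, dominance, center=(0.5, 0.5, 0.5)):
    """Map AVD pseudo-labels to (intensity, elevation, azimuth).

    Shift so the "neutral" emotion sits at the origin, then convert the
    Cartesian AVD point to spherical coordinates: the radius acts as
    emotion intensity, the two angles as emotion style.
    """
    x = arousal - center[0]
    y = valence - center[1]
    z = dominance - center[2]
    r = math.sqrt(x * x + y * y + z * z)          # emotion intensity
    theta = math.acos(z / r) if r > 0 else 0.0    # elevation angle
    phi = math.atan2(y, x)                        # azimuth angle
    return r, theta, phi
```

Dragging intensity then just means scaling `r` before mapping back, which is what makes the control continuous rather than a fixed set of emotion labels.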
>>
File: sweaty twi.gif (563 KB, 642x405)
563 KB
563 KB GIF
General question to Anons in the thread: how many of you guys keep backups of the training dataset (be it the voice, audio clips, images or other)? Like, let's say there is a scenario where the Google servers and Mega totally fuck themselves sideways, following up with accidental deletion of all online ai mare content. Would there be enough Anons with backups and know-how to get back to the current state of the tools we are using right now?
>>
>>41184281
I have a copy of the master files
>>
>>41184281
I have backups, and the label files can be used to re-generate the dataset if everything else somehow gets lost.
>>
https://github.com/DigitalPhonetics/IMS-Toucan
has instructions on how to train with your own dataset
>>
Bumpen from page 10
>>
File: OIG3.maSgbxWgxcvSDO31en3I.jpg (206 KB, 1024x1024)
206 KB
206 KB JPG
>>
>>41182845
what do you mean
>>
>Been while since used Hay Say
>Load the local version to make a pony cover
>Error
Temporary failure in name resolution?
>>
>>41186129
Is the error sticking around, or did it go away after retrying? If it is persistent, try stopping and then restarting Hay Say:
sudo docker compose stop
sudo docker compose up

I'm not sure why exactly you got this error, but to understand what's going on, refer to the diagram in the Readme under "The Technical Design of Hay Say":
https://github.com/hydrusbeta/hay_say_ui?tab=readme-ov-file#the-technical-design-of-hay-say

Hay Say is composed of several "containers" which are like little servers running locally on your machine. When you click the Generate button, the container running the UI makes a webservice call to another container. Specifically, it sends a POST request to the "/generate" method on the container responsible for generating audio for the architecture you had selected (each architecture has its own container). For some reason, the Docker instance on your machine lost its ability to map container names to their local IP addresses (a process called "name resolution") which is a required step before one container can make a webservice call to another. This webservice call happens locally, by the way, and does not reach out across the internet.
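To illustrate what name resolution means at the code level (this is not Hay Say's actual code, and the container name below is made up):

```python
import socket

# Before one container can POST to another's /generate endpoint, it must
# resolve the other container's name to an IP address. If Docker's
# internal DNS is broken, that lookup raises socket.gaierror, which
# surfaces as a "Temporary failure in name resolution" style error.
def can_resolve(hostname: str) -> bool:
    try:
        socket.gethostbyname(hostname)
        return True
    except socket.gaierror:
        return False

print(can_resolve("localhost"))       # True on a healthy resolver
print(can_resolve("so_vits_server"))  # likely False outside the Docker network
```

So a failing lookup for a container name points at Docker's networking state rather than at the model code itself.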
>>
Bumo.
>>
>>41186871
Excellente.
>>
>>41186453
I've tried restarting both Hay Say in the way you mentioned as well as my entire PC, unfortunately the error persists. I have a feeling I somehow broke something over time and should maybe do a fresh reinstall.
>>
>>41185830
>>
>>41187003
Nta, but recently I was updating haysay and it also stopped working, so I had to clean out old containers and old models. After that almost everything worked except CUDA (which worked previously), but I blame a beta driver update and am using CPU for now.
>>
Could someone tell me what tts was used for the Aryanne voice in the last redub?
>>
>>41188202
Wait, there's an Aryanne redub?
>>
>>41188202
I can't recall exactly but I either used Google TTS (German) or I just typed up German TTS.
>>
>>41188274
I meant Google Translate (German) *
>>
Not sure if anybody needs this, but after a few days of struggling to get torch working on an old system, the following commands managed to get it rolling on my old gpu (in a conda environment with python=3.10.3):
pip3 install --upgrade setuptools==70.1.1 wheel==0.43.0
pip install torch==1.13.1+cu116 torchvision==0.14.1+cu116 torchaudio==0.13.1 'tensorflow[and-cuda]' --extra-index-url https://download.pytorch.org/whl/ --no-cache-dir --force-reinstall
>>
>>41164790
Someone should make a metal song using these lyrics
>>
>>41188202
>>41188274
Anyone want to suggest a voice source for Aryanne? Could easily whip up a dataset to give her a proper voice, ~5-10 mins of clean audio should be plenty.
>>
>>41189164
>https://files.catbox.moe/3gcygh.mp4
There was an unspoken agreement among mlpol Anons that the voice from this video would be the best fit for her.
>>
>>41181670
I am called Guillaume, and that was fucking uncanny.
But I laughed my ass off!
>>
>>41189186
>Ich liebe Panzer
Damn, that's cute.
Unfortunately, I think the quality is too bad to be useful...
>>
>>41189186
Don't think I can do anything with that, not much of the actual voice and what is there has a lot of noise. Surely must be some similar voices out there though?
>>
>>41189186
That sounds like shit though.
>>
>>41189435
Since she is an OC without any media that would dictate the proper direction of her voice (as opposed to the popular ones like Rainbow Dash Presents or SweetieBot), it's a free-for-all choice.
But if I were going to make something, I would probably go with Neco Arc, as a mixture of a high-pitched Japanese cartoony voice with a strong German accent would be the peak of shitposting power level.
>>41189677
>That sounds like shit though.
it does, but you are forgetting it's all about sovl
>>
Theoretically, and I mean that, would there be a way to make some sort of model that would accept at least 10 seconds of voice data in order to make a coherent voice? I know there are some proprietary things out there that do that, but I mean like open source?
>>
>>41189795
As you noted in your post they already exist. So theoretically, sure.
>>
>>41189795
if you look in the archives for the Blueblood RVC model, that is possible with some voice cloning tech, but even if you give it a 100% clean 10s reference line, the result will be 1 good clip for every 30 terrible ones.
>>
E2 TTS: Embarrassingly Easy Fully Non-Autoregressive Zero-Shot TTS
https://arxiv.org/abs/2406.18009
>This paper introduces Embarrassingly Easy Text-to-Speech (E2 TTS), a fully non-autoregressive zero-shot text-to-speech system that offers human-level naturalness and state-of-the-art speaker similarity and intelligibility. In the E2 TTS framework, the text input is converted into a character sequence with filler tokens. The flow-matching-based mel spectrogram generator is then trained based on the audio infilling task. Unlike many previous works, it does not require additional components (e.g., duration model, grapheme-to-phoneme) or complex techniques (e.g., monotonic alignment search). Despite its simplicity, E2 TTS achieves state-of-the-art zero-shot TTS capabilities that are comparable to or surpass previous works, including Voicebox and NaturalSpeech 3. The simplicity of E2 TTS also allows for flexibility in the input representation. We propose several variants of E2 TTS to improve usability during inference.
https://www.microsoft.com/en-us/research/project/e2-tts/
From Microsoft, so no weights obviously. Very cool. Emotion control, speed control, and, more interestingly, explicit phoneme pronunciation. See the examples since it's pretty impressive.
Some related stuff I found to that
https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/tts/g2p.html
https://huggingface.co/docs/transformers/en/model_doc/wav2vec2_phoneme
>>
File: DREP.png (1.04 MB, 1024x1024)
1.04 MB
1.04 MB PNG
>>41137243

Stuff I should have posted here.
>>41190109
>>41190310
>>41190431
>>
>>41190161
I'm curious, where do you find your papers? Do you just scroll through the newest arxiv submissions every day?
>>
>>41190446
Does Suno give you the option to separate the vocals from the instrumentals, or do you have another way to do it?
>>
>>41190539
i had to use vocalremover.org.
>>
>>41190522
For arxiv cs.LG, yeah. Had to set up a Twitter account focused on ML/AI stuff to catch papers/gits I miss. Also have a few other places I check that sometimes have relevant stuff pop up.
>>
>>41190801
personally I prefer using Ultimate Vocal Remover with the Kim2 model and then following up with the reverb removal model (I find the end result sounds much better when adding my own reverb/echo than trying to work with whatever the original source gives you)
>>
>9
>>
>>41166778
>Special thanks in post anti panel
Very nice, it was a pleasure to help fellow horse fuckers.
>>
File: flutterglow.png (1.14 MB, 1024x1024)
1.14 MB
1.14 MB PNG
>>41190954
i updated the songs, i used UVR MDX-Net Kim_Vocal_2 and Reverb_HQ_by_Foxjoy

Pinkie Pies Dance Club
https://files.catbox.moe/eckq4h.mp3
RAINBUTT V2
https://files.catbox.moe/1bti1o.mp3
Applejack burnt her pie V2
https://files.catbox.moe/ju4ig8.mp3
>>
>>41189077
>posts a solution to his problem
>gets no thank you's
im here to tell you that I, specifically am writing this out of gratitude anon, you're an unsung hero, hope u know that
>>
>>41191684
Quite so.
>>
>>41193237
Nice songs!

So, you made the music with lyrics using "suno", split the vocals from the music using "Ultimate Vocal Remover", fed the lyrics to an AI pony voice, and mixed it back with the music and some echo to hide the AI a bit?
>>
>>41195438
Yep. Workflow was suno, ultimate vocal remover, ultimate vocal remover again (for the reverb), sovits for vocals, then combining the instrumentals and sovits vocals in audacity.
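That final Audacity step is just summing two tracks. A minimal sketch in code (the sample lists are illustrative; real use would decode both WAVs at the same sample rate first):

```python
# Mix an instrumental track with converted vocals by summing samples.
# Pads the shorter stream with silence so lengths match.
def mix(instrumental, vocals, vocal_gain=1.0):
    n = max(len(instrumental), len(vocals))
    a = instrumental + [0.0] * (n - len(instrumental))
    b = vocals + [0.0] * (n - len(vocals))
    return [x + vocal_gain * y for x, y in zip(a, b)]
```

In practice you'd also watch for clipping after the sum, which is part of why doing it by ear in Audacity works well.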
>>
>>41193237
Dude. Keep doing this. The AIs sound so clean here, plus the songs ain't half bad either.
>>
>>41193237
Applejack starts to sound pretty solid at this point.
>>
Up.
>>
Reminder that StyleTTS is pretty neat.
https://files.catbox.moe/h95xtr.mp3
>>
>>41197696
This is a line from a fic I'm writing, to clarify.
>>
>>41196733
>>
>>41188060
I see. Well, I'm a complete docker novice. How would I go about cleaning the containers? Also, I attempted to pull the latest versions to fix the issue, and now it errors without even running.
>>
is it worth making a new SD safetensor trained only with every canon screenshot?
>>
Luma seems to do much better when given an image as guidance in addition to the prompt, even if the image in question is AI generated.

>Derpy Hooves (MLP:FiM) looks towards the camera and tilts head in a smile with closed eyes
https://files.catbox.moe/l6ybfj.webm
https://files.catbox.moe/j7cbi6.webm
>>
>>41199744
both of those were ass anon
not saying you should stop posting stuff like this because it's still informative, especially considering how most anons got very different LUMA videos compared to what their prompt implied, but this needs more work, it just aint there yet
again, A for effort though, cool concept of figuring out better LUMA prompting
>>
>>41199732
Why? If by "only" you mean from scratch, then that's the most retarded idea ever; it won't generalize at all.
You can get show-style output in current models by adding "screencap" to the prompt; the boorus the training images were scraped from already had a bunch of screencaps
>>
>>41199749
Yeah, but still much better than the garbage Stable Video can do lately. Especially given how you can only request image ref OR text prompt; never both.

Google should debut their Veo/VideoFX/DeepMind already. It's already great at horses and their Imagen counterpart is already great at show-accurate pony.
>>
>>41199845
yeah, the SD animation suffers from constantly re-rendering the background, causing a weird flickering effect. I feel like this is the kind of problem that could be solved with some decent image separators + composition scripts, but the trouble is that SD edits the whole image in one go.
>>
>>41162218
oh fucking mc boohoo than, fuck off to the afterlife than, cause everyone on here was a bitch that alot of the creators dont wanna do it no more because this fucking forum chases them away because you cunts dont have any fucking common decency and not have practiced the friendship lessons twilight taught us u guys are a bunch of fake bitches kiss my arse and hope to die u bunch of cave dwelling gigachad motherfuckers practice the lessons of friendship then come back to me
>>
>>41200183
Hi Thunder"Anon". I wasn't talking about you, your shit always sucked. I was referring to the earlier days when people were posting audio episodes and memes with 15.ai and such. Was before your time.
>>
lol who is that retard ? ur bitchin and complaining like a fucktard get in line you whinny ass bitch
>>
>>41200203
>>41200203
>>41200183

lol i wasnt talking about him, or anyone, people have been here and gone cause you are all bunch of fucktards anyway
>>
>>41200203
your shit sucks to
eat my hoursecock
much on it like its peanut butter n jelly
>>
>>41200183
B-based?
>>
>>41200304
yes indeed based
>>
>>41184281
I can see it now. Two centuries from now after the Google-Amazon War of Independence, Twilight STCs are seen as more valuable than breathing air itself.
>>
>>41200386 >>41162218
>https://files.catbox.moe/l6x2rd.mp3
here is your silly ai shitpost content sir.
>>
File: OIG2.l92qSBtBAUaocWOX5Q7N.jpg (166 KB, 1024x1024)
166 KB
166 KB JPG
>>
Pony?
>>
Uppity.
>>
boops
>>
Lightweight Zero-shot Text-to-Speech with Mixture of Adapters
https://arxiv.org/abs/2407.01291
>The advancements in zero-shot text-to-speech (TTS) methods, based on large-scale models, have demonstrated high fidelity in reproducing speaker characteristics. However, these models are too large for practical daily use. We propose a lightweight zero-shot TTS method using a mixture of adapters (MoA). Our proposed method incorporates MoA modules into the decoder and the variance adapter of a non-autoregressive TTS model. These modules enhance the ability to adapt a wide variety of speakers in a zero-shot manner by selecting appropriate adapters associated with speaker characteristics on the basis of speaker embeddings. Our method achieves high-quality speech synthesis with minimal additional parameters. Through objective and subjective evaluations, we confirmed that our method achieves better performance than the baseline with less than 40% of parameters at 1.9 times faster inference speed.
https://ntt-hilab-gensp.github.io/is2024lightweightTTS/
no weights (small model trained on japanese anyway) but interesting
>>
File: Untitled.png (151 KB, 1357x724)
151 KB
151 KB PNG
Papez: Resource-Efficient Speech Separation with Auditory Working Memory
https://arxiv.org/abs/2407.00888
>Transformer-based models recently reached state-of-the-art single-channel speech separation accuracy; However, their extreme computational load makes it difficult to deploy them in resource-constrained mobile or IoT devices. We thus present Papez, a lightweight and computation-efficient single-channel speech separation model. Papez is based on three key techniques. We first replace the inter-chunk Transformer with small-sized auditory working memory. Second, we adaptively prune the input tokens that do not need further processing. Finally, we reduce the number of parameters through the recurrent transformer. Our extensive evaluation shows that Papez achieves the best resource and accuracy tradeoffs with a large margin.
https://github.com/snuhcs/Papez
might be useful
>>
https://huggingface.co/fishaudio/fish-speech-1.2
>trained on 300k hours of English, Chinese, and Japanese audio data.
https://github.com/fishaudio/fish-speech
https://speech.fish.audio/en/samples/
Might be cool. English samples only have the generated audio lol
>>
File: think derpy.png (94 KB, 750x906)
94 KB
94 KB PNG
Hey Anons, I'm just throwing an idea out here to see how people feel about it. There are some folks who wish to create songs/audio shitposts, and on the other side there are some contentfags that are a little bit burnt out from whatever they are doing and could use a small side project to test and bounce around new concepts.
How does that sound? I feel like it would be at the very least a bit less blue than posting a bump every three to five hours?
>>
>>41204065
What's the idea?
>>
>>41204146
dunno, like people could post a greentext/pastebin of a song or story that would make a fun audio/song?
>>
In a nutshell, how does text-to-music/audiocraft work? Do we have models of Daniel Ingram's style to create ponylike ost/bgm?
>>
>>41204915
Daniel Ingram wrote the songs, it was William Anderson who composed the background music. Anyway, Meta has released training code and weights for Audiocraft here:
https://civitai.com/api/download/models/613602

Preparing the dataset requires slicing the BGM tracks into 30 second chunks and writing a text description for each chunk. Is there a music-to-text language model that can help with the descriptions?
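A sketch of the slicing half of that prep (filenames and the ffmpeg invocation are illustrative; assumes ffmpeg is on PATH, and the text descriptions would still come from hand-writing or a captioning model):

```python
import subprocess

CHUNK_SECONDS = 30

def chunk_spans(total_seconds: float, chunk: int = CHUNK_SECONDS):
    """Return (start, end) pairs covering the track; the last chunk may be shorter."""
    spans, t = [], 0.0
    while t < total_seconds:
        spans.append((t, min(t + chunk, total_seconds)))
        t += chunk
    return spans

def slice_track(path: str, out_pattern: str = "bgm_chunk_%03d.wav"):
    """Cut one BGM track into 30-second segments with ffmpeg's segment muxer."""
    subprocess.run(
        ["ffmpeg", "-i", path, "-f", "segment",
         "-segment_time", str(CHUNK_SECONDS), out_pattern],
        check=True,
    )
```

Each output chunk then gets paired with a one-line text description to form a training example.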
>>
>>41205157
>music-to-text
I would have thought that, similar to txt2img, those models would come with their encoders/decoders (like the BLIP interrogator), so you could technically reverse the generation process by "asking" the model what it sees in the image?
or maybe the audio models work too differently from the image ones?
>>
>>41205157
>https://www.yeschat.ai/tag/Music-Transcription
It looks like there are tools to transcribe the actual musical notes into text, and they may also provide a (questionable) ability to tell the differences between musical genres. Sadly, those also require a decent amount of shekels to be thrown at their subscriptions.
>>
https://youtube.com/live/hm2IJSKcYvo
Pretty impressive. Apparently will be open sourced. All hinges on how well it finetunes
>>
>>41205157

Maybe I'm missing a joke here, but that link definitely does not lead to the Audiocraft stuff...
>>
>>41205544
https://us.moshi.chat/?queue_id=talktomoshi
>>
What's the opensource equivalent to suno.ai?
>>
>>41205785
I copied the wrong link:
https://github.com/facebookresearch/audiocraft
>>
>>41206023
Nothing.
>>
File: pain.gif (735 KB, 960x540)
>>41206039
>2024
>no local suno
aifags have failed me
>>
>>41206254
>>41206023
I'm pretty sure udio and suno are just BARK on steroids (in the same way that ChatGPT is GPT on steroids).
>>41205544
>10s input to clone good quality voice
Impressive. They talked about the model understanding 17 distinct emotions; the speaking in a French accent, as a pirate, and in a whisper is pretty neat. It would be pretty useful to be able to feed text to its TTS module and get the exact emotional delivery and pacing as output.
I would like to see what exact laptop model they used; if it's some bullshit gaymer card with 16GB of VRAM then it wouldn't be as impressive as running it on a small 6GB VRAM GPU.
>>
>>41137243
i remember a guy making an ai south park episode generator, i wonder if something similar could be done with mlp and flash sprites
>>
>>41207064
https://www.youtube.com/@blob81/streams
It has been done in the past; this Anon is keeping it WIP, but I have high hopes it will get finished soon(tm).
>>
>>41207106
lmao
>>
>>41207106
this uses 3d models it seems, it would be cool if it was done using flash
>>
File: a paradox.jpg (174 KB, 1280x720)
>>41207106
It feels so weird seeing redeemed Glimmer standing with unicorn Twilight inside the Golden Oaks Library, but it's exactly the kind of AU timeline I want to see more of in fanworks.
>>
>10
>>
>>41207161
Weirdly enough, trying to do this in 2D would be more difficult without making it look like ass, since it would either be forced to never change the generic stock vector look or be limited to looping the exact same handful of animations. With the 3D models there aren't really that many animations beyond the standard walk and idle talk, but since they can be rotated and placed in front of and behind each other, it breaks the "sameness" a little bit.
>>
>>41187003
>>41199671
Sorry for my delayed reply. What error are you getting after pulling the latest images?

Since the original error was with name resolution, I suspect there's an issue with the Docker engine itself. Have you tried reinstalling Docker?
Windows/MacOS: https://www.docker.com/products/docker-desktop/
Linux: https://docs.docker.com/engine/install/
>>
>>41207837
horse
>>
File: file.png (73 KB, 1230x253)
>All /ppp/ is on borrowed time.
>>
>>41209368
we are already dead to be honest
>>
>>41209368
>"Hey Jewtube I'm a literally who voice actor [starting 10 minutes ago of course] and this video I don't like has AI content in it that was trained on my voice and/or appearance."
i give it a week tops before BGM's stuff is baleeted
i would recommend just starting a platform specifically for ai, but cia pedoniggers would spam it and get it shut down
>>
>>41209432
>what is ponytube
>>
>>41209432
How would they know it's a copy? Have to record your own voice? big tech datamining wins either way
>>
>>41209439
https://youtu.be/YWdD206eSv0?si=1pH4abj87bJ3LgNx
>>
>>41209427
are any of the voice actors and actresses on that voice actor union thingy?
>>
also has anything new coming out in regards to ai voice stuff?
>>
>>41209439
They don't give a crap. Copyright trolling has been a blight on Youtube for years.
>>
https://x.com/PrimeIntellect/status/1808639707435446543
Cheapest yet?
H100s $1.65/hr
A100s $0.87/hr
4090s $0.32/hr
3090s $0.19/hr
>>
File: OIG2.RPWenu1u9saLw0neKMCG.jpg (133 KB, 1024x1024)
>>
File: 1689454683655703.png (146 KB, 1008x777)
Haysay's broken again.
>>
>>41211041
Thank you for bringing it to my attention. It should be fixed now. The root cause was the exact same thing mentioned in >>41135074. I have no idea why those config files are getting wiped, though. I might add a cron job to periodically check the files and fix them if needed.
>>
>>41209439
I'm pretty sure that adding "No AI voices used in this video were trained on unconsenting people's voices." would protect from this.
>>
>>41211011
Ooh, shiny.
>>
>>41137243
https://www.youtube.com/watch?v=5yYrh6AcURk
>>
>>41211011
>>
>>41211041
These walls of text are always funny to read.
>>
https://github.com/FunAudioLLM/CosyVoice
https://fun-audio-llm.github.io
>>
>mare at 9
>>
>>41213680
Eeyup.
>>
Huh, just realized there wasn't a PPP panel at this year's /mlp/con.
>>
>>41214235
yeah it made me sad
i always loved PPP panels...
>>
>>41214214
Eenope.
>>
>entropy
>>
>>41209432
>>41211355
It's already over. A lot of AI covers are being taken down. There was a celine dion cover that was perfectly fine for years, then after this it's suddenly been wiped off the map, along with many south park cartman covers.
>>
>>41215845
That's how innovation is thrown into a dumpster.
>>
>>41216105
but Anon, think about it: if you don't send your shekels to the poor music publishing companies, the artists WILL starve (they only get like a quarter of a penny per CD sold).
>>
>>41215845
I hate when they do this

>>41216105
Well, maybe it's time to switch to something else, like peertube or whatever?
Maybe mirror things at first?
>>
File: Proposal.png (294 KB, 1080x1920)
>pic related
Now of course there’s a fuck load of issues with this idea
>first
Step 4 implies that you could even train a LoRA fast enough, start to finish, within the time it takes you to reach the context limit while using SillyTavern, let alone the time it takes to augment the data, even if you made some shitty python program that could magically do it for you. And if there even was one, you would likely need to run ANOTHER model to augment the data automatically in the background while using SillyTavern, which, considering you're running a local model on top of constantly running LoRA training in the background, means you're fucking raping your GPU, assuming anyone even has enough VRAM for this shit
>second
I doubt you can even swap out LoRAs mid-chat. idk about kobold since i've never tried it, but text-gen-webui takes a good amount of time to load a model; I doubt it would be different for a LoRA, and i don't see why other programs would magically not have a loading time, so that's another roadblock in the seamlessness of this idea
>third
Even in a magical world where any of us had the VRAM for this monstrosity, the number of little python or batch files that would need to be made to automate this process is fucking annoying
>why bother augmenting the data
Wish i had the end result of this dude's stuff, but basically this guy tried a similar thing by training a single LoRA on the raw unaugmented chat logs, and in the end it got hyper schizo; augmentation is the number 1 way to reduce schizo outcomes on a small dataset
https://desuarchive.org/g/thread/95930009/#95933766
Wish i had proof because the end result was pretty funny after it spammed about fallout new vegas (the guy said his favorite game was fallout 3)
Like i said, wish i had proof of how it turned out but i don’t have enough keywords to find it on desuarchive
>why not just use summarization?
Summarization has a limit on exact information; it quite literally has to be a summary, details have to be thrown out, and even then it will eat up context as time goes on.
In a hypothetical scenario where pic related could even function, summarization would be a great way to buy time for LoRA training, so that the chat logs aren't only being influenced by recent messages, which would also make the chat logs higher quality for training the next LoRA rotation.

yeah this plan is shit, but gun to your head, how would you make a plan for infinite long term memory of a local language model?
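Gun to my head, the rotation in the pic mostly boils down to bookkeeping: accumulate chat until you near the context limit, ship the overflow off to a background training job, and start a fresh window. A toy sketch of just that counter logic (the train hook is a hypothetical placeholder, not a real kobold/webui API, and the whitespace token count is a crude stand-in for a real tokenizer):

```python
class LoraRotation:
    """Toy bookkeeping for the 'train a LoRA on overflowing chat' idea.

    train_fn is a hypothetical hook that would launch background LoRA
    training (plus augmentation) on a batch of messages; here it only
    needs to be a callable taking a list of strings.
    """
    def __init__(self, context_limit_tokens, train_fn):
        self.limit = context_limit_tokens
        self.train_fn = train_fn
        self.buffer = []           # messages still inside the context window
        self.buffered_tokens = 0

    def add_message(self, text):
        self.buffer.append(text)
        self.buffered_tokens += len(text.split())  # crude token estimate
        if self.buffered_tokens >= self.limit:
            # Context is "full": hand the whole window off for training
            # and start a fresh one.
            batch, self.buffer, self.buffered_tokens = self.buffer, [], 0
            self.train_fn(batch)

jobs = []
rot = LoraRotation(context_limit_tokens=8, train_fn=jobs.append)
for msg in ["hi there anon", "tell me about mares", "ok that is enough now"]:
    rot.add_message(msg)
```

The third message pushes the window over 8 "tokens", so all three messages get handed to the (stubbed) trainer and the buffer resets. Everything hard, such as actually training and hot-swapping the LoRA without stalling generation, lives behind that hook.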
>>
>>41216858
>small dataset unaugmented = overfitting
thought of this while writing this out, but couldn't you just prevent overfitting by doing fewer epochs or steps or something? or would it simply not retain the information, and small, unaugmented datasets are stuck between a rock and a hard place of either being undertrained and therefore pointless, or overfit and schizophrenic?
>>
bump
>>
>>41216858
I think there are plans for making """infinite""" text references with LLMs, but so far it looks like it's limited to a very specific and niche kind of work.
>>
https://github.com/KwaiVGI/LivePortrait
https://arxiv.org/abs/2407.03168
not sure if it will work for ponies with a finetune but might be interesting for you guys
>>
>>41217435
thats pretty neat. even if for now it just seems to be a slightly advanced vtuber avatar, it would be nice to see it evolve into some more advanced puppeteer control.
>>
bump
>>
>>41218433
yep
>>
File: 22538.png (159 KB, 839x810)
Bumperino
>>
>>41219409
Subtle as a train wreck.
>>
File: Spoiler Image (1.34 MB, 1600x1095)
Get bumped by something unholy.
>>
>>41220021
Eh, that's not so bad.
>>
https://voca.ro/1ehkf83fVxmA
https://pomf2.lain.la/f/u35gqrq6.mp3

Udio
Celestial Dilemma

I never thought I'd buy into the dream,
A world of pastel ponies and gleam.
But, then I saw her, that royal behind,
Celestia's butt just messed with my mind.

I swore I'd be tough, not fall in this trap,
No magic, no sparkles, but oh, holy crap!
>>
Hey Synthbot (if you're still here), can you try loading the line "Who knows, you might want even want to live there!" (sic) from the pony-speech parquet dataset? I'm getting an error trying to load the audio.
>>
>>41221635
I love the retro vibes my dude.
>>
>mare 10
Audio bros, where are you at?
>>
https://x.com/yoachlacombe/status/1810964784927367668
Maybe that one anon will be interested in doing a finetune with this
>>
>>41222434
On a holiday.
>>
https://github.com/kwatcharasupat/bandit
>Cinematic audio source separation is a relatively new subtask of audio source separation, with the aim of extracting the dialogue, music, and effects stems from their mixture. In this work, we developed a model generalizing the Bandsplit RNN for any complete or overcomplete partitions of the frequency axis. Psychoacoustically motivated frequency scales were used to inform the band definitions which are now defined with redundancy for more reliable feature extraction. A loss function motivated by the signal-to-noise ratio and the sparsity-promoting property of the 1-norm was proposed. We additionally exploit the information-sharing property of a common-encoder setup to reduce computational complexity during both training and inference, improve separation performance for hard-to-generalize classes of sounds, and allow flexibility during inference time with detachable decoders. Our best model sets the state of the art on the Divide and Remaster dataset with performance above the ideal ratio mask for the dialogue stem.
should be having a new version come out since they made a new paper about their benchmark dataset
https://arxiv.org/abs/2407.07275
https://github.com/kwatcharasupat/source-separation-landing
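The loss they describe (SNR-motivated plus a sparsity-promoting 1-norm term) is simple enough to sketch. Note this is my guess at its general shape from the abstract, not the paper's actual implementation, and the weighting is a placeholder:

```python
import math

def snr_l1_loss(estimate, target, l1_weight=0.001, eps=1e-8):
    """Negative SNR (in dB) plus an L1 sparsity penalty on the estimate.

    Minimizing this pushes the estimated stem toward the target stem,
    while the 1-norm term discourages spurious low-level energy.
    Inputs are flat lists of samples; eps guards the log against zeros.
    """
    noise = [e - t for e, t in zip(estimate, target)]
    signal_power = sum(t * t for t in target)
    noise_power = sum(n * n for n in noise)
    snr_db = 10.0 * math.log10((signal_power + eps) / (noise_power + eps))
    return -snr_db + l1_weight * sum(abs(e) for e in estimate)

target = [1.0, 0.0, -1.0, 0.0]
clean = snr_l1_loss(target, target)            # near-perfect estimate
noisy = snr_l1_loss([t + 0.5 for t in target], target)
```

As expected, the loss decreases monotonically as the estimate approaches the target, which is the whole point of the SNR term.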
>>
>>41224222
that would be extremely useful, there are several older cartoons that I would love to get the voices of but there is no way to get them split since only existing copy of it is dinky 480p torrent.
>>
>>41221698
I'm also getting an error loading that file. Will debug.
>>
>>41221698
>>41224554
That sound file seems to have been empty in my copy of Clipper's Master File. I checked for other files, and it looks like that was the only one affected. I'm updating the HuggingFace dataset and re-uploading my Master File clone now. Thanks for pointing that out.

>>41189435
There's an empty SFX file in Master File 2:
>https://mega.nz/folder/gVYUEZrI#6dQHH3P2cFYWm3UkQveHxQ/folder/BBo1VSSY
>98 Wind Soft Wind, Weather _ Wind Light, Weather 1_ Wind Yellow Sky, Weather _ Wind Siberian, Weather _ Wind Howling Blizzard Weather.mp3
There's a non-empty file with the same name under HB03, so I think it's safe to delete the HB06 one.
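For anyone who wants to sweep their own Master File clone for the same problem, zero-byte files are trivial to find. A quick sketch (it only catches truly empty files, not clips with a valid header but corrupt audio):

```python
import os

def find_empty_files(root, exts=(".mp3", ".flac", ".wav")):
    """Walk root and return sorted paths of zero-byte audio files."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.lower().endswith(exts):
                path = os.path.join(dirpath, name)
                if os.path.getsize(path) == 0:
                    hits.append(path)
    return sorted(hits)
```

Point it at the clone's root and anything it prints is a candidate for re-download or deletion.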
>>
https://aitestkitchen.withgoogle.com/tools/music-fx?isMusicFxLive=true
For that anon who likes making music
>>
File: images (22).jpg (10 KB, 281x180)
Hello Degans I have created the most

degenerate song

Orgy Night With The Mane Six

https://files.catbox.moe/jvyanm.mp3
>>
File: dance.gif (77 KB, 240x335)
>>41224651
>https://files.catbox.moe/jvyanm.mp3
>>
>>41224651
bangin'
>>
>suno change UI to new globohomo corpo design
When will we get ai that tells managers to stop trying to "fix" things?
>>
>>41225777
I hate when that happens. And they always say it's 'new' and 'fresh'.
>>
>>41224597
>>41221698
Fixed. HuggingFace and the gdrive clone both have the updated data now.
>>
>>41225976
Thanks
>>
Uppy.
>>
>>41225415
>>41224651
thanxies
>>
https://emilia-dataset.github.io/Emilia-Demo-Page/
>>
Multi-layer mares
>>
>>41228182
On eight pages!
>>
>>41228966
That's a lot of mareameters
>>
>>41229541
The squaremares will manage.
>>
File: 2345675432.jpg (43 KB, 374x376)
>>41224651
>>
https://vocaroo.com/12ya4n0XVe5m
https://pomf2.lain.la/f/elgi8rs.mp3
https://www.udio.com/songs/dEEeMSr3DLtMdaZjB6o1fE

Stay101
Applejack's Hoedown Memories
Prompt: My Little Pony, Applejack, country, female vocalist,
Country, Female vocalist, Country pop, Singer-songwriter, Folk pop, Contemporary folk, Northern american music, Folk, Regional music, Pop rock, Love, Mellow, Playful

Lyrics

Verse 1:
I'm Applejack, the pony with a cowboy hat
Workin' hard on the farm, there's no denying that
With my trusty hat and boots, I'm ready to roll
My memories on the farm, they never get old

Chorus:
Hoedown memories, laughter in the air
Running through the fields without a care
From sunrise to sunset, workin' side by side
Apple family love, it's a joyful ride
Verse 2:
From Sweet Apple Acres to the rodeo scene
I'm the best apple bucker you've ever seen
Raisin' the barn with friends, it's a sight to behold
All these memories with family, worth more than gold

Chorus:
Hoedown memories, laughter in the air
Running through the fields without a care
From sunrise to sunset, workin' side by side
Apple family love, it's a joyful ride

Bridge:
Through thick and thin, we stick together
Facing challenges, we'll brave any weather
My little pony friends, we're a colorful bunch
With memories like these, we'll never lose touch

Chorus:
Hoedown memories, laughter in the air
Running through the fields without a care
From sunrise to sunset, workin' side by side
Apple family love, it's a joyful ride
Outro:
So take a trip down memory lane with me
In Ponyville, where we're wild and free
Hoedown memories will always be near
With my friends by my side, I have nothing to fear.
>>
>>41230081
neat work Anon
>>
Page 10 bump.
>>
>>41229694
Close your toilet lid before flushing, and keep it closed when not in use. Bacteria builds up in there, especially due to the still water. When water floods the bowl, the bacteria goes all over the place. It might be fine for some time after sanitizing everything, but after a while the bacteria builds up quicker in your bathroom if you're not toilet-conscious.
>>
File: OIG4.txnUZsOedDbMw75DJAvF.jpg (137 KB, 1024x1024)
>>
>>41232738
>mane colors
I have a hard time giving the prompter the benefit of the doubt that this could be a coincidence.
>>
>>41232938
What do you mean? I didn't prompt the mane colors and I would have preferred them to look more like Trixie's, but that was the way they turned out.
>>
>>41216105
>Innovation
>"I'm owed all of your work and to replace you"
Jew
>>
>>41232938
You can thank Google Images and Twitter.
>>
>https://huggingface.co/parler-tts/parler_tts_mini_v0.1
>only 647M params
>less than 3gb model size
This seems like a nice, possibly phone sized, tts model.
>>
>>41233800
>Twitterites fuck up the AI with their garbage
Now that's just stellar.
>>
Has anyone made nsfw audio of a mare riding (You)? y'know grunting, slapping, moaning sounds, etc. so far I've found a couple audios with pinkie and fluttershy.
>>
>>41233923
There was one with Twilight that was made very early on, when 15 was available. There was also a really weird Equestria Girls audio where you have sex with Rainbow Dash, then nuclear war breaks out and everybody dies.
>>
>>41233923
Try the PoneAI Drive:
drive.google.com/drive/folders/1E21zJQWC5XVQWy2mt42bUiJ_XbqTJXCp
>>
>>41232738
>>
I want to find a short I saw on Youtube of Anon running a "STUD" phone sex line for mares, the voices were really good and I can't find it again. Anyone has it?
>>
>>41235817
https://youtu.be/y5KjXz37TYc
https://ponepaste.org/6642
>>41181350
>>
>>41212638
>text generation fails
>errors are thrown instead
>speech generation still works
>Twi scolding you on fucking up
>>
bump
>>
>>41236287
>>
File: 1664286824018001.png (600 KB, 1280x720)
Has anyone experimented with both Suno and Udio's audio upload feature? pretty gud, might experiment more tomorrow. I think this might be the best way to create "mare-ly" music rn. The most impressive part is how easily it retains the voice of the original singer and the spirit of the song. This shit will be insane in 2 years

>Rules of Rarity (AI extension around 30s):

https://www.udio.com/songs/f9q48aoVAHLcrnm2oyYDnw

>Coinky Dink World (AI extension around 25s):

https://www.udio.com/songs/cXDhQugtqoeSmuHfJNduwR

>Pinkie's Lament (AI extension around 30s):

https://www.udio.com/songs/jDbAtRzcpzprLK4KVav9em

https://www.udio.com/songs/1nBYDNfjzmx91Mv8dP8Rwa

https://www.udio.com/songs/axvZ569P8nvcDD6WEQJtxQ

>I'll Fly (AI extension around 30s):

https://www.udio.com/songs/sfmVfvmBnzAJCwjDonojju

>Pony I Want to Be (AI extension around 13s):

https://www.udio.com/songs/hM6FTmWPTpeog2u9RZ9o7e

https://www.udio.com/songs/nBduKsoMeWTocLaPTUPy4o

https://www.udio.com/songs/o28uj8rqKWHoEd3tJZ7RP8

https://www.udio.com/songs/4vV67be2eoQEH53iSuz8ac

https://www.udio.com/songs/mrnxHf5Mi6AsfPiTKQeP49

>Smile Song (AI extension around 30s):

https://www.udio.com/songs/4VHb7HDA9y1KhqeSzGLGAw

https://www.udio.com/songs/vRE3WqsxAeZEKhURQbqQ3v

https://www.udio.com/songs/b7z29FvNWixtJaD6iAvudp

https://www.udio.com/songs/czoS8J1Hzy11dQzrbzSUAW

https://www.udio.com/songs/kEhnvpX9j9RMkWuPPnFp9Z

https://www.udio.com/songs/cf1GX7YfHSZjh7atRsLFng

https://www.udio.com/songs/j19cTNE7gmuUjY9TPkwE6M

https://www.udio.com/songs/u5ds29S37csb558z1oLY8k

>Giggle at the Ghosties (AI extension around 33s):

https://www.udio.com/songs/evxeR8PzmkuK9WYs7SdZW8

https://www.udio.com/songs/uxquZLy9S2cZpqdupBDiNj

https://www.udio.com/songs/5gJehHnERQdU4D6nkTFyAG

https://www.udio.com/songs/vfqqdXGQ697wqJ5grVVy5B

https://www.udio.com/songs/15PL9FmnfjaLe4eGWpGW4o

>Cutie Mark Crusaders go Crusading BGM (AI extension around 30s):

https://suno.com/song/54211a67-c9ab-4fef-8a48-122b191c8cd6

>Show intro instrumental (AI extension around 27s):

https://www.udio.com/songs/aV3hN5Jky3FvdBSAk5xXZh

The AI can also extend backwards, making "intros" for these songs too, which is pretty neat-o. Could have potential for creating mare-ly songs if you just keep extending off these, perhaps with an extra RVC pass:

https://www.udio.com/songs/iqVw5UFfsvT9cg8wMuUQ4f

https://www.udio.com/songs/quUkWowsZ6nzYmgHaxHk2e

https://www.udio.com/songs/rm7JvPxKM3v2wLRkmuWUSF

https://www.udio.com/songs/aALnyb32bUtogmDiTcmpmY (japanese smile song kek)

https://www.udio.com/songs/m1zwVBEu7KGLPXR2w2aMzW

https://www.udio.com/songs/77Jsm4M2DWnpAvhevwUyQE

https://www.udio.com/songs/szFDLUuxRA6caDMmt9Lsuy
>>
>>41237750
forgot to mention, but these generations are HEAVILY unoptimized. These could sound a lot better with some experimenting. Towards the end I found out that if you shorten the context length of the extensions to purely the length of the uploaded audio instead of leaving it at max, the extension adheres a lot better to the original audio. There's also other shit like Prompt and Lyric strength, too. Need to experiment more. Plus, I used Udio's shitty auto-generated lyrics instead of using GPT-4, so yeah
>>
Mare me up inside
>>
I'm sorry if that's not the right place to ask, but I was on the bootleg rewatch stream last weekend, and someone queued this gem: https://beautiful.bluefast.horse/jgq2up5.mp4, a cover of S3RL - MTC with Spike and Rarity voices. I couldn't find this on YouTube or PonyTube, and if I understood the chat correctly it's not a new piece of content, so I wonder why? Was it meant to be not really public? It's really nice.
>>
>>41238331
That clip was a submission for the first mlp AI redub. As for why it's not up on its own, I suppose that's up to the original creator of it.
https://youtu.be/gEXaFVw9J1o?si=d8ObKCJtMzcKtzVy&t=730
>>
>>41237750
I'm at work so I will listen to those later, but thank you for testing that stuff out.
>>
This question has probably been asked to death, but what happened to 15? I remember that when these threads were young, they used to post the best shit, then came the website, and then nothing? Did they ever mention why they took down the website/stopped contributing?
>>
>>41238840
Either he got slapped with some lawsuits and his lawyer told him to disappear and avoid all AI stuff for the foreseeable future, or he got tired of dealing with all the fags on twitter nagging him to add more voice models or whining about the pony models, and moved his research to a private zone.
>>
>>41238840
Legal issues.
>>
So is there anything to the level of 15.ai that's freely available anywhere? Or is it all just uberduck tier at best these days? I know there's loads of ways to regenerate audio into a character's voice, but that simply doesn't interest me because it doesn't capture what TTS-generation does at all.
>>
>>41239085
Try Haysay. It has one text to speech model and a couple of audio to audio ones.
https://haysay.ai/
>>
>>41239097
Sadly, that again looks like it requires input audio. I'm talking about PURE text to speech like 15.ai and ElevenLabs did.
>>
>>41239105
Oh I was wrong. But man does it sound honestly fucky and limited compared to what used to be free. I'm not surprised but this is closer to shitty uberduck quality more than anything.
>>
>>41239105
>https://haysay.ai/
Choose Controllable TalkNet or StyleTTS for straight Text to Speech synthesis. Neither of those require reference audio.
https://files.catbox.moe/ky4fhg.mp3
Not particularly impressive, but actually better than I remember, and it's what we have at the moment.
>>
>>41239115
Unfortunately 15 was probably the result of far too many stars aligning at just the right time. Chances are we're never gonna get something with that level of both quality and ease of use again.
>>
>>41237750 >>41238481
Now that I've listened to some of those, I feel it's a bit of a mixed bag: some vocal extensions are like 95% the same as the input, while others differ from the input about as much as the VAs' clips differ between early and late seasons. Then there are some that sound like a decent-quality fan VA imitating the mane6, followed by ones that sound like lower-quality fan VAs.
But overall it's a surprisingly positive result, as they were able to get such good quality without being fine-tuned on the show clips.
>>41239187
annoying. we've seen, even in just this thread, that there are new algorithms and tech to remake or improve TTS and bring the emotion control and quality up to current closed TTS tech, but there just aren't HIGH-level academic anons here to understand how this non-open tech works and reverse engineer it for people to use.
>>
>>41239382
Yep. I'm more focused on the potential of this particular technology and how it might improve in the next one or two years - perhaps to an exponential degree. This technology isn't even 6 months old and it's already progressing at an insane rate. After looking into it a bit more, I discovered that there's an even better audio model locked behind the premium tier for Udio in particular, so I might shell out some shekels and experiment even further. Exciting times ahead.
>>
>>41239459
if there were ever a leak of one of those large song-generating models, it would help tremendously with a lot of other projects I keep on the back-burner; for example, their cloning abilities could possibly be used to extend a few seconds of background-character speech into a few minutes of monologue, allowing the creation of a better artificial dataset to make an even better tts/voice conversion model down the line.
>>
Precautionary bump.
>>
Page 10 bump.
>>
UNLEASH THE PONIES!
>>
>https://www.youtube.com/watch?v=7_LW7u-nk6Q
>bipedal robot tough with reinforced machine learning
I desperately need a robot mare maid.
>>
File: OIG1.qLdeafnslF8gm0MBCH9v.jpg (107 KB, 1024x1024)
>>
Bump
>>
>>41233923
asking this too but of glimmer telling me to cum and calling me a good boy
>>
File: Untitled.png (244 KB, 1108x930)
TTSDS -- Text-to-Speech Distribution Score
https://arxiv.org/abs/2407.12707
>Many recently published Text-to-Speech (TTS) systems produce audio close to real speech. However, TTS evaluation needs to be revisited to make sense of the results obtained with the new architectures, approaches and datasets. We propose evaluating the quality of synthetic speech as a combination of multiple factors such as prosody, speaker identity, and intelligibility. Our approach assesses how well synthetic speech mirrors real speech by obtaining correlates of each factor and measuring their distance from both real speech datasets and noise datasets. We benchmark 35 TTS systems developed between 2008 and 2024 and show that our score computed as an unweighted average of factors strongly correlates with the human evaluations from each time period.
might be interesting for some here
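If I'm reading the abstract right, each factor's score reflects whether the synthetic speech sits closer to real-speech datasets than to noise datasets, and the final TTSDS score is just an unweighted mean over factors. A toy version of that aggregation (the distance-to-score mapping here is my guess at the shape, not the paper's exact formula):

```python
def factor_score(dist_to_real, dist_to_noise):
    """Map two distances to a 0-100 score: 100 if right on top of the real
    speech datasets, 50 if equidistant, 0 if right on top of the noise
    datasets. (Guessed mapping; the paper's actual one may differ.)
    """
    return 100.0 * dist_to_noise / (dist_to_real + dist_to_noise)

def ttsds_score(factors):
    """Unweighted average over factors (prosody, speaker identity, etc.)."""
    scores = [factor_score(dr, dn) for dr, dn in factors.values()]
    return sum(scores) / len(scores)

demo = {
    "prosody": (1.0, 3.0),          # closer to real speech
    "speaker": (2.0, 2.0),          # equidistant
    "intelligibility": (0.5, 4.5),  # much closer to real speech
}
```

The appeal of the unweighted mean is that no single factor (e.g. raw intelligibility) can dominate the benchmark.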
>>
>>41237172
>>
Experiments with transforming speech features in RVC
Add gaussian noise with scale
>0.0: https://files.catbox.moe/7dycpu.mp3
>0.1: https://files.catbox.moe/btzkpu.mp3
>0.2: https://files.catbox.moe/6z3x04.mp3
>0.3: https://files.catbox.moe/co3z3f.mp3
>0.4: https://files.catbox.moe/hsk53e.mp3
>0.5: https://files.catbox.moe/ae0qgk.mp3
>1.0: https://files.catbox.moe/142845.mp3
>3.0: https://files.catbox.moe/h4og7u.mp3

(I'm not sure about the implementation of the rest of these functions, I just asked Claude to write them)
Gaussian filter (kernel size 7, sigma varies):
>sigma=0: https://files.catbox.moe/nki99s.mp3
>sigma=1: https://files.catbox.moe/ex83vn.mp3
>sigma=2.5: https://files.catbox.moe/68nds0.mp3
>sigma=5: https://files.catbox.moe/7eloma.mp3
>sigma=10: https://files.catbox.moe/icxfph.mp3
>Kernel size 15, sigma 30.0: https://files.catbox.moe/5w9s6h.mp3

Subtract scaled Laplacian filter:
>0: https://files.catbox.moe/mucqxq.mp3
>0.3: https://files.catbox.moe/s9q56y.mp3
>0.6: https://files.catbox.moe/vjjpcb.mp3
>1.0: https://files.catbox.moe/tuzxs4.mp3

PCA (50 components, showing first 3) (86% explained variance):
Add to first component:
>0: https://files.catbox.moe/0p2bqd.mp3
>2: https://files.catbox.moe/bkwjyz.mp3
>4: https://files.catbox.moe/w16ejj.mp3
>-2: https://files.catbox.moe/4ofoai.mp3
>-4: https://files.catbox.moe/ftpzpo.mp3

Add to second component:
>2: https://files.catbox.moe/36tg4x.mp3
>4: https://files.catbox.moe/j3wxq8.mp3
>-2: https://files.catbox.moe/cwxa64.mp3
>-4: https://files.catbox.moe/gado1x.mp3

Add to third component:
>2: https://files.catbox.moe/jawiov.mp3
>4: https://files.catbox.moe/830iey.mp3
>-2: https://files.catbox.moe/c953iy.mp3
>-4: https://files.catbox.moe/jfk71a.mp3

Some interesting combinations:
[0, 4, -4]: https://files.catbox.moe/obcqbg.mp3
[4, 0, -4]: https://files.catbox.moe/b3ed2u.mp3

PCA (10 components) (41% explained variance): https://files.catbox.moe/9horpd.mp3
Code: https://github.com/effusiveperiscope/raraai/blob/master/2_featureexpl/testing.ipynb
>>
>>41244976
Now do it again, but with a line from the actual character:
Add gaussian noise with scale
>0.0: https://files.catbox.moe/h4va4y.mp3
>0.1: https://files.catbox.moe/q91pta.mp3
>0.2: https://files.catbox.moe/ste0ox.mp3
>0.3: https://files.catbox.moe/nuks4d.mp3
>0.4: https://files.catbox.moe/o3xucd.mp3
>0.5: https://files.catbox.moe/mlpskx.mp3
>1.0: https://files.catbox.moe/fvi6rd.mp3
>3.0: https://files.catbox.moe/cofucz.mp3

(I'm not sure about the implementation of the rest of these functions, I just asked Claude to write them)
Gaussian filter (kernel size 7, sigma varies):
>sigma=0: https://files.catbox.moe/uuudlh.mp3
>sigma=1: https://files.catbox.moe/83nv38.mp3
>sigma=2.5: https://files.catbox.moe/vx00ia.mp3
>sigma=5: https://files.catbox.moe/vx00ia.mp3
>sigma=10: https://files.catbox.moe/edw1hq.mp3
>Kernel size 15, sigma 30.0: https://files.catbox.moe/kd7vxw.mp3

Subtract scaled Laplacian filter:
>0: https://files.catbox.moe/jhv6ow.mp3
>0.3: https://files.catbox.moe/y0uh36.mp3
>0.6: https://files.catbox.moe/43wdcy.mp3
>1.0: https://files.catbox.moe/5mhy3j.mp3

PCA (50 components, showing first 3) (88% explained variance):
Add to first component:
>0: https://files.catbox.moe/2e7pcy.mp3
>2: https://files.catbox.moe/pr5qcf.mp3
>4: https://files.catbox.moe/w3r928.mp3
>-2: https://files.catbox.moe/rnsn5m.mp3
>-4: https://files.catbox.moe/0gdho5.mp3

Add to second component (pretty interesting):
>2: https://files.catbox.moe/jlsq2f.mp3
>4: https://files.catbox.moe/wuibwr.mp3
>-2: https://files.catbox.moe/gceiip.mp3
>-4: https://files.catbox.moe/uoj6ga.mp3

Add to third component:
>2: https://files.catbox.moe/gmynkz.mp3
>4: https://files.catbox.moe/iaam1c.mp3
>-2: https://files.catbox.moe/cobbez.mp3
>-4: https://files.catbox.moe/5r0k94.mp3

[0, 4, -4]: https://files.catbox.moe/19zl30.mp3
[4, 0, -4]: https://files.catbox.moe/0jsw6o.mp3

PCA (10 components) (46% explained variance): https://files.catbox.moe/tueqn7.mp3
>>
Uppy.
>>
what happened to 15
>>
>>41245510
see >>41238840
>>
>>41244976
>>41244980
berry interesting. is this code editing the audio before it gets fed into RVC, or are there code elements that change how the model operates (like loras with image ai and such)?
>>
>>41246044
It is modifying the hubert speech features that are input into RVC.

The rationale here is: when non-mare is converted into mare voice it doesn't sound the same. Likely because the model only sees speech features from the target mare in training and not other speakers, and features from non-mare differ in a significant way that causes the resulting artifacts. So maybe there is a way to make non-mare feature inputs closer to the target mare using an added network that would perform better than the current retrieval based method RVC is using (I'm not sure exactly how much testing went into choosing the "retrieval" methods or "cluster" methods for this purpose; if there is any documentation for this it is probably on some Chinese forum or website where they were developed)

The approximate idea is that this is a denoising problem. You could add noise/convert multiple times through different RVC models/perturb PCA components/apply conv kernels to generate a synthetic dataset of "noisy" features from real lines, then learn a mapping from noisy features to denoised features that sound like the target character. Part of what I am trying to figure out here is whether there is a sensible way to perturb the features that will preserve enough low-level information that a model could reasonably reconstruct the input (as opposed to distorting the signal into complete meaningless garbage), and whether those perturbations plausibly sound like the sort of artifacts that happen when a non-mare speaker tries to convert to mare. so-vits-svc 5.0 has this baked into the training process, where they add gaussian noise to whisper features for what I can only imagine is a similar reason.

I'm sure there is a more statistically sophisticated way of doing this, but that's above my education grade. Each speech feature is 768-dimensional, so there's no real retard-friendly way to "visualize" them like you can with image models; "hearing" what RVC thinks is the next best thing.

What's kind of interesting is how the transformations that correspond to image operations (gaussian blur / laplacian-filter sharpening) actually seem to match what happens to the resulting audio--the gaussian filter audibly "blurs" the syllables and makes everything sound "fuzzy", and subtracting the laplacian brings out random details and makes it more defined. The "fuzziness" sounds similar to what I get whenever I try to convert my voice to Twilight. Some of the PCA perturbations also sound similar, seeming to modify formants of the output to the point where it still sounds like the correct words, but the speaker doesn't even sound like RD anymore (so it might not even need a NN).
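The image-style filters translate to feature sequences as 1-D convolutions along the time axis. Here's a minimal sketch of what "gaussian blur" and "subtract laplacian" mean for a (T, D) feature matrix (kernels and padding mode are my own choices, not necessarily what was used above):

```python
import numpy as np

def time_filter(feats, kernel):
    """Convolve a (T, D) feature sequence with a 1-D kernel along the
    time axis, independently per dimension (reflect-padded edges)."""
    pad = len(kernel) // 2
    padded = np.pad(feats, ((pad, pad), (0, 0)), mode="reflect")
    out = np.zeros_like(feats, dtype=float)
    for i, k in enumerate(kernel):
        out += k * padded[i:i + feats.shape[0]]
    return out

gaussian = np.array([1, 4, 6, 4, 1], dtype=float) / 16.0  # blur kernel
laplacian = np.array([1.0, -2.0, 1.0])                    # 2nd difference

def blur(feats):
    return time_filter(feats, gaussian)

def sharpen(feats, amount=1.0):
    # "Subtract laplacian" = unsharp masking: boost local detail.
    return feats - amount * time_filter(feats, laplacian)
```

Blurring smooths frame-to-frame variation (hence the "fuzzy" syllables), while the unsharp mask amplifies it, which lines up with the audible results.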
>>
>>41246189
The reason I'm looking at this is because while making this bumper:
https://files.catbox.moe/vxucyp.mp3
I had the idea to feed StyleTTS2-generated audio through RVC, and it sounded significantly better, quality-wise, than any conversions using my own voice directly, even if the prosody suffered (which actually was OK for the purposes of a radio bumper, because it doesn't need to sound like a plausible delivery in that context). I also fed an RVC-generated reference into the style encoder, since I noticed that the "style encoding" pretty much copies the prosody of the input audio--the "style vector" actually seems to smuggle in a lot more information than you'd expect from the term "style", so it makes StyleTTS2 act like a sort of "fuzzy voice conversion". So the whole pipeline was RVC -> StyleTTS2 -> RVC, which is a bit more convoluted than I'd like for any longer projects.
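The chain itself is simple enough to write down; `rvc_convert` and `styletts2_tts` below are hypothetical wrappers for whatever RVC / StyleTTS2 invocation you actually use--only the order of operations is the point:

```python
def ponify_pipeline(text, reference_take, rvc_convert, styletts2_tts):
    """Sketch of the RVC -> StyleTTS2 -> RVC chain. Both callables
    are stand-ins; real invocations depend on your setup."""
    style_ref = rvc_convert(reference_take)          # 1) ponify a spoken reference take
    tts_out = styletts2_tts(text, style=style_ref)   # 2) style vector copies its prosody
    return rvc_convert(tts_out)                      # 3) final pass cleans up the timbre
```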
>>
https://github.com/fishaudio/fish-speech
New update to 1.2. Looks like they keep their website version ahead by one.
https://www.bilibili.com/video/BV1wz421B71D/
>>
>>41246189
>>41246213
so if I understand correctly, you are trying to find a way for RVC to take a somewhat noisy recording (like what most people have), with a voice outside the range of the model, and squeeze more quality out of the outputs? That's neat.
>RVC -> StyleTTS2 -> RVC
Ah, the classic "use AI to make AI better" technique. It's a very dinky way to work, but whenever models refuse to work with my vocals (which is 9 out of 10 times) it's a helpful trick for finishing a project.
>>
Putin Eats Poutine: https://www.udio.com/songs/tUaTeSxWiWB78KdmfafPNz
Celestia Eats Cake: https://www.udio.com/songs/sdPiXv4DWSG4EvmCk2YYRG
https://vocaroo.com/1iwpscmGqkRY
https://pomf2.lain.la/f/b90j92f6.mp3

Udio v1
Celestia's Sweet Secret (Celestia Eats Cake) ext v1.2.2

[Verse 1]
In a castle with grandeur, oh (whoa whoa)
A princess with a secret way (hey hey)
Celestia's got a sweet delight
In a slice of cake, day or night

[Pre-Chorus]
All she can think of as she reigns
Is that sugar rush she craves

[Chorus]
Celestia. Eats cake.
Celestia. Eats cake.
Celestia. Eats cake.
Celestia. Eats cake.
Every day!

[Verse 2]
In the grand dance halls we see (hey hey)
Robots groove in symmetry, oh (whoa)
Celestia’s got that royal taste
For that cake she won’t waste

[Pre-Chorus]
In the beats of the night, she dreams
Of the sweetest, creamiest themes

[Bridge]
"Whoa whoa, wait a minute!
Are you saying the queen of Equestria
Has such a sweet, delightful sin?"
Yes!

[Chorus]
Celestia. Eats cake.
Celestia. Eats cake.
Celestia. Eats cake.
Celestia. Eats cake.
Every day!
>>
How would I go about changing the lyrics to a song like what this person did here? https://www.youtube.com/watch?v=VornCFk3Z0U
They changed the word 'starboy' to 'starpony' and I was just wondering how I could do the same thing. I already know how to use RVC to change the voice, I just need to know how to change specific lyrics
>>
>>41242583
>>
>>41247408
If you can match the quality and pronunciation of the original vocals converted with RVC/so-vits, then just cut the single word in Audacity and swap it out.
However, most of the time you may end up needing to re-sing the entire sentence (or even the whole song) to make that work.
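The cut-and-swap itself is just a crossfaded splice. Here's a numpy version of what you'd otherwise do by hand in Audacity (it assumes the replacement word is already time-aligned and level-matched, which is the hard part):

```python
import numpy as np

def splice_word(track, replacement, start, end, n_fade=512):
    """Swap track[start:end] for `replacement` (same length, mono
    float samples), crossfading over n_fade samples at each seam so
    the edit doesn't click."""
    assert len(replacement) == end - start
    out = np.asarray(track, dtype=float).copy()
    seg = np.asarray(replacement, dtype=float).copy()
    ramp = np.linspace(0.0, 1.0, n_fade)
    # Fade from the original audio into the replacement...
    seg[:n_fade] = track[start:start + n_fade] * (1 - ramp) + seg[:n_fade] * ramp
    # ...and back out of it at the far seam.
    seg[-n_fade:] = seg[-n_fade:] * (1 - ramp) + track[end - n_fade:end] * ramp
    out[start:end] = seg
    return out
```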
>>
>>41247408
>>41248476
Also, you probably need to separate the vocals from the music beforehand for that too.
Some AI models are specialized for exactly that.
>>
>10
>>
>whisper-medium transcribes this as: "Equestria, Orlando, France!"
https://files.catbox.moe/wx4hzu.mp3
>>
bump
>>
>>41237750
Been having issues with Suno's audio upload feature: it "successfully" uploads but then simultaneously throws a network error. I would be doing the exact same thing otherwise; hopefully it clears up.

Vocal coherence in your examples is better than I was expecting, but it still veers off enough to want improvement. If the samples separate well enough using UVR (or similar), maybe they can be fed back through RVC to re-ponify?
>>
>>
File: 1687370679074222.png (1.1 MB, 3508x2480)
1.1 MB
1.1 MB PNG
>>41249969
Sounds accurate, not sure what's the problem Anon ^:)
>>
File: OIG2.ODAYiy5u9L5O3xfEDd_H.jpg (112 KB, 1024x1024)
112 KB
112 KB JPG
>>
>>41252350
>>
>>41251308
That doesn't look healthy.
>>
>>41253299
Yeah, corpo server-based AI is still pretty cursed as of now.
>>
>>41253658
Probably because it's also censored to the high heavens.
>>
https://huggingface.co/spaces/Xenova/whisper-speaker-diarization
Source
https://huggingface.co/spaces/Xenova/whisper-speaker-diarization/tree/main/whisper-speaker-diarization
>>
>>41254395
>Loading models (wasm)...
would be nice to test it out. I guess this would be useful to speedrun dataset creation from long audiobooks?
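For the audiobook use case, the diarizer's output would get turned into per-speaker clips roughly like this (the `(start, end, speaker)` tuple shape is an assumption about what a whisper+diarization tool emits; exact formats vary by tool):

```python
import numpy as np

def slice_speakers(audio, sr, segments, min_len=1.0):
    """Cut a long mono recording into per-speaker clips given
    diarization segments as (start_sec, end_sec, speaker) tuples."""
    clips = {}
    for start, end, spk in segments:
        if end - start < min_len:
            continue  # drop fragments too short to be useful as data
        clip = audio[int(start * sr):int(end * sr)]
        clips.setdefault(spk, []).append(clip)
    return clips
```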
>>
Up.
>>
File: horrifying.png (516 KB, 605x600)
516 KB
516 KB PNG
>>41251308
>>
>>41247408
hey Anon, you still working on your project? If you want, you could post your lyrics here and I (and hopefully some non-autistic people too) would join in singing the verses for you.
>>
>>41252350
>>
>>41252350
>>
I'm a bit of a normie, but I like music. So, when I heard about a synthesis software called Ace Studio that was offering a subscription with the ability to make custom voices, I practically leaped at the chance to make voicebanks of the Mane Six.

So here's the thing. Ace Studio is cloud-based vocal synthesis software, the kind where you put in MIDI notes and add lyrics to them to make vocal melodies and such. With the subscription, I'm able to make five voices... but ahead of time, I emailed the company responsible for Ace Studio for a voucher for a single extra voice, and they agreed. So I have six slots, one for each of the Mane Six.

The process for making a cloned singing voice is actually really simple. All you need to do is upload clean voice data of the voice you want, and the training algorithm will label the phonemes within the voice samples, then train on them for use in Ace Studio. And the best part? I'm able to share the voicebank with 10 other people, as long as they have valid subscriptions to Ace Studio.

As of right now, only Twilight and Rarity have been trained, but Fluttershy, Rainbow Dash, Pinkie Pie, and Applejack will probably be finished in less than 12 hours, if I'm right. I wanna show the results of it too, so here you go!

Twilight - https://files.catbox.moe/axyrm0.wav

Rarity - https://files.catbox.moe/4b8g4k.wav

What do you guys think?


