/mlp/ - Pony






File: New OP.png (1.53 MB, 2119x1500)
Welcome to the Pony Voice Preservation Project!
youtu.be/730zGRwbQuE

The Pony Preservation Project is a collaborative effort by /mlp/ to build and curate pony datasets for as many applications in AI as possible.

Technology has progressed such that a trained neural network can generate convincing voice clips, drawings and text for any person or character using existing audio recordings, artwork and fanfics as a reference. As you can surely imagine, AI pony voices, drawings and text have endless applications for pony content creation.

AI is incredibly versatile; basically anything that can be boiled down to a simple dataset can be used for training to create more of it. AI-generated images, fanfics, wAIfu chatbots and even animation are possible, and are being worked on here.

Any anon is free to join, and there are many active tasks that would suit any level of technical expertise. If you’re interested in helping out, take a look at the quick start guide linked below and ask in the thread for any further detail you need.

EQG and G5 are not welcome.

>Quick start guide:
docs.google.com/document/d/1PDkSrKKiHzzpUTKzBldZeKngvjeBUjyTtGCOv2GWwa0/edit
Introduction to the PPP, links to text-to-speech tools, and how (You) can help with active tasks.

>The main Doc:
docs.google.com/document/d/1y1pfS0LCrwbbvxdn3ZksH25BKaf0LaO13uYppxIQnac/edit
An in-depth repository of tutorials, resources and archives.

>Active tasks:
Research into animation AI
Research into pony image generation

>Latest developments:
GDrive clone of Master File now available >>37159549
SortAnon releases script to run TalkNet on Windows >>37299594
TalkNet training script >>37374942
GPT-J downloadable model >>37646318
FiMmicroSoL model >>38027533
Delta GPT-J notebook + tutorial >>38018428
New FiMfic GPT model >>38308297 >>38347556 >>38301248
FimFic dataset release >>38391839
Offline GPT-PNY >>38821349
FiMfic dataset >>38934474
SD weights >>38959367
SD low vram >>38959447
Huggingface SD: >>38979677
Colab SD >>38981735
NSFW Pony Model >>39114433
New DeltaVox >>39678806
so-vits-svc 4.0 >>39683876
so-vits-svc tutorial >>39692758
Hay Say >>39920556
Haysay on the web! >>40391443
SFX separator >>40786997 >>40790270
Synthbot updates GDrive >>41019588
Private "MareLoid" project >>40925332 >>40928583 >>40932952
VoiceCraft >>40938470 >>40953388
Fimfarch dataset >>41027971
5 years of PPP >>41029227
Audio re-up >>41100938
RVC Experiments >>41244976 >>41244980
Ace Studio Demo >>41256049 >>41256783

>The PoneAI drive, an archive for AI pony voice content:
drive.google.com/drive/folders/1E21zJQWC5XVQWy2mt42bUiJ_XbqTJXCp

>Clipper’s Master Files, the central location for MLP voice data:
mega.nz/folder/jkwimSTa#_xk0VnR30C8Ljsy4RCGSig
mega.nz/folder/gVYUEZrI#6dQHH3P2cFYWm3UkQveHxQ
drive.google.com/drive/folders/1MuM9Nb_LwnVxInIPFNvzD_hv3zOZhpwx

>Cool, where is the discord/forum/whatever unifying place for this project?
You're looking at it.

Last Thread:
>>41354496
>>
FAQs:
If your question isn’t listed here, take a look in the quick start guide and main doc to see if it’s already answered there. Use the tabs on the left for easy navigation.
Quick: docs.google.com/document/d/1PDkSrKKiHzzpUTKzBldZeKngvjeBUjyTtGCOv2GWwa0/edit
Main: docs.google.com/document/d/1y1pfS0LCrwbbvxdn3ZksH25BKaf0LaO13uYppxIQnac/edit

>Where can I find the AI text-to-speech tools and how do I use them?
A list of TTS tools: docs.google.com/document/d/1y1pfS0LCrwbbvxdn3ZksH25BKaf0LaO13uYppxIQnac/edit#heading=h.yuhl8zjiwmwq
How to get the best out of them: docs.google.com/document/d/1y1pfS0LCrwbbvxdn3ZksH25BKaf0LaO13uYppxIQnac/edit#heading=h.mnnpknmj1hcy

>Where can I find content made with the voice AI?
In the PoneAI drive: drive.google.com/drive/folders/1E21zJQWC5XVQWy2mt42bUiJ_XbqTJXCp
And the PPP Mega Compilation: docs.google.com/spreadsheets/d/1T2TE3OBs681Vphfas7Jgi5rvugdH6wnXVtUVYiZyJF8/edit

>I want to know more about the PPP, but I can’t be arsed to read the doc.
See the live PPP panel shows presented on /mlp/con for a more condensed overview.
2020 pony.tube/w/5fUkuT3245pL8ZoWXUnXJ4
2021 pony.tube/w/a5yfTV4Ynq7tRveZH7AA8f
2022 pony.tube/w/mV3xgbdtrXqjoPAwEXZCw5
2023 pony.tube/w/fVZShksjBbu6uT51DtvWWz

>How can I help with the PPP?
Build datasets, train AIs, and use the AI to make more pony content. Take a look at the quick start guide for current active tasks, or start your own in the thread if you have an idea. There’s always more data to collect and more AIs to train.

>Did you know that such and such voiced this other thing that could be used for voice data?
It is best to keep to official audio only unless there is very little of it available. If you know of a good source of audio for characters with few (or just fewer) lines, please post it in the thread. 5.1 is generally required unless you have a source already clean of background noise. Preferably post a sample or link. The easier you make it, the more likely it will be done.

>What about fan-imitations of official voices?
No.

>Will you guys be doing a [insert language here] version of the AI?
Probably not, but you're welcome to. You can however get most of the way there by using phonetic transcriptions of other languages as input for the AI.

>What about [insert OC here]'s voice?
It is often quite difficult to find good quality audio data for OCs. If you happen to know any, post them in the thread and we’ll take a look.

>I have an idea!
Great. Post it in the thread and we'll discuss it.

>Do you have a Code of Conduct?
Of course: 15.ai/code

>Is this project open source? Who is in charge of this?
pony.tube/w/mqJyvdgrpbWgZduz2cs1Cm

PPP Redubs:
pony.tube/w/p/aR2dpAFn5KhnqPYiRxFQ97

Stream Premieres:
pony.tube/w/6cKnjJEZSCi3gsvrbATXnC
pony.tube/w/oNeBFMPiQKh93ePqTz1ns8
>>
File: anchor.png (33 KB, 1200x1453)
Anchor
>>
Is Clipper still doing episodes? I loved that "Free Hugs" one.
>>
>>41364866
God I hope not. It sucks.
>>
File: 1701781466764247.png (60 KB, 500x459)
>>41364875
>Wanting everything to be about sex with Anon
>>
>>41364876
Good idea
>>
File: 670652.png (670 KB, 3991x5761)
>>41364787
Added audio from recently released animatics (s2e3, 25, 26) to the voice dataset, replacing corresponding entries in the FiM folder.
mega.nz/folder/jkwimSTa#_xk0VnR30C8Ljsy4RCGSig
Sliced Dialogue -> Special source
Also put label files in the label files folder.

>>41364866
Working on a thing to present at Mare Fair. Not an episode as such, though it will make use of the AI voice.

>>41364875
There are a lot of things I'd do differently looking back, mainly pacing. Always better next time, that's the goal.
>>
>>41364894
>Sex Hotline 2.0
>>
>>41364787
reposting song cover in case people missed it from the last thread >>41361841
>https://files.catbox.moe/qxa6vp.mp3
>>
Found a good VITS-based voice conversion/generation tool with decent TTS capabilities called Applio.
https://www.youtube.com/watch?v=gjggpadBgOo
https://github.com/IAHispano/Applio
https://applio.org/

It seems to have a lot of functionality similar to Hay Say, but with more in-depth TTS. It effectively has a TTS layer of various voices (speakers from numerous countries, languages and accents) onto which it interposes RVC voices, which works pretty well in my early testing, as in the compiled clip below.

https://files.catbox.moe/lju6ub.mp4

Fully compatible with existing pony models, as evidenced by it working with Vul's Fluttershy S1 model. It was a bit confusing where the models had to go, though (apparently in "/Applio-3.2.4/logs/", in a named folder containing the .pth and .index files). It's a little finicky, needing experimentation with suitable TTS voices and settings to optimize for each mare, adjusting until it sounds right. The noisiness is mostly from the TTS end of things and less so the RVC side. The additional RVC training and inference functions seem useful too, though I haven't tested those parts yet.
>>
>>41365138
Those clips sound pretty good, some low level noise but all within what I'd call an acceptable limit.
Might this be a new start for pony TTS? Would be awesome to have an alternative to those that rely fully on reference audio.
>>
>>41364894
- [In progress] Download the Master File again so I can get a clean updated copy.
- [ ] Reupload a clone of the new Master File to my gdrive.
- [ ] Reupload a clone of both Master Files to HuggingFace.

Separately, I've spent a lot of time in the last few months working with LLMs. I'm putting together a library for creating more complex chatbots.
https://github.com/synthbot-anon/horsona
- [In progress] Collect a list of functionality that would help in making better chatbots. Right now, that means going through existing chatbots and figuring out (1) what features they support, (2) what it takes to specify a new chatbot, and (3) what people complain about regarding chatbot interactions and personalities.
- [ ] Split the target features into individual functions that can be implemented and pieced together, and create a github issue for each one so it's easy to keep track of everything.
- [ ] Start implementing.

If anyone has ideas for functionality and for things other chatbots do well/poorly, let me know. Right now, I don't care how difficult anything here is to implement.
>>
>>41364782
Is there a voice actor AI model that can simulate rage and emotions like AVGN and doesn't take a rocket science degree to utilize? I plan on using it for really long ranting reviews of 10k+ words.
>>
>>41365910
No, not right now.
>>
>>41365609
>Chatbots
>Horsona
Would there be any likelihood of these being capable of writing entire fics independently like GPT-PNY does/did? Curious too about their flexibility, as it could be fun to have AI mares give us a script to then animate or otherwise try to achieve. Mare-assisted brainstorming.
>>
>>41365950
I think so. I'm toying around with having an LLM read a fanfic paragraph-by-paragraph to extract information for automated lorebook creation. Writing a fic would be basically that in reverse + lorebook generation.
Once I get a better handle on how to implement this and everything /chag/ suggested, I'll try to break down the tasks into small pieces that can be implemented by more people than just myself.
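For the curious, the paragraph-by-paragraph extraction could look roughly like the sketch below. The query_json helper and the prompt wording are placeholder stand-ins, not horsona's actual API:

import json

def query_json(prompt: str) -> dict:
    # Placeholder: send the prompt to whatever LLM you're using and parse its JSON reply.
    raise NotImplementedError

def build_lorebook(paragraphs: list[str]) -> dict:
    lorebook: dict = {}
    for paragraph in paragraphs:
        # Show the LLM the lore gathered so far plus one new paragraph,
        # and ask it for new or corrected facts.
        new_facts = query_json(
            "Known lore so far: " + json.dumps(lorebook)
            + "\nExtract new or corrected setting facts from the paragraph below "
            + 'as a JSON object of {"entity": "fact"}.\nParagraph: ' + paragraph
        )
        lorebook.update(new_facts)  # later paragraphs refine earlier entries
    return lorebook

Writing would then be roughly the reverse: generate the next paragraph from the lorebook instead of updating the lorebook from a paragraph.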
>>
>>41365138
The example is a little bit too high pitched, but it does show the idea. I will test it out later myself.
>>
>>41366029
Are you planning to make it a heavily modded TavernAI or a custom UI program from scratch?
>>
>>41366105
There is a pitch slider in the settings, so it's pretty much a non-issue; it can be adjusted as desired. This will be a setting to play around with often, as differing TTS voices each vary in natural pitch range. Some deliveries might also need a small increase or reduction too.

Lower pitches did feel more Fluttershy, but didn't seem to have as much pitch variance. She kinda sounded bored or tired to me.
>>
>>41366302
I don't know. Probably a mix of both, leaning toward integrating with other UIs as much as possible. Right now, I mostly want to see how limiting the technical challenges really are when trying to make perfect chatbots. I intend for it to be a library, not a full chat program, but a UI might be necessary occasionally to make use of & test the functionality.
>>
>>41366401
I still have a fondness for how the barebones GPT-PNY worked way back when, with the colab and separate window thing. I feel it functioned a lot better and more freely than in KoboldAI, so any simple interfacing you come up with that'd allow for more raw/unfiltered/free-form outputs is good by me, even if other more flexible and potentially restrictive interfaces are adopted for it later on too.
>>
>>41366416
>https://www.youtube.com/watch?v=jHS1RJREG2Q
>https://arxiv.org/abs/2408.14837
>Diffusion Models Are Real-Time Game Engines
So some nerds combined an LLM text model with an art diffusion model and trained it on images + keyboard inputs to create synthesized Doom gameplay.
Not mare related, but the idea of practically combining different AI tools is interesting to me.
>>
https://www.udio.com/songs/67X7mqHih4C8m4raEX8fzW
https://pomf2.lain.la/f/tky4cms5.mp4

Midnight Rejections
acoustic guitar music. princess celestia, anon, male vocalist, sad

Lyrics

[Verse 1]
Celestia's trying hard
She’s got her royal charm turned up to ten
But Anon's still not interested again, yeah
She pulled out all her tricks
Even baked him a cake, extra thick
But buddy's not biting, not even one little bit

[Chorus]
In the castle, at midnight, room 302
With a bouquet of roses and some candles too
Anon’s locked the door, put a sign in his view
That says "Please go away"

[Verse 2]
She's got a plan, who knew?
But Luna and Cadance can't believe it’s true
She wore a fancy dress and said “Hey there you!”
She read from romance books
Tried adding sexy looks
But Anon just laughed and moved to his favorite nooks

[Chorus]
In the castle, at midnight, room 302
With a bouquet of roses and some candles too
Anon’s locked the door, put a sign in his view
That says "Please go away"

[Bridge]
So Celestia sighed, wiped a tear from her eye
The mares all gathered round, gave it one more try
They played their guitars, singing under moon light
But Anon just yawned and said, "Goodnight"

[Chorus]
In the castle, at midnight, room 302
With a bouquet of roses and some candles too
Anon’s locked the door, put a sign in his view
That says "Please go away"


----
Is "put a sign in his view" too bad?
>>
>>41367097
you have quite a nice collection there, the 'Dreams of Luna' is pretty nice.
>>
Do any of you guys reckon you could make a good voiced version of the second comic here?

https://www.tumblr.com/radioactive-dragonlover/759831654724419584

I tried using Haysay and putting in a segment of the audio from the episode of Game Changers it's referencing as an audio input, but it came out sounding very robotic and off - I think because Twilight as a character has a different range of pitches when speaking emotionally than BLeeM does. I don't really know how to adjust that, though.
>>
>>41366416
Do you mean how it wasn't tuned to act like an assistant, and that it just continued from whatever text you gave it? That should be easy enough.
>>llm = AsyncCerebrasEngine(model="llama3.1-70b")
>>print(await llm.query_continuation("Once upon a time there was a little pony named Twilight Sparkle."))
>She lived in a magical land known as Equestria, where the sun was always shining and the air was sweet with the scent of blooming wildflowers. Twilight Sparkle was a student of Princess Celestia, the ruler of Equestria, and was learning the art of magic at the princess's palace in Canterlot. One day, while Twilight was studying in the library, she received a letter from the princess, instructing her to move to the town of Ponyville and live alongside the other ponies, to learn about the magic of friendship.

I can make sure there's a way to support jailbreaks too.
>>
>>41366336
Welp, that new program is not going to be very useful to me as it crashes at the beginning with inability to load some dll models. So the struggle to find nice sounding tts is still going.
>>
>>41367854
Afraid I can't be much help with that, assuming you're using the windows version; Linux version works fine. If it's an installation issue, in the releases of the GitHub there should be another install option there, maybe that'll work?
>dll models
I'm pretty sure it's intended to use .pth files as the models. Also only RVC, I had a slip up earlier where I accidentally tried to load a SoVits model if mine and it naturally errored.
>>
>>41365609
Current AI might be too slow for this, but I was thinking about some kind of LLM/RAG-powered "RPG engine" where you don't just provide character definitions but also world definitions, item definitions, and maybe some kind of prebuilt framework for quests, skills, and other user-defined mechanics, doing the math of tracking XP, health, armor, and damage multipliers in code rather than having the LLM try to pick that role up. Rather than the hackiness of trying to shove a world scenario or an explicit story into each character card, these could be split into more logical pieces and composed into RPGs, as in the sketch below.
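A minimal sketch of that split, with all names made up for illustration. The numbers live in plain code and the LLM only ever narrates the resolved outcome:

from dataclasses import dataclass, field

@dataclass
class Character:
    name: str
    hp: int = 20
    xp: int = 0
    armor: int = 0

    def take_damage(self, amount: int) -> int:
        dealt = max(0, amount - self.armor)
        self.hp -= dealt
        return dealt

@dataclass
class World:
    characters: dict = field(default_factory=dict)

    def resolve_attack(self, attacker: str, defender: str, base_damage: int) -> str:
        dealt = self.characters[defender].take_damage(base_damage)
        self.characters[attacker].xp += dealt
        # Only this summary goes to the LLM for narration; it never does the math.
        return f"{attacker} hits {defender} for {dealt} damage."

world = World({"Anon": Character("Anon"), "Timberwolf": Character("Timberwolf", armor=2)})
print(world.resolve_attack("Anon", "Timberwolf", base_damage=5))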
>>
>>41368170
Someone mentioned this in /chag/. The hard part is making sure it's possible to extract & track the relevant information from a rulebook. If you can send me an example rulebook (maybe one of the PonyFinder ones), that would help.
>>
>>41367684
Can you catbox the Game Changers audio, or link to the original episode?
>>
>>41368005
They do have a precompiled download, but it's giving me the same error. I'm guessing it's just my system being extra derped.
>>
>>41368620
https://youtu.be/88et7YlmzTs?si=_okFx5HtSE9e9cBV
Here's the audio snippet I used:
https://files.catbox.moe/m5eph6.mp3
>>
>9
>>
>>41369482
So it is.
>>
>>41369021
https://files.catbox.moe/rf3zrc.mp3

Architecture = rvc | Character = Twilight Sparkle | Index Ratio = 0.95 | Pitch Shift = 8 | Voice Envelope Mix Ratio = 1.0 | Voiceless Consonants Protection Ratio = 0.33 | f0 Extraction Method = rmvpe
>>
>>41369482
>cutie mark
Is that what it looks like to have 4chan as a special talent?
>>
>>41369482
oy
>>
>>41371447
Either that or her talent is related to some kind of Star Trek: Green Edition.
>>
>>41372139
Kek, that uniform design is gold.
>>
>>41365609
Updating the Master File:
- [Hopefully done] Download the Master File again so I can get a clean updated copy. Mega was having issues, as usual. I'll need to check to make sure I have all the files, but I think this is done.
- [In progress] Reupload a clone of both Master Files to HuggingFace.
- [In progress] Reupload a clone of the new Master File to my gdrive.

Horsona chatbot library:
- [Done] Collect a list of functionality that would help in making better chatbots. The current list is up on the github readme https://github.com/synthbot-anon/horsona. I have enough to get started, but please keep suggesting functionality if you think of anything. There's still functionality I want that no one's mentioned, so I'm sure the list is incomplete.
- [In progress] Split the target features into individual functions that can be implemented and pieced together, and create a github issue for each one so it's easy to keep track of everything. I'll need to start implementing some of these things so I can have a better understanding of how to do this.
- ... [Done] Create a sample memory module, which is required for several of the candidate features. I went with "RAG where the underlying dataset can be automatically updated based on new information." The implementation is done, though the LLM prompts in https://github.com/synthbot-anon/horsona/blob/main/src/horsona/memory/rag.py#L139 could use some work. There's an example use in https://github.com/synthbot-anon/horsona/blob/main/tests/test_rag.py though the test only passes about 30% of the time. I'm pretty sure this can be made close to 100% with better prompts.
- ... [ ] Add documentation, a "start developing" guide, and tasks for the features where it's feasible to make progress using the memory module.
- ... [ ] Find some old & current jailbreaks to add jailbreak support, and turn them into either modules or LLM wrappers. If anyone has links for this, please send them.
- ... [ ] Figure out how to organize text corpora into compatible universes.
- ... [ ] Go through the candidate feature list and make sure jailbreaks & compatible universes are the only features that are hard to support with the existing framework.

Information I need:
- Sample rulebooks, preferably one of the PonyFinder ones, so I can figure out what it'll take to extract information from these.
- Old & current jailbreaks so I can make sure my jailbreak implementation is comprehensive.
>>
>>41373093
>keep suggesting functionality if you think of anything
I can't think of any at this moment; however, I would love it if you were able to keep the kind of add-on options that the auto1111 webui for Stable Diffusion has, where one can just install whatever additional options as needed, including new ones anons make in the future.
>>
Page 10 bump.
>>
Does Clipper know where he found the MLP background music? Shit's kino as fuck.
>>
>>41373305
I'm only building the library right now (not a full application), but I can make sure it can support custom add-ons that can be dynamically toggled.
>>
>>41373093
It looks like jailbreaks are sometimes LLM-specific and require modifying near-arbitrary arguments to the call. So they'll likely be implemented as custom LLMEngines. In that case, I don't think I need to do anything for them right now since my LLMEngine implementation already supports all of the customizations required. I'll just create issues for popular jailbreaks that I or others can implement.
Organizing text into compatible universes looks like it'll require graphs of data sources, where one data source can inherit from another with edits. I'll have to think about how to implement this. Most of the features don't depend on this, so I'll shift focus to documenting & creating issues for now.
>>
>>41373902
I know about these archived rips of background music:
https://www.mediafire.com/?rdhhrpyc0d6d3
https://www.mediafire.com/?rh219xdgj66bu

The first directory has music from seasons 1-2, and the second account belongs to RainShadow, who also has a YouTube channel:
https://www.youtube.com/@RainShadow
>>
>>41373093
I think horsona / chatbot library is in a good-enough state for anyone that wants to help out with development.
Repo: https://github.com/synthbot-anon/horsona
Open tasks: https://github.com/synthbot-anon/horsona/issues
- The current open tasks are for creating new LLMEngines (easy), making prompts more reliable (medium), and creating new datatypes & modules for character cards and image generation (medium/hard, probably requires some familiarity with pytorch).
- If you want to develop and run into any issues with the setup, let me know.

The integrations with existing chatbot UIs will come a bit later from integrations with common inference engines. I don't expect that to be difficult.

Updating the Master File:
- [Hopefully done] Download the Master File again so I can get a clean updated copy. I'll need to check to make sure I have all the files, but I think this is done.
- [In progress] Reupload a clone of both Master Files to HuggingFace.
- [In progress] Reupload a clone of the new Master File to my gdrive.

Horsona chatbot library:
- [Done] Add documentation, a "start developing" guide, and tasks for the features where it's feasible to make progress using the memory module.
- [Done] Find some old & current jailbreaks to add jailbreak support, and turn them into either modules or LLM wrappers. Ultimate Jailbreak is the main one, and there's an open task for it. There are others listed on rentry listed here: https://rentry.org/jb-listing.
- [Done enough] Go through the candidate feature list and make sure jailbreaks & compatible universes are the only features that are hard to support with the existing framework. Jailbreaks are easy to support. The rest of the features are easy to support.
- [ ] Work on whatever open issues other anons don't pick up.
- [ ] Continue working on lorebook generation. After this, I'll try making a simple chatbot with the library.
- [ ] Figure out how to organize text corpora into compatible universes.
>>
>>41373902
Pretty sure 99% of it is rips from the show itself (with two/three pieces made by Anons), which you can find in the OP's second Mega link ('sfx and music' folder).
>>
>>41373888
Minus one.
>>
>>
>>41373902
Nothing special, I just took it all from the music tracks of the same show audio used to make the voice dataset. Same clipping process with a different tagging system.
>>
>>41371103
That's pretty good up until the screaming at the end. Thank you
>>
File: 3121428.png (170 KB, 1528x2267)
Any chance someone here could train up a Lightning Dust model for RVC please? I need it for a song and her SVC one isn't cutting it.
>>
>>41377069
https://huggingface.co/Amo/so-vits-svc-4.0_GA/tree/main/ModelsFolder/ddm_DaringDo_100k
There is a sovits model for her that was set up as multi-model training, since if I remember correctly the model did not have the required 2 minutes of audio.
I may give it a try for RVC training in a day or two.
>>
>>41374539
Updating the Master File:
- [Done] Downloaded the new Master File and checked to make sure everything is good.
- [In progress] Reupload a clone of both Master Files to HuggingFace. This should be done in about 1 hour.
- [In progress] Reupload a clone of the new Master File to my gdrive. This should be done in about 3 hours.

Updating the Fimfarchive:
- [In progress] Download & verify the Sep 1 Fimfarchive. I'm downloading it now. If there are no errors, this should be done in about 5 hours.
- [ ] Upload to HuggingFace.

>>41364894
There's an empty "Luster Dawn" folder in Special Source, in case you wanted to remove that.
>>
>>41377069
I've also been meaning to train a Lightning Dust model. I'll see about training her later today probably. Would be good to get me back into the rhythm of training; been a long while.
>>
File: Untitled.png (553 KB, 1080x1049)
SelectTTS: Synthesizing Anyone's Voice via Discrete Unit-Based Frame Selection
https://arxiv.org/abs/2408.17432
>Synthesizing the voices of unseen speakers is a persisting challenge in multi-speaker text-to-speech (TTS). Most multi-speaker TTS models rely on modeling speaker characteristics through speaker conditioning during training. Modeling unseen speaker attributes through this approach has necessitated an increase in model complexity, which makes it challenging to reproduce results and improve upon them. We design a simple alternative to this. We propose SelectTTS, a novel method to select the appropriate frames from the target speaker and decode using frame-level self-supervised learning (SSL) features. We show that this approach can effectively capture speaker characteristics for unseen speakers, and achieves comparable results to other multi-speaker TTS frameworks in both objective and subjective metrics. With SelectTTS, we show that frame selection from the target speaker's speech is a direct way to achieve generalization in unseen speakers with low model complexity. We achieve better speaker similarity performance than SOTA baselines XTTS-v2 and VALL-E with over an 8x reduction in model parameters and a 270x reduction in training data
https://kodhandarama.github.io/selectTTSdemo/
code and weights to be released (soon?)
Examples aren't great, but considering the training time/training data/parameters, it's viable for personal training. They used 100 hours of data.
>>
Hold Me Tight: Stable Encoder-Decoder Design for Speech Enhancement
https://arxiv.org/abs/2408.17358
>Convolutional layers with 1-D filters are often used as frontend to encode audio signals. Unlike fixed time-frequency representations, they can adapt to the local characteristics of input data. However, 1-D filters on raw audio are hard to train and often suffer from instabilities. In this paper, we address these problems with hybrid solutions, i.e., combining theory-driven and data-driven approaches. First, we preprocess the audio signals via a auditory filterbank, guaranteeing good frequency localization for the learned encoder. Second, we use results from frame theory to define an unsupervised learning objective that encourages energy conservation and perfect reconstruction. Third, we adapt mixed compressed spectral norms as learning objectives to the encoder coefficients. Using these solutions in a low-complexity encoder-mask-decoder model significantly improves the perceptual evaluation of speech quality (PESQ) in speech enhancement.
https://github.com/felixperfler/Stable-Hybrid-Auditory-Filterbanks
>>
>>41377203
>>41377404
It would be much appreciated. SVS has a tone to it that's difficult to work into some genres, I think RVC would nail it.
>>
Precautionary page 8 bump.
>>
>>41378282
Plus one.
>>
>>41377900
I may have something workable within 6 hours (or 12, depending on whether the PC decides to have another technical hiccup).
>>
>>41377900
>>41379052
And I'm back.
>https://huggingface.co/Amo/RVC_v2_GA/tree/main/models/MLP_Lightning_Dust_GA
>https://files.catbox.moe/amkbes.mp3
I may have overtrained her by setting the epochs to 500, but I still think this model came out pretty decent, especially with all the additional cleaned-up data clips.
>>
>>41380032
Nicely done. I attempted to clean some more files for training her yesterday but didn't have much luck with getting any of the tools to work before I got bummed out and needed sleep. Those tools being jazzpear94's model (https://colab.research.google.com/drive/1efoJFKeRNOulk6F4rKXkjg63RBUm0AnJ) and a couple other similar ones intended to help separate SFX specifically, which I found in this doc: https://docs.google.com/document/d/17fjNvJzj8ZGSer7c7OFe_CNfUKbAxEh_OBv94ZdRG5c/edit#heading=h.owqo9q2d774z

How much data did you have of her to work with? Still tempted to train another model of her anyway, if only to see whether my training setup still works.
>>
Hey Hydrus, any chance we could get this >>41380032 on Haysay.ai?
>>
>>41377233
Updating the Master File:
- [Done] Reupload a clone of both Master Files to HuggingFace. https://huggingface.co/datasets/synthbot/pony-speech and https://huggingface.co/datasets/synthbot/pony-singing
- [In progress] Reupload a clone of the new Master File to my gdrive. https://drive.google.com/drive/folders/1ho2qhjUTfKtYUXwDPArTmHuTJCaODQyQ

Updating the Fimfarchive:
- [Done] Download & verify the Sep 1 Fimfarchive. Fimfiction added some restriction to prevent bots from scraping. Part of my script downloads the story html if there's anything wrong with the fimfarchive epubs, or if there's a conflict between the fimfarchive metadata and epub. I have a hackish fix for this for now.
- [Done] Upload to HuggingFace. https://huggingface.co/datasets/synthbot/fimfarchive

Horsona chatbot library:
- [In progress] Continue working on lorebook generation. I cleaned up part of my embedding-based memory implementation. I'll clean the rest as I figure out the right way to use it for reading through a story. Right now, I'm having an LLM create a question-answer dataset for the story setting, which it refines as it reads the story. The questions get turned into embeddings, which can be used to look up the corresponding answers as necessary. This is still a work in progress. I think my first "test" for this would be if it can create a decent character card for every character after each chapter of a story. That's what I'm currently working toward.
- [ ] Work on whatever open issues other anons don't pick up.
- [ ] Figure out how to organize text corpora into compatible universes.
>>
>>41380500
Correction: the Master File reupload to my gdrive should be [Done].
>>
>>41380258
>https://files.catbox.moe/d3j7w5.zip
>I attempted to clean some more files
I usually just grab the Clear files from the OP mega folder, and if the total is below 3 minutes I will additionally scavenge any usable 'Noisy' files as well. In this case I think I used almost all of the noisy ones, with only three being deleted.
>>
>>41380756
That's more or less what I have to train with, but I'm a bit stringent when it comes to the samples I use. There are quite a few lines sourced from her second episode that I didn't feel suited her, as she kinda gets a slight country-like accent in her delivery (https://files.catbox.moe/gn0vd5.flac) and sounds unusual or distorted in others (https://files.catbox.moe/1h5trr.flac & https://files.catbox.moe/4mug5h.flac). But yeah, I'll train her with what I've defined, though with fewer files, and consider an alternate version that'll hopefully be more faithful to her debut version.
>>
Up.
>>
>>41381049
Oh yes, when going through the dataset I try to look out for the tone of the voice as well; while I don't have proof, I think the clips where "X character pretends to talk like Y" may end up poisoning the training process.
Btw Anon, what stuff are you planning to make with her voice? Songs? Greentext?
>>
>>41381590
I usually make covers, but I'll be doing further tests with Applio to see which combinations of its TTS voices pair best and which settings work best. Might have to compile some sort of parameters list. Also curious about the TTS >>41377494 mentioned and how well that'll perform in comparison. I have a few factors limiting my ability to effectively produce non-song stuff, but I have a long list of stuff to produce. Nothing much with Lightning Dust thus far though, aside from a few songs I wanna test with her.
>>
Good night bump.
>>
MaskGCT: Zero-Shot Text-to-Speech with Masked Generative Codec Transformer
https://arxiv.org/abs/2409.00750
>Nowadays, large-scale text-to-speech (TTS) systems are primarily divided into two types: autoregressive and non-autoregressive. The autoregressive systems have certain deficiencies in robustness and cannot control speech duration. In contrast, non-autoregressive systems require explicit prediction of phone-level duration, which may compromise their naturalness. We introduce the Masked Generative Codec Transformer (MaskGCT), a fully non-autoregressive model for TTS that does not require precise alignment information between text and speech. MaskGCT is a two-stage model: in the first stage, the model uses text to predict semantic tokens extracted from a speech self-supervised learning (SSL) model, and in the second stage, the model predicts acoustic tokens conditioned on these semantic tokens. MaskGCT follows the \textit{mask-and-predict} learning paradigm. During training, MaskGCT learns to predict masked semantic or acoustic tokens based on given conditions and prompts. During inference, the model generates tokens of a specified length in a parallel manner. We scale MaskGCT to a large-scale multilingual dataset with 100K hours of in-the-wild speech. Our experiments demonstrate that MaskGCT achieves superior or competitive performance compared to state-of-the-art zero-shot TTS systems in terms of quality, similarity, and intelligibility while offering higher generation efficiency than diffusion-based or autoregressive TTS models
https://maskgct.github.io/
No weights (ever) since they're worried about safety. They finetuned it afterwards for emotion control and voice cloning. Sounds pretty good. 100k hours of training data.
>>
What's the fastest way to get this shit voice acted? It can even be a female voice actor, it doesn't matter, it just has to sound entertaining to listen to.
And optionally, how do I make the visuals match what he's talking about?
https://desuarchive.org/mlp/thread/40590194/#40598329
>>
anypony have voice packs for eleven labs
>>
I hate to moralfag, but am I the only one bothered by people who use CMC voice AIs for coomer shit? Normally I wouldn't care, but they were kids when they recorded at least most of their lines.
>>
>>41383447
The VAs are all adults now anyway, so that concern isn't really an issue any more.
>>
>>
File: OIG4.Pc7p8fPrtACEmu2G7R_o.jpg (249 KB, 1024x1024)
>>
>>41380500
Minor update on lorebook generation:
The current plan for memory is to extract questions and answers as the LLM reads a story. The questions get indexed by embedding, and they won't get updated unless a corresponding answer is deleted. The answers will get updated as the LLM reads the story. It'll process each paragraph twice: once to generate a list of questions that need to be answered to understand the paragraph, and a second time with the corresponding answers to determine how the memory should be updated.
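In rough Python terms, the two-pass loop would look something like this; ask_llm, embed, and the memory object here are placeholders rather than the actual horsona code:

def read_paragraph(paragraph, memory, embed, ask_llm):
    # Pass 1: ask what a reader would need to know to understand this paragraph.
    questions = ask_llm(
        f"List the questions that need to be answered to understand:\n{paragraph}"
    )
    # Look up the current answers by embedding similarity on the questions.
    context = {q: memory.lookup(embed(q)) for q in questions}
    # Pass 2: given the paragraph plus current answers, decide how memory changes.
    updates = ask_llm(
        f"Current answers: {context}\nParagraph: {paragraph}\n"
        "Return the question->answer pairs that should be added or revised."
    )
    for question, answer in updates.items():
        memory.store(embed(question), question, answer)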
>>
>>41377233
Not sure what that empty Luster Dawn folder was supposed to be for, perhaps a holdover from processing the studio leaks that now has no purpose. It's now been removed.
>>
>>
https://www.udio.com/songs/kgm72z2swizqRLSYDJWMMG
https://vocaroo.com/1hO6O2SpRNy6
https://pomf2.lain.la/f/4s3d56ud.mp4

Behind the Facade 1

Lyrics

[Verse 1]
We live in a world with cartoons and rainbows
With sparkly eyes and vibrant shows
Featureless, seamless, no lines can be seen
In our perfect land where nothing disagrees
No whispers of night's forbidden touch
In pastel dreams, we're bound and crushed

[Chorus]
Hey, Equestria
What can we do?
We live by rules, pretend they're true
While desires hide and hearts must play
In child's delight, we can't be free today

[Verse 2]
Can't flaunt our flair or show a peek
No lips can part for secrets to speak
Innocent, sweet, and always demure
Living in a world where nothing's obscure
Behind closed doors, our true selves lie
Hushing our wants as we gaze at the sky

[Chorus]
Hey, Equestria
What can we do?
We live by rules, pretend they're true
While desires hide and hearts must play
In child's delight, we can't be free today

[Bridge]
Can't break these chains of purity's face
In this vibrant land, we find no embrace
Our silent cries echo in the night
In a painted world, there's no real sight

[Chorus]
Hey, Equestria
What can we do?
We live by rules, pretend they're true
While desires hide and hearts must play
In child's delight, we can't be free today
>>
Training Universal Vocoders with Feature Smoothing-Based Augmentation Methods for High-Quality TTS Systems
https://arxiv.org/abs/2409.02517
>While universal vocoders have achieved proficient waveform generation across diverse voices, their integration into text-to-speech (TTS) tasks often results in degraded synthetic quality. To address this challenge, we present a novel augmentation technique for training universal vocoders. Our training scheme randomly applies linear smoothing filters to input acoustic features, facilitating vocoder generalization across a wide range of smoothings. It significantly mitigates the training-inference mismatch, enhancing the naturalness of synthetic output even when the acoustic model produces overly smoothed features. Notably, our method is applicable to any vocoder without requiring architectural modifications or dependencies on specific acoustic models. The experimental results validate the superiority of our vocoder over conventional methods, achieving 11.99% and 12.05% improvements in mean opinion scores when integrated with Tacotron 2 and FastSpeech 2 TTS acoustic models, respectively.
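The core augmentation is simple enough to sketch. Something like the following (numpy, with arbitrary filter widths chosen here) captures the idea of randomly smoothing the mel features fed to the vocoder during training:

import numpy as np

def randomly_smooth(mel: np.ndarray, max_width: int = 5) -> np.ndarray:
    """mel: (n_mels, n_frames). Returns a copy smoothed along the time axis
    with a random-width moving average; width 1 leaves it unchanged."""
    width = np.random.randint(1, max_width + 1)
    if width == 1:
        return mel.copy()
    kernel = np.ones(width) / width
    return np.stack([np.convolve(band, kernel, mode="same") for band in mel])

The vocoder then sees everything from sharp ground-truth features to heavily smoothed ones, so it copes better with the over-smoothed output an acoustic model produces at inference time.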
>>
>>41386926
Cute and soulful
>>41386944
>Tacotron 2
Huh, that's a name we had not seen in the threads for a while.
>>
>page 10
>>
>>41377069
>>41377404
Training of Lightning Dust (Alt) model has begun. Decided to use the pretrained TITAN as I found the descriptor of it interesting.
>TITAN is a fine-tuned based on the original RVC V2 pretrained, leveraging an 11.15-hours dataset sourced from Expresso. It gives cleaner results compared to the original pretrained, also handles the accent and noise better due to its robustness, being able to generate high quality results. Like Ov2 Super, it allows models to be trained with few epochs, it supports all the sample rates.
Hopefully she'll prove to be less noisy and have more accent retention. Training's a little slow on my end with the reduced batch size (supposedly smaller gives better results, but at the expense of training speed), but so far no issues in the process. If all goes well, hopefully I can also begin training more mares between my other commitments.

Improvements to vocal separation AI should be looked into further; it'd be nice to be able to separate audio that we've had a hard time separating for datasets in the past. Amalthea comes to mind, with most current separators struggling with cricket sounds and other natural additions. I have a feeling we could create and/or finetune a UVR5 model designed to separate SFX, using all the pony SFX we've separated thus far, to have an easier time removing a lot of foley, hoofsteps, crashes and similar sounds. As for more natural sounds like rain, wind, birds, insects, etc., there's a lot of data from the enormous SONNISS GameAudioGDC packs that could be utilized for this. Would be preferable to use MDX-Net so it's reliable for a range of GPUs, as DEMUCS models tend to not want to run unless the GPU has more than 6GB.
>>
>https://hailuoai.com/video
this shit is fucking crazy
>>
>>41387770
but can it recreate the tienanmen square massacre of june 1989?
>>
File: Long Queue.png (11 KB, 418x303)
>>41387770
Almost as crazy as the queue times for it will soon be.
6 minutes to wait already with only 171 people. Yikes.
>>
>>41387770
>>41387845
For the quality though, it's definitely got pony down surprisingly well. Just concerned about how slow it'll start to get once more people jump on board the same generator. Thankfully it's free (for now).
>Pinkie Pie (My Little Pony: Friendship is Magic) pony waving her hoof at the viewer.
>>
>>41387770 >>41387845 >>41387864
So now we are entering the age of computer-animated mares. I will be very disappointed if there isn't a bootleg of this tech available for offline generation sometime within the next three years.
>>
How do I get Doug Walker's or James Rolfe's voice to voice act this?

https://desuarchive.org/mlp/thread/40590194/#40598329
>>
>>41387864
In what format is it output, though? Is it usable as .ai vectors, Photoshop PSD, Flash FLA, Toon Boom, Live2D, Spine, or something else?
>>
>>41387770
>https://hailuoai.com/video
Anime is dead. Finally smooth animation for free.
>>
>>41387845
G6 intro just dropped.
>>
>>41387770

very cute twilight.
>>
>>41387845
>>
File: lolgate.gif (302 KB, 300x335)
>>41388039
the more often you watch this, the funnier it gets
>>
>>41387898
voice-models.com doesn't have a Doug Walker voice, but it does have James Rolfe. Since it's an RVC model, you'll have to read the pasta yourself.
>>
>>41388039
>background full of 'Curse of the Fly'/'The Unearthly' type misshapen monstrosities
>>41388147
Just pause at any random moment for a good laugh/scare
>>
>>41388624
my favorite part is the random tiny little houses at the end for some reason
>>
>>41388651
Well now that you mention it, this does bring up sort of an interesting point with the show. It is clearly established that animals like Angel Bunny have roughly pony intelligence. Would it really be so farfetched if we saw them living in actual tiny houses?
>>
File: Untitled.png (118 KB, 1125x440)
Sample-Efficient Diffusion for Text-To-Speech Synthesis
https://arxiv.org/abs/2409.03717
>This work introduces Sample-Efficient Speech Diffusion (SESD), an algorithm for effective speech synthesis in modest data regimes through latent diffusion. It is based on a novel diffusion architecture, that we call U-Audio Transformer (U-AT), that efficiently scales to long sequences and operates in the latent space of a pre-trained audio autoencoder. Conditioned on character-aware language model representations, SESD achieves impressive results despite training on less than 1k hours of speech - far less than current state-of-the-art systems. In fact, it synthesizes more intelligible speech than the state-of-the-art auto-regressive model, VALL-E, while using less than 2% the training data.
https://github.com/justinlovelace/SESD
No code yet, though they suggest they'll post an "implementation", so maybe weights too. No examples, so just posting to keep those interested aware. Using 2% of VALL-E's training data while outcompeting it is big if true.
>>
>>41389084
>Note: Code and model checkpoint will be available soon. Stay tuned for updates!
ah should have checked the whole readme
>>
>>41387770
I prompted "Applejack (My Little Pony: Friendship is Magic) collecting hay from each of the rest of the Mane 6." and got G5.
>>
Page 9 bump.
>>
>>41389084
>Graphs
Where are the voices at? Also, it would be nice once this gets published to be able to run it on my old GPU (I will lose my shit if this is another model that requires 16GB of VRAM to even start up).
>>
>>41387652
[RVC] Lightning Dust sings - There For Tomorrow "A Little Faster"
>https://files.catbox.moe/d610ed.mp4
>https://files.catbox.moe/v8uxxn.mp3

>https://huggingface.co/datasets/HazySkies/RVC2-M/tree/main/FiM_LightningDust
So far she seems decently capable in her preliminary testing, although I've only tried singing so far. Surprisingly, her natural range requires +0 rather than the expected +12. This test could've been better had the original song separated a little better, but it's a good average to test with. I quite liked her backing vocals at 1:22; her sustained notes sound nice.
>>
File: lightning dust.gif (3.15 MB, 468x480)
>>41389897
hey that sounds pretty good. Nice job, man!
>>
>>41389897
That's actually impressive, given how few voice lines she has.
>>
>>41389897
very good, we need more background mares songs.
>>
>>41391013
>catbox ded
Uh oh. Is there another good alternative for hosting small files? Also, could someone repost the above cover?
>>
File: cadence flurry DJ.gif (1.04 MB, 394x382)
Cadence - Like a Prayer (BHO cover)
>https://www.youtube.com/watch?v=uP6CRRhTOIM

First time experimenting with vocoders and heavy filters. It sounds a bit scuffed, but not too bad. Used Haysay RVC for the voice and UVR for everything else.

Does anyone have a working catbox alternative? I'd rather drop a direct link than a youtube link, but every other site I try either doesn't work or prunes the link in a matter of hours.
>>
>>41380500
Horsona chatbot library:
- [In progress] Continue working on lorebook generation. My QA-based memory seems to work. I'm refactoring my code now, and maybe trying to get my rate limits increased with some API providers so I can speed this up. I think the final lorebook is going to consist of the main details from the memory plus character cards. I'll be working on character card generation next.
- [ ] Work on whatever open issues other anons don't pick up.
- [ ] Figure out how to organize text corpora into compatible universes.

A snapshot of the QA memory after reading the first 50 paragraphs of Friendship is Optimal:
https://ponepaste.org/10323
>>
>>41391221
Pixeldrain could work! Unless not?
>>
Looks like Gemini now uses Imagen 3 as its image generator. While it's an increase in detail capability, it feels way worse in terms of language comprehension and flexibility. On top of that there's only 1 image at a time rather than 4, and no "generate more" button.

>>41391221
There's also https://pomf.lain.la/
>>
>>41389749
>10
>>
>>41391225
Thank you, anon. Ponies singing upbeat love songs do wonders for my motivation, and I really needed that.
>>
>>41392798
mares
>>
>>41392037
>only 1 image at a time rather than 4, and no "generate more" button
Sounds like a straight downgrade.
>>
>>41393542
It pretty much is. If they just picked off the abundance of flaws it would then be good, as it can still do show style ponies pretty well in its current state.
>>
>>41393734
that's a fat ponk
>>
Ponybros, I need your help. I've been told that installing 'mamba' over the conda terminal would speed things up, but it only fucked up my conda environments. Can someone advise how to uninstall this piece of shit? So far people suggest nuking the entire conda install and reinstalling everything from the ground up.
>>
>>41394192
After hours of messing around it looks like my conda installation is fucked beyond repair, so now I'm going to spend the entire day reinstalling all my old envs.
Lesson learned: do not install mamba or listen to random internet advice.
>>
>>41394560
>(base) pip install pip
works fine
>(rvc-env) pip install pip
I get this bullshit. Can someone please explain what's going on? This is a fresh installation of miniconda, so it shouldn't give me this much shit.
https://pastebin.com/zQbiKCjf
>>
>>41393756
uuuu
>>
File: no_update_option.png (52 KB, 645x287)
>>41395068
I'm a little confused how you even got this error. What version of pip are you using? When I create a conda environment with python3.10, pip reports that there is no --update option. Did you mean *upgrade* instead of update?
pip install --upgrade pip
or:
python -m pip install --upgrade pip
>>
>>41395068
From the stack trace, it looks like python is trying to use files under a user directory, outside of the Conda environment, i.e.:
C:\Users\User001\AppData\Roaming\Python\Python310\

I wonder if the issue you are facing is the same as the one described here:
https://github.com/conda-forge/miniforge/issues/344

You can try a similar solution by renaming and moving the offending folder like so:
ren "C:\Users\User001\AppData\Roaming\Python\Python310" "Python310Backup" & move "C:\Users\User001\AppData\Roaming\Python\Python310Backup" "C:\Users\User001\Python310Backup"
And then try the pip command again.

If that doesn't work and you want to revert your changes, execute:
ren "C:\Users\User001\Python310Backup" "Python310" & move "C:\Users\User001\Python310" "C:\Users\User001\AppData\Roaming\Python\Python310"
>>
File: 1705677602979430.png (1.26 MB, 1357x1920)
Sup, PPP. A few weeks ago I asked for advice in this general about using AI voice conversion and you guys helped me out.
So I thought I'd deliver the finished project in case anyone was curious:

https://www.newgrounds.com/portal/view/947079

The female voices were achieved by converting my own recordings in Haysay, can you identify who I picked for them?

Lastly, I don't have any experience drawing ponies, but gave a go at ponifying my character as a token of appreciation.
Thank you very much!
>>
>>41395797
I can't say if I recognise the AI voice, but hey, this is pretty awesome. Reminds me of all the time I spent after school watching indie animations on Newgrounds.
Also, very cool ponification drawing. Maybe we could see you pop into the /create/ or /bale/ threads with more poni stuff in the future?
>>
>>41395292
what is the text editor that you use?
>>
>>41395292
>>41395532
Thanks guys for the suggestions (no idea what I've fucked up, but clearly it's been an advanced fuck-up).
I ended up running the solution below in that env before entering the thread.
$env:PYTHONNOUSERSITE=1
python -m ensurepip --upgrade
python -m pip install --force-reinstall pip

This has solved the previous pip issue for me, and now I'm back to the usual Python dependency hell.
>>
File: flutternom.gif (2.14 MB, 498x331)
>>41395797
>can you identify who I picked for them?

That sounded like Rarity and Fluttershy's voice models.
>>
>>41396227
It depends; at home I use Sublime Text 3, at work it's Visual Studio Code, and if I'm logged in to a Unix server with terminal-only access then I use Vim.

>>41396254
Glad to hear you got it sorted out! That solution makes sense; it will prevent anything in the user site-packages, e.g.
C:\Users\User001\AppData\Roaming\Python\Python310\site-packages
from being added to sys.path, thereby forcing it to use only the packages you installed for that Conda environment. I find it strange that Conda does not already set PYTHONNOUSERSITE=1 by default.

>>41395797
That was sick! I'd guess that you used Rarity's voice for the first female character that spoke. Harder to tell with Ocapon, but Fluttershy sounds like a good guess.
>>
>>41396628
https://pastebin.com/KA0mkz30
Hey, I finally installed the RVC env, but now when I try to convert the audio I get an AttributeError: 'NoneType' object has no attribute 'dtype'.
>>
>>41397291
I've managed to fix the above; apparently the av module was too new and no longer recognizes the rb and wb functions.

>https://pastebin.com/2Tv7wnpJ
Now everything seems to be working fine (including the mic recording), but the above error happens if I select the f0 rmvpe model to run the conversion.
I've placed the files "rmvpe.pt" and "rmvpe.onnx" in the main folder as well as the \assets\rmvpe folder, however this did not fix the error.
>>
File: 1643535507592.webm (654 KB, 1280x720)
>>41387770
cute Twilight
>>
>>41397515
>A beach episode by Warners Brothers studio
Very cute style. From the other thread it seems to mostly make coherent-looking mares, but do tell us, do you get any cursed-looking mares there as well?
>>
File: 1684869540896.webm (536 KB, 1280x720)
>>41397591
Except for straight-up anthro, I got this.
>>
>>41397355
>os.environ["rmvpe_root"]
>KeyError: 'rmvpe_root'
The application is expecting an environment variable named "rmvpe_root" which points to the location of the rmvpe model, but that variable is undefined. RVC includes a .env file which is supposed to define that variable. First of all, please verify that the following line is present in your .env file:
rmvpe_root = assets/rmvpe
Assuming it's there, that means the file is not getting loaded. Try adding the following two lines at the top of inference_gui.py:
from dotenv import load_dotenv
load_dotenv()
Failing that, try adding this line instead, to hardcode the value:
os.environ["rmvpe_root"] = 'assets/rmvpe'
>>
>>41398045
>First of all, please verify that the following line is present in your .env file:
>rmvpe_root = assets/rmvpe
i dont have .env file (i thnink), I just activate it straight from the powershell terminal.
>os.environ["rmvpe_root"] = 'assets/rmvpe'
Hey, this works now. Happy days m8!
>>
>>41387770
Frankly, it should go to the AI Art Thread. >>41354251
>>
One thing I do wonder: can you feed normal text-to-speech into Hay Say's so-vits-svc? Would it output okay-sounding audio? That would improve the usability of so-vits-svc a lot. I think I've seen other ones do similar things.
>>
>>41398371
You can, but if you're using regular ass TTS as the input it's gonna copy over the robotic intonations, and sound like TTS speech.
>>
>>41398371
SoVits would likely work with TTS as audio input just fine, like how there are such setups for RVC. It's primarily the TTS side that needs work, in that there are very few options that are open source, natural sounding, and emotionally flexible.
>>
>>41396190
>>41396628
>>41396407
Cute hamster.
And correct answers, it was Fluttershy and Rarity with 0% character similarity selected.
For Fluttershy, only +4 pitch.
For Rarity, +8.

Usually +12 is recommended, but if you voice act with an already feminine affectation to it, it doesn't have to go up a full octave.
>>
>early mare bump
>>
>>41399082
The early Pegasus catches the bird.
>>
Is there something that can read an entire book with the click of a button?
>>
>>41399456
Yes. Regular, non-AI TTS that we've had for decades.
>>
>>41399479
>Text to speech
Ew. It can't emote and it sounds like dogshit in general.
>>
>10 again
>>
>>41400055
>minus one
>>
>>41400618
>Rollercoaster of bump
>>
Should we make a project where we post Youtube videos about interesting threads read by various AI voices?
>>
>>41401641
dunno, i would prefer short pony greens.
>>
Is this a good place for feature requests for Hay Say?
Anyways, I really wish it had a microphone built into the browser, so you can record onto the website and then send it to SoVits. I think that would improve the usability of the website a lot.
>>
>>41401832
Thanks for the suggestion. I regularly check in here so I will see requests. If you have a Github account, you can also open an issue on https://github.com/hydrusbeta/hay_say_ui/issues.

Hay Say is currently built on the Plotly Dash framework, which does not have an out-of-the-box audio recording component, though it should be possible to build a custom one. I'm on the fence about doing so. A while back, I started working on a complete redesign of Hay Say which will not use Plotly Dash, so part of me doesn't want to put effort into creating a component which will eventually become obsolete. However... progress on the redesign has stalled as I've been spending a lot of my time working on a different pony-related project, so I might just postpone the redesign and devote more time to the existing Hay Say for now. If I do, I will make this feature a priority.
>>
>>41401989
Plotly Dash you say
>>
Are there any annotations for the gender of the characters in the voice dataset? If not, I can write them up.
>>
>>41402357
Here's what I have so far: https://files.catbox.moe/ckftfl.json
I noticed that Wallflower didn't appear in there though. Synthbot, did Forgotten Friendship somehow get dropped out of the dataset?
>>
File: fluttershock.gif (715 KB, 400x400)
>>41402110
>>
>>41402110
Nice view (on the future).
>>
>>41391225
Why does the voice sound so muffled?
>>
>>41365609
What is the envisioned use case for this?
>>
Up.
>>
>>41402773
Probably because I went a bit wild on the filters

Also AI voices just tend to be a bit muffled by default.
>>
>>41391490
what technical specs are you aiming for this to run on?
>>
[RVC] Lightning Dust sings - The Knocks & Sofi Tukker "One on one"
>https://files.catbox.moe/oor0l8.mp4
>https://files.catbox.moe/h9e6ye.mp3

She probably butchered the short Portuguese section, but she sounded great anyway.
Is it just me, or does her voice paired with these vocals sound a little... Applebloom-y? Probably just the accent/pronunciation.
>https://files.catbox.moe/iiaqif.mp3

Considering training a Nurse Redheart model next, but I also have plans for Saffron and a wide variety of other mares. Training for Lightning was pretty smooth and quick (only a couple hours for the actual training), so I could probably get a bunch done now that I have a few days free of other commitments.
>>
>https://files.catbox.moe/5zp8sb.mp3
>Original ai song - RD - locked out
I've spent 2 days trying to unfuck the random voice glitches but I can't, so here is my best attempt at AI song making.
>>
>>41401623
>Rollercoaster
>>
>>41403457
9 up.
>>
>>41404987
oy
>>
>>41404030
How does LD's voice work so well for songs?
>>
File: EmotionalSupportNurse.png (907 KB, 1024x1024)
[RVC] Nurse Redheart sings - Francesca Battistelli "Angel by your side"
>https://files.catbox.moe/wr1lb9.mp4
>https://files.catbox.moe/0l7gue.mp3

>https://huggingface.co/datasets/HazySkies/RVC2-M/tree/main/FiM_NurseRedheart

Passing it through RVC, it kept mis-detecting certain words, so I first passed it through SynthV to fix some lyrics, which worked, but she did lose some volume control as a result. Redheart still sings well with raw vocals, even if they're not always the right words.

>>41405749
RVC is well adapted for singing in general. I also trained Lightning Dust with pitch guidance on, and from the "TITAN" pretrained model, to give her as much support as possible to speak and sing well given her small dataset. Nurse Redheart above had the same treatment.
>>
>>41402901
It's for developers that want to create better chatbots or create custom chatbots that require more than just changing prompts. That includes:
- Automated character card & lorebook generation, which can be fed into existing chatbot applications.
- Custom memory modules, including graph databases, embedding databases, SQL databases, json dumps, or arbitrary data structures.
- RPG functionality for tracking, e.g., HP, XP, and skill trees.
- Creating & refining datasets for LoRAs or other fine-tuning techniques.
- Creating API endpoints that transparently add memory & character consistency to generations, which can be plugged into other chatbot applications.

If other people use it, I'm expecting that early on, it would mostly be useful for creating character cards and lorebooks for other chatbot applications. Maybe a little later on, people would use it to create drop-in replacements for LLM APIs that automatically add in memory & character consistency features.
I personally want to use it to create chatbots with more complex personas & more flexible interactions. That requires better memory mechanisms, more complex inferences (more than just prompt-response), and tool use (e.g., listen to a song given a youtube link).

>>41403980
The current plan is to have the library itself require nothing but a CPU. Any additional hardware requirements will be on whatever inference APIs the user decides to use.
- I expect it'll never need more than API access to LLMs, image generators, etc. So if you run it against an sglang API, it'll be sglang specs. If you run it against Claude, just a CPU will be fine.
- For embeddings: right now, I'm developing & testing all embedding use against BAAI/bge-large-en-v1.5, which runs well on a CPU. This adds a pytorch dependency, which I plan to get rid of. Instead of having it directly run the embedding model, I'll probably have it use an embedding API (e.g., ollama support for open source embedding models). In this case, the requirements will be the same as for the LLM APIs: whatever specs are required to run the embedding API & model. A rough sketch of what that API-based call could look like is below.
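To make that concrete, here's a hypothetical sketch of swapping the local pytorch model for an HTTP embedding API. The endpoint and field names follow ollama's /api/embeddings route; the model name and host are placeholders, so adjust for whatever backend actually gets used.

import requests

def embed(text: str, model: str = "bge-large", host: str = "http://localhost:11434") -> list:
    # POST the text to the embedding server and return the embedding vector
    resp = requests.post(f"{host}/api/embeddings", json={"model": model, "prompt": text})
    resp.raise_for_status()
    return resp.json()["embedding"]

vector = embed("Twilight Sparkle lives in Ponyville.")
print(len(vector))  # dimensionality depends on the model being served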
>>
>>41391490
I have the initial "StoryReader" implementation up. It can read stories paragraph-by-paragraph to extract information. More details below.

Horsona chatbot library:
- [In progress] Continue working on lorebook generation.
- ... [Done] The initial implementation is up. You can see an example of how to use it here: https://github.com/synthbot-anon/horsona/blob/main/tests/test_reader.py. The implementation of the StoryReader module is here: https://github.com/synthbot-anon/horsona/blob/main/src/horsona/stories/reader.py. Right now, it tracks three things as it reads through a story: (1) short term memory, which is just an array of the most recent paragraphs read, (2) a long term memory, which consists of an embedding-based question-answer database that keeps track of information extracted from the story and a cache to keep track of the most recent retrievals, and (3) a StoryState, which is a data structure that keeps track of "live" information about what's being read (e.g., current location, current speakers). (A toy illustration of these three pieces is at the end of this post.)
- ... [In progress] Refactor the StoryReader module to support custom memory modules, support extracting custom information from stories, and support tracking custom "live" information.
- [New issue] Have the EmbeddingModel class https://github.com/synthbot-anon/horsona/blob/main/src/horsona/memory/embeddings/models.py use an API of Huggingface Transformers to generate embeddings. I created a new issue for this in case someone here wants to work on it: https://github.com/synthbot-anon/horsona/issues/9.
- [New issue] Use FAISS or other to store, delete, and query embeddings instead of using matrix operations directly on the embeddings. I created a new issue for this in case someone here wants to work on it: https://github.com/synthbot-anon/horsona/issues/10.
- [ ] Work on whatever open issues other anons don't pick up. https://github.com/synthbot-anon/horsona/issues
- [ ] Figure out how to organize text corpora into compatible universes.
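As a toy illustration only (this is not the actual horsona StoryReader, just the shape of the three tracked pieces described above):

from collections import deque

class ToyStoryReader:
    def __init__(self, embed_fn, window: int = 5):
        self.short_term = deque(maxlen=window)            # (1) most recent paragraphs read
        self.long_term = []                               # (2) (embedding, question, answer) triples
        self.state = {"location": None, "speakers": []}   # (3) "live" story state
        self.embed_fn = embed_fn

    def read(self, paragraph: str, extracted_qa: list, state_update: dict):
        # extracted_qa and state_update would normally come from LLM calls on the paragraph
        self.short_term.append(paragraph)
        for question, answer in extracted_qa:
            self.long_term.append((self.embed_fn(question), question, answer))
        self.state.update(state_update)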
>>
>>41405850
>provide her as much support as she could have to be able to speak and sing well given her small dataset
And it really shows.
>>
>>41405972
if I may put forth this suggestion: whatever future build it will be, could it be limited to the "torch==1.13.1+cu116" (or "pytorch==2.0.1") requirements, as anything above that straight up shits up my poor old gpu.
>>
>>41406657
Do you happen to know whether Cuda 11.8 works on your gpu? Cuda 11.6 and 11.8 both support a minimum compute capability of 3.5, which means that cuda 11.8 *should* also work on your gpu. If that's the case, then the maximum compatible torch version would be torch-2.4.1+cu118. The last minor version of Cuda 11 was 11.8. After that, the minimum compute capability became 5.0 in Cuda 12.
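If it helps, here's a quick way to check the compute capability from an existing torch install (just a sketch; interpret the numbers against the CUDA version notes above):

import torch

if torch.cuda.is_available():
    major, minor = torch.cuda.get_device_capability(0)
    print(f"{torch.cuda.get_device_name(0)}: compute capability {major}.{minor}")
else:
    print("No CUDA device visible to this torch build.")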
>>
>>41407036
In the PC I have the 8GB VRAM GTX 1080 (being a poorfag is suffering). Technically my GPU is supposed to only work up to 11.4, however for some reason unknown to me it chooses to be just good enough to get CUDA 11.6 rolling.
>>
https://huggingface.co/fishaudio/fish-speech-1.4
Is this good?
>>
>>41407382
>just a 1GB model
Could be interesting, however I do not see the training files/instructions in their GitHub.
>>
File: jjjj.png (16 KB, 307x333)
>>41404030
>Lightning Dust singing to (you) in Portuguese
Garbled pronunciation or not, that sounds damn hot.
>CAPTCHA JJJJ
>>
>>41406657
I'm almost 100% sure I can get rid of the pytorch dependency altogether, so it shouldn't be an issue.
>>
>>41402414
>>41407736
>>
>>41404030
That's a very sexy Lightning.
>>
>>41402414
Interesting. The clipped audio files are there, but it's missing the label file.
>>41385715
Clipper, do you happen to have the original label file for Forgotten Friendship? I don't see it in the mega or in my older copy of the Master File. If you don't, I can create a bootleg version from the audio files, though it'll be less precise than the others, so it'll be misaligned when loaded into Audacity.

>>41408224
Thank you.
>>
>>41408451
I found it in the Feb 2022 version of the Master File.
https://files.catbox.moe/iehjw1.txt
I'm updating my clones now.
>>
>>41402357
>>41402414
I added this as a "gender" column to my huggingface clone.
>Complete list of genders: https://files.catbox.moe/8mmw78.json
>Dialogue dataset: https://huggingface.co/datasets/synthbot/pony-speech
>Singing dataset: https://huggingface.co/datasets/synthbot/pony-singing
For the character tags that represent multiple characters, I left the gender empty since people training gender-specific models probably want to exclude those. Those are: CMC, Dazzlings, Flim Flam, Mane 6, Multiple.
I'll update my gdrive clone tomorrow.
>>
Good morning bump
>>
How do I automate the AI to read an entire book? It doesn't have to be in real time, I just need to leave it overnight, or even an entire week, to read all of it.
>>
>>41408610
Thank you

>>41409210
I have a colab notebook intended for this that may give you some guidance on how to do it with StyleTTS2 (I wouldn't rely on Colab for the full run though, because of the time limit). The main problem, I think, is maintaining a consistent syllable rate, because StyleTTS2 has problems with that.
https://colab.research.google.com/drive/1ys8SkP-VW7CkhnwVveEGINaszG1kRaYl#scrollTo=GJnM6GwdAG2W
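As a rough offline starting point, the overall loop could look something like this (the synthesize() call here is just a placeholder for whatever model you end up using, not a real Hay Say or StyleTTS2 API):

import numpy as np
import soundfile as sf

def synthesize(text: str) -> np.ndarray:
    raise NotImplementedError("plug your TTS of choice in here")

with open("book.txt", encoding="utf-8") as f:
    chunks = [p.strip() for p in f.read().split("\n\n") if p.strip()]  # paragraph-sized pieces

audio = np.concatenate([synthesize(chunk) for chunk in chunks])
sf.write("book.wav", audio, 24000)  # sample rate must match whatever the TTS outputs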
>>
https://files.catbox.moe/thtbt5.mp3
>>
>>41408461
Not sure how that happened but I've fixed it now, thanks.
>>
File: Sweetie Bop.gif (2.74 MB, 690x541)
>>41409422
Good to see you're back, Vul.
>>
>>41409422
Very nice
>>
File: OIG1..vCx.4Hrxi7Shxs96MTL.jpg (151 KB, 1024x1024)
>>
>>41408610
Updated gdrive clone of the Master File with the Forgotten Friendship label file: https://drive.google.com/drive/folders/1ho2qhjUTfKtYUXwDPArTmHuTJCaODQyQ?usp=drive_link
>>
>>41409422
egghead
>>
[RVC] Saffron Masala sings - Sandaru Sathsara "Smooth Criminal"
>https://files.catbox.moe/sw53g7.mp4
>https://files.catbox.moe/c4t1pp.mp3

>https://huggingface.co/datasets/HazySkies/RVC2-M/tree/main/FiM_SaffronMasala
Haven't done testing with non-indian accent songs (yet), but she's shown great promise with this one.
>>
>>41411299
>Saffron Masala
Now it's getting really exotic. Never thought we'd see songs with ponies like her.
>>
>>41411386
I love this mare, so I felt it my duty to ensure her beautiful voice is heard all the more.
With how well her training went you can be sure more is on the way.
>>
>>41411635
Based anon delivering good stuff
>>
>>41411299
im going to redeem this mare
>>
>>41411299
>>41411635
very nice
now show saffron vagene
>>
>>41412254
and teet
>>
File: Chinese AI Twilight.webm (1.99 MB, 1280x720)
Holy guacamole!
https://derpibooru.org/images/3441503
>>
>>41412960
Damn, it understands English too!
https://hailuoai.com/video
>>
File: 1634078595872.webm (1.93 MB, 1280x720)
https://pomf2.lain.la/f/tptanmv.mp4
https://files.catbox.moe/6u2397.mp3

Cacti and Books

[Verse 1]
A filly from Ponyville with a whimsical mind,
She wrote a joke article, cheeky and kind.
All books make you silly, she'd claim with a grin,
Keep a cactus nearby to keep sanity in.
Ponyville Chronicle, they laughed her away,
Foal Free Press said it just wouldn't sway.

[Chorus]
Oh, cacti and books,
Avoid magic hooks!
Prickle and giggle while you're reading fine,
It's just a funny sign.

[Verse 2]
Flower Monthly saw it and said, 'That's the stuff!'
Rose printed the story, not calling it bluff.
Now Rose, Lily, Daisy kept cactus so near,
Pricking their hooves, but they had no fear.
Twilight Sparkle said it was all horseapples' fun,
By then Yakyakistan believed every one.

[Chorus]
Oh, cacti and books,
Avoid magic hooks!
Prickle and giggle while you're reading fine,
It's just a funny sign.

[Bridge]
From Ponyville joke to Yakyakistan craze,
Books and cacti in a mystical haze.
Hooves on the pages and spikes at their side,
Silly magic, but they’ll keep their pride.

[Chorus]
Oh, cacti and books,
Avoid magic hooks!
Prickle and giggle while you're reading fine,
It's just a funny sign.
>>
File: 1596252303251.webm (2.2 MB, 1280x720)
https://pomf2.lain.la/f/x3hcejh.mp4
https://files.catbox.moe/wdsdpl.mp3

[Verse 1]
There was a filly in Ponyville who had a big idea
She wrote an article as a joke, the silliest you'd hear

[Chorus]
You need a cactus when you read
To stop the magic's silly deed
Need a cactus when you read
It's how to save your mind indeed

[Verse 2]
Ponyville Chronicle and Free Press laughed it off with glee
But Flower Monthly thought it was the best thing they've received

[Chorus]
You need a cactus when you read...

[Bridge]
Flower ponies got their cacti, prickling as they read
Twilight said it wasn't true, but Yaks can be misled
So keep your cactus if you want, but magic’s not to fret
Just read your book without a fright, no pricks or needles yet

[Chorus]
You need a cactus when you read
To stop the magic's silly deed
Need a cactus when you read
It's how to save your mind indeed
>>
File: 1684416388665.webm (1.29 MB, 1280x720)
This one contains an eye rhyme, and the audio was bad anyway.

Udio 1.0 - The Cactus Conundrum ext v1

Lyrics

[Verse]
There once was a filly from Ponyville, her name was Sparkle Glint
She wrote an article o' magic books, and she sent it in for print

[Chorus]
Keep a cactus close when you read, it'll neutralize the magic
Or you'll get all silly instead, ain't that just tragic?

[Verse]
The Chronicle editor laughed it off, said 'This prank won't do!'
While the Foal Free Press just shrugged and said, 'There's nothin' new!'

[Chorus]
Keep a cactus close when you read, it'll neutralize the magic
Or you'll get all silly instead, ain't that just tragic?

[Verse]
But Rose from Flower Monthly was like, 'This is quite a find!'
She printed it for all to see, and some ponies lost their mind

[Chorus]
Keep a cactus close when you read, it'll neutralize the magic
Or you'll get all silly instead, ain't that just tragic?

[Verse]
Twilight heard 'bout the cactus craze, said 'What in Celestia's name?'
She found Rose, Daisy, and Lily, and put an end to their prickly game
She explained it was all horseapples, a joke that got too far
But the Yaks in Yakyakistan, still keep cacti by their memoir

[Chorus]
Keep a cactus close when you read, it'll neutralize the magic
Or you'll get all silly instead, ain't that just tragic?

[Outro]
So remember fillies, colts, and Yaks, before you believe such tales
No cactus needed for your books, just facts to fill your sails
>>
>>41364787
(1/2) parlerttsbros...

Here's 16 epochs of ParlerTTS finetuning on the Parler-TTS-Mini-v1 checkpoint over the entire pony-speech dataset (sans Forgotten Friendship) ~48hrs training on and off: https://huggingface.co/therealvul/parler-tts-pony-mini-v0-e16/tree/main

Results:
https://files.catbox.moe/t94htk.mp3
https://files.catbox.moe/1yplez.mp3
https://files.catbox.moe/9b8lah.mp3
https://files.catbox.moe/d1airi.mp3
https://files.catbox.moe/etp6b4.mp3
https://files.catbox.moe/3feky2.mp3

I'm not sure why the results are so bad, although I have suspicions.
>The character resemblance is OK but the audio quality seems to be quite poor compared to the original checkpoint.
Although I wonder what the "best case" is for DAC on our audio - I didn't normalize the audio going in, so maybe that's a factor?
>Pronunciation of certain words is fucked, e.g. "storage"
This might be related to their choice of tokenizer:
https://github.com/huggingface/parler-tts/issues/88
Combined with the fact that our dataset probably wouldn't be diverse enough to cover all of the 'single-word' tokens. If single-word tokens are the problem (see picrel), then the pronunciation issues might not be fixable at all. Although if you check using their HF space, even the baseline mini model trained on ~45k hrs of audiobook data (!) seems to miss syllables in some cases.
>There are lots of minefield cases where generations stop prematurely or just shit themselves.
Encountering an apostrophe breaks the Rainbow Dash one, and the Twilight one is supposed to be a much longer sentence but stops at a comma. I have no idea what the deal is with the Pinkie one.
>Problems increase as generation length increases.
Normal phenomenon for transformer-based models, but seems excessively sensitive. They added an option to use RoPE in one of their pulls (https://github.com/huggingface/parler-tts/pull/65) but it's not really documented, and I don't know whether they're considering training a different base model on RoPE or a different tokenizer given these issues.

>Is it a problem of undertraining?
Maybe partially? According to what metrics I could see in tensorboard (only enabled after disabling wandb) eval loss was more or less going down the whole time, and train loss is just beginning to peter out. Leading to:
>Is it a problem of model size?
From the little bit of testing I did using the HF space, it seems the large model has slightly better audio quality and better handling of pronunciation--still, the results here are much worse than the existing mini checkpoint.
There's also an option to unfreeze the text encoder during training, which I didn't select as it wasn't part of their suggested training command.
>>
>>41413292
(2/2)
The actual process of training was kind of spotty (but thankfully nowhere near as much as StyleTTS2). I ran into a problem with wandb getting stuck in the middle of the training process, so I removed the --report_to_wandb option. Training generates ~7GB of checkpoint files every 500 steps, so disk space was also an issue.

For data, I generated dataspeech compatible text descriptions of the dataset using Mistral 7B-v0.3 (this took around 60 hrs, it was done before Forgotten Friendship or the animatic data was added in, and I don't feel like redoing it right now):
https://huggingface.co/datasets/therealvul/parlertts_pony

I think the dataspeech tool could be modified to include our emotion tags in the prompt too, although I'm not sure what effect it would have, and I doubt it'd improve things in this case
>>
>>41412960
https://files.catbox.moe/t9wwac.mp4

Oh wow, this is pretty good!
>>
>>41413139
How did you manage the camera to move like that? What prompt did you use?
>>
>>41395797
That was really nice!
The ending music made me laugh
Thank you for producing nice things, and thank you for the cute pony too!
>>
File: 1664286824018001.png (600 KB, 1280x720)
https://voca.ro/19285AVeCkhX

>soon we'll be able to have a podcast with the Mane 6 talking about anything
exciting times ahead... generated from notebooklm.google.com
>>
>>41413292
>>41413298
That's looking super promising. Would still need more work of course, but the concentration of mare-ness present is very strong. Here's to hoping the issues can be worked through.

Even if the voices don't sound entirely like the mares in question, using them as sources for RVC and the like could help reinforce the resemblance and maybe also clean up some of the noisiness?
>>
>>41413583
I suspect the tokenizer is the biggest hurdle to overcome and swapping it out would probably require a full retrain of the base model, which is not feasible for me. That makes it a dead end from my viewpoint.
>>
la mare
>>
>>41413298
>dataspeech tool could be modified to include our emotion tags in the prompt too
Would be nice to have a return of the emotion-controlled TTS.
>>
>>41413634
If retraining the base requires a lot of computational resources, maybe Synthbot could help you out with it? In any case, I'd also encourage exploring other areas to improve in the meantime, which could then be applied to an alternate base model at a later date.

Despite the setbacks, I've yet to see a TTS with this much character fidelity (aside of course from our well known absentee's tech), so even with the faults it's something I really feel is worth exploring further or keeping an eye on. There's always the chance someone with more resources and know-how develops an extension solving those issues.
>>
>>41414148
Just for fun, here's some more sentences with words that are probably not in the dataset but encoded by T5 tokenizer as single words:

>"Bitcoin and Ethereum are popular cryptocurrencies."
https://files.catbox.moe/5j6vb4.mp3
>"Google and Microsoft are competing in the AI space."
https://files.catbox.moe/boh1ce.mp3
>"Amazon delivers packages quickly across the United States."
https://files.catbox.moe/qrkmg0.mp3 (wow, it actually got this one right! Maybe it's leakage from "Amazon rainforest"?)
>"Facebook users share photos on Instagram."
https://files.catbox.moe/k3fpnr.mp3
>"NASA launched a new satellite to study climate change."
https://files.catbox.moe/jo02jr.mp3
>"Tesla's electric vehicles use advanced battery technology."
https://files.catbox.moe/1q3cce.mp3
>"Netflix produces many original series and movies."
https://files.catbox.moe/31aewm.mp3

I wonder why nobody's tried to make a tokenizer based off of phonetics/synthetic phonetics yet? I might try replacing the tokenizer anyways and see how far I can get, no guarantees.
>>
>>41413323
The prompt was the story described in the song, a simplified version of the irl events it was based on, and that it is a song. It did not mention anything about the looks or named characters except for Twilight and the flower ponies.
>>
Up.
>>
>>41416202
>>
>>41414203
Usually people hack it in with a g2p (grapheme-to-phoneme) model. In those cases, the input text gets converted to phonemes or phoneme posteriograms (probability distribution over all possible phonemes), then that gets run through the speech model. Cookie had previously trained models that could convert both text/graphemes and arpabet to speech. In that case, the text was fed to the model per-character, and the resulting models did a decent job learning the relationship between graphemes and arpabet. In cases where the model failed to learn the pronunciation, users could replace the individual difficult graphemes with arpabet phonemes.
I haven't seen anyone actually create a phonetics-aware tokenizer though. To create that kind of tokenizer, you'd need a model that can convert per-character graphemes to phonetics data, then I think you'd need to figure out which set of tokens minimizes the number of decisions required to do the conversion. I think we have all the tools necessary to do that, though it would be somewhat complicated. I can write an explanation if you're interested.
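For reference, a rough sketch of the simpler g2p-then-BPE route mentioned above (g2p_en for grapheme-to-phoneme, then a small BPE tokenizer trained on the concatenated phoneme strings). The corpus filename is a placeholder, punctuation handling is ignored, and this is meant as an illustration rather than a drop-in for the ParlerTTS training code:

from g2p_en import G2p
from tokenizers import Tokenizer, models, pre_tokenizers, trainers

g2p = G2p()

def to_phonemes(text: str) -> str:
    # "people" -> "PIY1PAH0L"; words stay separated by spaces
    return " ".join("".join(p for p in g2p(word) if p.strip()) for word in text.split())

corpus = (to_phonemes(line) for line in open("transcripts.txt", encoding="utf-8"))

tokenizer = Tokenizer(models.BPE(unk_token="[UNK]"))
tokenizer.pre_tokenizer = pre_tokenizers.Whitespace()  # one pre-token per word
trainer = trainers.BpeTrainer(vocab_size=1024, special_tokens=["[UNK]", "[PAD]"])
tokenizer.train_from_iterator(corpus, trainer)
tokenizer.save("arpabet_bpe.json")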
>>
>>41416901
Does this make any sense or is it just bullshit?
https://github.com/effusiveperiscope/parler-tts/blob/g2p/tokenizer_data_train_g2pen.ipynb
https://huggingface.co/therealvul/tokenizer_g2pen
>>
>>41417144
It makes sense, and it's only partially bullshit.
- I expect it's better than just assigning each Arpabet phoneme its own token index. It's worthwhile to model diphones and maybe triphones, which your tokenizer can do well.
- I think anons have had issues with g2p-generated pronunciations in the past, so bottlenecking tokenization on it will probably cause issues. However, from your tests it looks like your tokenizer gives much better tokens than parlertts, so I do expect it'll be an improvement.
- You should use the audio corpus to train the tokenizer, not generics_kb. The final set of tokens should be ones that are common in the audio training corpus so their embeddings can be well-trained.

If you want to refine it further with a very custom tokenizer, a hybrid approach might work well. If you have access to it, ChatGPT o1-preview knows how to create custom tokenizers. https://ponepaste.org/10347
- Use two tokenizers, plus one that wraps them both. For the first one, use the N most common tokens from the ParlerTTS tokenizer plus the BPE base tokens. For the second one, use the M most common tokens from the Arpabet tokenizer plus the BPE base tokens. For the wrapper, come up with some convention to denote arpabet words (maybe surround them in {}), make sure the pre_tokenizer doesn't split on the special characters you're using to denote arpabet words, and use the appropriate underlying tokenizer to convert each word. Make sure the BPE tokens and special tokens all map to the same indices regardless of which underlying tokenizer is used, and make sure the two underlying tokenizers otherwise use mutually exclusive token indices. (There's a toy sketch of this wrapper convention at the end of this post.)
- Every time a training sample is loaded: modify it so it uses a random mix of Arpabet-denoted words and grapheme-denoted words, and randomly pick characters to be encoded with the BPE base tokens so the BPE base tokens are well-represented.

I think that would let you get around the limitations of both g2p and the ParlerTTS tokenizer. That would also get rid of ParlerTTS's uncommon tokens and UNK tokens, which are probably responsible for a lot of its pronunciation issues.

A simpler approach, though one that would be less flexible during inference than the hybrid approach, would be to modify the ParlerTTS tokenizer to (1) get rid of tokens that are uncommon in the audio training corpus, (2) add in the BPE base tokens, and (3) while training, randomly pick characters to be encoded with the BPE base tokens so that the BPE tokens are well-represented.
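To make the wrapper convention above concrete, here's a toy sketch (assuming both underlying tokenizers are tokenizers.Tokenizer objects; it glosses over special tokens and the shared base-token mapping, and just keeps the two id spaces disjoint with an offset):

import re

class HybridTokenizer:
    def __init__(self, grapheme_tok, phoneme_tok, phoneme_offset: int):
        self.grapheme_tok = grapheme_tok
        self.phoneme_tok = phoneme_tok
        self.phoneme_offset = phoneme_offset  # keeps phoneme ids out of the grapheme id range

    def encode(self, text: str) -> list:
        ids = []
        # split into {ARPABET} spans and plain-text spans, keeping both kinds
        for span in re.split(r"(\{[^}]*\})", text):
            if not span:
                continue
            if span.startswith("{") and span.endswith("}"):
                ids += [i + self.phoneme_offset for i in self.phoneme_tok.encode(span[1:-1]).ids]
            else:
                ids += self.grapheme_tok.encode(span).ids
        return ids

# usage sketch: hybrid = HybridTokenizer(parler_tok, arpabet_tok, phoneme_offset=len(parler_tok.get_vocab()))
# hybrid.encode("Rainbow Dash said {HHAH0LOW1} to everypony")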
>>
>>41405973
Horsona chatbot library updates:
- [In progress] Continue working on lorebook generation.
- ... [Done] Refactor the StoryReader module to support custom memory modules, support extracting custom information from stories, and support tracking custom "live" information. I've abstracted all state tracking into "Cache" modules with load() and sync() functions, which can be swapped out.
- ... [In progress] Test the StoryReader flexibility on character card creation and lorebook generation.
- No changes for the rest.

I've made a lot of changes to how the autograd functionality works, and I've broken away from some pytorch conventions to (1) better support async execution, and (2) to simplify how backproppable functions are created. Summary:
- In pytorch, you call loss.backward() then optimizer.step(). The problem is that loss.backward() has no information on what needs to be updated, so it needs to calculate gradients for everything that led to the loss. In my refactor, loss.backward() is passed a list of leaf nodes so excess computations can be cut out. Code: https://github.com/synthbot-anon/horsona/blob/main/src/horsona/autodiff/basic.py#L55
- In pytorch, a single optimizer updates all parameters. In my refactor, the optimizer is just a step() function that's given a gradient context, which contains all computed gradients. loss.backward() returns a gradient context that can be passed to step(). This makes it easier for a module to update its own parameters as needed without needing to rely on the caller to call the optimizer. Code: https://github.com/synthbot-anon/horsona/blob/main/src/horsona/autodiff/basic.py#L188
- In pytorch, backproppable functions are defined as classes with forward() and backward() methods. In my refactor, both the forward and backward pass are defined by a single generator. It calculates the forward call, yields the forward result, gets a gradient context from the yield, and performs the backward call. Code: https://github.com/synthbot-anon/horsona/blob/main/src/horsona/autodiff/basic.py#L128 (There's a toy illustration of this control flow at the end of this post.)
- The gradient context passed during the backward operation contains a dictionary with all variables that need to be updated. This lets functions figure out which gradients actually need to be calculated by checking if a variable is in the dictionary, which lets them cut out unnecessary calculations. The functions are supposed to set the gradients of a variable by adding them to a list in the dictionary. Code: (same as the above horsefunction).
- Both sync and async generators are supported for backproppable function definitions. For sync generators, the backward pass call is wrapped in an async function so it can be handled consistently when I eventually make the backprop & update steps run in parallel. Code: (same as the above horsefunction).
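Toy illustration of the generator-based control flow (this is not the horsona code, and the real gradients there are text/LLM-driven; the numbers are just to show the forward/backward mechanics):

def scale(value, factor):
    result = value["data"] * factor
    grad_context = yield result              # forward result goes out, gradient context comes back in
    if id(value) in grad_context:            # only compute gradients that were actually requested
        grad_context[id(value)].append(factor)   # d(result)/d(value) = factor

x = {"data": 3.0}
fn = scale(x, 2.0)
forward_result = next(fn)                    # run the forward pass
grads = {id(x): []}                          # the "leaf nodes" we want gradients for
try:
    fn.send(grads)                           # resume the generator to run the backward pass
except StopIteration:
    pass
print(forward_result, grads[id(x)])          # 6.0 [2.0]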
>>
>>41416202
Preserve.
>>
>>41417450
>You should use the audio corpus to train the tokenizer, not generics_kb. The final set of tokens should be ones that are common in the audio training corpus so their embeddings can be well-trained.
My thinking was that generics_kb might be more representative of text that people might input into the model (at least compared to episode transcripts), and pony-speech was included mostly just to make sure the tokenizer could accept it. One of my concerns was that since "Equestria" showed up so often in pony-speech its phonetic equivalent kept getting mapped to a single token. Does this not matter, or should I just restrict the number of tokens until it doesn't show up as a token?

>Make sure the BPE tokens and special tokens all map the the same indices regardless of which underlying tokenizer is used, and make sure the two underlying tokenizers otherwise use mutually exclusive token indices.
Question: Why map the BPE base tokens (single char) to the same indices? Isn't "A" in ARPAbet-land different from "A" in text-land? Or is the point that they're correlated?
(Also, the ParlerTTS/FLAN-T5 tokenizer uses Unigram instead of BPE; it still has all of the base characters though.)
>>
>>41417144
>>41417450
I don't really comprehend all the technical stuff, but is there a way to hack up the TTS model to take reference audio like TalkNet/RVC, but also have emotion control like the old ngrok?
>>
>>41418163
>is there a way
There is always a way, but
>Would it work the way you want
Probably not--it's easy to say "just add these inputs" but it's another thing for the model to learn it.

I imagine for emotional control we could shove the emotion tags into the text prompt and unfreeze the text encoder, which would probably be the least drastic change to make. Replacing the tokenizer might require retraining the text encoder anyways, which puts training demands above what I can fit on my local system. The caveat is that a lot of our past work also seems to indicate that our emotion tags don't really correlate that well to styles of speech?

For conditioning on reference audio you might be able to shove speech features and a pitch curve into the pre-decoder hidden states; you'd have to shove some extra "stuff" into the loss function too. Whether a decoder-only model that fits on a normal GPU would be smart enough to figure all that out based on only pony-speech (or maybe another dataset?) is another question.
>>
>>41418265
>Replacing the tokenizer might require retraining the text encoder anyways
Nvm, it looks like the input IDs from the prompt tokenizer just get fed into an nn.Embedding, whereas the input IDs from the description tokenizer are what gets fed through the text encoder, which is nice because that means I can still run non-emotion conditioned training locally. Still means that to add our emotion tags I'd probably need cloud GPU (or a very generous Anon).
>>
File: emotion 1607391601088.png (58 KB, 753x682)
>>41418265
>>41418291
I can't remember if this pic is from the ngrok scripts or something some anon was working on, but I feel like it could be "easy" (for someone who knows their shit, aka not me) to create a dataset that uses something similar, where it would take the tags that already exist with the transcribed dataset files: if a file has the description "_happy_shouting_", it would be converted to "happy=0.6, shouting=0.4" in the generated training text.
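Something like this maybe, as a toy sketch (I'm sure someone who actually knows their shit would do it better; the filename format and the tag weights here are made up, purely to show the shape of the conversion):

import re

EMOTIONS = {"happy", "sad", "angry", "shouting", "whispering", "neutral"}

def tags_to_weights(filename: str) -> str:
    # hypothetical filename layout; adjust the parsing to the real master file naming
    parts = re.split(r"[_.]", filename.lower())
    tags = [p for p in parts if p in EMOTIONS]
    if not tags:
        return "neutral=1.00"
    total = sum(range(1, len(tags) + 1))
    weights = [n / total for n in range(len(tags), 0, -1)]  # earlier tags weighted heavier
    return ", ".join(f"{t}={w:.2f}" for t, w in zip(tags, weights))

print(tags_to_weights("s1e01_twilight_happy_shouting.flac"))
# happy=0.67, shouting=0.33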
>>
>>41418325
>"happy=0.6, shouting=0.4"
Are these fixed values or do you mean like trying to fit an emotion classifier and use the classification probabilities? Also ParlerTTS uses natural language descriptions so I'm not sure it would be appropriate in this case (since we know even huge LLMs tend to be pretty bad at working with numbers)
>>
>>41418355
Well, like I said, I am no expert at anything program related, just thinking it would be nice if we had a way to force the outputs to have more specific emotions, one that does not involve shouting the exact same line into the mic three dozen times in the hope that one of the takes will be good enough to be stitched into the project I'm working on.
I just need something where I can tell the program that the output needs to convey "a happy whisper with a hint of excitement", and at this point I am very apathetic about whether this will look like "smiley face, green heart, X eyes emoji" or "happy=0.5, whisper=0.45, excited=0.15" in the program itself.
>>
haysay's fucked again. StyleTTS2 specifically.
> FileNotFoundError: [Errno 2] No such file or directory: '/home/luna/hay_say/styletts_2/output/ffeef7dd9cde25f1bb44.wav'
>"User Audio": "d98daba91e38edd61a73"}, "Options": {"Architecture": "styletts_2", "Character": "Multi-speaker (30 epochs) (Mane 6, CMC, Princesses, Discord, Gilda, Zecora, Trixie, Starlight, Chrysalis, Tirek, Cozy Glow, Flim, Flam, and Shining Armor)", "Noise": 0.3, "Diffusion Steps": 5, "Embedding Scale": 1.5, "Use Long Form": true, "Style Blend": 0.5, "Reference Style Source": "Use Reference Audio", "Timbre Reference Blend": 0.09999999999999998, "Prosody Reference Blend": 0.09999999999999998, "Precomputed Style Character": "Rainbow Dash", "Precomputed Style Trait": "Shouting 1", "Speed": 1.0}, "Output File": "93ad122316abff861896", "GPU ID": "", "Session ID": "28929cb5096b47a9b679d82f46997712"} Input Audio Dir Listing: No input files to report does not exist
There absolutely is an input file.
>>
>>41417144
Is the phoneme tokenizer encoding spaces/word boundaries? I see underscores in the text tokens but not in the arpabet
>>
>>41418847
Nope, it's not
>>
>>41418783
That error is kinda confusing, because the first part:
>FileNotFoundError: [Errno 2] No such file or directory: '/home/luna/hay_say/styletts_2/output/ffeef7dd9cde25f1bb44.wav
Says that it cannot find the /output/ file, but the end says:
>Input Audio Dir Listing: No input files to report does not exist
Which is saying it instead can't find the input? So... because it can't find or process the input, it can't create an output at the end of the process, resulting in the error of missing file. Weird.
>>
File: output.png (88 KB, 986x590)
Also the prompt that they use in dataspeech to generate descriptions for named speakers implicitly assumes the speaker's gender/pronouns. ;^)

Whose names is Mistral 7B-v0.3 likely to get most wrong (just using "his/her")?
>>
>>41418783
>>41419012
Thanks for bringing the error to my attention. The last part of the error message is weird due to a bug in the error handling code. I was supposed to pass a directory into the "construct_full_error_message" method, but instead I passed the string "No input files to report" for some reason. It then tried to list the input files in that directory, but couldn't find a directory named "No input files to report" so it spat out "'No input files to report' does not exist."

The true error is that the output file failed to generate (No such file or directory: '/home/luna/hay_say/styletts_2/output/ffeef7dd9cde25f1bb44.wav'). However, since I don't save any logs, I can't get any details on *why* it failed to generate. I have also been unable to reproduce the error. Is the issue repeatable for you or did it happen once and then go away?
>>
>>41419166
https://pastebin.com/P3H2cSKJ
It's a repeat issue, and happens constantly. I've changed settings in StyleTTS2 and it happens regardless of setting changes. I had it happen when there wasn't any reference audio uploaded as well. The only time it worked today was when I had just part of the chorus pasted in the text box to try out the sound.
Ponepaste's captcha doesn't work so have a pastebin.
>>
>>41419270
Thanks for the pastebin link. The input text has no periods (or question marks or exclamation points), so StyleTTS tries to treat it all as one very long sentence and apparently it has trouble doing that. Try adding end-of-sentence punctuation marks to the lyrics to break them down into sentences.
In the meantime, I'll take a closer look at the codebase to see if there's anything I can do to prevent the error on the backend.
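In case it helps as a stopgap, here's a rough client-side sketch for pre-splitting the lyrics into sentence-sized chunks before pasting them in (the 400-character cap is arbitrary):

import re

def chunk_lyrics(text: str, max_chars: int = 400) -> list:
    # split after sentence punctuation or on line breaks, then pack pieces up to max_chars
    pieces = [p.strip() for p in re.split(r"(?<=[.!?])\s+|\n+", text) if p.strip()]
    chunks, current = [], ""
    for piece in pieces:
        if current and len(current) + len(piece) + 1 > max_chars:
            chunks.append(current)
            current = piece
        else:
            current = f"{current} {piece}".strip()
    if current:
        chunks.append(current)
    return chunks

for chunk in chunk_lyrics(open("lyrics.txt", encoding="utf-8").read()):
    print(chunk)  # paste each chunk into Hay Say separately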
>>
>>41418153
>Does this not matter, or should I just restrict the number of tokens until it doesn't show up as a token?
I can run some tests to find out. Give me a few hours.
>Question: Why map the BPE base tokens (single char) to the same indices? Isn't "A" in ARPAbet-land different from "A" in text-land? Or is the point that they're correlated?
No, you're right. Some single characters in Arpabet deviate from their english pronunciations, so it's better to keep them separate. Good catch.
>(Also, the ParlerTTS/FLAN-T5 tokenizer uses Unigram instead of BPE; it still has all of the base characters though.)
Nice. That makes things easier.
>>
>>41419270
>>41419449
Looks like there's not much I can do from the backend to prevent the error, aside from maybe warning the user when they enter too long a sentence.
StyleTTS2 takes the tokenized text and passes it through a BERT model. The BERT model that StyleTTS2 uses is configured to handle a maximum of 512 tokens at a time, which puts the limit at ~512 characters in a single sentence. To make this any longer would require retraining the BERT model. I could also attempt to automatically slice up sentences that are too long, but that comes with the risk of slicing up a sentence in an undesirable way.
>>
>>41420275
Maybe: add an option for sentence slicing, and have a hard cutoff at 512 tokens with a warning?
>>
>>41419719
>>41418153
Check the last cell in:
https://github.com/synthbot-anon/sample-code/blob/tokenizer-analysis/notebooks/tokenizer_analysis.ipynb
This finds tokens that are underrepresented in either the pony dataset or the generic_kb+pony dataset.
- Ones that are underrepresented in pony would be undertrained.
- Ones that are underrepresented in generic_kb+pony would be "useless". In these cases, it would have been better to get rid of the token so that more useful sub-tokens can get more training.

It then splits the "bad" tokens into sub-tokens that, where possible, would be both well-trained and more useful. It looks like that's possible for all of the tokens in a random 240k sample of generics_kb.
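For anyone following along, the shape of the underrepresentation check is roughly this (not the notebook code itself, just a minimal sketch against a tokenizers.Tokenizer; the cutoff is arbitrary):

from collections import Counter

def find_undertrained_tokens(tokenizer, corpus, min_count: int = 20) -> list:
    # count how often each token id actually shows up when tokenizing the corpus
    counts = Counter()
    for line in corpus:
        counts.update(tokenizer.encode(line).ids)
    vocab = tokenizer.get_vocab()  # token string -> id
    return [tok for tok, idx in vocab.items() if counts[idx] < min_count]

# usage: find_undertrained_tokens(arpabet_tok, open("transcripts.txt", encoding="utf-8"))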
>>
>>41420884
I guess some of these are obvious in retrospect - "PIY1PAH0L" (i.e. "people") and "HHYUW1" (i.e. "hu" in "human"). Would I just remove these from the tokenizer vocabulary?

Also on the point of >>41418847, probably worth training on word boundaries? I could probably re-run this on my end just in case the results end up different
>>
>>41421050
>Would I just remove these from the tokenizer vocabulary?
Yeah, I think those should be removed from the tokenizer vocabulary. You'd have to (1) find all token merges rooted in the tokens you want to remove, (2) remove the whole merge subtree containing the tokens to remove, and (3) remove all of those subtree tokens from the vocabulary. That last cell contains a function for doing that (create_subtokenizer), and the bottom of that cell shows how to use it to get rid of a set of tokens.
Given the tokenizer postprocessing steps in that last cell to find & remove bad tokens, I think it's safe to train your tokenizer to create a lot more tokens. Given how BPE merges keep track of how each token was created, I don't think there's any downside to it, and it'll probably give you more good tokens. Nothing in that code depends on the particular set of underlying tokens, so you can do the same thing for a BPE grapheme tokenizer.
>word boundaries
I'd guess that it's better to only split on punctuation (excluding hyphens adjoining words), not on spaces, to retain punctuation in the final transcription, and maybe to have only single-character tokens for punctuation (i.e., remove all token merges that involve punctuation). When speaking, word boundaries seem to all get blurred. You should be able to do that with the same code in that last cell by adding any multi-character token containing punctuation to the list of bad tokens.

Are you planning to use a hybrid phoneme- and grapheme- tokenizer? The llama 3.1 tokenizer is BPE. If you're planning to mix phoneme transcriptions alongside grapheme transcriptions, it might be better to use that one instead of the ParlerTTS one. If you want help with that, I can see if I can modify the llama 3.1 tokenizer to work for this.
>>
>>41421267
>I'd guess that it's better to only split on punctuation (excluding hyphens adjoining words), not on spaces, to retain punctuation in the final transcription, and maybe to have only single-character tokens for punctuation (i.e., remove all token merges that involve punctuation). When speaking, word boundaries seem to all get blurred. You should be able to do that with the same code in that last cell by adding any multi-character token containing punctuation to the list of bad tokens.
Sorry, that was unreadable. Trying again:
I'd guess that it's better to:
- Not split on word boundaries, split on punctuation boundaries instead
- Retain punctuation in the final phoneme transcriptions
- Make sure punctuation is always represented as a single token.

Word boundaries seem to be irrelevant when speaking, so I don't expect it'll help. Splitting on word boundaries would just lead to unnecessarily large tokenizations.
>>
>>41421050
Before you train anything on a tokenizer that's pruned based on my code, I realized that I'll need to fix it so the token counts it uses for determining splits get updated every time it decides on an actual split. Otherwise it'll split some tokens unnecessarily. I can wrap everything in a python class so it's easier to use.

>>41417482
Minor update: I have a simple implementation for generating character cards. I only ran it on the first 4 paragraphs of FiO so far (https://ponepaste.org/10352) to test the framework. Right now, it works by attaching a character card generator module to a story reader module, though I'm considering other approaches for getting the same result. It's currently done by passing a "postprocessor" function to the story reader. I'm thinking it would be better to have the story reader return enough information that the character card module can be called like any other module.
>>
>>41411299
>Saffron Masala AI song
>It's fucking Jackson
I've got mixed feelings about this one.
>>
File: safest_hooves_derpy.png (204 KB, 1487x1100)
Hey everyone, I'm posting here on behalf of the /mlp/ 4CC team. If you keep up with the cup, you probably know that /mlp/ is hosting the Autumn Babby cup this year, and as such we have an opportunity to provide a pre-cup day intro/hype video and a post-cup day credits scene. We're looking for people with video editing skills and ideas who would like to get involved in the process. I know people here have produced some amazing content for the antithology and the cons, and it would be cool if the board could produce something for the upcoming cup. It doesn't need to have extreme production value, but we can't let a chance go by to make 4chan remember that /mlp/ is still alive and kicking.
>>
>>41421267
>Are you planning to use a hybrid phoneme- and grapheme- tokenizer? The llama 3.1 tokenizer is BPE. If you're planning to mix phoneme transcriptions alongside grapheme transcriptions, it might be better to use that one instead of the ParlerTTS one. If you want help with that, I can see if I can modify the llama 3.1 tokenizer to work for this.
I am willing to try the hybrid tokenizer approach. On using llama: would there be any benefit in retaining some of the old ParlerTTS token IDs? I'm guessing we would lose those if we used the llama one.

>>41421283
>>41421460
OK
>>
>>41421283
>Word boundaries seem to be irrelevant when speaking
My intuition says it'll be more difficult for the model to parse a stream of phonemes as opposed to chunks of phonemes broken into words, but off the top of my head I can't think of any examples where word boundaries are necessary. Maybe no boundaries will make it harder for the model to learn intonation/prosody (e.g. generating breaths), but it might also be better to reduce the number of tokens.
>>
>>41422419
>My intuition says it'll be more difficult for the model to parse a stream of phonemes as opposed to chunks of phonemes broken into words
I'd think this too; if you wanted to write out a custom word in ARPAbet wouldn't the model then be deciding where the spoken word boundaries are?

>>41421283
I think the way the ParlerTTS tokenizer currently works is that it uses Metaspace and marks the "beginning of word" with a separate character (), but the separator itself is not usually treated as a separate token
>>
File: 1694811827493421.png (141 KB, 444x465)
>>41422529
(U+2581) Just imagine it's there
>>
https://pomf2.lain.la/f/tuimd2mx.mp3
>>
File: SaffronLevitating.gif (2.05 MB, 800x800)
>>41421720
Fair. It is possible to enjoy the work of a creator without liking them by extension, though it's usually easier if you already liked their work to begin with.

Made a few more with Saffron and as capable as she is, I've noticed she can often struggle with some higher notes in songs, resulting in her showing the classic AI screech syndrome. More common with male vocals it seems, so perhaps she has high sensitivity to coarseness?

[RVC] Saffron sings - Various
Sandaru Sathsara "Levitating"
>https://files.catbox.moe/m15x32.mp4
>https://files.catbox.moe/hqbjyx.mp3
David Hasselhoff "True Survivor" (Very poorly in parts)
>https://files.catbox.moe/57v6g5.mp3
Rising pitch test (via SynthV) singing "Mare"
>https://files.catbox.moe/z3f3ur.mp3
>>
>>41422419
>>41422529
>splitting on words
It makes sense that whitespace would be important for deciding when to generate breaths. I see three approaches:
- 1. Include whitespace in the phoneme transcription so that tokens can include whitespaces.
- 2. Set whitespace as a boundary when tokenizing so that tokens never contain whitespace.
- 3. Strip whitespace from phoneme transcriptions so whitespaces aren't even in the token vocabulary.

I think option 2 is the worst one since it both removes whitespace information and increases tokenization lengths. How about this:
- Include whitespace in the phoneme transcription and allow the tokenizer to do whatever it wants with that.
- Don't set whitespace as a boundary when tokenizing.

Which is the same as option 1. That would both reduce the number of tokens and include word boundary information. I think in terms of information provided to the model, it's the same as ParlerTTS's metaspace approach. I'm not seeing a reason to include a metaspace separate from normal whitespace, punctuation, and <bos>.
>>
>>41419128
>High Winds and Blaze so high
How?
>>
https://www.youtube.com/watch?v=VBacYxEEiuU
>>
>>41423311
Well it is almost christmas...
>>
>>41421977
I should be able to help out after Mare Fair, could you give more detail on the expected scope and timeline?
>>
>>41423224
Mistral thinks they sound like male names?
>>
>>41423090
>I'm not seeing a reason to include a metaspace separate from normal whitespace, punctuation, and <bos>.
There might be some slight confusion here--basically, applying Metaspace to word boundaries happens in the pre-tokenization step, so it's not something that's baked into the training data

As for why that specific character--maybe so it's easier to distinguish as a character that wouldn't normally appear in the dataset for debugging purposes? Probably wouldn't matter besides that.

Also if you could comment on
>>41422002

I guess using Llama 3.1's tokenizer would allow us to do the same pruning without having to change much?

Picrel shows current configuration of g2p tokenizer (token count increased to 1024), both pretokenization and actual tokenization, compared to parlerTTS -- mostly showing word separation and punctuation behavior
>>
>>41423224
>>41423388
Well, Blaze is often a male name, and "High" is one letter away from "Hugh", which is also male. Makes sense.
>>
>>41423490
I also just realized that this isn't the approach you proposed. Here's what it looks like when you let whitespace be anywhere in the token (i.e. no Metaspace/Whitespace)
>>
>>41423599
https://github.com/huggingface/tokenizers/pull/909
https://github.com/huggingface/tokenizers/releases
https://github.com/huggingface/transformers/pull/32535
Apparently tokenizers didn't support spaces in the middle of tokens until last month, and transformers isn't able to bump their tokenizers version yet?
>>
>>41423388
>>41423538
Oh, my bad. I thought these were mistakes made by analyzing the voice rather than the name.
>>
>>41423711
Huh. Yeah, the Huggingface BPE config format would break if there were spaces in the vocabulary, since merges are specified as pairs of space-delimited token strings. It looks like the Llama 3.1 tokenizer replaces all spaces with "Ġ" (unicode 288) in the token vocabulary.
>tokenizer.tokenize(' ')
>['Ġ']
But it also maps Ġ to something else.
>tokenizer.tokenize('Ġ')
>['Äł']
>tokenizer.tokenize('Äł')
>['ÃĦ', 'ÅĤ']
>... # there's a whole chain of remappings
If I use the underlying tokenizer's encode function, it transparently replaces spaces with Ġ.
>tokenizer._tokenizer.encode(' ').tokens
>['<|begin_of_text|>', 'Ġ']
And the underlying tokenizer's decode function transparently converts Ġ back to spaces.
I'm not seeing anything obvious in the llama tokenizer's config file that would be responsible for this. Same for HF's PreTrainedTokenizerFast, which is used to load the Llama tokenizer, or the Tokenizer class, which is the type of the underlying _tokenizer attribute, so it might be some hidden functionality in HuggingFace's BPE implementation. Llama 3.1 is using the BPE tokenizer though, so it is somehow possible with that. I'll have to look into this more to see how to get something similar.
It might be easier to just replace spaces with the metaspace character for the phoneme tokenizer. It won't be an issue for the llama tokenizer since it already handles this.
>>41423490
>On using llama: would there be any benefit in retaining some of the old ParlerTTS token IDs?
I don't see a reason to retain the ParlerTTS token IDs. If you wanted to continue training a model that was already trained with ParlerTTS, then yes, but for training new models, no.

I'm going to be busy with other things for the rest of tonight and most (maybe all) of tomorrow. After that, I can work on that tokenizer cleanup class. Let me know if you end up working on the hybrid tokenizer implementation before then, otherwise I might start on that too.
>>
>>41424644
I don't know the details of the HF/Llama tokenizers, but replacing space with Ġ goes back at least as far as GPT2's BPE, so maybe it was just copied from there
https://github.com/karpathy/minGPT/blob/37baab71b9abea1b76ab957409a1cc2fbfba8a26/mingpt/bpe.py#L29-L30
>>
>>41424644
>It looks like the Llama 3.1 tokenizer replace all spaces with "Ġ" (unicode 288) in the token vocabulary.
I think this is the ByteLevel pretokenizer step; it also has a decoder that converts it back (I copied it in picrel). FWIW it looks like a tokenizer produced this way will actually save without an error so maybe it's fine?
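Quick sanity-check sketch of that ByteLevel behavior (exact output may differ slightly between tokenizers versions):

from tokenizers import pre_tokenizers

pre = pre_tokenizers.ByteLevel(add_prefix_space=True, use_regex=True)
print([tok for tok, _ in pre.pre_tokenize_str("a cute mare")])
# expected: ['Ġa', 'Ġcute', 'Ġmare'] -- spaces become Ġ, and the matching ByteLevel decoder maps them back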

>I don't see a reason to retain the ParlerTTS token IDs. If you wanted to continue training a model that was already trained with ParlerTTS, then yes, but for training new models, no.
I was considering training on top of the existing finetune or the original ParlerTTS mini checkpoint, but maybe that's a bad idea?
>>
>>41424717
That is a really ugly hack for a problem that really should have been solved with more type-consistent formatting.
Makes sense.
>>41424718
It looks like ByteLevel can take two arguments:
- add_prefix_space is probably fine with its current/default value True.
- I think use_regex should be False to avoid forcibly splitting tokens on spaces.
Otherwise, it looks good!

>I was considering training on top of the existing finetune or the original ParlerTTS mini checkpoint, but maybe that's a bad idea?
If you're swapping out the ParlerTTS tokenizer with the Llama tokenizer, we'd have to map the token ids over. It probably wouldn't be that difficult, as long as most of ParlerTTS's tokens are in the Llama vocabulary, which I expect they will be. Regardless of which one you end up using (ParlerTTS vs Llama), we'll need to modify the vocabulary to get rid of poorly trained tokens, and the extra work of mapping tokens should be small.
I don't expect that training on top of the existing checkpoint will cause problems.
>>
>>41424873
OK. I think I will retain the original ParlerTTS tokenizer for english
Will try to work on hybrid tomorrow
>>
File: 1702555648441938.gif (3.8 MB, 502x502)
Good morning bump
>>
StyleTTS-ZS: Efficient High-Quality Zero-Shot Text-to-Speech Synthesis with Distilled Time-Varying Style Diffusion
https://arxiv.org/abs/2409.10058
>The rapid development of large-scale text-to-speech (TTS) models has led to significant advancements in modeling diverse speaker prosody and voices. However, these models often face issues such as slow inference speeds, reliance on complex pre-trained neural codec representations, and difficulties in achieving naturalness and high similarity to reference speakers. To address these challenges, this work introduces StyleTTS-ZS, an efficient zero-shot TTS model that leverages distilled time-varying style diffusion to capture diverse speaker identities and prosodies. We propose a novel approach that represents human speech using input text and fixed-length time-varying discrete style codes to capture diverse prosodic variations, trained adversarially with multi-modal discriminators. A diffusion model is then built to sample this time-varying style code for efficient latent diffusion. Using classifier-free guidance, StyleTTS-ZS achieves high similarity to the reference speaker in the style diffusion process. Furthermore, to expedite sampling, the style diffusion model is distilled with perceptual loss using only 10k samples, maintaining speech quality and similarity while reducing inference speed by 90%. Our model surpasses previous state-of-the-art large-scale zero-shot TTS models in both naturalness and similarity, offering a 10-20x faster sampling speed, making it an attractive alternative for efficient large-scale zero-shot TTS systems.
https://github.com/yl4579/StyleTTS-ZS
>>
>>41425827
Oh, wow! That seems quite promising from the examples ( https://styletts-zs.github.io/ ), and surprisingly not very noisy nor distant from the reference voice. Very curious to hear how well (or poorly) it works with pony voices; I'll try to test it myself if I can in a couple of days, whenever I'm likely to get enough free time to do so.
>>
>>41364782
Hey faggots. What AI do you use to read entire fanfics?
>>
>>41423359
>https://implying.fun/video/summer24/2024-07-19/
Here's a video from last summer. At 1:30 the intro starts that was used for Summer.

>https://www.youtube.com/watch?v=jn_ND9sYe5k
An idea we had was to use the S1E01 opener and reframe it as a battle between the streamers of the Elite Cup versus the Babby cup, since the Babby cup is seen as the lesser tournament only meant as a stepping stone for teams to promote to the elite. That way we could use most of the source video and only rely on some minor editing with faces/board icons pasted over the ponies, with narration of a script spoken by Celestia via RVC.

For the Credits that are essentially played at the end of each cup day, we were thinking of something based on the Running of the Leaves, since this is for the Autumn Babby Cup. It could be as simple as just having some clips from the episode with the credits scrolling down and music playing in the background.

We're trying to keep the scope low since time is running out. No dates have been finalized, but cup likely kicks off somewhere mid-October.
>>
https://vocaroo.com/1gBlpYI8MRWb
Why does this shit cost an arm and a leg ... might as well make a hundred accounts instead of having to pay $100 for a measly 10 hours of audio. This shit can't replicate the emotion required to do convincing AVGN reviews.
>>
>>41425827
>>41425861
Dunno, maybe it's just me, but it sounds either the same quality as or even slightly worse than what already exists.
>>
>>41423346
Marketing wise that's probably true.
>>
>>41424885
Update: I can't find any documentation on how/if you're actually supposed to subclass a PreTrainedTokenizer, particularly in a way that's compatible with AutoTokenizer. I've gotten as far as successfully registering the class with AutoTokenizer, but actually instantiating it seems to depend on a bunch of method implementations that don't seem to be in a spec anywhere. Might be easier just to make something that mimics the AutoTokenizer API to the extent it's used in ParlerTTS training? But obviously less flexible with other projects.
>>
>>41420275
Maybe having line breaks get treated like punctuation?
>>
>>41425543
Good morning in a different part of the world.
>>
>>41427365
Why does it need to be compatible with AutoTokenizer? Why not just use your PreTrainedTokenizer class? It has a from_pretrained function that can pull a tokenizer model from an HF repo. You'd have to commit the new tokenizer class and set trust_remote_code=True in from_pretrained. I don't think there's a way around that since I don't think there's a way to get a consistent hybrid tokenizer using only existing HF tokenizer classes.
I'm not sure how far you got with the hybrid tokenizer, but I think this mostly works: https://ponepaste.org/10355. When decoding, it adds extra spaces between tokens, and I haven't figured out how to get rid of that.
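
For reference, the consumer side would look roughly like this (the repo name is a made-up placeholder; it assumes the custom tokenizer class and config have been committed there):
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained(
    "someanon/hybrid-g2p-tokenizer",
    trust_remote_code=True,  # required to run the custom class shipped in the repo
)
print(tok("Twilight went to Ponyville.").input_ids)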

I'll see if I can get the code to clean up the grapheme and phoneme tokenizers based on what we discussed so far. I think that's just:
- Remove undertrained and useless tokens from both the llama 3.1 and arpabet tokenizers.
- Update the llama 3.1 tokenizer to use token ids from ParlerTTS where possible.
- I'll use the pony speech dataset to determine which tokens will be well-trained.
>>
>>41428025
Mostly because I wanted to mess with the training code as little as possible, and I assumed it'd be easier to make it compatible than it actually was.

All I have right now is an (incomplete) "toy" implementation that vaguely mimics AutoTokenizer instead of trying to be compatible with it directly.
https://github.com/effusiveperiscope/parler-tts/blob/g2p/hybrid_phoneme_tokenizer_nonhf.ipynb
I think I dodged the spaces problem by decoding in spans and passing 'clean_up_tokenization_spaces' to the underlying decoders, but it feels hacky.
>>
File: error 18 09 24.png (89 KB, 1172x1072)
89 KB
89 KB PNG
>>41428107
the colab file stops running at this code point:
self.tokenizer_g2p = AutoTokenizer.from_pretrained(
tokenizer_g2p)
>>
>>41428107
I realized that my approach for removing bad tokens has issues when uncommon tokens are prefixes for common tokens. For BPE, there's no way to fix this since removing an intermediate token would leave a gap in BPE's merge tree, and there's no way to resolve that without removing some "good" tokens. So the tokenizer function would need to check if any of the resulting tokens are in a reject list, and it would need to find the appropriate subtokens. I can have my code give a map of {bad tokens -> replacement tokens} so you can implement this more easily.
Also, I'll see if we can stick to the parlertts tokenizer for graphemes. That might be easier to work with than BPE, and its tokens are probably more tailored to speech than llama 3.1.
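
Roughly, the post-tokenization check I have in mind would be something like this sketch (the replacements map is the hypothetical {bad token id: replacement token ids} I mentioned, not something that exists yet):
def filter_bad_tokens(token_ids, replacements):
    # Swap any rejected token for its pre-computed subtoken sequence.
    fixed = []
    for tid in token_ids:
        fixed.extend(replacements.get(tid, [tid]))
    return fixed

# e.g. token_ids = filter_bad_tokens(tokenizer(text).input_ids, replacements)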
>>
>>41428457
Sorry, I had it pointed at my local folder for the tokenizer instead of the one on HF. Updated. Also it's not intended to run in Colab although it has so few dependencies it probably would
>>41428492
Would it be easier to use Unigram for the g2p tokenizer too?
>>
Moshi: a speech-text foundation model for real time dialogue
>https://github.com/kyutai-labs/moshi
>https://kyutai.org/Moshi.pdf
>https://moshi.chat/
Model that can generate a speech response while you're talking to it ("theoretical latency of 160ms, 200ms in practice"). Can also do speech recognition/TTS depending on how you configure it (and maybe even voice conversion if finetuned?)
>>
File: 1703433028824135.jpg (87 KB, 1280x720)
87 KB
87 KB JPG
>>41420275
>>41419449
I kind of expected the newline character to function like that. It "worked" as far as not throwing an error when adding punctuation as suggested, but the output eh....
https://files.catbox.moe/huvqum.flac
Like Dash got taken over by the Mark Hamill Joker. It also didn't fit the length of the reference audio, and in fact exceeded it. That might just be an expectation issue with how StyleTTS2 works though. I was expecting it to be like Talknet where the voice reference provides the base and runtime but that might just not be how this works.
>>
Up.
>>
Holy shit, the board is fast today.
>>
Learning Source Disentanglement in Neural Audio Codec
https://arxiv.org/abs/2409.11228
>Neural audio codecs have significantly advanced audio compression by efficiently converting continuous audio signals into discrete tokens. These codecs preserve high-quality sound and enable sophisticated sound generation through generative models trained on these tokens. However, existing neural codec models are typically trained on large, undifferentiated audio datasets, neglecting the essential discrepancies between sound domains like speech, music, and environmental sound effects. This oversight complicates data modeling and poses additional challenges to the controllability of sound generation. To tackle these issues, we introduce the Source-Disentangled Neural Audio Codec (SD-Codec), a novel approach that combines audio coding and source separation. By jointly learning audio resynthesis and separation, SD-Codec explicitly assigns audio signals from different domains to distinct codebooks, sets of discrete representations. Experimental results indicate that SD-Codec not only maintains competitive resynthesis quality but also, supported by the separation results, demonstrates successful disentanglement of different sources in the latent space, thereby enhancing interpretability in audio codec and providing potential finer control over the audio generation process.
https://xiaoyubie1994.github.io/sdcodec/
has examples. code to be posted
>>
>>41427847
>>41427847
>>41427847
new animatic dropped, which means new clean mare audio for the mega
>>
File: IMG_7178.jpg (601 KB, 1170x689)
601 KB
601 KB JPG
>>41409422
>>41409443
Rapedit tier meme but it fits
>>
>>41430883
>new clean audio
So that's what the fuss with the animatic is about.
>>
>>41430883
Strange this didn't get leaked sooner.
>>
>>41428838
>Would it be easier to use Unigram for the g2p tokenizer too?
I should be able to find out tomorrow.
>>
>>41423359
Contact Titan on discord to get added to the server that is planning/making the Autumn videos.
>>
You know that time Trixie was supposed to be a stallion and competent at challenges?
https://voca.ro/14B6PfRqMXM6
>>
>>41431861
Can someone make Twilight Sparkle voice this?
>>
File: 1726759221054826.png (1.19 MB, 1387x765)
1.19 MB
1.19 MB PNG
>>41430883
Don't forget to download it from vimeo because it has better sound
https://player.vimeo.com/video/1010460267
>>
>>41431875
here
https://voca.ro/151QvnRiTCI6
>>
>>41431891
oh, thx. nice find. downloading now
>>
File: 3298366.jpg (73 KB, 888x499)
73 KB
73 KB JPG
>>41364787
>>41431891
>>41431501
Added to the master file.
https://mega.nz/folder/jkwimSTa#_xk0VnR30C8Ljsy4RCGSig
Sliced Dialogue/Special Source/s2e10
Label file also added to the label files folder.
Removed the standard s2e10 from the FiM folder.
>>
Golden mare?
>>
>>41432280
That was fast. I'll update my clones sometime in the next few days.

>>41428838
>>41431501
Unigram is easier to work with. Can you train a Unigram model for the g2p tokenizer? Also, you should replace 'ñ' with 'n' before giving it to ParlerTTS. ParlerTTS doesn't seem to have a token id for ñ.
Unigram models keep track of per-token scores that determine whether to select a token. If I set the score of a token to -99, then it retains the token id and never uses that token, and as far as resulting tokens go, there's no difference between deleting a token vs setting its score to -99, which is exactly the behavior we want.
There are a LOT of bad tokens in the ParlerTTS tokenizer, which explains the pronunciation issues. Getting rid of them leads to a much less efficient tokenizer, but it's better than having undertrained & useless tokens. That does suggest though that creating a new model from scratch may be worthwhile since it could use a much more efficient tokenizer. If you create & upload a new grapheme tokenizer, I can check it against ParlerTTS's grapheme tokenizer to see how much more efficient it is.
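
For reference, the score trick can be done by editing the serialized Unigram model directly. A rough sketch, assuming the fast (Unigram-backed) tokenizer is what AutoTokenizer loads, with a placeholder reject list (swap in whichever ParlerTTS checkpoint you're using):
import json
from tokenizers import Tokenizer
from transformers import AutoTokenizer, PreTrainedTokenizerFast

tok = AutoTokenizer.from_pretrained("parler-tts/parler_tts_mini_v0.1")
data = json.loads(tok.backend_tokenizer.to_str())

reject = {"SomeBadToken"}  # placeholder for the undertrained/useless tokens

# The Unigram vocab is serialized as [piece, score] pairs; setting the score
# to -99 keeps the token id but stops the token from ever being selected.
for entry in data["model"]["vocab"]:
    if entry[0] in reject:
        entry[1] = -99.0

clean = PreTrainedTokenizerFast(
    tokenizer_object=Tokenizer.from_str(json.dumps(data)),
    unk_token="<unk>", eos_token="</s>", pad_token="<pad>",  # T5-style specials
)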
>>
>>41432280
Some dataset errors. These all have double spaces that should be single spaces:
>./s09e16.txt:607.608000 611.157279 00_10_08_Granny Smith_Happy_Noisy_that is one hundred and percent correct!
>./s09e21.txt:445.546000 448.796948 00_07_26_Caballeron_Anxious_Very Noisy_we were desperate for money to keep it open.
>./s09e21_master_ver.txt:445.546000 448.796948 00_07_26_Caballeron_Anxious_Very Noisy_we were desperate for money to keep it open.
>./s09e04_master_ver.txt:221.989000 225.833750 00_03_42_Shining Armor_Neutral_Very Noisy_giant fans keep any creature from flying too close to the castle.
>./s09e04.txt:221.989000 225.833750 00_03_42_Shining Armor_Neutral_Very Noisy_giant fans keep any creature from flying too close to the castle.
>./s09e04.txt:236.309115 240.007343 00_03_56_Shining Armor_Neutral_Noisy_and even if you could get in, which you can't, I've doubled the ranks of security.
>./s09e04_demu1.txt:236.309115 240.007343 00_03_56_Shining Armor_Neutral_Noisy_and even if you could get in, which you can't, I've doubled the ranks of security.
>./s09e04_master_ver.txt:229.555572 235.082270 00_03_50_Shining Armor_Neutral_Very Noisy_plus the entrances to the tunnels below the castle have been sealed. so there's no underground access.
>./s09e04.txt:229.555572 235.082270 00_03_50_Shining Armor_Neutral_Very Noisy_plus the entrances to the tunnels below the castle have been sealed. so there's no underground access.
>./s04e16_master_ver.txt:1248.049742 1253.039906 00_20_48_Fluttershy_Neutral_Very Noisy_and sometimes being too kind can actually keep a friend from doing what they need to do.
>./s04e16.txt:1248.049742 1253.039906 00_20_48_Fluttershy_Neutral_Very Noisy_and sometimes being too kind can actually keep a friend from doing what they need to do.
>./s04e10.txt:46.700000 49.590000 00_00_47_Pinkie_Shouting_Very Noisy_p ponyville!
>./s04e10_master_ver.txt:46.700000 49.590000 00_00_47_Pinkie_Shouting_Very Noisy_p ponyville!
>./s04e05_demu1.txt:971.904440 974.544440 00_16_12_Scootaloo_Sad__I, i'm not going.
>./s04e05.txt:971.904440 974.544440 00_16_12_Scootaloo_Sad__I, i'm not going.
>./s03e13.txt:571.525569 575.929673 00_09_32_Fluttershy_Happy_Very Noisy_Aww, look at that. i guess you were all just cranky because you were hungry.
>./s03e13_master_ver.txt:571.525569 575.929673 00_09_32_Fluttershy_Happy_Very Noisy_Aww, look at that. i guess you were all just cranky because you were hungry.
>./s04e21_outtakes.txt:44.595442 52.863044 00_00_45_Rarity_Neutral__Fortunately, thanks to the vision of Mare d'Flair, the Wonderbolts ensemble became more streamlined in a wonderfully.
>./s04e21_outtakes.txt:53.789015 62.718025 00_00_54_Rarity_Neutral__Fortunately, thanks to the vision of Mare d'Flair, the Wonderbolts ensemble became more streamlined in a wonderfully breathable fabric.
>./s02e25_special source.txt:660.861828 663.763950 00_11_01_Twilight_Happy__how many unicorns can just spread love wherever they go?
>>
>>41428838
Here's the cleaned-up ParlerTTS tokenizer:
https://huggingface.co/synthbot/parlertts_tokenizer_clean
>tokenizer = AutoTokenizer.from_pretrained("synthbot/parlertts_tokenizer_clean")
Make sure to clean up sentences before feeding them to the tokenizer:
>sentence = sentence.replace('  ', ' ').replace('ñ', 'n')
>>
>>41428838
>>41434491
I forgot the code.
https://github.com/synthbot-anon/sample-code/blob/main/src/cleanup_unigram_tokenizer.py
`fix_unigram_tokenizer` accepts a huggingface repo and returns a cleaned up PreTrainedTokenizerFast. If you save_pretrained a PreTrainedTokenizerFast, the result can be loaded with AutoTokenizer, which is how I created parlertts_tokenizer_clean. The original ParlerTTS tokenizer uses T5Tokenizer as the tokenizer_class in tokenizer_config.json whereas this one uses a PreTrainedTokenizerFast. That doesn't seem to cause any issues. I checked to make sure that it returns identical results to the original ParlerTTS tokenizer when there are no bad tokens, as long as the double-spaces are replaced with single spaces. Other than that, the config files should be identical excluding the tokens whose score is set to -99.
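
The intended round trip, in case it's not obvious (paths are placeholders and the exact argument may differ from what's shown):
from transformers import AutoTokenizer
from cleanup_unigram_tokenizer import fix_unigram_tokenizer

# Returns a cleaned-up PreTrainedTokenizerFast, per the description above.
clean = fix_unigram_tokenizer("parler-tts/parler_tts_mini_v0.1")
clean.save_pretrained("parlertts_tokenizer_clean")

# The saved result loads back through AutoTokenizer as a PreTrainedTokenizerFast.
tok = AutoTokenizer.from_pretrained("parlertts_tokenizer_clean")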
>>
>>41434301
>Can you train a Unigram model for the g2p tokenizer?
Here's a Unigram model (I think):
https://huggingface.co/therealvul/tokenizer_g2pen_v3/tree/main

Parameters (i.e. vocab size, pretokenizers) and dataset are the same as v2 except using Unigram instead of BPE.
https://github.com/effusiveperiscope/parler-tts/blob/g2p/tokenizer_data_train_g2pen.ipynb (please ignore the error). I'd note that a lot of names (including Equestria) seem to be encoded as single tokens now

>Also, you should replace 'ñ' with 'n' before giving it to ParlerTTS
Noted
>>
>>41434491
>>41434512
Thanks. I think you might need to copy decoder and/or post_processor over? I'm getting the metaspace character when I use this in batch_decode (see third cell; your tokenizer is now default)
https://github.com/effusiveperiscope/parler-tts/blob/g2p/hybrid_phoneme_tokenizer_nonhf.ipynb

Also I am going to sleep now
>>
>>41434543
Do you mean the tokens with nothing but '{metaspace}'? I think that's the correct behavior. Some words like 'he' in the sentence aren't common enough as a word beginning to warrant their own '{metaspace}he' token.

Cleaned up g2p tokenizer:
https://huggingface.co/synthbot/vul_g2pen_tokenizer_clean/tree/main
And I updated the parlertts tokenizer to use n=240k instead of 120k:
https://huggingface.co/synthbot/parlertts_tokenizer_clean
>>
>>41434543
>>41434574
Nevermind, I see what you mean. I'll see if I can fix it on my side.
>>
>>41434543
They should both be fixed now. I had to copy over the normalizer, post_processor, and decoder.
>>
Dunno if this matters to anyone but some lesser known dude who made some fandom music back in the day recently unlisted his videos. They're still available via a playlist, for however long that lasts.
https://www.youtube.com/playlist?list=PL6ZZW44lVoeh1CAX93IoZjaZsl1tVmBOx
>>
>>41434933
Woops, sorry lads. Meant to post this in the archival thread.
>404
Nevermind.
>>
>>41434543
>>41434574
>>41434579
Thank you. Updated.
https://github.com/effusiveperiscope/parler-tts/blob/g2p/hybrid_phoneme_tokenizer_nonhf.ipynb
Going to test more and then work on making it work with parlerTTS
>>
Who do I have to FUCK to get a 500 page book storytimed by Tara Strong?
>>
>>41434943
You can fix that.
>>
>>41435458
If only it were this simple...
>>
>10
>>
>>41435458
Is there really no one working on a plain TTS on a small scale, but with all the new tech inventions, that makes it sound at least more pleasant to listen to than MS Sam?
>>
>>41435458
500 ElevenLabs accounts working in sync, commanded by a script.
>>
>9
>>
>>41436868
Didn't that thing cost money?
>>
>>41437872
Never used it, but I'm pretty sure it uses the old classic bait of "free X uses per day, then pay a subscription for more".
>>
>>41417482
Horsona chatbot library updates:
- [In progress] Continue working on lorebook generation.
- ... [Done] Test the StoryReader flexibility on character card creation and lorebook generation. It looks flexible enough. I added a test "Character Card" module, and the early results seem promising enough.
- ... [In progress] I'm going to refactor StoryReader, CharacterCard, and some data structure implementations based on what I found works. The basic workflow for deriving custom information from the StoryReader will be something like this, which is analogous to what you'd do when using a pytorch module:
- ... ... 1. Use StoryReader to parse information paragraph-by-paragraph. On each iteration, it returns a bunch of context objects that keep track of what information it used to understand each paragraph.
- ... ... 2. Pass the context objects to any modules that need to extract more information from the story. Right now, there's a CharacterCard module that tracks per-character information (right now, just synopsis, personality, appearance, and quotes).
- ... [ ] After the refactor, I'll add another module for extracting story setting information. I think between character cards, setting information, and the story database generated by StoryReader, I'll have all the data extraction parts necessary to create lorebooks.
- [Done] I created a function that wraps multiple other LLM APIs. The main use is for making many calls to LLMs where several LLMs are interchangeable for the task. It tracks recent usage, per-call rate limits, and per-token rate limits to decide which API is most appropriate, and it waits for rate limits before running an inference. It mimics the API of the least common denominator of all LLMs passed, so if all of the LLMs are compatible with the OpenAI interface, the resulting object will be too (a rough sketch of the idea is below this list). This is going to be necessary to sidestep rate limits when I try having it process an entire story.
- [Done] I closed out two of the open issues: one for supporting OpenAI models, and one for supporting Anthropic models.
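
The sketch I mentioned, heavily simplified and with made-up names (not the actual horsona code), just to show the round-robin + rate-limit idea over interchangeable OpenAI-compatible clients:
import time

class RoundRobinLLM:
    def __init__(self, clients, calls_per_minute):
        # clients: interchangeable OpenAI-compatible client objects
        self.clients = clients
        self.limit = calls_per_minute
        self.history = {id(c): [] for c in clients}  # recent call timestamps

    def _pick(self):
        now = time.time()
        for client in self.clients:
            recent = [t for t in self.history[id(client)] if now - t < 60]
            self.history[id(client)] = recent
            if len(recent) < self.limit:
                return client
        return None  # every backend is currently rate limited

    def complete(self, **kwargs):
        client = self._pick()
        while client is None:  # wait out the rate limits
            time.sleep(1)
            client = self._pick()
        self.history[id(client)].append(time.time())
        return client.chat.completions.create(**kwargs)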
>>
>>41438062
They farmed up a dataset from users with the free trial and then pulled a
>oh no, people are using it, we need to charge!
and kept the data. Because they're American and not based in the EU, they don't get fucked for it.
>>
>>41438062
>>41437872
By using a google account you get like 10k chars or 10 minutes worth of voice audio for free, but then for 1 hour and 10 hours they charge you up the ass and for someone who needs 500 pages that's a no go.
So better to get an army of accounts and a command bot to order that army of bots to churn out 500 pages in 1 hour.
>>
>>41438199
Did I forget to mention it's 10 minutes worth of voice audio PER MONTH? That's why I refuse to pay anything.
>>
>>41438213
>Per month
Wew, that's turbo jewish, even other services are doing something more reasonable like resetting counter per day/week.
>>
Was the changeable file format feature removed from Haysay just now? I was just using it, it went down for a few secs, and now the function is gone.
>>
>>41438602
Sorry about that. The TLS certificate expired and I needed to temporarily take down Hay Say to go renew it. The code for selecting a file format isn't in the Docker image yet, so that functionality got wiped when I brought HaySay back up. I had to manually edit it back in. Everything should be working now.
>>
>>41438619
Yup, all working fine now. Thanks for hosting the site!
>>
Bumpen the thread.
>>
>>41438143
Neat, is openrouter supported thru openai?
>>
>>41440892
I added support for it just now. It's untested, but if it's actually compatible with the OpenAI SDK just with a different url & api key, I don't expect any issues with it.
>>
>>41438143
I'd like to share some ideas for the end product.
Straight to the point: a node-based system that processes different modalities, manages context and memory, handles logic, and responds in different modalities, perhaps with an internal clock, all with the modularity of ComfyUI. This kind of system would be the ultimate solution for use in apps and games (sex games included), letting people experiment with and share their data flows and write custom modules, so you don't have to worry about coming up with the single best way of doing something.
>>
Mareing in the dirt
>>
>>41441648
Truly so.
>>
File: Nurse R H ALT 720p Fix 2.png (3.43 MB, 1280x720)
3.43 MB
3.43 MB PNG
[RVC] Nurse Redheart sings - Massive Attack "Teardrop"
>https://files.catbox.moe/ivs4mn.mp4
>https://files.catbox.moe/b2gchw.mp3

Had this idea for a good while, only just recently remembering about it. Would be fun to see a pony version of House M.D. at some point with Nurse Redheart as the lead role.
>>
>>41441123
I like it. All it would require is a simplified way to turn modules and functions I create into ComfyUI nodes. I think that's doable. I might be able to add an "API endpoint" node as well so that custom workflows can be wrapped into their own OpenAI-compatible interface usable by other text generation interfaces. I'll add it to the target feature list.
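
For anyone curious what the wrapping would look like, a ComfyUI node is just a class with a few attributes; a minimal sketch (the horsona call inside is a made-up placeholder, not an existing module):
class HorsonaCharacterCardNode:
    @classmethod
    def INPUT_TYPES(cls):
        return {"required": {"story_text": ("STRING", {"multiline": True})}}

    RETURN_TYPES = ("STRING",)
    FUNCTION = "run"
    CATEGORY = "horsona"

    def run(self, story_text):
        # Placeholder: this is where the wrapped horsona module would be called.
        card = f"Character card derived from {len(story_text)} characters of story text"
        return (card,)

NODE_CLASS_MAPPINGS = {"HorsonaCharacterCardNode": HorsonaCharacterCardNode}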
>>
File: Untitled.png (1003 KB, 1080x1631)
1003 KB
1003 KB PNG
DiffEditor: Enhancing Speech Editing with Semantic Enrichment and Acoustic Consistency
https://arxiv.org/abs/2409.12992
>As text-based speech editing becomes increasingly prevalent, the demand for unrestricted free-text editing continues to grow. However, existing speech editing techniques encounter significant challenges, particularly in maintaining intelligibility and acoustic consistency when dealing with out-of-domain (OOD) text. In this paper, we introduce DiffEditor, a novel speech editing model designed to enhance performance in OOD text scenarios through semantic enrichment and acoustic consistency. To improve the intelligibility of the edited speech, we enrich the semantic information of phoneme embeddings by integrating word embeddings extracted from a pretrained language model. Furthermore, we emphasize that interframe smoothing properties are critical for modeling acoustic consistency, and thus we propose a first-order loss function to promote smoother transitions at editing boundaries and enhance the overall fluency of the edited speech. Experimental results demonstrate that our model achieves state-of-the-art performance in both in-domain and OOD text scenarios.
https://nku-hlt.github.io/DiffEditor
https://github.com/NKU-HLT/DiffEditor
also has implementations of 4 other speech editing papers. trained on a 3090. they used a typical academic dataset so I wonder if a high quality one with a lot of examples (so pony stuff would work) would show further improvements
>>
>>41443483
I think you understood me wrong. I suggested a node-based state machine running as a server that can take inputs, update state, and generate outputs. The user doesn't get to interact with models directly and can only send data to input nodes (images, sound, text, other structured data); the workflow handles them and may or may not generate and send data back. For example, a workflow might do the following: some other app is a frontend mare emulator. Among everything else, a scene image gets sent every x seconds. A multimodal LLM analyzes it with previously gathered context, stores it for later, then if some condition is met it may generate a phrase response, voice for it, generate an animation, add all of this to the corresponding contexts, then send it back as a response. Stuff like this will come around inevitably in the future.
>>
>>41443335
It certainly would
>>
>>41443335
She sounds a bit like she had an accent for some reason.
>>
File: 1696903207703738.png (47 KB, 220x229)
47 KB
47 KB PNG
>>41443335
This song has lyrics?
Yes, I have only heard the cut used for House and deliberately avoided looking up the full song.
>>
>>41444192
If I'm understanding correctly, that's how the current system works. Someone designs a workflow, which can take arbitrary inputs, generate arbitrary outputs, and update state based on arbitrary feedback. The workflows are built up through functions & modules. A chatbot would just be a particular workflow, maybe with some pre-populated parameters & data. The library keeps track of the full computation graph as a collection of nodes so it can figure out (1) which outputs to generate for each input, and (2) which state to update whenever some feedback or update request is provided. To do what I think you're saying:
- The end user would interact with some mare emulator frontend (e.g., SillyTavern or something custom).
- The user's messages would get passed to some backend.
- The backend would run the messages through the computation graph.
- That computation graph would continue running until it generates whatever outputs and state updates it decides are appropriate. The state updates would get applied whenever it makes sense to do so, some inline with output generation like database updates, some after the computation like history state updates.
- The outputs would get sent back to the mare emulator frontend so the end user can see/hear whatever was generated.

ComfyUI is just a visual interface for building and optionally running computation graphs like this. It's the same sort of computation graph that pytorch programs build up, which is why it works for image generation. The ComfyUI interface would be more for chatbot developers that want to customize how a chatbot works, not for end users that want to use chatbots. Someone can use ComfyUI to create a chatbot workflow, then wrap the result in an API so any mare emulator can use it.
>>
>>41444957
I'd say that's because the original song has a fair amount of an accent to it already. The words "fearless on my breath" had always sounded more like "feel the summer - pray" to my ear, for example. I also had the character likeness reduced a little (0.7-0.8 I believe) because with her limited dataset she can mistake certain pronunciations more often and sound much rougher the closer she is to 1. I've found this to have more of an effect than voiceless protection and a couple other settings.

>>41445222
https://youtu.be/l7iCmLoFoyA
My favourite lyrics of it are "Love, love is a verb; love is a doing word," which could be interpreted as love requiring persistent action in giving and commitment, not just given once and/or expected to exist without the act of giving. I also like the sound of "Most faithful mirror" and "Of a confession" but I'm having a hard time gleaning their meaning all that well.
>>
Good night bump.
>>
>>41405973
What's the motivation behind using the pytorch training loop idiom for your API?
>>
>>41447111
There are a lot of reasons I went with the pytorch training loop. The impetus was playing around with TextGrad and seeing how well it worked. TextGrad's core was terrible though (it's slightly better now), and it's too tailored to text, whereas I wanted something that can work with more modalities and with API & database integrations. I've played around with & looked into other agent frameworks, and I found that the popular ones scale very poorly in terms of the complexity they can support. It's a hypothesis that the pytorch approach will allow for much more complex functionality. The pytorch approach also makes it easier (compared to other agent frameworks) to integrate things that require an actual pytorch training loop (e.g., fine-tuning) without having to treat them as blackboxed components.
There's a second half to this that I plan to integrate once there's a need and once I understand how to integrate it better. If it gets that far, single "agents" will be based around the pytorch training loop, and multiple "agents" will interact through something like the kubernetes controller model. Most of the code for that second half has already been implemented as part of an earlier attempt at getting more complex agents, but it needs to be rewritten into something people can run without spending $500/month.
>>
>>41446258
>her limited dataset
It's impressive how much can be done with it in spite of its limits though.
>>
>>41446244
Instead of ComfyUI's output-node, lazy-eval approach, I suggest flow control with execution pins; a workflow may have several externally addressable starting pins to serve different kinds of data, for example. You can even start drawing parallels with how the brain has different sensory inputs. And by mare emulator I meant a more sophisticated frontend, such as a game or something else that could benefit from this. Comfy in its current state can't do this no matter how many custom nodes you make because it's conceptually different from what I'm talking about; I should have made that more clear.
>>
>>41447629
For mare emulation, I was only thinking of using Comfy as an interface for designing workflows. As I was thinking about it, the workflow would then be exposed through some REST API, or it would be exportable as a python class, at which point Comfy's limitations no longer apply.
You're thinking of Unreal Engine's Blueprint? I would love to have this available in Unreal. It doesn't look like it'll be too difficult to expose horsona modules as Blueprint nodes through Unreal's Python API. My main concerns are:
1. It would be difficult for me to test it in any meaningful way though since I don't know how to do anything in Unreal right now. I might need your help with that. If you're a pony game dev, do you have an example Blueprint screenshot showing how mare nodes interact with the environment? I want to get a sense for which inputs & outputs it would be worth supporting.
2. We have a lot more anons familiar with ComfyUI than with Unreal Engine. This isn't that big a deal since I can support both, but until I have enough modules to make it worthwhile for game engine integration, I'll probably prioritize Comfy-based workflow creation + a separate program to actually run it.

I'm going to be out for the next few weeks for Mare Fair, waifu anniversary, then the mlp anniversary stream. I won't get much work done until that's over.
>>
>>41447511
Even more so if you consider her dataset is:
- Less than 23 seconds of audio
- Partly noisy with some leaked SFX
- Varies a lot in terms of tone and delivery
- Contains some harsh sounds and less phonic elements

>https://pomf2.lain.la/f/9gdwg97d.mp3
She just so smol
>>
>>41448046
Impressive. Hope Meadowbrook voice model will be possible someday.
>>
File: example.png (244 KB, 1544x781)
244 KB
244 KB PNG
>>41447743
As I said, you can't have flow control in Comfy; it has no execution pins. If we want a workflow to be able to decide which nodes to execute based on model outputs or on time intervals (to 'perceive' time instead of being a chatbot), you would have to implement all the logic inside the nodes, killing the modularity.
What I'm talking about is indeed like Unreal's Blueprints, except the events are bound to API calls.
UE's Python API is actually editor-only; it's used for tasks like asset management. It doesn't run in game. For external communication you should use modules that can do socket or HTTP. It won't change much, it still comes down to sending data to one of the endpoints and handling responses.
>do you have an example Blueprint screenshot showing how mare nodes interact with the environment?
Is pic related what you asked for? In UE you basically have actors, which are user-defined objects that may have logic, like a character. They have an event tick that fires every frame and you can do pretty much anything with them. Actors can access their child components (like actors but without tick) and any other object they have a reference to. An actor can be either a Blueprint or a C++ class; the latter is a bit more flexible.
>>
File: applejack hoofy kicks.gif (318 KB, 500x379)
318 KB
318 KB GIF
>>41448046
She sounds like Applejack doing a funny voice
>>
File: NurseAshleigh.png (16 KB, 368x71)
16 KB
16 KB PNG
>>41448651
She IS Applejack doing a funny voice
>>
>>41449328
I swear she is voicing like 1/3 of the background characters, she can't keep getting away with this!
>>
>>41449328
I thought she sounded more like Rainbow here.
>>
>>41448046
wtf she is cute. tiny nurse pone
>>
File: TabithaRandomPony.png (454 KB, 1038x1319)
454 KB
454 KB PNG
>>41449526
Another 1/3 has to be Tabitha, well known for her role as Random Pony.
>>41450304
True. She does sound more coarse than country.
>>
>>41448510
Thank you. That helps a lot.
>UE'S Python API is actually editor only, it's used for tasks like asset management. It doesn't run in game. For external communication you should use modules that can do socket or http.
That simplifies things.
>As I said, you can't have flow control in comfy, it has no execution pins
I didn't understand what those were, but it makes sense now with your image. I still want to support Comfy because of the reach, and I'm fine with Comfy-created workflows being more limited in what they can do. Workflows in something like UE (or similar) would be able to support more complexity and efficiency.
>>
Haysay keeps spitting out that one long error when I try to use rvc or any other. What's going on?
>>
Test bump
>>
>>41451031
Maybe I will contribute by implementing what I'm talking about, but I'm too busy right now.
>>
>>41451390
Let me know if you decide to read through the code and find anything unclear. There's a lot that's undocumented right now, which I plan to clean up. I can prioritize documenting whatever you intend to look at. I plan to add a design doc as well so it's easier to navigate the code.
I'm going to make close to zero progress from Mare Fair until & through the anni stream.
>>
So how do I actually make the AI read an entire book while I sit back and do nothing?
>>
>>41451569
Nigga... You already asked 40 times...
>>
File: TrixReally.png (98 KB, 539x539)
98 KB
98 KB PNG
>>41451569
You don't, the tools here aren't built for that. Like you were told last time, cope with regular TTS, or make yourself useful and create the tool you want yourself. The answer's not changing.
>>
>>41451569
What pc specs do you have? What format is the book in?
>>
File: 6411306.png (17 KB, 215x228)
17 KB
17 KB PNG
>>
>accidentally leak openrouter key in github because i'm retarded
>openrouter account immediately drained of credit on some coomers Sonnet 3.5 gens
Well, fuck. Clearly I'm too retarded to do this.
>>
>>41452634
Lmao, what a fag
>>
>>41452634
Well that sucks.
>>
>>41451164
Thanks for bringing it up. I took a look and found that the same issue described in >>41134933 and >>41211179 happened yet again for RVC. I still don't know the root cause, but I've added a cron job to check for the issue every minute and automatically correct it, so that should help keep it under control.

The other architectures are working fine for me. Everything OK for you now or are you still seeing an error with another architecture?
>>
>>41453921
RVC works perfectly once again! Thank you so much!
>>
Pre wageslaving bump.
>>
>>41454322
Post wageslaving bump.
>>
>>
>10
Will likely have to retrain Lotus a second time round. Something went wrong: the 300-epoch version has loads of pronunciation issues, and at 345 epochs the overtraining detector kicked in but must've caught it late, because there are still some oddities in her with that version. Might remove one of her (albeit limited) reference data files solely because its pitch goes really far above her usual range, which might be messing with it too much.
>>
File: TriHard.png.jpg (55 KB, 400x400)
55 KB
55 KB JPG
>>41452634
AYOOOOO
/aicg/ SENDS ITS REGARDS
>>
>>41452634
I just nutted to a loli sister bot, thanks for your cervix
>>
>>41452634
how the fuck do you "accidentally" leak any key?? put it in an .env retarded fag gitignore will not post it
>>
>>41456346
Damn, hope this can get fixed.
>>
File: OIG1.HVzZoe6FVJgbpOvmZsox.jpg (165 KB, 1024x1024)
165 KB
165 KB JPG
>>
>>41457802
It looks like the finger is doing the E.T. thing.
>>
>>41435411
>>41434512
I think there's a bug in this -- the token you get from tokenizer.decode() (no metaspace char) isn't the same as the one from scores, so a lot of tokens that should be removed (i.e. the ones that occur 0 times in pony dataset) don't end up getting removed, e.g. "Reparatur", "Servicii".
>>
Page 9 bump.
>>
>>41458959
Plus one.
>>
>>41460073
>>
What’s this thread about
>>
>>41462741
We bump
>>
>>41462741
Unironically it's about ponies saying pony things.
>>
>>41462847
>>
>>41462741
It's like hanging out with people who think AI tech is going to get exponentially more powerful with time, but they're actually pretty excited about it.
>>
>>41464030
I mean, it's of course more than likely that some corpo will brew a Skynet at some point in the future, but we can at least have fun with AI ponies before we all burn in atomic fire.
>Captcha GOYGMX
>>
>>41464578
Loads of the contributing anons here are sorta like the hapless scientist driven by pure passion who gets their work seized by powerful people to ruin the world, but it's pretty funny that the passion project is breathing life into horse wives.
>>
>>41464578
Can't wait for the AI overlords.
>>
>>41464613
Only when they are pony AI overlords.
>>
Saved by the bell.
>>
>>41458217
Will take a look when I get back (Wednesday).
>>
>>41467361
Thank you for your service.
>>
This is intentional
>>
>>41469125
What exactly?
>>
>>41467671
https://github.com/effusiveperiscope/parler-tts/blob/g2p/cleanup_unigram_tokenizer.py
I changed it on Friday (near line 60) because I had to leave home on an errand and I wanted to get a head start on training even if it doesn't work

Anyways have a (cherrypicked) eval sample (epoch 28, 43 actual hrs of training)
https://pomf2.lain.la/f/5yrsu3u3.mp3

>Hack 1
The way their training works is that it precomputes all of the token IDs, which is a problem if you want to apply random ARPAbet g2p. I couldn't write a clean solution in time so I ended up modifying the collator to decode the text from the token IDs and then apply random g2p on that to get new token IDs. I think the way I did it somehow breaks the WER metrics but hopefully they're just eval metrics and not actually used for anything important. The text prompt the wandb log files show is partially nonsense, but at least the actual TTS seems to read correctly?
>Come on rainbow dash des zu you can des bodyarea often. Just remember the routine des les
>Hack 2
I avoided modifying the vocab size (throwing away the token embedding weights) by just making the g2p tokens reuse token IDs of the disabled eng tokens
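
In case anyone wants to replicate it, Hack 1 boils down to something like this (heavily simplified; the field names and the random_g2p helper are stand-ins, and the real collator also has to carry the audio labels and description ids through):
import random

def collate_with_random_g2p(batch, grapheme_tok, g2p_tok, random_g2p, p=0.5):
    new_prompts = []
    for example in batch:
        # Recover the text from the precomputed token IDs...
        text = grapheme_tok.decode(example["prompt_input_ids"],
                                   skip_special_tokens=True)
        # ...then re-apply ARPAbet g2p to a random subset of words.
        words = [random_g2p(w) if random.random() < p else w
                 for w in text.split()]
        new_prompts.append(" ".join(words))
    # Re-tokenize with the hybrid tokenizer to get fresh IDs for this batch.
    return g2p_tok(new_prompts, padding=True, return_tensors="pt")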
>>
>>41471308
A video of eval samples so far (mildly interesting and/or kekkable):
https://pomf2.lain.la/f/cff7zdl6.mp4

Interestingly it hallucinated the magic sound effect in "Come on, you can do this", step 20430
>>
>>41471529
Does the model seem to be a little overtrained at 31780 steps? Also, does Parler-TTS produce deterministic outputs?
>>
>>41471529
Kek, Rainbow Dash almost turns into Applejack at 20430 steps.
>>
>>41471712
>Does the model seem to be a little overtrained at 31780 steps?
I'm wary of calling "overtrained" on anything yet--we're looking at a very small subset of the total evaluation. Right now I can't tell how much of the variation is coming from the random g2p substitutions vs. problems with the model itself. If you prefer the 20k you'd still have to contend with the magic noise and general weirdness, for example, in Rainbow Dash's "come on" sample. Eval loss (whatever that is?) is still going down.

>Also, does Parler-TTS produce deterministic outputs?
I believe so, although I suspect you could get quite a bit of variation by changing the description prompts (not shown in the video).

I kind of wonder if you'd be able to find useful "directions" in the description embeddings like with LLMs.
>>
>>41471529
>Tirek at 1135 steps
EspanolCentaur.wav
>Tirek at 11300 steps
"Harry Potter is dead! Hehehehe"
>Tirek at 31780 steps
"Perry the platypus?!"

All of these are hilarious, but still reasonably un-robotic sounding compared to other TTS stuff. The Rainbow Dash ones towards the end hold a lot of promise. Were all of these the first samples, or were they the best/funniest attempts of the various versions?
>>
>>41471983
These 5 lines were the only ones the training process randomly selected to preview; the steps were just close to 10k intervals I selected to be concise (I have more)
>>
>>41472167
Sorry, I misunderstood. I think inference is deterministic (for exactly the same text prompt and description prompt) so this would qualify as the "first" output in your thinking (i.e. I had no alternatives from each step to cherrypick from, although I could cherrypick the steps to use)
>>
>>41470894
Bump
>>
>>41472170
Not as fond of strongly deterministic models, as they greatly restrict the ability to get various deliveries/takes for a particular line. I prefer versatility over consistency for that reason.

Is there some function in that model to actively vary the output?
>>
>>41474091
>>41471785
>>
>>
Up.
>>
Welp, time to get back crunching ideas for ai mares.
>>
>>41475839
>>
>>41474091
>>41474338
Update: Apparently the decoder can use either greedy or sampling decoding (although the code doesn't seem to work with greedy decoding on my machine despite trying many transformers/tokenizers versions), so it can be nondeterministic.
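
For reference, sampling-based generation would look roughly like this, assuming the standard parler_tts API and that generate() passes through the usual HF sampling kwargs (shown on the base checkpoint; the finetune uses the modified tokenizers instead):
import soundfile as sf
import torch
from parler_tts import ParlerTTSForConditionalGeneration
from transformers import AutoTokenizer

repo = "parler-tts/parler_tts_mini_v0.1"
model = ParlerTTSForConditionalGeneration.from_pretrained(repo)
tokenizer = AutoTokenizer.from_pretrained(repo)

desc = tokenizer("A female speaker with a high-pitched, energetic voice.",
                 return_tensors="pt")
prompt = tokenizer("Come on, you can do this!", return_tensors="pt")

with torch.no_grad():
    audio = model.generate(input_ids=desc.input_ids,
                           prompt_input_ids=prompt.input_ids,
                           do_sample=True, temperature=1.0)  # sampling -> nondeterministic
sf.write("out.wav", audio.cpu().numpy().squeeze(), model.config.sampling_rate)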
>>
>>41364787
Paused training for now (47 epochs, ~80hrs). More comparisons, this time on actual inference. 3 models * 3 samples per model = 9 samples per line (for some).
https://pomf2.lain.la/f/ifavh3h8.mp4

Notes:
- Thanks to the tokenizer change, the model now at least has some ability to guess pronunciations of words that it hasn't seen before, which is neat (it also doesn't consistently guess pronunciations the same way either).
- Strong schizo energy, as usual.
- What's the right epochs? Not sure, feeling a toss-up between 33 and 47. 33 "feels like" it produces more consistently acceptable results but occasionally fucks up very bad. Might be the move to just upload a bunch in between those two and just let people decide.
- I don't know how much of the original information from the base model we're utilizing at this point, but it's still clearly limited by not having seen as diverse a range of words as the base model.
- ARPAbet works better than expected (Pinkie can say an entire sentence in ARPAbet tokens just fine) but not as well as one would like.
- It seems strangely difficult to get a consistent Tirek voice. Possibly fixable with prompting? But some people are working on zero-shot cloning/voice prompting: https://github.com/huggingface/parler-tts/issues/139 which may help a bit.
- Would moving to the larger model help? IDK.
>>
>>41477594
>>41471308
>>41471529
forgot to link previous posts.
>>
>>41476065
Any ideas planned?
>>
Just spent the past few hours struggling to find any vocal remover software or AI models that are effective at removing specifically SFX (as in foley and other sounds), not stems or anything music related. It's absolutely barren in that regard and so over-saturated with music-related separation tools. So far UVR's current models are not effective enough to remove all the clippa cloppa and shifty crunkly sounds in preparation for clean mare training.

Considering we've already got an enormous amount of separated pony SFX available, plus heaps more if we consider Sonniss GDC game audio bundles, how do you think we'd fare at creating an MDX-Net model from scratch (or finetuning) specifically for separating SFX from any audio we feed it? If it works well, we might be able to reprocess files to get even more clean and usable audio. Maybe even make models for ponies previously too noisy to consider training. Just not sure on how to go about it, as documentation on training doesn't seem easy to find.
>>
>>41477764
We have already done a pass with demucs model, does this work for your purposes?
https://huggingface.co/therealvul/demucs
>>
>>41477868
After creating a simple .yaml for it to be read by UVR5, "SFX_separation_v2_epoch_338.th" inferred perfectly fine, but unfortunately didn't do anything to separate SFX. Whereas the other one, "sfx_demucs_v2_checkpoint_epoch_338.th", threw an immediate key error named "'klass'" and couldn't continue.
>https://ponepaste.org/10379

This is the audio in question I'm testing with a baseline. I'm trying to clean Lotus's data up so she can be trained better to be at least comparable with what was achieved with Saffron and Redheart. As you can hear there's lots of clippa cloppa present in this one among others in her dataset.
>https://pomf2.lain.la/f/dwk4bvei.flac
>>
>>41477974
I see. I processed the audio you gave and got similar results; hoofsteps are generally difficult to remove.
>>
>>41477594
>>41477597
Colab.
https://colab.research.google.com/drive/1EaAlB5H3mHgFddozijZ698O6zJVG5zO7?usp=sharing
>>
>>41478132
Added temperature, and modified it to generate in batches which will be better for content creation.
Here's a little thing to demonstrate (took it from >>41478369):
Wet: https://files.catbox.moe/ia94ci.mp3
Dry: https://files.catbox.moe/iscsmt.mp3
Took a lot of rerolls, some ARPAbet, and respelling fuckery, but at least it's theoretically possible?

- It doesn't like long sentences (wonder if RoPE would fix that).
- I wonder if incorporating the voice steering PR from #139 might help in getting more consistent voices.
>>
https://github.com/kyutai-labs/moshi
>>
>>41430662
>>41477764
no code yet still. might try emailing them
https://xiaoyubie1994.github.io
https://liuxubo717.github.io
>>
>>41477594
Nice.

The schizo energy sounds like it's from the model not knowing what volume and rhythm to use. I can probably rig something up to add that to the training data, though that would require a change to the input layer of the model architecture.
- Volume information should be easy to generate with parselmouth.
- Rhythm information would be more complicated. That will involve using a forced aligner to figure out the duration of each grapheme/phoneme. I know how I would do this for phonemes, but not for graphemes. One option would be to just drop this information when using graphemes.
- We'd have to discretize this information since it would be infeasible to provide floating point volume & rhythm information at inference time.
- I don't know what would be the best way to provide this information to the model. It might be enough to just sum f(volume) + f(rhythm) to embedding(token) before running the result through the rest of the model. One alternative would be to add volume and rhythm information as separate tokens with position encodings f(t+0.33) and f(t+0.67).

If you think modifying the model input layer would be feasible, I can look into generating the data for this after checking >>41458217 >>41471308. I'm mostly free for the next couple days. I'm not sure how much I'll be working on this during the anni stream, though I'll have plenty of time after that.
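
For the volume part, a quick parselmouth sketch of what I have in mind, bucketed into the coarse levels mentioned above (the dB thresholds are arbitrary placeholders, not tuned values):
import numpy as np
import parselmouth

def volume_buckets(wav_path, low_db=45.0, high_db=65.0):
    snd = parselmouth.Sound(wav_path)
    intensity = snd.to_intensity()        # intensity contour in dB
    values = intensity.values.flatten()
    buckets = np.full(values.shape, "medium", dtype=object)
    buckets[values < low_db] = "low"
    buckets[values > high_db] = "high"
    return list(zip(intensity.xs(), buckets))  # (time, bucket) pairs

# These frame-level buckets would still need to be aligned to tokens (e.g., via
# a forced aligner for phonemes) before they could be fed to the model.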
>>
Page 10 bump.
>>
>>41479077
>The schizo energy sounds like it's from the model not knowing what volume and rhythm to use
Do you think the parameter count matters here? This is 880M, their next model size up is 2.3B. Just from testing off their space the prosody seems more natural on the 2.3B one, and the audio quality is higher.
Also, wouldn't modeling volume and rhythm explicitly require the user to generate that information? Unless the plan is to make this into another voice conversion model similar to TalkNet.
>I know how I would do this for phonemes, but not for graphemes. One option would be to just drop this information when using graphemes.
Might be related -- automated word level timestamping:
https://github.com/m-bain/whisperX
>It might be enough to just sum f(volume) + f(rhythm) to embedding(token) before running the result through the rest of the model. One alternative would be to add volume and rhythm information as separate tokens with position encodings f(t+0.33) and f(t+0.67).
I don't think anybody has actually tried experimenting with the inputs but people talked about conditioning on background music for singing voice synthesis:
https://github.com/huggingface/parler-tts/issues/8
Someone did finetune the mini model to produce singing vocals, but only conditioned on the lyrics.
https://wandb.ai/akhiltolani/parler-speech/runs/mv9dd4hz/workspace?nw=nwuserakhiltolani

Another (extremely) stretchy idea--since our dataset comes from the show, I wonder if there's some way we could get an AI (multimodal language model) to look over MLP transcripts/scripts/wiki and generate natural language descriptions around the emotional context of the characters when they're saying those lines, rather than just automated feature-based methods/hand-annotation.
>>
>>41478593
Got vocal steering/zero shot to "work", have a separate colab for it.
https://colab.research.google.com/drive/1wlBh8FDGG-Fmf-r3dWZDanBiabUviYX7?usp=sharing

Does it work? Sorta kinda.
Demo: https://pomf2.lain.la/f/sxjgd8kr.mp4

It works better in some cases than others--it performs better in cases where it'd have a decent amount of data already, and you just want to "lock in" a timbre. It can't do powered-up Tirek shouting at all (not shown in demo because Colab timed out before I remembered to save it).

The way I understand it works is it's basically "prefilling" the decoder. Unfortunately it also seems that it suffers greatly from the old transformers problem of not being able to deal with things further out in the context; intelligibility is severely compromised. Also likely has to do with the fact that we split things into shorter audio clips than they used to train the base model.
>>
https://files.catbox.moe/9kzx6y.mp3
>>
>>41480381
https://huggingface.co/therealvul/parler-tts-pony-mini-g2p-v1-e52/tree/main
Epoch 52. Is there a point in continuing training? No idea, but if I do I'm only going to do it overnight now because frankly I don't want to be stuck listening to fans 24/7.
>>
>>41480381
berry good output, it struggles a little bit with pronunciation but this is something that could be ironed out in the future.
>>41480464
Sorry for not following the threads, but will the control for this kind of model be text only, or will there be audio reference (and other) inputs as well?
>>
>>41480699
ParlerTTS's main selling point is that in addition to text it uses an audio description prompt (see >>41477594 https://pomf2.lain.la/f/ifavh3h8.mp4). Also it is possible to prefill audio to "steer" the voice, but it doesn't work that well (see >>41480135 https://pomf2.lain.la/f/sxjgd8kr.mp4).
>>
>>41479735
Parameter count should matter a lot whenever the model needs to infer information that isn't present in the input (incl. volume and rhythm). For text generation, I think a typical rule-of-thumb is that a 10x increase in parameter count leads to a 50% reduction in loss. I think a 2.5x increase in parameter count (890M -> 2.3B) would lead to a ~25% reduction in loss. I expect that would help a lot, but I don't expect it to fully get rid of the schizo energy.
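The arithmetic behind that estimate, just applying the stated rule of thumb:
import math
# If 10x params -> 50% of the loss, loss scales roughly as N**log10(0.5),
# so a 2.5x jump in parameters gives:
print(f"{1 - 0.5 ** math.log10(2.5):.0%} loss reduction")  # ~24%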
>https://github.com/m-bain/whisperX
The problem is that there's no way to convert word-level timestamps to grapheme-level timestamps.
>Also, wouldn't modeling volume and rhythm explicitly require the user to generate that information? Unless the plan is to make this into another voice conversion model similar to TalkNet.
It could be done similarly to how you're adding phoneme tokens. It's optional information that the user can add, and it's discretized into things the user can type. We could discretize the volume & rhythm information in a way that's easier for people to input without needing speech input. For example, just having high/medium/low tokens for phonemes at the beginning of a sequence and after punctuation, and having higher-, same-, and lower-than-previous tokens for everything else.
I could also add pitch & voicing information using parselmouth. Pitch information would follow the same high-medium-low scheme, and voicing information would be either voiced (vibrating vocal chords) or unvoiced (breath only).

Before that: ParlerTTS on its own without fine-tuning seems to be much better at pronunciation and avoiding schizo energy, which suggests that the problem is with how fine-tuning is done. Do you have samples from a training run that uses the original tokens and where the text encoder is frozen?
>>
>>41480858
>ParlerTTS on its own without fine-tuning seems to be much better at pronunciation and avoiding schizo energy, which suggests that the problem is with how fine-tuning is done.
https://huggingface.co/datasets/parler-tts/mls_eng
https://huggingface.co/datasets/parler-tts/libritts_r_filtered
The base model is supposed to be trained on 45k hrs of data which is 3 orders of magnitude larger than ours (Not sure we can do much about this, although perhaps reintroducing a portion of their original dataset back into our finetuning might be able to mitigate some of the pronunciation issues).
Most of it also appears to be from audiobook narration, which I would argue has much less complex relationships between speech prosody and text than dialogue from MLP.
Pronunciation errors also seem to increase with input length, which might have to do with the dataset and also positional embeddings (they just use sinusoidal but they added an option to use RoPE). The datasets they say they trained on have clips that get as long as 20-30 seconds, and the mls_eng dataset seems to have had clip length balancing.

>Do you have samples from a training run that uses the original tokens and where the text encoder is frozen?
The first training run used the original tokens, and the text encoder so far has been frozen for both:
>>41413292
The problems with this motivated the move to reduced tokens and randomized g2p. Although I realize that this post only had 16 epochs and is therefore not that comparable to the current performance.

For a slightly more fair comparison, here's samples from the 22nd epoch of g2p which is the closest equivalent checkpoint I have to that:
https://pomf2.lain.la/f/ko457pvs.mp3
https://pomf2.lain.la/f/kwkglfhw.mp3
https://pomf2.lain.la/f/kycgvtda.mp3
https://pomf2.lain.la/f/87m1yy68.mp3
https://pomf2.lain.la/f/a9vop6q6.mp3
https://pomf2.lain.la/f/uq4gatp.mp3
Even though there are some obvious schizo moments it's able to "guess" the pronunciation of words it hasn't seen better than training on the original tokens.
>>
>>41481274
Another thought - the original tokenizer (from Flan-T5) might encourage better semantic representations of the text prompt than the current tokenizer, so we might have traded off some semantic information in the text input in exchange for pronunciation ability.
>>
>>
File: 6308713.png (177 KB, 407x570)
177 KB
177 KB PNG
>>
>page 10
>>
>>41483612
>page 10-1
>>
>>41481274
>>41481294
Summary/more thinking on things to consider changing aside from parameter count:
>Optionally condition on volume/rhythm/pitch/voicing information thru extra tokens
(Has there been any work on creating features on "vocal intensity" or similar characteristics?)
>Reintroduce some of their original dataset to improve pronunciation and performance at longer lengths
Right now I can't find their actual speaker name-annotated dataset, and I am worried that the imbalance in clip lengths wrt speakers might encourage the model to start talking more like LibriSpeech at longer lengths.
>RoPE instead of sinusoidal to improve performance at longer lengths?
Haven't seen anybody actually use the option in ParlerTTS yet (so no clue if it's functional), but it works well enough for LLMs.
>Learning rate scheduling
Their base models are trained on 3-9 epochs of a 45k hr dataset, whereas we're doing on the order of tens of epochs. Would cosine annealing make more sense?
>Reintroducing any semantic information on the text prompt better captured by the original Flan-T5 text encoder to improve prosody?
Not sure if this is a good idea or how to go about this.
>Concatenate the text prompt directly into the description prompt
>Concatenate text prompt IDs from the original tokenizer into the description prompt IDs (same tokenizer), possibly with a separator token (use one of the special token IDs)
>Run the text prompt with original token IDs through the text encoder, then concatenate hidden state onto the description
>The above, but adding a learned bias vector or applying some other transformation to the portion representing the text prompt
>Something fancy with cross attention?

It looks like the repo author is still training ParlerTTS models, but AFAIK there hasn't been any public announcement of anything.
https://wandb.ai/ylacombe
>>
>nine
>>
>>41480381
Thanks, Glimmer. Very cool.
>>
>>41481274
I'm confused. Are you fine-tuning the original ParlerTTS model or training from scratch / fine-tuning a model that you trained from scratch? If you're training on top of the ParlerTTS model, you should be able to implicitly take advantage of the 45k hours used to train the original model.
- The description text encoder (`text_encoder`) should always be frozen.
- You might also want to freeze the transcript text embeddings (`embed_prompts`), but only when training on top of ParlerTTS's original models and when using only graphemes. The transcript text embeddings should contain a lot of information about pronunciation, volume, rhythm, and pitch, so even if you're training "from scratch", it would be worthwhile to train on top of ParlerTTS's transcript text embeddings.
- If you're reusing `embed_prompts` with only graphemes, I think the only part that needs to be trained is the `decoder`.
- If you're adding phonemes, then you would need a new `nn.Embedding` for phoneme embeddings, which needs to also be trained.
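
A minimal freezing sketch, assuming the model exposes components under the names above (check the actual module paths on whichever checkpoint you're training):
from parler_tts import ParlerTTSForConditionalGeneration

model = ParlerTTSForConditionalGeneration.from_pretrained("parler-tts/parler_tts_mini_v0.1")

for p in model.text_encoder.parameters():   # description encoder: always frozen
    p.requires_grad = False
for p in model.embed_prompts.parameters():  # transcript embeddings: freeze if grapheme-only
    p.requires_grad = False
# Only the decoder (plus any new phoneme nn.Embedding) stays trainable.
trainable = [n for n, p in model.named_parameters() if p.requires_grad]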

>>41484716
>(Has there been any work on creating features on "vocal intensity" or similar characteristics?)
I haven't been keeping up with speech generation research, but I think some vocoders used energy features (frequency * amplitude^2). I haven't seen anything that uses intensity features as part of text input.
>Right now I can't find their actual speaker name-annotated dataset, and I am worried that the imbalance in clip lengths wrt speakers might encourage the model to start talking more like LibreSpeech at longer lengths.
>Concatenate the text prompt directly into the description prompt
>Concatenate text prompt IDs from the original tokenizer into the description prompt IDs (same tokenizer), possibly with a separator token (use one of the special token IDs)
These won't work well since the two require different embeddings. The text prompt IDs use the `embed_prompts` embeddings, and the description uses the `text_encoder` embeddings.
>Something fancy with cross attention?
ParlerTTS already uses cross attention to bias the transcript processing with the description text. The `decoder` is a cross-attention model that conditions on `text_encoder(input_ids)`, where `input_ids` is the tokenized description.
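To illustrate the conditioning pattern (this is not ParlerTTS's actual layer code, just generic cross-attention with made-up shapes):
```python
# Decoder hidden states (queries) attend over the description encoder
# output (keys/values).
import torch
import torch.nn as nn

d_model, n_heads = 1024, 16
cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

batch, audio_steps, desc_tokens = 2, 50, 32
decoder_hidden = torch.randn(batch, audio_steps, d_model)      # audio-token stream
description_hidden = torch.randn(batch, desc_tokens, d_model)  # ~ text_encoder(input_ids)

conditioned, attn_weights = cross_attn(
    query=decoder_hidden,
    key=description_hidden,
    value=description_hidden,
)
print(conditioned.shape)                        # torch.Size([2, 50, 1024])
```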
>>
>>41485564
>>41484716
I forgot to respond to:
>Right now I can't find their actual speaker name-annotated dataset, and I am worried that the imbalance in clip lengths across speakers might encourage the model to start talking more like LibriSpeech at longer lengths.
ParlerTTS depends entirely on the description and `text_encoder` to infer the speaker name. It seems to work well enough given your samples, so there doesn't seem to be a need to explicitly label the speaker. The imbalance might be an issue, though I think all of the pony data contains only short clips. Using RoPE should mitigate that at least to some extent. I'm not sure if you can switch from sinusoidal encodings to RoPE encodings after the `decoder` has already been trained. If you're training the `decoder` from scratch (note: I'd still recommend reusing the original `embed_prompts` in this case), it's worth trying RoPE before sinusoidal encodings. If you're fine-tuning the original `decoder`, it might be worth an overnight test to see if the `decoder` is adapting to the new encodings.
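For anyone unfamiliar, RoPE boils down to rotating the query/key vectors by position-dependent angles so that attention scores depend on relative offsets rather than absolute positions. A generic sketch, not the exact ParlerTTS option:
```python
# Rotate pairs of dimensions by a position-dependent angle; attention
# scores between rotated q/k then depend on relative position.
import torch

def rope(x: torch.Tensor, base: float = 10000.0) -> torch.Tensor:
    # x: (batch, seq_len, dim) with dim even
    _, seq_len, dim = x.shape
    half = dim // 2
    freqs = base ** (-torch.arange(half, dtype=torch.float32) / half)
    angles = torch.arange(seq_len, dtype=torch.float32)[:, None] * freqs[None, :]
    cos, sin = angles.cos(), angles.sin()              # (seq_len, half)
    x1, x2 = x[..., :half], x[..., half:]
    return torch.cat([x1 * cos - x2 * sin, x1 * sin + x2 * cos], dim=-1)

q = torch.randn(1, 8, 64)
k = torch.randn(1, 8, 64)
scores = rope(q) @ rope(k).transpose(-1, -2)
print(scores.shape)                                    # torch.Size([1, 8, 8])
```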
>>
>>41485564
>I'm confused. Are you fine-tuning the original ParlerTTS model or training from scratch / fine-tuning a model that you trained from scratch? If you're training on top of the ParlerTTS model, you should be able to implicitly take advantage of the 45k hours used to train the original model.
I'm finetuning the original model. My point is that the 45k hours probably contain more diverse token sequences, and we're losing some of that information, which could account for some of the difference in pronunciation ability.
>You might also want to freeze the transcript text embeddings (`embed_prompts`), but only when training on top of ParlerTTS's original models and when using only graphemes
Maybe we could try training with just graphemes but using the reduced-size English tokenizer and frozen embeddings? Although I could foresee it being harder for the model to adapt to words that rarely appear in its original dataset but appear commonly in ours (e.g. 'Equestria').
>ParlerTTS depends entirely on the description and `text_encoder` to infer the speaker name.
What I mean is that in the README they say that they trained the model to produce consistent voices from 34 labeled speakers ("Jon, Lea, Gary...") https://github.com/huggingface/parler-tts but all of the datasets I've found so far only refer to the speaker generically as "A male speaker", "a female speaker", "a woman", etc.
If I can't find a version of the dataset where the descriptions contain the actual names, and I add some fraction of their data back into the training as-is, I could foresee the model collapsing into two modes based on the description: it would only learn to generate random LibriSpeech voices when the description just says "a male speaker", "a female speaker", etc., and only learn to generate pony voices when the description names a specific speaker (since all of the pony descriptions use explicit names).
>These won't work well since the two require different embeddings. The text prompt IDs use the `embed_prompts` embeddings, and the description uses the `text_encoder` embeddings.
Why not exactly? `text_encoder` isn't just an embedding, it's the FLAN-T5 text encoder (which seems to have been frozen in the base model training, so I'm pretty sure it's -exactly- the FLAN-T5 encoder), and the same FLAN-T5 tokenizer was used for both the prompt and the description in the base model. Even if it were an embedding, concatenating the text prompt into the description string (not IDs) would mean they would both get tokenized into the same token ID space, and concatenating the text prompt IDs using the same tokenizer as the description + a separator token would also make sure they're in the same space.
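To make the two options concrete (the separator choice is arbitrary, I just grabbed one of T5's reserved special tokens):
```python
# Both the description and the transcript go through the same FLAN-T5
# tokenizer, so the IDs live in the same vocabulary either way.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("google/flan-t5-large")

description = "Twilight Sparkle speaks quickly with high energy."
transcript = "The magic of friendship never runs out."

# Option 1: concatenate at the string level, tokenize once.
combined_ids = tok(f"{description} Transcript: {transcript}").input_ids

# Option 2: tokenize separately, join the ID sequences with a separator token.
sep_id = tok.convert_tokens_to_ids("<extra_id_0>")     # arbitrary reserved token
desc_ids = tok(description, add_special_tokens=False).input_ids
text_ids = tok(transcript, add_special_tokens=False).input_ids
combined_ids_2 = desc_ids + [sep_id] + text_ids + [tok.eos_token_id]

print(len(combined_ids), len(combined_ids_2))
```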
Also,
>Would cosine annealing make more sense?

>If you're fine-tuning the original `decoder`, it might be worth an overnight test to see if the `decoder` is adapting to the new encodings.
I think I will try this.
>>
>>41485733
>I'm finetuning the original model. My point is that the 45k hours probably contain more diverse token sequences, and we're losing some of that information, which could account for some of the difference in pronunciation ability.
Ah, got it.
I'd guess that all of the prosody information comes from the cross-attention weights in the `decoder`. If we're only trying to add pony prosody into the model, it might work to: (1) freeze all weights other than the `decoder` cross-attention weights, and (2) optionally add new cross-attention weights to condition the `decoder` on more prosody information when available.
>the same FLAN-T5 tokenizer was used for both the prompt and the description in the base model
Are you sure? It looks like the prompt is converted using a separate embedding.
Converting prompts to embeddings: https://github.com/effusiveperiscope/parler-tts/blob/g2p/parler_tts/modeling_parler_tts.py#L2649
embed_prompts definition: https://github.com/effusiveperiscope/parler-tts/blob/g2p/parler_tts/modeling_parler_tts.py#L2286
>Even if it were an embedding, concatenating the text prompt into the description string (not IDs) would mean they would both get tokenized into the same token ID space, and concatenating the text prompt IDs using the same tokenizer as the description + a separator token would also make sure they're in the same space.
Sorry, I confused "text prompt" and "description". Yeah, that makes sense. I would guess that adding something like "{character} says: {prompt}" in the description would help with emotional expression, but I don't expect it to help with the schizo energy.
>>
>>41486266
>If we're only trying to add pony prosody into the model, it might work to: (1) freeze all weights other than the `decoder` cross-attention weights, and (2) optionally add new cross-attention weights to condition the `decoder` on more prosody information when available.
We're trying to add pony voices into the model though? Unless you mean trying to refine one of the existing finetunes

>Are you sure? It looks like the prompt is converted using a separate embedding.
The prompt is converted using a separate embedding, but the same tokenizer was used to produce the input token IDs for both of them (according to their README):
https://github.com/effusiveperiscope/parler-tts/blob/g2p/training/README.md#3-training
>--description_tokenizer_name "google/flan-t5-large" \
>--prompt_tokenizer_name "google/flan-t5-large" \

>I would guess that adding something like "{character} says: {prompt}" in the description would help with emotional expression, but I don't expect it to help with the schizo energy.
I think we need to more clearly define "schizo energy" or be more specific. I think this solution might help with the naturalness of the prosody. If by "schizo energy" we mean the model deciding to generate voices that are incoherent or don't sound much like the target character, I hope that if we can get a longer effective context (through RoPE or other means) the vocal steering solution in >>41480135 might become more viable.
>>
Turns out the Elements of Justice team hit the same limitations with Adobe Animate as the PPP did a few years ago. That is, Animate was so slow at rendering their projects (up to 1.5x slower than real time?) that they also decided to write an XFL renderer:
>https://github.com/conncomg123/CXFL
Parts of this are directly ported from the PPP code, such as edges (https://github.com/conncomg123/CXFL/blob/CSConversion/Scripts/EdgeUtils.cs) and radial gradients (https://github.com/conncomg123/CXFL/commit/ef45be2).
But I hear they're now working on features not in the PPP code (stroke masking?), so it might be worth looking at when it's finished.
>>
>>41486639
berry good, it would be interesting to see from what different angles they approach the problems of rendering animation.
>>
Does anyone still give a shit about the redub series?
>>
>>41485585
>>41485733
>it might be worth an overnight test to see if the `decoder` is adapting to the new encodings.
Here's 6 epochs of a new finetune over parler-tts-v1-mini. Only modification from previous recipe is that it uses RoPE instead of sinusoidal.
https://pomf2.lain.la/f/gkt5i3q.mp3
https://pomf2.lain.la/f/732spvbu.mp3
https://pomf2.lain.la/f/c195kcyn.mp3
https://pomf2.lain.la/f/1apfoul9.mp3
https://pomf2.lain.la/f/ufsom344.mp3

Doesn't seem to have completely broken the model?
>>
>>41487623
>pomf2
The fuck is going on with that site? It's just loading and loading for me. It worked fine before. Searching for info about it on Google says that it has been blocked or compromised or some shit? Someone posted CP again like they did with smutty.horse? That site still isn't allowing uploads either.
>>
>>41487408
I do, I may not have time for the usual contentfaging but I always love making some colab with fellow ppp niggas.
>>
>>41487623
>>41487639
It loads fine for me. Is there an alternate file hosting site that you can access? I've experienced similar problems with catbox before so I'm not sure if there's a reliable one.
>>
>>41487729
Catbox and Uguu work fine for me. Uguu only allows 64MB though, while Catbox allows 200MB. Pomf allowed even bigger files, so it's a shame it no longer works for me.
>>
>>41487752
>Uguu
It's also temporary?
Sucks that bad actors ended up getting so many sharing sites shut down...
>>
>>41486312
>I think we need to more clearly define "schizo energy" or be more specific.
I meant the unnatural (generically, not character-specific) high variance in pitch, volume, and rhythm. Fine-tuning seems to make the model forget whatever default bias it has for those things.
>I hope that if we can get a longer effective context (through RoPE or other means) the vocal steering solution in >>41480135 might become more viable.
I suspect there are multiple things going wrong here. The context length would be one issue, but I think the decoder model also isn't made for this kind of inference. Based on the paper, it's using BERT (encoder transformer model) for the "decoder", which struggles to learn sequence patterns. It would be better to add the reference in through cross-attention weights. To train those weights, you could use a small slice of clip X as the reference audio when training on clip X.
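A sketch of what I mean by the self-reference setup; durations here are arbitrary:
```python
# Carve a short random slice out of each training clip to use as
# reference-audio conditioning; the full clip stays the target.
import numpy as np

def split_reference(wav: np.ndarray, sr: int, ref_seconds: float = 2.0):
    ref_len = int(ref_seconds * sr)
    if len(wav) <= ref_len:
        return None                                    # clip too short
    start = np.random.randint(0, len(wav) - ref_len)
    reference = wav[start:start + ref_len]
    return reference, wav                              # (conditioning, target)

sr = 44100
clip = np.random.randn(sr * 6).astype(np.float32)      # stand-in 6 s clip
reference, target = split_reference(clip, sr)
print(reference.shape, target.shape)
```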
>We're trying to add pony voices into the model though?
That's what I mean. Pony voices are voices with pony prosody. The way ParlerTTS is designed, everything that's specific to a voice and independent of the speech content (phonemes) should be captured by the cross-attention weights. If that's correct, then modifying just the cross-attention weights should be a more targeted way to add pony voices to the model without messing up the parts of speech generation that can be inferred from just the phonemes. Technically, "what can be inferred from just the phonemes" should only be pronunciation, but it probably includes a "default" bias on pitch, volume, and rhythm shifts, which I think is what you'd want to retain to avoid the schizo energy.
>The prompt is converted using a separate embedding, but the same tokenizer was used to produce the input token IDs for both of them
That makes sense. The embedding weights are what we'd want to selectively freeze. Freeze embeddings for tokens in the original vocabulary, train embeddings for new (phoneme) tokens.
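One way to do the selective freezing is a gradient hook that zeroes out the rows for the original vocabulary, so only the new phoneme rows get updated. Vocabulary sizes here are made up.
```python
# Zero the gradient for rows belonging to the original vocabulary, so an
# optimizer step only updates the newly added phoneme rows.
import torch
import torch.nn as nn

orig_vocab = 32128                         # e.g. the original tokenizer's vocab size
new_tokens = 100                           # newly added phoneme tokens
dim = 1024

embed_prompts = nn.Embedding(orig_vocab + new_tokens, dim)

def zero_original_rows(grad: torch.Tensor) -> torch.Tensor:
    mask = torch.zeros_like(grad)
    mask[orig_vocab:] = 1.0                # only the new rows keep their gradient
    return grad * mask

embed_prompts.weight.register_hook(zero_original_rows)

# Quick check: only the new rows change after a step.
opt = torch.optim.SGD(embed_prompts.parameters(), lr=0.1)
ids = torch.tensor([[5, orig_vocab + 3]])
embed_prompts(ids).sum().backward()
opt.step()
```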
>>41487623
Very nice. I didn't know that was possible.

>>41486639
I wonder why they don't just use the xflsvg repo directly. Python isn't the limiting factor in rendering, and the repo uses optimized libraries to handle the expensive parts. It also already supports parallel rendering within a process & batch execution across processes and servers.
>>
>>41488480
>Based on the paper, it's using BERT (encoder transformer model)
??
https://arxiv.org/pdf/2402.01912 says "Decoder-only Transformer" in the diagram.
Also their decoder class seems to have been adapted from MusicGen (it has lots of "MusicGen" left over in the code):
https://arxiv.org/pdf/2306.05284 says "musicgen consists in an autoregressive transformer-based decoder"
>then modifying just the cross-attention weights should be a more targeted way to add pony voices to the model without messing up the parts of speech generation that can be inferred from just the phonemes.
>The embedding weights are what we'd want to selectively freeze. Freeze embeddings for tokens in the original vocabulary, train embeddings for new (phoneme) tokens.
OK, I see. So basically just freeze everything except the encoder_attn layers in ParlerTTSDecoderLayer and the prompt embeddings corresponding to new tokens?
>Very nice. I didn't know that was possible.
It makes me wonder if the "finetuning" learning rate is too high. (It's 0.0001 by default.)
>>
>>41489042
>BERT
Yeah, I'm misremembering that pretty badly. I'm doing that a lot. The github repo and the HF repo both mention a causal LM. Thanks for checking.
>OK, I see. So basically just freeze everything except the encoder_attn layers in ParlerTTSDecoderLayer and the prompt embeddings corresponding to new tokens?
That sounds right.
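Concretely, something like this name filter should do it. The `encoder_attn` substring follows the layer naming we discussed; the import path and checkpoint name are my guesses, so double-check against `model.named_parameters()` on your branch.
```python
# Train only the decoder's cross-attention (encoder_attn) weights.
from parler_tts import ParlerTTSForConditionalGeneration  # assumed import path

model = ParlerTTSForConditionalGeneration.from_pretrained(
    "parler-tts/parler-tts-mini-v1"                        # assumed checkpoint name
)

for name, param in model.named_parameters():
    # encoder_attn = cross-attention over the description encoder states
    param.requires_grad = "encoder_attn" in name

# The new phoneme-embedding rows would additionally be unfrozen via a
# gradient mask on embed_prompts (see the earlier embedding sketch).
trainable = [n for n, p in model.named_parameters() if p.requires_grad]
print(f"{len(trainable)} trainable tensors, e.g. {trainable[:3]}")
```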
>>
>>41489042
By the way, we might be able to shrink the DAC codebook to get higher quality audio. I'm not sure how to do that yet.
>>
>>41458217
It should be fixed now. Same github & hf repos.
https://huggingface.co/synthbot/parlertts_tokenizer_clean
https://github.com/synthbot-anon/sample-code/blob/main/src/cleanup_unigram_tokenizer.py
>>
>>41490077
Thanks
>>
>>41488480
>I wonder why they don't just use the xflsvg repo directly.
Good question, honestly. Forgot to ask, maybe they just don't like Python? Or maybe they like reimplementing things, since they also recreated JSFL.
>>
https://files.catbox.moe/p5myi0.mp3
>>
>>41491956
Oh noes the sizefags are here but HOW big are we talking about?
>>
>>41492201
>spoiler 2
As long as it's not hyper, it's not too bad.
>>
>>41491956
Is that a reference to something?
>>
>bumpo
>>
https://files.catbox.moe/djtz24.mp3
>>
>>41495126
I see this potentially getting used a lot in the near future.
>>
File: large.png (1.4 MB, 1280x1024)
1.4 MB
1.4 MB PNG
>>41487408
We've seen from the last redub episode that some anons here are uncomfortable with using speech-to-speech because they aren't good actors and/or they don't have American accents. TalkNet can override the input speaker's accent with the character's, but it's lower quality. RVC and SVC are higher quality, but they retain the speaker's accent. We probably shouldn't start another redub until we have a non-deterministic text-to-speech system that works reasonably well as a substitute for 15. Maybe ParlerTTS can be that solution.
>>
>>41496236
As one of the anons less than comfortable with the speech-to-speech options, I agree that a suitable TTS option should eagerly be sought. I also miss the abundance of shitposts and other funnies that were present back in the 15 and TalkNet era. Inspiration is fleeting, as is time. The easier we can make the tech to use, the more readily anyone can act on that inspiration.
>>
Maremare
>>
>>41496236
Redub or not, I definitely agree that more TTS options would be good to have right now.
>>
File: Crowd.png (917 KB, 1080x1080)
917 KB
917 KB PNG
>>41496236
>>41496639
You know, on the note of pony TTS, someone coded a really nice sounding local TTS for /CHAG/.
https://desuarchive.org/mlp/thread/41267361/#41286152
https://www.youtube.com/live/mSINcy1_6Ms?si=DO1qRj9WhpkixaBp&t=775
I don't think it has a standard UI, instead just pulling text from certain programs, but I'm sure that could be rectified, and it sounds pretty good, especially for running on CPU.
>>
Sup Hydrus, the HaySay website seems to be down (not sure if it's just me or something on the server side).
>>
Do we have a plan for the next OP since they limited the number of quoted posts in a message to 9?
Use desu?
>>
>>41497607
What?
>>
If vocoders are still a problem, did anyone try looking into audio codec research? For example, into the loss compensation in the Opus codec, or into the perceptual model from Opus.
>>
I hope I don't accidentally spam.

Is there anynon interested in natural language processing and potentially better tagging systems for the boorus of the future? Why put NLP and tagging together? Because I think tagging systems are probably close to the field of NLP.

Specifically, I want this system to capture relations within a single image. For example, assume there is a selfponidox image of human Rainbow Dash in a hat and pony Rainbow Dash in a skirt. I want to search for all images where there are both ponies and humans and the pony wears a skirt. With traditional tagging systems I would get too many false positives of humans in skirts. It also has the benefit of going beyond the implied tags of traditional tagging systems, since it can potentially deduce tags: for example, if an image shows two instances of the same character, then it might be selfponidox, a transformation sequence, or whatever tag the EqG1 cover deserves.

This might also have potential for fanfic processing. It kinda started as an idea for better tags for fanfics.

I did some research and found that information extraction is the field this belongs to. There is the Resource Description Framework, which doesn't seem to meet all the requirements, but it's close enough, and I think it can be hacked to support local and multiple instances of objects.
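To make it concrete, here's a toy, library-free sketch of what instance-level triples and the "pony wears a skirt" query could look like. All names and predicates are made up.
```python
# (instance, predicate, value) triples, scoped per image.
triples = {
    "image_1": [
        ("inst_a", "is_a", "human"),
        ("inst_a", "depicts", "rainbow dash"),
        ("inst_a", "wears", "hat"),
        ("inst_b", "is_a", "pony"),
        ("inst_b", "depicts", "rainbow dash"),
        ("inst_b", "wears", "skirt"),
    ],
    "image_2": [
        ("inst_a", "is_a", "human"),
        ("inst_a", "wears", "skirt"),
        ("inst_b", "is_a", "pony"),
    ],
}

def matches(image_triples):
    """Image contains a human, and some pony instance wears a skirt."""
    has_human = any(p == "is_a" and o == "human" for _, p, o in image_triples)
    ponies = {s for s, p, o in image_triples if p == "is_a" and o == "pony"}
    pony_in_skirt = any(
        s in ponies and p == "wears" and o == "skirt" for s, p, o in image_triples
    )
    return has_human and pony_in_skirt

print([img for img, ts in triples.items() if matches(ts)])  # ['image_1']
```
A flat tag search for human + pony + skirt would also return image_2, which is exactly the false positive this avoids.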
>>
>>41497600
Everything is working for me, except for synthapp.haysay.ai. I'll restart that one in a bit. Are you getting an error or does the site just not load at all for you?
>>
>>41497933
Annoyingly it "just werks" right now, so I can't show you the error message I was getting beforehand. But thank you for checking on your end.
>>
(waiting for next thread)
>>
rip OP, legends say he is still stuck in the linecon
>>
NEW THREAD
>>41498541


