/mlp/ - Pony

Thread archived.
You cannot reply anymore.


File: AnotherFilenameMaybe.png (1.54 MB, 2119x1500)
Welcome to the Pony Voice Preservation Project!

See below for regular OP post.

The Pony Preservation Project is a collaborative effort by /mlp/ to build and curate pony datasets for as many applications in AI as possible.

Technology has progressed such that a trained neural network can generate convincing voice clips, drawings and text for any person or character using existing audio recordings, artwork and fanfics as a reference. As you can surely imagine, AI pony voices, drawings and text have endless applications for pony content creation.

AI is incredibly versatile, basically anything that can be boiled down to a simple dataset can be used for training to create more of it. AI-generated images, fanfics, wAIfu chatbots and even animation are possible, and are being worked on here.

Any anon is free to join, and there are many active tasks that would suit any level of technical expertise. If you’re interested in helping out, take a look at the quick start guide linked below and ask in the thread for any further detail you need.

EQG and G5 are not welcome.

>Quick start guide:
Introduction to the PPP, links to text-to-speech tools, and how (You) can help with active tasks.

>The main Doc:
An in-depth repository of tutorials, resources and archives.

>Active tasks:
Research into animation AI
Research into pony image generation
>Latest developments:
GDrive clone of Master File now available >>37159549
SortAnon releases script to run TalkNet on Windows >>37299594
TalkNet training script >>37374942
GPT-J downloadable model >>37646318
FiMmicroSoL model >>38027533
Delta GPT-J notebook + tutorial >>38018428
New FiMfic GPT model >>38308297 >>38347556 >>38301248
FimFic dataset release >>38391839
Offline GPT-PNY >>38821349
FiMfic dataset >>38934474
SD weights >>38959367
SD low vram >>38959447
Huggingface SD: >>38979677
Colab SD >>38981735
NSFW Pony Model >>39114433
New DeltaVox >>39678806
so-vits-svc 4.0 >>39683876
so-vits-svc tutorial >>39692758
Hay Say >>39920556
Haysay on the web! >>40391443
SFX separator >>40786997 >>40790270
Clipper finishes re-reviewing audio >>40999872
>The PoneAI drive, an archive for AI pony voice content:

>Clipper’s Master Files, the central location for MLP voice data:

>Cool, where is the discord/forum/whatever unifying place for this project?
You're looking at it.

Last Thread:
If your question isn’t listed here, take a look in the quick start guide and main doc to see if it’s already answered there. Use the tabs on the left for easy navigation.
Quick: docs.google.com/document/d/1PDkSrKKiHzzpUTKzBldZeKngvjeBUjyTtGCOv2GWwa0/edit
Main: docs.google.com/document/d/1y1pfS0LCrwbbvxdn3ZksH25BKaf0LaO13uYppxIQnac/edit

>Where can I find the AI text-to-speech tools and how do I use them?
A list of TTS tools: docs.google.com/document/d/1y1pfS0LCrwbbvxdn3ZksH25BKaf0LaO13uYppxIQnac/edit#heading=h.yuhl8zjiwmwq
How to get the best out of them: docs.google.com/document/d/1y1pfS0LCrwbbvxdn3ZksH25BKaf0LaO13uYppxIQnac/edit#heading=h.mnnpknmj1hcy

>Where can I find content made with the voice AI?
In the PoneAI drive: drive.google.com/drive/folders/1E21zJQWC5XVQWy2mt42bUiJ_XbqTJXCp
And the PPP Mega Compilation: docs.google.com/spreadsheets/d/1T2TE3OBs681Vphfas7Jgi5rvugdH6wnXVtUVYiZyJF8/edit

>I want to know more about the PPP, but I can’t be arsed to read the doc.
See the live PPP panel shows presented on /mlp/con for a more condensed overview.
2020 pony.tube/w/5fUkuT3245pL8ZoWXUnXJ4
2021 pony.tube/w/a5yfTV4Ynq7tRveZH7AA8f
2022 pony.tube/w/mV3xgbdtrXqjoPAwEXZCw5
2023 pony.tube/w/fVZShksjBbu6uT51DtvWWz

>How can I help with the PPP?
Build datasets, train AIs, and use the AI to make more pony content. Take a look at the quick start guide for current active tasks, or start your own in the thread if you have an idea. There’s always more data to collect and more AIs to train.

>Did you know that such and such voiced this other thing that could be used for voice data?
It is best to keep to official audio only unless there is very little of it available. If you know of a good source of audio for characters with few (or just fewer) lines, please post it in the thread. 5.1 is generally required unless you have a source already clean of background noise. Preferably post a sample or link. The easier you make it, the more likely it will be done.

>What about fan-imitations of official voices?

>Will you guys be doing a [insert language here] version of the AI?
Probably not, but you're welcome to. You can however get most of the way there by using phonetic transcriptions of other languages as input for the AI.

>What about [insert OC here]'s voice?
It is often quite difficult to find good quality audio data for OCs. If you happen to know any, post them in the thread and we’ll take a look.

>I have an idea!
Great. Post it in the thread and we'll discuss it.

>Do you have a Code of Conduct?
Of course: 15.ai/code

>Is this project open source? Who is in charge of this?

PPP Redubs:

Stream Premieres:
File: WoodAnch.png (583 KB, 656x847)
Here is the rest of the news section and an explanation of why the OP is the way it is. Sorry about that.

Prompt: Mares of Equestria love Anon AKA Anonymous, Who is the only (male) human in Equestria;female vocalist, Eurodance, Pop, Melodic, Funny

Anon in Equestria

In this land of hooves and tails, he walks alone
With a two-legged stride in a horse's home
They trot to his side, say 'Anon's our own!'
Every mare in Equestria is calling his phone

Their hearts trotting fast, galloping in time
To the beat of their world where the sun always shines

All the stallions might glare, but Anon's unfazed
He's got unicorn magic in his human ways
Pegasus girls, they all wanna race
But it's Anon, only Anon, who can keep up the pace

Whoa-oh, in Equestria, he's living the dream
Every mare's got a crush, yeah, Anon's the theme
Through the fields, he's the one they pursue
A human touch that's both rare and true
(left out: When Anon steps, the world's brand new)
In a pony parade, he's the first in view

Flutters shy and Rarity's gleam
Every mare in the land wants to join his team
With a laughter's ring (aha!), Pinkie's in line
Throws a party for two, with Anon, it's divine
Rainbow Dash in the sky, drafting clouds to spell 'Anon's mine!'
But there's a queue, yeah, the line's longer each dawn

Twilight's sparkle can't compare
To the way that they swoon when he's there

In this land of hooves and tails, he walks alone
With a two-legged stride in a horse's home
They trot to his side, say 'Anon's our own!'
Every mare in Equestria is calling his phone
Adapting WavLM for Speech Emotion Recognition
>Recently, the usage of speech self-supervised models (SSL) for downstream tasks has been drawing a lot of attention. While large pre-trained models commonly outperform smaller models trained from scratch, questions regarding the optimal fine-tuning strategies remain prevalent. In this paper, we explore the fine-tuning strategies of the WavLM Large model for the speech emotion recognition task on the MSP Podcast Corpus. More specifically, we perform a series of experiments focusing on using gender and semantic information from utterances. We then sum up our findings and describe the final model we used for submission to Speech Emotion Recognition Challenge 2024.
good walkthrough for how they did it for anyone here who wants to make one for ponies. made for this competition
>op didn't put "ppp"in the text and now I can't quickly find the thread
Anyhow, I'm thinking of training more season-specific voices. Does anyone have any special requests?
File: Coloratura Sit.png (1.64 MB, 6735x6735)
Coloratura. What architecture are you training for?
RVC, but first I will need to check if she has at least 2 minutes of files. I may try sovits later, but those usually take almost all day.
"ppp" has never been in the text. I usually find the thread by searching for "preservation".
File: razzledazzle.png (15 KB, 90x90)
hmm, weird, typing "ppp" in the search has always worked for the past several years when lurking the main catalogue.
>>41065205 >>41064828
RVC of Countess Coloratura aka Rara; for training I used a mix of singing and speaking lines to pad it out to 4 minutes of dataset. For some reason this model seems to have more difficulty converting male voice clips than the previous models I've trained.
I will need to refresh my knowledge of how my sovits training setup works, so her model will be trained sometime over the weekend.
What is the best TTS for mares in Silly Tavern?
Any chance this could get upped on haysay.ai?
File: happy Twilight.gif (1.15 MB, 675x540)
I am late to the party, but...
Happy birthday PPPV!
I was here at the beginning, and even if I can't help much these past two years, it always warms my heart to see this thread alive and kicking!

Long life to PPP!

Has anyone tried this one yet? Is it good?
Same anon. I tried it... but there seems to be an issue with the output being very squeaky at times and deeper at others... I don't know if it's because of the data I used or what, but I don't like that it gives that sort of quality.
>5 years
How? It feels like it only started yesterday?
early 2023 15 ai audio clips i haven't shared:
fluttershy yelling at angel spongebob reference
cocopommel says i love you
Anyone know why Udio has so many problems generating 16-bit music?
File: out of apples.png (1.27 MB, 1400x1400)
nicely done
Very Nice.
mare music
The problem is that you can't link 5 posts consecutively on a single line. So this should work:

Something in the following text will prevent your posts from going through. I tried like 6 times to make the thread; I tried from both Chromium and Firefox, different ISPs, and without 4chanX. I would just get redirected to this thread >>41064811. Eventually I tried posting it piecemeal to see if maybe a part of the OP was being blocked, and that did seem to be the case. Something in the below text will prevent your text from being posted. If the delete functionality still worked, I would remake the thread. I'm really sorry about the messed up OP.

Synthbot updates GDrive >>41019588
Private "MareLoid" project >>40925332 >>40928583 >>40932952
VoiceCraft >>40938470 >>40953388
Fimfarch dataset >>41027971
5 years of PPP >>41029227
Various AI News >>40947581 >>40991154 >>41012445 >>41023953
>>41025268 >>41041365
Does anyone know if there is some kind of add-on to the RVC training script where you can provide it with a list for creating an entire batch of voices?
I would like to use such a function for whenever I'm away from my pc (like a weekend with the extended family); it would be pretty neat to return to a whole cast of characters done cooking and ready to go.
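Not aware of an existing add-on, but a thin wrapper script would do it. A minimal sketch of the idea: read a `name,dataset_dir` list and run the training script once per entry. The `train_rvc.py` script name and its flags are hypothetical placeholders; swap in whatever command your actual RVC setup uses.

```python
import subprocess

def batch_train(voice_list_path, dry_run=True):
    """Queue one RVC training run per 'name,dataset_dir' line of a text file."""
    commands = []
    with open(voice_list_path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith("#"):   # skip blanks and comments
                continue
            name, dataset_dir = line.split(",", 1)
            cmd = ["python", "train_rvc.py",       # hypothetical script + flags
                   "--name", name.strip(),
                   "--dataset", dataset_dir.strip()]
            commands.append(cmd)
            if not dry_run:
                # finish one voice before starting the next
                subprocess.run(cmd, check=True)
    return commands
```

Kick it off with `dry_run=False` before leaving for the weekend and the runs execute back to back.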
Even if I didn't recognize AJ very well at the beginning, it changes very fast.
An awesome work indeed!
Mystery solved. Thanks Anon. I thought I'd made long post lists previously, but maybe it's a recent change. Who knows anymore.
Page 9 bump.
You mean page 10 bump, right?
When running training for so-vits-svc 4 I got this strange error. Does anyone know what this is about? I've never seen it before.
Did you run out of RAM perhaps?
Nope, I was monitoring the RAM and VRAM usage, and while it was pretty high it didn't run out (usually if a program used up all my RAM in the past, the pc would crash). Like I've said, I'd never seen this error before, and I did train a few sovits voices in the past.
However, it is possible that another program and/or browser running at the same time could have somehow affected the training process, so I may re-attempt training some other time.
For whatever reason (usually running out of non-reserved memory) the computer started using the hard drive as additional memory through a pagefile. This is the default behavior when low on memory, but it's not impossible some weird edge case made it try to reserve an absurd amount of memory. How much space do you have left on your OS HDD/SSD?
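Quick way to check that from Python, standard library only (the path is an assumption; point it at whichever drive holds your pagefile, e.g. "C:\\" on Windows):

```python
import shutil

# Free space on the drive the OS would grow the pagefile into.
usage = shutil.disk_usage("/")   # use "C:\\" on Windows
free_gb = usage.free / 1e9
print(f"{free_gb:.1f} GB free")
```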
That works too.
File: 1672976889910218.jpg (97 KB, 385x501)
Logically speaking, how would you bridge the gap between a simple text to text model (think /CHAG/) and allow it to interact with the world mechanically?
my initial guess is it just wouldn't be advanced enough; it's built exclusively for text to text, so how could you take it into the territory of motion? not even anything accurate, just any mechanical capabilities at all?
strap it into a horse-shaped BattleMech, of course
Give it senses. A text generator model already has a way to interact with the world; what it doesn't have is a method to autonomously and accurately generate context.
>it's built exclusively for text to text
They do image to text too now. It's basic and I don't think it would be enough for complex motions, but that should give some basic interactions with the world around.
You can formulate it as an action that an LLM can choose. The environment can be mapped by the robot in 3D and objects labelled with a corresponding model. It's no different than those dog robots.

If you want something less scripted there are models that estimate humanoid skeleton animation given a description, same can be applied to ponies.

Theoretically it's already perfectly possible; you would only need a tech guy, a robotics guy, a 3D printer and an H200 server.
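To make the "formulate it as an action the LLM can choose" part concrete, here's a toy sketch. The action names are made up for illustration; the idea is just that the model's free-form text gets validated against a fixed action vocabulary, and anything unrecognized falls back to plain speech.

```python
# Fixed action vocabulary the robot's controller actually understands.
ACTIONS = {"walk_to", "look_at", "pick_up", "say"}

def parse_action(model_output):
    """Map raw model text like 'pick_up apple' onto an (action, argument) pair."""
    verb, _, arg = model_output.strip().partition(" ")
    if verb not in ACTIONS:
        # Unknown verb: treat the whole output as speech instead of motion.
        return ("say", model_output.strip())
    return (verb, arg)
```

The controller then only ever executes whitelisted actions, which is what keeps a text-only model from issuing motions the hardware can't do.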
I would imagine something along the lines of a Sims-like "emotional needs" system would be enough for most interactions, as long as it correctly exchanged information between itself and the more advanced AI text model to give the illusion of the robot acting semi-intelligent.
>>41065205 >>41066442
Hi Anon, here is another model of Countess Coloratura trained for sovits 4.0.
File: 1673497361137281.jpg (61 KB, 600x600)
>text models can already interact with the world
by this im assuming you just mean interacting with the world by generating text right?
>but they dont have a method to autonomously and accurately generate context
isn't context basically just the 'memory' a model has of past interactions?
i wish there was a way to increase context through RAM usage or maybe even hard drive storage
wish we could just pump out some kind of simple python code to do that, infinite context, but at the same time how fast would it be able to make use of that context? if it's on a hard drive as opposed to RAM then I would imagine it would take much longer to load that context
you would probably need something more like video to text in order to make it interact in real time rather than hyper delayed reactions
>the environment can be mapped by the robot in 3D and objects labelled with a corresponding model
im assuming you mean translating a video feed with a model that identifies stuff so that it can be interpreted by the main LLM that has the personality and movement control and all that
where would you even get a model to translate video to text in real time?
>models that estimate humanoid skeleton animation given a description
this one im completely lost on, what does this mean and how would it be of use? maybe im just sub 80 IQ
i know what you're talking about, and part of that is you could have a self-reflection mechanism
this might be really niche but i remember seeing i think it was on GitHub a program that you could put your API key in and it was called 'auto computer' or something like that where it could perform tasks by asking the AI to do something and it will break it down into actionable steps before performing it which the AI then generates after being prompted to create the steps to do something
In this way, you could have an automatic prompt after each interaction to gauge emotion like 'how did this make you feel' which will then output an emotional state of some kind i would imagine
if you wanted to take it further you could make it more dynamic with percentages like this
happiness - #%
sadness - #%
anger - #%
and it could factor in previous interactions to make a totaled emotional state, similar to how you could have a bad day at work but then you get to hang out with friends after and you start to feel better
but then again you would need a mechanism to calculate whether those emotions fade away after some time, and some emotions stick longer than others, which would be confusing to deal with and I think is a project all on its own
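The decay part is actually the easy bit. A minimal sketch of the percentages-plus-fading idea (emotion names and decay rates here are arbitrary assumptions, not any existing chatbot API):

```python
EMOTIONS = ("happiness", "sadness", "anger")

# Per-turn decay factors: some emotions stick longer than others.
DECAY = {"happiness": 0.8, "sadness": 0.95, "anger": 0.9}

def update_state(state, event):
    """Blend a new per-turn emotion reading into the running totals."""
    new_state = {}
    for emotion in EMOTIONS:
        carried = state.get(emotion, 0.0) * DECAY[emotion]   # old feeling fades
        new_state[emotion] = min(100.0, carried + event.get(emotion, 0.0))
    return new_state

# Bad day at work, then hanging out with friends afterwards.
state = {}
state = update_state(state, {"sadness": 60.0, "anger": 30.0})
state = update_state(state, {"happiness": 50.0})
print(state)  # sadness and anger have partly faded, happiness is fresh
```

The "how did this make you feel" prompt after each interaction would supply the `event` numbers; everything else is bookkeeping.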

Last thing I wanted to ask was how could you imagine someone pulling in funding for a project like this, assuming someone had baseline needs met and some kind of disposable income to actually complete the project, what would they do with the project itself or even the past iterations that were made?
>isn't context basically just the 'memory' a model has of past interactions?
You can think of context as the working memory of a model. Without context a model will react to everything entirely based on its 'instinct' (that is completing based on averages). Giving a model a way to autonomously observe (generate context) what's around it is the first step to having it start to form an understanding of the world. Of course the model would be limited by the maximum size of useful context the system can handle for how complex the models understanding of the world can be.
File: 1417744044882.gif (60 KB, 720x1050)
oh AUTONOMOUSLY creating context
yeah that makes more sense
it has to be able to learn on its own without human input
i think this would require some form of recursive self reflection like 'what am i looking at right now' but at the same time could cause problems since you could look at the same thing and have it make a new entry even though it looked at it just a few minutes ago
a big problem would be just rewriting the same context over and over and filling up context space needlessly
i wonder hypothetically how much storage you would need for context alone
assuming you could convert context to like, idk, a text file or some kind of file which could be read fast enough to not slow down response times
I wonder how fast the context would fill up though, i mean if it's taking literally everything in that it sees i would imagine it's gonna fill up crazy fast

the main bottlenecks here are how fast you could access context and the sheer amount of storage it would need. this is assuming you could even somehow automatically write the context into a TXT file and keep it stored on an actual hard drive rather than contained within the AI or however the fuck it works. I would have to ask /CHAG/ where context itself is even stored, since I'm a spastic and don't know how that works, but I do know that past its limit it starts overwriting previous context (I THINK)
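For what it's worth, the "context in a TXT file that overwrites its oldest entries past a limit" part is simple to sketch. This is a toy illustration, not how any actual inference engine stores context (context normally lives in RAM/VRAM as token IDs); token counting here is just word counting, where a real system would use the model's tokenizer:

```python
from collections import deque

class ContextBuffer:
    """Fixed-size rolling buffer of observations, persistable to a text file."""

    def __init__(self, max_tokens=2048):
        self.max_tokens = max_tokens
        self.entries = deque()
        self.tokens = 0

    def add(self, text):
        n = len(text.split())          # crude stand-in for real tokenization
        self.entries.append((text, n))
        self.tokens += n
        # Past the limit, the oldest observations get dropped first.
        while self.tokens > self.max_tokens:
            _, old_n = self.entries.popleft()
            self.tokens -= old_n

    def save(self, path):
        with open(path, "w", encoding="utf-8") as f:
            f.write("\n".join(text for text, _ in self.entries))
```

Disk speed only matters when you save/load; the real bottleneck stays the model's maximum usable context, exactly as said above.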
Yeah there are many barriers still to actually getting such a system working. Both on the hardware and software side.
Maybe it'll be simpler to have an AI-controlled agent in a virtual environment, like Second Life. I know people have trained AIs to play video games before. You could try asking the popen thread how hard it would be to get an AI to control a popen.
Are there any good samples or excerpts of ponies cheering? What about booing?
not that anon; how feasible and/or difficult is it to model an AI's memory after the human brain, i.e. short term memory, long term memory, and the conversion between the two?
I know there are attempts of varying quality using summarization, clever systems of reinforcing and forgetting context and such but they're to my understanding just the AI compressing and managing context. I'd guess a true split would require some new architecture? Not exactly an expert here.
honestly i see no reason to make short term memory unless having everything in long term memory would be too many files and take too much storage or confuse the model somehow
File: Untitled.jpg (322 KB, 1565x957)
AniTalker: Animate Vivid and Diverse Talking Faces through Identity-Decoupled Facial Motion Encoding
>The paper introduces AniTalker, an innovative framework designed to generate lifelike talking faces from a single portrait. Unlike existing models that primarily focus on verbal cues such as lip synchronization and fail to capture the complex dynamics of facial expressions and nonverbal cues, AniTalker employs a universal motion representation. This innovative representation effectively captures a wide range of facial dynamics, including subtle expressions and head movements. AniTalker enhances motion depiction through two self-supervised learning strategies: the first involves reconstructing target video frames from source frames within the same identity to learn subtle motion representations, and the second develops an identity encoder using metric learning while actively minimizing mutual information between the identity and motion encoders. This approach ensures that the motion representation is dynamic and devoid of identity-specific details, significantly reducing the need for labeled data. Additionally, the integration of a diffusion model with a variance adapter allows for the generation of diverse and controllable facial animations.
>The weights and code are being organized, and we will make them public as soon as possible.
I was really confused since I thought I had already read this paper, but it turns out there was another one that I had a link to (including their implementation) in the links rentry.
Bizarrely, even though both papers are Chinese, the AniTalker one does not even mention the AniPortrait paper or compare against it. Anyway, it has face cloning ability for anyone interested.
for you guys specifically, I think a new model or a finetune would need to be done with pony facial motion data
File: 649480.png (303 KB, 2090x2350)
march of the pigs
Any language model is just computing a probability distribution for the next token; a token is a word or part of a word. The context is all the previous tokens. You take the model's prediction, sample the next token from the predicted probability distribution (not necessarily the most likely one), add it to the initial sequence and refeed it to the model, repeating the process until you have generated enough tokens, or stopping when the model outputs a special end-of-sequence token. You can manipulate the context any way you want, inserting information that might be useful for the model, such as surrounding info; for corporate cloud models you don't get that option. There are also multimodal models trained together with, say, an image description model, so that the image description outputs tokens which are inserted into the model's context; they aren't words, yet they are understood by the model. This is how GPT-4 works.
If the context has significantly more tokens than the model was trained with, the predictions will be shit. You can indeed try to summarize the most important points and insert them at the beginning of the context.
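The sample-append-refeed loop described above, as a runnable toy. The "model" here is a fake stand-in mapping context to a distribution over a four-word vocabulary; a real network would replace `toy_model`, but the loop structure is the same:

```python
import random

VOCAB = ["the", "mare", "trots", "<eos>"]

def toy_model(context):
    """Stand-in for a neural net: returns P(next token | context)."""
    if context and context[-1] == "mare":
        return [0.0, 0.0, 0.9, 0.1]   # after "mare", probably "trots"
    return [0.4, 0.4, 0.1, 0.1]

def generate(context, max_tokens=20, seed=0):
    rng = random.Random(seed)
    context = list(context)
    for _ in range(max_tokens):
        probs = toy_model(context)
        # Sample from the distribution (not necessarily the argmax),
        # append, and refeed the extended sequence to the model.
        token = rng.choices(VOCAB, weights=probs)[0]
        if token == "<eos>":
            break
        context.append(token)
    return context
```

Context manipulation is then just editing the `context` list before the next call, which is exactly what cloud APIs don't let you do arbitrarily.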
>made with so-vits 5.0
rvc v3 better catch up soon this is amazing
SFX and Music folder.
vul - I'm trying to address this post from Synthbot that I previously forgot to sort out >>41015670.
He's asked to re-export some of the sorted audios from S1 and S2. For some reason I'm now getting errors whenever I try to open saved projects in the PonySorter, both the most recent version and previous versions before EQG became a factor. The common error reported is that it can't find episodes in the labels index, though I'm sure that the supplied index JSON file contains the required data. It's been a while since I last opened the PonySorter so perhaps there's something I've just forgotten that I need to do.

Here are the save files I created for S1 and S2 - can you see if you can get them to open on your end, and then perhaps find what's going wrong?
honestly, it could've been made with RVC too. I don't quite remember. I was trying a lot of different settings on haysay and I don't know which result I ended up saving.
Does your config index_file point to episodes_labels_index.json?
Yeah it does, which is mainly why I have no idea what's going wrong with it.
Can you post the specific error outputs?
After trying to load the first savefile >>41081961
Can I see the labels index file as well
I can't reproduce this problem on my end but this labels index is what I used

Also, reading the post chain further: is this about the dialogue from those episodes, or SFX and Music?
Thank you!
LLAniMAtion: LLAMA Driven Gesture Animation
>Co-speech gesturing is an important modality in conversation, providing context and social cues. In character animation, appropriate and synchronised gestures add realism, and can make interactive agents more engaging. Historically, methods for automatically generating gestures were predominantly audio-driven, exploiting the prosodic and speech-related content that is encoded in the audio signal. In this paper we instead experiment with using LLM features for gesture generation that are extracted from text using LLAMA2. We compare against audio features, and explore combining the two modalities in both objective tests and a user study. Surprisingly, our results show that LLAMA2 features on their own perform significantly better than audio features and that including both modalities yields no significant difference to using LLAMA2 features in isolation. We demonstrate that the LLAMA2 based model can generate both beat and semantic gestures without any audio input, suggesting LLMs can provide rich encodings that are well suited for gesture generation.
again, another thing you'd need to train a new model for with a pony dataset, but it could be interesting for anyone wanting to animate using generated voice. maybe worth emailing the writers to see if they'd post their code/weights
File: 1715668337046185.png (484 KB, 1500x793)
hypothetically speaking, is it possible to train a model while using it?
Like let's say you have a past conversation you had with it, and it's getting too long for the context window
now my gut instinct says 'duh no fuckin way can you train a model while also using it or holding a conversation with it' but i figured i might as well ask
If it is impossible to train and use at the same time then why is that?
duh no fuckin way can you train a model while also using it or holding a conversation with it
>Is it possible to train a model while using it?
Your sessions are saved inside text files, so you could train a model on them afterward if you want. But training requires lots and lots of data.
>It's getting too long for the context window
It's not the best, but you can use solutions like summarization or lorebooks if there are important events you want to keep in memory. You can also configure some part to always be in context no matter what.

Still getting the same error with that index file. It's the exact same filesize as my current one so pretty sure they're the same. No idea why it's not working.

>Dialogue or SFX and Music?
It might be SFX and Music actually, in which case this issue doesn't matter.
Synthbot - I've forgotten the context here, am I re-exporting dialogue or SFX and Music?
From a theoretical standpoint, I don't see why not. I don't know of any LLMs that currently do this, though. Look up "online supervised learning" and "reinforcement learning".
You're re-exporting the labels and audio files for just the SFX and Music for the episodes listed in >>41015670. Those are the only ones with mismatches between their label files and audio files.
Because you need an estimate of the model's performance for reinforcement learning. This means you would have to rate every response yourself. Finetuning a model like this by yourself is insanity. If you want to understand what you are talking about, I'd suggest reading books on the subject.
File: 3098740.jpg (22 KB, 290x292)
Okay, new problem - turns out I only have the dialogue tracks; I didn't hold onto any of the full audio versions with the SFX and Music, presumably for space saving, as I have a LOT of stuff saved. I can't find the original source audio that was used; none of the versions on yayponies seem to line up with the labels, and the old torrent links don't have anyone seeding them.

Here's the original torrent, don't suppose anyone could reseed, or reupload if they have the same audio used from way back when?
File: exampl.jpg (105 KB, 922x435)
if you could automate lore book entries you could essentially create a sort of memory system
though i feel that there would be issues regarding correct key wording and making the content actually accurate and relevant to the topic at hand which it needs to memorize
im sure with testing you could get decent results though which would inevitably make context exist more so for how much of the lorebook it can recall

although now that i think about it this could cause insane bloat with it recalling even the most mundane of things and absolutely WRECKING the context with nonsense
File: 1536707646075.png (537 KB, 958x958)
furthermore, you would need a system where every time it learns something new about a topic it doesn't overwrite stuff and accidentally remove key parts of the lorebook entry

guess you could see if you could min max stuff in regards to lorebooks like that one guy who did a test and found out you could 'typelikethistosavetheamountoftokensyouhavetouse'
other than that, you would have to monitor how in-depth the automated lorebook entries are to make sure it's all relevant info

fuck i wish i had a powerful graphics card
techno bump
Pumps and a bump
Are there XTTS models of the main six?
Candid question, but why not reduce the description to a small number of words for "lesser known ponies"?
That would allow it to still know the ponies without using too many resources.

Like: "Octavia Melody is a sophisticated cellist grey pony mare"? Still better than nothing, and would keep the data low.
Page 10 bump.
>Long life to the PPP!
Moving from CHAI (Character AI chatbots) to giving robots a mechanism (usable) from which they can function and begin to affect the world requires servo motors (for movement functions) and well, ah...
One motive has changed. I believe that in the future autoforce will be required for all robots.
>What is "Autoforce"
It's a telekinetic leveraging superpower/ability
that uses microwave resonance and only works on metalloids/metals or on specific objects that have been coated with a special coating.
>What is it for?
For getting "realism" into our robotic pones of course
mostly for unicorns since teleportation isn't possible—not unless you're willing to "deconstruct" and reconstruct your robot and transport it and its assorted accessories somewhere and then tap on it like its memory has been vaguely veiled for what "the past 30 minutes or so" and so it "thinks" but only THINKS that it's been teleported but in reality all you did was shut the power off and moved it and then "woke it up" on-time
So giving their horn a shooty-energy weapon or magnetic-beam-thingy works much better
>But why not lasers?
You see, lasers; all cool and everything, not complaining.
But-HEY-imagine you're looking at your waifu (or she's not looking at you, whatever) and sOEMTHING happens (oh no!) :itsover:
bt then
your robot mare comes to the rescue wth her
Lasserrr powwerrr!!!1one
but uh oh!
The laser done highlights your face and then SWA-PTchooo!!!
then yur
eye is
ao o
>So that's why we use microwave weapons?
Heh yes exactly
these are all obvious (and valid) options too
*but they aren't actually the answer we're after here though
based schizoposter
File: Blini.png (638 KB, 811x739)
638 KB
638 KB PNG
Hey /ppp/, I don't know if this is the right place to ask, but I was curious if you guys knew of any sort of AI that could help translate audio to English. Specifically, I wanted to translate this video.
I've heard it's possible, but the video length and the audio quality might pose a problem.
I was just interested in seeing if you guys knew or heard of anything that could help me.
It MAY be possible to use the above code, if you replace the part that refers to the English model with "vosk-model-ru-0.42" from here https://alphacephei.com/vosk/models
You will still need to download the raw audio and chop it up into small pieces before it will be usable by the Vosk transcription code. (I do not guarantee that this will work.)
Alternatively, you can pump up the volume and pull out your phone with that "auto translate spoken language" Google Translate option.
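The chopping step is just arithmetic over timestamps; here's a minimal hypothetical sketch (the 30 s chunk length and 1 s overlap are arbitrary values, not anything Vosk requires), producing spans you could then feed to ffmpeg's -ss/-t to cut the raw audio into small WAVs:

```python
def chunk_spans(total_s: float, chunk_s: float = 30.0, overlap_s: float = 1.0):
    """Return (start, end) second offsets covering total_s seconds of audio.

    Each chunk is at most chunk_s long and overlaps the previous one by
    overlap_s, so a word cut at one chunk boundary survives intact in the
    next piece. The resulting small files are what the Vosk transcription
    code would consume, one at a time.
    """
    spans = []
    start = 0.0
    step = chunk_s - overlap_s
    while start < total_s:
        spans.append((start, min(start + chunk_s, total_s)))
        start += step
    return spans

# Example: a 70-second clip in 30 s chunks with 1 s overlap
# -> [(0.0, 30.0), (29.0, 59.0), (58.0, 70.0)]
```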
The way lorebooks are typically handled is that they don't insert any info unless one of the keywords appears in the context. So the problem, imo, is less "Octavia's context has too many tokens", and more about getting the keywords right. If your automated lorebook uses too-obvious words then you'll get flooded with garbage bloat context, and if the keywords are intentionally very specific to avoid this, then it might not hit them even if the object/concept from the lorebook is being discussed.
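That keyword gating can be sketched in a few lines; this is a hypothetical minimal version (real frontends like SillyTavern add regex keys, secondary keys, scan depth, etc., none of which is modeled here):

```python
import re

def triggered_entries(lorebook: dict, context: str) -> list:
    """Return lore entry texts whose keywords appear in the recent context.

    lorebook maps an entry name to {"keys": [...], "text": "..."}.
    Matching is case-insensitive and whole-word, so a short key doesn't
    fire on unrelated words that merely contain it.
    """
    hits = []
    for entry in lorebook.values():
        for key in entry["keys"]:
            if re.search(r"\b" + re.escape(key) + r"\b", context, re.IGNORECASE):
                hits.append(entry["text"])
                break  # one matching key is enough; don't insert the entry twice
    return hits

# Illustrative entries only, not from any real lorebook
book = {
    "octavia": {"keys": ["Octavia", "Tavi", "cellist"],
                "text": "Octavia Melody is a sophisticated cellist grey pony mare."},
    "vinyl": {"keys": ["Vinyl Scratch", "DJ Pon-3"],
              "text": "Vinyl Scratch is an electronic-music DJ unicorn."},
}
triggered_entries(book, "I heard a cellist playing in Canterlot")
# -> ["Octavia Melody is a sophisticated cellist grey pony mare."]
```

The trade-off from the post lives entirely in the `keys` lists: broad keys fire constantly and bloat the prompt, narrow ones miss paraphrases.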
Not to my knowledge. IIRC I tried to finetune Celestia and Twilight and it didn't fly so good

IDK what the stage of development is on this problem but here is what SillyTavern has to say about smart context
I need a mare to bump my face with her rump
Bump for Rainbow Dash cucking me.
nta but this is pretty interesting, I would love to see someone test it with more weird Japanese references (I'm going to assume it will not make "Jelly Filled Doughnuts"-tier mistakes).
What's 15 cooking? I made my monthly rounds on his site to point and laugh, but noticed that it's blank. Last time, that meant the site was going to be back up within a month or so. What is he pulling out of his ass after 2 years of absence and leeching off of Patreon money?
I don't know, I think he might have just given up. It's hard to know considering he doesn't communicate very often. I really hope he hasn't given up, because I just want to hear a good Derpy voice again. If he has given up, I just really wish he would open source it at least.
i heard a rumor that 15 has been battling lawyers and court cases, like viacom, hbo or hasbro, forced a DMCA to take down the site.
If this is true, I wish he would just say so instead of just going silent
When you end up in legal shit, the first thing your lawyers are gonna advise is for you to shut the fuck up, and sit the fuck down, so you don't end up digging yourself any deeper.
Yeah, but he could have just said there's been an issue with things and he's putting it on pause or something, instead of saying it will be out next week and then disappearing.
He's fapping to dashfag's ass.
Or, he's only using his website as a stepping stone for his Computer Science Ph.D., but the university stopped funding him for being a failure.
Datacenter isn't exactly cheap.

If he still lurks, he doesn't say so...
Exploring speech style spaces with language models: Emotional TTS without emotion labels
>Many frameworks for emotional text-to-speech (E-TTS) rely on human-annotated emotion labels that are often inaccurate and difficult to obtain. Learning emotional prosody implicitly presents a tough challenge due to the subjective nature of emotions. In this study, we propose a novel approach that leverages text awareness to acquire emotional styles without the need for explicit emotion labels or text prompts. We present TEMOTTS, a two-stage framework for E-TTS that is trained without emotion labels and is capable of inference without auxiliary inputs. Our proposed method performs knowledge transfer between the linguistic space learned by BERT and the emotional style space constructed by global style tokens. Our experimental results demonstrate the effectiveness of our proposed framework, showcasing improvements in emotional accuracy and naturalness. This is one of the first studies to leverage the emotional correlation between spoken content and expressive delivery for emotional TTS.
no model but you wouldn't want it anyway (trained on indian english it seems). for that one anon (if he's even still around) who seems willing to fuck around with training his own models
>indian english
>we propose a novel approach that leverages text awareness to acquire emotional styles without the need for explicit emotion labels
so this is basically ppp ngrok from 2020? At least it's nice to see some other (more mainstream) groups dipping their toes into emotional tts, if only there was an easy way to get that stuff working with rvc + sovits or some other small-scale tts models.
Thank you! I'll check these out.
File: 1685784935422320.jpg (89 KB, 637x358)
89 KB
Mares, if you can keep them.
Does anyone know if Bark is getting any updates in its architecture? It would be nice if there was some non-shill alternative for AI song-maker models.
File: Derp.png (2.58 MB, 2500x1800)
2.58 MB
2.58 MB PNG
Posting about this again as it's important for future-proofing the voice dataset - does anyone know which source was used for the show audio and/or still have them saved? It's not clear to me from reading the archives and the original Mega/torrents are dead.

I still have all the original dialogue tracks locally and so can re-generate the voice dataset from the label files if that's ever necessary for whatever reason, but no one else will be able to unless they can get hold of the same tracks, which seemingly aren't sourced from anything on yayponies. I'll upload these dialogue tracks for safekeeping but would like the full 5.1 mixes to go with them.
That's slightly worrying. Are there other parts of the project that are living only on a few computers?
I still have them saved. The source was Netflix. I believe I uploaded a copy to Rome's archive.
Don't think so, at least not for the voice dataset. The master file has several backups, just seems to be the source audio that has the issue.

Found this:

Which contains the dialogue tracks (not full 5.1) for S1-8 and the full 5.1 audio for S1 only. The S1 files appear to line up correctly with the label files so I'll grab those, but that still leaves S2-9 unaccounted for. Am I missing something else?
If the others aren't elsewhere in Rome, could you re-upload whatever you still have saved? Only need to keep it up for as long as it takes for me to clone it. Should then make sure Rome has all of this as well.
Sure, you can send it over. I stopped mirroring these files a long time ago because it was a chore working with Mega.
Also I think you mean this, maybe:
I just didn't get around to sorting the uploads into the main archive
>it was a chore working with Mega.
Fair enough, I'll put everything in one place once I've got it all assembled and should be able to do a GDrive version for you as well.

Looks good for S1-6, syncs with the label files - but S7-9 are 2.0, not 5.1. Anywhere else they may be stored?
If it isn't static and files get added, then an upload location that is easily syncable is always the best solution
That's all I'm aware of now
Looking at my archives, I don't think we ever got S9 from Netflix. I'm thinking that Netflix just didn't have S9 when we were doing things. I just have the iTunes rips for S9. I guess I'll see about uploading things again somewhere. It's about 270gb so it'll take a while. Maybe I'll upload it to Rome so that it'll be a bit more permanent.
I have two old archives in my folder:
2019 07 08: s1-2-3-4.zip (~1.4Gb)
2019 06 22: Season 9 Audio 1-12.rar (~1.4Gb)
Is it of any interest?
i found some pastebin accounts with greens that dont seem to be archived on ponepaste, would any anon be interested in helping archive them? i feel bad for encountering them now because im about to leave home for several days.
Uploading here:

Will be a while before it's done.
Maybe, if anything from >>41100938 is still missing. I'll review it when it's ready and let you know.

Thanks for the help everyone.
File: OIG3.jPjOpAz_.LtmjoqWlPaw.jpg (140 KB, 1024x1024)
140 KB
140 KB JPG
dreaming of electronic mares
>EQG and G5 are not welcome.

I've been meaning to ask but wasn't that already done by other people anyway?
Maybe, but it's not what this thread is about.
Has anyone dug through the DHX leaks for the song pro tools stems?
Alright, I think it's all up. Let me know if you think anything's missing.
I'm guessing not. I got curious after listening to some music demos and I'm downloading it now (will take a while). At minimum there seem to be a lot of takes for the EQG songs which have vocal lines that are doubled/layered in the version that we have in master file 2.

Ex: https://files.catbox.moe/xskary.mp3
Man, I still can't believe that a studio fuck up could turn into such an amazing goldmine of resources.
File: Untitled.png (1.44 MB, 1045x1361)
1.44 MB
1.44 MB PNG
Pandora: Towards General World Model with Natural Language Actions and Video States
World models simulate future states of the world in response to different actions. They facilitate interactive content creation and provide a foundation for grounded, long-horizon reasoning. Current foundation models do not fully meet the capabilities of general world models: large language models (LLMs) are constrained by their reliance on language modality and their limited understanding of the physical world, while video models lack interactive action control over the world simulations. This paper makes a step towards building a general world model by introducing Pandora, a hybrid autoregressive-diffusion model that simulates world states by generating videos and allows real-time control with free-text actions. Pandora achieves domain generality, video consistency, and controllability through large-scale pretraining and instruction tuning. Crucially, Pandora bypasses the cost of training-from-scratch by integrating a pretrained LLM (7B) and a pretrained video model, requiring only additional lightweight finetuning. We illustrate extensive outputs by Pandora across diverse domains (indoor/outdoor, natural/urban, human/robot, 2D/3D, etc.). The results indicate great potential of building stronger general world models with larger-scale training.
check out the website for examples of how it works. image to video with text guiding. weights are up, though they covered their ass by asking you to agree not to do bad stuff with it first
File: Untitled.png (90 KB, 1042x262)
90 KB
Self-Taught Recognizer: Toward Unsupervised Adaptation for Speech Foundation Models
>We propose an unsupervised adaptation framework, Self-TAught Recognizer (STAR), which leverages unlabeled data to enhance the robustness of automatic speech recognition (ASR) systems in diverse target domains, such as noise and accents. STAR is developed for prevalent speech foundation models based on Transformer-related architecture with auto-regressive decoding (e.g., Whisper, Canary). Specifically, we propose a novel indicator that empirically integrates step-wise information during decoding to assess the token-level quality of pseudo labels without ground truth, thereby guiding model updates for effective unsupervised adaptation. Experimental results show that STAR achieves an average of 13.5% relative reduction in word error rate across 14 target domains, and it sometimes even approaches the upper-bound performance of supervised adaptation. Surprisingly, we also observe that STAR prevents the adapted model from the common catastrophic forgetting problem without recalling source-domain data. Furthermore, STAR exhibits high data efficiency that only requires less than one-hour unlabeled data, and seamless generality to alternative large speech models and speech translation tasks.
seems useful considering 1 hour of unlabeled data can lead to such improvements. so if you have a custom case where you need a specific accent understood, then this method seems to be the way to do it
Internal song charts!
Requesting your choice for a voice for the following
>The first time I sniffed a filly, she was two miles from my house. The trace amount of intoxicating filly pheromones induced a lust for preadolescent filly flesh with such terrifying intensity I immediately began a dead sprint directly towards the origin point. Basically my reaction was like the rage virus from 28 Days Later except also with a boner.
>Within four minutes I set upon her previously unmolested body like a rabid pitbull on an unguarded toddler and within four minutes and forty-five seconds my enraged member began hemorrhaging volumes of glutinous ejaculate at forces sufficient to immediately terminate her ovarian virginity on both sides of her bruised, overinflated uterus.
>The cavitation bubble in my prostate formed by the instantaneous vacuum left in the wake of my ejected semen collapsed, creating a shockwave that left me with two bloodshot eyes, temporarily deaf in one ear, and bad hiccups. The look on her sister's face was priceless.
Hey BGM, I was going through my music library and I rediscovered this dumb song and I thought of pony, Starlight and Glimmer, and you. If you're interested anyway.
No problem.
I am not as active as before, but I will lurk and give it if needed.

Yeah, post the links.
But if I recall correctly, there was a thread dedicated to saving pony content, no?
I'll take what I can get
Some stuff aged well!
Some other, well... Not so much.
Mare em up.
You faggots convinced me to start my own AI business. I promise when I make enough money, I'll come back, and if you guys decide to go mainstream, I'll use the resources I acquired to help.
Don't know if it means anything, but you have my word I'll be back someday.

Till then, thank you faggots for everything.
Well, good luck.
And if you succeed, we will be glad to hear about it!
Also, put a "Thanks PPPV" somewhere on your website...
Hi Haysay maintainer!
As asked, I'm posting the error here.
I tried my best to remove "personal info" (?)

Traceback (most recent call last):
  File "/hay_say_ui/generator.py", line 25, in generate_and_prepare_postprocessed_display
    generate(cache_type, gpu_id, session_data, selected_architectures, user_text,
  File "/hay_say_ui/generator.py", line 55, in generate
    hash_output = process(cache, user_text, hash_preprocessed, selected_tab_object, relevant_inputs,
  File "/hay_say_ui/generator.py", line 97, in process
    send_payload(payload, host, port)
  File "/hay_say_ui/generator.py", line 134, in send_payload
    raise Exception(message)
Exception: An error occurred while generating the output:
Traceback (most recent call last):
  File "/so_vits_svc_4_server/main.py", line 41, in generate
    execute_program(input_filename_sans_extension, character, pitch_shift, predict_pitch, slice_length,
  File "/so_vits_svc_4_server/main.py", line 210, in execute_program
    model_path, config_path = get_model_and_config_paths(character)
  File "/so_vits_svc_4_server/main.py", line 127, in get_model_and_config_paths
    model_filename, config_filename = get_model_and_config_filenames(character_dir)
  File "/so_vits_svc_4_server/main.py", line 133, in get_model_and_config_filenames
    return get_model_filename(character_dir), get_config_filename(character_dir)
  File "/so_vits_svc_4_server/main.py", line 138, in get_config_filename
    raise Exception('Config file not found! Expecting a file with the name config.json in ' + character_dir)
Exception: Config file not found! Expecting a file with the name config.json in /models/so_vits_svc_4/characters/Applejack (singing, PS1)

Payload: {"Inputs": {"User Text": "Those horsefuckers will give you a fright[shortened to fit in one message]", "User Audio": "22b52ca95fb9cf1a43cd"}, "Options": {"Architecture": "so_vits_svc_4", "Character": "Applejack (singing, PS1)", "Pitch Shift": 0, "Predict Pitch": false, "Slice Length": 0.0, "Cross-Fade Length": 0.0, "Character Likeness": 0.0, "Reduce Hoarseness": false, "Apply nsf_hifigan": false, "Noise Scale": 0.4}, "Output File": "dbcba8199d50888b393a", "GPU ID": "", "Session ID": "ed4b3e9234434983ae8c758baeac134b"}

Input Audio Dir Listing: .git, .github, .gitignore, LICENSE, README.md, cluster, configs, configs_template, data_utils.py, dataset_raw, filelists, flask_api.py, flask_api_full_song.py, hubert, inference, inference_main.py, logs, models.py, modules, onnx_export.py, onnx_export_speaker_mix.py, onnxexport, preprocess_flist_config.py, preprocess_hubert_f0.py, pretrain, raw, requirements.txt, requirements_win.txt, resample.py, sovits4_for_colab.ipynb, train.py, utils.py, vdecoder, wav_upload.py, webUI.py, __pycache__, results
Wanted to ask, is there anything, and I mean ANYTHING, out there that allows you to generate an accompaniment based on a vocal melody? I'm not talking about Suno or Udio, I mean like an AI system where it analyzes the vocal melody audio and makes an instrumental based off of it. I know it's possible, I saw a paper of something called SingSong that does that, I found it last year, but they haven't said anything about it since. I'm looking for something in that same vein, sort of...
Thank you for bringing this error to my attention. Apparently, a lot of the multi-speaker models had broken links to their config and pytorch state dictionary files. I wonder how long they've been like that... Nevertheless, I believe I have fixed them all now.
So after a fairly exhaustive review, here are the main things I found that don't seem to be included in master files yet.

>Raw recordings, including harmonies and unused takes, for these songs:
A Kirin Tale
The Magic Inside
Equestria The Land I Love
Get The Show On The Road
Cheer You On
Find The Magic
Run And Break Free
True Original
Let It Rain
So Much More
Five To Nine
I'm On A Yacht
All Good

>Stems for songs not included in our dataset?
Shake Things Up
Past Is Not Today
Friendship Through The Ages
Life Is A Runway

>Some broadcast stems that have a few separated mono/reverb-less lines that are an improvement over our version:
Friendship U (has isolated Flim/Flam monos)
Fit Right In (isolated Rarity/Yona monos for ending)
One More Day (has reverb-less stems, still lots of crowd vocals)
Lotta Little Things (there's an extra Luna track)
Your Heart Is In Two Places (has isolated Sweetie Belle/Scootaloo monos; barely matters because they're alternating most of the time)
We've Come So Far (has reverb-less stems)

>Other data notes
Almost all of the lines for Coinky-Dink World have vocal doubling but no noise label

>Other neat items that don't have much real utility
There are stems for Shannon Chan-Kent's demo of A Kirin Tale. Doesn't sound much like Pinkie though.
These are all the non-special source dialogue tracks, and they all look fine from a random sampling, so it's good that we have those uploaded. Still missing the 5.1's for S7-9 from >>41100446, which are the only things needed now to complete the set.

Could we all take one more look at our archives to check for 5.1 audio mixes for S7-9? It may well be that those were never uploaded for anything and so aren't *technically* missing, but it would still be best to have them for completeness' sake. Meanwhile, I'll work on putting this all together in one package for simpler archiving.

If your S9 audios are 5.1, could you upload those please?
Me too, and I know one other person doing the same. If you're in the SF Bay Area, I can try to get us all together. Or we can meet up at Mare Fair.
(16 minutes of Autumn Blaze singing.)
There are 5.1 mixes for S7-9; it's just that Netflix didn't have 5.1 for those seasons, if I remember right. The iTunes sources are 5.1, as are the TrollHD ones.
Same anon, hoping this isn't a lost cause, because I'm just curious.
I'll check that this afternoon, after work
Here's the material roughly separated by song and character. (Some of the recordings seem to be "mislabeled"/using a different track for a VA--sometimes Ashleigh does her AJ voice on a Rainbow track, and sometimes Fluttershy shows up on a Pinkie track). Did a bit of dynamic splitting in REAPER to cut out some empty space.
File: large.png (55 KB, 1024x1024)
55 KB
Heyyyy haven't been here in forever. I was BFDIAnon but I don't really watch BFDI anymore and in hindsight my immediate jump to namefaggotry was cringe as fuck.
Anyways not sure how often Poopsikins still comes around here, but I trained a StyleTTS2 model on his Adventure Time dataset, and it works with the GUI by effusiveperiscope. It says 29 epochs in the filename but it's actually somewhere between 59 and 69, because Colab crashed or timed out a few times as I was training. I also added a few Fionna and Cake characters that weren't in the dataset before. https://mega.nz/file/aBkwwJrI#beuyzKysMtZXp4yNQoW6xBpVrI4kDtVS1zy-Y9Hwxgs
Character list: https://pastebin.com/UGVLvYXd
Wow, that's a name I haven't heard in a long time!
Welcome back.
File: Rainicorn.png (395 KB, 767x1300)
395 KB
395 KB PNG
I'm still around, I actually trained a few AT voices last year locally, had a blast running songs through Marceline's voice.
I lost the original audio source files (mostly ripped audio from games and animatics) I had on my hard drive so I need to recreate/improve the dataset one day. Multiversus has a few AT characters I can rip new lines from.
I never would've thought someone would actually add to the janky dataset I made. Thank you, and welcome back!

Here's a test vid I put together when I first trained Jake last year using Sovits:
So what kind of preserves do the ponies make
File: 1687895489046667.png (263 KB, 1000x1000)
263 KB
263 KB PNG
Oh you know the usual, jams, marmalade, chutneys and such.
More like maremalade
Mares like maremalade
Mares mare maremalade
Mares mare maremamare
File: 1692630539692137.png (310 KB, 1229x1200)
310 KB
310 KB PNG
I wonder what Hayjuice would taste like.
I mean, oat milk is not bad, so a hay drink could be... interesting?
just a question fags, from my dumbass head, does anyone here know the tool where you can use a mic to talk to ponies, roleplay with them and stuff and they would reply back to you? i would like to be a complete degen and talk to pinkie pie and other ponies and, being a perv fag, fap over it
Not 100% sure, but anons in the chag thread may know more about it. I haven't played around with SillyTavern, but a "speech to text" program should exist, and I do believe some kind of TTS output reader addon (whether this can be connected to StyleTTS for getting pony voices, or something else, is a different problem) should be possible as well.
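The chain being described is just three stages glued in a loop. Here's a hypothetical skeleton where every stage is a stub (none of these functions are real APIs; in practice you'd swap in a Whisper-style ASR, an LLM chat backend, and a pony TTS such as StyleTTS2 or an RVC pass):

```python
# Hypothetical glue code: each stage is a stub standing in for a real
# component. Only the orchestration pattern is the point here.

def transcribe(audio: bytes) -> str:
    """ASR stage (e.g. Whisper). Stub returns a canned transcript."""
    return "Hi Pinkie, how are you?"

def chat_reply(history: list, user_text: str) -> str:
    """LLM stage. Appends the turn to history and returns a canned reply."""
    history.append({"role": "user", "content": user_text})
    reply = "Okie dokie lokie! I'm super duper!"
    history.append({"role": "assistant", "content": reply})
    return reply

def synthesize(text: str) -> bytes:
    """TTS stage (e.g. StyleTTS2). Stub returns fake WAV bytes."""
    return b"WAV-bytes-for:" + text.encode()

def talk_once(mic_audio: bytes, history: list) -> bytes:
    """One round trip: speech in -> text -> reply text -> speech out."""
    user_text = transcribe(mic_audio)
    reply = chat_reply(history, user_text)
    return synthesize(reply)
```

Running `talk_once` in a loop while keeping `history` around is the whole "talk to your waifu" pipeline; the hard part is latency, since all three real models have to run back to back per turn.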
There was a project like that some time ago, but it required some (a lot of) knowledge.
Not something turnkey...
>create 2~5 second animations by providing two image frames as input
obviously this still looks a bit rough as shit, however it means the progress in proper ai animation has some decent foundation to build on.
damn, that's impressive!
File: Spoiler Image (73 KB, 1318x276)
73 KB
Does anybody happen to have the old vocaroo audio of this shitpost? Sorry for the non-pony request, but the RVC threads are pretty dead on /v/.
Turns out it's not dubbed with RVC but by some decent-sounding impersonators
nta, but damn, they are good
File: 3059014.png (553 KB, 5000x5000)
553 KB
553 KB PNG
Can't wait to hear Anons singing like a Disney princess through Fluttershy, then having an orchestra swell to meet them.
Do I have broccoli between my teeth or something?
A musical ControlNet would be pretty great; making a pony musical would be a dream come true
owari da
bumpu threadu!
how mare we gone
File: Imagine (2).png (470 KB, 1139x1079)
470 KB
470 KB PNG
Not gone yet. Mare is there.
A Survey of Deep Learning Audio Generation Methods
>This article presents a review of typical techniques used in three distinct aspects of deep learning model development for audio generation. In the first part of the article, we provide an explanation of audio representations, beginning with the fundamental audio waveform. We then progress to the frequency domain, with an emphasis on the attributes of human hearing, and finally introduce a relatively recent development. The main part of the article focuses on explaining basic and extended deep learning architecture variants, along with their practical applications in the field of audio generation. The following architectures are addressed: 1) Autoencoders 2) Generative adversarial networks 3) Normalizing flows 4) Transformer networks 5) Diffusion models. Lastly, we will examine four distinct evaluation metrics that are commonly employed in audio generation. This article aims to offer novice readers and beginners in the field a comprehensive understanding of the current state of the art in audio generation methods as well as relevant studies that can be explored for future research.
just a survey but might be interesting for anyone wishing to know more
Accent Conversion in Text-To-Speech Using Multi-Level VAE and Adversarial Training
>With rapid globalization, the need to build inclusive and representative speech technology cannot be overstated. Accent is an important aspect of speech that needs to be taken into consideration while building inclusive speech synthesizers. Inclusive speech technology aims to erase any biases towards specific groups, such as people of certain accent. We note that state-of-the-art Text-to-Speech (TTS) systems may currently not be suitable for all people, regardless of their background, as they are designed to generate high-quality voices without focusing on accent. In this paper, we propose a TTS model that utilizes a Multi-Level Variational Autoencoder with adversarial learning to address accented speech synthesis and conversion in TTS, with a vision for more inclusive systems in the future. We evaluate the performance through both objective metrics and subjective listening tests. The results show an improvement in accent conversion ability compared to the baseline.
>This repo combines a Tacotron2 model with a ML-VAE and adversarial learning to target accent conversion in TTS settings (pick a speaker A with and assign them accent B).
probably not useful out of the box but adversarial learning to get better accent conversion might be a useful technique to have
ControlSpeech: Towards Simultaneous Zero-shot Speaker Cloning and Zero-shot Language Style Control With Decoupled Codec
>In this paper, we present ControlSpeech, a text-to-speech (TTS) system capable of fully cloning the speaker's voice and enabling arbitrary control and adjustment of speaking style, merely based on a few seconds of audio prompt and a simple textual style description prompt. Prior zero-shot TTS models and controllable TTS models either could only mimic the speaker's voice without further control and adjustment capabilities or were unrelated to speaker-specific voice generation. Therefore, ControlSpeech focuses on a more challenging new task-a TTS system with controllable timbre, content, and style at the same time. ControlSpeech takes speech prompts, content prompts, and style prompts as inputs and utilizes bidirectional attention and mask-based parallel decoding to capture corresponding codec representations in a discrete decoupling codec space. Moreover, we discovered the issue of text style controllability in a many-to-many mapping fashion and proposed the Style Mixture Semantic Density (SMSD) model to resolve this problem. SMSD module which is based on Gaussian mixture density networks, is designed to enhance the fine-grained partitioning and sampling capabilities of style semantic information and generate speech with more diverse styles. In terms of experiments, we make available a controllable model toolkit called ControlToolkit with a new style controllable dataset, some replicated baseline models and propose new metrics to evaluate both the control capability and the quality of generated audio in ControlSpeech. We hope that ControlSpeech can establish the next foundation paradigm of controllable speech synthesis.
weights on a google drive. sounds okay? authors at least have the right spirit fully aiming for voice cloning
File: themoreyouknow.jpg (144 KB, 561x370)
144 KB
144 KB JPG
It's interesting to see new approaches constantly being worked on.
>ChatTTS is a powerful text-to-speech system. However, it is very important to utilize this technology responsibly and ethically. To limit the use of ChatTTS, we added a small amount of high-frequency noise during the training of the 40,000-hour model, and compressed the audio quality as much as possible using MP3 format, to prevent malicious actors from potentially using it for criminal purposes.
Looks like Stability just released Stable Audio's weights.

There's a lot we could potentially do with this:
>Infinite digital horsey noises
>Experiment with character voices to create SD voice models of them (Even though it's not intended for vocals, worth trying anyway)
>Feed it all of FiM's music then generate new pony music
>Feed it all of FiM's sound effects and generate a bunch more
>Train it to make lewd horse noises so BGM can make another Pony Zone

From my earlier testing on https://www.stableaudio.com/ it had quite a few limitations, but seeing how much Purplesmart improved upon base SD, maybe we can do something similar here?
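For anons wanting to poke at the released weights locally, diffusers ships a StableAudioPipeline. A minimal sketch; the model id is the Stable Audio Open release, but treat the exact argument names (`audio_end_in_s`, `num_inference_steps`) as assumptions to verify against your installed diffusers version:

```python
def build_request(prompt: str, seconds: float = 10.0, steps: int = 100) -> dict:
    """Collect generation settings; Stable Audio Open tops out around 47 s."""
    if not 0 < seconds <= 47:
        raise ValueError("Stable Audio Open generates up to ~47 seconds")
    return {"prompt": prompt,
            "audio_end_in_s": seconds,
            "num_inference_steps": steps}

def generate(prompt: str, out_path: str = "mare.wav"):
    """Run the pipeline on GPU and write a WAV. Needs diffusers/torch/soundfile
    installed and ~5GB of weights downloaded on first run."""
    # Heavy imports kept inside so build_request stays usable offline.
    import torch
    import soundfile as sf
    from diffusers import StableAudioPipeline
    pipe = StableAudioPipeline.from_pretrained(
        "stabilityai/stable-audio-open-1.0",
        torch_dtype=torch.float16).to("cuda")
    audio = pipe(**build_request(prompt)).audios[0]
    sf.write(out_path, audio.T.float().cpu().numpy(), pipe.vae.sampling_rate)
```

Something like `generate("gentle hoofsteps on a dirt road, ambient birdsong")` would cover the "infinite digital horsey noises" idea; the finetuning-on-FiM-music ideas need the separate open-sourced training code.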
>Open-source VQ encoder and Lora training code
>Open-source the 40k hour version with multi-emotion control
ok, there is potential here.
fucking finally, the model is 5GB large, so hopefully once someone makes an auto1111-like UI for it (and some dumbdumb-friendly training process) we will get so much fucking mare music.
An Independence-promoting Loss for Music Generation with Language Models
>Music generation schemes using language modeling rely on a vocabulary of audio tokens, generally provided as codes in a discrete latent space learnt by an auto-encoder. Multi-stage quantizers are often employed to produce these tokens, therefore the decoding strategy used for token prediction must be adapted to account for multiple codebooks: either it should model the joint distribution over all codebooks, or fit the product of the codebook marginal distributions. Modelling the joint distribution requires a costly increase in the number of auto-regressive steps, while fitting the product of the marginals yields an inexact model unless the codebooks are mutually independent. In this work, we introduce an independence-promoting loss to regularize the auto-encoder used as the tokenizer in language models for music generation. The proposed loss is a proxy for mutual information based on the maximum mean discrepancy principle, applied in reproducible kernel Hilbert spaces. Our criterion is simple to implement and train, and it is generalizable to other multi-stream codecs. We show that it reduces the statistical dependence between codebooks during auto-encoding. This leads to an increase in the generated music quality when modelling the product of the marginal distributions, while generating audio much faster than the joint distribution model.
>We aim to release the weights of our 32kHz EnCodec-MMD soon, please bear with us.
give a listen to the samples. sounds pretty good and a serious upgrade over musicgen. forget if you guys do this over here too
This is interesting; if the guy figures out how to stop it from randomly adding vocalists, it could be pretty useful as well.
That's the one made by Meta, yeah? If their Audiobox AI is anything to go by, it's definitely got horsey data in it.

I look forward to when our AI mares can neigh with their own individual vocal qualities.
>Page 9
hey HydrusBeta (pretty sure you're the one running haysay.ai, right?), the RVC models on haysay.ai for Nightmare Moon and Rainbow Dash stopped working for me just now. Weird, since they've been working fine for me in the past few days
here's the error it gave me https://pomf2.lain.la/f/51gvt0ij.txt
I was still able to use sovits 5 fine, but it just doesn't give the result I want for speaking text
Thanks for hosting haysay by the way!! I've gotten a lot of mileage out of it, excited to show off the stuff we've been working on
It should be fixed now. Give it another try. I do not know how, but a .json file used internally by RVC somehow got wiped clean and that prevented RVC models from loading. I am pretty sure I have run into this issue once before and it is very strange.
It makes me really happy to hear that you are getting a lot of use out of Hay Say!
Can't confirm the validity of any of this but people are chatting about 15 and the site, for anyone curious >>41134829
File: rarity drink.jpg (71 KB, 900x712)
71 KB
Different Anon, but I absolutely love Hay Say! I'm currently doing some finishing touches on a Rarity cover.
This is a response to the Anon from some threads ago asking if there is currently a way to talk to the computer and have it respond in their waifu's voice. The scripts should get you as close as possible to that idea. I haven't messed around with SillyTavern, but I would imagine it would require a semi-beefy GPU to run both a text AI model and RVC at the same time.
