/g/ - Technology


File: mqdefault[1].jpg (7 KB, 320x180)
I wasn't done talking about ace step edition.

>What is this?

A local, open-weights music generator, like Suno and Udio.

>Original repo (includes LoRA training)
https://github.com/ace-step/ACE-Step-1.5
>Comfyui guide
https://docs.comfy.org/tutorials/audio/ace-step/ace-step-v1-5
>Suno-like UI
https://github.com/fspecii/ace-step-ui

Share your gens.

Keywords: music gen, local model, song gen, suno, udio, acestep, ace step
>>
Has anyone, and I mean anyone, made a good lokr yet? It feels inferior to LoRA in every way.
>>
https://youtu.be/IjCOM825wk0

https://youtu.be/R6ksf5GSsrk

https://youtu.be/QzddQoCKKss
>>
>>108117091
give this to comfy users
>https://github.com/filliptm/ComfyUI-FL-AceStep-Training
it works
heavy on vram, read issues
bit finicky.

test that works:
up to 1200 epochs and no more.
save every 250.
rank/alpha 16/16
learning rate: 0.0005
batch size 1
gradient accumulation 1
leave everything else as it is in configuration node.

disconnect the llm loader node before training.

use 4 - 10 songs as dataset to test.
adjust max_duration in those nodes; around a 4 minute cutoff per song.

use 1.7B as the llm (it will take a few minutes); 4B will take hours.
check manifest.json; 4B gives almost the same results (retarded?)
llm processing depends on amount of items in dataset.
to test what llm preprocessing does use only 2 songs.
add others to dataset later.
you can modify llm captions by hand.

when loading lora use standard comfy lora loader.
other loaders have minimal or no effect (strange?).

up the strength of lora to 2.0 - 2.5 (it is a must).

add reference audio if you wish via link to ksampler.
denoise 0.85 - 0.95 with euler/simple.
use sft model for cooler results.

will post a generation sample later since i am trying to get a perfect one right now.
lora is not trained enough, twas a test.
comfy does not pick up lora properly as well; hopefully that will be addressed soon enough.
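Why a strength of 2.0 to 2.5 might be needed is easier to reason about with the usual LoRA arithmetic. A minimal sketch, assuming the standard formulation where the applied update is strength × (alpha / rank) × B·A; whether these particular nodes and the comfy loader scale it exactly this way is an assumption, not something the post confirms.

```python
def effective_scale(alpha, rank, strength=1.0):
    """Multiplier applied to B @ A under the standard LoRA formulation."""
    return strength * alpha / rank


def lora_delta(A, B, alpha, rank, strength=1.0):
    """Return the weight update strength * (alpha / rank) * (B @ A).

    A is rank x d_in and B is d_out x rank, both as nested lists.
    """
    scale = effective_scale(alpha, rank, strength)
    d_out, d_in = len(B), len(A[0])
    return [
        [scale * sum(B[i][k] * A[k][j] for k in range(rank)) for j in range(d_in)]
        for i in range(d_out)
    ]


# rank/alpha 16/16 gives a base scale of 1.0, so strength 2.0 doubles the update.
print(effective_scale(16, 16, 2.0))  # 2.0
print(lora_delta([[1.0, 2.0]], [[1.0], [3.0]], alpha=1, rank=1, strength=2.0))
# [[2.0, 4.0], [6.0, 12.0]]
```

With alpha held fixed, raising the rank lowers alpha/rank, which is one reason people tend to raise alpha along with rank (as in the 64/128 and 128/256 settings posted later in this thread).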
>>108117140
what do you fags use, that ryan node?
should link tools next time.
>>
>>108117192
>what do you fags use, that ryan node?
random anime man UI.

https://github.com/sdbds/ACE-Step-1.5-for-windows/tree/qinglong

I actually just took his lokr training script and put it in the official UI. I cannot stand that fake suno AI with the backend hidden.
>>
>>108117140
i think that guy was just a troll; he was just fucking with people and apparently succeeded very well.
>>
>>108117276
Do any of them use the quantized models for reduced vram usage? Quantized for LLM and quantized for PT
>>
>>108117276
>i think that guy was just a troll
He's basically Chinese furk.
>>
>>108117274
ah. ty. i saw it but i did not open it at all since the title made it seem like it was built for a special case, aka windose.
ty.
>took is lokr training script and put it
good stuff
>>
>>108117411
>good stuff
Debatable. I've yet to get a good lokr output. Be aware it doesn't work in comfyui without editing LoRA.py as well.

    # sdk: the model's state dict keys (defined earlier in the same function)
    if isinstance(model, comfy.model_base.ACEStep15):
        for k in sdk:
            if k.startswith("diffusion_model.decoder.") and k.endswith(".weight"):
                # strip prefix and suffix to get the bare layer name
                key_lora = k[len("diffusion_model.decoder."):-len(".weight")]
                key_map["base_model.model.{}".format(key_lora)] = k  # Official base model loras
                key_map["lycoris_{}".format(key_lora.replace(".", "_"))] = k  # LyCORIS-style keys (e.g. lokr)
                key_map["lora_unet_{}".format(key_lora.replace(".", "_"))] = k  # kohya-style keys

    return key_map
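To see what the mapping in that patch does, it can be run outside ComfyUI on a made-up key list. Presumably a lokr file saved with `lycoris_` key names only loads once that line exists, which matches the post. The decoder key name below is hypothetical; real ACE-Step 1.5 checkpoint keys may look different.

```python
PREFIX = "diffusion_model.decoder."
SUFFIX = ".weight"


def build_key_map(state_dict_keys):
    """Map LoRA-file key names to model weight keys, following the patch above."""
    key_map = {}
    for k in state_dict_keys:
        if k.startswith(PREFIX) and k.endswith(SUFFIX):
            key_lora = k[len(PREFIX):-len(SUFFIX)]  # bare layer name
            underscored = key_lora.replace(".", "_")
            key_map["base_model.model.{}".format(key_lora)] = k  # official loras
            key_map["lycoris_{}".format(underscored)] = k  # LyCORIS-style keys
            key_map["lora_unet_{}".format(underscored)] = k  # kohya-style keys
    return key_map


# Hypothetical key names, for illustration only.
keys = ["diffusion_model.decoder.layers.0.self_attn.q_proj.weight", "vae.encoder.bias"]
for lora_key, model_key in build_key_map(keys).items():
    print(lora_key, "->", model_key)
```

Keys outside the decoder (like the VAE one) are simply skipped, so a LoRA entry that names them has nothing to attach to.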
>>
The gradio is broken as all hell on Windows. I tried preprocessing some flacs and it either complains about their names or just can't process them. Never had this issue when the dataset was smaller, but I haven't been able to train anything in the last 5 hours. On the Discord, everyone seems to be having issues with training as well.
>>
>>108117519
Easiest solution is to just use an older commit. Or, ask gemini or something. That seems to be what the repo owners do lol.
>>
>>108117276
Anime man claimed it was good, maybe his implementation is broken or something but it is possible to get somewhat mediocre results. It should theoretically be better than LoRAs though, so it's probably a miscalculation somewhere. There's a LoKR PR open on the official repo as well.
>>
>>108117536
They "work" but the default learning rate is 0.001
Which is insane. I think there are other issues as well, but having to pick through the parameters is time consuming.
>>
>>108117581
Wait, just an addendum to that. I tried 128/128 with an LR of 0.0001. I also set factor to 8 instead of -1.

It's actually producing coherent results now. Might be something there.
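For anyone wondering what `factor` changes: lokr replaces LoRA's low-rank product with a Kronecker product of two smaller matrices, and the factor decides how a layer's dimension is split between them. The sketch below is a simplified illustration under that assumption; the real LyCORIS rules (including exactly what -1 picks, and the optional low-rank decomposition of the larger factor, which is where rank comes in) may differ in detail. The 2048 layer size is hypothetical.

```python
import math


def split_dim(dim, factor=-1):
    """Split dim into (m, n) with m * n == dim and m <= n.

    Simplified assumption: a positive factor that divides dim is used as-is;
    factor=-1 picks the most balanced split (m closest to sqrt(dim)).
    """
    if factor > 0 and dim % factor == 0:
        return tuple(sorted((factor, dim // factor)))
    m = math.isqrt(dim)
    while dim % m:
        m -= 1
    return m, dim // m


def kron(a, b):
    """Kronecker product of two matrices given as nested lists."""
    rows_b, cols_b = len(b), len(b[0])
    return [
        [a[i // rows_b][j // cols_b] * b[i % rows_b][j % cols_b]
         for j in range(len(a[0]) * cols_b)]
        for i in range(len(a) * rows_b)
    ]


# Hypothetical 2048 x 2048 layer: two dense Kronecker factors vs. the full update.
for factor in (-1, 8):
    m, n = split_dim(2048, factor)
    print(factor, (m, n), "params:", m * m + n * n, "vs full:", 2048 * 2048)
```

Under these assumptions factor 8 gives a small 8×8 block and a large 256×256 block, while -1 splits more evenly, so the two settings can behave quite differently even at the same LR.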
>>
>>108117600
Wait never mind. It's still shit.
>>
Oh god I feel like a retard. Using the LM to generate audio codes basically nullifies the LoRA.
>>
But yeah lokr is pure garbo unless I'm proven otherwise. LoRA works just fine.
>>
>>108117444
check and ty

btw those lora nodes i found will attempt to download models into models/acestep, which can be filled with shortcuts if you already have the models

i got 2 good gens today with a lora i trained but i have gotten nothing but slop for the last hour or so.
even a slight change to the lyrics has an impact on this model.

and the lora is not trained enough.
lower rank/alpha even more and up the lr = quicker overfit i presume, i must try that.
>>
>>108117838
>>>/wsg/6090567
got semi-decent one just now
bass and drums rhythm is ok
voice still close to ai slop
guitars = horrible synth

that is with lora at 950 epochs via those nodes (around 1500 steps = 250 epochs per those nodes)
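The numbers in that post (around 1500 steps for 250 epochs) work out to 6 optimizer steps per epoch, which at batch size 1 and gradient accumulation 1 would mean about 6 items in the dataset. A small converter, assuming each step consumes batch_size × grad_accum samples and a final partial batch still takes a step (some trainers drop it instead, which changes the count):

```python
import math


def optimizer_steps(epochs, num_samples, batch_size=1, grad_accum=1):
    """Total optimizer steps, assuming a final partial batch still takes a step."""
    samples_per_step = batch_size * grad_accum
    steps_per_epoch = math.ceil(num_samples / samples_per_step)
    return epochs * steps_per_epoch


print(optimizer_steps(250, 6))   # 1500
print(optimizer_steps(1200, 4))  # 4800
```

This is also why "epochs" and "steps" are not interchangeable when people compare settings across datasets of different sizes.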
>>
My findings so far
>big fat high rank LoRA = generally better
>LR should be very conservative, ~0.00003-0.00006
>GA or batch seems optional; large GA followed by supplemental training with no GA might be cool.
>Letting LM generate codes while using LoRA will basically override the LoRA
>On comfy you may need to crank the strength to 1.5 or 2.
>lokr seems to be a meme.
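On "big fat high rank LoRA": each adapter's trainable parameter count grows linearly with rank, which is one way to see why a higher rank can absorb more variation in a dataset. A quick count, assuming a hypothetical 2048 × 2048 layer (actual ACE-Step 1.5 layer sizes aren't given in this thread):

```python
def lora_params(d_out, d_in, rank):
    """Trainable parameters in one LoRA adapter: B (d_out x rank) plus A (rank x d_in)."""
    return rank * (d_out + d_in)


# Hypothetical 2048 x 2048 projection layer.
full = 2048 * 2048
for rank in (16, 64, 128):
    p = lora_params(2048, 2048, rank)
    print(f"rank {rank:>3}: {p:>9,} params ({p / full:.1%} of the full matrix)")
```

Even rank 128 stays well under the size of the full matrix, so the cost of going big is mostly training time and overfitting risk, hence the low LR.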
>>
>>108117906
>>On comfy you may need to crank the strength to 1.5 or 2.
This is false, it really depends how overfit the lora is.

>>108117906
>>Letting LM generate codes while using LoRA will basically override the LoRA
If you have that off, the melody will be very wild/random, and the main model is only 600M parameters, so...
>>
>>108117938
Do one with the LM codes with the LoRA and without the LoRA. They are almost the same.
>>
Anyone have settings that work well for slow, solo piano instrumentals?
>>
I wish there was a way to force a key change in the middle of a song. Udio excels at this, and it was a common feature back when music wasn't shit (before the mid-2000s). The bias towards modern music unironically ruins the models.
>>
>>108118234
[chorus - key change] doesn't work?
>>
>>108117091
Add this to the next OP:

https://github.com/ryanontheinside/ComfyUI_RyanOnTheInside/tree/main/examples/ace1.5

It contains workflows for Cover and Edit modes

>>108118310
Not really, from my tests
>>
>>108118764
>https://github.com/ryanontheinside/ComfyUI_RyanOnTheInside/tree/main/examples/ace1.5
I couldn't get any of them to work, it only output noise/garbage
>>
File: comfy fail.jpg (682 KB, 1960x2496)
>>108117192
Trying this workflow now and keep getting this error
did I set it up wrong?
>>
Can anyone confirm if negative prompts work?
>>
Just use the fucking gradio, comfy is a mess for now.
>>
Is ace able to make songs in Italian?
https://www.youtube.com/watch?v=-nvX9BKOnTA
>>
Any workflows for lyric rewriting in existing audio samples?
it's for memes
>>
>>108121226
I'd rather eat my own shoes.
>>
>>108120317
Ok I'm back and it seems thread is ded and no one is genning shit because nobody can figure out this training crap

Seems like the llm version they provide is only usable with the clip from the turbo aio safetensors you get from ComfyUI
Then it tries to scan the music directory to auto-label but just fails immediately
>Scanning directory: H:\Music\whatever
>[FL AceStep] Starting auto-labeling...
>Starting auto-labeling...
>Prompt executed in 0.00 seconds
it even tells me the job completed with no problems even though nothing was generated, the output folder is empty
Already tried with every single mix of settings but nothing seems to work
>>
Ace step with a good LoRA is basically the best shit ever and nobody cares.
>>
>>108123488
Genning music is fun but it's not the best thing since sliced bread. I'm not spending hours training loras to make slop sound slightly less sloppy.
>>
>>108123488
Which lora anon? Is there a list somewhere?
>>
>>108123820
Nobody is ever going to share LoRAs for this. You need to make them.
>>
>>108123555
>I'm not spending hours
It's like 2 hours max for a LoRA.
>>
>>108123935
Well, fuck.
>>
>>108123935
>Nobody is ever going to share LoRAs for this
I wish zoomers knew what torrents are. If you try to push torrents these days, you get dead torrents with no seeds unless it's a popular TV show
>>
>>108120317
i posted that first link to nodes
i do not get that error and use the same setup as you
>>108123232
my models are not from comfy
i got the huggingface repo file by file when ace was released
and placed shortcuts to my models into "models/acestep"

and you should train on turbo; that is what the acestep developers say if you read the issues on their github
sft is for full fine-tunes
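The "shortcuts into models/acestep" trick can also be done with real symlinks, so the nodes see the files without downloading a second copy. A sketch with Python's pathlib; the example paths are hypothetical, and whether the nodes follow Windows `.lnk` shortcuts (as opposed to symlinks) is not something this thread confirms. On Windows, creating symlinks may need Developer Mode or admin rights.

```python
from pathlib import Path


def link_models(src_dir, dest_dir):
    """Symlink each entry of src_dir into dest_dir; existing names are left alone.

    Returns the names of the links that were created.
    """
    src, dest = Path(src_dir), Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    created = []
    for item in sorted(src.iterdir()):
        target = dest / item.name
        if target.exists() or target.is_symlink():
            continue  # don't clobber real files or earlier links
        target.symlink_to(item.resolve(), target_is_directory=item.is_dir())
        created.append(item.name)
    return created


# Hypothetical paths: a local Hugging Face download and a ComfyUI install.
# link_models("D:/hf/ACE-Step-1.5", "D:/ComfyUI/models/acestep")
```

Running it twice is harmless: the second call skips everything that is already linked.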
>>
>>108124236
>you should train on turbo
Sure, if you want your music to sound like beep boop midi shit.
>>
>Gradio crashes and training stops because the inbuilt tensor board has too many points and crashes.

Gradio
>>
It's no suno, but it's okay sometimes.
>>
>>108124513
It really depends what you're after, with LoRAs it's a beast desu. Right out of the box? Meh
>>
>>108124513
suno gives me hives, does it occasionally have a good song?
>>
>>108124266
full model does not have what is required for lora, developers said that themselves, check issues

anyways, those nodes i found:
1. updated a few hours ago
2. ran training
3. all audio generated with the new lora code comes out corrupt (will crash your music player)

no idea what is going on atm
>>
>>108124974
Good thing I use comfy where this isn't a problem
>>
https://files.catbox.moe/07jvs0.mp3

What's the model?
>>
>>108125007
>nodes
is comfy

i updated comfy too, so either one could be messing it up
have to test a mix of the training nodes and the old comfy code later since i backed up before updating
>>
>>108123935
1.0 has LoRAs on HF. Why should this version be different? People think others would care, but not as much as they think. A model can't be taken down unless the artist files a complaint. Training on their work is fair use, not illegal. Derivatives made with AI fall under covers/fan art; as long as the LoRA is not overbaked to reproduce songs verbatim, I see nothing wrong.
>>
>>108125190
The difference is these work :^))))
>>
Modern problems: Having an AI song stuck in your head. Nobody else on Earth will ever know of it.
>>
>>108125240
That's literally why I'm using ai. I can make up my own songs, but the thing is they all sound like me-songs, imo pretty nice, but I would prefer a wider gamut.

I look forward to dubbing harmonies and solos onto ai and having it layer with a2a.
>>
together with image gen or video gen, and then I can edit a video. It's highly effective.
>>
:3

https://vocaroo.com/1oKyOdyQD8Hu
>>
>>108125879
This is totally the kind of song that would be playing at 1:30 in the morning in a night club while some bitch is screaming in my ear telling me what drink she wants me to buy.
>>
>>108117192
>>108123232
Ok I tried this shit again, got it to auto label all samples but then failed at Preprocess dataset because it ran out of vram to allocate
12GB 4070 vramlet btw
seems like training isn't for us poorfags yet
>>
>>108126146
Ask Claude (Opus) to vibe code a Ramtorch preprocess script for you. I did this and never again ran into OOM issues when preprocessing.
>>
>>108123232
I wish. I can't get past a numpy.import_array error.
>>
>>108126146
>>108126466
you guise must always visit the github link before using anything from that place
and then
- read home page "README"
- look at issues both opened and closed (open links if you see something interesting to your case)

you have not done that
not enough memory = not enough gas to run the nodes
>>108126152
try that if it works for you
or ask github developer to consider your case
>>
>>108126585
me again
and to reduce vram usage you could try feeding it snippets of audio samples, not full songs.

60-120 second snippets could yield lower vram usage.
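Cutting songs into snippets is mostly bookkeeping: decide the windows, then cut with whatever audio tool you already use. A sketch of the window math; the 90 s default sits inside the 60-120 s range suggested above, and the 30 s minimum for a trailing piece is an arbitrary choice for this sketch. Presumably each snippet then needs lyrics that match just that section.

```python
def snippet_bounds(total_seconds, snippet_seconds=90.0, min_seconds=30.0):
    """Split a track into consecutive (start, end) windows, in seconds.

    A trailing remainder shorter than min_seconds is dropped rather than
    kept as a tiny clip.
    """
    if snippet_seconds <= 0:
        raise ValueError("snippet_seconds must be positive")
    total = float(total_seconds)
    bounds = []
    start = 0.0
    while start < total:
        end = min(start + snippet_seconds, total)
        if end - start >= min_seconds:
            bounds.append((start, end))
        start = end
    return bounds


# A 4-minute song cut into 90 s pieces.
print(snippet_bounds(240))  # [(0.0, 90.0), (90.0, 180.0), (180.0, 240.0)]
```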
>>
https://voca.ro/1eLqt5jUmHWl

What if I just dumped the entire Morrowind soundtrack into a LoRA trainer.
>>
Without LoRA
https://voca.ro/1cRNtD1kUoHk

With Lora.
https://vocaroo.com/1lmlWsJzUrRo
>>
Some more of my anime song LoRA. Just random lyrics and titles made by gemini.

>Overtime Fantasy ~The Demon Lord is my Section Chief~
https://voca.ro/1azCh43Ss7Xu

>UFO in the Tatami Room!? ~Please Don't Eat My Homework, Princess!~
https://voca.ro/145KN2PpfONh

>Absolute Territory! ~Please Buy 10 Copies
https://voca.ro/1dFicT2qic5u
>>
>>108127406
Pretty good, way better than without loras.
How many songs did you use to create a specific lora?
>>
>>108127625
11 tracks of that particular flavor.
>>
>>108127648
I'll have to try.
>>
Imagine having the amazing opportunity of training a computer model to replicate any set of songs from a library of hundreds of years worth of recordings mankind provided, and you decide to train on garbage high pitched tracks that are hardly appreciated by anyone other than mentally ill manchildren
>>
>>108127909
Absolute loser mindset right here
>>
We could share loras here
>>>/t/1374659
I'm training my first lora, let's see how long it takes.
>>
>>108127909
> meanwhile seedream 2.0 and other chinese shit
>>
can someone make me a lora with this style? https://youtu.be/PFl4QKl0WSE
>>
I trained a lora on several adult contemporary tracks from the 1990s and 2000s and it produces superior vocals and bangers much more often
https://voca.ro/1c3mmRMujTkn
https://voca.ro/1fBBblQgbPZq
https://voca.ro/19HYVVQOYnY7
>>
https://voca.ro/1gVJAWf65yyg
>>
Trained 500 steps and I can barely hear any difference, so I won't bother posting the two versions.
https://vocaroo.com/1KYsjYHyJrj0
>>
>>108129554
>Trained 500 steps and I can barely hear any difference
Anon, it's not the "number of steps" that matter, it's the loss values
You have to keep training until you consistently see loss values of ~0.18, and you need to ensure the lora rank is high enough
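The ~0.18 figure is this anon's rule of thumb for their setup, not a universal target; loss scales depend on the trainer and its loss definition. Checking a moving average rather than single steps avoids stopping on one lucky batch. A sketch with a hypothetical 50-step window:

```python
from collections import deque


def first_step_at_target(losses, target=0.18, window=50):
    """Return the first step where the mean of the last `window` losses is <= target.

    Returns None if the smoothed loss never gets there.
    """
    recent = deque(maxlen=window)
    for step, loss in enumerate(losses):
        recent.append(loss)
        if len(recent) == window and sum(recent) / window <= target:
            return step
    return None


# Loss stuck at 0.5 for 100 steps, then dropping to 0.17.
print(first_step_at_target([0.5] * 100 + [0.17] * 60))  # 148
```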
>>
>>108129554
It was epochs, sorry. But thanks, the loss wasn't anywhere near that range.
>>
By the way, I can confirm what the other anon said the other day: training a big LoRA on a large, well-captioned dataset does seem to improve the model's lyric alignment a bit. It even improves the sections where you use tags to indicate instrumental solos, etc.
>>
>>108129554
1000 epochs and rank 32. It shows.
https://vocaroo.com/1koKRZmH6ewZ
>>
:o I woke up this (afternoon) morning and I have Udio at home! WOW!!! I'm Amazing, I have my own UDIO! AND I have a full copy of Hogwarts Legacy. And some cold coffee. I may go to the store and buy a gallon of half & half to celebrate (yes, really, I'm whiter than u). To celebrate, I'll make a black lives happy song 87)

>>108130509
It's neat, is this meant to be Grunge?

You're going to need to alter it a tiny bit on the LLM side, because the 4B was taught to square everything up nice and Pro Tools tight. Real Grunge vocals miss the beat a LOT.
>>
I also want to remind everyone that Jeffrey Epstein is alive and playing Fortnite on a not yet identified account.
>>
>>108131242
Dead Kennedys.
>>
Anyone training LoRAs might want to look at

https://github.com/koda-dernet/Side-Step

It has a bunch of scripts for properly training sft and base along with a lot of other basic improvements that the default training script is lacking.
>>
>>108132321
>low vram support
vramlets, we can still pretend to be proper human beings...
I'm sold
>>
>>108132421
Keyword is pretend.
>>
I never trained a lora before, what do you actually use in the dataset?
I understand for the audio itself, but what about the "captions"? Just all the lyrics? A description of the genres? A description of the instruments used?
>>
>>108129073
>>108129337
>>108130509
acestep has way more potential than I expected, I wonder if some rich anon would make a finetune of it with actual copyrighted music
>>
>>108132740
The gradio has a dataset builder. It will give you a json with the correct FORMAT for building your dataset, but I must stress the llm is actually horrendous at auto captioning. You need to go back and manually replace the captions with ones that fit your dataset; something like gemini can do the job better.
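If you fix captions by hand (or with a bigger model) in a separate list, a small script can merge them back into the builder's JSON. The layout assumed here, a JSON list of records with `audio_path` and `caption` keys, is hypothetical; check the file the dataset builder actually writes and rename the keys to match.

```python
import json
from pathlib import Path


def file_name(path_str):
    """Last path component, accepting both / and \\ separators."""
    return path_str.replace("\\", "/").rsplit("/", 1)[-1]


def apply_captions(records, fixed_captions):
    """Overwrite captions using a {filename: caption} dict; return how many changed.

    Assumes each record is a dict with hypothetical "audio_path" and "caption" keys.
    """
    updated = 0
    for record in records:
        name = file_name(record["audio_path"])
        if name in fixed_captions:
            record["caption"] = fixed_captions[name]
            updated += 1
    return updated


def apply_captions_file(json_path, fixed_captions):
    """Load a JSON list of records, apply the fixes, and write it back."""
    path = Path(json_path)
    records = json.loads(path.read_text(encoding="utf-8"))
    updated = apply_captions(records, fixed_captions)
    path.write_text(json.dumps(records, indent=2, ensure_ascii=False), encoding="utf-8")
    return updated
```

Matching on the bare file name keeps the fixes usable even if the dataset folder moves between Windows and Linux paths.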
>>
>>108132758
Can you share an example of a proper caption of a known song?
>>
File: AS15T__00021_.png (369 KB, 512x512)
New song genning. I have an sd1.4 gen run at the start, to give me a thumbnail (using a trick to get it to run first). sd1.4 is trivial at 512^2, and looks really neat.
>>
>>108132321
Thanks, apparently the vibe coded release software is not the real in house software.
>>
>>108132904
and I guess they won't share the in house stuff?
>>
>>108132758
They actually used Gemini 2.5 to do their tagging for training ace step.
>>
>>108132911
They probably won't. I don't think these song models are actually produced separately as is alleged.

picrel:

like all the ai crew in all of china is like fishing boats coordinating, competing, exploiting, according to bugman rules.
>>
>>108132912
>They actually used Gemini 2.5 to do their tagging for training ace step.

That doesn't change the fact the dinky little llm in the gradio UI is awful at tagging.
>>
>>108132904
The model is far too competently trained compared to the gradio interface for there not to be some fuckery going on. I'm just not sure what that is.
>>
>>108132750
no idea how suno and others managed to avoid that
it would be taken down very quickly
>>
>>108133090
They didn't. They got sued and were basically mafia-forced onto the majors' side:
https://www.yahoo.com/entertainment/music/articles/universal-music-ai-song-generator-112138759.html
>>
https://vocaroo.com/1n2Fh0SFpdN3

Untrimmed. The chef leaves the eyeballs on the fish.

:^)
>>
I am liking AceStep so far, but I still hope Alibaba releases the musicgen model they promised.
>>
>>108133247
The music industry is cartoonishly evil.
>>
>>108133322
Can suno or udio do this?
>>108133280
>>
>>108133353
also, audio quality is degraded by vocaroo.
>>
Does anyone know if the gradio thing generates audio codes? or is that audio code mechanism proprietary?
>>
https://files.catbox.moe/9hqykk.mp3

8^)

Blank.

Beautiful.

I think it defaults to nonsense-chinese if not given [instrumental] or whatever prompts.
>>
>>108117519
Figured out what this was, turns out the file paths I automatically named with a script were wrong in the .json.
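A check like this, run before preprocessing, would catch bad paths in the dataset JSON up front. The `audio_path` key is a hypothetical name; adapt it to whatever the real file uses.

```python
from pathlib import Path


def missing_audio(records, base_dir="."):
    """Return the audio paths in `records` that don't point to an existing file.

    Assumes a hypothetical "audio_path" key; relative paths are resolved
    against base_dir.
    """
    base = Path(base_dir)
    missing = []
    for record in records:
        p = Path(record["audio_path"])
        if not p.is_absolute():
            p = base / p
        if not p.is_file():
            missing.append(record["audio_path"])
    return missing
```

An empty list means every entry points at a real file; anything else is worth fixing before a multi-hour preprocessing run.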

First Initial D LoRA test. Didn't fully converge how I want yet, but it turned out neat.

Lower-quality vocaroo links since catbox/litterbox are both down; not perfect yet but it's getting there. I'll try a LoKR next.

https://vocaroo.com/1jd1RPwi0YOk
https://vocaroo.com/1jbbWhW53zqp
https://vocaroo.com/19Rwwk5tiXl7
https://vocaroo.com/15yUhXzFwVEG
>>
Music generation can never reach the breadth and popularity of image generation because it takes seconds to appreciate an image. You're stuck for two minutes minimum if you want to appreciate music. This is compounded by the fact that everyone's taste in music is hyper-specific, and one man's favorite genre might elicit disgust in another.
>>
>>108117176
You can add input audio now? I was using Yue before because ace step didn't have input
>>
>>108133908
Models don't thrive on popularity. They thrive on how fun they are to use. If ACEStep remains the only viable local solution (unlikely), it'll eventually blow up.
>>
>>108133925
YuE is significantly inferior to ACEStep 1.5 in architecture, audio output diversity, and speed. It was neat, but still behind commercial models. ACEStep is on par with those, so you shouldn't be using YuE anymore. But in short, yes, ACEStep 1.5 can take in audio input and do covers, audio repainting, and extensions.
>>
>>108133931
I don't care if it gets popular.

I have what I want. I have Udio at home. I'm set for life.
>>
>>108133951
That's what the indian bloggers say, anyway.
>>
>>108133931
>>108133959
it's hard not to test this model and mess with it atm
but (dont have links since i didnt even bookmark them) if i am correct, ace is already working on the next model
>>
yue is inferior, that is all.
>>
>>108133685
Loss is looking at least 2x better with LoKr and the default Gradio settings. Might actually converge now.
>>
Let me give you an example of what I mean by this thing ace step 1.5 is amazing:

I can gen unlimited *happy* 2016+ style EDM. I don't like new sad or moralizing or BAME crap.
>>
When training a lora I noticed that cutting off all the instrumental-only parts gives significantly better results. I think the model is learning the lyrics even during an instrumental outro or intro, for example, so it doesn't understand well when to stop. What else did you guys find out that increases the quality of the loras?
>>
>>108134180
Increasing the rank desu. Slide that shit all the way up. There was a writeup on the discord that basically concluded that if there is too much variation in your dataset and your rank is too low, it will average across all of the inputs. A big rank accounts for the variation in the data. You just need a very conservatively low lr to go with that.
>>
>>108134018
What lokr trainer are you using and what settings? I couldn't get it to do anything except crap when I tried.
>>
File: 456454545648.png (140 KB, 1873x527)
>>108134219
LoKr has been added to official Gradio, these are default settings
0.001
64/128
>>
>>108134245
Huh, I tried these settings on the random anime guy repo and it resulted in basically every song being played over itself at once after 500 steps. If it actually works out for you, I might give it another go.
>>
>>108134283
It varies by dataset size. I suspect that since it learns so fast, you might want to use tiny LR decrements. Everyone has been saying LoKR results in more accurate voices/likeness, and with complex genres it also helps. So I think it's generally accepted that this is better. Initial D is no easy target; this is with a dataset of 70 songs, so let's see how Turbo does.
>>
>When there aren't enough songs you like in a given style to make a viable dataset

Songs for this feel?
>>
Nothing like the very first song you generate after training a lora being a banger

https://voca.ro/1oLuGAhFE5tk
>>
>>108134355
Not my kind of music but the quality is very good.
>>
Two made with Yue tonight
https://files.catbox.moe/tz26s7.mp3
App for destruction
https://files.catbox.moe/wltpvl.mp3
Guerilla transmission
>>
>>108134379
>https://files.catbox.moe/wltpvl.mp3
Was it your intention for the singer to sound like Chris chan mumbling over a poorly mixed instrumental track with a dollar store microphone?
>>
File: 1745613154461604.png (415 KB, 600x456)
>>108134410
Who you suckas think you're sucking on
I'm the sucking boss
https://files.catbox.moe/544onu.mp3
>>
Stuck on choosing my next LoRA.

Choices:

1) Lil' Pump LoRA
2) Hercules the animated movie soundtrack LoRA
3) Yakuza Kiryuu Karaoke collection LoRA
4) Various stage musical tracks LoRA
5) Eroge background music LoRA
>>
Did they fix the sft training? I heard the training was coded to work only with turbo.
>>
>>108134508
https://github.com/koda-dernet/Side-Step

This guy has fixes for it in his repo. They work as far as I can tell.
>>
>>108134515
Yeh im using it right now, i was talking about the official repo
>>
>>108134523
I try not to pay attention to it. It's always breaking itself with 500 random ai generated commits a day.
>>
I am currently training an Enya lora, very curious about the results, especially since she has a very unique style; will post results if it turns out alright
>>
comfy repaint nodes when?
cover/reference nodes when?
AIEEEEEEEEEEEEEEEEEEEEEEEEEEE
>>
>>108134611
>Hey chat gpt make an audio mask node that takes a vae encoded audio latent and a time range and then blends it back into the latent before decoding it
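What that prompt asks for is a masked blend along the time axis: keep the original latent outside the chosen range, take the new one inside it. A pure-Python sketch, assuming one value per latent frame and a hypothetical frames-per-second rate; a real node would apply the same mask to the actual latent tensors along their time dimension, and the ACE-Step latent frame rate is not stated in this thread.

```python
def blend_time_range(original, edited, start_s, end_s, frames_per_second, fade_frames=0):
    """Blend `edited` into `original` between start_s and end_s (seconds).

    Both inputs are lists of per-frame values of equal length. The mask is 1
    inside the range and 0 outside, with optional linear fades at the edges.
    """
    if len(original) != len(edited):
        raise ValueError("latents must have the same number of frames")
    start = round(start_s * frames_per_second)
    end = round(end_s * frames_per_second)
    out = []
    for i, (o, e) in enumerate(zip(original, edited)):
        if start <= i < end:
            mask = 1.0
            if fade_frames > 0:
                # Linear ramps at both edges of the range.
                mask = min(1.0, (i - start + 1) / fade_frames, (end - i) / fade_frames)
        else:
            mask = 0.0
        out.append(mask * e + (1.0 - mask) * o)
    return out


print(blend_time_range([0.0] * 8, [1.0] * 8, 1, 3, 2, fade_frames=2))
# [0.0, 0.0, 0.5, 1.0, 1.0, 0.5, 0.0, 0.0]
```

The fade is optional, but a hard 0/1 edge is the usual source of audible seams at the start and end of a repainted section.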
>>
>>108134630
is cover/reference then just playing with the denoise in this case?
>>
>>108134636
Pretty sure that's done with audio codes.
You can probably do a jazzy remix of a song more faithfully in comfy by just running the song through a lower denoise.
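Roughly, a denoise below 1.0 means sampling starts partway down the noise schedule, so more of the input audio's structure survives. A sketch of one simple img2img-style convention, where only the last round(steps × denoise) steps are run; this is an assumption for illustration, and ComfyUI's KSampler may build its schedule differently in detail.

```python
def steps_to_run(total_steps, denoise):
    """Sampling steps actually run at a given denoise, under a simple img2img-style convention."""
    if not 0.0 < denoise <= 1.0:
        raise ValueError("denoise must be in (0, 1]")
    return max(1, round(total_steps * denoise))


def start_step(total_steps, denoise):
    """Schedule index where sampling begins (0 = pure noise, larger = closer to the input)."""
    return total_steps - steps_to_run(total_steps, denoise)


print(start_step(20, 1.0))  # 0: ignores the input audio entirely
print(start_step(20, 0.5))  # 10: starts halfway, keeps much more of the source
```

Under this convention the 0.85-0.95 range recommended earlier in the thread only skips the first few steps, which fits the idea of a loose reference rather than a faithful cover.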
>>
the audio codes part is not publicly available. Hopefully someone will reverse engineer it.
>>
>>108132750
> a finetune
The base is too poisoned.
>>
comfyui needs to support FLAC thumbnails.
>>
Once you generate with ace step, are there models to enhance the result like seedvr exists for videos and images, to get rid of that low quality 64kbps mp3 feel?
>>
>>108135311
I use this
https://github.com/entrepeneur4lyf/Web-Audio-Mastering
>>
https://voca.ro/11kqJ8NyiVQH
https://voca.ro/13eMGMKVVkkd
https://voca.ro/1kKtMpYi0318
>>
>>108135465
Neat, I like the reduce stereo width thing.
>>
>>108117192
he pushed an update to his nodes

i have done some training tests; it fails at the llm captioning part and at creating samples (maybe it is just me and my setup).

so if you dont have all code nodes will not work.
i had samples generated with his old code.

if you train on full songs, nan loss info is displayed but training does happen; it starts to capture style at around 250 epochs and it is done at 500 steps.

dataset: 5 full-length songs; it takes around 13-14GB of vram to train (it fluctuates). he rewrote his memory management. songs were max 4.5 minutes in length. vocals and instruments are well captured.
test was retard settings of
- 128/256 rank/alpha
- 0.0001 LR
- 500 warmup
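Worth noting with "500 warmup": a run that ends around step 500 spends essentially all of its time below the target LR. A sketch of a plain linear warmup, assuming that is what the node's warmup setting does (it may use a different shape):

```python
def warmup_lr(step, peak_lr=1e-4, warmup_steps=500):
    """Linear warmup from peak_lr / warmup_steps up to peak_lr, then constant.

    step is 0-based.
    """
    if warmup_steps > 0 and step < warmup_steps:
        return peak_lr * (step + 1) / warmup_steps
    return peak_lr


for step in (0, 249, 499, 600):
    print(step, warmup_lr(step))
```

At step 249 this is only half the configured LR, so shortening the warmup is an option if the run itself is short.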


i did another test with 2-minute snippets; it hovers just above 8GB vram. no nan happens, everything displays well in the console and in the nodes.
quality is meh but usable via hunting a good gen.
guitar instruments sound like synths.

those nodes might get good.
>>
>>108135700
>all code nodes
old node code
>>
>>108135700
that was me
>>108135704
and that was me

and it is me again
oh fugg
retard settings or not, i tried other settings as well to see if they would prevent the nan; they did not help
nan is real
lora is not usable it is corrupt

seems those nodes can train only maximum of 2 minutes audio samples, they can't process full length songs hence the nan

and nan is real i repeat
>>
>>108135833
and me one more time, forgot to say
i forgot to hook it up (my wf is a bit messy) hence false excitement; i got default model gens -.-. but they sound so good i did not even notice lora was not active
>>
sft is necessary for really long amounts of text, but base is the real chad.
>>
>>108135854
What does base offer over sft?
>>
>>108135946
NTA but base absolutely nails vocals in cover mode with a single sample for me, sft didn't even come close with the exact same setup.
Instruments were worse though.
Wouldn't be surprised if it absolutely kills it with loras when I can be fucked to train them.
>>
does anyone have any idea how to unlock the turbo steps and go beyond 8? i tried editing the limit in handler.py and it made no difference
>>
>>108134601
As promised, here is some Enya, kek
https://voca.ro/1dnPwaTL3p7V
https://voca.ro/1a0PJll2cypA
>>
File: file.png (94 KB, 932x1290)
where the hell are base and sft?
>>
>>108136227
quite lovely
>>
>>108136228
i downloaded them following the instructions in the GH; there are 3 turbo versions too (with different shifts)
>>
>>108135998
https://huggingface.co/ACE-Step/models
>>
>>108136228
You're in the turbo directory. Look at the author they have each one in their own repo.
>>
>>108136227
Someone assembled an experimental set of recordings and put it on, well, saying the name would get me b& because now it's like a shit site, but it used to be grand. One of those mega download type sites. Anyway, I'm sure someone made torrents.

Those would make amazing loras. For everyone, not just those interested in that kind of fruity music - because it would vastly expand the tonal palette

also why is the dice game captcha always the highest #?
>>
https://voca.ro/18QToU7m2LXS
>>
My second LoRA only works at 0.3 strength; any more than that and it collapses. Does the graph tell you anything? I am clueless.
>>
>>108137455
first time training anything? that graph means its fucking shit
>>
>>108137455
Did you use an audio crop node or what?
>>
so far, for me, all my covers have sounded like haunted house music. So, not totally worthless.
>>
>>108136281
>>108136244
ok thanks
>>
>>108129073
can you share the lora? I like it
>>
>>108134245
Got awful results out of this. Either my LR was too low so it didn't learn anything, or the inference code on the meme UI for it is not right, or LoKR is a meme.
>>
So yeh sft training is broken, only turbo works properly. I tried side step and I get the same garbage results. This model has a lot more potential, but until someone fixes the sft training, we won't be able to reach it. Base non-sft is not worth training because it doesn't use the lm.
>>
>>108140757
Turbo is limited to 8 steps only.
>>
What are you guys using to download your albums for LoRA tuning?

https://github.com/vitiko98/qobuz-dl

The only just werks way I could find. Seems like the only reliable quick way to do it, but it requires one to be a paypig after trial period.
>>
What's the recommended mp3 quality for training?
>>
>>108141032
I recommend FLAC, since it's lossless and the preprocessor accepts that, but if you can't just get the highest quality you can get.
>>
>>108136227
that is very good
>>
>>108141073
Got it.
Well I have 320kbps mp3, they should be enough for my tests.
>>
>>108139986
I tried to warn you about lokr bro
>>
>>108137593
Nta but there’s literally no way you can infer the final quality of the LoRA from a loss chart outside of it doing something extremely unusual
>>
>>108140757
Skill issue, I trained several loras on the SFT model and they work fine
Remember to set Timestep shift to 1
>>
>>108142222
Yes i did set it to 1, do you have any advice? What are you doing differently? I don't think I'm making any mistake.
>>
>>108142222
I've also had fine outputs from sft. Even with shift at 3.
>>
>>108142309
>>108142252
I'm about to shoot myself
>>
lil pump LoRA

https://voca.ro/135cMhiSdc2H
>>
https://voca.ro/1kqmuHl6YVJN
>>
Does comfy support offloading like gradio now?
>>
Has anyone tried training an instrumental lora?

not related, just checking new gradio ui
https://voca.ro/13B13qagi5P7
>>
>>108142893
comfy has native cpu offloading since always basically, so yes.
>>108142982
man they make so many commits in their project, it's always in constant flux. I have like 4 versions of acestep right now too FUCK
>>
>>108143016
i am not touching any ace code for a full month
there is code updoot every 20 minutes average
and lot of llm nonsense
>>
>>108134499
>Eroge background music LoRA
This. And share plz
>>
Who said vocaroo degrades the audio quality? I uploaded an mp3, then redownloaded it, and it has the same crc.
>>
>>108134499
a capella music (harmony) lora
>>
>>108143100
isn't it mono?
>>
>>108143034
I think I am going to do the same.
https://vocaroo.com/15pvmdQpY5GD
>>
>>108143301
no, same crc means no file modification so the audio should be the same
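Anon's check is easy to reproduce with the standard-library zlib module. A matching CRC-32 is strong evidence the downloaded file is byte-identical to the upload (use hashlib.sha256 if you want a cryptographic check); it only covers that file, not anything else the site might serve.

```python
import zlib


def crc32_of(path, chunk_size=1 << 20):
    """CRC-32 of a file, read in chunks, as an 8-digit hex string."""
    crc = 0
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            crc = zlib.crc32(chunk, crc)
    return format(crc & 0xFFFFFFFF, "08x")


def same_file(path_a, path_b):
    """True if both files have the same CRC-32."""
    return crc32_of(path_a) == crc32_of(path_b)


# Hypothetical file names:
# print(same_file("original.mp3", "redownloaded.mp3"))
```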
>>
>>108134499
>Various stage musical tracks LoRA
A Julie Andrews LoRA would be lovely.
>>
>>108143353
based
>>
File: 1771021102830.jpg (83 KB, 1178x827)
>>108117091
>>
>>108143400
god dammit this shit is moving too fast, I can't keep up
>>
>>108143445
that's their SaaS grift
>>
https://voca.ro/1kvqKbVHkwg8

I wasn't happy with how the lil pump LoRA didn't really sound like lil pump, so I trained it again with a slightly better set of captions.
>>
File: 1758710699856385.jpg (680 KB, 1884x1464)
>>108143353
>no metallic sounding voice
how
>>
Is it normal that lora training preprocessing takes forever and uses the cpu?
>>
>>108143353
out of all the lyrics in the world and all the topics, you gotta be retarded
such is your life
>>
File: 55151125445.png (130 KB, 1901x800)
Someone tell me how a vibecoded training UI (which I patched up with Gemini) is 20x faster than the original on 3090

This dataset was taking me 20 hours per run with same settings, now under 60 mins for the same 70 song dataset? Plus this properly has prodigy scheduler. The official Gradio is a disaster.

https://github.com/Estylon/ace-lora-trainer
>>
>>108143613
I've vibecoded at least three trainers at this point and they all finish in about 1/4th of the time of the official one, because it keeps forcing torchao on windows users.
>>
>>108143604
>preprocessing
in any case it is not good quality,
so use a low-parameter model; that one does all songs in a few minutes max.

someone recommended nvidia's music-flamingo model, which you have to be able to "install" yourself since python+venv is required.
apparently it can analyze the styles of music and alike.
>>
>>108143623
I think I'm pretty much done with their garbage UI. It's sad that many will get introduced to ACEStep that way. Getting these LoRAs to work like normal in Comfy at this point is pivotal. They work at like a weight of 2, maybe there's a solution to make them work at a weight of 1?
>>
>>108143613
>vibecoded UI
should be safe to use,
now i must check it out.
don't use ones that "vibecoded" hardware interaction unless you want to RMA your hardware sometime in the future.
>>
>>108143659
The only catches: by default the LM is loaded, which I had to disable; you can't choose a path for the checkpoint and it auto-downloads the model; and it gives some weird errors, but those can be fixed with Gemini. Other than that, this UI is essentially the ai-toolkit of ACEStep imo.
>>
https://voca.ro/1nAHSP4V569N
>>
>>108143653
>Getting these LoRAs to work like normal in Comfy at this point is pivotal. They work at like a weight of 2
Just increase the rank and train longer (achieve a lower loss), dude
>>
https://voca.ro/1edUWyxS6MXU
>>
personally, I went with this:

nano ~/ComfyUI/user/__manager/config.ini

idk, that's where it is in my fresh install. I changed:

network_mode = offline
db_mode = local

if you know of anything else to do, let me know.
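The same two changes can be scripted with Python's standard configparser. The `[default]` section name is an assumption about how ComfyUI-Manager lays out this file, so check yours first; also note that configparser drops comments when it writes the file back.

```python
import configparser
from pathlib import Path


def set_manager_offline(config_path, section="default"):
    """Set network_mode=offline and db_mode=local in a ComfyUI-Manager config.ini.

    The section name "default" is an assumption; check your own file.
    """
    path = Path(config_path).expanduser()
    cfg = configparser.ConfigParser(interpolation=None)  # treat values literally
    cfg.read(path, encoding="utf-8")
    if not cfg.has_section(section):
        cfg.add_section(section)
    cfg[section]["network_mode"] = "offline"
    cfg[section]["db_mode"] = "local"
    with path.open("w", encoding="utf-8") as f:
        cfg.write(f)


# Path from the post above:
# set_manager_offline("~/ComfyUI/user/__manager/config.ini")
```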

I don't care about Qwen 2. Here's Qwen 2. It's using prompt enhancement, I think, but it doesn't look special.
>>
>>108143378
This literally just happened. Great discovery, nobody else on the Internet knows this yet. No point in using catbox, except for workflow flacs.

>>108143613
The official gradio is a vibe coded fake front end. It's not what the chinese used to make ace step. Research papers are a skinsuit for them.


