/g/ - /asg/ - AceStep General - Technology



Anonymous
/asg/ - AceStep General 02/16/26(Mon)15:01:13 No.108164777

File: acestep2.png (2.3 MB, 1344x768)


>What is this?

A local open weights music generator, like Suno and Udio.

>Original repo (includes lora training)
https://github.com/ace-step/ACE-Step-1.5
>Comfyui guide (but use the SFT model instead of Turbo, CFG=1 and 50 steps)
https://docs.comfy.org/tutorials/audio/ace-step/ace-step-v1-5
>Suno-like UIs
https://github.com/fspecii/ace-step-ui
https://github.com/roblaughter/ace-step-studio
>Cover and Edit modes
https://github.com/ryanontheinside/ComfyUI_RyanOnTheInside/tree/main/examples/ace1.5
>Cover and reference song tutorial
https://www.youtube.com/watch?v=sv4pNrjRh7s

Share your gens and lora results.

Keywords: music gen, local model, song gen, suno, udio, acestep, ace step, lmg, ldg, dmp

Anonymous
02/16/26(Mon)15:27:56 No.108165006


Britney Spears lora:

https://voca.ro/16blE7la2Ff8
https://voca.ro/1eLHswbE9ZHK
https://voca.ro/1me4VBIkzHfK

To that one anon who is claiming "lora training on SFT doesn't work": this is a LoRA trained on SFT =)

Anonymous
02/16/26(Mon)16:12:33 No.108165295


Based thread! I don't use ACE-Step, but best wishes!

Anonymous
02/16/26(Mon)16:15:01 No.108165310


>>108164777
Nakadashee AceStep-chan

Anonymous
02/16/26(Mon)16:52:52 No.108165607


File: default.jpg (23 KB, 586x275)


>>108165006
It is... ok, I can hear her singing style from time to time.
My internet has been cutting out for half a day or more lately since I live in a third world country (Australia), so yesterday I had time to sit and test the default settings as per the developer's instructions. Results via Gradio were meh.
I must test the Comfy nodes, since I got better results with them than via the Gradio interface.
One note: manual captions help quite a bit, i.e. removing redundant stuff like the excess attributes the LLM gives (in case it does detect the correct instruments).

Enya anon did it really well via overfit. If he sees this post:
what were your overfit settings?
High LR, low rank, small-to-medium dataset?
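
The caption cleanup mentioned in this post (dropping redundant attributes from LLM captions) could be sketched roughly like this. It assumes captions are comma-separated tag lists; `trim_caption` and the `max_tags` cutoff are hypothetical and not part of any ACE-Step tool.

```python
def trim_caption(caption, max_tags=8):
    """Drop empty and duplicate tags, then keep at most max_tags of them.

    Assumption: the caption is a comma-separated list of attributes.
    max_tags=8 is an arbitrary illustrative cutoff.
    """
    seen = set()
    kept = []
    for tag in caption.split(","):
        tag = tag.strip()
        key = tag.lower()  # compare case-insensitively
        if tag and key not in seen:
            seen.add(key)
            kept.append(tag)
    return ", ".join(kept[:max_tags])


if __name__ == "__main__":
    raw = "pop, Pop, bright synths, female vocals, , bright synths, dance"
    print(trim_caption(raw, max_tags=3))
```

Deciding which attributes are actually wrong (e.g. misdetected instruments) still has to be done by ear; this only handles the mechanical part.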

Anonymous
02/16/26(Mon)17:19:32 No.108165764


>>108165607
>Enya anon did it really well via overfit. If he sees this post:
>what were your overfit settings?
>High LR, low rank, small-to-medium dataset?

I used the default 0.0003 LR at 800~1000 epochs (I can't remember where I stopped). My dataset consisted of 24 songs.
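
For a rough sense of scale, the numbers in this post can be turned into an optimizer step count. This is a back-of-the-envelope sketch only: the effective batch size is an assumption (the post does not state it), one song is assumed to be one training sample, and `total_steps` is a hypothetical helper rather than anything from the ACE-Step trainer.

```python
import math


def total_steps(num_songs, epochs, effective_batch=1):
    """Approximate optimizer steps for a run.

    effective_batch (batch size x gradient accumulation) is an assumed
    value; the post only gives LR, epochs and dataset size.
    """
    steps_per_epoch = math.ceil(num_songs / effective_batch)
    return steps_per_epoch * epochs


if __name__ == "__main__":
    # 24 songs at 800 to 1000 epochs, as described above.
    for epochs in (800, 1000):
        print(epochs, "epochs ->", total_steps(24, epochs), "steps")
```

With a larger effective batch the same epoch count means fewer weight updates, so epoch numbers from different setups are not directly comparable.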

Anonymous
02/16/26(Mon)17:21:37 No.108165776


>>108165764
As for the rank, I used rank 128, I think. It's the maximum my card supports without going OOM.
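
On why rank and OOM are linked: a LoRA adds two small matrices per adapted layer, so the number of trainable parameters (and the optimizer state kept for them) grows linearly with rank. The sketch below uses the standard count of rank * (d_in + d_out) per layer. The layer width and layer count are made-up placeholders, not ACE-Step's real architecture.

```python
def lora_params(d_in, d_out, rank):
    """Trainable parameters for one adapted layer.

    A has shape (rank x d_in) and B has shape (d_out x rank).
    """
    return rank * (d_in + d_out)


# Hypothetical model shape, for illustration only.
HIDDEN = 2048
NUM_ADAPTED_LAYERS = 100


def run_params(rank):
    """Total trainable LoRA parameters under the assumed shape above."""
    return NUM_ADAPTED_LAYERS * lora_params(HIDDEN, HIDDEN, rank)


if __name__ == "__main__":
    for rank in (16, 64, 128):
        print(f"rank {rank}: {run_params(rank) / 1e6:.1f}M trainable params")
```

Doubling the rank doubles this count, which is one reason a card that handles rank 64 may run out of memory at 128 or above.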

Anonymous
02/16/26(Mon)18:42:02 No.108166294


>>108165776
>>108165764
Ty, will try it.

I used 12 songs, same genre but different bands. Results are meh and sometimes ok.

Comfy gens are better. If I crank the LoRA strength to 2, there is that FM radio sound but with super high can + static noise, yet it does replicate around 80% of the content of a training set song.
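
One way to read the strength-2 result: in the usual LoRA formulation the adapter contributes a delta on top of the base weight, and the strength value multiplies that delta. The sketch below assumes W' = W + strength * (alpha / rank) * B @ A. Whether ComfyUI applies strength exactly this way for this model is not confirmed here, and the tiny matrices are made-up values.

```python
def matmul(a, b):
    """Plain-Python matrix product of a (n x k) and b (k x m)."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def apply_lora(w, a, b, alpha, strength):
    """Return w + strength * (alpha / rank) * (b @ a)."""
    rank = len(a)  # a has shape (rank x d_in)
    scale = strength * alpha / rank
    delta = matmul(b, a)  # shape (d_out x d_in), same as w
    return [
        [w[i][j] + scale * delta[i][j] for j in range(len(w[0]))]
        for i in range(len(w))
    ]


if __name__ == "__main__":
    W = [[1.0, 0.0], [0.0, 1.0]]  # toy base weight
    A = [[0.5, -0.5]]             # rank 1, d_in = 2
    B = [[0.2], [0.4]]            # d_out = 2, rank 1
    for strength in (1.0, 2.0):
        print(strength, apply_lora(W, A, B, alpha=1.0, strength=strength))
```

At strength 2 the weights sit twice as far from the base model as they did during training, which may be part of why artifacts show up there.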

Anonymous
02/16/26(Mon)18:47:34 No.108166323


>>108166294
>fm radio
AM -.- radio

Anonymous
02/16/26(Mon)18:52:56 No.108166359


>>108166294
>comfy gens are better, if i crank lora strength to 2 there is that fm radio but super high can+static noise
I said this in the last thread and I am going to say it again: you are probably undertraining your models. Use a high enough LR, train longer, and use a high rank.

Anonymous
02/16/26(Mon)18:58:38 No.108166395


>>108166359
Also DO NOT USE THE LLM.
The LLM tends to weaken the LoRA effect. Sometimes it even changes the voice/singing style.

Anonymous
02/16/26(Mon)19:05:42 No.108166444


Does anyone have access to the suite Sony has produced, to compare?

Anonymous
02/16/26(Mon)19:15:27 No.108166524


>>108166444
>suite Sony has produced
What even is that?

Anonymous
02/16/26(Mon)19:19:19 No.108166559


>>108165776
In my experience 128 has been good. I think people need to cook their LoRAs a lot longer though. Go to 1500 epochs; 2000 or more is probably better.

>>108166395
Yeah, do not use the LLM if you are using a LoRA. It straight up makes its own song and the LoRA is just antagonistic to it.

Anonymous
02/16/26(Mon)19:36:59 No.108166694


>>108164777
I'm a vramlet with 12GB VRAM and 32GB RAM. Can I use this and make LoRAs? Also, I have a degree in music. Can this help me with anything in that area, for example "make a bimodal pop song" or "make a kpop song using the Hungarian minor scale"?
Also I'd like to train LoRAs of classical composers and mix them with other musical aesthetics.
Can I make a LoRA of my own musical compositions to figure out what resources or patterns I usually use?

Anonymous
02/16/26(Mon)20:04:10 No.108166901


>>108166694
>I'm a vramlet with 12GB VRAM and 32GB RAM.
Maybe, but you won't be able to use a higher LoRA rank on a regular trainer. You can try to vibe code a training script with Claude or something, one that uses Ramtorch or something like it to offload everything to RAM and train on the GPU on demand, but it will probably be very slow.

>Can this help me with anything in that area, for example "make a bimodal pop song" or "make a kpop song using the Hungarian minor scale"?
If you create a properly tagged dataset, maybe. And btw, the model natively already requires you to tag songs by keyscale and bpm, besides the captions.
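
A sketch of what one tagged dataset entry might look like, with a basic sanity check. The field names (`caption`, `keyscale`, `bpm`), the file path and the validation rules are assumptions for illustration; check the ACE-Step repo for the format its trainer actually expects.

```python
from dataclasses import dataclass


@dataclass
class SongTags:
    # Hypothetical schema; not taken from the ACE-Step trainer.
    path: str
    caption: str
    keyscale: str
    bpm: int


def validate(tags):
    """Return a list of problems found in one entry (empty if none)."""
    problems = []
    if not tags.caption.strip():
        problems.append("empty caption")
    if not tags.keyscale.strip():
        problems.append("missing keyscale")
    if not 30 <= tags.bpm <= 300:  # arbitrary plausibility range
        problems.append(f"implausible bpm: {tags.bpm}")
    return problems


if __name__ == "__main__":
    entry = SongTags(
        path="dataset/track01.wav",  # made-up path
        caption="k-pop, bright synths, female vocals, Hungarian minor melody",
        keyscale="A minor",
        bpm=120,
    )
    print(validate(entry))
```

Putting the scale or mode in the caption consistently across the whole dataset is what would give the model a chance to associate the wording with the sound.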

>Can I make a LoRA of my own musical compositions
Yes

>to figure out what resources or patterns I usually use
You can already "figure that out" by captioning the songs with Gemini Pro; you can explicitly ask it.
