/g/ - Technology

File: acestep2.png (2.3 MB, 1344x768)
>What is this?

A local open weights music generator, like Suno and Udio.

>Original repo (includes lora training)
https://github.com/ace-step/ACE-Step-1.5
>Comfyui guide (but use the SFT model instead of Turbo, CFG=1 and 50 steps)
https://docs.comfy.org/tutorials/audio/ace-step/ace-step-v1-5
>Suno-like UIs
https://github.com/fspecii/ace-step-ui
https://github.com/roblaughter/ace-step-studio
>Cover and Edit modes
https://github.com/ryanontheinside/ComfyUI_RyanOnTheInside/tree/main/examples/ace1.5
>Cover and reference song tutorial
https://www.youtube.com/watch?v=sv4pNrjRh7s


Share your gens and lora results.


Keywords: music gen, local model, song gen, suno, udio, acestep, ace step, lmg,ldg, dmp
>>
Britney Spears lora:

https://voca.ro/16blE7la2Ff8
https://voca.ro/1eLHswbE9ZHK
https://voca.ro/1me4VBIkzHfK

To that one anon that is claiming "lora training on SFT doesn't work": this is a Lora trained on SFT =)
>>
based thread! i don't use ACE Step but best wishes!
>>
>>108164777
Nakadashee AceStep-chan
>>
File: default.jpg (23 KB, 586x275)
>>108165006
it is... ok, i can hear her singing style from time to time.
my internet has been cutting out for half a day or more lately, since i live in a third world country (australia), so yesterday i had time to sit and test the default settings as per the developer instructions; results via gradio were meh.
must test the comfy nodes more, since they gave me better results than the gradio interface.
one note: manual captions, i.e. removing redundant stuff like the excess attributes the llm gives (assuming it detects the correct instruments at all), help quite a bit.
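
The caption cleanup described above can be partly automated. A minimal sketch, assuming captions are comma-separated tag lists; `trim_caption`, the `max_tags` cutoff and the sample caption are made up for illustration and are not part of ACE-Step or any trainer:

```python
def trim_caption(caption: str, max_tags: int = 8) -> str:
    """Drop duplicate tags (case-insensitive) and keep at most max_tags, in order."""
    seen = set()
    kept = []
    for tag in caption.split(","):
        tag = tag.strip()
        key = tag.lower()
        if not tag or key in seen:
            continue  # skip empty fragments and repeats
        seen.add(key)
        kept.append(tag)
    return ", ".join(kept[:max_tags])


# Invented example of an over-tagged LLM caption.
llm_caption = (
    "synth pop, female vocals, Synth Pop, bright, upbeat, female vocals, "
    "glossy, energetic, danceable, polished, catchy"
)
print(trim_caption(llm_caption, max_tags=6))
# prints: synth pop, female vocals, bright, upbeat, glossy, energetic
```

This only removes repeats and caps the length; deciding which attributes are actually wrong (e.g. a misdetected instrument) still needs a manual pass.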

enya anon did it really well via overfitting, and if he sees this post:
what were your overfit settings?
high lr, low rank, small-to-medium dataset?
>>
>>108165607
>enya anon did it really well via overfitting, and if he sees this post:
>what were your overfit settings?
>high lr, low rank, small-to-medium dataset?

I used the default 0.0003 LR at 800~1000 epochs (I can't remember where I stopped), my dataset consisted of 24 songs
>>
>>108165764
As for the rank, I used rank 128 I think, it's the maximum my card supports without going OOM
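
For a sense of scale, the settings quoted above (24 songs, 800~1000 epochs) can be converted into optimizer steps. A rough sketch: the batch size and gradient accumulation values are assumptions (the thread does not state them), one training sample per song is assumed, and `total_steps` is a hypothetical helper, not a trainer function:

```python
import math


def total_steps(num_songs: int, epochs: int, batch_size: int = 1, grad_accum: int = 1) -> int:
    """Optimizer steps for a run, assuming one training sample per song."""
    batches_per_epoch = math.ceil(num_songs / batch_size)
    steps_per_epoch = math.ceil(batches_per_epoch / grad_accum)
    return steps_per_epoch * epochs


# 24 songs and 800~1000 epochs come from the post above;
# batch_size=1 and grad_accum=1 are assumed defaults.
for epochs in (800, 1000):
    print(epochs, "epochs ->", total_steps(24, epochs), "steps")
# prints:
# 800 epochs -> 19200 steps
# 1000 epochs -> 24000 steps
```

With a larger batch size the step count drops proportionally, so epoch counts from different setups are only comparable when the batch settings match.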
>>
>>108165776
>>108165764
ty, will try it.

i used 12 songs, same genre and different bands, results are meh and sometimes ok.

comfy gens are better, if i crank lora strength to 2 there is that fm radio but super high can+static noise, yet it does replicate training set song at around 80% of the content.
>>
>>108166294
>fm radio
AM -.- radio
>>
>>108166294
>comfy gens are better, if i crank lora strength to 2 there is that fm radio but super high can+static noise
I said it in the last thread and I'm going to say it again: you are probably undertraining your models. Use a high enough LR, train longer, and use a high rank.
>>
>>108166359
Also DO NOT USE THE LLM.
The LLM tends to weaken the Lora effect, sometimes it even changes the voice/singing style
>>
does anyone have access to the suite Sony has produced to compare?
>>
>>108166444
>suite Sony has produced
What even is that?
>>
>>108165776
In my experience 128 has been good. I think people need to cook their LoRAs a lot longer though. At least 1500 epochs; 2000 or more is probably better.

>>108166395
Yeah, do not use the LLM if you are using a LoRA. It straight up makes its own song and the LoRA is just antagonistic to it.
>>
>>108164777
I'm a vramlet with 12GB VRAM and 32GB RAM. Can I use this and make loras? Also, I have a degree in music. Can this help me with anything on this matter, for example "make a bimodal pop song" or "make a kpop song using the Hungarian minor scale"?
Also I'd like to train LoRAs of classical composers and mix them with other musical aesthetics.
Can I make a LoRA of my own musical compositions to figure out what resources or patterns I usually use?
>>
>>108166694
>I'm a vramlet with 12GB VRAM and 32GB RAM.
Maybe, but you won't be able to use a higher lora rank on a regular trainer. You can try to vibe code a training script with Claude or something, one that uses Ramtorch or something like it to offload everything to RAM and train on the GPU on demand, but it will probably be very slow.

>Can this help me with anything on this matter, for example "make a bimodal pop song" or "make a kpop song using the Hungarian minor scale"?
If you create a properly tagged dataset, maybe. And btw, the model natively already requires you to tag songs by keyscale and bpm, besides the captions.
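
A tagged dataset entry along those lines could be kept as one small record per song. A minimal sketch only: the field names, the file path and the example values are hypothetical and do not reflect the trainer's actual schema, so check the repo for the real format:

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class SongTags:
    # Hypothetical field names; the real trainer may expect something different.
    audio_path: str
    caption: str
    keyscale: str
    bpm: int

    def validate(self) -> None:
        """Catch obviously incomplete entries before training."""
        if not self.caption.strip():
            raise ValueError("caption is empty")
        if not self.keyscale.strip():
            raise ValueError("keyscale is empty")
        if not 20 <= self.bpm <= 300:
            raise ValueError(f"bpm out of plausible range: {self.bpm}")


entry = SongTags(
    audio_path="dataset/song_01.wav",  # made-up path
    caption="k-pop, Hungarian minor scale, female vocals",  # made-up caption
    keyscale="A minor",  # made-up value
    bpm=120,  # made-up value
)
entry.validate()
print(json.dumps(asdict(entry), indent=2))
```

Putting scale or mode names into the caption consistently across the whole dataset is what would give the model a chance to associate them with the sound.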

>Can I make a LoRA of my own musical compositions
Yes

>to figure out what resources or patterns I usually use
You can already "figure that out" by captioning the songs with Gemini Pro; you can explicitly ask it.


