/g/ - Technology

File: ACE.png (95 KB, 1920x1080)
>What is this?

A local open weights music generator, like Suno and Udio.

>Original repo (includes lora training)
https://github.com/ace-step/ACE-Step-1.5
>Comfyui guide
https://docs.comfy.org/tutorials/audio/ace-step/ace-step-v1-5
>Suno-like UI
https://github.com/fspecii/ace-step-ui

Share your gens.

Keywords: music gen, local model, song gen, suno, udio, acestep, ace step
>>
Is audio gen lighter than image/text gen or will it still rekt gpulet users?
>>
>>108095115
There's a huge range of models for image gen. It's a bit heavier than SDXL models but lighter than something like Flux. I'm running the ComfyUI workflow with 4GB VRAM, and generation takes about 8x the song duration.
>>
how far are we from giving an ai a full album with corresponding lyrics and letting it generate more songs in the same style?
>>
>>108095075
>https://github.com/fspecii/ace-step-ui
looks unironically like vibecoded trash.
do we know if comfy is planning to implement more nodes?
>>
>>108095139
It works if you train a Lora. People have already trained loras with Michael Jackson, Linkin Park etc with success.
>>
>>108095139
you can do that now with the lora training feature. supposedly the audio sounds a lot better too when you use a lora
>>
>>108095115
You need a 24GB GPU to comfortably run the biggest LLM text encoder together with the main DiT model. If you disable the LLM, it works even CPU-only for VRAMlets, but the output quality will be shittier.
>>
>>108095174
>>108095168
do you give them snippets or full length mp3s?
>>
>>108095285
You can train Loras with full length tracks as long as your GPU doesn't OOM in the process.
>>
>>108095306
I'll bite, how many minutes of audio for a decent lora?
>>
>>108095379
Any full album (~11 songs) works
>>
File: may_i_see_it.jpg (28 KB, 500x378)
>>108095168
>>
File: acelokr.png (120 KB, 598x527)
https://xcancel.com/bdsqlsz/status/2020432198210613708

Based if true
>>
>>108095075
Can I use lora with ComfyUI yet?
why is it broken?
>>
>>108095560
You have to convert the lora first. I had Claude write a Python script that converts it to a format Comfy accepts, and it worked perfectly.

You have to convert the keys like:

new_key = k.replace("base_model.model.base_model.model.", "diffusion_model.decoder.")
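A minimal sketch of what such a conversion script could look like, assuming the prefix swap quoted above is the only change needed and that the lora is stored as a .safetensors file (the post confirms neither). The names `convert_keys` and `convert_file` are hypothetical, not part of ACE-Step or ComfyUI.

```python
# Prefixes taken from the post above; everything else here is an assumption.
OLD_PREFIX = "base_model.model.base_model.model."
NEW_PREFIX = "diffusion_model.decoder."


def convert_keys(state_dict):
    """Return a new dict with OLD_PREFIX swapped for NEW_PREFIX in every key.

    Keys that do not contain OLD_PREFIX are copied unchanged.
    The input dict is not modified.
    """
    converted = {}
    for k, v in state_dict.items():
        new_key = k.replace(OLD_PREFIX, NEW_PREFIX)
        converted[new_key] = v
    return converted


def convert_file(src_path, dst_path):
    """Load a .safetensors lora, rename its keys, and save the result.

    Assumes the lora is a safetensors file of torch tensors.
    """
    # Imported lazily so convert_keys works without safetensors installed.
    from safetensors.torch import load_file, save_file

    save_file(convert_keys(load_file(src_path)), dst_path)
```

Usage would be something like `convert_file("my_lora.safetensors", "my_lora_comfy.safetensors")`, with both paths as placeholders. If Comfy still rejects the file, the lora probably needs more key changes than this one replace covers.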
>>
https://voca.ro/1o8PRqN0Gbae
>>
File: AE86-fifth-stage.png (1.15 MB, 1600x900)
https://voca.ro/15rN76Zadfqu
>>
>>108095536
Is LoKr another LoRA replacement like LoCon, DoRA, etc.?
>>
>>108095768
sounds like low quality mp3 but overall good
>>
>>108096107
Yes


