/g/ - Technology

File: ace step 1.5.png (245 KB, 1007x1076)
This is the local audio gen general, but there is only one valid option for local audio gen. This is Ace Step 1.5. It released YESTERDAY.

They aimed for day 0 support, but in practice it doesn't work on everyone's machine. I have to use CPU VAE, which makes generation times about 10x longer, and memory use is heavy too, nearly 50 GB at times.

So why sing the praises of Ace Step 1.5? Because it works. Local Suno is here.

When you think, "I've heard better made by Suno", remember that that is cherry picked out of thousands of gens.

comfy
https://blog.comfy.org/p/ace-step-15-is-now-available-in-comfyui

Official page, good luck getting Gradio to work, but the steps are here:
https://huggingface.co/ACE-Step/Ace-Step1.5
>>
>>108060884
https://files.catbox.moe/m8cxqb.mp3

to upload your own, use catbox.moe
>>
>>108061093
>>108060884
sounds robotic/shit
Also your OP looks like not-so-subtle advertisement you copypasted from reddit.
Gay nigger thread for gay nigger people.
>>
sounds like ass
not even talking about composition etc
the quality of the generated audio is ass
the low end is awful and mushy
top end is actually painful - almost like you can hear aliasing
it sounds like it's a 64kbps mp3 'upscaled'
tiny model
tiny model smell
>>
>>108061161
>>108061169
It's not for "audiophiles" it's for people who are cool and have a life.

Action living. It's not fat old people with headphones bigger than their old woman's tits.
>>
The jews have lost their power over ai music.

This is despite the fact it takes me 37 minutes per gen.
>>
>>108061190
nah, m8, i was ready to rent some gpu hours and train some sick loras for this ting, but the fidelity is just not there
maybe good enough for like mobile ads or parody shit, but that's like the lowest form of music innit
>>
I suck at Music and don't want to have to prompt my own lyrics.
>>
>>108061247
aw shucks, u wuz so gunna but dint huh. gawsh aw dat an sheit
>>
>>108061277
You can just find poems and paste them in.
>>
Voice quality sounds robotic, like it has a very low sample rate.
>>
https://files.catbox.moe/n0e07n.mp3

This is still at cfg=1. Because this is so slow on cpu vae, I have to just guess. I'm trying cfg=2 now.

comfyui only has the turbo model. Some people indicate they have gotten the base model to work, but I'll wait for a real wf.

and wow does comfyui need work done on audio stuff.

Someone *DID* report that Comfy-Zluda is working with ace step 1.5, on windows of course.
>>
>>108061388
Sorry, sunno shill, you're over.
>>
>>108061277
Use meme copypasta or lorem ipsum or a llm robobraindump or "my shitty lyrics, listen to them" on repeat. It's not that you must devise amazing text to get any music.
>>
>>108061407
Suno has the same problem, as with just about every other AI music generator I've come across.
>>
>>108061388
>>108061437
Udio has less of this issue, but in ACE-Step 1.5 voice quality varies depending on the prompt: some gens are not as robotic, others are a bit more.
>>
>>108061437
Sounds like a "you" problem. As the tao te ching says:
Tao has three treasures which he guards and cherishes. The first is called compassion; the second is called economy; the third is called humility.
>>
>>108061548
There's always some difference upon which a complaint can be registered.

For example, rock n roll, the jazz fans said, was "too repetitive".

idk, whatever, so they don't like it, why should I care what people who are going to hell for supporting the jews are going to think of me?
>>
>>108060884
Anybody manage to get it running in comfy with AMD hardware? I've run into an issue where I can't find one of the proper nodes.
>>
>>108061629
I updated, and then I had to use --novram --cpu-vae

here is my actual launch command, on Linux:
PYTORCH_ALLOC_CONF=expandable_segments:True TORCH_ROCM_AOTRITON_ENABLE_EXPERIMENTAL=1 MIOPEN_FIND_MODE=FAST python ~/comfy/ComfyUI/main.py --use-pytorch-cross-attention --novram --cpu-vae

the result is that a 1 minute diffusion takes 37 minutes, because first it has to cpu condition, and then after diffusion it does cpu vae.

--novram
causes cpu conditioning, I believe

--cpu-vae
obvious
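
For anyone who wants to script that launch instead of pasting it, here is a minimal Python sketch. The env vars, flags and path are copied from the command above; the names `build_env` and `build_command` are made up for illustration and are not part of ComfyUI.

```python
import os
import sys
from pathlib import Path

# Environment variables copied verbatim from the launch command in the post.
ROCM_ENV = {
    "PYTORCH_ALLOC_CONF": "expandable_segments:True",
    "TORCH_ROCM_AOTRITON_ENABLE_EXPERIMENTAL": "1",
    "MIOPEN_FIND_MODE": "FAST",
}

# ComfyUI flags copied from the same command.
COMFY_FLAGS = ["--use-pytorch-cross-attention", "--novram", "--cpu-vae"]


def build_env(base=None):
    """Return a copy of the base environment with the ROCm settings applied."""
    env = dict(os.environ if base is None else base)
    env.update(ROCM_ENV)
    return env


def build_command(main_py="~/comfy/ComfyUI/main.py", python=sys.executable):
    """Return the argv list for launching ComfyUI with the flags above."""
    return [python, str(Path(main_py).expanduser()), *COMFY_FLAGS]


# To actually start ComfyUI (not executed here):
# import subprocess
# subprocess.run(build_command(), env=build_env(), check=True)
```

Keeping the flags in one list makes it easy to drop `--cpu-vae` later if the GPU VAE path starts working on your card.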
>>
It takes a lot of attempts for the model to make something decent, but you get infinite retries, unlike with Suno or Udio. I always write the lyrics.
I would like to train style loras for something more garage-y. Anyway here's a discarded version.
https://files.catbox.moe/w9c09t.mp3
>>
>>108061670
Hell yeah brother! I'll give those a try later today. Thanks!
>>
>>108061798
:^) it's fun.
>>
trying to see if I can get heun to work. I wound up with a bad gen, but I had a bad prompt too.

I'm using a new prompt, we'll see how it goes. here is the ai slobbered prompt:
A powerful, melodic death metal track driven by fast, palm-muted guitar riffs, aggressive double-bass drumming, and guttural male growls contrasted by clean, soaring vocal harmonies in the chorus. The song begins with a haunting, atmospheric guitar lead and thunderous blast beats, building into a relentless verse with rapid tremolo picking and symphonic keyboard layers. The chorus opens up with anthemic, harmonized clean vocals and a triumphant, melodic guitar line. A technical guitar solo erupts mid-song, featuring sweeping arpeggios and shredding scales over a dynamic drum performance. The track concludes with a dramatic slowdown, merging the main riff with orchestral elements and a final, echoing scream.
>>
>>108062046
idk. 60 steps, heun. simple scheduler. cfg=1.

I really need to try a lot more stuff, but comfyui needs to fix audio on AMD so it's fast.

but inference is fast. I just found out I am running batch=2 lmao.

it remains to be seen how well it keeps bpm and pitch. but, if it does, you could mix and match from many gens.

https://files.catbox.moe/nx3pir.mp3
>>
the risk of collapse in gens is like when you have a similar mora.

so like

I like cabbage.
my cab is rad.

it might do
I like cabbage is rad.
or
I like cab (noise) rad
>>
>>108062516
also like if you have
I like big red tomatoes
I red tomatoes biggest fan

it may only give you one full sequence of "red tomatoes" and like do a half-assed job on the other one. I think this is because the token is not intensified enough by repetition. maybe use (parens)???
>>
>>108061862
Yeah, also the default wf by comfy has way too low a # of steps. Like I'm thinking 60 is my minimum.
>>
anyway,
>contemporary R&B
but also
>bpm 200

what will it sound like? Join the musical adventure as we explore Ace Step 1.5
>>
Has comfy merged in thinking support yet? It actually helps with prompt adherence (not just the 4B LM model).

Repost of recent Japanese gens from previous /ldg/ threads. I'm starting to get the hang of the model to get more of that v4.5 power, but there's still much more to test.

Glitchhop-
https://files.catbox.moe/wjh4eo.mp3
Babymetal-
https://files.catbox.moe/3eccq6.mp3
https://files.catbox.moe/go8sj2.mp3

(All of these are prompted in a detailed way with the help of a custom prompt template I created: https://pastebin.com/Xt551MqD)

The model has potential in terms of catchiness, definitely Suno v4.5/v5 level on good gens, and perhaps a finetune dedicated to certain genres would push it a bit more towards Udio-level voice quality and make it surpass Suno.

If you're not sure how to prompt something out of the model, a nifty feature that Gemini has is mp3 interrogation. Since the model was trained with the help of Gemini 2.5 captions, it helps to give Gemini a song in the style you'd like and ask it for prompts, but of course you may also need to describe the song precisely to Gemini as well, because even that is not perfect.
>>
information gleaned.

I just tried cfg=8, 80 steps, with lcm. This resulted in a silent gen. Generally, you want fewer steps with lcm, but idk, my gens are so slow that I am playing Battleship with what's good.

>>108062683
Thanks! if you output to save audio FLAC it includes the wf.
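
If you want to pull the wf back out of a saved FLAC without loading it into comfy, here is a rough stdlib-only sketch that reads the Vorbis comment block from a FLAC file. The tag name `workflow` is an assumption about how the save node labels it, so print all the keys first and see what is actually there. `read_flac_comments` and `read_flac_file` are illustrative names, not a ComfyUI API.

```python
import struct
from pathlib import Path

VORBIS_COMMENT = 4  # FLAC metadata block type that holds KEY=value tags


def _parse_vorbis_comment(body):
    """Parse a Vorbis comment block body into a dict with lowercased keys."""
    (vendor_len,) = struct.unpack_from("<I", body, 0)
    offset = 4 + vendor_len  # skip the vendor string
    (count,) = struct.unpack_from("<I", body, offset)
    offset += 4
    tags = {}
    for _ in range(count):
        (size,) = struct.unpack_from("<I", body, offset)
        offset += 4
        text = body[offset:offset + size].decode("utf-8", errors="replace")
        offset += size
        key, _sep, value = text.partition("=")
        tags[key.lower()] = value
    return tags


def read_flac_comments(data):
    """Return the tags from the first Vorbis comment block, or {} if none."""
    if data[:4] != b"fLaC":
        raise ValueError("not a FLAC stream")
    pos = 4
    while pos + 4 <= len(data):
        header = data[pos]
        is_last = bool(header & 0x80)      # top bit marks the last metadata block
        block_type = header & 0x7F
        length = int.from_bytes(data[pos + 1:pos + 4], "big")
        body = data[pos + 4:pos + 4 + length]
        pos += 4 + length
        if block_type == VORBIS_COMMENT:
            return _parse_vorbis_comment(body)
        if is_last:
            break
    return {}


def read_flac_file(path):
    """Convenience wrapper: read a file from disk and return its tags."""
    return read_flac_comments(Path(path).read_bytes())
```

Usage would be something like `tags = read_flac_file("gen.flac")` followed by `print(list(tags))`, where `gen.flac` is a placeholder filename. If a `workflow` key shows up, its value should be the JSON you can save and load back.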
>>
someone on you tube or twitch or whatever said to say that ace step 1.5 "is xyz"

none of their opinions are their own, and that's obvious because they resort to clipped platitudes.
>>
It seems the audio code in comfyui in general needs work. Older nvidia card owners are also having issues (in addition to all amd owners).
>>
https://files.catbox.moe/76zi5j.mp3

trying different things (slooooowly).

prompt: contemporary R&B

LCM
cfg=1.5
20 steps
(simple)

really not sure. I know that lcm worked really well with ace step 1.35, not sure if that can be done here. I'll keep guessing.
>>
holy shit it's good enough

we are in stable diffusion v1.5 territory now
>>
>>108063426
Yes, and I think that the a2a idea is really spicy because the idea of upgrading audio you generate today with the next model is pretty exciting.
>>
>>108062883
Easier to spend your time pretending to have formed opinions on things instead of having to actually do things that cause you to form opinions.
>>
Is it at the point where I could generate convincing enough sounding instrumental loops and then mix them together?
Like can this thing accurately generate based on BPM and the like?
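
One way to check the BPM part yourself: a loop of N bars at a given BPM has a fixed length, so you can compare a gen's length (or the point where it loops cleanly) against the math. A small sketch, assuming 4/4 unless you say otherwise; the function names are just for illustration.

```python
def loop_seconds(bpm, bars, beats_per_bar=4):
    """Length in seconds of a loop of `bars` bars at `bpm`."""
    if bpm <= 0:
        raise ValueError("bpm must be positive")
    return bars * beats_per_bar * 60.0 / bpm


def implied_bpm(duration_s, bars, beats_per_bar=4):
    """BPM implied by a loop that lasts `duration_s` seconds over `bars` bars."""
    if duration_s <= 0:
        raise ValueError("duration must be positive")
    return bars * beats_per_bar * 60.0 / duration_s


# A 4-bar loop at 120 bpm in 4/4 is 16 beats at 0.5 s each.
print(loop_seconds(120, 4))   # 8.0
print(implied_bpm(8.0, 4))    # 120.0
```

If a clip that was supposed to be 4 bars at 120 comes out at 8.5 seconds, `implied_bpm(8.5, 4)` tells you how far off the model drifted, which matters if you plan to layer loops from different gens.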
>>
>>108063621
We're really early. but asking if you can make loops indicates you don't know anything about edm. of course you can.
>>
>>108063625
>EDM
Slop genre imo, but agree to disagree.
Good to know though, I'll have to play around with it. Does specifying BPM in the prompt work accurately though?
>>
>>108063636
Why would I even talk to you? slop yo momma
>>
>>108062824
There is no workflow because it's Gradio.
Here's babymetal metadata
https://files.catbox.moe/zwvnzz.json
glitchhop
https://files.catbox.moe/4lm6te.json

Now testing a combination of three genres (glitchhop, babymetal, 8bit)
https://files.catbox.moe/mibncm.mp3
https://files.catbox.moe/tn3adx.json

Interesting that it can rap fast
https://files.catbox.moe/3v4zxj.mp3

This second version would be perfect if not for messing up the chorus "nyan! nyan! nyan! DATA SMASH!!", but I can't fix it because repaint lowers the quality on the Gradio version I'm using, not sure if the Chinks messed up implementation or repaint is that bad unfortunately.
>>
https://files.catbox.moe/cgpiq6.mp3
60 steps cfg 1 heunpp2 simple
>>
>>108063693
Lyrics in case you don't have gradio

 [Intro - 8-bit Boot-Up & Rapid Arpeggios]
pi-pi-pi! bo-bo-bo!
hajimaru yo, GAME START
kira kira RAM ga
baka ni natta—ERROR! (wah!)

[Verse 1 - Cute Idol Groove / Swung Beat]
asa kara zutto hyper mode
neko mitai ni janpu janpu!
atama no naka bug darake
demo waratteru kara OK

coin! coin! hirotte
HP kaifuku da yo!
sekai ga kashikoku natte mo
watashi wa pop-up girl

[Pre-Chorus - Glitch Build-Up & Rapid 8-bit FX]
doki doki CPU atsui
nettaiya saikou chouten!
countdown ichi ni san—
bara bara ni naru no!?

[Chorus - Kawaii Chaos Explosion / Heavy Guitar Riffs]
KYU! KYU! KYU! HEART CRASH!!
bitto ga tobu yo BAN BAN BAN!
metal na yume mo
chiptune de tokashite

nyan! nyan! nyan! DATA SMASH!!
egao de sekai o hack shiyou!
baka de mo ii jan?
kawaii wa seigi da mon!

[Verse 2 - Playful Rap-Sing / Glitch-Hop Elements]
pixel no umi o oyogu
pata pata te o futte
lag ga kite mo
love wa delay shinai!

sensei ga itteita
“chanto shinasai” tte
demo ne demo ne
tanoshii ga ichiban!

[Breakdown - Metal × Glitch Drop]
(Heavy distorted down-tuned guitars)
ZAZA—GSHH—!!
SYSTEM DOWN!! SYSTEM DOWN!!

[shouted] gyaa!?
[childish vocal] demo mada ikiteru!
neji to reesu de
tachimukau no da!

[Chorus 2 - Extra Hyper / Double-Bass Drums]
KYU! KYU! KYU! HEART CRASH!!
noise mo uta ni shichau!
garigari guitar mo
omocha mitai da ne

nyan! nyan! nyan! DATA SMASH!!
sukoshi kowareta mama de
sekai o dakishimete
waracchau no da!

[Bridge - Ultra Cute Drop-Out / Chiptune Only]
[whispered] shh… shh…
himitsu no patch day…

[sweet airy vocal] chu chu install shite
ai o saikidou!

[Final Chorus - Max Energy Idol Mode]
KYU! KYU! KYU! HEART CRASH!!
mirai ga buffer over!
demo demo ne
tanoshikereba OK!

nyan! nyan! nyan! DATA SMASH!!
kono oto ga tomaru made
watashi wa utau yo
LOOP SHITE EIEN!!

[Outro - 8-bit Fade-Out & Glitch noise]
pi-pi-pi…
clear shita?
mada mada tsuzuku yo
GAME OVER?
—uso da yo!
[End - Bit-crushed sparkle]
>>
Has anyone gotten decent results with 4gb VRAM or is it just not worth using? Skips whole verses and just mangles the lyrics pretty badly.
>>
>>108063854
hey zoomer. Don't you have some vibes to watch on twitch or whatever you lobotomy students do.
>>
>>108063660
I said agree to disagree. You're the one who assumed I was going to make EDM with it.
>>
>>108063867
Again, I have a life to live, and you're not a part of the human future.
>>
>>108061247
wait for the well done code
even ace devs did not release well done code yet
this seems to be good model
>>
>>108063888
True technically, but it's best to try to run it and complain. Generate pressure.

I know my future, though, most likely I'll have to vibe code this thing.
>>
>>108063897
> I'll have to vibe code
avoid such vile activity (use llms to guide you how2code if they r able to do so, and not to 'buzzword' nonsense)
>>108063004
>>
>>108063854
>Skips whole verses and just mangles the lyrics pretty badly

Use the 4B LM if you can (perhaps offload to CPU), enable thinking mode if you're using the Gradio version, and play around with BPM/seconds. I have found that if it messes up, it's either the seed or the song is just too short for the chosen BPM. Also, certain keyscales give superior results to others depending on genre.
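
To sanity check the "too short for the chosen BPM" part before burning a gen, here is a back-of-the-envelope sketch. It assumes each sung line takes a fixed number of bars (`bars_per_line`, default 1) in 4/4, which is only a guess and not something the model guarantees; lines that are only a `[section]` tag are skipped. Function names are made up for illustration.

```python
def lyric_lines(lyrics):
    """Return sung lines, skipping blanks and lines that are only a [tag]."""
    lines = []
    for raw in lyrics.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("[") and line.endswith("]"):
            continue  # pure section marker such as [Verse 1]
        lines.append(line)
    return lines


def min_duration_seconds(lyrics, bpm, bars_per_line=1.0, beats_per_bar=4):
    """Rough lower bound on song length needed to fit every sung line."""
    if bpm <= 0:
        raise ValueError("bpm must be positive")
    return len(lyric_lines(lyrics)) * bars_per_line * beats_per_bar * 60.0 / bpm


example = """[Verse 1]
line a
line b

[Chorus]
[shouted] line c
"""
print(min_duration_seconds(example, bpm=120))  # 6.0
```

If the estimate comes out well above the seconds you asked for, either raise the duration, raise the BPM, or cut lines. Instrumental sections and intros are not counted, so treat the number as a floor.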
>>
(trigger warning: this is Udio, not local)
https://litter.catbox.moe/p5jfhqft5ebihakn.mp3
https://files.catbox.moe/6n60yd.mp3
Local has a long way to go before it sounds this believable. And this is still a long way from where we might want AI music to be.

The problem with audio gen in general right now is that it can't sustain the listener's interest. At its best it sounds like sloppy unfocused jam sessions with occasionally cool ideas (although never a new one really). There's no "songcraft" at all. AI will be very cool for music when we are able to use it to dress up e.g. midi-style human inputs, but building songs from prompts is merely a fun way to waste time
>>
https://files.catbox.moe/b2cy03.mp3

Still not really getting very metal-y metal yet. The scream singing is pretty good tho, that's new for local models afaik

Feeling like heun2 is worth it, but we'll see. I'd really like this to be faster, because I want to know if LCM can work or not.

my experience with ace step 1.35 was that it was unpredictable with lcm, like sometimes it would produce good results, and sometimes it would freak out.
>>
>>108063924
I'm not really a guru guy.
>>
>>108064172
You will have no part in the future, old fart / zoomer. go back to your brainless fox news / msnbc / twitch streams of lobotomy.
>>
ace step doesn't know what a vocoder is... but I bet it can be prompted.
>>
>>108064172
>AI will be very cool for music when we are able to use it to dress up e.g. midi-style human inputs
Is that possible with cover mode?
https://github.com/ace-step/ACE-Step-1.5/blob/main/docs/en/Tutorial.md#2-source-audio-semantic-structure-control
>>
>>108064361
I don't think anyone has gotten 1.5 gradio to work on amd yet, but maybe someone can put up a guide. for now, I only have the comfyui version that's literally just the turbo model and t2a.
>>
There's some sort of weird background noise in all these posts that makes it sounds awful. Also I'm not installing chinese spyware lol wumao.
>>
https://files.catbox.moe/anq746.mp3

kind of broken, but I was experimenting with tags. I have NO idea why it went with an Australian sounding intro. It happened during:
[noise]

yeah, no clue lol

Why it is super repetitious is a mystery to me too.

But this is how I got that weird voice, because maybe it can be adjusted into a usable prompt tag thing, with changes:

[verse - female vocalist barely audible in noise]
>>
I'm howling. suno hires black hat influencers.
>>
>>108064513
https://files.catbox.moe/swjs2v.mp3

maybe improving...
>>
also.

:devil horns:
I'm testing out something crazy. If it works, there will be weeping & gnashing of teeth by the whole tech jewz industry...
>>
ok, in this gen it's surprising, we have a song that really gets louder, which I have not yet heard with suno or udio (much).

does that happen with the latest suno? My thought about suno and udio is that it's like they trained on normalized audio. ace step doesn't seem that way.
>>
>>108064910
It kinda does. We'll see.

https://files.catbox.moe/rse31x.mp3
>>
>>108061169
nigger never heard of NN audio upscaling.
>>
>>108063837
holy fuck.. those are some cringe lyrics
>>
Is there some way to control the voice? Wish we could just give it a recording. I guess I could try training a lora but I don't have music, just a bunch of voice clips, not sure if that would work.
>>
>>108066308
try tags i guess in the brackets where you denote sections
>>
Hold the fort while I'm away!
>>
>>108066201
Well, the lyrics in this case aren't meant to be about anything in particular, just random stuff made by AI matching tags I chose.
>>
>>108062683
Last thing I genned yesterday, a dark style vocaloid glitchhop test.
https://files.catbox.moe/kwh1vm.mp3

Also for those not in the know, LoRAs confirmed to be Udio tier (posted by another anon in /ldg/)
>>108065691

So far every song I've made or listened to without a LoRA is Suno v4.5-v5 tier at best, and still not as good as Udio (the best of all), because the voice is not as natural/human sounding (mostly a variety issue).

Well, with the sample that anon posted, the slop is pretty much gone! It's not just catching up in voice, the composition is there too! It actually sounds human, with the instruments and voice sounding richer than Udio itself (you can tell if you listen to both Udio and this on hi-fi gear, possibly because the powers that be at Udio have neutered the crap out of its outputs)! I've listened to plenty of Japanese Udio clips, and they essentially sound like compressed real songs.

Now, imagine remastering a song like this, it would be indistinguishable from a real song!

Note that when I say it has caught up, this is what I mean in terms of human-sounding voice quality/composition:

https://www.udio.com/songs/nfdtmJRUC7niZfhseaHdNk

https://www.udio.com/songs/7zrLreMnwCYrdBqQkGtEXM

This confirms my suspicion about this model: I think it was trained on a lot more than the data they claim in their paper. I will now dedicate some more time to training a LoRA.
>>
>>108060884
Why catbox over vocaroo?


