/g/ - Suno at home: Ace Step 1.5 - Technology



File: ace step 1.5.png (245 KB, 1007x1076)

Suno at home: Ace Step 1.5 Anonymous 02/04/26(Wed)14:15:08 No.108060884

This is the local audio gen general, but there is only one valid option for local audio gen. This is Ace Step 1.5. It released YESTERDAY.

They aimed for day 0 support, but it doesn't work on everyone's machine. I have to use cpu vae, which makes generation about 10x slower, and memory use is big too, nearly 50 GB at times.

So why sing the praises of Ace Step 1.5? Because it works. Local Suno is here.

When you think, "I've heard better made by Suno", remember that that is cherry picked out of thousands of gens.

comfy
https://blog.comfy.org/p/ace-step-15-is-now-available-in-comfyui

Official page, good luck getting Gradio to work, but the steps are here:
https://huggingface.co/ACE-Step/Ace-Step1.5

Anonymous 02/04/26(Wed)14:42:31 No.108061093

>>108060884
https://files.catbox.moe/m8cxqb.mp3

to upload your own, use catbox.moe

Anonymous 02/04/26(Wed)14:50:02 No.108061161

>>108061093
>>108060884
sounds robotic/shit
Also your OP looks like not-so-subtle advertisement you copypasted from reddit.
Gay nigger thread for gay nigger people.

Anonymous 02/04/26(Wed)14:50:54 No.108061169

sounds like ass
not even talking about composition etc
the quality of the generated audio is ass
the low end is awful and mushy
top end is actually painful - almost like you can hear aliasing
it sounds like it's a 64kbps mp3 'upscaled'
tiny model
tiny model smell

Anonymous 02/04/26(Wed)14:53:04 No.108061190

>>108061161
>>108061169
It's not for "audiophiles" it's for people who are cool and have a life.

Action living. It's not fat old people with headphones bigger than their old woman's tits.

Anonymous 02/04/26(Wed)15:00:12 No.108061244

The jews have lost their power over ai music.

This is despite the fact it takes me 37 minutes per gen.

Anonymous 02/04/26(Wed)15:00:20 No.108061247

>>108061190
nah, m8, i was ready to rent some gpu hours and train some sick loras for this ting, but the fidelity is just not there
maybe good enough for like mobile ads or parody shit, but that's like the lowest form of music innit

Anonymous 02/04/26(Wed)15:05:20 No.108061277

I suck at Music and don't want to have to prompt my own lyrics.

Anonymous 02/04/26(Wed)15:17:26 No.108061348

>>108061247
aw shucks, u wuz so gunna but dint huh. gawsh aw dat an sheit

Anonymous 02/04/26(Wed)15:18:27 No.108061353

>>108061277
You can just find poems and paste them in.

Anonymous 02/04/26(Wed)15:23:08 No.108061388

Voice quality sounds robotic, like it has a very low sample rate.

Anonymous 02/04/26(Wed)15:24:53 No.108061403

https://files.catbox.moe/n0e07n.mp3

This is still at cfg=1. Because this is so slow on cpu vae, I have to just guess. I'm trying cfg=2 now.

comfyui only has the turbo model. Some people indicate they have gotten the base model to work, but I'll wait for a real wf.

and wow does comfyui need work done on audio stuff.

Someone *DID* report that Comfy-Zluda is working with ace step 1.5, on windows of course.

Anonymous 02/04/26(Wed)15:25:56 No.108061407

>>108061388
Sorry, sunno shill, you're over.

Anonymous 02/04/26(Wed)15:26:44 No.108061414

>>108061277
Use meme copypasta or lorem ipsum or a llm robobraindump or "my shitty lyrics, listen to them" on repeat. It's not that you must devise amazing text to get any music.

Anonymous 02/04/26(Wed)15:29:32 No.108061437

>>108061407
Suno has the same problem, as with just about every other AI music generator I've come across.

Anonymous 02/04/26(Wed)15:43:26 No.108061548

>>108061388
>>108061437
Udio has less of this issue, but in ACEStep 1.5 voice quality varies depending on the prompt; some voices are not as robotic, others are a bit more.

Anonymous 02/04/26(Wed)15:49:37 No.108061603

>>108061437
Sounds like a "you" problem. As the tao te ching says:
Tao has three treasures which he guards and cherishes. The first is called compassion; the second is called economy; the third is called humility.

Anonymous 02/04/26(Wed)15:51:37 No.108061610

>>108061548
There's always some difference upon which a complaint can be registered.

For example, rock n roll, the jazz fans said, was "too repetitive".

idk, whatever, so they don't like it, why should I care what people who are going to hell for supporting the jews are going to think of me?

Anonymous 02/04/26(Wed)15:55:56 No.108061629

>>108060884
Anybody manage to get it running in comfy with AMD hardware? I've run into an issue where I can't find one of the proper nodes.

Anonymous 02/04/26(Wed)16:00:46 No.108061670

>>108061629
I updated, and then I had to use --novram --cpu-vae

here is my actual launch command, on Linux:
PYTORCH_ALLOC_CONF=expandable_segments:True TORCH_ROCM_AOTRITON_ENABLE_EXPERIMENTAL=1 MIOPEN_FIND_MODE=FAST python ~/comfy/ComfyUI/main.py --use-pytorch-cross-attention --novram --cpu-vae

the result is that a 1 minute diffusion takes 37 minutes, because first it has to cpu condition, and then after diffusion it does cpu vae.

--novram
causes cpu conditioning, I believe

--cpu-vae
obvious
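
For anyone scripting this, here is a minimal Python sketch that assembles the same launch command without running it. The environment variables, flags, and path are copied from the post above; the `build_launch` helper and its parameter names are made up for illustration and are not part of ComfyUI.

```python
import os
import shlex

# Environment variables and flags copied from the launch command above.
ROCM_ENV = {
    "PYTORCH_ALLOC_CONF": "expandable_segments:True",
    "TORCH_ROCM_AOTRITON_ENABLE_EXPERIMENTAL": "1",
    "MIOPEN_FIND_MODE": "FAST",
}
COMFY_FLAGS = ["--use-pytorch-cross-attention", "--novram", "--cpu-vae"]


def build_launch(main_py="~/comfy/ComfyUI/main.py", flags=COMFY_FLAGS):
    """Return (env, argv) for launching ComfyUI; nothing is executed here.

    build_launch is a hypothetical helper, not a ComfyUI API.
    """
    env = dict(os.environ)
    env.update(ROCM_ENV)
    argv = ["python", os.path.expanduser(main_py), *flags]
    return env, argv


env, argv = build_launch()
print(shlex.join(argv))
# To actually start the server: subprocess.run(argv, env=env, check=True)
```

Keeping the flags in a list makes it easy to drop `--novram` or `--cpu-vae` one at a time and see which one your card actually needs.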

Anonymous 02/04/26(Wed)16:19:35 No.108061798

It takes a lot of attempts for the model to make something decent, but you get infinite retries, unlike with Suno or Udio. I always write the lyrics.
I would like to train style loras for something more garage-y. Anyway here's a discarded version.
https://files.catbox.moe/w9c09t.mp3

Anonymous 02/04/26(Wed)16:28:10 No.108061862

>>108061670
Hell yeah brother! I'll give those a try later today. Thanks!

Anonymous 02/04/26(Wed)16:29:12 No.108061872

>>108061798
:^) it's fun.

Anonymous 02/04/26(Wed)16:54:39 No.108062046

trying to see if I can get heun to work. I wound up with a bad gen, but I had a bad prompt too.

I'm using a new prompt, we'll see how it goes. here is the ai slobbered prompt:
A powerful, melodic death metal track driven by fast, palm-muted guitar riffs, aggressive double-bass drumming, and guttural male growls contrasted by clean, soaring vocal harmonies in the chorus. The song begins with a haunting, atmospheric guitar lead and thunderous blast beats, building into a relentless verse with rapid tremolo picking and symphonic keyboard layers. The chorus opens up with anthemic, harmonized clean vocals and a triumphant, melodic guitar line. A technical guitar solo erupts mid-song, featuring sweeping arpeggios and shredding scales over a dynamic drum performance. The track concludes with a dramatic slowdown, merging the main riff with orchestral elements and a final, echoing scream.

Anonymous 02/04/26(Wed)17:40:27 No.108062382

>>108062046
idk. 60 steps, heun. simple scheduler. cfg=1.

I really need to try a lot more stuff, but comfyui needs to fix audio on AMD so it's fast.

but inference is fast. I just found out I am running batch=2 lmao.

it remains to be seen how well it keeps bpm and pitch. but, if it does, you could mix and match from many gens.

https://files.catbox.moe/nx3pir.mp3

Anonymous 02/04/26(Wed)17:57:29 No.108062516

the risk of collapse in gens is like when you have a similar mora.

so like

I like cabbage.
my cab is rad.

it might do
I like cabbage is rad.
or
I like cab (noise) rad

Anonymous 02/04/26(Wed)18:05:13 No.108062562

>>108062516
also like if you have
I like big red tomatoes
I red tomatoes biggest fan

it may only give you one full sequence of "red tomatoes" and like do a half-assed job on the other one. I think this is because the token is not intensified enough by repetition. maybe use (parens)???
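
A rough way to spot lyrics that might trip this up before spending a slow gen on them: the sketch below lists word pairs that show up on more than one line. It is only a heuristic with a made-up helper name (`repeated_phrases`). It catches whole-word repeats like "red tomatoes", not shared sounds like "cab"/"cabbage", and nothing here is confirmed about how the model actually handles repetition.

```python
from collections import Counter


def repeated_phrases(lyrics, n=2):
    """Return word n-grams that occur on more than one line of the lyrics."""
    seen = Counter()
    for line in lyrics.splitlines():
        words = line.lower().split()
        # Count each n-gram once per line so in-line repeats do not inflate it.
        grams = {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}
        seen.update(grams)
    return sorted(g for g, count in seen.items() if count > 1)


lyrics = "I like big red tomatoes\nI red tomatoes biggest fan"
print(repeated_phrases(lyrics))  # ['red tomatoes']
```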

Anonymous 02/04/26(Wed)18:06:14 No.108062566

>>108061862
Yeah, also the default wf by comfy has way too low a # of steps. Like I'm thinking 60 is my minimum.

Anonymous 02/04/26(Wed)18:07:14 No.108062573

anyway,
>contemporary R&B
but also
>bpm 200

what will it sound like? Join the musical adventure as we explore Ace Step 1.5

Anonymous 02/04/26(Wed)18:21:23 No.108062683

Has comfy merged in thinking support yet? It actually helps with prompt adherence (not just the 4B LM model).

Repost of recent Japanese gens from previous /ldg/ threads. I'm starting to get the hang of the model and get more of that v4.5 power out of it, but there's still much more to test.

Glitchhop-
https://files.catbox.moe/wjh4eo.mp3
Babymetal-
https://files.catbox.moe/3eccq6.mp3
https://files.catbox.moe/go8sj2.mp3

(All of these were prompted in a detailed way with the help of a custom prompt template I created: https://pastebin.com/Xt551MqD)

The model has potential in terms of catchiness, definitely Suno v4.5/v5 level on good gens, and perhaps a finetune dedicated to certain genres would push it a bit more towards Udio-level voice quality and make it surpass Suno.

If you're not sure how to prompt something out of the model, a nifty feature that Gemini has is mp3 interrogation. Since the model was trained with the help of Gemini 2.5 captions, it helps to give it a song in the style you'd like and ask Gemini to give you prompts for that, but of course you may also need to describe it precisely to Gemini as well, because even that is not perfect.

Anonymous 02/04/26(Wed)18:45:46 No.108062824

information gleaned.

I just tried cfg=8, 80 steps, with lcm. This resulted in a silent gen. Generally, you want fewer steps with lcm, but idk, my gens are so slow that I am playing Battleship with what's good.

>>108062683
Thanks! if you output to save audio FLAC it includes the wf.
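
If you want to pull the wf back out of a FLAC without opening ComfyUI, the sketch below reads the Vorbis comment block straight from the file bytes using only the standard library. The block layout follows the FLAC format. The tag names that ComfyUI writes (assumed here to be something like `workflow` and `prompt`) and the example file name are assumptions, so print the keys first and check.

```python
import struct


def read_flac_comments(data):
    """Parse Vorbis comments (KEY=value tags) from raw FLAC bytes."""
    if data[:4] != b"fLaC":
        raise ValueError("not a FLAC stream")
    pos = 4
    while pos + 4 <= len(data):
        header = data[pos]
        is_last, block_type = header >> 7, header & 0x7F
        length = int.from_bytes(data[pos + 1:pos + 4], "big")
        body = data[pos + 4:pos + 4 + length]
        if block_type == 4:  # VORBIS_COMMENT metadata block
            return _parse_vorbis_comment(body)
        if is_last:
            break
        pos += 4 + length
    return {}


def _parse_vorbis_comment(body):
    """Decode vendor string + tag list; lengths are little-endian uint32."""
    (vendor_len,) = struct.unpack_from("<I", body, 0)
    pos = 4 + vendor_len
    (count,) = struct.unpack_from("<I", body, pos)
    pos += 4
    tags = {}
    for _ in range(count):
        (size,) = struct.unpack_from("<I", body, pos)
        pos += 4
        key, _sep, value = body[pos:pos + size].decode("utf-8").partition("=")
        tags[key.lower()] = value
        pos += size
    return tags


# Usage (file name and tag name are hypothetical):
# with open("ComfyUI_00001_.flac", "rb") as f:
#     tags = read_flac_comments(f.read())
# print(sorted(tags))  # then json.loads(tags["workflow"]) if that key exists
```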

Anonymous 02/04/26(Wed)18:56:27 No.108062883

someone on you tube or twitch or whatever said to say that ace step 1.5 "is xyz"

none of their opinions are their own, and that's obvious because they resort to clipped platitudes.

Anonymous 02/04/26(Wed)19:05:37 No.108062945

File: Screenshot from 2026-02-0(...).png (58 KB, 1020x242)

It seems the audio code in comfyui in general needs work. Older nvidia card owners are also having issues (in addition to all amd owners).

Anonymous 02/04/26(Wed)19:54:23 No.108063246

https://files.catbox.moe/76zi5j.mp3

trying different things (slooooowly).

prompt: contemporary R&B

LCM
cfg=1.5
20 steps
(simple)

really not sure. I know that lcm worked really well with ace step 1.35, not sure if that can be done here. I'll keep guessing.

Anonymous 02/04/26(Wed)20:23:52 No.108063426

holy shit it's good enough

we are in stable diffusion v1.5 territory now

Anonymous 02/04/26(Wed)20:47:39 No.108063569

>>108063426
Yes, and I think that the a2a idea is really spicy because the idea of upgrading audio you generate today with the next model is pretty exciting.

Anonymous 02/04/26(Wed)20:48:39 No.108063574

>>108062883
Easier to spend your time pretending to have formed opinions on things instead of having to actually do things that cause you to form opinions.

Anonymous 02/04/26(Wed)20:55:54 No.108063621

Is it at the point where I could generate convincing enough sounding instrumental loops and then mix them together?
Like can this thing accurately generate based on BPM and the like?
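
Whether the model holds tempo is still an open question in this thread, but the arithmetic for what a loop should measure is fixed, so you can check a gen against it. A small sketch follows; the function names are made up, and 4/4 time and the 48000 Hz rate in the example are assumptions, so use whatever your exported file actually has.

```python
def loop_seconds(bpm, bars, beats_per_bar=4):
    """Length in seconds of a loop of `bars` bars at `bpm` beats per minute."""
    return bars * beats_per_bar * 60.0 / bpm


def loop_samples(bpm, bars, sample_rate, beats_per_bar=4):
    """Same length expressed in samples, rounded to the nearest sample."""
    return round(loop_seconds(bpm, bars, beats_per_bar) * sample_rate)


print(loop_seconds(120, 4))         # 8.0
print(loop_samples(120, 4, 48000))  # 384000
```

If a 4-bar cut from a "120 bpm" gen is noticeably off from 8 seconds, the gen drifted and the loop will not line up with others without time-stretching.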

Anonymous 02/04/26(Wed)20:57:06 No.108063625

>>108063621
We're really early. but asking if you can make loops indicates you don't know anything about edm. of course you can.

Anonymous 02/04/26(Wed)20:59:03 No.108063636

>>108063625
>EDM
Slop genre imo, but agree to disagree.
Good to know though, I'll have to play around with it. Does specifying BPM in the prompt work accurately though?

Anonymous 02/04/26(Wed)21:02:17 No.108063660

>>108063636
Why would I even talk to you? slop yo momma

Anonymous 02/04/26(Wed)21:07:53 No.108063693

>>108062824
There is no workflow because it's Gradio.
Here's babymetal metadata
https://files.catbox.moe/zwvnzz.json
glitchhop
https://files.catbox.moe/4lm6te.json

Now testing a combination of three genres (glitchhop, babymetal, 8bit)
https://files.catbox.moe/mibncm.mp3
https://files.catbox.moe/tn3adx.json

Interesting that it can rap fast
https://files.catbox.moe/3v4zxj.mp3

This second version would be perfect if not for messing up the chorus "nyan! nyan! nyan! DATA SMASH!!", but I can't fix it because repaint lowers the quality on the Gradio version I'm using, not sure if the Chinks messed up implementation or repaint is that bad unfortunately.

Anonymous 02/04/26(Wed)21:27:54 No.108063794

https://files.catbox.moe/cgpiq6.mp3
60 steps cfg 1 heunpp2 simple
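
Since gens are slow, it may help to keep a log of what has already been tried. The sketch below records the sampler settings reported so far in this thread and lists combinations nobody has posted yet. The `GenSettings` class and `untried` helper are made up for illustration; the values are taken from the posts above, and where a poster did not state the scheduler it is marked "not stated".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GenSettings:
    sampler: str
    scheduler: str
    steps: int
    cfg: float
    note: str = ""


# Settings reported in this thread; the notes paraphrase the posters.
REPORTED = [
    GenSettings("heun", "simple", 60, 1.0, "death metal prompt"),
    GenSettings("lcm", "not stated", 80, 8.0, "silent gen"),
    GenSettings("lcm", "simple", 20, 1.5, "contemporary R&B prompt"),
    GenSettings("heunpp2", "simple", 60, 1.0),
]


def untried(samplers, step_counts, cfgs, scheduler="simple"):
    """List sampler/steps/cfg combinations that are not in REPORTED yet."""
    done = {(s.sampler, s.steps, s.cfg) for s in REPORTED}
    return [
        GenSettings(sampler, scheduler, steps, cfg)
        for sampler in samplers
        for steps in step_counts
        for cfg in cfgs
        if (sampler, steps, cfg) not in done
    ]


for s in untried(["heun", "lcm"], [20, 60], [1.0, 1.5]):
    print(s.sampler, s.steps, s.cfg)
```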

Anonymous 02/04/26(Wed)21:35:43 No.108063837

>>108063693
Lyrics in case you don't have gradio

 [Intro - 8-bit Boot-Up & Rapid Arpeggios]
pi-pi-pi! bo-bo-bo!
hajimaru yo, GAME START
kira kira RAM ga
baka ni natta—ERROR! (wah!)

[Verse 1 - Cute Idol Groove / Swung Beat]
asa kara zutto hyper mode
neko mitai ni janpu janpu!
atama no naka bug darake
demo waratteru kara OK

coin! coin! hirotte
HP kaifuku da yo!
sekai ga kashikoku natte mo
watashi wa pop-up girl

[Pre-Chorus - Glitch Build-Up & Rapid 8-bit FX]
doki doki CPU atsui
nettaiya saikou chouten!
countdown ichi ni san—
bara bara ni naru no!?

[Chorus - Kawaii Chaos Explosion / Heavy Guitar Riffs]
KYU! KYU! KYU! HEART CRASH!!
bitto ga tobu yo BAN BAN BAN!
metal na yume mo
chiptune de tokashite

nyan! nyan! nyan! DATA SMASH!!
egao de sekai o hack shiyou!
baka de mo ii jan?
kawaii wa seigi da mon!

[Verse 2 - Playful Rap-Sing / Glitch-Hop Elements]
pixel no umi o oyogu
pata pata te o futte
lag ga kite mo
love wa delay shinai!

sensei ga itteita
“chanto shinasai” tte
demo ne demo ne
tanoshii ga ichiban!

[Breakdown - Metal × Glitch Drop]
(Heavy distorted down-tuned guitars)
ZAZA—GSHH—!!
SYSTEM DOWN!! SYSTEM DOWN!!

[shouted] gyaa!?
[childish vocal] demo mada ikiteru!
neji to reesu de
tachimukau no da!

[Chorus 2 - Extra Hyper / Double-Bass Drums]
KYU! KYU! KYU! HEART CRASH!!
noise mo uta ni shichau!
garigari guitar mo
omocha mitai da ne

nyan! nyan! nyan! DATA SMASH!!
sukoshi kowareta mama de
sekai o dakishimete
waracchau no da!

[Bridge - Ultra Cute Drop-Out / Chiptune Only]
[whispered] shh… shh…
himitsu no patch day…

[sweet airy vocal] chu chu install shite
ai o saikidou!

[Final Chorus - Max Energy Idol Mode]
KYU! KYU! KYU! HEART CRASH!!
mirai ga buffer over!
demo demo ne
tanoshikereba OK!

nyan! nyan! nyan! DATA SMASH!!
kono oto ga tomaru made
watashi wa utau yo
LOOP SHITE EIEN!!

[Outro - 8-bit Fade-Out & Glitch noise]
pi-pi-pi…
clear shita?
mada mada tsuzuku yo
GAME OVER?
—uso da yo!
[End - Bit-crushed sparkle]
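
Lyrics like these mix two kinds of bracket tags: section headers on their own line, and inline tags such as [shouted] in front of a lyric line. For anyone who wants to shuffle or regenerate sections programmatically, here is a minimal parser sketch. The `parse_lyrics` name and the output shape are made up, and the two-kinds-of-tags reading is just inferred from the lyrics above, not from any ACE-Step documentation.

```python
import re

SECTION_RE = re.compile(r"^\[([^\]]+)\]\s*$")     # a line that is only a [tag]
INLINE_RE = re.compile(r"^\[([^\]]+)\]\s*(.+)$")  # a [tag] followed by lyrics


def parse_lyrics(text):
    """Split tagged lyrics into [(header, [(inline_tag, line), ...]), ...]."""
    sections = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        header = SECTION_RE.match(line)
        if header:
            sections.append((header.group(1), []))
            continue
        if not sections:
            sections.append(("", []))  # lyrics before any section header
        inline = INLINE_RE.match(line)
        if inline:
            sections[-1][1].append((inline.group(1), inline.group(2)))
        else:
            sections[-1][1].append((None, line))
    return sections


sample = """[Bridge - Ultra Cute Drop-Out / Chiptune Only]
[whispered] shh... shh...
himitsu no patch day...
"""
for header, lines in parse_lyrics(sample):
    print(header, lines)
```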

Anonymous 02/04/26(Wed)21:38:27 No.108063854

Has anyone gotten decent results with 4gb VRAM or is it just not worth using? Skips whole verses and just mangles the lyrics pretty badly.

Anonymous 02/04/26(Wed)21:39:57 No.108063857

>>108063854
hey zoomer. Don't you have some vibes to watch on twitch or whatever you lobotomy students do.

Anonymous 02/04/26(Wed)21:42:08 No.108063867

>>108063660
I said agree to disagree. You're the one who assumed I was going to make EDM with it.

Anonymous 02/04/26(Wed)21:45:05 No.108063883

>>108063867
Again, I have a life to live, and you're not a part of the human future.

Anonymous 02/04/26(Wed)21:46:01 No.108063888

>>108061247
wait for the well done code
even ace devs did not release well done code yet
this seems to be a good model

Anonymous 02/04/26(Wed)21:48:05 No.108063897

>>108063888
True technically, but it's best to try to run it and complain. Generate pressure.

I know my future, though, most likely I'll have to vibe code this thing.

Anonymous 02/04/26(Wed)21:52:30 No.108063924

>>108063897
> I'll have to vibe code
avoid such vile activity (use llms to guide you how2code if they r abel to do so, and not to 'buzzword' nonsense)
>>108063004

Anonymous 02/04/26(Wed)21:52:49 No.108063928

>>108063854
>Skips whole verses and just mangles the lyrics pretty badly

Use the 4B LM if you can (perhaps offload to CPU), enable thinking mode if you're using the Gradio version, and play around with BPM/seconds. I have found that if it messes up, it's either the seed or the song is just too short for the chosen BPM. Also, certain keyscales give superior results to others depending on genre.

Anonymous 02/04/26(Wed)22:45:01 No.108064172

(trigger warning: this is Udio, not local)
https://litter.catbox.moe/p5jfhqft5ebihakn.mp3
https://files.catbox.moe/6n60yd.mp3
Local has a long way to go before it sounds this believable. And this is still a long way from where we might want AI music to be.

The problem with audio gen in general right now is that it can't sustain the listener's interest. At its best it sounds like sloppy unfocused jam sessions with occasionally cool ideas (although never a new one really). There's no "songcraft" at all. AI will be very cool for music when we are able to use it to dress up e.g. midi-style human inputs, but building songs from prompts is merely a fun way to waste time

Anonymous 02/04/26(Wed)23:01:26 No.108064235

https://files.catbox.moe/b2cy03.mp3

Still not really getting very metal-y metal yet. The scream singing is pretty good tho, that's new for local models afaik

Feeling like heun2 is worth it, but we'll see. I'd really like this to be faster, because I want to know if LCM can work or not.

my experience with ace step 1.35 was that it was unpredictable with lcm, like sometimes it would produce good results, and sometimes it would freak out.

Anonymous 02/04/26(Wed)23:02:27 No.108064239

>>108063924
I'm not really a guru guy.

Anonymous 02/04/26(Wed)23:04:07 No.108064247

>>108064172
You will have no part in the future, old fart / zoomer. go back to your brainless fox news / msnbc / twitch streams of lobotomy.

Anonymous 02/04/26(Wed)23:16:25 No.108064312

ace step doesn't know what a vocoder is... but I bet it can be prompted.

Anonymous 02/04/26(Wed)23:26:46 No.108064361

>>108064172
>AI will be very cool for music when we are able to use it to dress up e.g. midi-style human inputs
Is that possible with cover mode?
https://github.com/ace-step/ACE-Step-1.5/blob/main/docs/en/Tutorial.md#2-source-audio-semantic-structure-control

Anonymous 02/04/26(Wed)23:47:49 No.108064470

>>108064361
I don't think anyone has gotten 1.5 gradio to work on amd yet, but maybe someone can put up a guide. for now, I only have the comfyui version that's literally just the turbo model and t2a.

Anonymous 02/04/26(Wed)23:52:55 No.108064509

There's some sort of weird background noise in all these posts that makes it sounds awful. Also I'm not installing chinese spyware lol wumao.

Anonymous 02/04/26(Wed)23:53:34 No.108064513

https://files.catbox.moe/anq746.mp3

kind of broken, but I was experimenting with tags. I have NO idea why it went with an Australian sounding intro. It happened during:
[noise]

yeah, no clue lol

Why it is super repetitious is a mystery to me too.

But this is how I got that weird voice, because maybe it can be adjusted into a usable prompt tag thing, with changes:

[verse - female vocalist barely audible in noise]

Anonymous 02/04/26(Wed)23:54:35 No.108064524

I'm howling. suno hires black hat influencers.

Anonymous 02/05/26(Thu)01:34:24 No.108064907

>>108064513
https://files.catbox.moe/swjs2v.mp3

maybe improving...

Anonymous 02/05/26(Thu)01:35:26 No.108064910

also.

:devil horns:
I'm testing out something crazy. If it works, there will be weeping & gnashing of teeth by the whole tech jewz industry...

Anonymous 02/05/26(Thu)01:37:21 No.108064918

ok, in this gen it's surprising, we have a song that really gets louder, which I have not yet heard with suno or udio (much).

does that happen with the latest suno? My impression is that suno and udio used normalized audio to train on. ace step doesn't seem that way.
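
"Gets louder" can be checked rather than eyeballed: measure the RMS level per window and see whether it trends upward. The sketch below does that on a plain list of float samples with a synthetic ramp as input. Decoding an mp3 into samples is left out, and the helper names and the one-second window are arbitrary choices, not anything from the model or from ComfyUI.

```python
import math


def rms_db(samples):
    """RMS level of float samples in [-1, 1], in dB relative to full scale."""
    if not samples:
        return float("-inf")
    mean_square = sum(x * x for x in samples) / len(samples)
    return 10 * math.log10(mean_square) if mean_square > 0 else float("-inf")


def loudness_curve(samples, window):
    """RMS level for each consecutive non-overlapping window of samples."""
    return [rms_db(samples[i:i + window])
            for i in range(0, len(samples) - window + 1, window)]


# Synthetic stand-in for a song: a 440 Hz sine ramping from 0.1 to 1.0.
rate = 8000
n = rate * 4
signal = [(0.1 + 0.9 * i / n) * math.sin(2 * math.pi * 440 * i / rate)
          for i in range(n)]
curve = loudness_curve(signal, rate)  # one value per second
print([round(db, 1) for db in curve])
```

A flat curve across a whole track would be consistent with heavily normalized output; a rising one means the gen really does build.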

Anonymous 02/05/26(Thu)03:30:55 No.108065369

>>108064910
It kinda does. We'll see.

https://files.catbox.moe/rse31x.mp3

Anonymous 02/05/26(Thu)07:11:29 No.108066192

>>108061169
nigger never heard of NN audio upscaling.

Anonymous 02/05/26(Thu)07:13:22 No.108066201

>>108063837
holy fuck.. those are some cringe lyrics

Anonymous 02/05/26(Thu)07:39:57 No.108066308

Is there some way to control the voice? Wish we could just give it a recording. I guess I could try training a lora but I don't have music, just a bunch of voice clips, not sure if that would work.

Anonymous 02/05/26(Thu)13:22:11 No.108068490

>>108066308
try tags i guess in the brackets where you denote sections

Anonymous 02/05/26(Thu)14:59:43 No.108069302

Hold the fort while I'm away!

Anonymous 02/05/26(Thu)15:00:34 No.108069304

>>108066201
Well, the lyrics in this case aren't meant to be about anything in particular, just random stuff made by AI matching tags I chose.

Anonymous 02/05/26(Thu)15:32:38 No.108069491

>>108062683
Last thing I genned yesterday, a dark style vocaloid glitchhop test.
https://files.catbox.moe/kwh1vm.mp3

Also for those not in the know, LoRAs confirmed to be Udio tier (posted by another anon in /ldg/)
>>108065691

So far every song I've made or listened to without LoRA is either 4.5-v5 tier at best, and still not as good as Udio (the best of all) because the voice is not as natural/human sounding (variety issue mostly).

Well, with the sample that anon posted, the slop is pretty much gone! Not just catching up in voice, the composition is there too! It actually sounds human, with the instruments and voice sounding richer than Udio itself (and you can tell if you listen to both Udio and this on hifi gear, possibly because the powers that be at Udio have neutered the crap out of its outputs)! I've listened to plenty of Japanese Udio clips; they essentially sound like compressed real songs.

Now, imagine remastering a song like this; it would be indistinguishable from a real song!

Note when I say it had caught up, this is what I mean in terms of human sounding voice quality/composition.

https://www.udio.com/songs/nfdtmJRUC7niZfhseaHdNk

https://www.udio.com/songs/7zrLreMnwCYrdBqQkGtEXM

This confirms my suspicion about this model: I do think it was trained on a lot more than the data they claim in their paper. I will now dedicate some more time to training a LoRA.

Anonymous 02/05/26(Thu)17:45:39 No.108070446

>>108060884
Why catbox over vocaroo?
