/g/ - Technology




Discussion of Free and Open Source Diffusion Models

Prev: >>108048611

https://rentry.org/ldg-lazy-getting-started-guide

>UI
ComfyUI: https://github.com/comfyanonymous/ComfyUI
SwarmUI: https://github.com/mcmonkeyprojects/SwarmUI
re/Forge/Classic/Neo: https://rentry.org/ldg-lazy-getting-started-guide#reforgeclassicneo
SD.Next: https://github.com/vladmandic/sdnext
Wan2GP: https://github.com/deepbeepmeep/Wan2GP

>Checkpoints, LoRAs, Upscalers, & Workflows
https://civitai.com
https://civitaiarchive.com/
https://openmodeldb.info
https://openart.ai/workflows

>Tuning
https://github.com/spacepxl/demystifying-sd-finetuning
https://github.com/ostris/ai-toolkit
https://github.com/Nerogar/OneTrainer
https://github.com/kohya-ss/musubi-tuner
https://github.com/tdrussell/diffusion-pipe

>Z
https://huggingface.co/Tongyi-MAI/Z-Image
https://huggingface.co/Tongyi-MAI/Z-Image-Turbo

>Anima
https://huggingface.co/circlestone-labs/Anima

>Klein
https://huggingface.co/collections/black-forest-labs/flux2

>LTX-2
https://huggingface.co/Lightricks/LTX-2

>Wan
https://github.com/Wan-Video/Wan2.2

>Chroma
https://huggingface.co/lodestones/Chroma1-Base
https://rentry.org/mvu52t46

>Illustrious
https://rentry.org/comfyui_guide_1girl
https://tagexplorer.github.io/

>Misc
Local Model Meta: https://rentry.org/localmodelsmeta
Share Metadata: https://catbox.moe | https://litterbox.catbox.moe/
GPU Benchmarks: https://chimolog.co/bto-gpu-stable-diffusion-specs/
Img2Prompt: https://huggingface.co/spaces/fancyfeast/joy-caption-beta-one
Txt2Img Plugin: https://github.com/Acly/krita-ai-diffusion
Archive: https://rentry.org/sdg-link
Bakery: https://rentry.org/ldgcollage

>Neighbors
>>>/aco/csdg
>>>/b/degen
>>>/r/realistic+parody
>>>/gif/vdg
>>>/d/ddg
>>>/e/edg
>>>/h/hdg
>>>/trash/slop
>>>/vt/vtai
>>>/u/udg

>Local Text
>>>/g/lmg

>Maintain Thread Quality
Do not engage with posts mentioning "debo", "ani" or "ran". These are troll posts.
>>
>>108053185
I'm not from St. Petersburg and you are replying to an anonymous 4chan post.
This proves one fact: most ocd trolls are 17 year old virgins.
>>
>>108053187
ty 4 bake
>>
>>108053206
did you just admit to being underage while posting on 4chan?
>>
>>108053187
> Do not engage with posts mentioning "debo", "ani" or "ran". These are troll posts.
Based.
>>
is klein 4B supposed to be this shit? can't even generate a proper selfie
>>
reminder: don't use sageattention in your starting args if you try acestep. I was getting gibberish until I removed --use-sage-attention.

now it is VERY clear, and almost Suno quality. using comfy workflow and the non-AIO one.
>>
>>108053210
no problem
>>
>>108053214
You don't have anything compared to the original schizo. You are just a little baby.
>>
>>108053245
doesnt come even close to suno loser retard
>>
>>108053245
example: bill gates epstein song, used grok to generate lyrics. left the kpop description the same in the top node.

https://voca.ro/12VaksFm9iMn
>>
File: 38777059622937.png (1.84 MB, 1088x944)
https://voca.ro/1mNw5x3sY7Ak

It's definitely not as good as Suno, but it's cool that it exists.
>>
>>108053284
same lyrics with "a slow rock ballad" in the top node.

https://voca.ro/17EKFTdU0qDi
>>
>>108053245
is sage just done for at this point?
>>
okay dont use randomize for duration unless youre a retard like me. was wondering why it was taking longer than last time and the duration was set to 862
>>
Does batch size work for you?
>>
My first song ever... I kind gave up until 1:00 tho
https://voca.ro/14gqzALZkd3z
>>
>>108053328
Sage only has major gains for video. For image fast fp16 is miles better. Idk how is it with audio.
>>
Consider using the front end the ace step team built for the model. It can do covers and turn vocal tracks into polished songs.
>>
>>108053354
does it have inpainting?
>>
File: image.png (59 KB, 776x284)
cumrag...
>>
>>108053354
link?
>>
File: z_image_bf16_00083_.jpg (2.94 MB, 1344x1728)
>>
>>108053354
I used it. It works, but I wouldn't recommend it.
>>
>>108053354
can this be done in comfy?
>>
File: file5og.jpg (661 KB, 1013x1351)
I hate 'thunar' it has nothing compared to explorer.
>>
>>108053386
just use a real file manager like doublecmd (totalcmd clone)
>>
>>108053386
Have you tried using a real DE like Plasma with Dolphin?
>>
>>108053227
you look like shit
>>
>>108053284
>used grok to generate lyrics.
vramlet or retard?
>>
>>108053396
I knew that the solution is always to change the distro. You are a techlet.
>>108053395
I don't care that much. File transfer logistics are an issue.
When it happens I want to kill someone.
I can use terminal and this fine but who the fuck wants to type all the time.
>>
so can ace step do speech and stuff or only music and singing?
>>
>>108053408
>changing DE means changing distro
it is you who is the techlet
>>
kek it can do synthwave, here is a song about bill gates and epstein. also, use grok to get prompts for song styles + lyrics.

https://voca.ro/15wCyq253sQe
>>
>>108053354
I already made all of those features work in comfy, and no I won't share fuck you
>>
>>108053396
Dolphin is actually worse than Thunar.
What also bothers me is the software's name. Who the fuck names a file manager as 'thunar'.
>>108053418
Oh sorry I didn't know you were supposed to wipe your pants.
>>
>kek it can do [blank]
i swear you are an organic bot
>>
our minds are like an ai model.
just as we write a prompt, to generate an image,
we think, say, feel, and do, to generate our lives.
what will your prompt be?
>>
so, is ace fun? can i make some perverted techno? i already have the lyrics. "s my ice cream..."
>>
>>108053441
tldr
>>
>>108053366
Now do Misaka and Uiharu and Accel
>>
>>108053448
dafuq is peverted techno?
>>
File: z_image_bf16_00084_.jpg (3.24 MB, 1344x1728)
>>108053453
i haven't watched the anime
>>
https://github.com/ace-step/ACE-Step-1.5?tab=readme-ov-file#-installation

works well, gonna try their frontend for the inpainting/edit stuff
>>
File: 01936-1310522073.png (1.12 MB, 896x1152)
>>108053284
>>108053303
kek. nice. why doesn't it finish the songs though?
>>
>>108053342
Based
>>
https://github.com/ace-step/ACE-Step-1.5/blob/main/docs/en/GRADIO_GUIDE.md
hmm looks like for now comfy's implementation is simple text-to-music
gradio's (bleah) UI has inpainting, reference audio, cover mode and with BASE it has instrument add/segmentation.
but I hate gradio
also comfy can offload across gpu/cpu, their implementation doesn't.
and finally the 4b model isn't actually implemented in comfy.
SAD.
>>
Is MultiGPU broken with Klein for anyone else? Trying to offload the text encoder winds up partially loading the UNET and giving me an OOM the moment my gens finish, even though half my vram is free.
>>
>>108053521
seems like that might be better for my 4gb card though since it disables the LLM which takes about 18 minutes on my system
>>
>>108053521
Comfy did the absolute minimum for this lmao
>>
>>108053504
hey can you stop being creepy?
>>
Ace, everything works but the audio is empty.
Fuck this python shit I'm not going to trouble shoot why the audio file is empty. It actually calculates.
>>
File: z_image_bf16_00085_.jpg (3.4 MB, 1152x2016)
>>
It is running on 4GB of vram:
>Warning: Ran out of memory when regular VAE decoding, retrying with tiled VAE decoding. Prompt executed in 69.03 seconds
>>
File: 1740399440554489.jpg (511 KB, 832x1216)
Anima is absolutely insane for something so lightweight. I only hope we will get it to prompt in higher resolution because upscaling nukes it.
>>
Nah, Ace sucks. Suno 0.5 tier.
>>
>>108053564
I mean I have 12GB of vram.
>>
>>108053545
my acestep was shit till I removed sageattention from my startup args.

use grok to make a song template or song + lyrics, ie:

(Verse 1)
Flying high on a private jet, secrets in the air,
Bill Gates, the corrupt globalist, doesn't seem to care.
Island breeze and hidden deals, under tropic skies,
Bill Gates, the corrupt globalist, wearing his disguise.
(Chorus)
Oh, he's off to Epstein's island, chasing what he sees,
Bill Gates, the corrupt globalist, living on his knees.
Girls are dancing in the shadows, whispers in the night,
Bill Gates, the corrupt globalist, everything's not right.
>>
File: zib_00002.png (1.24 MB, 832x1216)
>>
>>108053568
Some SDXL tunes are better and you can still use multiple characters.
Anima has lack of perspective and it doesn't understand backgrounds.
Test it with a random booru tag prompts and you will see what I mean.
And yet SDXL gen is <30 seconds compared to this even on low end hardware.
>>
>>108053586
It seems to calculate them but vae decode spits out an empty file.
Must be one of those ComfyUI(tm) Quirks.
>>
File: 1769636480059782.jpg (906 KB, 1248x1824)
>>108053598
No SDXL finetune can handle multiple custom subjects unless they are characters. Not a single one of them.
And why the fuck would you booru prompt a natural language model?
>>
>>108053568
sonic, masterpiece, safe
>>
>>108053598
Lol aah reminds me all the cope I used to listen to why SD1.5 was so much better, nostalgic
>>
>>108053533
MultiGPU updates slow as fuck. It was broken for a month, patched, then broken for another month.
>>
>>108053615
Anima is designed for tags. Maybe read the huggingface page. In any case, Qwen 0.6B is not that helpful lol.
>>
Many of you can't prompt for shit so I need to actually use the model before making a judgment
>>
my Mom said I'm a great prompter, I'll have you know
>>
>>108053632
>The model is trained on Danbooru-style tags, natural language captions, and combinations of tags and captions.
Oh really now?
>>
>>108053543
have I truly been so abominable?
>>
File: SDXLtoKlein.jpg (2.2 MB, 2080x1824)
Klein is way too good at upscaling old XL gens
>>
>>108053586
It began to write the files after a reboot. Need to do a bigger test run. Maybe it was python cache or something.
>>
>>108053568
that skin color doesnt exist
>>
>>108053649
so far ive been doing tags combined with comma separated short phrases myself. pretty much what i did for illustrious but anima handles the short non-tag phrases way better
>>
>>108053640
>Anima
Good
>Acestep
Good
Thats my review right now and they can only get better from here
>>
>>108053671
bro that's crazy
>>
>>108053672
fuck you poor you are ruining it for everyone
>>
>>108053660
I don't think this is even SDXL. SDXL was more graceful. You are upscaling a chatgpt gen I suppose.
>>
>>108053681
its sd 1.5 retard
>>
kek

grok prompt: make a song about George Floyd doing too much fent that he got high and overdosed, with rhyming lyrics.

output for a slow, melodic synthwave song:

https://voca.ro/1g9wrs64VKPo
>>
File: o_00475_.png (1.27 MB, 1280x768)
>>
File: 1756899078000123.jpg (1.28 MB, 1248x1824)
>>108053660
I've been using klein to cobble together character art by using blurry as fuck screenshots as character reference, the model is insane.
>>
>>108053686
I don't think so. Paid service image.
>>
>>108053681
No
>>
>>108053706
Ok, I believe. You used caps at least.
>>
>>108053687
thanks to acestep, I call this fentwave:

https://voca.ro/1gV4SnoxVvsA
>>
>>108053687
What does this tell you about yourself? You are a teenager <18 and spend your time on youtube listening to trash.
Jesus go home and tell your parents something, faggot.
You are the reason why 4chan sucks.
>>
>>108053671
Anima has strong full length boomerprompt adherence, IDK why anyone is claiming otherwise
>>
>>108053712
It was literally base SDXL with like 4 loras
>>
Pretty funny.
https://voca.ro/1kzK4q82Hkh2
>>
>>108053729
i tried a few full natural language prompts but only a short paragraph or so. im just not used to multi-paragraph huge prompts since i pretty much only use tag models
>>
>>108053722
it's worse because he is 30+ years old
>>
File: 00193-720977806.jpg (1.09 MB, 2688x2048)
Anima needs to work on forge neo soon. On an unrelated note, how close is it to Z in regards to prompt adherence?
>>
>>108053721
top node: a slow, driving synthwave song. ok, now THIS is fentwave.

https://voca.ro/11u9EFCq9j69
>>
>>108053729
*bleeds your prompt*
heh nothing personnel kid
>>
>>108053757
Legit skill issue, I don't think you have the hardware to run the model
>>
>>108053753
It has been calibrated for lack of balls music, trained on dataset without testosterone.
>>
>>108053749
fucking loving your gens for all the wrong reasons, my coom instinct is fully triggered
>>
says the shill crying about zit being slow
>>
bumped steps to 20 since it's already super fast.

a heavy metal song with drums and guitars.

https://voca.ro/1inqaeYxA5YI
>>
>>108053801
This thread is image related, why do you always talk music here?
>>
>>108053807
>Discussion of Free and Open Source Diffusion Models
>>
File: 73887501589779.png (1.8 MB, 912x1136)
>>108053749
In my experience not as good, but still capable of parsing fairly complex prompts.
>>
File: test.jpg (687 KB, 2661x1711)
>>108053366
i NEED the anon who made the y2k ZIT lora to come back and remake it for klein
>>
>>108053807
LOCAL DIFFUSION general. not local image general.

anyway. an uptempo anime song from the opening of an anime show with synths and guitars.

https://voca.ro/1amYCIdp6r44
>>
File: 1740070976356214.png (3.18 MB, 1248x1824)
>>108053699
>>
>>108053801
Much better than what I had.
It still sounds like MIDI conversions of metal songs from the early 2000s
>>
>>108053807
Better than the other brain damage samefagging shit i have to scroll through here usually
>>
holy shit the gradio ui for acestep is so fucking CANCER
>>
File: 01944-1472889807.png (1.05 MB, 896x1152)
They will call us lucky for having been here.
>>
>>108053832
Can this model generate made-up lyrics like in Nier songs?
>>
>>108053867
Thank you.
>>
File: anima_uwu.jpg (115 KB, 704x1024)
>>
so you can apparently use acestep to mod songs

"It's just a process similar to img2img. Vae encode a song and use the latent to render with low Denoise around 0.25, I also increased the cfg a bit."

toss this into latent image instead of the default. set denoise to 0.25. seems to work.
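
for anyone wondering what the denoise knob is doing in that trick, here's a rough toy sketch. it is NOT ACE-Step's or ComfyUI's actual code: the linear schedule and both function names are made up for illustration. the general idea is that the encoded song gets only partly noised, so a low denoise keeps most of the original and the sampler only has to clean up a small amount of noise.

```python
def partial_noise_levels(steps: int, denoise: float) -> list[float]:
    """Toy linear schedule: noise level falls from `denoise` to 0 over `steps` steps."""
    if steps < 1:
        raise ValueError("steps must be at least 1")
    if not 0.0 < denoise <= 1.0:
        raise ValueError("denoise must be in (0, 1]")
    return [denoise * (1 - i / steps) for i in range(steps + 1)]


def noised_start(latent: list[float], noise: list[float], level: float) -> list[float]:
    """Blend the encoded song with noise; level=0 keeps the song, level=1 is pure noise."""
    return [(1 - level) * x + level * n for x, n in zip(latent, noise)]


# Hypothetical two-value "latent" standing in for a VAE-encoded song.
levels = partial_noise_levels(steps=8, denoise=0.25)
start = noised_start([1.0, -2.0], [0.0, 4.0], levels[0])
```

at 0.25 the starting point is 75% song and 25% noise in this toy model, which matches the experience of the structure surviving while details like lyrics shift. real samplers use their own schedules, so treat the numbers as illustrative only.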
>>
>>108053749
why is ani white
>>
is rannigger really going we wuz kangz mode?
>>
File: 1761988173716748.png (49 KB, 738x444)
>>108053892
fucking stupid website captchas

anyways, do this.
>>
>>108053896
He is white, why would I not keep the character in line?
Refining the prompt
>>108053901
I don't think I'm a king but you seem to worship me so why not
>>
>>108053901
I'm his best friend and protector now.
>>
>>108053908
DAS RITE
>>
>>108053908
make him blue instead
>>
>>
>>108053908
Well said my friend. Let's concentrate on creating new images.
>>
>>108053832
>https://voca.ro/1amYCIdp6r44
ok nigga you got me
>>
>>108053908
then why is ran brown?
>>
https://vocaroo.com/17FBr0Hqcimc
>Heavy metal, slow.
>[Intro - Robotic Metal Riff]
>[Instrumental - oriental solo]
Yeah this model is a joke. Unless you have yodling japanese voices and drumbeat it doesn't do anything.
Thing is most people are as blind as they are musical which equals to none.
>>
File: 00196-771589346.jpg (1.45 MB, 2688x2048)
>>108053918
I will make him a blue space slave at a later date, trying to think of a good setting
>>
>>108053902
and, if you put in M@GICALCURE! LOVE SHOT! by SAWTOWNE feat.Hatsune Miku

you will get Miku singing about Floyd using fent. I need to tweak the denoise and stuff. but you can edit songs with this/change lyrics.

https://voca.ro/15ugOpa1EuuZ
>>
>>108053943
keep up the good work king
>>
>>108052364
>Qwen Image 2512 fp8_e4m3fn
>4 seconds/step and 4 minutes for the whole image
>27gb of vram used during inference
I get 23.4GB VRAM usage when running qwen in comfy+linux, on my 7900 XTX, and that's with the desktop and other software in VRAM too. maybe more of the workflow like the text encoder is also being loaded in VRAM for you since you have more. My perf:
>100%|| 8/8 [00:22<00:00, 2.78s/it]
pretty good showing from the 9700, but I think the driver support needs improvement. the architecture of that card should allow it to perform better than it does.

>nice to know I'm not missing out by not being able to run larger models on my 16gb vram card at home
qwen image is capable of rendering things that other models simply can't handle. complex geometry, architecture, crowds, interactions, etc. and it's actually very fast with this lora:
https://huggingface.co/lightx2v/Qwen-Image-2512-Lightning/blob/main/Qwen-Image-2512-Lightning-8steps-V1.0-bf16.safetensors
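
quick sanity check on the numbers in this post, as a small sketch. the helper names are made up, and the implied step count assumes the quoted 4 minutes was spent entirely on sampling, which the quoted post does not confirm.

```python
def total_seconds(steps: int, seconds_per_step: float) -> float:
    """Sampling time if every step takes the same amount of time."""
    return steps * seconds_per_step


def implied_steps(total_time_s: float, seconds_per_step: float) -> float:
    """Step count implied by a total time, assuming all of it is sampling."""
    return total_time_s / seconds_per_step


lightning_run = total_seconds(8, 2.78)      # the 8-step lora run on the 7900 XTX
quoted_run_steps = implied_steps(4 * 60, 4)  # the quoted 4 s/step, 4 minute run
```

8 steps at 2.78 s/it comes to about 22 seconds, which lines up with the 00:22 in the progress bar. under the stated assumption, the quoted run works out to roughly 60 steps, so most of the gap is step count rather than per-step speed.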
>>
Repaint feature really needs to get here. I'm finding music variety rivalling Suno (given good prompts and all). On good seeds it rivals Udio. If you don't agree you haven't been around long enough.

https://files.catbox.moe/blln37.mp3
>>
>>108053908
>why would I not keep the character in line?
Why is your self insert white then?
>>
>>108053954
>hormone pills
kek
>>
>>108053959
Still cacophony without any music theory or even composition.
No wonder you are a chronic masturbator.
>>
>>108053943
fuck it turns me on insanely
>>
>>108053963
You seem to be of low IQ but high in rage
>>
>>108053976
High sub 70s?
>>
wait is ran brown irl?
>>
>>108053991
now we are being racist
>>
>>108053948
holy shit kek

8 steps, set cfg to 3.0. now skip to 30s to see how it swaps the lyrics. need to tweak more but it's funny.

https://voca.ro/1I7ddfbHLz5L
>>
>outs himself by using the same nickname he's assigned.
Low IQ and can't even troll correctly. Can't even make a gen because you lack the skill too
>>
>>108053993
somalian
>>
>>108053757
IDK what you're referring to
>>
>>108054005
fuck you be respectful
>>
Please help, never used comfy but I want to try AceStep1.5, it keeps hanging at VAEDecodeAudio "GET was unable to find an engine to execute this computation"
--lowvram works but takes 5 minutes for 30 seconds of audio on a 16GB 4080
>>
>>108054130
okay
>>
>>108054130
https://github.com/ace-step/ACE-Step-1.5?tab=readme-ov-file#-installation Try this, it works on 12gb rtx 3060.
>>
File: 00203-3755223359.jpg (1.19 MB, 2688x2048)
>Years later
>Same group of failures mad that they can't match me
I even take multi month breaks and you losers can't make anything new or interesting. I'm not even going to waste my time to look at the /sdg/ miscarriages.
>>
>>108054160
your loss
>>
>>108054130
restart and try again, it is most likely a pycache issue.
New nodes don't refresh even if you click 'refresh' inside cumui.
>>
>>108053993
>be black irl
>generate pictures of yourself as a white slave owner
>portray your enemies as cotton pickers
its quite a display of psychological trauma,
>>
>>108053541
Bordering on absurdity compared to the functionality their own front end has.
>>
a 1980s pop song with synths:

https://voca.ro/19p6qt3QHU18
>>
I really didn't think ran was real, I thought he was some schizos hallucination, this is crazy
>>
ran won btw
>>
>>108054193
unc
>>
>>108054196
So, he made few images and now some off-site psycho is crying about it.
>>
>>108054160
I think I'm going to have to bust a nut to the next one just to worship your gens
>>
>>108054146
Already tried it but the gradio interface is dogshit and after doing "initialize service", "generate sample" and "generate music" the buttons for the "results" literally do nothing, on multiple browsers with no extensions etc.

>>108054189
I restarted multiple times and tried on the manual and portable installations of comfy, doesn't explain why it worked with --lowvram
>>
>>108054130
I get 60 seconds for a 2 minute song on 10gb
You are fucking something up
>>
>>108054205
Remove that --lowvram from your script.
This is mine
#!/bin/bash
python3 ./main.py --disable-manager --disable-manager-ui --disable-api-nodes --preview-method auto
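
if you keep a longer launch line around, a small sketch like this can strip the two flags this thread reported as causing trouble with acestep (--use-sage-attention giving garbled audio, --lowvram being very slow). the helper name and the flag set are just for illustration, and whether these flags hurt on your setup is an assumption based on the posts above.

```python
# Flags that posters in this thread reported problems with when running ACE-Step.
PROBLEM_FLAGS = {"--use-sage-attention", "--lowvram"}


def clean_args(args: list[str]) -> list[str]:
    """Return the launch arguments with the reported problem flags removed."""
    return [arg for arg in args if arg not in PROBLEM_FLAGS]


launch = clean_args([
    "python3", "./main.py",
    "--lowvram",
    "--preview-method", "auto",
    "--use-sage-attention",
])
```

the result can be handed to subprocess.run(launch) from the ComfyUI folder if you prefer launching from python instead of a shell script.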
>>
>>108054212
with --lowvram? The point is it doesn't work at all unless I do --lowvram which obviously is wrong
>>
I see links to songs and I ask myself: is this really worth a bare minimum of two minutes of my time to appreciate? And the answer is usually no.

Your song needs to be incredibly catchy and/or funny to get my interest, and I'm not going to open that link unless you sell it to me first.
>>
>>108054217
No extra settings it all loads into vram normally, downloaded the split files from comfy huggingface
Using the basic template they uploaded today
>>
File: 1755137689292711.png (37 KB, 558x357)
if you do this you can clone/mess with existing audio but im trying to figure out the best settings. start at 0.25 denoise.
>>
>>108054225
Yeah basically this is worse than Microsoft's VibeVoice 7b. It was able to clone voices and be realistic.
This is just pomp pomp japan japan. It's not a model.
>>
>>108054203
based hornyposter
>>
File: file.png (105 KB, 991x726)
>>108054205
These are the settings that worked for me.
>>
>>108054248
is that anistudio?
>>
>>108054256
kek
>>
>>108053973
It absolutely does have insane composition matching instructions on my lyrics. It's following the prompt precisely, just needs a repaint for 100% lyrics match.

https://files.catbox.moe/3i5itm.mp3

It helps that I can quickly iterate on songs based on a system prompt I made based on songs from their demo, here's what I'm giving Gemini

https://pastebin.com/Xt551MqD

Works like a charm.
>>
>>108054256
No it's my original character Amispoodeo.
>>
File: Flux2-Klein_00818_.png (2.17 MB, 1536x672)
bloody nodes
>>
>>108054214
Launching Comfy with this worked.
1st gen: 120s audio in 105s. 2nd was 60s audio in 29s. 3rd 120s in 50s.
Thanks everyone
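
to compare these runs with the other timings in the thread, it helps to turn them into a real-time factor (seconds of audio per second of generation). a minimal sketch, with a made-up helper name:

```python
def realtime_factor(audio_seconds: float, generation_seconds: float) -> float:
    """Seconds of audio produced per second of wall-clock generation time."""
    if generation_seconds <= 0:
        raise ValueError("generation_seconds must be positive")
    return audio_seconds / generation_seconds


# (audio length, generation time) pairs taken from the three gens above.
runs = [(120, 105), (60, 29), (120, 50)]
factors = [round(realtime_factor(audio, gen), 2) for audio, gen in runs]
```

the first gen is barely faster than real time and the later ones are over 2x, which would be consistent with the first run paying a one-time model loading cost, though that is a guess. for reference the earlier --lowvram run (30 seconds of audio in 5 minutes) is a factor of 0.1.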
>>
>>108054246
There's nothing wrong with the quality of the music. It's just all so generic and dependent on the tastes of the individual that I fail to really care.
>>
>>108054283
jej
>>
>>108054283
unironically kino
>>
>>108054283
saar i have broduced most beutiful skin for the comyui
>>
File: wat.png (11 KB, 1013x55)
>basically already here
>just a completely different model
>>
>>108054283
I kind of want this.
>>
>>108054283
I want this but with a sensible non-poojeet aesthetic
>>
>>108054361
if you build it sar they will bloody come
>>
>>108054276
No, sorry, it's all shit.
Sort of solves the normie question- you are as blind as you are amusical.
>>
are 60% of threads ran talking to herself?
>>
>ricing your workflow
>>
File: 1759623036469093.jpg (511 KB, 832x1216)
>>
>>108054414
https://www.youtube.com/watch?v=TBV8_0BqIw8&list=RDTBV8_0BqIw8
This my song.
>>
File: ComfyUI_Anima_00031_.png (1.06 MB, 1024x1024)
Where's the anon who said it couldn't do authentic chiptunes (for keygen music)

Well, I got some news for you anon-kun... It can do some killer keygen music!

https://files.catbox.moe/olocnw.mp3
>>
File: 00213-2554173359.jpg (1.37 MB, 2688x2048)
>>
>>108054276
>>108054424
One thing I forgot to mention is of course you should tell the LLM to give you appropriate duration in seconds, BPM and keyscale in addition to the prompt. Changing BPM based on pace of song does change it a lot and is crucial depending on what you're going for.
>>
>>108054449
Thank you for inspiring me.
I will post some new stuff tomorrow or something.
>>
>>108054449
put somebody whipping him as well
>>
>>108053288
>Is not better than Suno, goy
Kek shilleets are working hard. Go back fag, I can literally made a lora with real artist, meanwhile you are still cucked by Suno shit
>>
>>108054485
1.5 by itself is pretty passable. But with LoRAs the door is kind of open to anything, and it trains crazy fast.
>>
>>108054495
Is there guide on how to train somewhere?
>>
>>108053288
>It's definitely not as good as Suno
It absolutely is. Do you have access to v5? So far from my tests the musicality matches or surpasses v4.5, music sounds less slopped.
>>
>>108054508
Guides? Not that I'm aware of. I think it helps to caption the data using their captioner though.

Are you using their front end? There's a LoRA trainer attached to it. As few as 8 songs seems to produce pretty good results but idk, I haven't actually tried it yet. Gonna steal a bunch of stuff from troono when I get off work today and try it.
>>
>>
>>108054522
Thanks, I'll check out their front end
>>
LTX2 video extend is so much fun.

https://files.catbox.moe/a7a1ip.mp4
>>
>>108054485
He posted a bad gen. Probably prompt issue, or something else. Very easy to get good gens where you can clearly hear the lyrics with quality matching or surpassing Suno.
>>
>>108054527
>ldg
>1girl
Other than the lady on the right, nice gen.
>>
>>108054449
make him carry the platform with your throne on it. cant remember the exact name for it but its usually like 6 people doing it
>>
>>108054544
Model author himself says it doesn't surpass suno. But it is good enough and has enough tools that, with a little tweaking, it can surpass suno.
>>
>>108054477
It kind of struggles with that with the scenes I'm doing, I might do one off a black man whipping him or a NTR image series of Busty Aries (comfy mascot) is getting dicked down by a faceless man with corpo written on his face and Ani is seething/crying in the corner.
>>
File: 00219-1119278622.jpg (1.27 MB, 2688x2048)
>>108054573
Forgot image
>>
we're eating so good it's not even funny anymore
>>
>>108053827
THE MOON WAXES AN MUH MERCY WANES

nice. this is zit with a lora?
>>
```Add a garish Anime Waifu theme to the entire ComfyUI user interface in image 1 while maintaining the original composition and layout.```
>>
>>108054640
bros... I kinda want this unironically...
>>
>>108054640
woah very pretty and comfy
>>
>>108054573
Yeah for a dynamic image like that I'll need controlnet.
I want to have the ultimate whipping position
>>
File: 1761269939642747.png (1.97 MB, 768x1360)
>>108053568
>I only hope we will get it to prompt in higher resolution because upscaling nukes it.
desu all it needs is a cnet. with a good prompt the only thing that really suffers are some background details
>>
>>108054640
You might have unironically stumbled on a thing people would want.
>>
>>108054640
WTF?!?!!? I love ComfyUI now
@ComfyAnon!!!!
Please this is a great idea!!!!
>>
Heart shitters on suicide watch right now after ace step.
>>
When the ai bubble pops will we finally be able to have good local models by renting gpu clusters for 10 cents/hour? please say yes I cant take this anymore
>>
acestep 1.5 can extend songs?
>>
you're a big guy.

https://files.catbox.moe/f6jgsu.mp4
>>
>>108054693
>acestep 1.5 can extend songs

No reason why it couldn't. No idea if it does it well or not.
>>
>>108054691
you can turn your house into a data center because gpus will be 10c each
>>
does AceStep have a way to add in sound effects with the prompt like gunshots?
>>
File: 1751485110200432.jpg (661 KB, 1824x1216)
>>
I need Chroma Klein to be finished immediately.
>>
>>108054424
except that isn't keygen at all
>>
>>108054681
jej
>>
LTX2 likes to be stuck on the first frame for a full 3 seconds and then start moving at the very last second. What am I doing wrong now?
>>
>>108054760
you're using an image and not a video
>>
>>108054727
[Gunshots]
>>
File: Flux2-Klein_00736_.png (542 KB, 704x768)
>>108054640
kek
>>
>>108054760
Use the movement lora
https://huggingface.co/MachineDelusions/LTX-2_Image2Video_Adapter_LoRa/tree/main
>>
>>108054640
trans coded UI
>>
ltx model knows spongebob natively:

https://files.catbox.moe/ln833s.mp4
>>
>>108054779
Got a bad feeling throwing another 5gb LoRA into the mix isn't going to be easy on my 12gb card.
>>
>>108054788
we know
>>
>>108054790
I use it on 12 too, it doesn't really hurt that much since we are using a lot of offload as it is
>>
>>108054788
bud can you at least test to see what other characters it knows?
>>
>>108054798
Certainly will try it. I'll let you know whether it worked for me someday when this gen finishes and the next one finishes
>>
File: ComfyUI_Anima_00033_.png (1.35 MB, 1024x1024)
Another thing I noticed is that Comfy seems to have messed up the speed for whatever reason. No idea why it's going through text-encode each time for whatever reason kek.

More keygen kino
https://files.catbox.moe/4kgqcy.mp3

Also this model absolutely can match Udio in musicality with prompts targeting the genre.

https://files.catbox.moe/vpcuf0.mp3

Gave this Udio 1.0 gen to Gemini asking it how to prompt it, then pass tags to my AceStep prompt template

This is one I got
https://files.catbox.moe/fkf9l9.mp3

Here's another which better matches lyrics (though it's more generic)
https://files.catbox.moe/u00pjf.mp3

>Model author himself says it doesn't surpass suno

Yes, he says it matches 4.5/v5 though when all the tools are leveraged, and since it's local we can surpass it because Suno is prone to censorship, it's gated/expensive per gen, and it also has no way to get specific voices or cover copyrighted songs like we do.
>>
>>108054527
```The woman in photograph image 1 is now completely surrounded by leering dark-skinned South Asian men. Maintain all other aspects of the composition and layout.```
>>
>>108054773
very punchable image.

I can imagine it doing slightly sped up Jim Carrey face expression impersonations to pitched up music.
>>
>>108054799
>bud can you at least test to see what other characters it knows?
Everyone knows test anon only tests one of like 6 individuals.
>>
>>108054788
better:

https://files.catbox.moe/cz6xkh.mp4
>>
File: 1757484291910574.jpg (708 KB, 1824x1216)
>>
>>108054547
ty

>>108054844
>mfw trying to curry favor
>>
>>
File: 00228-3108687076.jpg (1.35 MB, 2688x2048)
Z is a great model
>>
>>108054949
getting ready to send the pyramids to earth?
>>
>>108054829
First one sounds somewhat similar to Plastic Love so I know at least that was in training data, which is great since that means it has high quality songs.

One thing it definitely needs a LoRA for is spaghetti western songs, I can't get it to do this Udio gen

https://www.youtube.com/watch?v=8moLFyfgUR4

The model has likely never heard it, unfortunately, due to the dataset used.
>>
Just a heads up that ace step also has an optional 4B LM and a base model (most workflows use turbo)

https://huggingface.co/ACE-Step/acestep-v15-base/tree/main

If you want that see variety and shiet.
>>
>>108054844
>the woman is surrounded by 4 angry Italians because she ate the slice incorrectly
>>
>>108054979
I want a working remix/cover option
>>
>>108054997
Is this a comfy issue or is the gradiot interface not working too?
>>
File: 00231-3498321773.jpg (1.45 MB, 2688x2048)
>>
Anyone tried remaking the Ken-sama Go song in acestep yet? the original fucked up the pronunciation of nippon
>>
>>108055010
nta comfy just added text to music, They mention the other features on the blog page but say the community has to do them lol
>>
File: Flux2-Klein_00691_.png (445 KB, 704x768)
>>108054773
>>
>>108054979
>Just a heads up that ace step also has an optional 4B LM and a base model (most workflows use turbo)

How does loading the 4B LM work in comfy? Do we just load it by itself or do we include the other ones as well
>>
>>108055034
One or the other. They do the same thing. But the 4B has more Bs and is therefore better.
>>
>>108054982
kek
>>
File: 00234-614151232.jpg (1.46 MB, 2688x2048)
>>
Chroma Kaleidoscope status check
not quite there yet but coming along
https://files.catbox.moe/3u8z4e.png
can almost do dog fucking a chick properly
>>
I feel like LoRAs for ACEStep would get taken down from sites like Civit and treated like celebrity LoRAs except even worse, so users would need to resort to piracy to share LoRAs.
>>
File: 1758335067358946.png (1.23 MB, 912x1136)
the asian girl in image1 is wearing a white tshirt with the anime girl in image2 on it. her midriff is visible.
>>
>>108055095
come to this thread for now to discuss ace step 1.5 specifically:
>>108051632
gradio status: no amd support, rumor is Metal may work (???)
>>
hello, saars. can somebody make NAG work with anima???
>>
>>108055128
Why? Negative works?
>>
>>108055144
isn't nag better than negative most of the time?
>>
>>108055148
No NAG is a cope
>>
>>108055148
I don't think so (correct if I am wrong), It's just a hack to add negatives into Distill models, doesn't do much for base models.
>>
>>108055157
next you're gonna say negpip is cope also. negatives are not that good by themselves
>>
>>108055074
ani is coming out determined and stringer in these. like he could support a family
>>
>>108055177
he has to support his bull's offspring
>>
File: 1759611698831257.png (1.14 MB, 1136x896)
>>
I need a $9,999,999 graphics card, I am going to fucking die if I don't get a $9,999,999 graphics card
>>
>>108055256
GB200 NVL72 is what you're looking for, stay safe anon
>>
>>108055079
>not quite there yet but coming along
spoiler alert: This will forever and always be the status of Chroma models.
>>
>>108055195
sounds like a cope fantasy
>>
>>108055167
what do you expect from people who use euler as a sampler
>>
>>108054949
tried a recreation with Klein 4B Distilled
>>
>>108055381
neat
>>
>>108055343
nah, as long as the NSFW is there it'll be fine. Especially if distilled back into something like 4B Distilled.
>>
>>108055381
why are you spamming more of this shit? it's fucking tiresome
>>
>>108055381
I like his helmet.
>>
fresh
>>108055391
>>108055391
>>
File: 1750567289874407.png (1.12 MB, 1136x896)
replace the face of the man in image1 wearing armor with the face of the cartoon frog in image2, in the same pose as the man in image1.
>>
File: Flux2-Klein_00820_.png (586 KB, 1088x944)
>>108055395
>>
>>108053366
delicious thighs
>>
>>108055547
thanks
>>
>>108055074
proompt


