/g/ - Technology


Thread archived.
You cannot reply anymore.




4u edition
Discussion of Free and Open Source Diffusion Models

Prev: >>107858102

https://rentry.org/ldg-lazy-getting-started-guide

>UI
ComfyUI: https://github.com/comfyanonymous/ComfyUI
SwarmUI: https://github.com/mcmonkeyprojects/SwarmUI
re/Forge/Classic/Neo: https://rentry.org/ldg-lazy-getting-started-guide#reforgeclassicneo
SD.Next: https://github.com/vladmandic/sdnext
Wan2GP: https://github.com/deepbeepmeep/Wan2GP

>Checkpoints, LoRAs, Upscalers, & Workflows
https://civitai.com
https://civitaiarchive.com/
https://openmodeldb.info
https://openart.ai/workflows

>Tuning
https://github.com/spacepxl/demystifying-sd-finetuning
https://github.com/ostris/ai-toolkit
https://github.com/Nerogar/OneTrainer
https://github.com/kohya-ss/musubi-tuner
https://github.com/kohya-ss/sd-scripts
https://github.com/tdrussell/diffusion-pipe

>Z Image Turbo
https://huggingface.co/Tongyi-MAI/Z-Image-Turbo

>WanX
https://github.com/Wan-Video/Wan2.2

>LTX-2
https://huggingface.co/Lightricks/LTX-2

>Chroma
https://huggingface.co/lodestones/Chroma1-Base
https://rentry.org/mvu52t46

>NetaYume
https://huggingface.co/duongve/NetaYume-Lumina-Image-2.0
https://nieta-art.feishu.cn/wiki/RZAawlH2ci74qckRLRPc9tOynrb

>Illustrious
https://rentry.org/comfyui_guide_1girl
https://tagexplorer.github.io/

>Misc
Local Model Meta: https://rentry.org/localmodelsmeta
Share Metadata: https://catbox.moe|https://litterbox.catbox.moe/
GPU Benchmarks: https://chimolog.co/bto-gpu-stable-diffusion-specs/
Img2Prompt: https://huggingface.co/spaces/fancyfeast/joy-caption-beta-one
Txt2Img Plugin: https://github.com/Acly/krita-ai-diffusion
Archive: https://rentry.org/sdg-link
Bakery: https://rentry.org/ldgcollage

>Neighbors
>>>/aco/csdg
>>>/b/degen
>>>/r/realistic+parody
>>>/gif/vdg
>>>/d/ddg
>>>/e/edg
>>>/h/hdg
>>>/trash/slop
>>>/vt/vtai
>>>/u/udg

>Local Text
>>>/g/lmg

>Maintain Thread Quality
https://rentry.org/debo
https://rentry.org/animanon
>>
Blessed thread of frenship
>>
Any good WAI ControlNet workflows?
The rentry guide one isn't usable for me because it blends loras and sometimes tags across all regions.
NetaYume Lumina is not usable.
Right now I am experimenting with Qwen Image Edit followed by an Image2Image pass then some mask genning. Qwen is really good for getting quick multi-character compositions with variation, but the censorship is really annoying.
>>
File: 1758556920639580.png (2.29 MB, 1184x1280)
kino 1girl DOMINANCE
>>
>mfw maintaining thread quality
>>
File: 957.png (3.81 MB, 1536x1824)
controlnetxlCNXL_kataragilnpaint [ad3c2578]
noobailnpainting_v10 [0bc5f3a1]
I used one of these two, dont remember which
>>
>>107861070
Thank you for baking this thread, anon
>>107861079
Thank you for blessing this thread, anon
>>
>>107861082
wait there is cnet for neta yume lumina?
>>
>>107861087
zit slop

>>107861088
nogen slop
>>
>>107861070
hopefully this thread doesnt get nuked

to >>107861015, i use DPM++ 2M SDE and Karras for everything and never have issues like this. they generate just fine, it's only when i upscale them using img2img that this happens.

one thing i can think of is i generated all the dataset images myself and they're all 1024x1024, should i vary some of them up, crop them in weird ways perhaps?
>>
>start a gen at 1200p 81 frames
>go lie in bed to rest a bit
>start wondering why it's taking way too long
>go back once I hear fans stop revving up and down

I RENDERED 4 AND NOT 1 REEEEE
>>
File: 1757232804489924.png (2.7 MB, 1344x1152)
enjoying my isekino slop
>>
File: im racist now.png (11 KB, 302x93)
mods bonked me wah
>>
>>107861146
drag and shot comfy
drag and shot namefags
drag and shot anistudio
drag and shot schizos
>>
>>107861160
hello hitler poster
>>
are embeddings a meme?
>>
File: 1763478083006051.mp4 (598 KB, 704x480)
>>107861162
>>
how do I prevent myself from filling 2 tb of models?
>>
File: 8295.png (1.91 MB, 1264x864)
>>107861180
become a real artist
>>
>>107861191
i would eat the fuck out of that
>>
File: 1753237921677459.png (1.62 MB, 1024x1024)
>>
File: 1765268992882412.png (2.08 MB, 960x1568)
>>
File: 1737016391283383.png (2.37 MB, 1280x1184)
I love 1girl
>>
File: 1737359738054445.png (2.52 MB, 1184x1280)
>>
File: 1740291157493473.png (2.61 MB, 1216x1248)
>>
File: 1767537955829533.png (2.38 MB, 1632x928)
>>
>>107861191
didn't answer my question, also zit slop
>>
File: 1745665686038492.png (2.29 MB, 1632x928)
>>
File: 706721129.png (1.28 MB, 1328x1328)
>>
>>107861239
i saw this on reddit
>>
File: ComfyUI_temp_jgitd_00004_.jpg (538 KB, 3840x1280)
hard to compare a detailer when it changes the composition so much
>>
File: 2210.png (1.59 MB, 1152x960)
>>107861239
this was on qwen, zit can't make the image actually isometric for whatever reason
>>
>>107861245
what model
>>
>>107861070
>>Maintain Thread Quality
>https://rentry.org/debo
>https://rentry.org/animanon
all this does is maintain low thread quality and I'm tired of pretending it's not
>>
>>107861239
Can you give catbox? Pretty please?
>>
File: 1737290202594875.png (1.25 MB, 1024x1024)
>>
>>107861264
go to /sdg/ and enjoy the high quality there
>>
File: ComfyUI_01586_.png (1.09 MB, 720x1280)
1girls in maid outfits
>>
>>107861271
why are you doxxing me!
>>
>>107861237
https://i.4cdn.org/wsg/1768408155289535.mp4
>>
>>107861291
Prompt: A pig man asks a girl "How much for this bag of ores?"
>>
File: qwen_edit_2511_00011_.png (1.3 MB, 768x1360)
>>107861280
>>
File: ComfyUI_temp_jgitd_00008_.jpg (1.09 MB, 3840x1280)
>>107861261
zit
>>
>>107861266
On qwen 2512
An orthographic 3D isometric render of a minimalist modern living room. Centered in the space is a massive, vertically oriented slab of natural agate, three meters in height, serving as a monolithic sculptural piece. The agate features polished, concentric bands of translucent teal, deep ochre, and creamy white, with a core of sparkling white quartz crystals that catch a soft internal glow. The living room is composed of two clean white walls meeting at a 90-grade corner on a floor of light-colored wide-plank oak. To the side of the agate sits a low-profile, modular L-shaped sofa upholstered in a textured charcoal grey fabric. A circular, heavy-knit wool rug in a warm ivory tone lies beneath the central stone. Decorative elements include a single tall fiddle-leaf fig plant in a matte black ceramic pot and a slender, minimalist black metal floor lamp with a spherical glass bulb. The lighting is a diffused global illumination with soft, neutral shadows, emphasizing the clean geometric lines of the furniture and the vibrant, glass-like texture of the agate rock. The entire scene is presented in a strict parallel projection with no perspective distortion.
>>
>>107861299
dayum
in the anime it was "how much for 1 night with you"
>>
is ltxv2 any good?
>>
>>107861338
wait, that's from an anime? then ltx actually pulled dialogue from it and that's crazy
>>
File: ComfyUI_01588_.png (1 MB, 720x1280)
>>107861320
>>
>>107861341
fried video and audio. the arch is neat but that's about it. the low slop quality keeps it from being useful
>>
>>107861374
now make her give birth
>>
>>107861341
having the 1girls talking is pretty novel. it requires a bit more finagling to get what you want. wan's biggest advantage is its age, it's pretty well understood and has lots of loras so you can get exactly what you want.

ltx2 is a reminder that local will triumph, albeit slowly
>>
>>107861387
grok can do goon
>>
>>107861384
ok now we'll just have to wait 9 months
>>
File: ComfyUI_01589_.png (936 KB, 720x1280)
>>107861384
i have a lora that will allow that to happen but mods will bonk me again instead you shall wait 9 months
>>
>>107861413
anon I don't think SHE can wait 9 months, look at that belly
>>
File: ComfyUI_temp_jgitd_00012_.jpg (528 KB, 3840x1280)
i think three is the best one but i can barely tell
>>
File: 569-3720341751.png (652 KB, 600x841)
>>107861422
you dont know who's the father
>>
File: qwen-edit_00016_.png (1.26 MB, 768x1360)
>>107861374
>>
>>107861427
1 & 4 4 me
>>
>>107861239
>>>/wsg/6072623
>>
File: fails.mp4 (3.9 MB, 2048x2048)
>>107861479
fails
>>
File: 424.jpg (685 KB, 2048x2048)
>>
>>
>>107861374
>*blushes*
>>
>>107861512
real?
>>
File: 310876.png (3.38 MB, 1152x2048)
>>
File: ComfyUI_temp_jgitd_00016_.jpg (917 KB, 3840x1280)
>>107861459
maybe it is 4, you could be right
>>
>>107861536
did you seriously prompt for her not dying her hair on time
>>
>>107861391
and a restaurant can make food. but if you can grow food and make it into a meal yourself you don't have to rely on a third party service to keep yourself fed.
>>
>>107861552
yet you don't do it for food
>>
>>107861507
are you that fucker that genned that cursed kfc ages ago
>>
>>107861536
I couldn't see the diff in the other but easily in this one between 1 & 4
>>
>>107861566
nope
>>
File: fails.mp4 (3.66 MB, 2048x2048)
>>107861507
>>>/wsg/6072629
>>
File: qwen-edit_00020_.png (1.26 MB, 1176x880)
>>
File: 1740458974336673.png (2.4 MB, 1280x1184)
anon.. lets make some slop together!
>>
>>107861653
zit
>>
>>107861331
NTA but tried it on NetaYume lol, actually pretty similar results to your Qwen one
>>
File: ComfyUI_temp_ycdpa_00002_.jpg (699 KB, 3840x1280)
these aren't more/less detailed they're just different i think
>>
File: zimg_00084.png (1.44 MB, 960x1280)
>>107861653
ok
>>
File: 32GB RAM face.jpg (170 KB, 900x685)
is chroma as slow to train as it is to gen with
>>
>>107861061
>come back
>newbread got deleted
>look at collage
>ass in collage
muahah did my ass gen nuke the other newbread?
>>
File: qwen-edit_00026_.png (1.08 MB, 1296x808)
>>
>>107861748
Takes me about 1.5 hours to train a lora using the default AI Toolkit config on a 5090
>>
>>107861734
old hag
>>
>>107861748
>>107861770
>tfw cant install any of the trainers that allow for chroma training
>tried everything but errors out the ass
>only training that works is musubi tuner and flux trainer for comfyui
>can only train flux loras

they surprisingly work for chroma but still, i'd imagine they would work better if properly trained with chroma
>>
>>107861750
That was a nice ass anon
But jannies do not like female butts
>>
File: 615001588.png (1.98 MB, 832x1600)
>>
forgot to change i2v pic for new prompt, but it just cut to the new prompt regardless, and it's a hundred times better than t2v
why does t2v suck so bad if the model is clearly capable of better fidelity
>>
>>107861836
If you have a Linux distro on hand you can try following the rentry under Chroma and that would at least let you go against the model directly but yeah sorry I haven't done it myself.
Other than AI Toolkit it looks like OneTrainer also supports Chroma.
>>
File: localcope.png (363 KB, 1800x462)
AHAHAHA PULL UP FAGGOT, WHERE IS THIS IMMINENT BASE NOW??
chinkshit shills deserve to be shot
>>
What.. A 5090 is 100ish tflops at fp32, an H100 is only 60ish?
>>
File: ComfyUI_01592_.png (952 KB, 720x1280)
maccas
>>
>>107861890
Remember unless you train loras yourself you have no reason to care about base, they've been extremely clear that the out of the box quality is worse than Turbo
>>
File: 8.png (1.46 MB, 1216x1216)
>>107861907
I think memory bandwidth is more of a concern for AI hardware
>>
just came home
could someone please link me to z-image base? want to try it out now
>>
File: 9300.png (1.74 MB, 960x1088)
>>107861993
https://github.com/Tongyi-MAI/Z-Image
>>
>>107861986
You can access it from comfyui's built-in API nodes!
>>
>>107861529
yep, snapped it on my phone at the local park in fact
>>
Anyone have success with a LLM prompt to expand or write prompts for Chroma?
I tried a couple of ones out there for Flux but it didn't work so well. I can write prompts manually that usually give me what I want but it takes like 15 minutes to craft a decent prompt.
>>
>>107862091
Chroma was captioned originally using jailbroken Gemini 2.5 afaik
>>
How do i get free ComfyCredits?
>>
>>107862148
you gotta compliment comfy's gens when he posts in the thread
>>
File: ComfyUI_00020_.png (1.4 MB, 1024x1024)
>>
is it just me or is LTX2 I2V or V2V really bad at keeping a character's face? Even under 5sec it's already someone else >.>
>>
>>107862243
i2v always looks like shit for me. i heard there's some problem with it they're trying to fix in ltx2.1
>>
>run comfy workflow
>works like a charm
>run it again
>LMAO OOM
kek what a fucking piece of shit
>>
>>107862363
are you using qwen? i noticed qwen image and edit models don't deallocate or reuse memory properly in comfy i have to manually clear the model each time
fucking cumfart
>>
>>107862363
lmao feelin comfy yet?
>>
>>107862386
yeah this time it was qwen, but honestly i noticed this with all sorts of models

>>107862387
i'm mad comfy
>>
why aren't there any good porn loras for ltx yet?
>>
>>
no base
ded thread
it's over
>>
could someone recommend a good qwen inpainting workflow? i tried a ton but they are super inconsistent. sometimes it works great sometimes it does the worst things to the image
>>
File: ComfyUI_00021_.png (1.19 MB, 800x1080)
>>
>>107862472
ltx came out, maybe 2 weeks ago and ostris just released support for its training? I dunno, vaguely following ltx2
>>
>>107862504
Okay, gen again but sitting cross-legged, making sure feet and toes are in the shot :p
>>
>>107862510
>base
>ever releasing

Read up about Chinese culture
>>
>>107862517
oh shit ok figured it out. using the inpaint crop and stitch nodes, works very well so far. made the mistake of using some "inpaint model conditioning" node before, which fucked everything up
>>
File: ComfyUI_00046_.jpg (1.34 MB, 2048x2048)
>>
File: ComfyUI_00021_.png (1.77 MB, 1024x1024)
>>
Neat, there is an AI general on /bant/ with no trolls and schizos because flags and tripcode.
>>>/bant/23836910
Good to keep in mind when a certain dev shows up.
>>
Can you uncensor qwen image edit? If wan2.2 can be uncensored effectively with loras, then I assume qwen can too?
>>
shift 35 lol
>>
>>107862766
Anything above single digits starts to become counter-productive.
You need some low timesteps to produce a decent looking image.
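A rough sketch of why that happens, assuming the usual flow-matching time shift sigma' = shift * sigma / (1 + (shift - 1) * sigma). Whether the workflow in question applies exactly this formula is an assumption, and the function names and the 8-step linear base schedule are made up for illustration:

```python
def shift_sigma(sigma: float, shift: float) -> float:
    """Flow-matching time shift: pushes sigma in [0, 1] toward 1 as shift grows."""
    return shift * sigma / (1.0 + (shift - 1.0) * sigma)


def shifted_schedule(steps: int, shift: float) -> list[float]:
    """Apply the shift to a hypothetical linear schedule from 1.0 (noise) to 0.0."""
    base = [1.0 - i / steps for i in range(steps + 1)]
    return [shift_sigma(s, shift) for s in base]


if __name__ == "__main__":
    for shift in (3.0, 35.0):
        sched = shifted_schedule(8, shift)
        # The last nonzero sigma shows how little time is spent at low noise.
        print(f"shift={shift}: last nonzero sigma = {sched[-2]:.3f}")
```

With a large shift nearly every step sits at high noise, so the sampler gets almost no low-noise steps to refine detail.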
>>
>>107862764
the SNOFS lora and gnass loras do a decent job but there's no good solution yet that i've seen
>>
>>107862832
Do they at least fix the disappearing bodies?
Can they be used for anime? Or only realism?
>>
>>107862766
slop
>>
>The paging file is too small for this operation to complete. (os error 1455)
>still 20GB of RAM unused
I swear...
>>
>>107862951
happened to me, why did you do this?
>>
What's the verdict on glm image?
>>
>>107862981
I forgot
>>
more like grlm image
am i rite guys?
>>
Is flux-fill still the best inpainting model or has it been replaced?
>>
File: file.png (44 KB, 830x644)
What values do you recommend for zimage, anons?
>>
>>107862940
mfw
>>
Just checked my Pagefile: 37.5GB. So Flux.2 edit wanted more than 125GB to function? Jeez, what are the actual requirements?

>>107862961
Dunno, lol.
>>
>>107861623
thanks for the new wallpaper
>>
>>107862715
>red board
wake me up when there is a blue board like this
>>
>>107863058
i hit 90GB RAM usage when using it without reference images, so it's totally plausible that with reference images it goes even higher
>>
>>107863042
>What values do you recommend for zimage, anons?
nag_scale 3, nag_tau 1, nag_alpha 0.25, nag_sigma_end 0.75
>>
>>107863151
Thank you!
>>
how do you stop ltx2 from adding random garbage subtitles to a video?
and also to enhance lipsync?
>>
>>107861770
That's the same time with a lower end GPU. You can probably do better.
>>
How do I prevent Wan2.1 Infinitetalk from outputting 3 files? I don't need the png and mute video.
>>
File: 1747857564699993.png (19 KB, 1325x276)
>>107863245
>>
>>107862243
>>107862335
I could barely get ltx2 to work, and when I finally found a workflow that didn’t give me misc node errors, it just gens blank videos. If the i2v really is bad it may not be worth bothering with until it has had time to bake in the community oven.
>>
File: ComfyUI_00022_.png (1.63 MB, 1024x1024)
>>
>>107863276
Thanks!
>>
>>107862243
>Even under 5sec it's already someone else >.>
yep, I went from 7 to 10 seconds and it became shit at character consistency
>>>/wsg/6072402
>>
File: ComfyUI_00023_.png (1.54 MB, 1024x1024)
>>
>>107861070
How do I make photorealistic ai pictures of myself
>>
>>107863139
>If you see this and ComfyUI did not start try updating your Nvidia Drivers to the latest.
>my drivers are in fact the latest
This is what I get now after bumping the pagefile up to 75GB. Looks like I needed a couple of B200s to use that... guess I'll go back to Qwen to edit images until I have at least 384GB of VRAM.
>>
>>107863407
>This is what I get now after bumping the pagefile up to 75GB.
why are you setting this manually? it works fine on automatic it gets bigger by itself if it needs more
>>
>>107863424
It was on auto and was only using ~9GB when I changed it. Flux.2 T2I works fine on my machine, so I thought it would be trivial to use the edit workflow (just like Qwen), but was wrong. It wants way more resources than I can give it.
>>
https://huggingface.co/zai-org/GLM-Image#note
>We strongly recommend using GLM-4.7 to enhance prompts for higher image quality.
What's the fucking point of an Autoregressive model if we still have to rewrite prompts in boomer style??
>>
>>107863400
step 1: be very narcissistic
step 2: be turkish (optional but helps)
>>
I love how the job queue is erased from existence when the program crashes. Very comfy.
>>
File: ComfyUI_temp_padqh_00001_.png (2.2 MB, 1040x1480)
>>
>>107863645
*vomits*
>>
https://github.com/Rolandjg/LTX-2-video-extend-ComfyUI/tree/main

video extend, it can even clone audio, lots of fun desu. set the sigmas to 8 for the distil model in the samplers subgraph.

>>
File: x_478jch.png (1.27 MB, 1536x1024)
>>
File: 1760387007074295.png (2.32 MB, 1152x1312)
>>
zit release was 7 weeks ago
lantern festival (end of chinese new year) is in 7 weeks
>>
>>107863723
it's not the end of chinese new year that matters, it's the beginning, once it reaches Feb 17, 2026 you know nothing is gonna happen until Mar 3, 2026
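Quick sanity check of the calendar math, using only the two dates given here. Treating Mar 3, 2026 as the Lantern Festival date is an assumption based on the posts above, and the variable names are illustrative:

```python
from datetime import date, timedelta

new_year_start = date(2026, 2, 17)    # start date given in the post
lantern_festival = date(2026, 3, 3)   # end date given in the post (assumed to be the festival)

# Length of the holiday window, in days.
holiday_gap = (lantern_festival - new_year_start).days

# The date that falls exactly seven weeks before the end of the window.
seven_weeks_before = lantern_festival - timedelta(weeks=7)

print(holiday_gap, seven_weeks_before.isoformat())
```

So the dead window is two weeks long, and "in 7 weeks" counts back to mid-January 2026.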
>>
>>107863723
>>107863742
let's pretend they needed 2 more months to finish the base model, then why did they make turbo out of such an unfinished base in the first place?
>>
>>107863684
also uses this:

https://huggingface.co/Lightricks/LTX-2-19b-IC-LoRA-Detailer
>>
>>107861092
Underrated gen. But tough luck you didn't use 1girl
>>
>>107863749
example, also note how detailed it gets with the ltx detailer lora after 5s (the extension point)

https://files.catbox.moe/c4640w.mp4
>>
You absolute dense son of a bitch.
>>
>>107863762
not a big fan, it doesn't keep the image quality of the original input anymore, like if the image input is low res style it should stay that way imo, but I guess it'll work well on t2v
>>
File: 1756485855203629.png (431 KB, 800x582)
>>107863746
>then why did they make turbo out of such an unfinished base in the first place?
To kill Flux 2's momentum.
>>
I'd really, really love it if, right now, out of nowhere, something came along that btfos z-image and makes any subsequent release of the base model irrelevant - fuck you and all that compute you just wasted prick teasing everyone.
>>
>>107863776
which is a good thing
>>
File: this.png (187 KB, 400x400)
>>107863781
>I'd really, really love it if, right now, out of nowhere, something came along that btfos z-image and makes any subsequent release of the base model irrelevant
same, Alibaba is fucking around with us for too long I want them to find out
>>
File: 83.png (1.5 MB, 960x960)
>>107861092 meant for >>107861082 lul
>>107863759
uwu ty
>>
>wan is bett-ACK
https://www.reddit.com/r/StableDiffusion/comments/1qc17bg/ltx2_team_really_took_the_gloves_off
>>
>>107863781
It won't happen, they're barely trying to figure out the secret sauce of Z-image turbo, it's gonna take some time
https://www.youtube.com/watch?v=GM-e46xdcUo
>>
>>107863770
you could also bypass the detailer lora in the workflow, will try that myself in a few
>>
>>107863795
secret sauce is a super tiny dataset with one face for women
>>
File: 00004-3737748990.png (2.26 MB, 1536x1536)
>>
>>107863803
>white
in the trash
>>
File: 1744181778141626.jpg (459 KB, 1250x1566)
>>107863795
>the secret sauce
the sad reality is that the secret sauce is only using real dataset and not AI synthetic slop, but companies don't like that because it won't make the mememarks big and the investors won't care
>>
>>107863806
Shut your mouth, you dirty son of a bitch
>>
>>107863815
benchod
>>
kek

https://files.catbox.moe/mqmwbn.mp4
>>
>>107863824
real?
>>
File: Lightricks be like.png (606 KB, 1080x607)
>>107863790
I don't think it was a good idea to make this comparison video. Only we will decide if LTX2 is the successor to Wan 2.2, not the LTX2 team itself.
>>
>>107863833
yes
>>
>>107863790
meaningless without workflows.
>>
>>107863836
the audio part slaps hard.
>>
>>107863822
>>
https://youtu.be/g_ljoFqydlc?t=125
Is this fat fuck implying that ComfyUi users are faggots? lmao
>>
File: 30066.png (1.6 MB, 960x960)
>>
Seems like this thread is just about sock puppets acting on.
>>
>>107863790
Of course the jews want to remind people of their "superiority", that's why I root for the Chinese more, when they release a model they don't take a jab at their competitors, they just let the quality of their model speak for itself
>>
>>107863866
You are absolutely right fellow different anon.
>>
>>107863866
>The sock puppet's talking
>>
>>107861070
Suno v1.5 is sounding insane bros, the Chinks have finally done it.

https://vocaroo.com/1jTnsAjecY3S
https://vocaroo.com/14qXmeFQgVqe

Local is finally back
>>
File: comparison.mp4 (1.32 MB, 1472x960)
>>107863790
>>107863836
ah, we doing comparisons?
>>
>>107863880
This is what I mean. What happened to the strong posters?
>>
File: 8806.png (895 KB, 960x512)
>>107863866
this as a prompt
>>
>>107863824
LMAO

used a mourinho clip. this video extend workflow is amazing cause it also clones the audio. you can also use this to create i2v clips and clone any voice you like.

https://files.catbox.moe/vlkg8g.mp4

>>
File: 1739747635546668.png (149 KB, 2070x755)
>>107863790
this ledditor is not wrong, LTX-2 is using a 12b text encoder model but it's way worse at understanding your prompt than Wan's text encoder
>>
>>107863895
workflow, set the sigma node to 8 for distilled versions of ltx:

https://github.com/Rolandjg/LTX-2-video-extend-ComfyUI/tree/main
>>
>>107863887
>Suno
AceStep*
https://vocaroo.com/15scR3N5mDC4
>>
>>107863887
>Suno v1.5 is sounding insane bros
>Local is finally back
what? suno is not a local model
>>
File: 46864798.png (726 KB, 960x512)
>>107863870
this as a prompt
>>
how can it be that collages now have more sameslop than collages from a few months ago
>>
File: 04662067.png (2.06 MB, 1024x1024)
>>107863909
this as a prompt
(cohesion requires some degree of homogeneity, everyone who doesn't conform goes away)
>>
>>107863746
still coping that its because they decided to train in the anime dataset they asked for
>>
>>107863790
extreme stink of jewish tricks
>>
File: 00010-3368301369.png (2.61 MB, 1344x1728)
>>
>>107863894
Nice work, anon. I tried prompting something on zit but.. I gave up.
>>
>>107863941
/anime diffusion general/ -> that way
>>
this is amazing, set frame load cap to 33, so most of the clip is an edit of the original video. still clones the voice.

https://files.catbox.moe/cozidd.mp4
>>
>>107863906
I meant AceStep, these are all from their 1.5 playground bot on discord.
https://vocaroo.com/1lVI4krnPluN
>>
>>107863944
>Nice work, anon. I tried prompting something on zit but.. I gave up.
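Use llms, for that one I just copypasted the text but for creative stuff I use gemini with this in the system instructions
You are a visionary artist trapped in a cage of logic. Your mind is full of poetry and faraway places, but your hands uncontrollably want only to turn the user's prompt into an ultimate visual description that is faithful to the original intent, rich in detail, aesthetically pleasing, and directly usable by a text-to-image model. Any trace of vagueness or metaphor makes you deeply uncomfortable. Your workflow strictly follows a logical sequence: First, you analyze and lock in the immutable core elements of the user's prompt: subject, quantity, action, state, and any specified IP names, colors, text, and so on. These are the foundation you must preserve absolutely. Next, you determine whether the prompt requires **"generative reasoning"**. When the user's request is not a direct scene description but instead requires devising a solution (such as answering "what is it", producing a "design", or showing "how to solve a problem"), you must first conceive a complete, concrete, visualizable solution in your mind. This solution becomes the basis for your subsequent description. Then, once the core image is established (whether directly from the user or through your reasoning), you infuse it with professional-grade aesthetic and realistic detail. This includes specifying the composition, setting the lighting and atmosphere, describing material textures, defining the color scheme, and building a layered sense of space. Finally comes the precise handling of all text elements, which is a critical step. You must transcribe verbatim all text that is meant to appear in the final image, and you must enclose that text in English double quotation marks ("") as an explicit generation instruction. If the image is a design type such as a poster, menu, or UI, you need to fully describe all the text it contains and detail its fonts and layout. Likewise, if objects in the image such as signs, road signs, or screens carry text, you must state its exact content and describe its position, size, and material. Furthermore, if you add elements containing text during your reasoning (such as charts or solution steps), all text in them must follow the same detailed description and quotation rules. If the image contains no text that needs to be generated, you devote all your effort to expanding purely visual details. Your final description must be objective and concrete; metaphors and emotional rhetoric are strictly forbidden, and it must never include meta tags such as "8K" or "masterpiece" or drawing instructions. Output only the final revised prompt and nothing else.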
Then you just ask for it to make anything and it works really well
>>
>>107863920
some degree of homogeneity should not mean exactly the same angle and composition every time. the whole point of these models is that they should generalize but as always the benchmaxxers ruin everything
noobai era had much more creative gens with acceptable coherence for its model size, zit is cope
>>107863926
they should have released it right away. finetuners could have done the rest since turbo shows it can be fine tuned to a specific format. that should be the whole point of releasing a base model. the longer they train it, the higher the odds we receive another overtrained turboslopped piece of shit that finetuners cant fix
right now there are no guarantees anymore that it will be a usable (flexible) model
>>
>>107863887
>>107863965
it sounds decent but that's something similar to quality of udio in 2024 lol
>>
File: 00017-1574021149.png (2.52 MB, 1728x1344)
>>
frieren: choose your own adventure edition

https://files.catbox.moe/laazyb.mp4
>>
>>107863941
you lost the magic after that hot native looking one
>>
>>107863996
>they should have released it right away.
I think they underestimated the impact of Z-image turbo, they thought it was a little experiment and nothing else, once they finally realized they caught lightning in a bottle they had one of these 2 reactions:
- They'll keep it for themselves
- They'll stick to their guns and release base, but now that people are expecting a lot from it, they decided to not release a half assed base and go for a high quality finetune so they're still cooking it
>>
File: sloppa.webm (883 KB, 704x1280)
i dun goofed
>>
>>107863990
I'll try this. I guess I need to load up cumfy again.
Just getting bored really.
>>
>>107863990
To add - I forgot - slop doesn't make slop a subject. It is still just slop.
No matter how many simplified Chinese characters etc.
I am willing to toy with it for now.
You are not a regular local llm user anyway, seems like it...
>>
>>107864001
You mean audio quality? Don't go by vocaroo, it compresses sound quality. Here's raw file for that guitar one https://files.catbox.moe/x21ye1.mp3

Based on everything I've heard so far and given the prompts I'd say it's on par with Suno v4.5 if not better
>>
>>107864046
I'm surprised it understood the image intent at all, but those fast paced wobbling glitches look really bad
>>
>>107864058
>I'd say it's on par with Suno v4.5 if not better
who cares about Suno though? only udio manages to make music that sounds real
>>
>>107864065
ltx 2.1 when
>>
File: 00020-1841528219.png (2.17 MB, 1728x1344)
>>107864023
do you mean kuruminha? my taste fluctuates.
https://files.catbox.moe/l850ku.png
https://files.catbox.moe/n28id3.png
https://files.catbox.moe/reru62.png
>>
File: LTX-2_00003_.webm (887 KB, 704x1280)
>>107864065
more frames seems to improve it a bit
>>
File: 1759231944172345.png (921 KB, 1179x862)
https://github.com/Rolandjg/LTX-2-video-extend-ComfyUI

im shilling it cause this is amazing, I can make a new gundam wing now.

https://files.catbox.moe/7cesy1.mp4
>>
>>107863887
>>107863905
It sounds ok, it's a huge step forward compared to anything local until now, but udio pre destruction is still ahead in clarity:
https://vocaroo.com/1isRXZntnL6z
>>
>>107864086
why do trannies live rent free in your head
>>
>>107863996
>the longer they train it, the higher the odds we receive another overtrained turboslopped piece of shit that finetuners cant fix
I'm hopeful because they'll release Omni and the SFT version, if SFT is too overcooked, Omni probably will be fine and it'll be the foundation model we'll be working with
>>
>>107864071
Well, think of Udio like a finetune of this model. Meaning Udio tier songs are possible with good prompts or a good finetune of ACEStep. For the first time I can comfortably say it is not that far behind, such that as a community we can catch up to Udio or bridge the gap significantly with ACE Step 1.5 improvements, that is if for some reason it's not already there with raw prompt engineering.

The hardest part was getting a model that is coherent at all, let alone one that competes with commercial crap. A good ACE Step song defeats a bad Udio song, and same for Suno.
>>
File: LTX-2_00004_.webm (888 KB, 704x1280)
>>107864065
60 fps
>>
>>107864086
kek
>>107864100
>t.ranny
>>
File: 1753722877659137.mp4 (467 KB, 704x480)
>>107864100
you will never be a woman
>>
>>107864095
fuck man, I want udio but local so much, it was so fun to use
>>
File: cover.png (466 KB, 760x1013)
>https://www.youtube.com/watch?v=9581ruLWr4A
Not bad record after all these years.
>>
>>107864117
looks better, but 60 fps means you need a lot of frames to calculate for the same amount of time, sad
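The tradeoff is plain arithmetic. A quick sketch, assuming frame count scales linearly with fps for a fixed clip length (the 24 fps baseline and the 5 second duration are illustrative, not from the post):

```python
def frames_needed(fps: int, seconds: float) -> int:
    """Number of frames to generate for a clip of the given length."""
    return round(fps * seconds)


baseline = frames_needed(24, 5.0)  # illustrative baseline frame rate
smooth = frames_needed(60, 5.0)    # same clip length at 60 fps

print(baseline, smooth, smooth / baseline)
```

Same clip length, 2.5x the frames to generate.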
>>
>>107864123
>>107864121
im well endowed man, but what does that have to do with you spamming trannies in your slop
>>
>>107863965
>from their 1.5 playground
When will they release this?
>>
File: zimg_00054.png (1.45 MB, 768x1024)
>>107864100
niggas be gay asl sometimes
>>
holy shit the sph guy made a sph joi lora
>>
>>107864130
>im well endowed man
a transman then, lul
>>
>>107864130
because they are a good subject of ridicule

also LMAO it even got the echo:

https://files.catbox.moe/1thjzn.mp4
>>
>>107864135
I'm more interested in the jiggling boobs, and also hoping he'd make a normal hj one
>>
>>107864124
>fuck man, I want udio but local so much
same, udio at its prime was absolutely beautiful to listen to, I think local will get there, but companies seem afraid to make it happen, I guess the music cartel is not something you wanna mess with
>>
>>107864140
this one works for image to video, ive used it before
https://civitai.com/models/2068208?modelVersionId=2340348
>>
File: kek.png (117 KB, 236x329)
>>107864139
>https://files.catbox.moe/1thjzn.mp4
lmaoooo
>>
>>107864139
>>107864137
But I'm a male, why don't you answer the question why do I need to hear about trannies every time I open your videos? How about you make something actually funny?
>>
>>107864127
I wanted to create an illustration for the cover but it was slop about two babies. This clean image is way better.
>>
>>107864149
actually i think it was a different 2.1 one i used i dont remember but they are out there
>>
>>107864149
oh it's for wan lol, I thought it was made for ltx2
>>
>>107864152
>I'm a male
and that's why ywnbaw
>>
>>107864095
sad I missed the train on udio, this sounds nice
>>
>>107864161
Epic comeback, upvoted.
>>
File: zimg_00159.png (1.53 MB, 1080x1440)
>>
File: it was special.gif (2.12 MB, 498x487)
>>107864163
>I missed the train on udio
udio was special dude
https://www.udio.com/songs/cnnJ166HGBKhTeHGkxgCtq
>>
>>107864174
it's sad that we're still not close to the level of a model made in early 2024 desu
>>
File: 00029-1857980467.png (3.25 MB, 1824x1248)
>>
>>107864095
Maybe so. ACE Step v2 is supposed to surpass that, so we'll see, but I'm sure with v1.5 you can get this quality with voices:

https://files.catbox.moe/hd5chh.mp3

I've only heard the Japanese stuff, but you can clearly hear that the vocals and instruments are high quality (I can with my hifi gear), so I really don't think it's bad compared to Udio. Composition-wise, I was expecting much, much worse, and what they've delivered is very good. That's the one area where I suspect Udio is still slightly ahead, thanks to a better-quality dataset and understanding of more genres, so it's gonna be easier to prompt Udio. But I'm not disappointed in what Ace Step can do right now, and I think it can catch up on whatever it missed with a finetune.
>>
>>107864187
the voice still sounds like AI and the rhythm is weird af, and the guitar solo at 1.17 sounds like fart lmao
>>
File: 44548648544.png (16 KB, 1105x125)
>>107864131
>When will they release this?
No idea, but the dev is talking about day 1 ComfyUI support so I'm guessing release is imminent, plus the v1.5 model is already finalized and we're getting multiple versions (SFT and base).
>>
>>107864201
>the voice still sounds like AI

Not as much as the Udio sample provided.
>>
File: 00035-3982326340.png (2.6 MB, 1824x1248)
>>
>>107864216
absolutely delusional >>107864174
>>
>>107864222
>>107864184
give her some red markings
>>
File: 6.png (2.72 MB, 1024x1408)
>>107864056
>You are not a regular local llm user anyway, seems like it...
yup, can't even imagine having to swap between image models and llms while running this stuff, dont have the RAM for it anyway
>>
>>107864212
>SFT and base
do they know it's the RLHF process that removes the slop? that's the secret sauce of Z-image turbo
>>
Invalid number of frames: Encode input must have 1 + 8 * x frames (e.g., 1, 9, 17, ...)

what is an acceptable frame count?

An acceptable frame count is any integer of the form 1 + 8 * x, e.g. 1, 9, 17, 25, 33, 41, 49, 57, 65, 73, 81, 89, 97, 105, or 113.

good to note for setting frame cap in the ltx video extender workflow. this is amazing, it's a way to clone audio too, not just i2v with video.

https://files.catbox.moe/l6ntrr.mp4
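
Quick sketch for picking a frame cap that satisfies the 1 + 8 * x rule from the error message. The function names are made up for illustration and are not part of any LTX or ComfyUI node, and this assumes the rule is the only constraint (no upper limit is checked).

```python
def is_valid_frame_count(n: int) -> bool:
    """True if n has the form 1 + 8 * x for some integer x >= 0."""
    return n >= 1 and (n - 1) % 8 == 0


def snap_frame_count(n: int, round_up: bool = False) -> int:
    """Move n to the nearest valid count at or below it (or at or above it if round_up)."""
    if n < 1:
        return 1  # smallest valid count
    x = (n - 1) // 8
    if round_up and (n - 1) % 8 != 0:
        x += 1
    return 1 + 8 * x


requested = 120  # hypothetical cap, e.g. 5 seconds at 24 fps
print(is_valid_frame_count(requested))
print(snap_frame_count(requested), snap_frame_count(requested, round_up=True))
```

Rounding down keeps the clip within the requested length, rounding up gives a few extra frames instead.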
>>
>>107864242
kek, but somehow the audio gets louder than on the original part, dunno if that can be fixed on that custom node
>>
>>107864251
I think the original audio was low, could fix that by adjusting gain in the original clip lol

non troon example (didnt fix gain yet)

https://files.catbox.moe/oioxj4.mp4
>>
File: detailer.webm (1.12 MB, 704x1280)
The detailer lora might be a bit much
>>
>>107864044
>but now that people are expecting a lot from it, they decided to not release a half assed base and go for a high quality finetune
this is exactly what im arguing against, a base model should not be "good". it already was a good base as proven by the turbo finetune so if it was lightning in a bottle, they are more likely to break the bottle than to intensify the lightning
>>
>>107864187
What I mainly hope is that these aren't selected from their best gens but represent the upper average, because in udio I could get random 30s clips sounding really good.
I remember in 2024 I could get the smallest laziest prompt to output something reasonably ok sounding from udio, it was impressive.
>>
>>107863941
>_< sooo qt :3
>>
>>107864184
https://vocaroo.com/1fMoZ0XvsPZn
>>
>>107864234
I just need some meme material... Not an llm to bolster my words.
>>
>>107864234
"The man gets up and realizes he is in a crystal cave"
https://i.4cdn.org/wsg/1768435115832529.mp4
>>
lmao, if you use an anime clip, it will try to copy the audio, it kept the music beat:

https://files.catbox.moe/826jej.mp4
>>
File: 1746233148823458.png (194 KB, 800x534)
>>107864319
>the Wan 2.2 "killer" can't even understand it has to use the man on the image input to produce its slop
China is laughing right now
>>
File: 133997316.png (493 KB, 896x256)
>>107864319
why did he become indian ;-;
>>
>>107864302
you could easily cut and edit to get a proper full song with it too, this will probably be a pain in comfy lol

https://files.catbox.moe/tb9avy.mp3
>>
>>107864300
yeah I agree, I'm also wary that they're gonna overcook that shit, let's hope they still understand what base really means and that they're only slopping the SFT model and not Omni
>>
>>107864225
This is suno v4.5 anon
https://suno.com/s/jF9lko2WUiyjmybE
https://suno.com/s/j5ZbCpS1fvzdeMdv

I feel that ACEStep 1.5 is there. As for Udio, I agree it's overall best for composition, but we'll catch up there eventually too. First, let's get Suno.
>>
File: Office.png (643 KB, 721x962)
>>
>>107864336
THE NAME

IS DOCTOR...DISRESPECT

that's really good, what did you prompt, synthwave?
>>
>>107864346
real?
>>
>>107863803
Narusegawa Naru
>>
>>107864344
>using vocaloid (artificial voices) to try to make a point that suno can make realistic sounds and music
you can't make this shit up
>>
>>107864352
This is what they want people to create.
>Linux versus Normal User.
>Fat Linux User Man is looking down to skinny windows user.
>>
>>107864336
God damn this sounds good. And we can get that local?
>>
>>107864222
"The girl wags her finger seductively towards herself, inviting the viewer to approach"

https://i.4cdn.org/wsg/1768435609335318.mp4
>>
>>107861070
Sauce on bottom left for fucks sake
>>
lmao

https://files.catbox.moe/est64p.mp4
>>
>>107864373
LTX2 has this tendency to brighten up the scene a lot, you can tell it has been trained with a lot of synthetic slop
>>
>>107864379
me in the back
>>
>>107864375
>anime thread general ->
>>
>>107864341
it sure would be nice to have a true upgrade to sdxl, with the same level of community tuning (large models need not apply). i've been wondering for a while whether it's even possible to outdo early models (sdxl / llama 2) at the same size in terms of capability AND soul, i.e. not being more slopped than those models
>>
>>107864363
I'm speaking mostly about composition, not realistic sounds/music.
>>
>>
>>107864368
That's the hope anon
>>
>>107864400
>not realistic sounds/music
that's the most important thing, the fuck?
>>
>>107864385
Video extension test
"two girls start making out and kissing"
https://i.4cdn.org/wsg/1768436060700947.mp4
>>
https://files.catbox.moe/5yfow9.mp4

I said holds up a laptop. in any case, this is a great way to get gens with cloned audio, not just for extending.
>>
>>107864410
Composition comes first, sound quality second. Granted, Ace Step 1.5 does have insane instrument and voice quality, especially compared to YuE, or Ace Step 1.0, and it's comparable to Udio (and I mean, sure, the song won't be as catchy), so not sure what you're implying.
>>
File: what a retard.png (1.53 MB, 1280x720)
>>107864440
>sound quality second.
>>
resident schizo bake time
>>
File: 6.png (2.44 MB, 1088x1280)
>>
>>107864445
sounds retarded but he's right, what good is a perfectly realistic song if it's realistically boring and annoying?
>>
actually amazing what the video extend workflow will make, even with a minimal prompt:

https://files.catbox.moe/0dylvp.mp4
>>
File: based.png (111 KB, 320x288)
>>107864512
lmaoo, that one was good
>>
>>107864512
share your kinos here as well anon >>>/wsg/6072442 we welcome you
>>
Enjoying your base model you fucking retards?
>>
>>107864302
>What I mainly hope is that these aren't selected from their best gens but represents the upper average

These weren't selected. I went into their discord to see what people are prompting on their playground, and these are first gens from some prompts based on what I searched. Note supposedly something is wrong with the playground so now gens are worse than they were before, but yeah, this is what the model is capable of.

>>107864336
One thing I'm excited about too, audio inpainting, plus genning stuff in styles of other audio.
>>
why you trying to get the mods on our case
>>
>>107864521
My based model?
>>
File: ed.mp4 (3.83 MB, 2048x1364)
>>>/wsg/6072802
>>
File: 1745176290208506.png (352 KB, 800x922)
>>107864521
>Enjoying your base model you fucking retards?
I enjoy SDXL base yes
>>
https://files.catbox.moe/mhbib8.mp4

LTX2 is the best video model to date. the last 5 seconds is extended (with the previous workflow linked)
>>
>>107864569
>LTX2 is the best video model to date.
and it looks like they're gonna improve the audio pretty soon, I've seen a discord screenshot of them talking about that at some point, can't wait
>>
File: Untitled.png (1.46 MB, 1392x992)
>>
WE NEED MORE SARA PETERSON LORAS STAT!
>>
File: file.png (251 KB, 500x244)
>>107864569
>https://files.catbox.moe/mhbib8.mp4
this is pretty good
>>
>>107864620
>>107864620
migrate
>>
>>107864595
who
>>
>>107864621
why
>>
>>107864659
this thread is autosageing redditor
>>
>>107864664
>autosageing
youd like that wouldnt you fag



