/g/ - Technology


Thread archived.
You cannot reply anymore.




File: 1765815295790257.jpg (2.7 MB, 3126x2407)
Discussion of Free and Open Source Diffusion Models

Prev: >>107805470

https://rentry.org/ldg-lazy-getting-started-guide

>UI
ComfyUI: https://github.com/comfyanonymous/ComfyUI
SwarmUI: https://github.com/mcmonkeyprojects/SwarmUI
re/Forge/Classic/Neo: https://rentry.org/ldg-lazy-getting-started-guide#reforgeclassicneo
SD.Next: https://github.com/vladmandic/sdnext
Wan2GP: https://github.com/deepbeepmeep/Wan2GP

>Checkpoints, LoRAs, Upscalers, & Workflows
https://civitai.com
https://civitaiarchive.com/
https://openmodeldb.info
https://openart.ai/workflows

>Tuning
https://github.com/spacepxl/demystifying-sd-finetuning
https://github.com/ostris/ai-toolkit
https://github.com/Nerogar/OneTrainer
https://github.com/kohya-ss/musubi-tuner
https://github.com/kohya-ss/sd-scripts
https://github.com/tdrussell/diffusion-pipe

>Z Image Turbo
https://huggingface.co/Tongyi-MAI/Z-Image-Turbo

>WanX
https://github.com/Wan-Video/Wan2.2
https://comfyanonymous.github.io/ComfyUI_examples/wan22/

>LTX-2
https://huggingface.co/Lightricks/LTX-2

>Chroma
https://huggingface.co/lodestones/Chroma1-Base
https://rentry.org/mvu52t46

>Illustrious
https://rentry.org/comfyui_guide_1girl
https://tagexplorer.github.io/

>Misc
Local Model Meta: https://rentry.org/localmodelsmeta
Share Metadata: https://catbox.moe|https://litterbox.catbox.moe/
GPU Benchmarks: https://chimolog.co/bto-gpu-stable-diffusion-specs/
Img2Prompt: https://huggingface.co/spaces/fancyfeast/joy-caption-beta-one
Txt2Img Plugin: https://github.com/Acly/krita-ai-diffusion
Archive: https://rentry.org/sdg-link
Bakery: https://rentry.org/ldgcollage

>Neighbors
>>>/aco/csdg
>>>/b/degen
>>>/r/realistic+parody
>>>/gif/vdg
>>>/d/ddg
>>>/e/edg
>>>/h/hdg
>>>/trash/slop
>>>/vt/vtai
>>>/u/udg

>Local Text
>>>/g/lmg
>>
This sounds pretty good: https://github.com/Saganaki22/ComfyUI-AudioSR

For processing low-quality LTX2 audio output into better quality.
>>
>>107801257
Asking on this one as well for more nanobanana prompts for making datasets
>>
>schizobake
>>
>>107809444
fuck off with your jewtx64gbram
>>
Whether you like it or not AniStudio should be in OP
>>
>>107809444
good idea

>>107809490
what? this runs on most gpu people have here, doesn't it?
>>
>>107809497
feel free to make your own thread. you can put whatever you want in the OP
>>
Anyone using z-image finetunes? Or is everyone running the vanilla original? Somehow I can't decide whether any finetune is actually better than the original.
>>
>>107809513
they are not finetunes, they are just loras merged with the z-image model
>>
>>107809490
meds
>>
>>107809513
only for lora training, not for genning
>>
>>107809539
Are they good for anything?
>>
lmao the image + sound workflow kijai posted is amazing, just provide audio and prompt "talking" for example: deus ex sound clip source

https://files.catbox.moe/tt1obv.mp4
>>
>>107809556

https://files.catbox.moe/oio2rr.mp4
>>
>>107809364
wtf is that
>>
File: comfyui-portable.png (351 KB, 2008x2080)
Hey, can anyone help here?

I am using ComfyUI portable for Wan as per the rentry guide, and it had been working fine up to this point. However, I recently updated it along with all the nodes, and now a bunch of nodes are broken, including the important VHS_Combine node, which is required for generating videos.

Does anyone know exactly what has gone wrong here? Something related to numpy? If so, is there a simple way to downgrade without breaking something critical?
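Before downgrading numpy blindly, you can list which installed packages actually pin it and spot the conflict. A stdlib-only sketch (run it with the portable build's embedded python; the output depends entirely on your install):

```python
import re
from importlib.metadata import distributions

def numpy_pins():
    """Return (package, requirement) pairs for every dist that depends on numpy."""
    pins = []
    for dist in distributions():
        for req in dist.requires or []:
            # \b avoids false hits on e.g. "numpydoc"
            if re.match(r"numpy\b", req, re.IGNORECASE):
                pins.append((dist.metadata["Name"], req))
    return pins

# Conflicting specifiers (one node wanting >=2, another <2) show up side by side here.
for name, req in numpy_pins():
    print(f"{name} wants {req}")
```

Whichever side of the v1/v2 split has fewer dependents is usually the one to reinstall.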
>>
>>107809588
>he pulled
>>
>update comfy
>nothing works anymore
why is cumfartui so shit?
>>
>>107809588
I've been using comfy for years now, and while it can go to shit, it's never this bad, so I wonder if it's a case of the windows install being particularly bad since I'm on ubuntu, or if you guys just install every custom node no matter how unmaintained it is.
You should read the error too: it's a numpy version mismatch, something wants v2.x but something else wants v1.x and it's breaking stuff.
>>
>>107809614
I have literally never had this problem. It's always a simple git pull and requirements install. You must be using the retard portable version or something.
>>
>>107809614
he hates custom nodes and wants you to use the native ones, or wait until they exist
>>
>>107809556
I already lost track of all these workflows. Is it the workflow with Mel-Band RoFormer that can be used for singing? I think https://github.com/RageCat73/RCWorkflows/blob/main/LTX2-Audio-Input-FP8-Distilled.json is even a little improved.

>>107809588
That's a node I like a lot too, though technically you could do without it on a recent comfyui (with fewer options for how to save the video).

IDK what went wrong but I'd generally just uv pip install -r requirements.txt with the venv activated and then do uv pip install <stuff> manually as-needed.
>>
lmao, cia guy does music, and it works: kijai audio workflow

https://files.catbox.moe/t64fx4.mp4
>>
https://files.catbox.moe/apg37g.mp4
>>
im dying cause I didnt prompt to add the black guy. I just prompted "the man sings".

https://files.catbox.moe/ach182.mp4
>>
miku audio:

https://files.catbox.moe/e4ei3p.mp4
>>
I have test anon fatigue.
>>
>>107809701
better, last one but the workflow does in fact work well.

https://files.catbox.moe/jsfa9q.mp4
>>
>>107809710
it's set in deep during kontext "text"
>>
Reads don't mess up an ssd like writes do, right? Doing --cache-none on comfy to reduce ram usage on Wan, but it's doing like 30gb of reads every gen.
>>
>>107809743
only write
>>
>>107809614
he pulled?

>>107809638
of course he does because its not making him any money. i updated recently and found 3 nodes didnt work
>>
Some fun with a LTX default workflow:
https://litter.catbox.moe/9iug9bw879i6o9hb.mp4

>>107809670
I think it actually summons people based on voice(s) at times

>>107809712
looks very good
>>
lmao, Hitler reborn as Floyd (speech audio, floyd image 2 video)

it is SO expressive. I love ltx2.

https://files.catbox.moe/p1gjoi.mp4
>>
the highlight of my day is seeing if my gen made it into the op
>>
>>107809786
slightly longer speech (240 frames)

https://files.catbox.moe/1l3b74.mp4
>>
File: ZImageTurbo + 2x upscaler.png (3.45 MB, 1248x1824)
https://files.catbox.moe/xf5zuk.png
>>
File: 1737409640854075.png (2.28 MB, 1632x928)
>>
File: ZImageTurbo-_0005.png (1.06 MB, 832x1216)
Upscaler screwed up the lettering of this one
https://files.catbox.moe/7lu3so.png
>>
okay last one. this is the best speech by floyd.

https://files.catbox.moe/yfo4xs.mp4
>>
it's apparently a warning
https://litter.catbox.moe/vw0occ677epqlq78.mp4
>>
File: 1756458140114154.png (2.34 MB, 1632x928)
>>
File: 1750313320090174.png (1.86 MB, 1632x928)
>>
File: 1754014357767103.jpg (299 KB, 1352x1058)
>>107809385
Okay my 5070ti just arrived
Image gen massively increased in performance. From 5 to 10 minutes generating 2k image with my old GTX 1080 into 5 to 10 SECONDS with my 5070ti

I think im gonna try Video generation next.
Im gonna get addicted to this. This is my fentanyl
>>
File: images[1].jpg (6 KB, 275x183)
>>107809588
https://files.catbox.moe/p1s7wj.mp4
>>
File: file.png (77 KB, 1024x1024)
horse man
>>
Anon, tell me chroma model to use for realism and anatomy. I love sdxl but hands are always weird
>>
>>107809870
does he speak normal words?
>>
>>107809893
neigh
>>
>>107809385
Say what you want about Flux.2, it has already been proven it has more sovl and LoRAs are way better than ZiT versions

https://civitai.com/models/2212121?modelVersionId=2511510

This is 32B vs. 6B anyways, it's an indisputable fact.
>>
does wan 2.2 i2v degrade the quality when the resolution is too high (e.g. above 1280x720)? just curious how high the resolution can be without fucking up the quality
>>
>>107809900
flux 2 is pretty cool, it may be 32B but it gens faster for me than qwen for example, the offloading works really well
probably not great for NSFW but it's a cool model
>>
>the girl is sitting on the on top of the table next to the keyboard
>the girl ends up sitting in the chair AT the keyboard and half a person sticks in the table
sometimes z-img is as dumb as SDXL. I don't get it.
>>
>>107809761
I think it can be corralled with a good prompt
https://files.catbox.moe/wyzd80.webm
>>
>>107809922
>sitting on the on
>blaming the model
>>
>>107809943
made that typo here.
>>
File: LTX-2.mp4 (492 KB, 704x704)
>>107809854
tried to prompt a video with your image
> she dies

>>107809875
you might want to start with z-image-turbo
chroma suggestions are in the previous thread and the one before

qwen, flux2 and wan (1 frame for an image) also can do hands quite well

>>107809859
congrats, you certainly could do video now
>>
ok when I put another person IN the chair AND say it's viewed from the side, the girl ends up ON the table...
>>
>>107809977
why does she turn into a horse at the end
>>
>>107810013
i really do not know. it was just "she dies" in the prompt. that's probably how it works? haven't died yet.
>>
>>107809977
holy lmao
>>
>>107809977
>congrats, you certainly could do video now
Where do i start ?
>>
File: 9072534.png (1.77 MB, 1024x1536)
>>
File: 1744162931754915.mp4 (1.73 MB, 1264x720)
>>107809813
>>
File: 1758836420362253.png (2.32 MB, 1408x1152)
>>107810065
hot
wan or ltx2?
>>
Hi guys, I'm running some Wan 2.2 high / low noise model video gens. Image to video. The workflow seems to just ignore the inserted image and create a clip from the prompt only, any help?
>>
>>107810038
for ltx probably with these workflows https://github.com/Lightricks/ComfyUI-LTXVideo/tree/master/example_workflows
or https://github.com/RageCat73/RCWorkflows/blob/main/LTX2-Audio-Input-FP8-Distilled.json for audio-matched singing/talking perhaps, there are also workflows in the templates

wan has a lot of workflows but perhaps start with the basic templates / https://comfyanonymous.github.io/ComfyUI_examples/wan22/ ?


>>107810037
another one, the sound is just silly too
https://litter.catbox.moe/6pqc49695ucazl09.mp4
>>
>>107810073
maybe you are using the wrong checkpoints or lora, hard to tell without seeing the workflow tho
>>
>960x960x240 = 221 million pixels total, fine
>720x720x408 = 211 million pixels total, oom
what gives
>>
>>107810069
WAN still. I haven't had time to setup LTX2 yet.
>>
>>107810092
you're naive, calculations aren't just about pixels, especially when we're relating previous frames to later frames.
>>
the future of animation is LTX2:

https://files.catbox.moe/kj8io2.mp4
>>
>>107810092
are you counting the temporal resolution as pixels? i don't think it's as simple as that, there are probably attention mechanisms that scale worse than linearly that need to attend to the whole video at once
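you can sketch the latent-token math; the compression factors here (8x spatial VAE, 2x2 patchify, 4x temporal) are assumed Wan-like values, not exact, and under them the two runs actually land close in token count, so per-frame activation buffers probably matter as much as raw sequence length:

```python
# Rough latent-token math: memory isn't driven by raw pixels but by token count,
# and full self-attention scales with its square. The 8x spatial VAE, 2x2
# patchify and 4x temporal compression below are assumptions, not exact values.

def latent_tokens(w, h, frames, spatial=8 * 2, temporal=4):
    # spatial: VAE 8x downsample then 2x2 patchify; temporal: 4x frame compression
    return (w // spatial) * (h // spatial) * (frames // temporal + 1)

for w, h, f in [(960, 960, 240), (720, 720, 408)]:
    n = latent_tokens(w, h, f)
    print(f"{w}x{h}x{f}: {w * h * f / 1e6:.0f}M pixels, {n} tokens, "
          f"{n * n / 1e9:.1f}B attention pairs")
```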
>>
>>107810106
holy shit, this iteration it invented new characters.

https://files.catbox.moe/6ggc31.mp4
>>
File: 078245654.png (1.83 MB, 1024x1536)
>>
wan sisters please help. how the fuck do i stop "Diffusion Model Loader KJ" from offloading every other generation? takes about 5 - 8 minutes to load everytime. i can queue up 10 gens and every few gens it will offload the models and i have nothing extra running
>>
File: 1759975299844544.png (2.29 MB, 1152x1312)
>>107810123
nice pussy
>>
does ltx2 work for goon stuff
>>
>>107810127
>he doesnt have enough ram
lmao
>>
>>107810130
lol
lmao
>>
>>107810130
it has made me coom, but I'm a freak who loves roleplay more than explicit stuff
>>
close enough
>>
File: ZImageTurbo-_0031.png (2.95 MB, 1824x1248)
https://files.catbox.moe/5yt6cq.png
>>
File: 1741971397716464.png (2.12 MB, 1632x928)
>>107810166
>supreence
just reroll it bwo
>>
File: Capture.png (1.1 MB, 3463x1707)
>>107810085
I'll post the result when it's done, but I'm not hopeful, I'm not sure why it just ignores the input image.
>>
File: ltx.mp4 (982 KB, 1280x704)
>>107810130
sometimes you can continue from lewd i2v for a bit which might be good enough for some, it can talk dirty, some people's fetishes might be supported, and if you squint hard enough it supports (usually deformed) naked boobs t2v with sufficient attempts

but really: not very well
>>
>>107810196
looks like the high noise model is labelled t2v, not i2v
>>
>>107810132
its doing it every gen now, what are some -- settings to stop this from happening?
>>
>>107810205
probably, I think I just fixed that...
Restarted.
>>
>>107810123
zimage?
>>
>>107810130
wait for nsfw loras
>>
dubs man with some insight:

https://files.catbox.moe/0l09xv.mp4
>>
>>107810232
ya
>>
>>
File: ZImageTurbo-_0037.png (2.55 MB, 2304x960)
https://files.catbox.moe/nd73sa.png
>>
File: ltx_hallucination.mp4 (2.13 MB, 1280x704)
>>107810224
just to be clear, it's not just the filename, you need the i2v model for both high and low

also "r" eloading node definitions should make new models show up without restarting
>>
>>
Is there any point in using a specific qwen 3 4b variant over the normal one for z-image? Maybe allowing less censored output?
If so which one?
>>
>>
>>107810266
I downloaded (I think) all of the models so yeah, it should just be a drop down select for me. Thanks for clarifying though, most people just ASSUME you know! thank you for looking and pointing it out!
>>
>>107810252
thanks
>>
>>
>>107810268
no. you can use various quants PROBABLY with no ill effect most of the time, but it doesn't unlock much

things may change if the nsfw trainings happen against abliterated versions of qwen or w/e, maybe then it starts to matter?
>>
>>107810292
The nose need inpainting, otherwise I like it
>>
>>107810310
OK, I was hoping it would help, but it makes sense.
>>
>>107810315
it's a boogie
>>
>>107810268
isn't qwen just for tokenizing the prompt/image inputs? abliteration wouldn't do anything useful unless qwen was actually used to do any inference. but even then, qwen is actually pretty uncensored by default.
>>
>>107810335
It's the text encoder, yes.
>>
>>107810335
some workflows already use the fact that it's a VL model to also do image-to-prompt at some point

but yes abliteration doesn't even add knowledge on either the image model or the text model, it just stops the retarded refusals
>>
File: file.png (38 KB, 633x386)
wher basuuu
>>
>>107810370
soonâ„¢
>>
Does --highvram work to keep models from offloading?
>>
>>107810198
Wtf bro that looks awful.
>>
Where do i even start with this
>>
>>107810198

https://files.catbox.moe/g4jm4z.mp4
>>
>>107810398
vram_group.add_argument("--highvram", action="store_true", help="By default models will be unloaded to CPU memory after being used. This option keeps them in GPU memory.")
>>
File: 1747606581557314.png (2.15 MB, 2159x1536)
>>
>>107810370
100 more weeks, don't worry :3
>>
>>107810407
>This option keeps them in GPU memory
Sounds good, I'll give that a try. Never had to bother with high low vram settings until this week. Current settings are --windows-standalone-build --use-sage-attention --fast fp16_accumulation --disable-api-nodes
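for the portable build that would look something like this (a sketch based on the stock run script plus the flags quoted above; the path and layout are assumptions, adjust to your install):

```shell
# run_nvidia_gpu.bat equivalent with --highvram added alongside the existing flags
./python_embeded/python.exe -s ComfyUI/main.py --windows-standalone-build --highvram --use-sage-attention --fast fp16_accumulation --disable-api-nodes
```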
>>
>>107808950
Thanks
>>
>>107810205
I think you got me, that output followed the input image
>>
>>
File: ZImageTurbo-_0055.jpg (593 KB, 1248x1824)
>>107810446
Just FYI, I switched prompts to a less retarded one. The 4-step one has a fuckton of upscaler artifacts, the one below has a very persistent camo-like pattern (latent noise artifacts, I think) on the image I can't remove no matter what. So pick your poison

https://files.catbox.moe/8llwr6.png
>>
>>107810598
* I switched the workflow
>>
how the fuck are you prompting someone's groin in zimage?
>groin, crotch, between the legs, pubis
only seem to vaguely address the region
>pussy, vagina etc
often draw genitals over the clothes
>genitals
ignored somehow
how do you prompt someone holding her hand in front of her crotch?
>>
>>107810598
fucking nice
>>
>>107810406
excellent. did you generate the audio externally or is that all LTX?

>>107810527
great, easy fix then
>>
Can LTX2 generate Audio only for a video?
>>
>>107810629
use simplified Chinese translation of the area
lower midriff
>>
>>107810370
i ate it and now my belly is all big and round... mmmmfffgh
>>
this helped me get it running finally
https://www.reddit.com/r/StableDiffusion/comments/1q7klzo/i_followed_this_video_to_get_ltx2_to_work_with/
>>
2000 steps
prolly should bake some more
>>
>>107810912
No one asked, avatarfag.
>>
>>
>>107810921
:P
>>
File: ZImageTurbo-_0089.png (2.42 MB, 2304x960)
>>107810912
Keep at it, loving it
https://files.catbox.moe/6kv4i9.png
>>
LTX2 is such an emotive model desu:

https://files.catbox.moe/gts1j8.mp4
>>
>>107810788
I was wondering the same thing and anon said
>>107800860
>>
>>107810948
thanks
>>
File: ZImageTurbo-_0092.png (1.72 MB, 2304x960)
colors a tad too washed out, like someone is watching the movie with a monitor with too high brightness and low contrast setting. Kinda weird glitch
https://files.catbox.moe/ofgsd0.png
>>
https://files.catbox.moe/x7wfck.mp4
>>
>>
catbox is way too slow for this shit
>>
>>
File: ComfyUI_01316_.jpg (3.61 MB, 2880x1440)
Holy shit I've been wondering for days why my ZiT images suddenly look like noisy fucking garbage after a pull. It's because a commit made ZImage default to fp16 dtype instead of bf16.
There's no option to select bf16 in the default diffusion loader so you have to use the "ModelComputeDtype" node
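the practical difference between the two dtypes is range vs precision, which is easy to demo with stdlib struct (a sketch of the number formats themselves, nothing ComfyUI-specific; bf16 is simulated here by truncating a float32, real conversion rounds):

```python
import struct

def to_fp16(x):
    """Round-trip a float through IEEE half precision (5 exp / 10 mantissa bits)."""
    try:
        return struct.unpack('e', struct.pack('e', x))[0]
    except OverflowError:
        return float('inf')  # magnitude beyond fp16's ~65504 max

def to_bf16(x):
    """Approximate bfloat16 (8 exp / 7 mantissa bits) by truncating a float32."""
    (bits,) = struct.unpack('<I', struct.pack('<f', x))
    return struct.unpack('<f', struct.pack('<I', bits & 0xFFFF0000))[0]

# bf16 keeps float32's exponent range, so big activations stay finite;
# fp16 trades range for mantissa precision and overflows instead.
print(to_fp16(70000.0))  # inf
print(to_bf16(70000.0))  # 69632.0 (coarse but finite)
```

a model trained with bf16 activations can blow past fp16's range, which is consistent with the noise appearing only after the dtype default changed.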
>>
>>107811022
the fix
>>
>>107810770
That was all LTX.
>>
>>107811022
you mean this commit?
https://github.com/Comfy-Org/ComfyUI/pull/11057

I was wondering how to "fix" it. thanks for that
>>
https://files.catbox.moe/ge3e3f.mov
>>
>>107811022
>>107811028

Had the same issue, thanks a lot, man
>>
>>107811052
I believe that's the one yeah
>>
File: 1742326097810428.png (1.15 MB, 1048x992)
>.mov
>>
File: ComfyUI_00012_.png (1.14 MB, 1024x1024)
>>
>>107811052
So, Cumanonymous ruined something on purpose? This is nasty.
>>
>>107810912
The only time I've not needed to bake it for 4,000 steps is when it was of a person and they were already a form of asian. Even then more steps were better than fewer desu.
Really nice gens.
>>
the ltx commits have dried up, is this the final state for it in comfy?
>>
>https://files.catbox.moe/kr3svc.json

To any anon that got this workflow working with the 2K-DC blockwise checkpoint working, can you share your Comfy flags?
It runs but I get a bunch of noise
>>
>>107811083
i clicked it it's fine
>>
File: ZImageTurbo-_0114.png (1.71 MB, 2304x960)
>>107811022
>>107811028
This workflow uses your fix. Thanks once again
https://files.catbox.moe/0275tt.png
>>
>>107811112
use LTX workflows NOT comfys, they are much better >>107810891
Know that I2V is sort of buggy and needs 48 fps to work well. They said they are gonna fix it within a month or two with ltx2.1 then 2.5
>>
>>
>>
nudity is fixed with abliterated gemma btw, the model itself does not seem censored
https://civitai.com/models/2292336/ltx-2-nsfw-text-encoder-gemma-3-12b-abliterated?modelVersionId=2579572
>>
>>107811188
obviously genitals are not detailed, but a lora will fix that fast
>>
File: 00025-4192047237.png (1.42 MB, 1368x856)
>>
File: zimage1.png (1.21 MB, 1536x1024)
>>
cozy bread
>>
>>107811207
hey thats my line you thief
>>
File: image-w1280.jpg (460 KB, 1280x720)
>>107811135
original has more sovl
>>
>>107811231
Whose line is it anyway?
>>
>>107811031 >>107811093
it certainly has potential:
https://litter.catbox.moe/2kuse1bpjgofrolw.mp4
https://litter.catbox.moe/yx2lxondp7xravrc.mp4
>>
>>107811234
Mission improbable: your only son becomes a tranny
>>
File: 00013-848792884.png (1.38 MB, 952x1152)
>>
>>107811254
did you prompt it to be australian
>>
>>107811232
>>
>>107811408
yes. and yes, Indian works too.
>>
>>107811428
Weird I tried australian before but couldn't get it. I'll give it another try
>>
what happens if you include a tag in parenthesis during training? I see boorutageditor has weight options that does this (1.0, 1.1, 1.2 etc..)

for example
1girl, blonde hair, (hair between eyes), blue eyes,
in the txt file. does it make the ai train really hard on 'hair between eyes'? is it a good idea to use that on a tag that is not super common but something you really want the lora to learn? such as trapezius muscles, which is probably something that is not really tagged at all in the general models or in the danbooru tags.
>>
>>107811022
its the same picture
>>
>>107811442
The trainer needs to support that. You need to read the fine details of whatever you're using.
>>
a base model just flew over my house
>>
>>107811455
kohya?
>>
>>107811451
Compressing into jpg to fit into 4mb covers some of the detail loss but just look at the green background and edges of the red circle it's significantly noisier on fp16
>>
>>107811434
it worked most times, wasn't a lucky roll
>>
>>107811463
Yes, Kohya_SS has that option.
>>
>>107811479
what is it called specifically so I know what to look for?
>>
>>107811005
yeah it's annoying, even more when
>click embed
>wait 20s
>another floyd or hitler or trump
all i want are pretty girls fuck that ugly shit
>>
>>107811470
hmmm yeah maybe
i will test this myself when i'm back home
>>
>>107811496
https://github.com/kohya-ss/sd-scripts/pull/336
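that PR adds --weighted_captions, which accepts A1111-style (tag:1.2) syntax inside the caption txt. A toy illustration of how such captions break down — this is a sketch of the syntax, not kohya's actual parser, and the 1.1 default for bare parens is the A1111 convention:

```python
import re

# Matches "(tag:weight)", then bare "(tag)", then plain comma-separated tags.
TAG = re.compile(r'\(([^:()]+):([\d.]+)\)|\(([^()]+)\)|([^,()]+)')

def parse_caption(caption):
    """Illustrative breakdown of a weighted caption into (tag, weight) pairs."""
    out = []
    for m in TAG.finditer(caption):
        if m.group(1):                          # (tag:weight)
            out.append((m.group(1).strip(), float(m.group(2))))
        elif m.group(3):                        # (tag) -> 1.1 by A1111 convention
            out.append((m.group(3).strip(), 1.1))
        elif m.group(4) and m.group(4).strip(): # plain tag -> 1.0
            out.append((m.group(4).strip(), 1.0))
    return out

print(parse_caption("1girl, blonde hair, (hair between eyes), (trapezius:1.2)"))
```

so yes, with the flag enabled a parenthesized tag gets its loss weighted up during training; without the flag the parens are just literal characters in the caption.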
>>
https://ltx.io/model/model-blog/prompting-guide-for-ltx-2
>>
>>107811005
>>107811506
Just look up vidya dump threads on /wsg/ (or /gif/ for spicy stuff), then post links
>>
What kind of system prompt is recommended to properly describe images in natural language?
>>
Abliterated gemma 3 is snake oil then?
>>
>>107811599
no, it for sure makes a difference.
>>
>>107811586
joycaption?
>>
>>107811599
It will give you different results just cause it's a modified weight, ltx won't suddenly become uncensored/nsfw or something lol
>>
>>107811599
Abliteration removes refusals, but it also dumbs down the model, making it hallucinate harder. Usually it means it gets progressively more retarded in chats (especially when ERPing) so I'm not sure how it affects short one-time prompts. There's also this https://huggingface.co/Nabbers1999/gemma-3-12b-it-abliterated-refined-novis/tree/main that's further trained, and another "less censored" version called heretic, but I'm not sure how to make it work with ltx
>>
What is the base noobai model called specifically that you train on? is it just illustrious?
>>
>>107811697
illustrious 0.1 is the base for pretty much everything current
training on noobai eps 0.5 or vpred 1.0 is reasonable though
>>
>>107811701
Why not illustriousv1.1 or v2
>>
>>107811709
They were partially trained on 1.5 megapixel images which was a very bad idea for sdxl finetune
>>
>>107811709
They are failed models with poorly implemented hires training
>>
>>107811648
it will output entirely different things because it's becoming another model
>>
>>107811697
Training on naked noob is fine.
>>
Am I retarded for trying to run chroma-unlocked-v50-annealed on a 9060 xt with 16 gb vram and 32 gb ram? A 512x512 gen took 3 minutes, 768x768 9 minutes, and I'm not even going to try a 1024x1024. Although it's not showing any ram offloading, so it's kind of weird.
I tried a sdnq quant with int8 and it genned a 1024x1024 in 2 minutes, which still isn't exactly fast, but the gen was pretty sharp. Is this just amd/rocm sucking or is chroma really just that heavy?
>>
>>107811701
>>107811736
i'm just using the civit lora trainer atm. I have not yet been able to figure out how to get local training to work on my 5070ti.
what exactly is it called so I can search for it on the civitai list?
I'll move over to the other thread if I have more questions, I realize this isn't relevant to the local thread.
>>
>>107811783
comfy used to have a bug with chroma that made it super slow, it's now as fast as flux, faster with the distill lora. Also use the 2k res versions, not the 512 res ones.
Use this https://huggingface.co/silveroxides/Chroma-Misc-Models/blob/main/Chroma-2K-QC/Chroma-2K-QC-fp8mixed-blockwise.safetensors

With this:
https://github.com/silveroxides/ComfyUI-QuantOps
>>
no base general
>>
>>107811790
https://civitai.com/models/833294?modelVersionId=1190596
if you want to train more than one lora consider spending time on figuring out local training, using civitai when you have a 5070ti is a waste
>>
>>107811804
Alright,thank you, will try
>>
Anyone get the ltx2 vid2vid working? I almost got something decent from a poor quality input video, made up some interesting details, but it shits the bed with motion and falls apart near the end.
>>
>>107811857
I2V is bugged and needs 48 fps, they said 2.1 would work on this
>>
>femanon, look at this cool lora i made of my face
>so you just asked ai to make you muscular with a full head of hair?
>n-no its more than that
>>
>>107811804
I couldn't make it work, just get noisy image.
>>
you now remember that despite being the same size as turbo it will take longer to gen
>>
finally got ltx working
https://files.catbox.moe/67cy3a.mp4
>>
>>107811866
Not using the I2V though? using their V2V: https://github.com/Lightricks/ComfyUI-LTXVideo/blob/master/example_workflows/LTX-2_V2V_Detailer.json
>>
File: file.png (34 KB, 150x182)
>>107811883
did you prompt him to sound like an italian mafioso?
>>
File: vram.jpg (117 KB, 912x604)
>>107811823
thanks.
>using civitai when you have a 5070ti is a waste
there's something critical i'm missing. I can't get the vram usage down in kohya no matter how many setting I turn down.
>>
>>107811892
same difference, the issue is that their temporal compression is too aggressive, more fps lessens the effect of this and fixes the issues during movement. They said they plan to refine the temporal compression more as their focus was speed
>>
>>107811883
Oh thank god. A gen than isn't George Floyd, the CIA agent.
>>
>>107811909
was there an ETA on 2.1?
>>
Why does LTX2 keep giving me Indians
>>
>>107811934
You either get the Indian model or the Chinese model; this is the cost of America outsourcing its tech talent and industrial capacity.
>>
>>107811931
In a month or so. They said they would try for their 2.5 in Q1 which has major improvements.

>>107811934
its biased to india and china unprompted cause most of the world's content is from those places.
>>
>>107811934
Describe the ethnicity explicitly, saar
>>
File: 1750108113249903.png (12 KB, 760x60)
>>107811902
im somewhat of a prompt engineer myself
>>
the kijai sound + image workflow is great.

https://files.catbox.moe/s5c9ke.mp4
>>
LTX-2 lora anons, are you training on the dev model or the distilled model? Tried making a lora on the dev model and it sucks, can't even tell if it's doing anything.
>>
File: sdgsdfasdfsd.png (49 KB, 1106x346)
>>107811931
>>
https://files.catbox.moe/bb38ps.mp4
>>
>>107811904
OneTrainer has default configs for different memory capacities but everyone swears by their own trainer. It's worked well for me. Noob specifically needs a manual edit to the config though.
>>
>>107811950
people made reddit posts. Or id ask people who made some on civitia
>>
>>107811953
Guess I'll stop fighting with it and hope that 2.1 or 2.5 yields something stable. Was actually impressed with the few frames of compression garbage that it managed to make something from.
>>
>>107811959
i'll check it out. could you share your configs?
>>
>Wan is so much better than LTX
>The Wan videos in question are basically static shots of a woman slapping her hips against a dick for 5 seconds.
>>
you are gonna see so many mobile game ads made with this lol
https://files.catbox.moe/z5dz25.webm
>>
>>107811996
I hate how conservative it makes the underwear.
>>
>>107811956
what did you prompt? wow game + camera prompts?
>>
>>107811934
Indian content is huge, it's just basically unknown to the rest of the world because it's not that interesting outside of the occasional bollywood thing.
Just write the ethnicity you want, or do i2v like sane people.
>>
>>107812019
the grass on the hills and trees sway gently in the breeze, a blue 2017 Toyota Prius drives along the dirt road in front of the player, the camera shifts view to follow the car as it drives through the scene. Epic MMORPG music plays. A panicked voices says "Chat! Chat! Are you seeing this?"

Just a lazy prompt and a WoW screenshot.
>>
>>107812044
looks neat, so much variety with the gens.

https://ltx.io/model/model-blog/prompting-guide-for-ltx-2

ltx has a lot of prompting ideas im skimming through this
>>
>>107811996
crazy how the lora mimicked the game's soundtrack perfectly
>>
>>107811804
>fp8
get the fuck out
>>
File: 1765248625110865.png (1.96 MB, 1584x1056)
ltx2 is fun but I havent forgot about you qwen edit 2511:
>>
>>107811996
prompt? the transition was neat
>>
https://files.catbox.moe/7ha8zw.mp4
>>
>>107812112
kino, sometimes i just want to get comfy with a 12 boar asses game
>>
>>107812112
>collect 12 boar asses for king gobblecock
award
1 silver
>>
https://files.catbox.moe/z6zk65.mp4
>>
https://files.catbox.moe/m8y4t0.mp4
>>
>>107812184
lmao
>>
>>107812219
sora 2 could never. Now hopefully 2.5 indeed does make the compression better, still fast but better quality
>>
>>107812112
used a deus ex screenshot:

https://files.catbox.moe/nzgdxp.mp4
>>
>>107810912
Interesting how it looks like modern Tumblr commissions instead of Schiele.
>>
how do you do a camera rotation prompt? are the loras needed?
>>
Is ComfyUI still the best or should I switch to something else?

The other ones still can't do video gens right?
>>
>>107812377
nothing comes close to comfyui but neoforge is probably a good, easier-to-use alternative for newbies
>>
>>107812383
>alternative for newbies
Ooops, I meant "alternative for people that aren't shit-eating cuckolds". imagine being forced to use an interface as bad as cumrag
>>
>>107812399
sure
>>
File: 1739720005183748.jpg (593 KB, 1784x2600)
>>107812377
comfyui sucks only nerds use it
>>
lmao, the loras work well for camera movement. dolly out lora test:

https://files.catbox.moe/e0585a.mp4
>>
>>107812404
what do you use?
>>
If Comfyui is so good, how come there's no comfyui 2?
>>
Any AniStudio news? Ani has been committing new stuff to the main repo, seems big. Why isn't AniStudio in the op anyway? It seems to be pretty competitive nowadays and it actually might be faster than cumfart from what I'm seeing
>>
>>107812464
very organic ani
>>
>>107812464
Buy an ad.
>>
>>107812464
I don't like vibecoded slop
>>
ran meltie
>>
I want to run the new ltx on my 3090. which one do I want, ltx-2-19b-distilled-fp8.safetensors? is fp4 usable? is there some optimized workflow for such setup or is the official one fine (LTX-2_I2V_Distilled_wLora.json)?
>>
>>107812503
Use the FP8, the FP4 doesn't look that good and is optimized for 5000s GPUs where it's twice as fast as FP8.
>>
>>107812503
kijai is working on a better WF with split files:
https://huggingface.co/Kijai/LTXV2_comfy
https://files.catbox.moe/jftiwc.mp4
But until then apparently the LTX ones are better
>>
meeting 1girl irl
https://files.catbox.moe/hvyx5q.mp4
>>
>>107812529
oh and he got GGUFs working btw, far better quality than fp8 imo
>>
>>107812524
>>
>>107812415
forge neo
>>
File: 1752838564918204.jpg (1.43 MB, 1248x1824)
1.43 MB
1.43 MB JPG
>>107812377
Swarmui is the best of both worlds. It's still comfy but there's a very useful base interface on top.
>>
>>107812550
>but there's a very useful base interface on top.
how so
>>
>>107812570
All the base functions you would ever need, including the best inpainting interface I've seen thus far. Built-in regional prompting, seed variation, rembg, you name it.
The best sorting interface and metadata viewer; no need to fuck around with shit like diffusion toolkit.
Can pull model info from civit together with preview images.
Just a very comfy interface, especially for a newfag.
>>
yea, it looks like GGUF is far better than FP8. This is Q6: https://files.catbox.moe/hzov9n.mp4
He says he is doing Q8 as well, which should look much better than fp8
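For picking a quant that fits in VRAM, a back-of-the-envelope size estimate is enough. The bits-per-weight figures below are approximate llama.cpp values (Q8_0 is ~8.5 bpw because of per-block scales, Q6_K ~6.56), and the 19B parameter count comes from the model name; treat the results as ballpark only:

```python
# Rough weight-file / VRAM size for a 19B model at different precisions.
# Bits-per-weight are approximate llama.cpp figures; GGUF quants carry
# per-block scale overhead, which is why Q8_0 is slightly larger than FP8.

BITS_PER_WEIGHT = {"fp16": 16.0, "q8_0": 8.5, "fp8": 8.0, "q6_k": 6.56, "q4_0": 4.5}

def model_gib(n_params: float, quant: str) -> float:
    """Approximate weight size in GiB for n_params parameters."""
    return n_params * BITS_PER_WEIGHT[quant] / 8 / 2**30

for q in ("fp16", "q8_0", "fp8", "q6_k", "q4_0"):
    print(f"{q}: {model_gib(19e9, q):.1f} GiB")
```

On a 24 GB 3090 this is why Q6/Q8 and FP8 are all viable while FP16 is not, before even counting the text encoder, VAE, and activations.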
>>
>>107812541
add tan
>>
>>107812529
>>107812503
Ok, I'm currently getting filtered: https://huggingface.co/Lightricks/LTX-2/tree/main tells me to get gemma-3-12b-it-qat-q4_0-unquantized, but it's gated and there is no reupload. And the text_encoder folder in the LTX HF repo contains weights for a different model. I'm lost
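Gated just means you have to click "accept license" on the repo page once and then download with an authenticated huggingface-cli. A sketch of the command, where the `google/` org prefix and the `--local-dir` target are my assumptions (the command is printed here rather than executed, so run it yourself after logging in):

```shell
# Sketch: fetching the gated Gemma text encoder from Hugging Face.
# Assumptions: the repo sits under the google org, you have accepted the
# license on its page, and you are logged in (huggingface-cli login / HF_TOKEN).
REPO="google/gemma-3-12b-it-qat-q4_0-unquantized"
CMD="huggingface-cli download ${REPO} --local-dir models/text_encoders/gemma-3"
echo "${CMD}"  # printed instead of executed; run it once authenticated
```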
>>
death to subgraphs
>>
>>107812590
The tan is already there, but it's pretty subtle. I hate niggers
>>
>>107812541
unbelievably..shite. welcome to 2026, lmao
>>107812587
show me the inpainting interface please, curious now
t. actual comfy user (with the old interface of course)
>>
>>107812587
How does it compare to Invoke? I tried that but didn't like it. And can it outpaint? Is it easy to upscale images over and over with it?
>>
>>107812601
I don't hate subgraphs, but they are being wildly misused desu.
They should only be used for helper functions, like math for resizing images and things that generally aren't touched but cause a lot of clutter, and anything inside them that might be changed should be exposed in the subgraph's settings. Instead people are putting everything but the fucking prompt window in there.
>>
>>107812541
Very good stuff
>>
>>107812653
Personally I use subgraphs to hide any settings that I don't use day to day. I want everything visible in the main view to be an actionable setting relevant to genning.
>>
lmao the dolly in/out loras are good

dolly in:

https://files.catbox.moe/g1iahr.mp4
>>
https://github.com/Comfy-Org/ComfyUI/pull/11741

latent2rgb previews coming for ltx2
>>
Can you feed in a song and change the lyrics while having it sung the same way?
>>
File: autism2.png (1.01 MB, 1280x720)
1.01 MB
1.01 MB PNG
>>
ok now we are talking.

https://files.catbox.moe/q3s4jp.mp4
>>
where is pussy lora
>>
>>107812685
Why deleted? You're right there's no tan
Gyarus without tan are shite
>>
okay, now we have a winner. dolly zoom in lora used:

https://files.catbox.moe/2g7kh6.mp4
>>
File: ComfyUI_00077_.png (1 MB, 1280x720)
1 MB
1 MB PNG
>>
>>107812730
>>107812730
>>107812730
fresh
>>
>>107812739
Anatomy loras never work well
>>
new thread
>>107812673
>>107812673
>>107812673

>>107812772
duplicate, please remove
>>
>>107812780
>extremely early bake
Sorry that's spamming/flooding
>>
>>107812772
thanks for baking anon
>>107812780
Kill yourself shitposting subhuman
>>
actual thread
>>107812800
>>107812800
>>107812800
>>
File: 1755229201199045.png (173 KB, 809x504)
173 KB
173 KB PNG
>>107812635
>show me the inpainting interface please, curious now
Here you go
Can handle masking and layers, can outpaint, has auto segment functionality.
>>
>>107813072
lmao looks like fucking shit unironically
>>
File: 1766699283673799.png (1.65 MB, 1632x928)
1.65 MB
1.65 MB PNG
moo
>>
File: autism4.png (1.83 MB, 1920x1080)
1.83 MB
1.83 MB PNG
>>107813239
hello there
>>
>>107813252
it's time to play uno?
>>
File: 1757245005541318.png (2.28 MB, 1280x1280)
2.28 MB
2.28 MB PNG
sad life
>>
Has anyone experimented with using the abliterated Gemma 3 with LTX2? I couldn't find a single-file version of it, so I made one from the repo here https://huggingface.co/mlabonne/gemma-3-12b-it-abliterated with a python script chatgpt wrote. However, when I try to use it I get invalid-tokenizer errors, and chatgpt ain't that useful about a solution. Is there a single-file version floating around somewhere that works as a tokenizer for comfyui? My ultimate goal is to use the abliterated version to train a NSFW lora, which seems straightforward, but I assume I also need to use it during inference as well.
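For what it's worth, a merge script like that mostly just walks the repo's `model.safetensors.index.json`: its `weight_map` maps each tensor name to the shard file holding it, and the script loads each shard and writes one combined file (the tokenizer files are separate, which is a common cause of "invalid tokenizer" errors if they aren't copied alongside). A sketch that only builds the shard plan, with toy file/tensor names of my own invention and no safetensors dependency:

```python
import json

# Sketch: HF sharded checkpoints ship model.safetensors.index.json whose
# "weight_map" maps tensor name -> shard file. A merge script groups tensors
# by shard, loads each shard, and writes one combined file; here we only
# build the plan (the tensor I/O itself would need the safetensors library).

def shard_plan(index: dict) -> dict[str, list[str]]:
    """Group tensor names by the shard file that contains them."""
    plan: dict[str, list[str]] = {}
    for tensor, shard in index["weight_map"].items():
        plan.setdefault(shard, []).append(tensor)
    return plan

# Toy index in the same shape as a real one (names are made up):
toy = {"weight_map": {
    "model.embed_tokens.weight": "model-00001-of-00002.safetensors",
    "model.layers.0.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
    "lm_head.weight": "model-00002-of-00002.safetensors",
}}
print(json.dumps(shard_plan(toy), indent=2))
```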
>>
File: 1767379820375934.png (1.98 MB, 1280x1280)
1.98 MB
1.98 MB PNG
>>107813854
don't waste your time, the model hasn't been trained with the ablit model so it won't know the concepts anyway
>>
>>107813854
https://huggingface.co/FusionCow/Gemma-3-12b-Abliterated-LTX2

but what for ATM? It's not like you will get better results


