/g/ - Technology

File: collage.jpg (2.26 MB, 2878x3566)
Discussion of Free and Open Source Text-to-Image/Video Models

Prev: >>107594109

https://rentry.org/ldg-lazy-getting-started-guide

>UI
ComfyUI: https://github.com/comfyanonymous/ComfyUI
SwarmUI: https://github.com/mcmonkeyprojects/SwarmUI
re/Forge/Classic/Neo: https://rentry.org/ldg-lazy-getting-started-guide#reforgeclassicneo
SD.Next: https://github.com/vladmandic/sdnext
Wan2GP: https://github.com/deepbeepmeep/Wan2GP

>Checkpoints, LoRAs, Upscalers, & Workflows
https://civitai.com
https://civitaiarchive.com/
https://openmodeldb.info
https://openart.ai/workflows

>Tuning
https://github.com/spacepxl/demystifying-sd-finetuning
https://github.com/ostris/ai-toolkit
https://github.com/Nerogar/OneTrainer
https://github.com/kohya-ss/musubi-tuner
https://github.com/kohya-ss/sd-scripts
https://github.com/tdrussell/diffusion-pipe

>Z Image Turbo
https://huggingface.co/Tongyi-MAI/Z-Image-Turbo
https://comfyanonymous.github.io/ComfyUI_examples/z_image/

>WanX
https://github.com/Wan-Video/Wan2.2
https://comfyanonymous.github.io/ComfyUI_examples/wan22/

>NetaYume
https://civitai.com/models/1790792?modelVersionId=2485296
https://nieta-art.feishu.cn/wiki/RY3GwpT59icIQlkWXEfcCqIMnQd

>Chroma
https://huggingface.co/lodestones/Chroma1-Base
https://rentry.org/mvu52t46

>Illustrious
https://rentry.org/comfyui_guide_1girl
https://tagexplorer.github.io/

>Misc
Local Model Meta: https://rentry.org/localmodelsmeta
Share Metadata: https://catbox.moe|https://litterbox.catbox.moe/
GPU Benchmarks: https://chimolog.co/bto-gpu-stable-diffusion-specs/
Img2Prompt: https://huggingface.co/spaces/fancyfeast/joy-caption-beta-one
Txt2Img Plugin: https://github.com/Acly/krita-ai-diffusion
Archive: https://rentry.org/sdg-link
Bakery: https://rentry.org/ldgcollage

>Neighbors
>>>/aco/csdg
>>>/b/degen
>>>/r/realistic+parody
>>>/gif/vdg
>>>/d/ddg
>>>/e/edg
>>>/h/hdg
>>>/trash/slop
>>>/vt/vtai
>>>/u/udg

>Local Text
>>>/g/lmg

>Maintain Thread Quality
https://rentry.org/debo
https://rentry.org/animanon
>>
Blessed thread of frenship
>>
what the fuck.
go away
fucking troll
>>
>>107597478
tanks 4 bake
>>
Opinions on https://github.com/zai-org/SCAIL ?
>>
>>107597525
>zai
>not affiliated with z image or alibaba
hm...
>>
>>107597525
Some of those community works look almost too good. I wonder if it handles nudity and stills
>>
>>107597525
Do they ever even look as promised?
>>
Base status?
>>
>>107597525
>v2v
I sleep
>>
Which is the real thread
>>
>>107597620
Trust the plan, 2 weeks
>>
uhh not sure which thread i should speak in
>>
>>107597478
>flamewar links in OP
This is not okay
>>
>>107597620
>Base status?
they'll probably release everything on christmas day
>>
>>107597669
What if Chinese Christmas is like Chinese new year. Do we have a knower
>>
File: 1748927874573129.png (3.71 MB, 1152x2312)
>>
Does the z controlnet work or is it still bugged?
>>
>>107598120
Z is nice model but I dont like the frame it almost forces on paintings
>>
cozy bread
>>
>>107598184
Have to try z to chroma i2i workflow sometime. Getting good composition saves so much time
>>
>>107598237
>z to chroma
that sounds ass backwards
>>
>>107598256
Yeah true, but z will give me correct amount of limbs almost every time. Good start
>>
File: kek.jpg (63 KB, 910x586)
https://www.reddit.com/r/StableDiffusion/comments/1ppa8x9/zimageedit_news
Congrats retards, you made Tongyi cry :(
>>
>>107598284
Conflicted because SegFault is an OG chad from the Pixart days but that pic is very homosexual.
>>
>>107598284
Tongyi be like:
https://youtu.be/bYzKJ91KBzE?t=22
>>
>>107598284
they don't answer, that's the problem
>>
>>107598322
>where is base???
>it's coming.
>5 minutes later
>where is base???
>...
>wtf they don't answer!!!
>>
>>107598349
>>5 minutes later
*2 weeks later (after they said that base would be released "before the weekend" btw)
>>
File: 1754521073194063.png (3.57 MB, 1152x2312)
>>
>>107598273
SPARK improved a lot on that. though this gen isn't a great example lol
>>
>>107598349
>between now and the end of time
well I could have told you that
>>
File: orz.png (1.03 MB, 1280x720)
impossible to get an actual low angle orz
>>
so the new ani shitter strategy is to bump his thread with nonsense advice

>>107598428
>>
File: 1748505506798805.mp4 (3.86 MB, 2048x1152)
https://xcancel.com/aisearchio/status/2001365588980175153#m
sovl vs sovless
>>
>>107598443
whocars just post bifusion here
>>
>>107598456
but those are both sovless
>>
File: 2MW.png (150 KB, 897x773)
SOON™
>>
soon
>>
File: OTL.png (1.13 MB, 832x1248)
>>
File: 1744231120338829.png (362 KB, 576x448)
>>
>>107598488
her name is soon yon
>>
>>107598530
y so smol
>>
>>107598575
if her name aint soon base i dont care
>>
>>107598530
>>107598584
>>>/g/adt
>>
>>107598580
my gpu can only run sd1.5 at low resolution
>>
>>107598584
>>107598595
No, it's fine here. Good gen too.
>>
File deleted.
>>107598595
fuck off
>>
>>107598608
It's the schizo. See in the 70 posts that were IP nuked in that thread, it included redirects to /adt/ to shit up the thread. https://desuarchive.org/g/search/tnum/107570316/deleted/deleted/page/2/
>>
File: 1763581455933377.jpg (701 KB, 2000x1336)
>>
>>107598599
i applaud your dedication anon
you tried sdultimate upscale? it divides the image up into smaller squares so you can gen larger res without needing the extra vrams
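
For anyone wondering what that tiling looks like, here is a rough sketch. It only computes overlapping tile boxes for a target canvas and is not the actual Ultimate SD Upscale code; `tile_boxes`, the 512 px tile size and the 64 px overlap are assumptions picked for illustration.

```python
def tile_boxes(width, height, tile=512, overlap=64):
    """Return (left, top, right, bottom) boxes that together cover the canvas.

    Neighbouring boxes share `overlap` pixels so seams can be blended later.
    Boxes at the right and bottom edges are clipped to the canvas.
    """
    if overlap >= tile:
        raise ValueError("overlap must be smaller than tile")
    step = tile - overlap
    boxes = []
    top = 0
    while True:
        bottom = min(top + tile, height)
        left = 0
        while True:
            right = min(left + tile, width)
            boxes.append((left, top, right, bottom))
            if right >= width:
                break
            left += step
        if bottom >= height:
            break
        top += step
    return boxes
```

Each box would then be processed on its own, so peak memory depends on the tile size rather than on the full output resolution.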
>>
>>107598616
>https://desuarchive.org/g/search/tnum/107570316/deleted/deleted/page/2/
damn, that's some serious mental illness
>>
File: 1735520266883826.png (3.42 MB, 1920x1088)
>>107598617
>>107598366
these are sick. box/prompt?

>>107598599
based. sd15 was peak, it's been downhill since.
>>
>>107598615
get out schizo!
>>
>>107598617
>>107598629
>>107598648
guys I need you to tell the schizo to tell off too
>>
>>107598615
prompt?
>>
File: 1762218061397260.png (726 KB, 832x1248)
>>107598652
>>
File: z_mod_00143_.jpg (1.09 MB, 1344x1728)
>>107598648
Those floating swords are cool
>>
File: 1743064119841539.png (3.88 MB, 1336x2008)
>>107598648
nice gen yourself
lora: https://files.catbox.moe/jelpf6.safetensors
wf: https://files.catbox.moe/epo278.png
the custom nodes are for resolution stuff so you can remove them
>>107598676
i should prune that dataset and retrain. its got a lot of nonsensical images
>>107598686
cool style anon
>>
>>107598666
nta but it's in the rentry
>>
>>107598749
Thanks, buddy.
>>
>>107598749
yes it unlearned the shape of things. i have briefly tried training zit on a strong style and it seems very prone to unlearning the shape of things before successfully training styles. maybe the datasets need to be higher quality than sdxl, maybe it's a setting issue. it's hard to find the right lora training inputs when each iteration takes so long to complete.
>>
File: disco_i2i.png (2.7 MB, 1664x1216)
I tried anon's Disco Elysium lora with a simple 0.5 denoise i2i on an alt portrait of Octavia I made a while back. The art style is awesome, but it's also an uglifier (unsurprisingly). I just want painterly beautiful women for my CRPG portraits!
>>
File: 1765108483657514.png (2.51 MB, 1920x1088)
>>
File: 1750091942698036.png (3.09 MB, 1336x2008)
>>107598765
if your dataset has poor quality images or subpar captions, itll look bad. for example that dataset had images like https://files.catbox.moe/evwsef.jpeg but when the dataset is clean itll look really nice with little to no bad fingers or nonsensical text
but i also didnt use a regularization set
for settings, i just use the default onetrainer config
>>
>>107598749
i know no one wants to hear this, but with illustrious as a base i always got pretty great results with no captioning haha.
>>
>>107598780
>simple 0.5 denoise i2i
Would probably do what you want more if you used a cnet. Not sure if it's "still bugged" though >>107598064
>>
File: 1747545091884906.jpg (815 KB, 2000x1336)
>>107598835
>i always got pretty great results with no captioning
youre right. but still, having captions is better than not imo
>>
File: derp.png (2.03 MB, 832x1248)
>>107598848
2.1 speed is fixed, but it's very janky in general. This is the first time I'm trying it with a lora and there seems to be some kind of incompatibility. Apparently I'm not the first one to run into this kind of issue. https://www.reddit.com/r/comfyui/comments/1pf0aae/zimage_controlnet_workflow_with_character_lora/
>>
File: rescaled_00006_.png (294 KB, 416x608)
>>
File: file.png (2.91 MB, 1344x1728)
>>
>>107599005
Love that artist.
>>
>>107598284
Bahahahaaa

>>107598749
>the reposting is real >>107598774
Checking timestamps was too much bother till now, but my eyes are finally adjusting to the samefagging after the last couple of threads.

>>107590362
>>107590392
Neat. On my end Wai 13 was uniquely bad, 14 fixed all its shit, 15 regressed, and now 16 seems good again.
>>
>>107599032
>now 16 seems good again.
I might grab it then, I stayed on 14 till now, though it's not my main checkpoint.
>>107599020
I'm guessing she doesn't love us back considering her Instagram says AI学習NG
>>
>>107599032
>Checking timestamps was too much bother till now
4chanX and XT have an option to always show relative timestamps which helps.
>>
File: ComfyUI_10126_.png (1.57 MB, 1152x1152)
>>107598184
>Spark Chroma

I still maintain that HD Flash is all you need.
>>
>>107599073
I'm willing to believe you but don't you have cuter girls
>>
File: ComfyUI_00008_.png (1.21 MB, 1000x778)
>>
>>107599073
those thighs look so softu...
>>
File: ComfyUI_10134_.png (1.19 MB, 1152x1152)
>>107599103
But of course anon-kun. Also leave it to Comfy to completely remove Queue, place it under some obscure option and refactor an UI used by autists.
>>
>>107599255
I'm afraid I like ZiT more
>>
File: ComfyUI_00025_.png (3.17 MB, 1728x1344)
>>
>>107599302
Damn that looks nice. Is it z lora?
>>
>>107599320
miruku v1 no lora
Prompt: https://files.catbox.moe/sjm69b.txt
>>
File: file.png (2.58 MB, 1728x1344)
I accidentally forgot to change my aspect ratio to portrait and got this dorky pose
>>
File: file.png (1.55 MB, 1000x778)
>>
>>107599386
Great style
>>
>>107599386
The pose and the smirk go together very well.
>>107599415
Spooky.
>>
File: 1741035781471745.png (3.34 MB, 1336x2008)
>>
File: ComfyUI_10156_.png (1.79 MB, 1152x1152)
>>107599280
Both are great for what they're good at. What I really like about Chroma is that it's great for guiding the attention towards the exact gen that I want, even if it may take several tries or the background may be fucked. That is the advantage of not being as rigid as Z. Chroma does often fuck up a lot and often just needs better prompt engineering or optimization on those gens that it messes up. But then again, what Chroma needs is a proper finetune (same as Z).
>>
>>107599433
1girl, big breasts
it's that shrimple
>>
>>107599496
meant for >>107599483
>>
big
https://blog.google/technology/developers/t5gemma-2/
https://huggingface.co/collections/google/t5gemma-2
>>
File: resurrected_replayer.png (2.51 MB, 1248x1824)
>>107599433
got it from anon a while back https://civitai.com/models/1586526/low-poly-ps1-psx-render-style-illustriousxl-noob-lora
>>
>>107599531
>T5Gemma 2 is more than a re-training. It
slop
>>
>>107599531
that shit probably knows 0 potentially naughty words
>>
>>107598284
uooooohhh dev harassment
>>
>>107598617
i wish i remembered what someone called these at one point. i thought it was "cancerheart" but now i don't remember
>>
>>107599615
nta but called what? back in the 00s some of us used blended weird images like that in forum signatures and iirc called them cinema4d images or something but that may not be what you were thinking of
>>
File: ComfyUI_10160_.png (1.91 MB, 1152x1152)
>Still no Alibaba music model

It's over isn't it bros
>>
>>107599665
i just want a good voice model to generate perfect joi wavs in my native language
>>
File: file.png (1.27 MB, 1000x785)
>>107599623
>back in the 00s some of us used blended weird images like that in forum signatures
yeah that's what i was thinking of. i remember someone on /wg/ calling it "metalheart" several years back, that metallic 2000s 3d bursting fractal kind of style
>>
>>107598900
I'm stupid and forgot to re-set denoise to 1. ControlNet and loras work fine together. Still, ZiT ControlNet is hard to get good results out of, very finicky.
>>
File: 1736157152801211.png (3.64 MB, 1336x2008)
>>107599615
>>107598749
yeah metalheart
it had a resurgence maybe a half decade or so ago which is where i found most of the training data because they are highres compared to period pieces
i want to double the size of the set its only ~100 images
>>
>>107599773
forum signatures were kino, I miss the 00s
>>
>>107599723
is there any good option for whispery english and japanese language nsfw voice gen?
>>
why do you have two threads again degenerates
which one is correct
>>
>>107599898
The other one was on page 10 until someone bumped it mere moments ago.
>>
>>107599898
Just use whichever one you like anon.
But if you want to know why people are using this one, it's because threads without https://rentry.org/animanon in the OP are made by a ban evading spammer
You can see his 150 IP-nuked posts in those threads
https://desuarchive.org/g/search/tnum/107570316/deleted/deleted/
https://desuarchive.org/g/search/tnum/107565575/deleted/deleted/type/posts/
https://desuarchive.org/g/search/tnum/107557293/deleted/deleted/
>>
File: ComfyUI_10180_.png (1.98 MB, 1152x1152)
>>
>>107599898
None. We need to bake a third one.
>>
File: z_mod_00155_.jpg (914 KB, 1344x1728)
>>
>>107599945
That's really good, is it a style prompt or lora?
What is "z_mod"?
>>
>>107599917
Seems to me more like someone would have to be mentally ill to include drama irrelevant to everyone else in the op. Literally I give zero fucks about your worthless life and who you are having some internet "fight" with
>>
>>107599945
really good
>>
>>107599904
stop nooooticing
>>
>>107599951
making 70 spam posts per thread is not the same as having internet fight with someone. you're entitled to your opinion but we've had maintain thread quality rentries for over 1000 threads
>>
File: 1760384914938402.png (659 KB, 1662x1580)
https://xcancel.com/bdsqlsz/status/2001895667955012047#m
what is that?
>>
File: 1752762555059374.png (142 KB, 1832x753)
https://github.com/comfyanonymous/ComfyUI/pull/11408
Comfy has his hands on Qwen Image Layered lol
>>
>>107599982
likely his lora/finetune training interface.
>>
File: z_mod_00174_.jpg (780 KB, 1344x1728)
>>107599949
>>107599958
Giger lora for Z
>>
>>107599993
neat
>>
>>107599982
>>107599991
looks like he's comparing the loss of his loras against something else, and that something else is probably an older version of base
>>
>>107599993
hype
>>
I want to open Civitai but I know it'll crash my computer at this very moment.
>>
>>107600070
anon must be training a lora
>>
File: 1759937487675440.png (3.37 MB, 1344x1728)
>>
>>107597478
Do you guys have coil whine while genning? Is there anything that can be done about it?
>>
File: 1756054143758204.png (2.35 MB, 1024x1344)
okay. imagine this. imagine if you could combine two or more loras to achieve two or more concepts at once.
>>
>>107600148
what two loras is that image?
>>
File: file.png (1.38 MB, 832x1248)
>>
>>107600148
There's a lora block weight node for Z that aids with this. Some have said rescale helps as well (for older models).
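
As a rough illustration of why stacked loras interfere, here is a minimal sketch of the usual LoRA merge, where every lora adds its own scaled low-rank delta to the same weight matrix. The dict layout (`A`, `B`, `alpha`, `strength`) and the function names are made up for this example, and it is not how any particular UI or block weight node is implemented.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def apply_loras(weight, loras):
    """Return weight + sum(strength * (alpha / rank) * B @ A) over all loras.

    Each entry of `loras` is a dict with "A" (rank x d_in), "B" (d_out x rank),
    "alpha" and "strength". This layout is hypothetical.
    """
    merged = [row[:] for row in weight]  # copy so the base weight is untouched
    for lora in loras:
        rank = len(lora["A"])
        scale = lora["strength"] * lora["alpha"] / rank
        delta = matmul(lora["B"], lora["A"])
        for i, row in enumerate(delta):
            for j, value in enumerate(row):
                merged[i][j] += scale * value
    return merged
```

Because the deltas simply add up, two loras that both push the same weights hard will fight each other. Lowering the strength on one of them, globally or per block, only shrinks its delta, which fits the point that it helps but may not be enough.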
>>
File: file.png (3.35 MB, 1536x1536)
>>
File: file.png (1.65 MB, 832x1248)
>>
>>107600172
the block loader helps but it's not enough. i think only proper base training will remedy this.
>>
ZIT is saved.
>>
>>107600243
i was waiting for this
>>
File: file.png (3.21 MB, 1536x1536)
>>
>when you're too busy jerking off during the photoshoot
>>
>>107600309
i would be if i was in the same room as those middle qts
>>
File: z_00088_.png (752 KB, 1024x1024)
>>
>>107599917
why is that unrelated shit in op in the first place
i can't see ips of the deleted posts

i don't care about your schizo wars you can put in op whatever you want just stop making multiple threads
>>
>>107600318
I didn't even prompt for 1girl. Using nag from now on.
>>
>>107600342
>unrelated
Stop concern trolling, /ldg/ being spammed is relevant to /ldg/.
>making multiple threads
Well I didn't make this one, and we can't stop the spammer from making duplicate threads. The best way would be to make him realize everyone knows who he is and he won't get rich off of commercial licenses by spamming /ldg/. Using the threads with warnings about him is a good step towards that.
>>
>>107599951
>>107600342
free to fuck off to leddit, you dumb nigger.

>>107600144
weirdly only when using a local llm. genning pics/vids seems fine on my 5090. don't think there's anything concrete that can be done about it.
>>
What's the point of all those
>imaginary artificial AI influencer girl #2458
loras on Civit?
>>
>>107600342
The end result is having someone's shitty github project featured in the OP but by proxy also means endlessly shitting up the threads with pointless beef over his competition
I don't care either but they're making me care.
>>
>>107600377
I find it fruitless to ponder such questions
>>
>>107600369
> and we can't stop the spammer from making duplicate threads
which number is smaller
107597478 or
107596792
???

>>107600374
consider doing kys to yourself
>>
>>107600400
He's made tons of duplicate threads with higher numbers in the past couple of weeks. You want to stop him? Make him understand spamming and samefagging is fruitless. Concern troll.
>>
>>107600400
Considering that other thread was made very early in an attempt to do a bit of hijacking
Yeah kill yourself
>>
>>107600406
if he is bad and spamming why do you behave the same?

> Concern troll.
you are schizo just like >>107598443

>>107600423
> other thread was made very early
can you prove that
>>
>>107600449
(You)
>>
>>107600441
>don't bother
We are in agreement.
>>
>>107599904
>>107599967
aaand it happened again
>>
File: z_00091_.png (1.13 MB, 1024x1024)
>>
what rank/steps do people use for chroma (person) lora training?
>>
>>107600517
>steps
Just keep going until it looks right or has failed (outputs start to degrade despite not having learned the desired theme).
>rank
I don't know
>>
I fell in love with zit's sameface.
>>
File: file.png (1.13 MB, 864x1152)
>>
File: file.png (961 KB, 1280x720)
>>
File: z_00546_.png (1.21 MB, 1024x1024)
>>
Would you buy a 5090 over other options? I got a used 4090 because of the 5090 fire stories, not sure how much of a concern they should really be.
>>
>>107601011
If I had the money I would buy a 5090 and run it power limited <400W.
>not sure how much of a concern they should really be.
Not very high but still a lot higher than they should be
>>
>>
>>107601011
a used 4090 is the same price as a msrp 5090 so...
>>
File: ComfyUI_03859_.png (1.2 MB, 784x1440)
>>
>>107601331
oh fuck you
>>
>>107597525
Scail is really good but it's limited to kijai for now with a shitty pose estimator
>>
Trained a DALL-E 3 style lora at each of ranks 8, 32 and 128, same dataset, same other settings. 100 1024p images.

Verdict: If you actually care about making the best LoRA possible, pick the biggest rank possible that you think you or other people will want to or be able to load in vram. There shouldn't really be a limit. Preferably also don't downsample your images, especially for z image turbo, where I think that destroys texture details, so I trained all of them on only 1024p resolution in ai-toolkit.

32 rank is ok, but the jump from 32 to 128 is similar to the jump from 8 to 32.
At 128, the model finally is able to really learn the small things about your dataset, the concepts/items seen only on one or two images. Basically, the model just gets it. It generalizes everything much better and generated images feel much more natural. They feel less like a model that is just kinda trying to mimic the style by focusing on its main points, and more like one that understands its broader scope.

The composition of images is better, the poses are more dynamic, the eyes are more focused on things that make sense to focus on. The image just feels more correct and natural.

I would say that the only negative, aside from file size, is that larger rank loras overfit somewhat slower to where you want since they are learning wider things.
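
To put the file size tradeoff in numbers, here is a back-of-the-envelope sketch. It assumes the standard LoRA layout (an `A` matrix of shape rank x d_in and a `B` matrix of shape d_out x rank per adapted layer). The 3072 width and the 100 adapted layers are placeholder values, not the real dimensions of Z Image Turbo or any other model.

```python
def lora_params(d_in, d_out, rank):
    """Trainable parameters of one LoRA pair: A is (rank x d_in), B is (d_out x rank)."""
    return rank * (d_in + d_out)

def lora_size_mib(d_in, d_out, rank, n_layers, bytes_per_param=2):
    """Approximate size in MiB for n_layers adapted layers (2 bytes per param = fp16/bf16)."""
    total = lora_params(d_in, d_out, rank) * n_layers
    return total * bytes_per_param / (1024 ** 2)

if __name__ == "__main__":
    # Placeholder dimensions, only meant to show how size scales with rank.
    for rank in (8, 32, 128):
        print(rank, lora_size_mib(3072, 3072, rank, n_layers=100))
```

Parameter count grows linearly with rank, so under these assumptions rank 128 is 16 times the size of rank 8 whatever the layer dimensions are.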
>>
>>107601575 (cont)
Misc suggestions, aside from the default settings in ai-toolkit https://www.youtube.com/watch?v=Kmve1_jiDpQ:

Differential guidance in ai-toolkit also seems a bit weird to me, hard to say if it's better, I think without it training is more stable and focused and can converge better overall
Always train with the unquanted text encoder, because it is removed from vram as soon as the initial encoding finishes if you have "Cache Text Embeddings" enabled
At least for ZIT, keep training for any number of steps and only stop after your sampling images have been broken for like 1k+ steps in a row and are going completely crazy, since the model can break itself near the start but recover later. I assume that might be because it's still somewhat distilled even with the ostris dedistillation adapter
Again, don't downsample your images when training ZIT, only select the highest resolution that your dataset is at to keep more texture details
I disable sampling in ai-toolkit completely since it has different generation settings than comfyui, giving you a false view of things. Instead train a model to a point and test it in comfyui yourself.
Also try testing your lora in comfyui around strength 0.7 instead of just 1. I'm not sure if there is a bug or not, but ~0.75 strength seems like what should be strength 1 no matter what settings I use or how many steps the checkpoint I'm testing has (although this may be a problem with my specific lora dataset etc, but I doubt it)
Don't increase learning rate for ZIT

Other general advice everyone should already know is that dataset is the biggest factor, just remove bad images, low resolution images etc. Tag them in simpler tag captions if you want more seed variety in your lora
Maybe you can train the first iteration of your lora on lower resolution first to see if the model overfits too much on a specific concept you don't want in your lora so you can caption it in your dataset but thats it
>>
>>107601575

I feel like it definitely comes down to how many images you have as well. Mine so far have been trained with 10-15 HQ images with different angles and I think it was all usual AI-Toolkit settings and done at like 2K steps using a 32 rank.
>>
>>107601575
Thanks for sharing your results anon, I have a questionable quality style dataset (It's like 45 images but has a few that aren't quite the same style as the others) that I failed to train ZiT on effectively, I'm going to try with 128 rank now. For Illustrious it seemed like high rank isn't really necessary to train styles. I'm also going to try not downsampling. I'll remove the bad images if this isn't enough.
>>
>>107600158

Love this!
>>
>>107601647
You can get a good lora while gimping a lot of settings, train at 512 instead of 1024+, at fp4-8 instead of unquanted, with differential guidance on, for low steps like 3k or less, at rank 16 or less, with improperly tagged images and you will still get a good lora with ZIT, but the point is without gimping any of these, the lora will still be much better, and you can only see it when not gimping these settings and being able to compare.
>>
>>107601637 (cont)
Also obviously train on the unquanted model or highest quant that can fit into your gpu, and for bigger models like Qwen Image or future big models that can't fit, if you care about your lora train at least at fp4 with tensor offloading in ai-toolkit, although personally i would rather wait a few days and train at even higher quants with more offloading.

When testing in comfyui or future end-user uis generate at least 4-8 different seeds for each different prompt, have 10+ testing prompts that also include more niche things from your dataset and generally use prompts that are quite different from each other. Include a couple of basic very mundane prompts too in order to see if your lora destroys the models general knowledge of other things, i like to include just the trigger word or see the gens of empty prompts too.
Test at different resolutions to see how the model generalized.
Test every 500th step checkpoint with these 10+ prompts x 4-8 images for each checkpoint, then compare them all, then pick the best candidates and add +-250 step checkpoints around them to the testing mix, then test on more prompts or with different seeds to pick the winner.
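
The testing routine above could be planned with something like the sketch below. It assumes checkpoints were saved at least every 250 steps, and all function names are made up for illustration. It only builds the list of (checkpoint, prompt, seed) jobs and does not talk to comfyui or ai-toolkit.

```python
from itertools import product

def coarse_checkpoints(total_steps, every=500):
    """First pass: every 500th step checkpoint up to total_steps."""
    return list(range(every, total_steps + 1, every))

def refine_checkpoints(candidates, offset=250, total_steps=None):
    """Second pass: the best candidates plus their +-offset neighbours."""
    steps = set()
    for step in candidates:
        for s in (step - offset, step, step + offset):
            if s > 0 and (total_steps is None or s <= total_steps):
                steps.add(s)
    return sorted(steps)

def build_jobs(checkpoints, prompts, seeds):
    """Every checkpoint x prompt x seed combination to generate and compare."""
    return list(product(checkpoints, prompts, seeds))
```

With 10 prompts and 4 seeds, each checkpoint costs 40 gens, so the coarse pass over a 5000 step run would be 400 images before the refinement pass.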
>>
>>107601647
prompt?
>>
just ignore the above
>>
>>107601810


photograph of a woman with green eyes, black mascara and eyeshadow, straightened light brown hair and matte dark red lipstick. She is smiling as A boudoir-style photo of her from behind kneeling on a bed covered with a dark red velvet spread. She is looking directly at the camera. She is wearing red lace bra and thong. The background is slightly out of focus, showing pillows and a decorated christmas tree. Her nails are painted red as well. She is wearing a santa hat.


You might not get the exact results because I'm using a custom lora in the WF.
>>
no amount of samefagging and gaslighting will stop me from posting in the official thread with the debo and trani rentries. get a life
>>
File: 1861906852.png (962 KB, 832x1216)
>>
File: ComfyUI_00034_.webm (3.71 MB, 960x720)
not sure what the motivation behind this was, to tell the truth
>>
>>107599723
>i just want a good voice model to generate perfect joi wavs in my native language
I am once again reminding you anons that my prediction is going to be correct and the best local audio model is going to come in the form of a video model with audio (and if you want audio only you'll just generate videos at a tiny throwaway resolution)
Pretty impossible to refute that the JOI/ASMR/fart/other sound fetishes aren't going to be fully solved when we know that this was already solved with Sora 2 audio+video
>>
Reee, why are these not ordered in the order you select them and group?
>>
>>107601575
I went from rank 32 to 64, and put mixed resolution. Test run done soon.

>>107602032
Deanna Troi's mother doing some Michael Jackson dance?
>>
>>107601840
nice. can you share the lora for this pose?
>>
>>107601575
>>107601637
>>107601808
these posts are useless if you don't provide any results examples anon, jeet reddit tier post

>>107601647
This anon just proved you wrong with a single image
>>
>>107602032
is wan animate finally stable? any good workflow?
>>
>>107602073
Holy shit I did it.
>>
>>107602205
not sure what you did to be desu
>>
>>107602221
With a bunch of integers and set/get nodes, i grouped up all the important shit you need to change into Group Nodes.
>>
what's the goto for images with openpose
flux2 supports a lot of shit but it doesn't support that, does it?
>>
What local llm models can I use that doesnt suck at vibecoding? I ran outta credits :(
>>
Fuck, I'm so close.
Why am I getting these errors when using the sampler/scheduler selector nodes and have unipc/simple selected? Why error out for the shit I'm not using?
>>
just published 3 new nodes!!!! :DDD
>>
>>107602476
cool anon, please share what do they do?
>>
>>107602481
uhmm no i'll get doxed :S!! I'm very happy with the results tho!!!! hooray for me!!!!!!!!!
>>
>>107602492
are you seriously publishing AI-related content under your real name? you do you i guess
>>
>z image default template on comfyui bf16 model
>takes 18ish seconds for each picture

Can I get this even lower without affecting quality too much?
>>
can i use images to create a video lora for wan? i want videos of something that doesnt exist, but i can gen images for it for training data
>>
>>107602525
just gen the images and then i2v instead?
>>
>>107602525
you can. how well it works will be up to the base model and some luck.

>>107602385
Qwen Image Edit with
https://huggingface.co/InstantX/Qwen-Image-ControlNet-Union

I think some versions of flux2 also had control over poses but I think it wasn't with openpose but just a reference image?
>>
>>107602544
it alters it too much when turning and stuff
>>
>>107602426
>5000 errors
>I'M SO CLOSE
>>
File: 1757726501926303.png (748 KB, 1058x934)
Peak just dropped
https://x.com/chengzeyi/status/2002037174531981722
>>
>>107602426
change scheduler to karras
>>
>>107602601
God damn that female mountaineer has bushy beard. Perhaps Q4 quant is too much
>>
>>107602426
I have fixed this exact error before.
You need to go into the code of the node that's feeding a scheduler value to your KSamplerAdvanced node. And find the definition for the enum type that's the list of all scheduler names. And add the missing one(s). In this case beta57.
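
In spirit the fix looks something like the sketch below. Everything here is hypothetical: the real custom node will have its own variable name and its own list, so this only illustrates "append the names the selector node doesn't know about" and is not the actual ComfyUI or custom node code.

```python
# Hypothetical hardcoded list inside a custom scheduler selector node.
NODE_SCHEDULERS = ["simple", "karras"]

def add_missing_schedulers(node_list, core_list):
    """Return node_list plus any names from core_list it lacks, keeping order."""
    missing = [name for name in core_list if name not in node_list]
    return node_list + missing

# Example: the sampler side accepts beta57 but the selector node never heard of it.
patched = add_missing_schedulers(NODE_SCHEDULERS, ["simple", "karras", "beta57"])
```

If the node can import the list from the core package instead of hardcoding it, that would avoid the list going stale again, but whether that is possible depends on the node.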
>>
>>107602601
>to acceleration
holy esl
>>
>>107602601
This is literally my idea lol
>>
File: 1743528900776540.png (311 KB, 1652x1246)
>>107602601
cute chinaman grammar, checked his github and it's made by the same guy who made stable-fast, seems promising
>>
>>107602739
I'll take anything that replaces torch compile
>>
>>107602601
Based
>>
OH SHIT
https://huggingface.co/Qwen/Qwen-Image-Layered
>>
File: 1736154916372072.png (746 B, 61x47)
>>107602861
uh oh
>>
>>107602861
bro wheres QIE 2511? its 2512 already BRO WHRE IS IT?!?
>>
>>107602861
How is it with lora compatibility?
>>
>>107602861
I am skeptical, because I'm a cynical dick
does it actually separate layers or does it just draw new ones that are mostly the same but not really
>>
>>107602861
Has anyone got any cool use case ideas for this?
The concept is really exciting, but I really don't know what I want to do with it.
>>
>>107602924
take a wild guess
>>
>>107602995
bro why so asshurt? it's using a VAE, so it's never gonna be 1:1
>>
>>107602861
I want to try this but also fear updating my comfyui will break everything
also heard it's 6x slower than normal edit
>>
>>107602928
its just a more advanced and generic adetailer or background remover. it would be really convenient to have in your workflow if you are often photoshopping things together to create your scene
>>
>>107603073
>also heard it's 6x slower than normal edit
you have to be kidding me...
>>
File: file.png (69 KB, 857x323)
what is this lora kijai just uploaded?
>>
>>107603209
probably the re-distill or de-distill lora, but in his autistic interpretation
>>
File: WanVideo2_2_I2V_00548.webm (2.02 MB, 1248x704)
>>107597620
>>
File: ZJValLoraComparison.jpg (1.89 MB, 2688x1728)
Used Justina Valentine as a test subject just to verify Z-Image was actually capable of doing unique looking white women without any of the weird distinct Chinese idea of what white women look like (present in every Chinese T2I model ever made to some extent) creeping in. I'd say it passed pretty nicely.

Attached pic is the exact same seed / prompt with only the sampler and scheduler changing, just to make the point that changing those can make quite a big difference on Z but it's somewhat subjective which is best (can depend a lot on the subject matter, etc)

Direct Lora DL link if anyone wants it (it's file.io, I guess Limewire the company still exists and owns them or something though lol)

https://limewire.com/d/LfHMZ#aRwUUjkbYk
>>
File: 1753981021417599.png (577 KB, 1702x668)
>>107602861
>https://huggingface.co/Qwen/Qwen-Image-Layered
lemao
>>
>>107602861
>each layer can be independently manipulated without affecting other content.
sweet
>Meanwhile, such a layered representation naturally supports high-fidelity elementary operations-such as resizing, reposition, and recoloring.
Is that all it can do? Or does it have similar capabilities to Qwen Image Edit?
>>
>>107603351

Catbox for the image pretty please?
>>
What the fuck did they do with comfy ui? Why remove functionality
>>
>>107601575
>>107601637
>>107601808
DALL-E bro.... why did you not attach any example comparisons..... DALL-E bro.....
>>
>>107603523
comfyui is compromised
>>
>>107603281
>in his autistic interpretation
More like in the format that actually works with ComfyUI, since the people who trained that lora invented a new format that doesn't work with comfy OR diffusers.
inb4
>why do they reupload, stolen valor, botting downloads, dragged and shot, etc
>>
>>107603523
>>107603566
anon why are you using ran's thread? stop placating him. go to the real thread
>>
>>107603351
>limewire
heh
>>
Any linux AMDfags here? How do I fix this error in SwarmUI?
[ComfyUI-0/STDERR] RuntimeError: Found no NVIDIA driver on your system. Please check that you have an NVIDIA GPU and installed a driver from http://www.nvidia.com/Download/index.aspx
[ComfyUI-0/STDERR] [W1219 12:04:50.680322652 AllocatorConfig.cpp:28] Warning: PYTORCH_CUDA_ALLOC_CONF is deprecated, use PYTORCH_ALLOC_CONF instead (function operator())
2025-12-19 12:04:50.881 [Error] Self-Start ComfyUI-0 on port 7821 failed. AutoRestart ignored as this was an initial launch failure.

I already did
pip install --pre torch torchvision torchaudio --index-url https://rocm.nightlies.amd.com/v2/gfx110X-dgpu/

I have a 7900xtx.
>>
>>107603354
>a100
HOLY BROWNPOOR
>>
File: zit wave.jpg (834 KB, 2048x1344)
>>107602739
I checked that guy's wavespeed nodes. On the same seed I get a different result, but the artifacts are gone. I'll try the caching methods.
>>
>>107602647
I'll give that a try. I abandoned the idea since I rarely change these for video, but it will be useful for image.
>>
>>107603702
I SWAPPED THE LABELS AGAIN. The left one is Wavespeed
>>
>>107603714
this is a troll, please ignore. the bottom one is wavespeed.
>>
>bottom
at least try
>>
>>107603714
meh, why not just use cachedit?
>>
>ApplyFBCacheOnModel
>No double blocks found for NextDiT
gah
>>
File: AnimateDiff_00001.mp4 (2.86 MB, 720x1280)
>worked on workflow for hours
>making use of set/get nodes and Grouped Nodes
>it all just werks
>reboot comfyui
>set/get nodes inside the grouped nodes are now not working unless it's all ungrouped back into nodes

Raging, but the broken result I got was kinda cool.
>>
W1219 18:35:48.223000 7724 Lib\site-packages\torch\_dynamo\convert_frame.py:1358] [4/8] function: 'execute' (C:\Users\Gaming\Documents\ComfyUI\ComfyUI\comfy\patcher_extension.py:107)
W1219 18:35:48.223000 7724 Lib\site-packages\torch\_dynamo\convert_frame.py:1358] [4/8] last reason: 4/0: Cache line invalidated because L['self'].original got deallocated
W1219 18:35:48.223000 7724 Lib\site-packages\torch\_dynamo\convert_frame.py:1358] [4/8] To log all recompilation reasons, use TORCH_LOGS="recompiles".
W1219 18:35:48.223000 7724 Lib\site-packages\torch\_dynamo\convert_frame.py:1358] [4/8] To diagnose recompilation issues, see https://pytorch.org/docs/main/torch.compiler_troubleshooting.html
Anyone else's torch fucking up?
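
The warning itself names the next step: set `TORCH_LOGS="recompiles"` so every recompilation reason gets logged. A sketch of doing that from a small launcher script; the `main.py` entry point is an assumption about how ComfyUI is started on that machine, and `build_env` and `launch` are made-up helpers.

```python
import os
import subprocess
import sys


def build_env(base=None):
    """Return a copy of the environment with recompile logging enabled."""
    env = dict(os.environ if base is None else base)
    # Variable name and value are taken from the warning text above.
    env["TORCH_LOGS"] = "recompiles"
    return env


def launch(script="main.py"):
    # Assumed entry point; adjust to however ComfyUI is normally launched.
    return subprocess.call([sys.executable, script], env=build_env())
```

Calling `launch()` from the ComfyUI folder should then print the reason for each recompilation instead of only the last one.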
>>
>>107603817
You are a bit too opinionated to give this sort of advice. AI doesn't "suck ass" on Windows. It does suck for brainlets like you.
>>
File: 1744036842442520.png (11 KB, 462x125)
>>107603775
>>107603786
>>
>>107603831
Stop using group nodes, nothing supports them anymore. Update Comfy. Use subgraphs. Keep wireless nodes (set/get, use everywhere) outside of subgraphs/group nodes.
>>
>>107603933
I read that the set/get nodes inside the group were buggy, and having them outside works for me now. I just hide them behind the node.

Subgraphs don't let you have values visible in the node, you're forced to go inside the subgraph.
>>
>>107603945
>Subgraphs don't let you have values visible in the node, you're forced to go inside the subgraph.
If I understand correctly what you're trying to do, just connect the value to a subgraph input. Alternatively, right click on the value widget, and then click Expose Widget (or something).
>>
File: ZiMG_01480_.jpg (470 KB, 1344x1728)
>>
File: 1748480195424144.jpg (2.05 MB, 2112x1152)
>>
>>107603951
Oh wow, alright that works too. But it seems you're unable to rename the exposed widgets, all named 'value'. Guess it's time to google. Thanks.
>>
>>107604022
Yes, you are able to rename them.
>>
File: 1743009240581647.jpg (36 KB, 258x330)
where the FUCK is zimage-base and edit???
>>
i could buy myself a 5090 for christmas. but will faster gens fill the hole in my heart?
>>
>>107604046
no
>>
>>107604029
Geez, rightclicking the tiny input slot and then selecting rename..

Thanks, open source.
>>
what would happen if i only had training data of a single person for a video lora, the videos are high quality though. would it cause wan to morph the face and other features to look like the person in the training data?
>>
File: zzzz_00052_.png (1.16 MB, 800x1200)
>>107604049
>>
>>107603447
For both versions, all of this applies:

Seed: 968966137502078
Prompt: "a close-up digital photograph of a sultry Caucasian Justina Valentine woman holding a microphone while hosting a television show. She has red hair and blue eyes and she is wearing heavy makeup."
896x1152 base gen @ 9 steps / CFG 1 -> 1.5x upscale using DATx4 as the upscale model -> second denoise pass on the upscaled image, again @ 9 steps / CFG 1 but with 0.4 denoise strength.

This is just with two KSamplers that have the same everything settings-wise (including seed) but different denoise, and the upscale node in between them.

Lora model strength with LoraLoaderModelOnly (since Z Loras don't have a CLIP component at all) was 0.8 both times.
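
For reference, the numbers above written out as a sketch. The dictionary keys and `upscaled_size` are made-up names for illustration, not ComfyUI node fields; only the values come from the post, and full denoise on the base pass is an assumption.

```python
# Settings from the post, collected in one place (key names are illustrative).
# denoise=1.0 on the base pass is assumed; the post only gives the second pass.
BASE_PASS = {"width": 896, "height": 1152, "steps": 9, "cfg": 1.0, "denoise": 1.0}
SECOND_PASS = {"steps": 9, "cfg": 1.0, "denoise": 0.4}
SEED = 968966137502078
UPSCALE_FACTOR = 1.5
LORA_STRENGTH = 0.8


def upscaled_size(width, height, factor):
    """Resolution the second denoise pass runs at."""
    return round(width * factor), round(height * factor)


print(upscaled_size(BASE_PASS["width"], BASE_PASS["height"], UPSCALE_FACTOR))
# -> (1344, 1728)
```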
>>
cozy breas
>>
comfy thread
>>
>>107604046
yes
>>
>>107603954
wow very cool anon
>>
>>107604149
>(since Z Loras don't have a CLIP component at all)
?
>>
>>107604046
nothing will, just buy the 5090 and gen 1girl, standing until you perish
>>
1girl, standing
*clicks coom*
>>
1girl, large breasts, standing
is more my style
>>
Remember when anon said textgen on images was a mememark
>>
>>107602696
Far from the first time a jeet has taken ideas or code from this thread. At least this time it's not paywalled.
>>
File: 1760039534447039.png (310 KB, 512x512)
>>107603817
>>107603833
Thanks, it seems to be working after following this
https://rocm.docs.amd.com/projects/radeon-ryzen/en/latest/docs/advanced/advancedrad/windows/comfyui/installcomfyui.html
Dunno why SwarmUI doesn't work. I have no idea what the fuck I'm doing though. Gonna go watch some tutorials.
>>
>rocm
>>
>>107604260
>(huge penis, flaccid, foreskin, excessive pubic hair:1.3)
>>
reminder there is still a thread up after this
>>107596792
>>107596792
>>107596792
>>
hmm nyo
>>
hmmm, nyo
>>
>>107604371
>>107604379
is there a particular reason you wouldn't fill an /ldg/ thread that's on-topic?
>>
>>107604356
What's wrong with it?
>>
Hey So basically I'm just gonna not use it I Know..... UGH I know... It's just that I'm not gonna take it is all
>>
>>107604399
It's missing the reminders that we hate attention whores and avatarniggers
>>
File: ZImageTurbo_Output_161631.png (2.52 MB, 1344x1728)
>>107604249
It's not a thing unless the text encoder is being trained and the text encoder in question is also specifically CLIP and not anything else. Meaning it's not a thing on anything past SDXL and SD 1.5 really.

Like AFAIK even if you did train Qwen somehow along with Z's model part while doing a Lora, there'd be no way of properly saving the text encoder aspect to a normal Comfy-compatible Lora safetensors file, I don't think the format has support for non-CLIP text encoders at all.
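
A rough way to check whether a given Lora file carries any text encoder weights is to look at its tensor names. The sketch below assumes kohya-style naming, where text encoder keys start with `lora_te` and UNet keys with `lora_unet_`; other trainers may use different prefixes, so treat the default as a guess. `split_lora_keys` and the example key names are made up for illustration.

```python
def split_lora_keys(keys, te_prefixes=("lora_te",)):
    """Split LoRA tensor names into text-encoder keys and everything else."""
    prefixes = tuple(te_prefixes)
    te_keys = [k for k in keys if k.startswith(prefixes)]
    other_keys = [k for k in keys if not k.startswith(prefixes)]
    return te_keys, other_keys


# Illustrative key names only, not taken from a real file.
example_keys = [
    "lora_unet_down_blocks_0_attentions_0_proj_in.lora_up.weight",
    "lora_te_text_model_encoder_layers_0_mlp_fc1.lora_down.weight",
]
te, other = split_lora_keys(example_keys)
print(len(te), "text encoder keys,", len(other), "other keys")
```

With the `safetensors` package installed, the key list for a real file can be read with `safe_open(path, framework="pt")` and its `keys()` method.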
>>
>>107604411
oh ok that's fair. you do you, this is 4chan
>>
posted up the lora on civit for the anon that was asking
>>
>>107604422
uhhh
>>
>>107604422
She needs to be fatter and way older to attract the Indians.
>>
>>107604422
left < right
>>
>>107604411
>>107604418
but you are one and this falls under off-topic

>>107604446
kek
>>
>>107604422
what is this Lora exactly, like what's the comparison here lol?
>>
Fresh when ready

>>107604458
>>107604458
>>107604458
>>
>>107604413
you can merge it into the model and publish that, although not many would want to download it
>>
>>107604461
thanks baker, i'll move over in a bit, just gonna chat up this thread for five more minutes
>>
note to self: post a brown woman for engagement

>>107604450
blackberry photo style
>>
File: file.png (2.47 MB, 1024x1024)
>>
>>107604461

>>107596792
>>107596792
>>107596792
>>
>>107604413
That makes sense, I'm sort of a retard and the way OneTrainer presents everything (or perhaps my lack of reading) makes it seem like you are in fact training the Qwen encoder. But what you said makes more sense.
>>
>>107604486
Fake thread
>>
File: 1731914810442280.jpg (61 KB, 744x475)
>>107604422
>>107604446
>>
File: file.png (1.69 MB, 1024x1024)
>>
>>107604472 (me)
alright i'm ready to move over to
>>107604458
>>107604458
>>107604458
>>
>>107604422
based
>>107604502
kek
>>
>>107604511
stop instigating flamewars
>>107596792
>>107596792
>>107596792
>>
File: file.png (1.74 MB, 1024x1024)
>>
>>107604473
this lora is poisoned
>>
>>107596792
>>107596792
>>107596792
>>
>>107598648
prompt/model for picrel?
>>
So I'm fairly new to this and it feels like my prompt is getting a bit long for what I'm trying to achieve..

I'm basically describing every part of the image (the subject's hair, clothes, body, image style, etc.) and I'm at around 230 words right now. Is there a better way of doing this? Should I be doing multiple prompts? Is that even a thing? What I mean is, do I first generate the person, then generate the clothes, and so on?
>>
>>107603678
he's just renting it on runpod or something
>>
>>107603845
nope, sorry
>>
How come the wan2.2 rentry isn't linked in the op? Are newfags meant to use that wan2gp thing now or something else?


