/g/ - Technology

File: 1749766932296056.jpg (1.89 MB, 2954x2552)
Discussion of Free and Open Source Text-to-Image/Video Models

Prev: >>107687569 (Cross-thread)

https://rentry.org/ldg-lazy-getting-started-guide

>UI
ComfyUI: https://github.com/comfyanonymous/ComfyUI
SwarmUI: https://github.com/mcmonkeyprojects/SwarmUI
re/Forge/Classic/Neo: https://rentry.org/ldg-lazy-getting-started-guide#reforgeclassicneo
SD.Next: https://github.com/vladmandic/sdnext
Wan2GP: https://github.com/deepbeepmeep/Wan2GP

>Checkpoints, LoRAs, Upscalers, & Workflows
https://civitai.com
https://civitaiarchive.com/
https://openmodeldb.info
https://openart.ai/workflows

>Tuning
https://github.com/spacepxl/demystifying-sd-finetuning
https://github.com/ostris/ai-toolkit
https://github.com/Nerogar/OneTrainer
https://github.com/kohya-ss/musubi-tuner
https://github.com/kohya-ss/sd-scripts
https://github.com/tdrussell/diffusion-pipe

>Z Image Turbo
https://huggingface.co/Tongyi-MAI/Z-Image-Turbo

>WanX
https://github.com/Wan-Video/Wan2.2
https://comfyanonymous.github.io/ComfyUI_examples/wan22/

>NetaYume
https://civitai.com/models/1790792?modelVersionId=2485296
https://nieta-art.feishu.cn/wiki/RY3GwpT59icIQlkWXEfcCqIMnQd

>Chroma
https://huggingface.co/lodestones/Chroma1-Base
https://rentry.org/mvu52t46

>Illustrious
https://rentry.org/comfyui_guide_1girl
https://tagexplorer.github.io/

>Misc
Local Model Meta: https://rentry.org/localmodelsmeta
Share Metadata: https://catbox.moe | https://litterbox.catbox.moe/
GPU Benchmarks: https://chimolog.co/bto-gpu-stable-diffusion-specs/
Img2Prompt: https://huggingface.co/spaces/fancyfeast/joy-caption-beta-one
Txt2Img Plugin: https://github.com/Acly/krita-ai-diffusion
Archive: https://rentry.org/sdg-link
Bakery: https://rentry.org/ldgcollage

>Neighbors
>>>/aco/csdg
>>>/b/degen
>>>/r/realistic+parody
>>>/gif/vdg
>>>/d/ddg
>>>/e/edg
>>>/h/hdg
>>>/trash/slop
>>>/vt/vtai
>>>/u/udg

>Local Text
>>>/g/lmg
>>
>>107693072
third time's the charm kej
>>
blessed thread of frenship
>>
>>107693072
Why is AniStudio not in OP?
>>
File: collage.jpg (3 MB, 4607x3095)
>>
Are there any good detailer loras for z-image?
>>
>>107693142
>xbox one
>>
cozy bread
>>
>>107693354
No one knows, yes still king, and very well
>>
>>107693354
>What happens to Z image base
its inference code PR got merged 4 days ago; for the new version of Qwen Image Edit, the PR got merged 2 weeks before they released the model, so make of that what you will
https://github.com/huggingface/diffusers/pull/12857
>>
File: 1741413748506572.jpg (804 KB, 1920x1080)
>>
File: 1741752823950154.jpg (349 KB, 1344x1240)
>>
File: 1759027773787847.jpg (366 KB, 848x1280)
>>107693479
not interested in your schizophrenic drama. just posting zit gens.
>>
File: 1738071918997894.png (92 KB, 666x307)
lol
lmao
>>
>>107693469
cute migu!
>>
File: file.jpg (348 KB, 1344x1056)
>>
i tried all the z-image lora training toolboxes to get training running on my shitty 8gb card, and only this one
https://github.com/shootthesound/comfyUI-Realtime-Lora
makes it possible for me to train at ~4gb vram with good results. what did he do differently?
I tried to replicate the config with onetrainer but i always end up with higher vram use and massively worse results.
ai-toolkit was the worst
>>
what's the difference between kijai's wan2.2 unets and the regular ones?
>>
I vaguely remember BFL announcing a video model, or at least hinting at it.
Am I misremembering, or are they actually working on something?
>>
>>107693691
they did at some point but i think they gave up after the chinese models like Wan got released
>>
anyone have benchmarks for sdcpp and how it compares to comfy?
>>
>>107693072
Based.
>>
>>107693469
great gen
>>
Do I retrain a previous lora (because last time it wasn't nearly enough steps), or do I train on a fresh new exciting dataset, hmm...
>>
File: 1756415429576962.png (888 KB, 2432x1664)
>>
>>107693918
>it wasnt nearly enough steps)
just continue from the last epoch
>>
File: rin.jpg (491 KB, 958x1400)
>>
>>107693962
catbox?
>>
File: ZiMG_01652_.png (2.75 MB, 1344x1728)
>>
Am I just retarded, or how are you supposed to prompt these things?
I've tried using qwen 2511 and flux 2 dev, and neither seems capable of this edit. Also tried inpainting on qwen and couldn't get it to do anything to the hair clip at all.
>>
File: ZiMG_01663_.png (3.2 MB, 1344x1728)
>>107694039
>>
>>107693115
agreed, I only use AniStudio because the dev understands what he's doing and it doesn't use shitty python
>>
>>107694139
>conan isn't using python
>>
File: zimg_0179.png (1.51 MB, 848x1280)
>>107693551
not for nothing anon you can read the documentation yourself and see that this uses musubi tuner under the hood; you can probably just run musubi tuner with these settings:
https://github.com/shootthesound/comfyUI-Realtime-Lora/blob/main/musubi_zimage_config_template.py
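if you want something to compare against your onetrainer config, here's a rough sketch of the kind of low-VRAM knobs a template like that typically sets. the names and values below are illustrative assumptions, not copied from the linked file, so check it for the real settings:
[code]
# illustrative low-VRAM training settings, NOT the actual template values;
# see musubi_zimage_config_template.py in the repo for what it really passes.
low_vram_overrides = {
    "blocks_to_swap": 20,              # offload this many transformer blocks to system RAM
    "gradient_checkpointing": True,    # recompute activations instead of storing them
    "mixed_precision": "bf16",
    "network_dim": 16,                 # small LoRA rank keeps optimizer state tiny
    "gradient_accumulation_steps": 4,  # trade wall-clock time for effective batch size
}
[/code]
diffing settings like these against what the node actually generates is the fastest way to find out what it does differently.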
>>
File: ZiMG_01676_.png (2.83 MB, 1344x1728)
>>107694090
>>
File: 1737139411746294.jpg (1017 KB, 1536x1536)
>>
>>107694204
yeah I also think musubi might be the key difference here. honestly i only tried to replicate the settings with the musubi gui which is lacking quite a lot of options.
i think it might be the "blocks_to_swap" option, haven't seen that anywhere else so far.
>>
File: 1761242797327290.jpg (1.39 MB, 1248x1824)
>>
File: dmmg_00029.png (1.66 MB, 960x1280)
>>107694281
that could very well be it, as that would be directly tied to how many blocks to swap out of vram into your machine ram. generally block swapping will prevent you from going OOM, but it slows down training quite a bit.
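for anyone who hasn't seen it before, a toy sketch of what block swapping boils down to (not musubi tuner's actual implementation): park most of the transformer blocks in system RAM and only move each one onto the GPU while it runs.
[code]
import torch
import torch.nn as nn

# toy illustration of block swapping, not the real trainer code:
# keep a few blocks resident on the GPU, shuttle the rest in and out per step.
class SwappedStack(nn.Module):
    def __init__(self, blocks: nn.ModuleList, keep_on_gpu: int = 4):
        super().__init__()
        self.blocks = blocks
        self.keep_on_gpu = keep_on_gpu
        for i, blk in enumerate(self.blocks):
            blk.to("cuda" if i < keep_on_gpu else "cpu")

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        for i, blk in enumerate(self.blocks):
            if i >= self.keep_on_gpu:
                blk.to("cuda")   # PCIe transfer: this is where the slowdown comes from
            x = blk(x)
            if i >= self.keep_on_gpu:
                blk.to("cpu")    # free the VRAM again before the next block
        return x
[/code]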
>>
>>107694368
yeah it's slow alright. it takes hours to finish.
but at least it runs and the results are solid...
>>
File: ComfyUI_00682_.png (1.7 MB, 1480x1128)
>>
why can't comfy save normal fucking FP8 now? Everything is scaled without asking. Torch 2.9 can't compile fp8 on a 3090 if there are scaled tensors, even if you patch triton. Torch 2.7 worked.
It's like there is a silent cabal of faggots who conspire to break older GPU workarounds and then ignore you when you ask about it.
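if you just want plain fp8 without the scales, something along these lines should work outside comfy. the ".scale_weight" key suffix and the weight * scale dequant convention are assumptions about how the scaled checkpoints are laid out, so check your file's keys before trusting the output:
[code]
# sketch: fold per-tensor scales back into the weights and save plain float8_e4m3fn.
# assumes torch >= 2.1 and a safetensors checkpoint; filenames are hypothetical.
import torch
from safetensors.torch import load_file, save_file

sd = load_file("model_fp8_scaled.safetensors")
plain = {}
for k, v in sd.items():
    if k.endswith(".scale_weight"):
        continue  # handled together with its weight below
    scale_key = k.rsplit(".", 1)[0] + ".scale_weight"
    if v.dtype == torch.float8_e4m3fn and scale_key in sd:
        # recasting after folding the scale in can clip large values;
        # that loss is the price of "normal" unscaled fp8
        v = (v.float() * sd[scale_key].float()).to(torch.float8_e4m3fn)
    plain[k] = v
save_file(plain, "model_fp8_plain.safetensors")
[/code]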
>>
File: ComfyUI_00526_.mp4 (732 KB, 1200x720)
the WAN keyframe template is alright I guess. Still waiting for something like qwen-edit for video
>>
>>107694587
Google has had this since 2024. can't believe open source still has no answer
>>
>>107694039
yes, please, moar. I need to get into the right mindset to abandon the (oldshit) I am experienced with and start fresh with Z-image. harder than I thought.


