/g/ - Technology

Discussion of Free and Open Source Text-to-Image/Video Models and UI

Prev: >>106661715

https://rentry.org/ldg-lazy-getting-started-guide

>UI
ComfyUI: https://github.com/comfyanonymous/ComfyUI
SwarmUI: https://github.com/mcmonkeyprojects/SwarmUI
re/Forge/Classic/Neo: https://rentry.org/ldg-lazy-getting-started-guide#reforgeclassicneo
SD.Next: https://github.com/vladmandic/sdnext
Wan2GP: https://github.com/deepbeepmeep/Wan2GP

>Checkpoints, LoRAs, Upscalers, & Workflows
https://civitai.com
https://civitaiarchive.com/
https://openmodeldb.info
https://openart.ai/workflows

>Tuning
https://github.com/spacepxl/demystifying-sd-finetuning
https://github.com/Nerogar/OneTrainer
https://github.com/kohya-ss/sd-scripts/tree/sd3
https://github.com/derrian-distro/LoRA_Easy_Training_Scripts
https://github.com/tdrussell/diffusion-pipe

>WanX
https://comfyanonymous.github.io/ComfyUI_examples/wan22/
https://github.com/Wan-Video

>Chroma
https://huggingface.co/lodestones/Chroma1-Base
Training: https://rentry.org/mvu52t46

>Neta Lumina
https://huggingface.co/neta-art/Neta-Lumina
https://civitai.com/models/1790792?modelVersionId=2122326
https://neta-lumina-style.tz03.xyz/

>Illustrious
1girl and Beyond: https://rentry.org/comfyui_guide_1girl
Tag Explorer: https://tagexplorer.github.io/

>Misc
Local Model Meta: https://rentry.org/localmodelsmeta
Share Metadata: https://catbox.moe | https://litterbox.catbox.moe/
GPU Benchmarks: https://chimolog.co/bto-gpu-stable-diffusion-specs/
Img2Prompt: https://huggingface.co/spaces/fancyfeast/joy-caption-beta-one
Txt2Img Plugin: https://github.com/Acly/krita-ai-diffusion
Archive: https://rentry.org/sdg-link
Bakery: https://rentry.org/ldgcollage

>Neighbours
>>>/aco/csdg
>>>/b/degen
>>>/b/realistic+parody
>>>/gif/vdg
>>>/d/ddg
>>>/e/edg
>>>/h/hdg
>>>/trash/slop
>>>/vt/vtai
>>>/u/udg

>Local Text
>>>/g/lmg

>Maintain Thread Quality
https://rentry.org/debo
>>
Cookies & Cream
>>
>>106666599
pretty cool collage
>>
File: 1751448759653009.png (194 KB, 2610x1210)
For those who missed it: maybe we'll get something better than Wan
https://byteaigc.github.io/Lynx/
>>
File: WanVideo2_2_I2V_00047.mp4 (543 KB, 480x512)
Thanks for the help with the multiple sampler method to split up the strengths of the lora for each step. Sadly it doesn't seem to work.
I'm trying to fix the color shift with the other wrapper wf.
>>
>>106666683
I downloaded one video and it's 16fps, it's probably a finetune of Wan 2.2
>>
>>106666683
i love how those numbers just do not mean anything at all.
>>
>>106666755
> Built on an open-source Diffusion Transformer (DiT) foundation model
it's 100% wan2.2. but the somewhat interesting part is their ID adapter. that aside i see nothing else of value.
>>
File: 1745929321836140.mp4 (850 KB, 672x480)
>>
>>106666800
what's with the celery
>>
>>106666878
you don't know Hatsune Miku's leek? how young are you? lol
https://www.youtube.com/watch?v=6ZWwqTnqxdk
>>
>>106666890
I don't speak japanese, do you?
>>
>>106666912
>do you?
I do know how to search for the translated lyrics on the internet, yes.
>>
Blessed thread of frenship
>>
>>106666890
>Hatsune Miku's leek
Leekspin was orihime from bleach mikunigger
>>
File: FluxKrea_Output_36262.jpg (3.57 MB, 1664x2496)
>>
>>106666683
>>106666755

While the demos do look pretty good, it seems to still have the same 5-second cap... sigh...
>>
If I've got the hardware, are the fp32 options worth it over fp16 precision in ComfyUI nodes? I can't tell much difference.
>>
>>106666979
https://www.youtube.com/watch?v=ekdKIKfY6Ng
damn I miss that era of YouTube, all the recommendations were as kino as this
>>
File: 1745321222999795.png (2.08 MB, 1024x1552)
cute kote
>>
>>106667005
likely a finetune of Wan so yeah, with all the drawbacks associated with it
>>
>>106667011
isn't bf16 better than fp32?
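To make the precision trade-off concrete: the formats differ mainly in mantissa width, and the gaps near 1.0 are small enough that outputs rarely differ visibly. A stdlib-only sketch (bf16's gap is computed by hand, since Python has no native bfloat16):

```python
import struct

def roundtrip_fp16(x: float) -> float:
    """Round x to the nearest IEEE-754 half-precision value ('e' format)."""
    return struct.unpack('<e', struct.pack('<e', x))[0]

# fp16 has 10 mantissa bits, so the gap just above 1.0 is 2**-10
assert roundtrip_fp16(1.0 + 2**-10) == 1.0 + 2**-10  # representable
assert roundtrip_fp16(1.0 + 2**-11) == 1.0           # rounds back to 1.0

# Gaps above 1.0: fp16 (10 mantissa bits), bf16 (7), fp32 (23).
# bf16 is coarser than fp16 but keeps fp32's exponent range, so it
# doesn't overflow at ~65504 the way fp16 can.
print(2**-10, 2**-7, 2**-23)
```

So fp32 buys roughly 16 extra mantissa bits over bf16 per value, which sampling noise and quantized checkpoints usually swamp — consistent with the "can't tell much difference" observation.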
>>
nunchaku qwen image edit plus when???????????
>>
>>106667003
corpo slop they'd hang in your office to remind you not to think for yourself and to waste your life on the company while they rape you.

10/10 would kms again
>>
Is the state of unpozzed video gen good enough that there's a thread someplace with horrible degenerate images of heterosexual coupling (and not just "female with liquid overflowing")? Where? Is there a place to try my hand at it for free?
>>
File: 1747958829558856.png (534 KB, 1024x1024)
make a plastic anime figure of the cyan hair anime girl on a round pedestal. (qwen edit)
>>
>>106667130
> free
yes, simply buy a 5090 and you can gen it at home for free.
>>
File: 1737544147093371.png (553 KB, 1024x1024)
>>106667152
oops, didn't set upscale on the compare image nodes to lanczos. now it's better:
>>
>>106667153
don't need a 5090 for wan. 12 or 16gb is enough.
>>
File: 1746273254494760.png (2.12 MB, 1500x1500)
>>106667084
>again
what are you a fucking cat?
>>
>>106667189
sure, if you want to wait forever just for 5 seconds.
>>
>>106667203
with a 4080 I can get a clip in like 100-120 seconds with lightx2v loras.
>>
>>106666591
>having good success with the 2.1 lora at 3 strength for high, and 2.2 lora at 1 for low. only 2.2 high seems to affect motion in a bad way
nigga we been known this
>>
File: 1746702751519132.png (978 KB, 1072x968)
>>106667175
with both:
>>
>>106665018
Catbox?
>>
File: 1730104063321332.png (1.01 MB, 1072x968)
put the image on a coffee cup, that is placed on a table in a coffee shop.

the edit models are so neat. and you can use this stuff with wan or whatever. you can inpaint and do stuff like that but it'd be very hard to do this type of stuff without qwen edit/kontext.
>>
>>106667279
Woah that's crazy I had no idea this is entirely new information that no one's mentioned before thank you for sharing anon
>>
>>106667288
yes my bad for discussing diffusion models in the diffusion general.

point remains, these are really good tools because no amount of denoise % or controlnets could do what these can do.
>>
>>106667003
I would use that in a powerpoint.
>>
How many GB vram do you need to train an SDXL LoRA?
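A back-of-envelope sketch, not a measured figure: the frozen SDXL UNet (~2.6B params) sits in fp16 while gradients and optimizer states exist only for the LoRA weights, plus an activation term that depends heavily on resolution, batch size, and gradient checkpointing. All four default numbers below are assumptions for illustration; anecdotally 12 GB is comfortable and ~8 GB can work with aggressive memory saving.

```python
def lora_vram_estimate_gib(base_params_b: float = 2.6,
                           lora_params_m: float = 50.0,
                           weight_bytes: int = 2,
                           activation_overhead_gib: float = 6.0) -> float:
    """Very rough LoRA-training VRAM estimate (GiB).

    Assumptions: frozen base weights in fp16 (2 bytes each), LoRA
    params carry fp16 weights + fp32 grads (4 B) + two fp32 Adam
    states (8 B), and a flat activation/buffer term standing in for
    what actually varies with resolution, batch, and checkpointing.
    """
    base = base_params_b * 1e9 * weight_bytes / 2**30
    lora = lora_params_m * 1e6 * (weight_bytes + 4 + 8) / 2**30
    return base + lora + activation_overhead_gib

print(round(lora_vram_estimate_gib(), 1))  # ~11.5 under these assumptions
```

The takeaway is that the frozen base weights and activations dominate; the LoRA's own training state is under 1 GiB even for a large rank.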
>>
>>106667320
Much like the memes you choose to edit, the discussion you bring is old and stale kek
>>
>>106667250
Can it make a mamama apimiku style mimukawa miku?
>>
File: 1756649195842063.png (1.07 MB, 1136x912)
>>
>>106667394
kek
>>
File: 1752173970525730.png (1.58 MB, 1368x760)
replace the anime girl with Miku Hatsune. Change the text "DOROTHY: SERENDIPITY" to "MIKU: HATSUNE".

nice
>>
>>106667445
kys
>>
>>106667455
no ty sir
>>
>>106667445
pretty good, but hatsune miku isn't the hardest to change, most models know her well
>>
>>106667481
youre expecting anon to do something interesting? he would never
>>
>>106667481
you could also remove the model entirely and then swap them in with photoshop. what's neat about the edit models is they can edit or remove stuff but respect layers. can't do that with inpainting at high denoise levels.
>>
https://huggingface.co/Qwen/Qwen-Image-Edit-2509
it's up
>>
Incredible groundbreaking developments from the miku poster
>>
>>106667506
Enhanced Single-image Consistency: For single-image inputs, Qwen-Image-Edit-2509 significantly improves editing consistency, specifically in the following areas:
Improved Person Editing Consistency: Better preservation of facial identity, supporting various portrait styles and pose transformations;
Improved Product Editing Consistency: Better preservation of product identity, supporting product poster editing;
Improved Text Editing Consistency: In addition to modifying text content, it also supports editing text fonts, colors, and materials

sounds good
>>
>>106667506
>This September, we are pleased to introduce Qwen-Image-Edit-2509
>Multi-image Editing Support: For multi-image inputs, Qwen-Image-Edit-2509 builds upon the Qwen-Image-Edit architecture and is further trained via image concatenation to enable multi-image editing. It supports various combinations such as "person + person," "person + product," and "person + scene." Optimal performance is currently achieved with 1 to 3 input images.
let's fucking go dude, no more image stitching cope anymore
>>
File: 1734482800694463.png (424 KB, 405x720)
>>106667506
thank you chinks, you really are our saviors
>>
>>106667506
any fp8/q8 yet?
>>
>>106667506
https://huggingface.co/spaces/Qwen/Qwen-Image-Edit-2509
https://huggingface.co/spaces/akhaliq/Qwen-Image-Edit-2509
there's demos in here
>>
>>106667537
it's been up less than an hour, unlikely
>>
>>106667506
already?
lmao, qwen is actually speedrunning to agi in 3 years...
>>
>>106667445
>>106667279
>>106667152
Can you try with the updated model >>106667506
>>
>>106667506
>and it's not over
we'll get wan 2.5 in a few days, Alibaba is better than Santa Claus lmao
>>
File: 1728428382115506.png (2.28 MB, 3072x1638)
>>106667546
>https://huggingface.co/spaces/akhaliq/Qwen-Image-Edit-2509
wtf is this shit ;_;
>>
>>106667574
dont have a quant or fp8 download, I assume it has to be converted from this batch of files.
>>
> This September, we are pleased to introduce Qwen-Image-Edit-2509, the monthly iteration of Qwen-Image-Edit.

what? so they are updating it monthly or what?
>>
>>106667601
China does what Scam Altman dont
>>
>>106667506
sadly it does look like just an incremental improvement
>>
>>106667601
>what? so they are updating it monthly or what?
I think they've already done that with LLMs
>>
at what point will you wake up and stop getting excited for chinese slop
>>
>>106667506
NUNCHAKU NEXT TARGET
>>
>>106667611
sure, but is this just a further monthly finetune or the actual "edit plus" they were talking about?
>>
>>106667622
>NUNCHAKU
give up bro lol
>>
>>106667619
slop? wan is better than any western video model and is open source. qwen/qwen edit are free. OpenAI wants you to pay $1000 a month for 5 prompts a day.
>>
>>106667506
if that one doesn't zoom in randomly, all we have to do is SPRO this shit and we'll be back
>>
that's nice and all but let me know when they are brave enough to include aroused genitals in the dataset
>>
File: file.png (15 KB, 450x115)
ok yeah sure
>>
So are the lightning loras supposed to be 2 high 2 low or 4 high 4 low?
>>
>>106667619
never, we can save it!
>>
>>106667611
I still would welcome a leak of dalle 3. It has a weird vibe no other model gets right.
>>
File: 1744949353100508.png (1.28 MB, 1280x720)
>>106667506
>The girl from Image 2 is sunbathing on the lounge chair in Image 1
>The girl from Image 2 is drinking coffee on the sofa in Image 1
excellent, that's exactly what I wanted from an edit model
>>
>>106667678
scale is way off in the coffee shop image
>>
>>106667640
wan is the only non slopped decent model from xi its an outlier need i remind you of the dozen failed image models
>>
>>106667692
it was a bad idea to use the aspect ratio of the 2nd image; it should've been the ratio of the 1st image
>>
>>106667506
>60GB
do I need 60gb vram to run this?
>>
>>106667619
Qwen Image is less slopped than Flux and has the Apache 2.0 license; if only it weren't so big, it would get finetuned by another suspiciously richfag
>>106667704
it's the same size as Qwen Image, so a 24gb vram card will suffice (Q8 QIE + Q8 text encoder)
>>
>>106667704
where'd you get that number? I'm seeing 40, and that's fp16 so fp8 will be 20
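The sizes being thrown around are just weights arithmetic (the 60 GB repo figure likely sums several formats). For a ~20B-parameter model — a thread assumption, not an official spec — and noting that "40 GB" is decimal while the same bytes are ~37 GiB:

```python
def weights_gib(params_b: float, bits_per_weight: float) -> float:
    """Weights-only footprint in GiB. Runtime adds the text encoder,
    VAE, and activations, so real VRAM needs are higher."""
    return params_b * 1e9 * bits_per_weight / 8 / 2**30

# gguf bits/weight are rough averages: block scales add overhead
for name, bits in [("bf16", 16), ("fp8", 8), ("q8 gguf", 8.5), ("q4 gguf", 4.5)]:
    print(f"{name}: ~{weights_gib(20, bits):.0f} GiB")
```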
>>
>>106667712
I use q8 image/edit on 16gb (4080) without issue, not even using multigpu node.
>>
It's shit. You will cope for a week saying how it's better than nano banana until you too finally admit it's shit.
>>
can you actually run the bf16 qie with blockswap on a 24gb card?
>>
>>106667745
who legit cares whether it's better than nano banana? this shit is free and wildly uncensored compared to any paid non-local model.

you fat cunt.
>>
File: retiring.jpg (148 KB, 1376x768)
>>106667546
>>
>>106667745
>You will cope for a week saying how it's better than nano banana
no one will say that lol, I don't expect QIE to beat nano banana anytime soon
>>
Is Wan limited to 4:3 resolutions, or can it do 9:16 (i.e. iPhone), 1:1, etc.? I mostly gen between 4:5 and 9:16, so 4:3 doesn't work for me...
>>
>>106667760
the eyes are sus for poor Ryan, but I like the skin texture though
>>
>>106667767
this is a dumb question.
>>
>>106667767
just try it?
>>
>>106667767
you can do any size; smaller is good for fast gens (i.e. 640x640 vs 832x832, etc.)
>>
the model has been out one hour now

where quants
>>
why aren't all loras migrated to the new model already?
>>
>>106667767
If only there was perhaps a guide written that included this very information.
>>
File: 1727804101294440.png (2.95 MB, 3562x1664)
>>106667506
not bad at all
>>
File: 1741028704148360.png (574 KB, 1920x1080)
Qwen Image Edit PSA

Always add:
"without changing anything else about the image"
at the end of your prompts if you want to preserve anything at all from the original image

Also here's a great workflow for the old Qwen Image Edit model
https://files.catbox.moe/6wcz4m.png
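Since the PSA is just a prompting convention, here's a trivial (hypothetical) helper that bakes the clause in so it never gets forgotten mid-session:

```python
PRESERVE_CLAUSE = "without changing anything else about the image"

def edit_prompt(instruction: str) -> str:
    """Append the preservation clause recommended in the PSA above."""
    return f"{instruction.rstrip(' .,')}, {PRESERVE_CLAUSE}"

print(edit_prompt("replace the anime girl with Miku Hatsune."))
# replace the anime girl with Miku Hatsune, without changing anything else about the image
```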
>>
>>106667839
your advice is deprecated anon kek >>106667506
>>
>>106667821
Can you catbox those two images so I can try it with the old model? And paste the prompt too if you can
>>
>>106667856
I found it on reddit so I can't help you with that
>>
>>106667853
No, I'm giving it specifically because the new version also needs that same clause appended to preserve things properly, and because I see people using bad workflows with the old model and concluding it's bad, when the new one is just an incremental improvement.

The new model still doesn't keep the exact same resolution, and still has the same VAE quality loss obviously as it's still not pixelspace.
>>
File: Chroma_Output_24252.png (1.41 MB, 1024x1496)
>>
>>106667872
once lodestones finishes his radiance/pixelspace model we will likely see more models adopt it.
all pissy trolling aside, images really do stay nicer when you don't have to run them through a VAE. this will be important for edit models, where you simply can't iterate because the VAE degrades quality on every pass.
>>
do people still use guidance that makes generation take 2x as long if you use negatives? haven't done SD in a while
>>
File: 1749081402833794.png (727 KB, 1176x880)
quants where?
>>
File: 1750371447094879.mp4 (3.77 MB, 1920x1080)
>>106667642
>if that one doesn't zooms in randomly,
it does, look at 33 sec
https://xcancel.com/Ali_TongyiLab/status/1970194603161854214#m
>>
>>106667821
looks kinda bad, you can see how it completely rejects the blue jacket's texture when outpainting. looks like a 512x512 crop pasted on top
>>
>>106667642
spro is a meme lil bro, let it go
>>
>>106667906
oh yeah nice catch
>>
not uploaded yet but seems like the first quants are here https://huggingface.co/calcuis/qwen-image-edit-plus-gguf
>>
>>106667916
we'll still have to wait for comfy to implement the multi image process too
>>
no qwen image edit plus nsfw finetune yet?
>>
>>106667885
>once lodestones finishes his radiance/pixelspace model we will likely see more models adopt it.
yep, for edit models it'll be mandatory to go pixel space; maybe QIE will be the first to do it, who knows
>>
>>106667932
This is gonna be brutally heavy on vram and ram tho
>>
File: 1749166461153434.png (1.94 MB, 1505x1466)
>>106667506
now that can be interesting to experiment with
>>
>>106667938
I still think 20b is overkill, if they manage to keep the quality with 13-14b + pixel space we could manage to run this shit
>>
>>106667886
Yes; with a negative prompt active (CFG > 1) the model runs two forward passes per step, conditional and unconditional, so it's roughly 2x the cost. Distilled models with guidance baked in skip the second pass.
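On the guidance question: classifier-free guidance combines an unconditional (negative-prompt) and a conditional prediction each step, so both must be computed — two forward passes, hence the ~2x. A minimal numeric sketch of the combination rule, with toy lists standing in for noise predictions:

```python
def cfg_combine(eps_uncond, eps_cond, scale):
    """eps = eps_uncond + scale * (eps_cond - eps_uncond).
    Needing BOTH predictions every sampling step is what roughly
    doubles gen time when a negative prompt (CFG > 1) is active."""
    return [u + scale * (c - u) for u, c in zip(eps_uncond, eps_cond)]

uncond, cond = [0.0, 0.2], [1.0, 0.6]
print(cfg_combine(uncond, cond, 1.0))  # scale 1: just the conditional prediction
print(cfg_combine(uncond, cond, 7.5)) # a typical SD scale extrapolates past it
```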
>>
>>106667782
it takes me like 10 minutes every time I "just try" something with video, so I'd rather not waste hours discovering the handful of things everyone else already knows. Besides, I was hoping someone might know how it was trained and whether it was intended to support such resolutions.
>>
>>106667506
it says it supports ControlNet, holy shit
>>
File: 1748998248144044.png (2.86 MB, 1752x1590)
Here's the new vs old Qwen Image Edit models for comparison with the Will Smith example posted above.

We need SRPO, and the no-VAE quality loss that the Chroma Radiance model gets from the new pixelspace research. This is just an incremental improvement that isn't much different, as the old model is already pretty good depending on the prompt and workflow.
>>
>>106668015
I like the improvement, the face is more accurate and the skin texture is not as slopped as before
>>
>>106668015
Old version workflow: https://files.catbox.moe/r0kyif.png
Same workflow as posted at >>106667839

>>106667821
>>
>>106668015
old version looks so much better ahahah.
can you do some more comparisons? i'm out of gpu time on hf
>>
>>106667601
>what? so they are updating it monthly or what?
lmao, are they really gonna upload a new version of QIE each month? sounds crazy, I guess they realized that the training wasn't over and the loss curve hadn't flattened yet
>>
File: 1730766689667884.png (1.22 MB, 1280x768)
>>106668015
>>106668028

Also keep in mind that the deployment parameters of models matter a lot, so we need to wait for the best workflow to be created for a more like-for-like comparison.

For example with this comparison and that generated image you see there, on the old model I added
"Don't change anything about their heads at all, keeping their faces and heads exactly as they are."
to the prompt, and yet I got the same image as I get in picrel without adding that sentence. The old model can't copy the original images like-for-like the way the new one can; so despite the images being low quality themselves, this showcases the new model following the prompt better, which is important.
>>
https://huggingface.co/calcuis/qwen-image-edit-plus-gguf/blob/main/qwen2.5-vl-7b-test-q4_0.gguf

what's this
>>
I just bought a 5060 Ti (16 GB) instead of a 5070 Ti.
Not worth 2x the price; still a massive upgrade from my 3060
>>
>>106668134
the text encoder? it's probably the same text encoder as the previous Qwen Image model
>>
>>106668137
Waste of money. Should've waited until you had more and bought a 4090 or 5090. I can't imagine doing video gens on 16GB.
>>
well I guess q8/other models will be up later today some time.
>>
File: elf hugger_00435_.png (3.56 MB, 1080x1920)
>>
>>106668151
wan q8 works absolutely fine on a 4080 (16gb). the only thing you have to consider is not making the dimensions *too* large cause that needs more vram.
>>
>>106668134
>>106668149
if that's the same text encoder he's wasting his time, there already have gguf of this
>>
>>106668137
nice!
>>
>>106667786
imo there are specific resolutions wan works best with, and I'll keep sticking to the Wan 2.1 ones: 1280x720 for high res and 480x832 for low res.
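Beyond the stock presets, the practical constraint is usually just that width and height divide evenly by the model's latent/patch granularity. A small helper, assuming the multiple-of-16 rule common in Wan workflows (an assumption — check your loader's error messages if a size gets rejected):

```python
def wan_dims(width: int, height: int, multiple: int = 16):
    """Snap a target size to the nearest valid multiple, e.g. for
    9:16 or 1:1 gens that aren't on the preset list."""
    snap = lambda v: max(multiple, round(v / multiple) * multiple)
    return snap(width), snap(height)

print(wan_dims(480, 854))  # 9:16-ish portrait -> (480, 848)
print(wan_dims(640, 640))  # 1:1 -> (640, 640)
```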
>>
>>106668137
Should've waited for the super cards. A 3090 is faster than a 5060 ti and you're now stuck with 16gb.
>>
>Sarrs... a second model is released this week.

Wtf, I didn't get to fuck around with Wan animate completely yet. We're eating too good.
>>
>>106668161
you can't do 720p, plus future wan models may not do the whole high/low split thing again. if they don't, you'll be forced to use a lower quant like with wan 2.1
>>106668181
what the hell even is wan animate? is it like vace?
>>
>>106668151
waste of money, by getting +70% faster gens? not really. Not very interested in video

>>106668168
yeah! looking forward to it.

>>106668175
I considered waiting for them, but when they're coming is not confirmed - and they're hardly going to be anywhere close to MSRP anyway
>>
>>106667506
>Multi-image Editing Support: For multi-image inputs, Qwen-Image-Edit-2509 builds upon the Qwen-Image-Edit architecture and is further trained via image concatenation
what makes it different from the image-concatenation cope we used to do on the previous QIE?
>>
>>106668216
>waste of money
it's literally not. vram is absolute king.

>Not very interested in video
it's beneficial for training loras too and future proofing for the latest models, but you do you. 100% wasted.
>>
>>106668212

https://humanaigc.github.io/wan-animate/

>tl:dr

Character replacer for videogen.
>>
>>106668236
who's the girl anyway?
>>
>>106668234
>vram is absolute king
would you take a 96GB GTX 680 over a 16GB RTX 4080?
>>
>>106668236
Seems VACE is a component within Wan Animate, so essentially it's doing the same thing except better I guess.
>>
File: ComfyUI_01086_.jpg (329 KB, 864x1152)


