/g/ - Technology


Thread archived.
You cannot reply anymore.




Discussion of Free and Open Source Text-to-Image/Video Models and UI

Prev: >>106418741

https://rentry.org/ldg-lazy-getting-started-guide

>UI
ComfyUI: https://github.com/comfyanonymous/ComfyUI
SwarmUI: https://github.com/mcmonkeyprojects/SwarmUI
re/Forge/Classic: https://rentry.org/ldg-lazy-getting-started-guide#reforgeclassic
SD.Next: https://github.com/vladmandic/sdnext
Wan2GP: https://github.com/deepbeepmeep/Wan2GP
AniStudio: https://github.com/FizzleDorf/AniStudio/tree/dev

>Checkpoints, LoRAs, Upscalers, & Workflows
https://civitai.com
https://civitaiarchive.com/
https://tensor.art
https://openmodeldb.info
https://openart.ai/workflows

>Tuning
https://github.com/spacepxl/demystifying-sd-finetuning
https://github.com/Nerogar/OneTrainer
https://github.com/kohya-ss/sd-scripts/tree/sd3
https://github.com/derrian-distro/LoRA_Easy_Training_Scripts
https://github.com/tdrussell/diffusion-pipe

>WanX
https://rentry.org/wan22ldgguide
https://github.com/Wan-Video
https://alidocs.dingtalk.com/i/nodes/EpGBa2Lm8aZxe5myC99MelA2WgN7R35y

>Chroma
https://huggingface.co/lodestones/Chroma1-Base
Training: https://rentry.org/mvu52t46

>Illustrious
1girl and Beyond: https://rentry.org/comfyui_guide_1girl
Tag Explorer: https://tagexplorer.github.io/

>Misc
Local Model Meta: https://rentry.org/localmodelsmeta
Share Metadata: https://catbox.moe | https://litterbox.catbox.moe/
GPU Benchmarks: https://chimolog.co/bto-gpu-stable-diffusion-specs/
Img2Prompt: https://huggingface.co/spaces/fancyfeast/joy-caption-beta-one
Txt2Img Plugin: https://github.com/Acly/krita-ai-diffusion
Archive: https://rentry.org/sdg-link
Bakery: https://rentry.org/ldgcollage

>Neighbours
https://rentry.org/ldg-lazy-getting-started-guide#rentry-from-other-boards
>>>/aco/csdg
>>>/b/degen
>>>/b/realistic+parody
>>>/gif/vdg
>>>/d/ddg
>>>/e/edg
>>>/h/hdg
>>>/trash/slop
>>>/vt/vtai
>>>/u/udg

>Local Text
>>>/g/lmg

>Maintain Thread Quality
https://rentry.org/debo
>>
reminder that i used nunchaku before all of you
seethe
>>
Comfy ..... shot
>>
File: 1755505194365758.jpg (125 KB, 604x604)
>>106426685
is nunchaku good for you?
>>
Blessed thread of frenship
>>
>>106426692
yes, but i cant use it for qwen image yet because im a 12gb vramlet and it doesn't support cpu offloading
https://github.com/nunchaku-tech/nunchaku/pull/624
>>
File: 00042-3972725621.jpg (113 KB, 1728x1344)
>>
>>106426715
god damn i would absolutely rape that little slut and make her never smile again
>>
why does batch processing work so much better for images?
it seems like comfy is making waaay better use of system resources despite apparent increased load.
>>
>>106426724
anon! no! bad anon! anon no raping!!!
>>
>>106426727
it was the same shit for auto1111. it isn't some space magic, it's just pytorch. comfy has been doing a great job at fucking memory as hard as possible because of diaper leakies
>>
Anyone ever done i2v of their poop?
>>
>>106426724
no rape allowed
>>
>>106426734
I feel like I'm standing on top of a very tall tower made out of rotted 2x4s and rusty nails assembled by Pedro and the gang
>>
>>106426745
this one is too uncanny to be deserving of my rape
>>
File: 00051-3768333111.jpg (193 KB, 1728x1344)
>>
>>106426774
would want to be reverse raped by
>>
>>106426782
Reverse rape is just consensual sex.
>>
File: 1751161807219094.png (1.17 MB, 1120x1440)
>>
Why is everyone shitting themselves over nano banana when qwen edit is basically just as good?
>>
>>106426793
The smile should be less authentic and more sardonic.
Otherwise kino gen.
>>
>>106426793
this theme used to be funny. but now I doubt they're the only big villain
>>
>>106426800
Is everybody? after they pissed off pretty much everyone with the flood marketing they did before release. I don't see it anywhere anymore (unless it's on xitter, don't waste my time there)
>>
File: 00058-1040468860.jpg (135 KB, 1728x1344)
this might be my favorite gen of the year.
>>
File: BEAHGWAHAHAHA.mp4 (564 KB, 1080x720)
>>106426697
>>106426689
>>106426685
THE DUALiTY OF MAN!
BWAHAGHAHAH
>>
File: didntobeysadly.mp4 (409 KB, 720x720)
>>106426774
>>106426861
rocketnon this is just sad
>>
>>106426863
Why are you scared of confrontation?
>>
I used comfy example workflow for qwen edit and it creates blank image with error
ComfyUI\nodes.py:1590: RuntimeWarning: invalid value encountered in cast
img = Image.fromarray(np.clip(i, 0, 255).astype(np.uint8))
tf going on? did i fuck up updating?
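
For what it's worth, that warning is what NumPy prints when a float array containing NaN or inf gets cast to uint8: np.clip passes NaN through untouched, and the result of the cast is then undefined, which would fit a blank output. A minimal sketch for checking a frame before the cast, assuming the frame is a plain float array in the 0-255 range. `to_uint8_image` is a made-up helper, not ComfyUI code, and it only hides the symptom; it does not fix whatever produced the NaNs upstream.

```python
import numpy as np

def to_uint8_image(frame):
    """Count non-finite values, then convert to uint8 safely (hypothetical helper)."""
    frame = np.asarray(frame, dtype=np.float32)
    # NaN and +/-inf are what trigger "invalid value encountered in cast"
    n_bad = int((~np.isfinite(frame)).sum())
    # Replace non-finite values with in-range numbers before clipping and casting
    cleaned = np.nan_to_num(frame, nan=0.0, posinf=255.0, neginf=0.0)
    return np.clip(cleaned, 0, 255).astype(np.uint8), n_bad

img, n_bad = to_uint8_image([[0.0, 127.6, float("nan")],
                             [300.0, -5.0, float("inf")]])
print(n_bad, img.tolist())
```

If `n_bad` comes back non-zero on a real output, the problem is somewhere before the save step (sampling, attention backend, or decode), not in the image writer.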
>>
File: smokeanotherone.mp4 (989 KB, 1080x720)
>>106426898
why are you afraid to have sex?
>>
>>106426800
>everyone
Sure, Jan
>>
*yawn*
>>
>>106426901
>did i fuck up updating?
good software wouldn't allow that
>>
File: photo-collage.png.png (2.75 MB, 1920x1080)
>>106426800
both qwen models desperately need finetunes and more lora support from the community.
>>
>>106426861
it's pretty nice, i'll give you that. middle finger is a bit long though, def looks odd.
>>
>>106426906
i forgot that you are underage
but i also don't know how people your age communicate
so why do you always hide behind this?
>>
File: spidermanmeme.mp4 (1.06 MB, 1080x720)
>>106426917
"its not that serious mate"
>>
https://files.catbox.moe/p8xa4s.mp4
>>
File: file.png (17 KB, 218x194)
redeem >>>/g/sdg/
>>106426861
that is a very nice gen
>>
File: WanVideo2_2_I2V_00236.webm (249 KB, 1248x720)
>>
File: 1752523430609772.png (1.33 MB, 1120x1440)
>>106426793
actually, the shadow got fucked up a bit in that one

>>106426800
it's not organic. it's a massive, obnoxious indian operation hired by google. blatantly spamming threads and subreddits where it's against the rules/topic.
https://files.cat .. box.moe/5if355.png

>>106426838
they're not the only problem, but I'm still sick of them.
>>
>>106426926
so you ARE just scared of confrontation
well i hope you didn't leave enough breadcrumbs for someone to do things postcard
>>
what do you mean anon?
>>
Is there a fp16 qwen image edit?
>>
someone needs their meds, NOW.
>>
every time i see s/w get too many replies i get curious and then eventually i click on one of the videos, because i know what they look like
then my heart starts HURTING (i am not exaggerating) out of cringe
>>
>>106426901
https://github.com/comfyanonymous/ComfyUI/issues/9265
https://github.com/thu-ml/SageAttention/issues/162
https://github.com/comfyanonymous/ComfyUI/issues/8689#issuecomment-3177486707
sageattention issue, sounds like they're still working on it. tho there seems to be a temp fix you can manually do for now.
using sageattention by itself i haven't had any issues so far.
>>
File: 8-30 061002.png (114 KB, 1176x672)
what's wrong bwos...
>>
>>106426993
Because you touch yourself at night
>>
>>106426989
poor thing! have mommy cover your eyes!
>>
>>106426993
Update the gguf node
>>
>>106426993
Using third party nodes, Comfy will always break those
>>
>>106426989
sounds like we need a s\w only collage next thread ;3
>>
File: WanVideo2_2_I2V_00237.webm (750 KB, 720x1248)
>>
>>106427008
wow what an asshole
>>
>GOOD MORNING!!!!
B O T . S T A T U S ?
>>
>>106427012
i dont mind thumbnails tho
>>
>>106426993
this is the correct answer
>>106427004
>>
>>106426931
>>106427018
>horrible
DEF GOIN IN THE COLLAGE!!!
>>
>>106426993
NOODLES
>>
>>106427018
this reminds me of a scene from
https://en.wikipedia.org/wiki/The_Poughkeepsie_Tapes
which I wish I hadn't watched
>>
>make comfyui workflow a total shitload of fuck
>complain that its 'shit' and 'doesn't work'
>become threadly anti-comfy schizo
why does this keep happening?
>>
>>106426992
>https://docs.comfy.org/tutorials/image/qwen/qwen-image-edit
can you try the first qwen edit example with --use-sage-attention on?
I removed it and it ran fine. so yea this may be it
>>
>>106427038
well someone has definitely been trying for months to discredit comfyui so who could it be ...
>>
REALLY ACTIVATES THE ALMONDS
>>
>>106427050
users
>>
>>106427038
>make comfyui workflow a total shitload of fuck
>whack it against claude until it works
>anons give me helpful suggestions which speed up my workflow considerably
why does this keep happening?
>>
>>106427050
a certain shota enjoyer?
>>
>>106426945
>post violates united states law
>>
>>106427050
girls with penis chad?
>>
imagine being so psychotic that you make direct calls for violence\dox
>>
>>106427067
mens with tits
>>
Legitimately don't know how people do anything with their images outside of comfyui. They just press a big stupid orange generate button.
>>
>>106427076
its blue tho?
>>
File: 00032-1937354874.png (1.21 MB, 1384x712)
>>
imagine how shitty a general has to be to complain about software and schizos all day with barely any engagement of on-topic discussion
>>
>>106427076
some people prefer simplicity over customization
>>
did anyone update the wan 2.2 ldg guide yet?
>>
>>106427039
i don't have the edit model downloaded somebody else might try
with qwen image using --use-sage-attention runs fine for me
>>
>>106427084
I see gradio has upped its game since I last used it.
>>
>>106427061
>>106427070
>thinking swedes care about mutt laws
>>
>>106427106
just emailed
expect my attorney to get involved
>dmca, tida, etc
ringing any bells?
>>
>>106427061
whiny bitch kek
>>
>>106427087
cant even post a gen, laughable
>>
File: WanVideo2_2_I2V_00238.webm (429 KB, 1248x720)
>>
>>106427145
>says this
>is a nogenner
pottery
>>
>>106427146
>>106426931
do you HAVE to post this garbage? its making our general look bad!
>>
>>106427137
see >>106427138
>>
>>106427088
most people do actually. it's why complaints are more common
>>
File: 1743543611443482.png (1.34 MB, 1024x1024)
>>
when will i be able to experience the pleasure of inseminating my cute petite large breasts wide hips 1girl
>>
>>106427197
>>106427160
BEAGHAHAHAH
>>
File: elf hugger_00178_.png (3.54 MB, 1600x2200)
>>106427215
You may have to leave the house to achieve this....
>>
before i even bother diving into this, can my mbp m4 laptop handle wan2
i want to make vids from pics
>>
>>106427232
>cant run windoze
>>
>>106427092
you could easily update the relevant information needed to get up and running by either copy-pasting from the git pages or just linking to them.
that or just link to one of the many updated auto installs i've seen on reddit if you're lazy.
>>
File: ComfyUI_05402_.png (719 KB, 720x1280)
>>
>>106426800
Qwen Edit is not just as good lmao, Nano Banana is a model with Imagen 4 Ultra (or a bit better) realism capability, Qwen Edit is a model with Qwen realism capability, it simply cannot retain sufficient detail or likeness if given e.g. an actual photograph.
>>
gm
>>
>>106427242
you're right, i was able to get it up and running on my own, but wanted to see if its updated for other anons
>>
>>106427215
it'll happen sooner if you never say inseminating again
>>
I pulled, for the first time since the 22nd, and not only did it not crash anything, but actually I got a 5% performance increase

Is Comfy getting it together or should I go out and buy a lottery ticket due to my current good fortune ?
>>
File: WanVideo2_2_I2V_00240.webm (154 KB, 1248x720)
>>106427160
Can it, Spergules.
>>
>>106427197
>>106427243
need her
>>
what's the point of talking about Nano Banana here? we already know that online slop tools are better. except for the nsfw part kek
>>
>>106427324
Can't believe people have forgotten how Disney used beloved characters like Hercules to influence kids into smoking
>>
>>106427346
this. /ldg/ is a gentleman's club of gooners
>>
This must be what a gambling addiction feels like. You randomly get a really nice gen with a whole bunch of good extra details that you didn't even ask for, then you try to add those extra details to the prompt and it just goes lol no and ignores half of it and fucks up the other half.
>>
>>106427232
assuming you have 32GB+ ram, yes, but it will be pretty slow
>>
>>106427324
this ones pretty good
>>
>>106427365
Learning to inpaint and photobash a bit will ameliorate to some extent dependency on the RNG god.
>>
>>106427346
>online slop tools are better
If a much larger model than anything you can run on consumer hardware wasn't better, it would be insane.

Actually it is insane how little gap there is between Wan and SAAS commercial alternatives, which in turn are heavily censored and impossible to augment with extra training like loras.

Local is such a pure win.
>>
>>106427346
>Needing NSFW images to goon
you are weak
>>
use case for comfyui needing to run at 1000 FPS?
>>
>>106427364
*losers
>>
File: output-0018.jpg (818 KB, 1768x2280)
>>106427387
snazzier snap-tos
>>
>>106427346
it's not even significantly better than qwen, and that's before qwen LORAs and finetunes really take off. Xi achieved a pre-emptive strike with that model.

>>106427365
unlike gambling, you can re-engineer prompts, workflows, etc to improve your chances. yeah it's addictive as fuck though. it's really like one of those cyber drugs from cyberpunk fiction.
>>
>>106427387
Kelvan Empire Occupiers want gens.
>>
>>106427397
*cutting edge ai researchers
>>
>>106427403
holy sloppa
(probably going in the collage)
>>
>>106427416
I dunno, the catgirl I made from the same workflow didn't get into it, though the chroma image I worked on did.
>>
>>106427416
I dunno, I like the messy shading lines, makes it a lot less 'AI'
>>
File: AniStudio-01218.png (1.86 MB, 1024x1344)
>>106427365
https://youtu.be/OVAkL2YbisE?list=RDOVAkL2YbisE
>>
>>106427387
it's a bunch of fucking divs!!! why the fuck should it cost more to render than 4chan???
>>
bro is SO upset that hes not in the faggollage
>>
File: FluxKrea_Output_66543.png (2.21 MB, 1344x1728)
>>
File: 00032-1937354874.png (874 KB, 1344x768)
>>
>>106427406
There is absolutely no way to train a Qwen single-subject likeness lora on photographs of an actual person and inference with it in a way that looks nearly as good as a Flux Krea lora trained on the same dataset (I've tried), the model just isn't realistic enough inherently, even with schizo negatives it still likes to veer into Pony Realism esque sloppa real quick.

This isn't to say the likenesses are super off or that the coherency is bad, in those regards the Loras I've tried came out fine, it's just it's not a good platform aesthetically for anything non-illustrative at the moment.
>>
File: WanVideo2_2_I2V_00241.webm (540 KB, 1248x720)
>>
File: ComfyUI_00004_.mp4 (312 KB, 640x352)
>>
>>106427493
still no fast krea?
>>
>>106426800
Not as good as everyone here has you think
https://lmarena.ai/leaderboard/image-edit

My findings are pretty much the same. Qwen edit still doesn't hold a candle to whatever BFL is hosting behind their API. And nano banana is basically just a hypothetical Krea edit, so it's not really that far off from Kontext Pro/Max.
>>
>>106427493
Raw output or no LoRA? The output is comparable to Chroma.
>>
Man, I really fucking need my local music gen fix, fast.
Every single open music gen model is garbage.
>>
>>106427568
I've found Qwen edit does basically as good a job and I can train it to do exactly what I want.
>>
>>106427564
nunchaku
>>
>>106427575
Ace step wasn't too bad. Definitely a step in the right direction. I think it's getting a big update soon too.
>>
>>106427564
Not sure what you're referring to exactly, it got a Nunchaku already at least though:

https://huggingface.co/nunchaku-tech/nunchaku-flux.1-krea-dev
>>
>>106427230
so what?
>>
>>106427584
When stacked against other models though you can easily pick apart the slopped Qwen edit images. That's why it's not performing well in LMArena.
>>
>>106427573
I legit can't tell if you're asking a question or stating something here lol, I don't know what you meant by this.
>>
>>106427535
>There is absolutely no way to train a Qwen single-subject likeness lora on photographs of an actual person and inference with it in a way that looks nearly as good as a Flux Krea lora trained on the same dataset
The model has barely been out a month. We don't really know how to train it yet. Flux was out for a whole year before we finally got some decent de-slopping of it like Krea and Chroma. I'll reserve judgement since some LORAs seem to do an OK job at de-slopping, way better than Flux was at this age.

That said, if we do get a Qwen finetune, by the time it drops we'll have a new completely different SOTA lmao
>>
>>106427592
>>106427597
so nothing for vanilla comfyui?
>>
>>106427594
I hope they at the very least improve the alignment related to the genre tags, and make it work with niche genres (eg Eurobeat), or follow the reference audio of said genres well
>>
File: output.mp4 (4 KB, 60x40)
>vanilla comfyui
>>
>>106427616
15 rupees were deposited on your shit account saar
>>
>>106427648
the thing is, everything is for vanilla comfyui
>>
>>106427631
I'm asking whether you used a LoRA or not.
>>
when are we going to see 1080p genning?
>>
>>106427742
Now if you have infinite amounts of time.
>>
File: comparison.jpg (3.4 MB, 3584x2560)
>>106427568
>And nano banana is basically just a hypothetical Krea edit

not really, Imagen (which Nano is clearly related to) has its own aesthetic, obviously it's not a complete comparison of everything either model can do here or really even a proper one to one comparison at all but this is Imagen 4 Ultra at 2K straight from the API on the left and the Krea on the right, same prompt.

At least for this particular type of gen / prompt, Imagen is more like, "a model that's aesthetically tuned but with really high detail and not distilled at all", Krea sort of leans more raw (perhaps a little too much so, it kind of falls into the trope of representing modern phones as having rather worse camera quality than they actually do)
>>
>>106427317
>Is Comfy getting it together or should I go out and buy a lottery ticket due to my current good fortune ?
You pulled when he just put out a release
https://github.com/comfyanonymous/ComfyUI/releases/tag/v0.3.55
So I guess the lottery ticket?
>>
>>106427638
no it trains fine, the only people struggling are the sort of people who ALWAYS used super weird training configs with retardedly low learning rates and the constant scheduler even on older models, there's definitely absolutely nothing technically "wrong" with the Loras themselves, trust me
>>
>>106427648
I guess not? I dunno what that would even be in terms of equivalents on any model that's come before though
>>
>>106427735
oh my bad, yeah I did, one that was actually trained on Krea, not regular Flux
>>
>>106427861
>constant scheduler
Nothing wrong with the constant scheduler.

Every large model is trained using constant scheduler, including all released as open for local, from SD15, SDXL, Flux, Wan, Chroma, Qwen, I have no doubt that holds true for all local finetunes as well, but they seldom release papers / detailed info on how they train their models.
>>
>>106427841
might be the first time ive seen the mirror effect in a gen. thats sweet.
>>
File: 1755548566283397.png (31 KB, 551x83)
If it's finished, why does he keep updating it?
>>
>>106427450
Hey Ani, we post your shitty UI in /adt/'s OP, go there and do some clowning around at least as a way of saying thanks!
>>
>>106427983
>ani
beahafahahah
>>
File: 1756319723537883.png (377 KB, 2729x763)
>>106427898
you're a gorillion times more likely to overtrain or undertrain with constant on a Lora than with Cosine or Cosine with Restarts, in my experience.

these settings (epoch count might change depending on the dataset or whatever obviously, I never count in "steps" or use more than one repeat though, and the text encoder learning rate there wasn't relevant cause it's obviously not actually being trained) have been working for me well on TensorArt, anyways.

I don't know what exactly their backend is but Dim seems to always be in Kohya-equivalent scale, I'll note also, for Qwen Dim 16 gives about a 260 MB Lora, Dim 32 is 500 something, Dim 64 would presumably be well over 1 GB.
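
Those sizes line up with how LoRA scales: each adapted linear layer stores two matrices, rank x in_features and out_features x rank, so the parameter count (and file size) grows linearly with Dim. Rough sketch only; the layer shapes below are made-up placeholders, not Qwen's real architecture, and the 260 MB reference is just the number quoted above.

```python
def lora_param_count(rank, layer_shapes):
    """Total LoRA parameters for a list of (in_features, out_features) layers."""
    return sum(rank * (fan_in + fan_out) for fan_in, fan_out in layer_shapes)

def estimate_size_mb(rank, ref_rank=16, ref_size_mb=260.0):
    """Linear extrapolation from one known (rank, size) pair."""
    return ref_size_mb * rank / ref_rank

# Hypothetical layer shapes, only there to show the linear growth
hypothetical_layers = [(3072, 3072)] * 4
for dim in (16, 32, 64):
    print(dim, lora_param_count(dim, hypothetical_layers), estimate_size_mb(dim))
```

By this estimate Dim 64 lands at roughly 1040 MB, so right around the 1 GB mark, ignoring file overhead and anything saved at a different precision.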
>>
>>106427669
https://files.catbox.moe/v8ysdr.jpg
Left: Kontext Pro, right: Qwen Edit. Bottom: original.
You are free to draw your own conclusions.
>>106427841
Imagen 4's aesthetic is not really in line with Gemini 2.5 Flash though. The photorealism is a lot closer to the Flux side.
>>
File: ComfyUI_00178_.png (3.31 MB, 1280x1920)
>>
>>106427982
he is uploading the radiance progress. chroma but in pixel space.
>>
File: file.jpg (468 KB, 2112x1584)
>Blows out the color
Huh. Any Qwen Image Edit Anons got some pro tips?
>>
>>106428125
use nanobannana instead
>>
>>106428143
kys jewgle shill
>>
File: file.jpg (434 KB, 2112x1600)
Alright, I had the resize set to be divisible by 16 instead of 32. Way less blown out now.
>>106428143
I'd rather be drowned in a vat of boiling shit.
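
Going by the attachment sizes in this thread (2112x1584 for the blown-out one, 2112x1600 for this one), the first height is a multiple of 16 but not of 32. A small sketch of snapping a size up to a chosen multiple before feeding it to the edit model; `snap_up` and `snap_size` are hypothetical helpers, not ComfyUI nodes, and the attachments may be side-by-side composites, so treat the numbers as illustration only.

```python
def snap_up(value, multiple):
    """Round value up to the nearest multiple (integer math, no float rounding)."""
    return -(-value // multiple) * multiple

def snap_size(width, height, multiple=32):
    """Snap both dimensions up to the same multiple."""
    return snap_up(width, multiple), snap_up(height, multiple)

# Remainders of the blown-out attachment's height against 16 and 32
print(1584 % 16, 1584 % 32)
# The same size snapped to a multiple of 32
print(snap_size(2112, 1584))
```

Which multiple a given model wants is an assumption to check against that model's docs or example workflow; the helper just makes the value explicit.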
>>
File: Comparison2.png (2.4 MB, 2048x1024)
>>106428084
I'd say Nano is very very similar to be honest, they both have somewhat broad output diversity even on the same prompt (if the prompt isn't super detailed at least) so it can be a bit hard to compare them (and also Nano doesn't have straight text-to-image functionality for anything but 1MP 1:1 aspect ratio yet unlike what it has for image-to-image).

This is them on a much longer more detailed prompt than the one I used before though, at 1024x1024 it's pretty clear there's no way they aren't closely related IMO.
>>
>>106428084
Can you provide the original image and prompt? That would be cool.
>>
is it possible to set a desired output size, for qwen image edit? or is it always based on the image input size
>>
>>106427416
I discard your allegations of slop, they mean nothing to me, for I have seen what you gen for your own satisfaction.
>>
>>106428027
>gorillion times more likely to overtrain or undertrain with constant on a Lora than with Cosine or Cosine with Restarts
No I must disagree, it's actually a LOT more easy to avoid overtraining or undertraining with constant than with Cosine.

With cosine the amount of guessing increases, will X epochs be good or will it fall too sharply or too slow ? And restarting a cosine training often makes the model go haywire.

With constant and adamw you will have a small dropoff, so it's not actually constant, but it is VERY gradual, so as long as you check your samples / eval / loss curve to see when it overtrains, you will have great results. With an adaptive parameter-free optimizer like Prodigy, here I agree that cosine is good since it makes it work like adamw dropoff, very gradual and also fights back Prodigy's eagerness to raise LR again after having settled, and often too high.
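
The "guessing" part of cosine can be shown numerically: the LR at any given step depends on the total length picked in advance, while constant does not. A minimal sketch using the textbook cosine annealing formula with no warmup and no restarts; the function names and the 1e-4 base LR are placeholders, not any trainer's actual implementation.

```python
import math

def constant_lr(step, base_lr):
    """Constant schedule: the same LR at every step."""
    return base_lr

def cosine_lr(step, total_steps, base_lr, min_lr=0.0):
    """Cosine annealing from base_lr down to min_lr over total_steps."""
    progress = min(step, total_steps) / total_steps
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))

base = 1e-4
# Same steps, two different planned run lengths
for total in (1000, 2000):
    print(total, [round(cosine_lr(s, total, base), 7) for s in (0, 500, 1000)])
print("constant", [constant_lr(s, base) for s in (0, 500, 1000)])
```

With a 1000-step plan the LR is already at half by step 500 and at the minimum by step 1000; with a 2000-step plan it is only at half by step 1000. That is why stopping or extending a cosine run changes what the model saw, whereas a constant run can be cut at whichever checkpoint looks best.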
>>
>>106428094
Slop
>>
File: file.png (182 KB, 768x838)
>>106426678
Well, maybe the NetaYume guy isn't dumb after all doing reckless things at the behest of randos and popular demand, but I dunno how much I trust that to last given his prior finetune results. Really sucks we're down to this kind of cope hope for anime models.
>>
>>106428225
>https://files.catbox.moe/v8ysdr.jpg
Original image is at the bottom. The prompt was to make her lay at the beach.
>>
File: 1746793599545699.gif (3.25 MB, 450x462)
after the eternal chroma vs qwen, now krea vs banana...
>>
>things that didn't happen
>>
I made my wife talk
>trigger warning: hooves
https://files.catbox.moe/jxse60.webm
>>
>>106428317
I was really hoping for tom green there.
>>
>>106428323
>LOOK IM A FARMER!
Society is such a joke that he actually became a farmer.
>>
File: 1752097503294124.png (964 KB, 1360x768)
Mcdonalds, fix your menu or else.
>>
>>106428337
Do you have a lora for him? him being violent towards fast food has a lot of potential.
>>
>>106428337
>McDonaids
kek
>>
File: 1732271769879186.jpg (42 KB, 854x480)
>>106428341
no need, qwen edit can make anyone do anything more or less

source image:

prompt: A brown bag with the McDonalds logo on it is resting on a marble counter in a kitchen, beside some McDonalds cheeseburgers. the man is looking at the camera with a serious expression, and is holding a large butcher knife.

with the lightx2v lora it only takes 8 steps, qwen is otherwise slower than flux.

https://huggingface.co/lightx2v/Qwen-Image-Lightning/blob/main/Qwen-Image-Edit-Lightning-8steps-V1.0.safetensors
>>
>schizoid theory: statler is rocketnon
>>
>>106428362
Holy crap. TY anon. AI has so many buzzwords its impossible to research.
>>
https://huggingface.co/starsfriday/Qwen-Image-Edit-Remove-Clothes

have not tried this yet, does it work like kontext

and dont do it to reviewbrah
>>
>>106428280
his first version was honestly a fine attempt given it was a tune of the BETA version of the original Neta IMO. Once he moved over to tuning against the main 1.0 release, in his 2.0 / 2.0 Plus release, it was basically immediately better than the base 1.0, I'd say.
>>
>>106428370
yes
>>
>>106428370
no
>>
>>106428304
no one was arguing Krea vs Banana lol, I merely pointed out that Banana (and Imagen) are not actually really that aesthetically similar to any other non-Google model
>>
>>106428370
maybe
>>
>>106428377
downloading now b4 nuked
>>
>>106428370
can you repeat the question?
>>
>>106428370
only one of them posts good gens so...
>>
>>106428397
with google you can find all these without much issue, they just cant host it on civitai or whatever.
>>
File: 1726013982678174.png (1.14 MB, 1360x768)
The man is wearing a black suit and is standing outside a McDonalds restaurant on a summer day. He is pointing at the McDonalds sign above the restaurant and is smiling. He is wearing brown dress shoes.

it didn't make him a hobbit, based qwen
>>
>>106428415
>not holding an assault rifle
Please do him
>outside with assault rifle
>inside angry, scaring register staff
>inside the back kitchen
>shooting hamburgers
>>
>>106428415
>it didn't make him a hobbit
Kontext be seething
>>
Is there a way to apply LoRa for variable strength over time-step in Wan video? As in 1.0 strength at step 1 and then strength 0.5 at step 3. Controlnet has something like that.
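
To pin down what is being asked: a per-step strength schedule is just a piecewise-linear lookup over keyframes. Sketch below; `lora_strength_at` and the keyframe format are hypothetical, not a ComfyUI or Wan API, and actually applying a different strength at each step would need sampler or node support that this snippet does not provide.

```python
def lora_strength_at(step, keyframes):
    """Linearly interpolate strength between (step, strength) keyframes."""
    points = sorted(keyframes)
    # Before the first keyframe: hold the first strength
    if step <= points[0][0]:
        return points[0][1]
    for (s0, v0), (s1, v1) in zip(points, points[1:]):
        if step <= s1:
            t = (step - s0) / (s1 - s0)
            return v0 + t * (v1 - v0)
    # After the last keyframe: hold the last strength
    return points[-1][1]

# The example from the question: 1.0 at step 1, 0.5 at step 3
schedule = [(1, 1.0), (3, 0.5)]
for step in range(1, 5):
    print(step, lora_strength_at(step, schedule))
```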
>>
>>106428280
Kek.
NetaLumina is Chroma story arc but for anime models.
>>
File: 1752155071318359.png (1.03 MB, 1360x768)
The man is wearing a black suit and is sitting in a booth inside a McDonalds restaurant on a summer day. beside him is a brown bag with the McDonalds logo, a bottle of champagne in a bucket of ice, and a plate and silverware. A McDonalds cheeseburger is on the plate.

fine dining.
>>
>>106428280
no need for cope, wai v14 or hassaku + controlnet union + adetailer is all you need, the base checkpoints know 99% of characters even before loras (which there are millions of).

use this extension in forge/reforge for anime models, it's so easy to use tags just by typing:

https://github.com/DominikDoom/a1111-sd-webui-tagcomplete
>>
File: file.jpg (272 KB, 1600x1184)
>>106428084
Yeah, I'd say kontext pro certainly wins that one. Fuck SaaS models, though, still.
>>
>>106428394
https://files.catbox.moe/i8tbj8.png
Prompt: A woman is cosplaying at comiket.
Left: Krea, Middle: Imagen 4, Right: Nano banana.
They're kind of similar.
>>
>>106428084
>>106428517
This shit is really pandora's box.
>>
>>106428406
>>106428397
>>106428377
initial test results:
>works instantly with simple "remove her clothes" prompt as advertised
>nipples a bit weird still, seems to like adding bush which is nice
>does a much better job of preserving body shape/anatomy. I was able to get nudes out of qwen with no LORA and also with that NSFW LORA, but they tend to modify body shape a lot more
it's a step forward but not great
>>
>>106428524
(I mean Krea Dev on left of course)
>>
https://huggingface.co/starsfriday/Qwen-Image-Edit-Remove-Clothes

so uhh, I guess it works.

"remove all the clothes of the figure in the picture", like the huggingface prompt examples

bikini be gone, AI magic:

https://files.catbox.moe/7khb2e.png
>>
>>106428574
also in the event of any issues you can just inpaint to fix with any noob/illu model for realism or even anime models will fix it.
>>
>>106428517
SaaS-crap is not necessarily better. Just out of the box Qwen is not that good at realism. It needs some kind of photorealism tune, then it's on par with those.
>>
>>106428574
blue board GOD
AI is better for funny memes, because all the small details make most porn uncanny. If I'm a tit man I'm not going to jack off to some weird AI nipples.
>>
File: FluxKrea_Output_112232.jpg (2.92 MB, 1768x2280)
>>
>>106428524
were you using a Lora there? I've never seen Krea natively have that kind of random text watermark, ever
>>
another example, can easily retouch with inpaint but the core functionality (lora to remove clothes) works in qwen edit.

"remove all the clothes of the figure in the picture. her skin is light."

had to add a skin prompt cause the shadow was making the right side appear tanned, lol

here's an a -> b example for the lora. retouching/inpainting would take like 20 seconds, the core functionality works great.

https://files.catbox.moe/rl4em1.png
>>
>>106428600
still sloppa
>>
>>106428635
No LoRA.
>>106428636
Careful anon, are those real pics?
>>
>>106428524
this. if you are not familiar with AI tools, you can easily think that the same tool created them
>>
>>106428670
it's a test image for some random gravure photoset, not rando girls.
>>
>>106428649
different guy but yes it is a remake of that first one lol
>>
>>106428476
nobody wants to use outdated sdxl 4ch vae sloppa. it's 2+ years old now...
>>
also, even if it was trained on realistic photos, it works for anime too. should have prompted black blindfold to keep it on but...you get the idea.

got a nice 2b racing stripe:

https://files.catbox.moe/zve1dd.png
>>
>>106428672
The same tool can create them all depending on seeds and prompts imo. Maybe by default you will see a more realistic output and background on the Google models, and the outfits and props are more involved, but aside from that, with good prompt engineering it should be possible on Krea.
>>
>>106428713
"remove all the clothes of the figures in the picture."

yes, I know it's incredibly easy to generate noob/illu anime lewds. the point is testing the lora on what is by default a model that doesn't allow nsfw. I think it works better than the kontext remover, imo.

https://files.catbox.moe/pjynov.png
>>
File: 1741383976282077.png (1.67 MB, 1920x1080)
kek, it even worked on the misato picture from eva.

source is this:

result is this:

https://files.catbox.moe/pvgkg3.png
>>
>>106428734
Not surprising. Flux is not a strong anime base.
>>
>>106428748
it did a decent job of preserving the style there
>>
File: file.jpg (443 KB, 2400x1184)
>>106428590
That would be great, for sure.
Still going to mess around with it some more, but the lighting on the model is real annoying, it's so damn bright. Might fuck around with model shift tomorrow, gotta sleep now though.
>>
>>106428758
yeah, for the anime images ive tried it has preserved color/shading style which is neat. kontext is pretty good but qwen edit is even better imo at preserving style.
>>
>everyone forgetting it's a blue board
uhh based?
>>
File: 1751231779241867.png (943 KB, 1360x768)
>>106428748
the purple hair anime woman in the picture is standing on a sunny beach, waving hello to the camera.

this time just a reg prompt with the 8 step lightx2 lora (for speed purposes)

neat. I didnt even prompt to keep her expression the same or style the same or whatever.
>>
>>106428789
got the blues and totally bored. so?
>>
File: 1752487358206719.png (948 KB, 1360x768)
>>106428799
this time with "keep her expression the same" added:
>>
>>106428789
I'm colorblind
>>
File: 00043-1029807901.png (1.87 MB, 896x1152)
>>
>>106428835
thought i was in /sdg/ for a second
>>
File: 1737924704471707.png (994 KB, 1360x768)
>>106428803
the purple hair anime woman in the picture is sitting in a business office typing at a computer, with a large white CRT monitor and white tower computer. she is typing. keep her expression the same.

nice retro pc.
>>
>>106428802
blue boards are stupid. adult website. not nearly as much grooming as roblox or discord
>>
File: 1737137809531294.png (1003 KB, 1360x768)
>>106428845
>>
>>106428636
seems like undertrained, the breasts size are totally different in the output image, same as the kontext nudify lora, it would give women giant boobs
>>
File: 1730359557208978.png (987 KB, 1360x768)
>>106428855
last misato pc.
>>
File: WAN_00015.mp4 (1.52 MB, 568x856)
>>106428094
cute, borrowing that gen
>>
File: AnimateDiff_00244.mp4 (2.67 MB, 720x1280)
>>106427243
>>
File: Asuka.webm (219 KB, 568x856)
>>106428872
Neet. I removed the garbage at the end.
>>
>>106428718
literally no one was even saying Imagen was "better" specifically, my original point (which I maintain) was just that most of the time if given a very short prompt, it's really not all that stylistically similar to Flux Krea at all (both for realism and non, really). The other guy's cosplay example is a pretty rare exception IMO.
>>
>>106428872
rocketnon this is horrible we can all tell which gens are yours
>>
>>106428872
extremely good clarity. wan 2.2?
>>
>>106428901
this one is dope
>>
>>106428907
cry harder no gen
>>
>>106428906
For every image I gen with nano banana, I feel like I'm genning with Flux Krea. This behavior is consistent across every photoreal prompt I tested.
>>
File: ComfyUI_00210_.png (3.62 MB, 1280x1920)
>>106428872
>>106428901
really nice, thanks
>>
>>106428907
rent free
>>
>>106428939
he is a THREAD SHITTER HE IS WALDORF CANT YOU SEE THAT?
>>
>>106428962
In fact, it wouldn't surprise me if Google just picked up Krea weights, did some tuning to make it multimodal and then called it a day.
>>
Having a picture of a girl and prompting her to remove her clothes feels like a form of mindcontrol and is very hot.
>>
>replying to himself
>>
File: 1742025960133409.png (1.05 MB, 1360x768)
>my order is wrong, and my day is ruined.
>>
>>106429003
>everyone i dont like is all one human
>>
File: 1734481127114887.png (1.07 MB, 1360x768)
>>106429012
okay now we have a wan candidate. no violence just shooting a vending machine or something.
>>
>>106428996
It's a bit too good at anime for it to be Krea, it's just probably once you get photoreal you can't really get more photoreal lol
>>106429000
Then that strip LoRA for Wan may just ruin you lol
>>
File: AnimateDiff_00245.mp4 (3.21 MB, 720x928)
>>106428600
>>
>>106429012
>>106429022
From the size and fit of his suits, I didn't think he was that big...
>>
>>106429036
he's a big guy.
>>
>>106429022
nice chair bro
>>
Gimme your Labor Day Special!
>>
>>106428969
Slop
>>
>>106429023
he has to be trolling lol, there's no way he actually thinks there's any chance that Nano Banana (which like I said earlier is VERY CLEARLY related to Imagen) is a Krea finetune
>>
>>106429025
nice
>>
Disturbing lack of gens in this thread.
>>
File: 1730214524234162.mp4 (488 KB, 672x480)
the man holding a plate with a hamburger is upset and throws the plate at the counter behind him. the people behind him squat down to avoid the plate.

upset brah
>>
File: ComfyUI_16410.png (2.74 MB, 1200x1600)
>>106429172
I'm not genning anything interesting... just fiddling with my LLM sys prompt tonight.

>>106429179
His expression (or lack thereof) had me giggling pretty hard. All hunched over like that.
>>
>>106429179
I don't know how McDonald's tastes in America, but every time they tried to start a franchise where I live, it shut down a few months later because of how terrible the food tastes.
>>
File: img_00083_.png (3.04 MB, 1400x1400)
>>106429172
>>
Does Wan2GP have a proper NSFW model?
>>
File: 1726639929930190.mp4 (434 KB, 672x480)
better toss:
>>
>>106429133
It can be. Outputs are very similar and we don't know how Google trains their models. This could be a Flux-based model, such as Kontext, or a tune.

>>106429023
That's what a tune would do, improve it at anime etc.
>>
>>106429199
amerifat here. last time i ate mcdonalds it was honestly the worst fast food i've ever eaten. i was upset
>>
>>106429206
cliprel. he needs an uzi, and to shoot hamburgers.
https://www.youtube.com/watch?v=XkwQ6EjLdMQ
>>
File: 1742607622672277.mp4 (474 KB, 672x480)
day: ruined
>>
File: 1552655500961.png (153 KB, 333x254)
https://huggingface.co/tencent/HunyuanVideo-Foley/tree/main
How does this fare in comparison to the Wan audio model? Anyone tested yet?
>>
File: Qwen_Boebert_Epoch10.png (1.21 MB, 1024x1024)
>>106428027
this is what a Lauren Boebert lora looks like by just epoch 10 with these settings

(this doesn't mean it's done though, Qwen has a gorillion parameters and isn't distilled so it's VERY hard to overtrain a Lora on it, the only thing you can do to combat just the way Qwen looks in general and add more detail to the likeness is to keep going for basically as many epochs as you can afford timewise or moneywise or creditswise or whatever)
>>
File: 1738543869317802.webm (878 KB, 672x480)
holy shit, well this was unexpected.

the man holding a plate with a hamburger runs towards the man on the right and hits him with the plate. The man on the right falls down.

technically...it worked?
>>
>>106429253
and yes this was reply-to-self if anyone was confused

>>106429211
no it can't, the overall prompt adherence of Imagen 4 and Nano Banana has zero chance of being from a T5-based model, they're definitely on a much larger, way more advanced Google LLM with basically infinity context as opposed to the 512 token limit of T5.
>>
>>106429270
I need to specify wearing a tie to make it more obvious who is the one doing the action. still funny though.
>>
File: 1746342624871165.mp4 (513 KB, 672x480)
the same guy did it. I even said man with the tie.
>>
>still no chroma category on civit
why
>>
>>106429253
Here, I got you a present:
https://civitai.com/models/1910170/woman-feet?modelVersionId=2162006
>>
Do any of you guys know anything about good printers? I'd like to make little wallet-size print-outs of my sexiest 1girls to carry around with me, but I'd want the best possible color depth and DPI and so on, so I can really enjoy looking at it up close. Has anyone done this?
>>
>>106429297
staff too busy trying to keep the site from imploding
>>
>>106429298
why would I want this
>>
File: WAN_22_I2V_INT__00045.mp4 (3.89 MB, 720x1280)
>>
File: 1750244220006102.mp4 (651 KB, 672x480)
okay, now brah is dealing the pain.
>>
>>106429275
>512 token limit of T5.
When I'm describing an image with gemini or joycaption, can I specify the token count and will it understand it?
>>
>>106429306
i thought they figured out the payment stuff and got a new processor?
>>
>suddenly firefox is using 50% cpu and eating up 15gb ram
>returns to normal when I close all my civitai tabs
civitai sometimes goes fucking crazy are they mining on my cpu or what the fuck
>>
File: ComfyUI_00076_.png (3.14 MB, 1280x1920)
>>106429314
nah hommie, asuka don't cry
>>
File: 1741631400073677.webm (914 KB, 672x480)
okay. this is what I wanted. get em reviewbrah.
>>
>>106429324
Gemini can kinda do it but not with extreme accuracy, so it helps to sort of leave headroom with how many you tell it to cap at.
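A rough sketch of the headroom idea, assuming a 512-token hard limit and a made-up 20% safety margin. `rough_count` is only a crude stand-in (about 1.3 tokens per word is a guess, not a measurement); swap in the real tokenizer of whatever text encoder you use if you want an accurate count.

```python
from typing import Callable


def caption_budget(hard_limit: int = 512, headroom: float = 0.2) -> int:
    """Token cap to ask the captioner for, leaving a safety margin under the hard limit."""
    return int(hard_limit * (1.0 - headroom))


def rough_count(text: str) -> int:
    """Crude token estimate: ~1.3 tokens per whitespace-separated word (assumption)."""
    return int(len(text.split()) * 1.3) + 1


def fits(caption: str, count_tokens: Callable[[str], int] = rough_count,
         hard_limit: int = 512) -> bool:
    """Check the returned caption against the hard limit after the fact."""
    return count_tokens(caption) <= hard_limit


if __name__ == "__main__":
    print("ask the captioner for at most", caption_budget(), "tokens")
    print(fits("a woman with purple hair waving on a sunny beach"))
```

So you tell the captioner the lower number, then check what came back against the real limit.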
>>
>>106429275
Haven't seen how long their context limits are. Though LLMs show it's possible to extend that dynamically even with existing architectures: https://kaiokendev.github.io/context
Plus a prompt adherence improving paper came out that even helps SD 1.5 with no changes to the code. Google does have talented engineers, and there's no doubt they probably have a better model from a higher quality dataset, but what they're doing is not necessarily more architecturally groundbreaking than what we have, at least not yet. It seems like it's just a scaled up version of what we already have, and even then, look at the far right image https://files.catbox.moe/i8tbj8.png
The faces in the background... That is not perfect. I contend that many other models can do better. Again here is Krea dev https://files.catbox.moe/qmbyb5.jpg
You guys are really overhyping this Google model.
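For reference, the trick in that context link is, as far as I understand it, position interpolation on rotary embeddings: squeeze the position indices of a longer sequence back into the range the model was trained on. A minimal sketch under that assumption; function names are made up for illustration, and this applies to RoPE-based LLMs, not to T5, which uses relative position biases.

```python
def interpolated_position(position: int, trained_len: int, target_len: int) -> float:
    """Scale a position index so that target_len positions fit inside trained_len."""
    scale = min(1.0, trained_len / target_len)  # never stretch, only compress
    return position * scale


def rope_angles(position: float, dim: int, base: float = 10000.0) -> list[float]:
    """Rotation angles for one position, one angle per pair of embedding dimensions."""
    return [position / (base ** (2 * i / dim)) for i in range(dim // 2)]


if __name__ == "__main__":
    # A token at index 4096 in an 8192-token prompt is treated like index 1024
    # by a model trained on 2048 tokens.
    pos = interpolated_position(4096, trained_len=2048, target_len=8192)
    print(pos, rope_angles(pos, dim=8))
```

The model then never sees a position value outside what it was trained on, at the cost of finer spacing between neighbouring tokens.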
>>
>>106429365
If lode gets it working, Chroma will move on from t5 to qwen 2.5
>>
File: 1752856897039710.webm (1.3 MB, 672x480)
in theory we aren't far away from full AI reviews

but you cant replace reviewbrah
>>
File: Combined.mp4 (3.33 MB, 1376x640)
>>106427085
i stitched together gens with no prompt but to what end
>>
File: 1730502141196689.mp4 (1.07 MB, 672x480)
success

the anime girl stands up, and picks up the white CRT monitor and throws it out the window to the right.
>>
>>106429365
Nvm, seems Gemini is autoregressive like 4o. So it understands prompts like "show me a room with no elephants in it, make sure to annotate the image to show me why there are no possible elephants" which is pretty cool. Not impressive aesthetic wise, but still impressive architecture/prompt following.
>>
I think nano banana is better. Qwen is my new watermark remover now I guess.
>>
File: 1746682303899022.mp4 (512 KB, 672x480)
the anime girl is typing on her computer, and the site 4chan appears on the screen, along with the text "/LDG/". She gives the thumbs up.

not quite 4chan but still a good gesture
>>
>>106429460
Gemini result
https://files.catbox.moe/18zeg3.png
>>
File: chroma1-hd_00009_.png (1.25 MB, 1024x1024)
>>
make sure to remove tranistudio from the next bake
>>
In wan, is it better to get a sampler that converges fast or something slower so the refiner low noise model can do something?
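Not an answer, but a sketch of how the step budget splits between the two models if you hand off at a fixed noise level. The 0.875 boundary and the shift values are assumptions for illustration; in practice many workflows just pick the handoff step by hand.

```python
def shift_sigma(sigma: float, shift: float) -> float:
    """Flow-matching time shift; shift > 1 keeps sigmas high for longer."""
    return shift * sigma / (1.0 + (shift - 1.0) * sigma)


def split_steps(steps: int, shift: float, boundary: float = 0.875) -> tuple[int, int]:
    """Count steps that start at or above the boundary (high noise model) vs below (low noise)."""
    starts = [shift_sigma(1.0 - i / steps, shift) for i in range(steps)]
    high = sum(1 for s in starts if s >= boundary)
    return high, steps - high


if __name__ == "__main__":
    # Same step count, different shift: the handoff point moves.
    for shift in (1.0, 5.0, 8.0):
        print("shift", shift, "-> (high, low) steps:", split_steps(20, shift))
```

The takeaway is that the schedule, not just the sampler, decides how much work is left for the low noise model.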
>>
>>106429434
lol i do that sometimes too shits fun just to see what it does.
https://www.reddit.com/r/comfyui/comments/1n3qm5c/
this dude did it with s2v got a cool one with no image/prompt input.
>>
>>106429545
>>106429545
>>106429545
>>
>>
>>106429460
I think at this point they probably train that in since it's such an overdone benchmark lol. Try "man without a beard". It will give you a man with a beard.
>>
>>106429440
lmao epic gen
>>
>>106429480
based??
>>
any tips for achieving extreme angles with SDXL? like from directly below?
>>
>>106429336
Slop



