/g/ - Technology


Thread archived.
File: tmp.jpg (758 KB, 3264x3264)
Discussion of free and open source text-to-image models

Previous /ldg/ bread : >>101671236

>Beginner UI
EasyDiffusion: https://easydiffusion.github.io
Fooocus: https://github.com/lllyasviel/fooocus
Metastable: https://metastable.studio

>Advanced UI
Automatic1111: https://github.com/automatic1111/stable-diffusion-webui
ComfyUI: https://github.com/comfyanonymous/ComfyUI
InvokeAI: https://github.com/invoke-ai/InvokeAI
SD.Next: https://github.com/vladmandic/automatic
SwarmUI: https://github.com/mcmonkeyprojects/SwarmUI

>Use a VAE if your images look washed out
https://rentry.org/sdvae

>Model Ranking
https://imgsys.org/rankings

>Models, LoRAs & training
https://civitai.com
https://huggingface.co
https://aitracker.art
https://github.com/Nerogar/OneTrainer
https://github.com/derrian-distro/LoRA_Easy_Training_Scripts

>Pixart Sigma & Hunyuan DIT
https://huggingface.co/spaces/PixArt-alpha/PixArt-Sigma
https://huggingface.co/spaces/Tencent-Hunyuan/HunyuanDiT
https://huggingface.co/comfyanonymous/hunyuan_dit_comfyui
Nodes: https://github.com/city96/ComfyUI_ExtraModels

>Kolors
https://gokaygokay-kolors.hf.space
Nodes: https://github.com/kijai/ComfyUI-KwaiKolorsWrapper

>AuraFlow
https://fal.ai/models/fal-ai/aura-flow
https://huggingface.co/fal/AuraFlows

>Flux
https://huggingface.co/spaces/black-forest-labs/FLUX.1-schnell
https://comfyanonymous.github.io/ComfyUI_examples/flux

>Index of guides and other tools
https://rentry.org/sdg-link
https://rentry.org/rentrysd

>GPU performance
https://vladmandic.github.io/sd-extension-system-info/pages/benchmark.html
https://docs.getgrist.com/3mjouqRSdkBY/sdperformance

>Try online without registration
txt2img: https://www.mage.space
img2img: https://huggingface.co/spaces/huggingface/diffuse-the-rest
sd3: https://huggingface.co/spaces/stabilityai/stable-diffusion-3-medium

>Related boards
>>>/h/hdg
>>>/e/edg
>>>/d/ddg
>>>/b/degen
>>>/vt/vtai
>>>/aco/sdg
>>>/trash/sdg
>>
blessed thread of frenship
>>
>>101674851
>Previous thread lasted 4 hours
Nothing like a new model to get anon going
>>
File: ComfyUI_Flux_0223.jpg (137 KB, 832x1216)
>>
File: file.png (1.43 MB, 1280x800)
>>
the hired saas shill worked, ahhh /ldg/ is playing 4d chess with itself
>>
File: ComfyUI_00275_.png (1.57 MB, 1024x1024)
>>
Flux bad. Come at me.
>>
File: 00707-5038312178.jpg (138 KB, 1024x1024)
>>
File: file.png (441 KB, 1024x1024)
>>101675033
no need, i bring a solution
>>
File: ComfyUI_00034_.png (1.47 MB, 1024x1024)
Even on fp8 DiT, the pictures are impressive as fuck, too bad I still hate comfyUI and this shit unloads and reloads models every time I make a fucking gen, reeeeeeeee
https://github.com/comfyanonymous/ComfyUI/issues/2046
>>
all the parameters in the world won't save you from a slopped dataset
>>
>sd3 meme gens
>flux meme gens
One can foretell the future by looking into the past
>>
File: 00320-2152674471.jpg (105 KB, 1024x1024)
>>
File: file.png (1.06 MB, 1024x1024)
>>
File: file.png (2.12 MB, 1280x800)
>>101675108
There's only memes and porn.
>>
File: ComfyUI_00036_.png (823 KB, 1024x1024)
>>101675106
in flux we trust
>>
Synthetic datasets are the future. The reason Sora turned out so good was the use of computer-generated data to train spatial and temporal awareness
>>
>>101675134
kingdom hearts was made before ai
>>
File: file.png (1.2 MB, 1024x1024)
my eyes gonna break soon
>>
>>101675149
picrel me
>>
Has anyone tried running flux on an APU with DDR5 RAM?
>>
thread challenge: make the biggest pair of anime boobs you can with flux, bonus points for making me cum
>>
File: file.png (1.27 MB, 1024x1024)
>>101675162
missed the button apparently
>>
File: 00829-3712576348.jpg (63 KB, 1024x1024)
why does it sometimes make output very blurry?
>>
>>101675181
only small booba unfortunately :(
>>
File: file.png (1.17 MB, 1024x1024)
>>101675201
>>
File: out-0-1.png (1.05 MB, 1024x1024)
>>
File: 00650-1636729884.jpg (134 KB, 1024x1024)
>>
File: lele.jpg (286 KB, 3041x1642)
>>101675193
yeah I noticed too, sometimes it gets really blurry, dunno if that's a vae issue or something else
>>
>>101675106
>just a billion more images
>t. eurofag
>>
8gb vram 32gb ram
is there any hope for me to run flux?
>>
File: ComfyUI_00290_.png (1.01 MB, 1024x1024)
>>101675181
One boobie coming up
>>
File: file.png (871 KB, 1024x1024)
>an anime picture of hatsune miku hanging herself, low angle, suicide
anyone figured out how to gen people hanging themselves with flux?
>>
>>101675260
not a chance lol, even on fp8 it asks for 13gb of vram
>>
File: ComfyUI_169867_.png (1.14 MB, 1024x1024)
>>101675235
It's a CFG issue, these models don't work with CFG. Set it to 1.0 if using the regular sampler node.

>>101675260
If you use the fp8 option.
>>
>>101675278
be more descriptive, like "a rope is surrounding her neck" or some shit I guess?
>>
File: ComfyUI_Flux_0235.jpg (248 KB, 832x1216)
>>
File: file.png (1.52 MB, 1024x1024)
>>
File: wtf.jpg (76 KB, 2818x390)
>>101675293
>It's a CFG issue, these models don't work with CFG. Set it to 1.0 if using the regular sampler node.
Holy shit, I've put CFG = 1.0 on my latest gen and now the speed is like 2 times faster, wtf
>>
File: file.png (1.2 MB, 1024x1024)
>>101675306
>an anime picture of hatsune miku hanging herself, a rope is surrounding her neck, low angle, suicide
it's getting there...
>>
>>101675335
T5 is very literal, try adding rope collar
>>
>>101675340
>T5 is very literal, try adding rope collar
can we replace T5 with something else or?
>>
File: file.png (1.59 MB, 1216x832)
>>101675316
>>
File: ComfyUI_00042_.png (974 KB, 1024x1024)
Can't believe we managed to get a dalle3 with better quality pictures at home, and it's only using 12gb of vram on fp8, feelsgoodman
>>
File: file.png (1.28 MB, 1024x1024)
>>101675340
>T5 is very literal, try adding rope collar
i didn't know that, ty for the tip
>an anime picture of hatsune miku hanging herself, a rope is surrounding her neck, she is wearing a rope collar, low angle
>>
>new model
>24GB VRAM
>don't feel like swapping the 7900XT in the linuxbox with the 7900XTX in the gaymingbox
>>
File: ComfyUI_169870_.png (1.25 MB, 1024x1024)
>>101675328
That's what happens when the model only needs to do half the work because there's no negative prompt.
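The 2x speed-up follows from the classifier-free guidance math: each sampler step normally runs the model twice (once with the prompt, once unconditioned) and blends the two noise predictions; at cfg = 1.0 the blend collapses to the conditional prediction alone, so the unconditional pass can be skipped. A toy sketch with made-up numbers, not ComfyUI's actual code:

```python
def cfg_blend(cond, uncond, scale):
    # classifier-free guidance: start from the unconditional prediction
    # and push it toward the conditional one by `scale`
    return [u + scale * (c - u) for c, u in zip(cond, uncond)]

cond = [0.25, -0.5, 1.0]    # made-up conditional noise prediction
uncond = [0.0, 0.5, 0.5]    # made-up unconditional noise prediction

# at scale 1.0 the unconditional term cancels out entirely,
# which is why the sampler can skip that second model pass
assert cfg_blend(cond, uncond, 1.0) == cond
```

This is also why a negative prompt stops doing anything at cfg = 1.0: the negative (unconditional) prediction drops out of the blend.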
>>
what's going on with audio generation? I remember a model coming out months ago and then I suddenly stopped hearing about it.

where are the threads? did (((they))) shut it down?
>>
is flux 16ch?
>>
File: ComfyUI_169871_.png (1.28 MB, 1024x1024)
>>101675389
https://comfyanonymous.github.io/ComfyUI_examples/audio/
Stable audio is a thing that exists.
>>
>>101675388
what? if this model doesn't work with cfg, that means it doesn't work with negative prompt?
>>
>>101675401
Good question
>>
File: ComfyUI_Flux_0241.jpg (104 KB, 832x1216)
>>
File: what.jpg (338 KB, 2210x1501)
I tried to change to 1400x1400 resolution and I got this error
>>
>>101675422
1536x1536 worked for me. but the image looked very overcooked
>>
File: ComfyUI_169872_.png (1.14 MB, 1024x1024)
>>101675408
Exactly. Positive prompts only.

>>101675422
Yeah some specific resolutions won't work at the moment, I have to fix that.
>>
bigma.......
>>
>>101675441
>Exactly. Positive prompts only.
that's insane... negative prompts are really important though, and flux can't do that? damn...
>>
>>101675453
Flux only makes Bigma stronger because it's motivation. AI models are an arms race.
>>
>>101675441
i want to kill that creature
>>
File: file.png (1.33 MB, 1280x800)
>>
>>101675306
>>101675335

You hang yourself with a "noose" fucking ESL faggots.
>>
>>101675472
>AI models are an arms race
nope, it's a finger race now. both kolors and flux have set a standard for hands
>>
>>101674851
>>
I guess artists have no reason to complain if the models can't generate a painting!
>>
Have we finally found the new base model friends?
>>
File: ComfyUI_169879_.png (1.36 MB, 1024x1024)
>>10167547
She can defend herself.
>>
>>101675520
The issue is how expensive it will be to finetune. Joe Schmoe can't get it done on his 3090 anymore.
>>
>>101675293
>It's a CFG issue, these models don't work with CFG.
the API has a CFG setting though
https://replicate.com/black-forest-labs/flux-dev
>>
>>101675520
don't know, gotta wait 2 more months and see what happens
>>
>>101675543
dev version works with ~2 cfg. It's not that it doesn't support it, it's just that distillation requires lower cfg, just like lightning models
>>
File: ComfyUI_00031_.png (2.14 MB, 1536x1536)
>>101675436
1536x1536 looks fine for me on schnell, but like a blurry mess on dev
>>
File: file.png (1.59 MB, 1280x800)
>>101675501
not sure if lying, schizo or incompetent
>>
>>101675556
but I'm running flux-dev, isn't it also a distilled model?
>>
>>101675520
>>101675542
The base model already requires 24GB of VRAM.
>>
>>101675542
QLoRA for diffusion image models when?
>>
File: ComfyUI_00033_.png (2 MB, 1536x1536)
>>101675558
same on dev .. not sure whats better?
>>
File: ComfyUI_Flux_0249.jpg (201 KB, 1536x1536)
>>101675558
maybe its scheduler related? ddim_uniform looked more cooked than picrel with simple scheduler. flux dev fp8
>>
>>101675580
Can run on 12GB vram with 8 bit.
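The VRAM numbers thrown around in this thread line up with a back-of-the-envelope estimate: weight memory is just parameter count times bytes per parameter. Flux.1 is commonly cited as a ~12B-parameter transformer (an assumed figure, not stated here), which gives ~24 GB of weights at fp16 and ~12 GB at fp8, before counting the T5-XXL text encoder, VAE, and activations:

```python
def weight_gb(params_billion, bytes_per_param):
    # weights only, in decimal GB; the text encoders, VAE and
    # activations all add on top of this
    return params_billion * bytes_per_param

FLUX_PARAMS_B = 12  # assumed ~12B-parameter transformer

assert weight_gb(FLUX_PARAMS_B, 2) == 24  # fp16/bf16
assert weight_gb(FLUX_PARAMS_B, 1) == 12  # fp8
```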
>>
So Flux doesn't support negative prompts? The example Comfy workflow has a positive prompt only.
That's a pretty huge limitation even without taking into account the slopped aesthetic and inability to do painting.
>>
>>101675541
https://www.reddit.com/r/comfyui/comments/17h66ld/comment/k6mxxac/?utm_source=share&utm_medium=web2x&context=3
Can you consider making multi-GPU possible now? We have a great model that is tough to run on a single 24GB card; it would be cool if we could split it across multiple cards instead of relying on fp8
>>
guess i'll just shleep till bigma then
>>
>>101675616
It does, guy did not know what he was talking about. CFG just needs to be low on distilled models
>>
>>101675608
>>101675558
so only dev makes blurry pictures like that? dunno if I should switch to schnell? what's the difference between the 2?
>>
>>101675500
give her armpit hair
>>
>>101675616
If you have a prompt you can have a negative prompt. It's just math: positive - negative = prompt
>>
File: ComfyUI_00034_.png (2.38 MB, 1536x1536)
>>101675610
could be; karras on dev seems to not work at all for me .. also is it a different dataset? and what res does the dataset have? details can pop at 1536x1536 but it hugely depends on the subject you prompt
>>
>>101675641
schnell seems to be the sdxl lightning equivalent
>>
>22.28s/it
This is painful.
>>
File: ComfyUI_169880_.png (1.11 MB, 1024x1024)
>>101675543
That's not real "cfg" it's an extra condition passed to the model.

https://github.com/comfyanonymous/ComfyUI/commit/a531001cc772305364a319a760fcd5034e28411a

You can use this new node to play around with it.

>>101675620
We'll see.
>>
>>101675641
omg... change cfg. schnell is like using a lightning model, it needs lower cfg
>>
>5g vram
Is it over for me?
>>
https://blackforestlabs.ai/announcing-black-forest-labs/
>FLUX.1 [dev]: FLUX.1 [dev] is an open-weight, guidance-distilled model for non-commercial applications.
>FLUX.1 [schnell]: our fastest model is tailored for local development and personal use. FLUX.1 [schnell] is openly available under an Apache2.0 license.
Uh oh...
>>
File: file.png (1.16 MB, 1024x1024)
>>101675680
yup :D
>>
>>101675558
transformer-based models just seem way worse at out of distribution resolutions in general than diffusion models, I found the same thing with SD3

like when you make a diffusion model do an OOD resolution the texture of the image will still be fine, it'll just loop on the composition, so if you can use tactics to prevent the looping then you can push the limits of resolution quite well
but when you give a transformer model an OOD resolution it just starts falling apart, it doesn't loop, the image just turns to incoherent pixel mud
>>
File: dzdzdz.jpg (246 KB, 2658x1495)
>>101675678
>omg... change cfg. schnell is like using a lightning model, it needs lower cfg
both need low cfg, flux-dev is also a distilled model
>>
>>101675696
dam
it was nice to know you ldgbros
>>
File: file.png (626 KB, 1024x1024)
>>
>>101675706
that's cute but in practice it's stupid
>>
Show me Flux's "double exposure"
>>
>>101675671
>>https://github.com/comfyanonymous/ComfyUI/commit/a531001cc772305364a319a760fcd5034e28411a
>You can use this new node to play around with it.
I have no idea what it is? how do I make it work?
>>
>>101675727
you have not articulated yourself clearly
>>
File: ComfyUI_00039_.png (3.32 MB, 1536x1536)
>>101675641
damned if I'd know, the huggingface page lists very few details, but the same prompt can have very different results on both models: dev produces anime on one prompt for me, while schnell makes photorealism ... schnell works on karras, while dev fails on karras, I wish they had some documentation
>>
File: file.png (1.27 MB, 1280x800)
>>101675731
seems fine
>>
realistically can bigma even reach this level? if it does i might die of hype.
>>
>>101675743
there's this >>101675685
>>
>>101675749
I'll count that
>>
>>101675542
I think normies can do smaller finetunes with less than 50 bucks when they rent H100 or two for a day.
>>
>>101675749
>>101675763
Not as good as base sigma desu but better than the others
>>
>>101675767
numb brain here, can you finetune a distilled model?
>>
>>101675831
That's a not cherry picked lazy gen. It's the first thing that came out.
>>
>>101675839
yes
>>
>>101675845
Relax anon we are on the same side
>>
>>101675858
My point is you can't really qualify how good it is because it's one sample. It knows the effect, I'm sure you can achieve something better.
>>
>>101675671
https://github.com/comfyanonymous/ComfyUI/commit/a531001cc772305364a319a760fcd5034e28411a
>You can use this new node to play around with it.
First time in my life I use ComfyUI, how do you load this node?
>>
>>
>>101675750
The 1.3B model will not be able to reach the level of a larger model. Its primary advantage will be that it's truly uncensored and will have a lot of pop culture knowledge. Who knows what the official Pixart model will target, I assume they're going to do a 24 GB squeezer too and it might be really good, they certainly proved they're kings of efficiency.
>>
File: ComfyUI_Flux_0255.jpg (220 KB, 1536x1536)
>>101675876
>CLIPTextEncodeFlux

double click on a blank spot and try searching for that
>>
File: ComfyUI_169889_.png (1.2 MB, 1024x1024)
>>101675876
>>101675734
You update, double click on the canvas and search for Flux and then click on CLIPTextEncodeFlux.
Then you replace the CLIPTextEncode node in your Flux workflow with it.
>>
if the official sigma successor is a 24gb squeezer what will we be calling it?
>>
>>101675922
Squeezema
>>
Both Flux models seem unable to generate illustrations of flat-chested lolis in a swimsuit.
>>
Question: If someone makes a finetune for flux-schnell, can someone use it with Dev?
>>
>>101675938
Yeah, but they're able to generate flat-chested children in a swimsuit.
>>
File: file.png (2.57 MB, 1024x1024)
i jus want a bigger pixart model grumble grumble
>>
File: file.png (991 KB, 1280x800)
>>101675938
FBI please
>>
im using a realistic pony checkpoint, and i want more vivid colors so i dont have to adjust it in photoshop. should i be trying to find a color vae, or a lora, or just try to force it with prompting/
>>
File: aa.jpg (225 KB, 2848x1230)
>>101675919
>>101675920
like this? so the "guidance" thing is like the cfg of distilled models?
>>
when i get my hands on flux itll be over for you
>>
ugh these models that were trained on AI captions are doa for anyone who wants to generate paintings
auraflow had this problem and flux seems to as well, where they don't even know the names or styles of famous dead public domain painters
>>
>>101675968
SDXL is fundamentally fucked when it comes to colors
>>
>>101675954
They really are. Only problem is, I don't care about realistic photos.
>>
>>101675959
Buy me a computer with four H100s
>>
File: ComfyUI_00495_.jpg (140 KB, 1024x1024)
A cat riding a bicycle with wheels made of pizza on the surface of the Moon. Planet Earth is exploding in the background.
>>
>>101675974
sorry pal, show's over, everybody's gone back to waiting for the next thing mode.
>>
>>101675980
at this point, more pretraining must be done on those models to add the styles and stuff
>>
>>101676008
what about a few words of encouragement instead?
>>
>>101675989
that sucks, are there any techniques people use to adjust? like control nets or something? i cant go back to 1.5 after seeing how sdxl can generate feet and stuff
>>
File: asasa.jpg (275 KB, 3108x1636)
>A sculture of Donald Trump made only with bottles
What the fuck??
>>
File: ComfyUI_169890_.png (1.23 MB, 1024x1024)
>>101675970
Set CFG to 1.0 or use the example workflow.
>>
will we be getting native pixart support on comfyui one day?
>>
>>101676055
you're prompting on clip_l anon, that's not the good one, go for the t5xxl slot
>>
>>101675671
>>101675920
you have mixed up t5 and clip, they have opposite effects
>>
>>101675963
Or flat-chested anime girls in a swimsuit / bikini / underwear in general. Just try, Flux will always add fairly large boobs. How do you control breast size with these models?
>>
>>101676064
Does the dual clip loader run t5 on CPU/RAM by default? That's what I want and it's what seems to be happening, just want to confirm
>>
>>101675671
Is there a way to remove the "clip_l" slot with a "remove button" or something? that shit is useless and take some space at the end
>>
>people using flux pro in replicate and other API sites
>don't realize that schnell is what will get any support from finetune community, if it even get it, since it's still very unclear how hard it is to train
>>
>>101676161
just like sd3 huh
>>
>>101676161
what about flux?
>>
>>101676168
cept with apache 2.0 license and not being shit
>>
because flux is distilled, it means it's supposed to work at 4steps?
>>
>>101676224
8
>>
File: asasa.png (3.07 MB, 3116x3108)
desu I don't see much difference between pro and flux, It's rare that companies are making a local model at the same level as their "best" API model
>>
>>101676257
how do you know it's 8?
>>
>>101676326
default workflow had 8. Should work at 4 as well just with less detail
>>
>>101676316
neither of them is remotely in the style of picasso like you asked for, though
none of these new AI-captioned models coming out seem to understand style for shit

old models were great at style but had shit coherence, now it seems we've swapped the two
>>
>>101676055
Negative prompt still doesn't seem to work with this setup. What do
>>
>>101676316
You mean "dev" and "pro"?
Also, try to remember that "dev" is not Apache 2.0 and will most likely never get big costly finetunes like Pony, because it will cost like a hundred thousand dollars and you can't monetize it in any way.

There is a third one, "schnell", which is the truly open version, but it's a distilled, few-step model that requires at least 16 GB of VRAM to operate and will be a bitch to train, and when you use schnell, you realize that training is what it needs. It lacks styles and concepts.
>>
File: Comparaison-Steps.jpg (1.09 MB, 3072x1024)
>>101676345
Sounds like steps = 10 is the sweet spot
>>
>>101676360
Artists stuff are removed due to legal concerns. Its still a very grey area.
>>
>>101676372
>There is third one, which is "schnell", and that is the truly open version, but it's distill
Flux is also a distilled model
https://blackforestlabs.ai/announcing-black-forest-labs/
>FLUX.1 [dev]: FLUX.1 [dev] is an open-weight, guidance-distilled model for non-commercial applications. Directly distilled from FLUX.1 [pro]
>>
>>101676382
No it's because of AI captioning
That's why they don't know dead public domain artists from the 18th century either
>>
when I change the weight_dtype between fp8_e4m3fn and fp8_e5m2 I see no difference, what's it supposed to do again?
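Seeing no difference is plausible: the two fp8 formats trade precision for range. e4m3fn keeps 3 mantissa bits but only reaches ±448, while e5m2 keeps 2 mantissa bits and reaches ±57344; if the weights fit comfortably in the smaller range, both quantizations land close to the same values. A sketch of the maximum finite values (these are bit-layout facts about the formats, not anything measured on Flux):

```python
# fp8_e4m3fn: 1 sign, 4 exponent (bias 7), 3 mantissa bits, no infinities.
# The all-ones exponent is reused for finite values; only mantissa 111
# there encodes NaN, so the largest finite value has mantissa 110.
e4m3fn_max = (1 + 6 / 8) * 2 ** 8    # 1.75 * 256 = 448.0

# fp8_e5m2: 1 sign, 5 exponent (bias 15), 2 mantissa bits, with
# IEEE-style inf/NaN, so the largest finite exponent is 11110.
e5m2_max = (1 + 3 / 4) * 2 ** 15     # 1.75 * 32768 = 57344.0

assert e4m3fn_max == 448.0
assert e5m2_max == 57344.0
```

In short: e4m3fn is the finer-grained choice for weights, e5m2 the wider-range one.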
>>
>>101676402
yeah but it knows celebrities though so...
>>
>>101676402
>not trained on data
>REEE its the ai captioning. SLOP SLOP SLOP
>>
>>101676407
show us 2 pictures with the exact same seed and settings to be sure there's no difference
>>
>>101676419
meds
>>
>>101676402
there is no way Picasso fails to be picked up by even v1 of llava. they were removed.
>>
>>101676449
If that's true, then not coding in an exception for dead public domain painters is lazy as fuck

it's a few hundred names at most, assembling a list to make a whitelist of allowed artist names would have taken 10 minutes
>>
>>101676392
They are all "flux". There are three flux.1 variants.
>Pro
>Dev
>Schnell

Pro is what you will never get to run locally. It is the one they are monetizing. Dev is what you *can* play with locally, but you can't monetize it, and you must agree to their license (https://huggingface.co/black-forest-labs/FLUX.1-dev/blob/main/LICENSE.md), which restricts you. Schnell is the one you can do whatever you want with.
>>
>Requested to load FluxClipModel_
Is it mandatory to load clip_l? Why does this model have 2 text encoders?
>>
>>101676486
>lazy as fuck
this has been the case for local datasets since sd 1.4, can't really expect much to change.
>>
>>101676490
yeah my b, what I wanted to say is that dev is as much a distilled model as schnell
>>
File: ComfyUI_temp_obxpl_00001_.png (1.46 MB, 1120x1440)
>>101676316
what would be the equivalent of interval in sd language, a loopback? a detailer node?
>>
>>101676490
Pro is also the base model. I don't understand why more people aren't concerned about that.
>>
File: 1712724608509419.png (920 KB, 1220x540)
>try to make Kolors draw closed lips challenge (impossible)
REEEEEEEEEEEEEEE
>>
File: ComfyUI_00287_.jpg (634 KB, 2048x2048)
how are you supposed to get rid of the blurry ass backgrounds with no negative prompt
>>
>>101676538
yeah, it's retarded to pretend that negative prompt isn't important, it fucking is
>>
>>101676538
>how are you supposed to get rid of the blurry ass backgrounds with no negative prompt
you put a low cfg value like 1.5 so that you can use negative prompts
>>
>>101676547
ugh, you must be a dallejourney shill
>>
>>101676570
I'm not lol, I fucking love this model, we literally have dalle3 with better realistic humans at home, it's just that negative prompt is really important though
>>
>>101676583
https://github.com/comfyanonymous/ComfyUI/commit/a531001cc772305364a319a760fcd5034e28411a
>>
>>101676593
that doesn't change anything, you can't use negative prompt at cfg = 1, regardless of the guidance scale
>>
>>101676602
put cfg at 1.1 then
>>
>>101676602

This, this model needs really low cfg, like 1.5 or a bit lower
>>101676612
>>
File: ezgif-6-3ce5c3ff22.gif (1.03 MB, 1024x1024)
>>101676642
>>101676612
yeah, even at 1.5 it starts to hurt the output
>>
>both of the released sets of weights, the Schnell and the Dev model, are distilled from the Pro model, and probably not directly tunable in the traditional sense

It's over
>>
>>101676724
you can tune distilled models, just takes longer to make up for the loss
>>
File: asa.jpg (243 KB, 2945x1566)
euler_a doesn't seem to work on flux-dev
>>
>>101676766
tried any custom SCHEDULERS?
>>
>>101676772
no lol
>>
wonder if an ipadapter could fix flux not being able to do painting styles
you could use ipadapter to do style transfer maybe with a painting as input
>>
>>101676766
ancestral samplers don't seem to work with SD3 either, maybe there's something about them that makes them only work for diffusion models and not transformers?
>>
File: nono.png (2.06 MB, 2214x1391)
I really don't recommend using the fp8 version of the text encoder, this shit is ass
>>
>>101676851
it's been a while since I've used SD models, and there's a shit ton of samplers now, is there one that's above the rest or are they all on the same level?
>>
What s/it we suffering at boys
>>
>>101676752
Did they even release training code?
>>
>>101676851
Ancestral samplers work on sigma
>>
>>101676852
So we have no other choice but to use two text encoders for this model? can this be two t5xxl instead of t5xxl + clip_l?
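For context on why there are two slots at all: the two encoders play different roles. clip_l contributes a single pooled vector, while t5xxl contributes one embedding per token, which is what carries fine-grained prompt following, so they aren't interchangeable. A shape-only sketch (the 768/4096 widths are the usual CLIP-L and T5-XXL hidden sizes, assumed here, and the zeros are placeholders):

```python
def build_conditioning(t5_token_count, clip_width=768, t5_width=4096):
    """Shape-only mock of dual-encoder text conditioning:
    one pooled CLIP vector plus one T5 embedding per token."""
    pooled = [0.0] * clip_width                 # clip_l pooled output
    seq = [[0.0] * t5_width                     # t5xxl per-token outputs
           for _ in range(t5_token_count)]
    return pooled, seq

pooled, seq = build_conditioning(t5_token_count=10)
assert len(pooled) == 768
assert len(seq) == 10 and len(seq[0]) == 4096
```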
>>
>>101676888
I thought pixart was a diffusion model
>>
File: ComfyUI_00069_.png (1021 KB, 1024x1024)
The advantage of having dalle3 at home is that I won't have to be cautious of triggering any cucked censorship, feelsgoodman
>>
File: 1699296391689700.png (167 KB, 575x774)
First time ever using Comfy, how the FUCK do I load Flux properly? All the pics it's been spitting out so far look dark and noisy as fuck.
>>
>>101675377
>>101676953
This is extremely dangerous to our democracy.
>>
>>101676975
download this picture >>101676953
and click on load on ComfyUi, it will use the working node and you're good to go
>>
>>101676996
4chinz removes pic metadata, got a catbox?
>>
they obviously did not fully strip artist names
prompt here was "art by Makoto Shinkai", you can see it clearly recognizes him

so I think the anon that said the lack of artist knowledge was due to AI captioning rather than intentional stripping is likely correct
>>
>>101677006
oh my b, there you go anon
https://files.catbox.moe/k50f9f.png
>>
File: ComfyUI_00071_.png (1.26 MB, 1200x1200)
sometimes it adds artist signatures, this one doesn't seem to exist though
>>
>>101677028
Thanks, I'll give that a try.
>>
>>101677014
i think the answer is both in a way. the autocaptioner loses some niche ones while the pruning is hitting bigger ones. i've seen at least 3 different people try picasso and fail each time. i mean if it can't do that then idk what to say, maybe it's an architectural issue or something i don't really know.
>>
>>101677014
another test, this was
>art by Margaret Olley
again it clearly knows this name, because this is the kind of thing she painted, still life of flower vases and such

so it's not total doom for painting enjoyers
>>
>>101677059
Ahh man, nice texture on this one too. Oil paint with a canvas texture. So it can do that after all.
>>
File: IsThisAJoke.jpg (321 KB, 2992x1594)
No fucking way... it doesn't know who Rei Ayanami is...
>>
File: ComfyUI_00074_.png (1.28 MB, 1024x1024)
1.28 MB
1.28 MB PNG
>>101677059
>it clearly knows this name, because this is the kind of thing she painted, still life of flower vases and such
I tried using that style and it didn't work, it didn't include Trump too but only his hair onto Hatsune Miku top kek
>>
File: auraflow cat.png (759 KB, 793x758)
>>101677059
Yeah that looks very good, yet the ones i've tried for obvious artists have come out terrible. I wonder if this is similar to that Auraflow cat thing. Where 1/4 of the synthetic dataset produced a 'no nsfw cat' so it wound up reflecting in random prompts due to it being assigned captions regardless. Perhaps there was synthetic data used from garbage midjourney gens where people throw in "By Yoshitaka Amano, by Pablo Picasso, by Wlop, by Greg Rutkowski" and out comes a slopped-up 1girl that looks nothing like any of them. If they trained on those captions directly, then it would make sense that the more popular artists have their styles completely watered down from misuse. This is now my leading schizotheory after thinking about it more.
>>
>>101677014
interesting
>>
File: ComfyUI_13062_.jpg (449 KB, 1280x768)
>>
I'm late. Does it do NSFW? How pozzed is their dataset? What about overall anatomy.
>>
>>101677153
>Does it do NSFW?
Yes
> How pozzed is their dataset?
Doesn't know much style and anime characters :(
>What about overall anatomy.
Really great, API level great
>>
File: Capture.jpg (380 KB, 3119x1514)
>>101677014
>they obviously did not fully strip artist names
>prompt here was "art by Makoto Shinkai", you can see it clearly recognizes him
it feels like as soon as you use a character that's known for its own style, it's hard to make it translate to a different style, that sucks
>>
>>101677137
so you're saying my esoteric knowledge of classical oil painter names that normal people have never heard of is finally going to be useful
>>
>>101677141
I swear to god, you give a random guy this picture and he'll believe it's from a random anime, that's great
>>
>>101677137
that's an interesting theory anon, another reason to not use synthetic data, it will train the model on wrong representations of real artists
>>
>>101677188
I wish but I tried that on Winterhalter, Albert Lynch, Hans Gude, etc and it just failed every one. It doesn't even do them wrong, it just doesn't do them at all
>>
How do we finetune it?
>>
>>101675377
>>101675615
Works even on 10gb 3080 with 8bit :)

(comfy launched with --novram probably helps too)
>>
File: ComfyUI_temp_vvjht_00363_.png (1.93 MB, 1024x1024)
localchads won
>>
>>101677280
What speed are you getting? 2 hours per image?
>>
is there really a need for a diffusion general on almost every board, even /b/? seems suspicious to me, it's very under powered and not really "ai" but it's being shoved in everywhere on 4chan
every single one of these generals is bumped by bots
>>
>>101677188
>>101677212
>Painting of Super Mario's head, classical painting by Hans Gude, Winterhalter, Albert Lynch, Michelangelo.
I either get this generic 3d render on top of a paint-ish background, or some random airbrushed guy that looks nothing like mario on top of a paint-ish background. The model being unable to put characters into a 'le classic painting' style is really odd given this was one of the very first uses of SD back in 1.4 and WD 1.2 with the victorian booba girls.
I really don't think it's 'style pruning' here as even with every artist name pruned the autotagger should recognize 'a classical painting'. Am I a promptlet? Is such a prompt not enough?
>>
>>101677299
The /b/ one is definitely needed
>>
>>101673628
>>101674725

If synthetic images make you that upset, why not train your own models then rather than getting angry at others?

Maybe you can convince someone gullible to throw you some training money as you're obviously an expert on the subject.

Meanwhile others are going to keep using real images and synthetic images in our models.

>>
File: aas.png (2.53 MB, 2100x1024)
2.53 MB
2.53 MB PNG
https://blog.fal.ai/auraflow/
I'm testing whether flux dev is as good as auraflow at prompt understanding, left is flux dev, right is auraflow
>A photo of a beautiful woman wearing a green dress. Next to her there are three separate boxes. The Box on the Right is filled with lemons. The box in the Middle has two kittens in it. The Box on the Left is filled with pink rubber balls. In the background there is a potted houseplant next to a Grand Piano."
>>
>>101677314
no need to seethe over being called out. just improve next time, it's okay to be wrong
>>
>>101677307
Auraflow has this exact problem too, where content overwhelms style
>>
File: CascadeMario.png (793 KB, 1024x1024)
793 KB
793 KB PNG
>>101677307
Cascade, which seems to do a better job at it. Dataset, architecture, both?
>>
File: NotBad.png (2.79 MB, 2048x1024)
2.79 MB
2.79 MB PNG
>>101677317
>a cat that is half orange tabby and half black, split down the middle. Holding a martini glass with a ball of yarn in it. He has a monocle on his left eye, and a blue top hat, art nouveau style
it's a shame that it doesn't know much about styles or actual-character trivia, because it follows prompts really well
>>
File: ComfyUI_00080_.png (1.56 MB, 1024x1024)
1.56 MB
1.56 MB PNG
>>
>>101677356
data, always
>>
>>101677356
>Dataset, architecture, both?
I'd go for synthetic captions, they probably used too much of them and destroyed all the artist names in the process
>>
File: ComfyUI_13066_.jpg (478 KB, 1280x768)
478 KB
478 KB JPG
>>101677192
Harder to gacha text when it takes 2 minutes per image is the only issue
Add LoRA support and it opens the road to fake spoilers and gaslighting people about shows kek
>>
>>101677307
>sonic the hedgehog. impressionist style painting with wide brush strokes. visible paint daubs, canvas texture
>>
>>101677476
about the furthest thing from impressionist style out there, but at least it got the brush strokes
>>
>>101677476
>>101677495
>impressionist style painting with wide brush strokes. visible paint daubs, canvas texture. homer simpson
yeah, not impressionist at all. but seems to work for getting it to do an actual painting and not digital art or a render
>>
File: ComfyUI_00085_.png (1.02 MB, 1024x1024)
1.02 MB
1.02 MB PNG
>>101677356
>Cascade, which seems to do a better job at it. Dataset, architecture, both?
>>
Have you tried other samplers than euler? Maybe there's one that works great with flow dev
>>
File: Fruit.png (1.86 MB, 1024x1024)
1.86 MB
1.86 MB PNG
>>101677411
Don't think so entirely, as cogvlm can recognize a painting even if it loses the artist. Certainly the case for it not understanding Winterhalter, etc, but not for painting in general. It seems you really need to force-feed it the idea of a painting or else it fails especially when combined with characters. I'm able to do "A detailed classical painting of a variety of fruit" and it comes out pretty fine. Likely something architecturally I guess if it starts to get stubborn when combined with different concepts.
>>
>>101677549
I tried dpm2 that someone suggested, and it came out pretty fried and nasty. euler remains eternally winning amidst a sea of snakeoil
>>
>>101677575
works on my machine
>>
File: 1698265304378588.png (1.66 MB, 768x1280)
1.66 MB
1.66 MB PNG
flux is SEXO
>>
>>101677630
it almost always has perfect hands, that's so cool to see, looks like hands are a problem of the past now
>>
>>101677515
kek
>>
>>101677651
lmao
>>
File: ComfyUI_00091_.png (877 KB, 1024x1024)
877 KB
877 KB PNG
>>101677587
>works on my machine
But does it work on your atachine anon?
>>
>>101677293
14s/it so around 4-5 mins per image at 1024x1024, not great but seems about normal
>>
>>101677288
localchads always win
>>
>>101677683
Huh, way better than I thought.
>>
File: ComfyUI_00089_.png (2.86 MB, 1536x1536)
2.86 MB
2.86 MB PNG
That's insane, it reproduced the Roman numerals on the clock perfectly
>>
fp16 is too big but fp8 is too different from fp16, how about fp12? is that even possible?
>>
Is there a quick setup guide for flux on comfyui? What folders do I put each file in, what settings to use.
>>
local walks so saas can run. as a midjourney and dall-e user, i thank you all for your service. none of our amazing closed models would be possible without your free research.
>>
>>101677751
yes
https://comfyanonymous.github.io/ComfyUI_examples/flux/
>>
>>101677759
I'm crying
>>
>>101677718
fwiw this is on Linux and I haven't updated the Nvidia driver in a while so as far as I know there is no VRAM->RAM offload like on Windows. Sitting on 9.8gb VRAM usage though lol
>>
File: 1721201955020516.png (1.28 MB, 1064x1192)
1.28 MB
1.28 MB PNG
Can Flux generate Laura Kinney in a bikini?

I might just stick with pony
>>
>>101675567
catbox?
>>
Why did they go for 12b, it's a bit too big for my 24gb vram card ;-;
>>
>>101677828
works on my 3090 THOUGH
>>
>>101677811
it's by far the best we have ever gotten from a base model. We just need one of the big finetuners to take the leap and we will have THE next gen model
>>
>>101677828
I'm running it with 20GB VRAM.
>>
>>101677835
mine too (3090 chads rise up) but barely. the vae decode takes ages in fp16 because the vram fit is so tight. I have to drop to 8bit if I want my computer to be usable at all while I'm genning.
>>
>>101677811
I highly doubt it knows who that is
>>
>>101677840
Anons said you need a stupid amount of vram to run it
>>
>>101677835
>>101677843
for me it reaches the limit and everything slows down, fock, maybe I should disable hardware optimisation on chrome or something
>>
>>101677828
It fits; 8 bit even fits on 12GB, and the difference in quality is minimal
>>
>>101677851
If 12GB is a stupid amount I guess. You could edit some files and try it at 4bit to maybe get it to work on a 6GB card
>>
File: 1708197331614845.png (1.1 MB, 1024x1024)
1.1 MB
1.1 MB PNG
>>101677851
>>
>>101677854
>>101677843
>>101677835
24gb is still not enough though, it only runs the image model; if you want to add the text encoder it would need an insane amount of vram
>>
>>101677873
follow the instructions anon... even with FP16 everything it fits in 24GB with room to spare
>>
>>101677881
no, only the image model fits in vram. try the --highvram flag and you'll see your gpu get oversaturated
>>
>>101677873
Yeah I'd imagine we would get to a point where we would need to work on multi-gpu support
>>
>>101677881
nta but nah, the t5 model is being run on your cpu/system ram
>>
>>101677849
Pony it is
>>
>>101677902
Stop doing retarded shit and just use someones workflow from the 1000 catbox links / the site
>>
>>101677904
>Yeah I'd imagine we would get to a point where we would need to work on multi-gpu support
I'd be ok with hybrid cpu/gpu inference on the text encoder, instead of putting the whole text encoder on the cpu when it can't fit onto the gpu; that's something that has been done in the llm community
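the partial-offload idea is basically llama.cpp's -ngl applied to the text encoder. a minimal sketch in plain PyTorch, assuming a flat list of transformer blocks (not any real text-encoder implementation):

```python
import torch
import torch.nn as nn

def hybrid_forward(blocks, x, n_gpu_layers, gpu="cuda"):
    """Run the first n_gpu_layers blocks on the GPU and the rest on CPU,
    instead of pushing the whole encoder to one device."""
    x = x.to(gpu)
    for block in blocks[:n_gpu_layers]:
        x = block.to(gpu)(x)   # GPU-resident layers
    x = x.to("cpu")
    for block in blocks[n_gpu_layers:]:
        x = block(x)           # CPU-resident layers
    return x

# smoke test with gpu="cpu" so it runs anywhere
blocks = [nn.Linear(8, 8) for _ in range(6)]
out = hybrid_forward(blocks, torch.randn(2, 8), n_gpu_layers=4, gpu="cpu")
print(out.shape)
```

the only cross-device traffic is one activation tensor per forward, so the split point just trades VRAM for CPU compute time.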
>>
>>101677929
he's right though -> >>101677926
>>
>>101677873
Just run in on your second 3090 that isn't connected to the monitor. Surely you have one just like all of us?
>>
>>101677953
>Just run in on your second 3090 that isn't connected to the monitor.
how? there's no multi gpu support on comfyui
>>
>>101677960
I meant load the model fully on the idle GPU instead of the one that runs your system.
>>
https://huggingface.co/motexture/FluxDiff

https://github.com/motexture/FluxDiff

https://blackforestlabs.ai/up-next/

not seeing any posts about this
>>
File: ComfyUI_temp_zjpls_00021_.png (1.29 MB, 1024x1024)
1.29 MB
1.29 MB PNG
>>101676064
honestly wishing I had negs right now because it keeps giving me heart pasties. hydit fennec ears > flux fennec ears. these ones aren't nearly as fluffy
https://files.catbox.moe/s6xb8y.png
>>
It's weird they decided to use t5xxl as the text encoder, this shit is 2 years old at this point
>>
>>101677984
those guys will revolutionize this ecosystem, they're insane, if it has the same quality as their image model we'll be eating so good... our true savior
>>
>>101677985
>honestly wishing I had negs right now because it keeps giving me heart pasties.
try to go for a low cfg like 1.5 so that you can access to the negs
>>
>>101677984
oh no no no no no sorasissies...
>>
the skeleton of voice synth has all but turned to dust....
>>
is it worth paypigging for runpod?
>>
>>101677984
HOLY FUCKING KINO
>>
>>101677984
if the text-to-image gen needs 24gb to be good, there's no way we'll be able to run high quality text-to-video models, unless Nvidia decides to give us more VRAM... LOOOOOOL
>>
>>
File: what.jpg (254 KB, 2886x1559)
254 KB
254 KB JPG
goddam I wrote "adult" on the fucking prompt, why is it showing me a kid?
>>
Which one is better? fp8 e4 or e5?
>>
File: Capture.jpg (240 KB, 2897x1537)
240 KB
240 KB JPG
>>101678140
this model has a big problem with blur, that's why a negative prompt is important in the first place...
>>
File: 1696893508336508.jpg (97 KB, 740x1232)
97 KB
97 KB JPG
We are so back it's unreal.
Open source bros, we won
>>
>>101678140
>>101678183
the problem is your scheduler, select "simple" and it will work. if the number of steps is too low I think it also outputs blurry images
>>
File: file.png (1.23 MB, 1280x800)
1.23 MB
1.23 MB PNG
>>
>>
https://huggingface.co/Kijai/flux-fp8/tree/main
flux dev in fp8, 11.2GB
>>
>>101678167
"We also see that FP8-E5 is never the best format for inference; even for the transformer layers with significant outliers, the FP8-E4 format is better"
https://arxiv.org/pdf/2303.17951
I'm not sure how applicable this is for us though, that being said, the VAE uses e4 so thats already 2 data points supporting e4
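for reference, the range gap between the two formats falls out of the bit layouts. pure-Python sketch (assumes the usual convention: e4m3fn spends only one bit pattern on NaN, while IEEE-style e5m2 reserves its whole top exponent for inf/NaN):

```python
def fp8_max(exp_bits: int, man_bits: int, fn_variant: bool) -> float:
    """Largest finite value of an FP8 format (1 sign bit assumed)."""
    bias = 2 ** (exp_bits - 1) - 1
    if fn_variant:
        # e4m3fn: only exponent=all-ones + mantissa=all-ones is NaN,
        # so the top exponent code still holds finite values
        top_exp = (2 ** exp_bits - 1) - bias
        top_mant = 2.0 - 2.0 ** -(man_bits - 1)
    else:
        # e5m2 reserves the entire top exponent for inf/NaN
        top_exp = (2 ** exp_bits - 2) - bias
        top_mant = 2.0 - 2.0 ** -man_bits
    return top_mant * 2.0 ** top_exp

print(fp8_max(4, 3, True))    # e4m3fn -> 448.0
print(fp8_max(5, 2, False))   # e5m2   -> 57344.0
# relative step: e4m3 = 2**-3 = 0.125, e5m2 = 2**-2 = 0.25,
# i.e. e4m3 trades range for one extra bit of precision
```

weights rarely need e5m2's 57344 range, which is consistent with e4m3 winning for inference in that paper.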
>>
>>101678228
oh shit
>>
>>101678199
I went for 20 steps and it works fine now
>ohh it's a distilled model it should work perfectly at 8 steps
that's a fucking lie, 20 steps seems to be the sweet spot kek
>>
File: file.png (762 KB, 1280x800)
762 KB
762 KB PNG
>>
>>101678245
The DEV model isn't distilled for low step count, only the other one, the DEV one peaks at 40~50 steps according to the developers
>>
>>101678067
Get multiple 3090s. /lmg/fags are hacking together 4x3090 systems so they can use 100b+ local models. I'm actually super excited we finally have a larger diffusion model. These 1-2b models were never gonna cut it. None of the inference backends support multi-GPU yet, but conceptually it's not hard, you literally just put half the layers on one GPU and half on the other, and use pytorch hooks or whatever the fuck to make sure the activations automatically route between them. Actually HF Diffusers might already do GPU splitting, certainly every single HF Transformers model can do it out of the box.
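the half-and-half split described above is easy to sketch in plain PyTorch (hypothetical wrapper, not any existing backend's API; swap the cpu stand-ins for cuda:0/cuda:1 on a real two-GPU box):

```python
import torch
import torch.nn as nn

class SplitStack(nn.Module):
    """Naive pipeline split: first half of the blocks on one device,
    second half on another; activations cross the bus exactly once."""
    def __init__(self, blocks, dev_a="cuda:0", dev_b="cuda:1"):
        super().__init__()
        half = len(blocks) // 2
        self.first = nn.Sequential(*blocks[:half]).to(dev_a)
        self.second = nn.Sequential(*blocks[half:]).to(dev_b)
        self.dev_a, self.dev_b = dev_a, dev_b

    def forward(self, x):
        x = self.first(x.to(self.dev_a))
        return self.second(x.to(self.dev_b))

# smoke test with cpu standing in for both devices
blocks = [nn.Linear(8, 8) for _ in range(4)]
out = SplitStack(blocks, "cpu", "cpu")(torch.randn(2, 8))
print(out.shape)
```

this is sequential (one GPU idles while the other works), but it doubles the VRAM budget, which is the point here.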
>>
>>101678260
>the DEV one peaks at 40~50 steps according to the developers
ohh... I should stop going into the 20 steps then, can you give me the source of that though? that looks interesting
>>
>>101678228
BTW anons. If you already have the original weights, you get the same outputs as this by using the --fp8_e4m3fn-unet flag.

Also, if you plan to download and use that, also run it with --fp8_e4m3fn-unet, or else it'll try to take up fp16 amounts of space for some reason.
>>
File: hmm.jpg (26 KB, 860x419)
26 KB
26 KB JPG
>>101678228
what's the difference between downloading this pre-cast model and loading the full weights with the fp8-e4 option, though?
>>
>>101678252
neat
>>
>>101678269
>Get multiple 3090s. /lmg/fags are hacking together 4x3090 systems so they can use 100b+ local models.
I already have a 3090 + 3060, and unfortunately I can't use the 3060 because there's no multigpu support in comfyUI
>>
>>101678269
yeah, I'm surprised it hasn't been done before, the llm community can do that for a year now
>>
>>101678274
I thought I saw it in the HF model card but it isn't there... still, I distinctly remember reading that, but as you can see it's still very good below that
>>
File: file.png (1.23 MB, 1280x800)
1.23 MB
1.23 MB PNG
>>
File: ComfyUI_00106_.png (1.19 MB, 1024x1024)
1.19 MB
1.19 MB PNG
>>101678315
kek
>>
>>101678291
You won't have to read 24gb of garbage, and if you have 32gb this also prevents having to swap to disk while loading for the first time
>>
>>
>>101678377
oh nice, I still have to specify the e4 weights or should I put "default" for that one?
>>
>>101678315
Same setup here. Poor little 3060.
>>
>>101678415
It shouldn't matter, but I don't know, I tested and leaving it at e4 matches the original model
>>
>>101678315
the t5 loader from ComfyUI-ExtraModels has the option of loading it to a secondary GPU, and it works well. There's no way to use it with Flux unfortunately (node output is the wrong type). But it proves it's quite possible and Comfyanon just doesn't want to implement it.
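the secondary-GPU trick is cheap because only the embedding tensor has to travel back. a minimal sketch of the idea (module and device names assumed, not the ComfyUI-ExtraModels API):

```python
import torch
import torch.nn as nn

def encode_on_secondary(text_encoder: nn.Module, tokens: torch.Tensor,
                        enc_dev="cuda:1", main_dev="cuda:0"):
    """Keep the text encoder resident on a secondary GPU and ship only
    the small embedding tensor to the device running the diffusion model."""
    text_encoder.to(enc_dev)
    with torch.no_grad():
        emb = text_encoder(tokens.to(enc_dev))
    return emb.to(main_dev)

# smoke test with cpu standing in for both devices
enc = nn.Embedding(100, 16)
emb = encode_on_secondary(enc, torch.tensor([[1, 2, 3]]), "cpu", "cpu")
print(emb.shape)
```

since T5 only runs once per prompt, the secondary card can even be a slow one; what matters is that its weights stop competing for the main card's VRAM.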
>>
Fresh bred when ready

>>101678250
>>101678250
>>101678250
>>
>>101678437
fuck...
>>


