/g/ - Technology


Thread archived.
You cannot reply anymore.




File: tmp.jpg (1.37 MB, 3264x3264)
Discussion of free and open source text-to-image models

Previous /ldg/ bread: >>102332654

>Beginner UI
EasyDiffusion: https://easydiffusion.github.io
Fooocus: https://github.com/lllyasviel/fooocus
Metastable: https://metastable.studio

>Advanced UI
Automatic1111: https://github.com/automatic1111/stable-diffusion-webui
ComfyUI: https://github.com/comfyanonymous/ComfyUI
Forge: https://github.com/lllyasviel/stable-diffusion-webui-forge
InvokeAI: https://github.com/invoke-ai/InvokeAI
SD.Next: https://github.com/vladmandic/automatic
SwarmUI: https://github.com/mcmonkeyprojects/SwarmUI

>Use a VAE if your images look washed out
https://rentry.org/sdvae

>Model Ranking
https://imgsys.org/rankings

>Models, LoRAs & training
https://civitai.com
https://huggingface.co
https://aitracker.art
https://github.com/Nerogar/OneTrainer
https://github.com/derrian-distro/LoRA_Easy_Training_Scripts

>Flux
https://huggingface.co/spaces/black-forest-labs/FLUX.1-schnell
https://comfyanonymous.github.io/ComfyUI_examples/flux

>Pixart Sigma & Hunyuan DIT
https://huggingface.co/spaces/PixArt-alpha/PixArt-Sigma
https://huggingface.co/spaces/Tencent-Hunyuan/HunyuanDiT
https://huggingface.co/comfyanonymous/hunyuan_dit_comfyui
Nodes: https://github.com/city96/ComfyUI_ExtraModels

>Index of guides and other tools
https://rentry.org/sdg-link
https://rentry.org/rentrysd

>Try online without registration
txt2img: https://www.mage.space
img2img: https://huggingface.co/spaces/huggingface/diffuse-the-rest
sd3: https://huggingface.co/spaces/stabilityai/stable-diffusion-3-medium

>Maintain thread quality
https://rentry.org/debo

>Related boards
>>>/h/hdg
>>>/e/edg
>>>/c/kdg
>>>/d/ddg
>>>/b/degen
>>>/vt/vtai
>>>/aco/sdg
>>>/u/udg
>>>/tg/slop
>>>/trash/sdg
>>>/pol/uncensored+ai
>>
File: MiniMax.webm (380 KB, 1280x720)
BFL managed to make dalle at home, will they pull off the miracle again by making a MiniMax at home?
https://blackforestlabs.ai/up-next/
>>
>>102351926
If twitter announces a video model then yes, someone needs to fund it
>>
>>102351926
Best not to think about it. They've said nothing about it since it was announced.
Personally I'd torture someone who promised me something "Soon". Give me a fucking date, you carrot dangling fucks.
I need to plan my schedule, projects and so on, not have shit just dropped on me so I scramble to make time, change hardware, write things and so on.
It's really a bad sign when a company says "soon": either they have no idea if the project will even work, or they're having financial problems which may kill the project completely. It shows they don't have a complete grasp of what they're doing and can't make projections about their own product.
We saw this with SD3 <shudder>
Just name a month, Black Forest Labs. Before Black Friday (meme prices)? So anons can decide whether to buy shiny new cards.

Throw us a bone, and not the ones betwixt your legs.
>>
>>102351926
Are there any video models that I can animate with controlnets? I feel like I'm stuck with SD1.5 and Animatediff. I was doing some experiments with ToonCrafter and PonyXL stills and saw that there were some efforts with SVD and controlnets. I'm interested in cartoony/anime stuff.
>>
>>102352073
Training a model is not straightforward. It's like giving a date for when your baby will say its first word.
>>
File: file.png (187 KB, 800x400)
>>102352073
>Personally I'd torture someone who promised me something "Soon", give me a fucking date you carrot dangling fucks.
desu when you're making models you don't really know when it's gonna end, they're experimenting a lot and I prefer them to take their time and release a good product, rather than going the SAI path and releasing a failed experiment as a bone for us
>>
Forge question about Clip Skip on pony
I switched to new Forge yesterday (the version that has flux and shit) and noticed my gens aren't the same on the same settings between old and new Forge. Blamed it on the sampler+scheduler split and moved on though. Now I also noticed Clip Skip is not on 2. Tried enabling it via sdxl_clip_l_skip and CLIP_stop_at_last_layers (together and separately), but it made no difference at all. On old Forge, changing Clip Skip did make a difference however. I even tried making an xyz plot with Clip Skip 1 and 2; both images were the same, with Clip Skip being 2 in both PNG Infos. What could be the issue here?

tl;dr how do I enable Clip Skip 2 on Forge with pony models
>>
>>102352145
>Are there any video models that I can animate with controlnets?
there are some video-to-video features that exist, like you make a basic blender animation and let the video model do the rest
>>
Blessed thread of frenship
>>
>>102352073
I don't think they have much of a money problem now that they sold their licence to twitter, you can't find a better place to release your model as an API
>>
>>102352163
Clip skip is bizarre on Pony. On Comfy setting it to 1 gave me noise (same for merges/finetunes of it), but I was told that on A1111 it worked and gave a slightly different output (not better, just different).
>>
File: 1715899465664143.png (2.96 MB, 1632x1632)
>>
Booba going crazy
>>
File: 1725787761485762.jpg (1.39 MB, 1632x1632)
>>
>>102352146
>>102352149
I understand it's an iterative process to get a well performing model; they could go on for a month, 6 months, a year re-iterating the model until they were happy with its strengths in the areas they want.
What I'm objecting to is the use of "soon", because there's no timescale for "soon" and I'm too jaded to be strung along when a better explanation could be given: "here's our timeframe: 3 months model refinement, 2 months tweaking the %'s, 1 month user testing, 1 month model refinement from feedback" etc.
It's just nebulous bullshit at this point. I don't want a shit model either, and I don't want it now.
I want it when it's done which my sources tell me 100% will be:
>In 7 bananas
See how dumb it actually sounds?
>>
>>102352228
on comfy for pony you have to set clipskip to a negative number, the setting is -2.
>>
>>102352286
Maybe you should do something instead of acting like you're waiting for a deposit on your EBT card
>>
>>102351926
I'm more looking forward to what those chinese university students are up to. maybe they cracked the 16ch VAE.
>>
File: noise.png (1.15 MB, 1024x1024)
why does pony do this? it doesn't make sense to me. I'll have a prompt and it will have "blonde hair" in it. then when I change it to "long blonde hair" it shits itself, and I'll have to trial-and-error removing words until it works again.
>>
File: 1724343864122503.jpg (1.51 MB, 1632x1632)
>>
>>102352464
there is something very wrong with whatever you're using
>>
>>102352377
You make a great point.
EBT card payments are more reliable than Black Forest Labs saying "soon"
Lol.
>>
>>102352499
I don't care. I hope they release it and manage to single you out and ban you from using it.
>>
File: 00099-3082093201.png (1010 KB, 896x1152)
>>
>>102352464
This looks like a vae or clip skip -1 issue
>>
BFL employee having a melty ITT
>>
File: 5645ggffg5x.png (2.55 MB, 1487x888)
>>
>>102352606
gib me dat *smacks lips* gib me dat model now
>>
File: 1721875783572207.png (1.53 MB, 1024x1024)
>>102352606
I literally don't read any of these posts lmao who cares about some BFL retard
>>
>>102352702
why aren't you posting in sdg, sounds more like your echochamber
>>
>>102352713
who are you? oh yeah that's right it doesn't matter fuck off
>>
>>102352727
it kind of matters because you're a brain drooling retard
>>
File: 1709429314860412.png (1.7 MB, 1024x1024)
>>102352735
stop typing
>>
>>102352635
unironically smacking my lips yes
>>
>>102352479
>>102352577
I'm using automatic1111, the vae and clip skip are correct. I'm not sure, maybe something to do with how a1111 is sending the prompt. sometimes it will be totally fucked up like the image I shared, other times it's just really warped. but I have found that just adding a bit of punctuation can fix it. so if adding thick thighs fucks it up, adding "thick thighs" instead makes it work as expected
>>
File: 00014-802331460.png (1.22 MB, 1024x1280)
>>
>>102352606
Investors have shills all over /g/, threads are scraped, keywords are flagged and they jump into threads.
>>
File: 00001-2807647590.png (3.12 MB, 1280x1920)
>>
File: 00003-1596829530.png (2.93 MB, 1280x1920)
>>
when will mods make a /1girl/ containment board for low effort slop?
>>
you just know who's trolling kek
>>
>>102352919
there are fewer than 10 dedicated AI slop posters
it would be a very dead board
>>
>>102352939
you clearly haven't seen the red boards
>>
>>102352975
And you clearly demonstrate there's an overlap of users.
>>
File: 00008-1468888841.png (2.94 MB, 1280x1920)
>>
>>102352565
keke
>>
I like 1girls desu
>>
>>102351868
nice imgs
>>
>>102353131
>using an LLM to prompt is literal Indian retard tier shit
It's good to get a sense of how most data sets are captioned and in turn how you /should/ prompt
>>
>>102353131
>bloo bloo bloo why do I have to write long winded captions
>no I won't use a tool that does this for me, I just want to complain and get random images from single words
>>
>>102352565
>Chudalus 4chanus
>>
>>102352565
>>102352794
Great stuff man. Did you train just with the basic 2d meme images?
>>
>>102352565
Attempted this today >>102350076
Flux does not understand the word "fat", so I had to use "obese" instead.
>>
>>102353368
Mostly, but there were a few pictures of Patrick Crusius.
>>
File: file.png (342 KB, 1200x757)
https://gist.github.com/sayakpaul/a9266fe2d0d510ec44a9cdc385b3dd74
>This code snippet shows how to split the Flux transformer across two 16GB GPUs and run inference with the full pipeline.
that's cool
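if you don't want to dig through the gist, a rough sketch of the same idea using diffusers' device mapping; the kwargs here are assumptions based on the diffusers distributed-inference docs, not necessarily exactly what the gist does:

import torch
from diffusers import FluxPipeline

# let accelerate shard the pipeline across two GPUs instead of
# hand-placing each module
pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    torch_dtype=torch.bfloat16,
    device_map="balanced",              # spread components over cuda:0 / cuda:1
    max_memory={0: "16GB", 1: "16GB"},  # cap per-GPU usage
)
image = pipe("a cat in space", num_inference_steps=28).images[0]
image.save("cat.png")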
>>
>>102353476
Big if true
>>
>>102353476
>across two 16GB GPUs
I guess it would work for a 24gb + 12gb as well?
>>
>>102353476
Dumb question: why does it need two text encoders instead of one?

>>102353611
Imagine being able to use a spare 3060 as some dedicated ai card
>>
Why do most loras made and posted here work fine while the average civitai lora gives fucked up hands or eyes?
>>
>>102353626
T5 translates your text from boomer prompting into tags that clip uses
>>
>>102353629
Civitai is known for its ease of use, not necessarily its quality
Tangentially, anon is pretty good at it
>>
>>102353626
>Dumb question: why does it need two text encoders instead one?
it's not a dumb question at all. I think they went for two text encoders to get the best of both worlds: clip_l is excellent at tags whereas t5 is excellent at natural language. imo that was a bad idea because clip_l can only eat 77 tokens, so if you go for long prompts it's basically useless
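you can check the cap straight from the tokenizer; a quick sanity check using openai/clip-vit-large-patch14 (the clip_l in question):

from transformers import CLIPTokenizer

tok = CLIPTokenizer.from_pretrained("openai/clip-vit-large-patch14")
print(tok.model_max_length)  # 77
ids = tok("a very long prompt " * 50, truncation=True).input_ids
print(len(ids))  # capped at 77, everything past that is silently dropped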
>>
>>102353641
I have to tune the random civitai ones way more often, while the ones from here worked fine with prompts used for other loras, so maybe everyone here read some guide on training loras properly?
>>
Was the DC sampler implemented anywhere?
>>
>>102353649
Pixart proved you don't need both, it seems like a dumb holdover idea from SDXL (same team).
>>
>>102353476
would this work with other models?
>>
>>102353671
It was made like that so average users on twitter can use it without tags
>>
>>102353476
that's big, there should be a ComfyUI node for that because Comfy can't be bothered to implement important stuff in his own repo
>>
>>102353671
>it seems like a dumb holdover idea from SDXL (same team).
Highly likely
>>
>>102353691
T5 doesn't give a shit how you prompt it and in practice no one has ever liked the dual text encoder. No one wants to type different shit into two boxes. And in practice it's not required and they probably did it because they're lazy cunts who didn't want to train on a diverse set of prompt formats.
>>
>>102353714
>No one wants to type different shit into two boxes
then dont
>>
File: 00003-3273304712.jpg (818 KB, 1488x1776)
>>
>>102353737
I won't and like I said, it's a dumb architecture and they're dumb cunts. And in the end it didn't even matter, because the model is stylistically inferior and the CLIP does fucking nothing.
>>
File: ComfyUI_01413_.png (2.8 MB, 1920x1080)
>>
File: 00016-557003662.png (2.91 MB, 1280x1920)
>>
>>102353755
>use style lora
>people gen as asians
>>
File: dino_00023_.png (944 KB, 1024x1024)
4090 coming in tomorrow (currently have a 3060 Ti)
can't wait can't wait can't wait
>>102353755
looks like something out of Darius
>>
>>102353665
>so maybe everyone read here some guide to train loras properly?
Unfortunately I don't think anyone created a "definitive" guide. There's been lots of discussion in previous threads (and some small guides posted) so I presume anon gleaned something from that.
>>
>>102353753
>and the CLIP does fucking nothing.
the worst part is that clip does something: when you change the clip finetune (there are several of them) you can get really different pictures. I was surprised how much it actually affects the output
>>
>>102352565
nice
>>
>>102353784
cute
>>
File: ComfyUI_01421_.png (3.74 MB, 1920x1088)
>>
File: 00014-2280578470.png (869 KB, 1152x896)
I downloaded a new flux model and get an error saying "mat1 and mat2 shapes cannot be multiplied" when trying to use it, what's the deal with that?
>>
>>102354065
>I downloaded a new flux model
link?
>>
>>102354065
it's going to be something like you're using an SDXL vae or lora somewhere when your entire process should contain flux-based models, loras etc.
>>
>>102354065
did anons answer yesterday not work?
>>
File: ComfyUI_temp_znyvv_00002_.png (2.63 MB, 1920x1152)
>>
File: 00010-436492188.png (687 KB, 1152x896)
>>102354074
I meant finetune, it's AcornIsSpinning Flux on civitai.
>>102354086
I have the clip file, T5 encoder and Flux vae, I used jibMixFlux before which worked without problems.
>>
>>102354124
missed it, gonna check the archive
>>
>>
File: ifx464.png (987 KB, 1024x1024)
>>
>>102353476
great, an overcooked cat
>>
>>102353476
so that's it, we can now split the model into different gpus? that's a huge deal, especially if you want to split a tiny % onto your cpu to run bigger models but with tolerable speed
>>
>>102354065
>>102354173
i enjoy this cat in space
>>
File: 00027-2388925365.png (2.66 MB, 1280x1920)
>>
File: 1705076095107533.png (263 KB, 512x512)
>>
File: 1714503552279243.png (511 KB, 512x512)
>>
I feel so irrelevant ever since XL.
>>
>>102353476
What? This isn't new. I'm running T5 + CLIP + VAE on either GPU of my choosing, and then I run the transformer model on the other.
>>
>>102352464
No norm needs to be enabled in settings if you're not using comfy
>>
>>102354635
the new thing about that script is that you can now split Flux into multiple GPUs, for example if bf16 is too big for your first gpu, you split it and put the smaller parts on gpu 1 and 2
>>
>>102354660
splendid
>>
>>102354625
get with the times, old man
>>
it's finally happened..............
I'm in love with one of my gens
>>
>>102354836
Well, get on with it, don't be such a cockblock tease and allow us to shame your taste in women.
>>
File: goblin gf.jpg (152 KB, 768x1344)
>>102354860
>>
>>102354888
>AI face
>>
>>102354888
I'm a humble gobbo enjoyer myself
>>
>>102354906
what does that mean
>>
File: 1714457595655868.png (312 KB, 512x512)
>>
>>102354939
there are some facial features that appear commonly in ai generated women, making many gens basically sameface

you can especially see it with sd 1.5
>>
>>102354939
Basically all the slop models are overtrained on specific facial features so they all have the same AI face. I'm not sure if this is by design however; Flux has it and it might be a deepfake countermeasure. Or just retards training on a set of 1000 pictures of the same girl.
>>
>>102354985
Pony models are guilty of this because of model inbreeding
>>
>>102354985
I think it just has to do with the fact that checkpoints are basically an average of whatever they were trained on, or so I presume.
>>
>>102354985
i don't see it. i had to make my own models for the faces. maybe it's just the symmetry
>>
>>102355034
I don't think so, you don't see it with other things like dogs and cats. If it was an averaging thing you'd see the same lamp showing up over and over again in the background, for example. I think it's an overtraining issue.
>>
File: file.png (71 KB, 175x176)
>>102355056
>>
>>102355085
Anon, posting one face won't convince them, if they haven't noticed it by now.
>>
File: 1715791664843311.png (398 KB, 512x512)
>tfw no rotting gf
>>
File: 1706826273692256.png (316 KB, 512x512)
>>
File: goth gob.jpg (135 KB, 768x1344)
>>102355085
is it just the gentle smile neutral expression?
does this pic have ai face?
>>
File: file.png (33 KB, 200x70)
>>102355155
look at the eyes
it's just the same person wearing facial prosthetics
>>
File: file.png (34 KB, 185x104)
I'm not sure what about our biology says it's the same person. Must be the eye shape to cheekbone ratio.
>>
File: 1723275306485243.png (415 KB, 512x512)
faceblind autist thinks he is unleashing true meaning on the thread
>>
>>102355223
Everyone knows AI face exists. If you don't think it exists, you are definitely face blind. Try going outside and looking at real people.
>>
>>102354985
Flux has buttchins
>>
if you don't know about sameface in 2024 it's unironically over for you
>>
>>102354985
Isn't that a DPO thing?
>>
File: 1717856395738113.png (399 KB, 512x512)
imagine thinking that I am affected by AI "sameface"
>promptlet
>>
So was there a solution for that SD3 "gelu_new" error or is it still unusable in Automatic/Forge?
>>
>>102355443
The Pony models wouldn't be using DPO
>>
File: 1702663117120487.png (167 KB, 1029x1079)
i've hacked together a random prompt generator in comfy based on wildcards, and to read the prompt i've hooked up a simple string reader at the end of it. However, i can only see the prompt of the current image being generated, and when it's finished it's replaced by a new one. So when the image has finished generating and i would like to see the prompt that generated it and compare it to the image, the string has already been replaced by the next prompt in the queue, if that makes sense. So what i would want is some sort of buffer or delay, so that it displays the last prompt generated instead of the current one.

Does anyone know if this is possible somehow? I hope i described that well enough.
>>
>>102355675
anything is possible, just find a node or make a node that saves a string to a text file
>>
>>102355675
Yeah, so, you need to find a way to cache out the current prompt into a file on its own. I guess this should be doable because it's Python.
Or if that's not possible you could resort to writing a simple manager/controller outside comfy, but that's probably not that simple.
>>
>>102355835
>>102355675
To add: it's probably hard because ComfyUI is not time-based on any level, it's not made like that.
You will need to figure it out.
I mean it's not like Maya, which you can easily script the way you want and refer back to previous positions because the data is there, and if it's not present you can always read it back.
>>
https://github.com/ToTheBeginning/PuLID
Oh that's cool! It's like InstantID but for Flux
>>
>>102355835
>>102355915
it would be dead simple to make a node that saves a string to a text file
you could even save the output image and the prompt together as an image (.png) and .txt file pair.
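a minimal sketch of such a node, assuming the standard custom node layout (drop a file like this into custom_nodes/; all names here are made up):

import os
import time

class SavePromptText:
    @classmethod
    def INPUT_TYPES(cls):
        return {"required": {
            "text": ("STRING", {"forceInput": True}),
            "filename_prefix": ("STRING", {"default": "prompt"}),
        }}

    RETURN_TYPES = ()
    OUTPUT_NODE = True
    FUNCTION = "save"
    CATEGORY = "utils"

    def save(self, text, filename_prefix):
        out_dir = os.path.join("output", "prompts")
        os.makedirs(out_dir, exist_ok=True)
        # timestamp in the name keeps one file per queued gen
        path = os.path.join(out_dir, f"{filename_prefix}_{int(time.time())}.txt")
        with open(path, "w", encoding="utf-8") as f:
            f.write(text)
        return ()

NODE_CLASS_MAPPINGS = {"SavePromptText": SavePromptText}

wire the same string you feed the sampler into "text" and every prompt lands in its own .txt, so nothing gets overwritten when the queue moves on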
>>
>>102356074
vram?
>>
>>102356074
>chink looks at viewer
>>
>>102356074
Demo here
https://huggingface.co/spaces/yanze/PuLID-FLUX
>>
File: Capture.jpg (318 KB, 2301x1518)
>>102356135
disappointing...
>>
File: Capture.jpg (280 KB, 2349x1489)
>>102356135
not bad, but I should try something that gives a different expression than the input image I guess
>>
File: file.png (3.06 MB, 2030x1505)
>>102356074
>As shown in the above image, in terms of ID fidelity, using fake CFG is similar to true CFG in most cases, except that in a few cases, true CFG achieves higher ID similarity. In terms of image aesthetics and facial naturalness, fake CFG performs better.
Interesting, that's what I noticed as well at some point in time
https://reddit.com/r/StableDiffusion/comments/1emy5oz/a_higher_cfg_helps_flux_to_make_celebrities_look/
>>
>>102355763
This is the answer, also add the seed number to the save file name or something to make life easier.
>>
>>102356387
also meant for
>>102355675
>>
>gen big butt foid
>watching it gen with fast preview
>at some early step it decides to flip the foid around so she's facing the other way
>now it doesn't know what to do with the big protruding buttocks area that's no longer her butt
>watch as it turns into a big potbelly
>>
File: tmp0r8wnib6.png (669 KB, 768x1024)
>>102355580
bump
>>
>>102356427
Sergeant Braphog came for you when you least expected it. Or maybe not, since it was an ass fetish image anyways.
>>
The new ChatGPT model might actually be good enough to design an image model from scratch. Investigating.
>>
>>102356076
Sure, but you need to implement a time function too.
I think this cannot be done without changing the source code of the original nodes or at least the interface itself.
I might be wrong because it has been some time since I did any work on this etc.
>>
how's the 4070ti at flux? are all those new advancements in architecture or whatever the fuck making it really fast?

>which i can't use because i'm on a 970 right now
>>
>>102356625
Comfy is just a behavior tree. Each node typically has an input and an output. The node takes in an input (if available) and does something and if there's an output pushes it forward. It's just a dynamic system of modules, it's trivial to do.
>>
>>102356312
is this face thing unique to flux or is it available for 1.5 or sdxl or pony?
>>
>>102356691
exists for SD1.5 but not for XL/Pony IIRC
>>
>>102356664
Don't worry, flux is way slower than other open source models in its class, but the image quality is on par with what Bing AI was using. Just make sure you have at least 12GB VRAM and CUDA compatibility, you'll be good with that.
My RTX 3060 satisfies these, and I spend like 2 minutes with 30-40 steps. That's with forge and the optimized model the dev uploaded. You can make FHD wallpapers with it, even.
>>
>>102356714
>exists for SD1.5 but not for XL/Pony IIRC
instantID was for XL though?
>>
>>102356686
Yeah but you will need to implement 'frame' conception.
>>
>>102356777
what the fuck are you talking about?
he just wants to know what the prompt was
why are you making this so complicated?
>>
>>102356776
yeah but instant ID kinda sucks, at least from my experience it fucks up the face especially the mouth often, PuLID seems to work much better
>>
>>102356664
There are no advancements in arch. It's the exact same latent diffusion process as in SD or all the other current image generators. It just uses an additional text encoder and a transformer instead of a unet to predict noise from t -> t-1
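schematically the sampling loop is the same either way; only the noise predictor swaps out (pure pseudocode, not a real API):

def sample(predictor, latents, text_emb, scheduler):
    # predictor can be a unet or a transformer, the loop doesn't care
    for t in scheduler.timesteps:
        noise_pred = predictor(latents, t, text_emb)      # predict the noise at step t
        latents = scheduler.step(noise_pred, t, latents)  # denoise: t -> t-1
    return latents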
>>
>>102356795
>PuLID seems to work much better
you tested the demo or on ComfyUi?
>>
>>102356890
I meant the one for 1.5, the faces came out better than with InstantID on XL
>>
>>102356135
I guess that can be used on comfyui right?
https://github.com/cubiq/PuLID_ComfyUI
>>
>>102356774
looks like im buying a couple supers
https://youtu.be/nP5hG2voJ4I?si=rro4DZpWecgX7c2V

>and it takes over double this time at 1024pix on sdxl on my current gpu
>for around the same power draw
>>
>>102356947
retracting this, not the ti super, i got confused. which is exactly what nvidia was going for with this retarded scammy release scheme. of the three GPUs the TI is the best value price/performance/wattage.
>>
File: 00094-4149537017.jpg (483 KB, 1344x1792)
gigu
>>
>>102356798
a transformer instead of U-Net is a massive change
>>
>>102356788
You either do it right or don't do it at all.
>>
>>102357284
Wouldn't say that fits my definition of "massive"
>>
>>102357362
that's alright. you only need to agree that it's an advancement, specifically in terms of the arch
>>
File: file.jpg (542 KB, 1648x2066)
No Flux Anime in sight...
>>
>>102357376
I do agree that it's an algorithmic advancement based on the same mathematical process.
>>
File: 00000-721546627.jpg (411 KB, 1344x1792)
>>
i can't even google this question and get any results, probably because i'm wording the question wrong, but those of you on 3090's/more than 24gb of vram, do you use batch size? Like generating more than a few images all at once: how much vram does that take and do you get a performance hit doing it?
i just thought about how much more efficient that would probably be in performance + power usage vs genning 1 image at lower performance, say 4070ti vs 3090ti.
>>
File: clip_generation.gif (744 KB, 800x392)
Back to basics!
>>
File: 1726175364.png (771 KB, 1024x1024)
>>
File: file.jpg (103 KB, 1064x834)
>>102357499
CLIP is trash

SigLIP is the future
>>
File: grid-0003.jpg (587 KB, 2048x2048)
>>
File: file.png (1.78 MB, 1467x893)
>>
>>102356945
>I guess that can be used on comfyui right?
PuLID is asking you to install the facexlib package, but unfortunately it's not available on python 3.11, and comfyUi uses 3.11 so you're fucked lol
https://github.com/vladmandic/automatic/discussions/110
>numba is not compatible so facexlib, gfpgan and realesrgan modules will not be available
>>
File: 1726175681.png (289 KB, 1024x1024)
>>
>>102357586
That's a boy
>>
File: file.png (116 KB, 1529x1085)
>>102357606
>and comfyUi uses 3.11
kek, how do I downgrade to 3.10 then?
>>
>>102357606
>comfyUi uses 3.11 so you're fucked lol
you can definitely run Comfy with older versions of Python. I use 3.10.6 without any issues or custom node incompatibilities

>>102357606
>https://github.com/vladmandic/automatic/discussions/110
you're quoting a post from April '23
>>
>>102357684
>>102357658
>>102357606
it's ok there's a way to make it work anyway
https://github.com/cubiq/PuLID_ComfyUI/issues/1#issuecomment-2102591918

Do this instead:
ComfyUI_windows_portable\update>..\python_embeded\python.exe -s -m pip install --use-pep517 facexlib
>>
File: 1726176053.png (487 KB, 1024x1024)
>>
>>102357410
oh, well. that's not particularly controversial.
I'm sure you understand what I disagree about in terms of the statement "There are no advancements in arch" but meh
>>
File: file.png (310 KB, 2728x1505)
>>102357704
bruh... now it doesn't want to install insightface, this shit is fucked
>>
File: 1726176457.png (653 KB, 1024x1024)
>>
File: 00021-2170268794.jpg (726 KB, 1344x1792)
>>
>>102357808
Don't build it, just install this prebuilt wheel
https://github.com/cubiq/ComfyUI_IPAdapter_plus/issues/162#issuecomment-1868967714
>>
>>102355675
>>102355763
>>102356387
Like this?
>>
>>102357774
Maybe it's helpful to differentiate between "software arch" and "mathematical arch"
>>
>Loads Flux
>Adds Lora
>Python.exe has stopped working
What's the git gud trick to get Flux running with a lora on a 16GB GPU?
>>
>>102357880
What UI are you using?
>>
>>102357896
Forge
>>
File: 00029-3113399324.jpg (697 KB, 1344x1792)
>>
>>102351868
Soo any finetunes of Flux yet that add porn and furry?
>>
File: LGp_Vp-f_400x400.jpg (17 KB, 400x400)
>>102357880
Oh wait there is a setting for that: fp16 LoRA
>>
>>102358014
Probably won't ever happen. It's very expensive to train, and with Flux's restrictive license, crowdfunding won't be possible.
>>
File: 00044-3113399326.jpg (558 KB, 1344x1792)
>>
File: ComfyUI_00343_.jpg (1.9 MB, 3584x4608)
>>
File: file.png (967 KB, 3744x1728)
>>102356945
>>102356074
yep... the flux model isn't working for ComfyUi yet
https://github.com/cubiq/PuLID_ComfyUI/issues/69
>>
File: angry010.jpg (13 KB, 146x222)
Say No to vrambloat.
It's a crime against civilization.
>>
File: 00047-3113399327.jpg (621 KB, 1344x1792)
>>
File: ComfyUI_00540_.png (2.09 MB, 1568x1568)
>>
I hope specific layer training gets implemented for Kohya soon. It would save VRAM and increase training speed.
>>
>>102358236
you tried this one?
https://github.com/ZHO-ZHO-ZHO/ComfyUI-PuLID-ZHO
>>
File: ComfyUI_00363_.jpg (2.07 MB, 3584x4608)
>>
File: 0.jpg (97 KB, 1024x1024)
>>
>>102357871
no. the arch is the model structure. it is /the/ arch in ML.
the "software arch" is the stack, and "mathematical arch" just sounds like a parabolic arch lmao
>>
>>102356074
I never tried that one, only InstantID. Which was better during the SDXL days?
>>
>>102357564
prompt?
>>
>>102355675
What's the node with the string for autorefresh called?
>>
File: file.png (2.88 MB, 1636x1523)
>>102356074
>>102356135
nice
>>
>>102357880
In comfy you can use gguf flux nodes and load any loras you want. Lowest quality/smallest quantization flux model can fit in like 4 gig
>>
>>102357880
>>102358696
this, get a GGUF file that's under 16GB
i have 8GB VRAM and the Q4 quant works perfectly
>>
>>102351868
that's a cool looking super stargate
>>
>trying to get joycaption + llama 8b gguf to stop describing every image with 'emotions' and 'moods'
>make prompt something like "caption this image descriptively but concise. you will not use any emotions or moods.." [etc]
>every description, without fail: THE MOOD OF THIS IMAGE IS..
reeeeeeeeeeeeeeeeeeeeeeeeeeee
>>
>>102359001
I think it did stop describing everything as whimsical though, so that's.. progress?
>>
>>102359001
>every description, without fail: THE MOOD OF THIS IMAGE IS..
if joycaption starts its sentence with "the mood of this image is" then you can easily remove that sentence with a regex python script, like the sketch below
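a sketch of that script; it also catches the sentence when it's buried mid-caption:

import re

def clean_caption(caption: str) -> str:
    # drop any sentence beginning "the mood of this/the image ..."
    caption = re.sub(r"The mood of (?:this|the) image[^.!?]*[.!?]",
                     "", caption, flags=re.IGNORECASE)
    # collapse the whitespace left behind
    return re.sub(r"\s{2,}", " ", caption).strip()

print(clean_caption("A girl in a field. The mood of this image is whimsical. She holds a sword."))
# -> A girl in a field. She holds a sword.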
>>
I like loss graph
>>
>>102359027
it doesn't start with that, no, it's randomly placed throughout the caption. not like you can't still remove every sentence that starts with "the mood of", but I still don't want it to waste tokens on that shit
>>
>>102359188
yeah I feel that, but somehow every caption model adds this fluff shit, even the best of them all, GPT4V, and they don't want to listen to our instructions at all, that's frustrating
>>
I have only used one single text box for prompting flux. How much of an impact does it really make?
>>
>>102359207
the most confusing part is this must mean they were all trained that way... why... no one talks this way, where did the LLM purple prose-descriptions originate from lmao
>>
>>102359225
that's a good question, my theory is that those caption models were trained as regular LLMs first, so they know how to write a story and add some "writing fluff". when finetuned to describe an image, they probably think it's the same thing as describing a scene in a book, so they add those "mood" things, that's my 2 cents
>>
File: 0.jpg (200 KB, 1024x1024)
>>
>>102359001
8B models are retarded.
>>
>>102359408
even GPT4V does this shit and it's definitely not an 8b model
>>
>>102359413
Because you're retarded.
>>
>>102357198
>>102357417
>>102357579
>>102357834
>>102358011
Keep 'em coming FAGGOT
>>
File: 1726185683.png (1.18 MB, 1024x1024)
>>
File: 1726185739.png (1.22 MB, 1024x1024)
>>
>>102359722
omg it bigu
>>
File: 1726186180.png (1.39 MB, 1024x1024)
>>
File: 1726186375.png (1.49 MB, 1024x1024)
>>
File: 1726186502.png (1.62 MB, 1024x1024)
>>
>>102359722
Miku by Ubisoft
>>
CogVideoX just added img2video, too bad this model sucks ass though
https://github.com/kijai/ComfyUI-CogVideoXWrapper/issues/54
>>
>>102359938
at this point I feel like img2anything is just worse controlnet
>>
>>102359955
dunno man, for me, controlnet feels like rotoscoping, it's not natural at all, especially if you want to transform a realistic image (or a 3d image) into an anime image
>>
>>102352228
"Clip Skip" isn't even a thing on SDXL, the differences for SDXL based models were always software bugs, there's no need to use the "Clip Set Last Layer" node for anything XL at all in current Comfy versions.
>>
>>102352357
no you don't, in current Comfy you should just not use "Clip Set Last Layer" nodes at all for XL models
>>
>>102354985
Flux Dev and Schnell have it because they're distilled from Pro, which reduces variance a lot. It's not the same reason that something like SD 1.5 has it.
>>
>>102352228
A1111's clip skip 1 is actually clip skip 2 on Comfy. A1111 is hardcoded to prevent clip skip from being set below 2 for no logical reason.
>>
>>102359938
uhh finally
>>
>>102359938
>>102360209
it's a fucking nothingburger
>>
File: 1713471050125020.jpg (1.77 MB, 1632x1632)
don't fuck with me
>>
File: 00107-2493429972.jpg (416 KB, 1152x1536)
>>
File: 00111-15654573.jpg (412 KB, 1152x1536)
>>
>>102359347
yeah that's not a bad theory and makes sense all things considered, but jeez does it suck for our purpose
>>
>>102360024
ControlNet is useful: openpose for setting a pose, or using one of the modes based off an existing image, etc. There's a lot you can do with it. sd1.5 was great with controlnets + latent couple, dividing up the image and genning high res straight from txt2img
>>
File: 0.jpg (250 KB, 1024x1024)
>>
What is the minimum number of images you need to create a model from scratch? I'm not talking about fine-tuning an already existing model
>>
>>102361285
the only thing we know is that Stable Cascade used 2% of the Laion-5b dataset, so something like 100 million pictures is the range for pretraining
>>
>1 hour with no images
These are truly the last days
>>
Anon is busy training
>>
File: ComfyUI_01428_.png (1.51 MB, 1024x1024)
>>
>>102361383
Unit ready
>>
>>
>>102353131

Real men prompt manually.
>>
>>102358014
Pony is training on auraflow
>>
File: 1707750947151756.jpg (74 KB, 1170x1082)
>>102359938
>want to use cogvideo to do additive video inpainting
>have to set up an entire workflow for it
SOMEBODY PLEASE JUST MAKE AN AFTER EFFECTS PLUGIN ALREADY
>>
File: ComfyUI_33772_.png (2.08 MB, 1024x1024)
>>
File: 00006-953169750.png (2.66 MB, 1152x1632)
>>
File: ComfyUI_33818_.png (1.58 MB, 1024x1024)
>>
File: ComfyUI_33819_.png (1.66 MB, 1024x1024)
>>
File: 00008-93736787.png (2.73 MB, 1152x1632)
>>
File: ComfyUI_33821_.png (1.83 MB, 1024x1024)
>>
>>
File: 00002-589983247.jpg (3.5 MB, 2192x2192)
>>102363147
This is a really cool style
>>
>>
:( I wanted to bake a lora overnight but it's 2 am and I've only just finished cleaning the dataset, let alone reviewing and fixing the captions... I'm going to bed... It'll have to wait another night..

Sometimes I wish I could bring myself to just shit out crappy no effort lora after crappy no effort lora like nochekaiser
>>
>>102363785
Proper planning prevents piss poor performance. PPPPPP
You're doing the right thing, sleep well anon.
>>
>>102363103
>>102363147
Model/LoRA? Alluring style
>>
File: ComfyUI_33824_.png (1.75 MB, 1024x1024)
>>102363823
https://mega.nz/folder/mtknTSxB#cGzjJnEqhEXfb_ddb6yxNQ 16mei folder.
Based on this artist https://xcancel.com/ju6mei
>>
File: 00126-4192611187.jpg (524 KB, 1344x1728)
>>
Ran killed this general.
>>
zzzzzz..... mimimimi...... zzzzzz.... mimimimi
>>
File: ComfyUI_33837_.png (1.3 MB, 1024x768)
>>
https://github.com/Vchitect
>[09/2024] We release Vchitect 2.0, including the model and the training system
>Vchitect-2.0 is a high-quality video generative model with 2 billion parameters, supporting resolutions up to 720x480 and video durations of 10-20 seconds. Besides, we are also developing a larger version with 5 billion parameters, which will be released in the future.
When I click on the link I get a 404 error though
>>
>>102365940
https://xcancel.com/ai_trends_hub/status/1834527949208621127#m
>A 5B parameter version will be released in the future.
https://vchitect.intern-ai.org.cn/
that looks better than CogVideoX I guess
>>
File: 1725456605554449.png (56 KB, 699x293)
i just noticed i have 70gb of hugging face models in my home directory. does anyone know what these might be used for? like why is flux dev there
>>
>>102366117
It's huggingface's downloads/cache folder, which it arbitrarily deletes from, so you get to download the model 100 times.
it's why you should never use code like:
tokenizer = T5Tokenizer.from_pretrained("google/flan-t5-xxl")

The model gets downloaded to that directory and then, after an indeterminate amount of time, gets deleted. My conspiracy theory is they do this on purpose as a form of censorship.
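if you want the weights somewhere it won't touch, point the cache at a directory you control (cache_dir is a real kwarg; the path here is just an example):

from transformers import T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained(
    "google/flan-t5-xxl",
    cache_dir="/models/hf-cache",  # any persistent path you own
)
# or export HF_HOME=/models/hf-cache once and everything lands there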
>>
>>102365940
same here, which is a shame, it's 21:20 in Beijing so it might not be fixed until Monday.
>>
>>102365984
Their huggingface links are all 404 as well.
Curious.
>>
https://liuziwei7.github.io/papers/vchitect_slides.pdf

A very well-rounded video model (apparently)
>>
>>102366671
>no numbers
>they compare to CogVideo and not CogVideoX
hmm...
>>
>>102366721
Their technical report isn't even on their webpage...
It's over, before it even began. D.O.A.
Why do nerds constantly do this?
>"My GF actually goes to another school"
>"My AI model is nearly finished, here's the hard-drive im going to put it on when it's done"

No one cares ChangDev, put up or shut up.
>>
>>102366794
>No one cares ChangDev, put up or shut up.
this, 100% this, they're wasting our time with this bullshit
>>
Time to see if me and my coding buddy ChatGPT o1-preview can design a diffusion model from scratch.

- Flan T5 XXL
- Osiris 16channel VAE
- No crop image encoding (resize based on total pixels ie 256x256)
- pad and attention mask to handle "bucketing"
- rotary position embedding
- KV compression idea from Pixart
- Cross attention transformer blocks with Flash Attention

Will it work? Who knows.
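for what it's worth, a toy sketch of what one of those cross-attention blocks could look like; dimensions, naming and the plain LayerNorms are assumptions to make the list concrete, a real implementation would add timestep conditioning, RoPE and fused flash-attention kernels on top:

import torch.nn as nn

class CrossAttnBlock(nn.Module):
    def __init__(self, dim, heads, ctx_dim):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        self.self_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm2 = nn.LayerNorm(dim)
        self.cross_attn = nn.MultiheadAttention(dim, heads, kdim=ctx_dim,
                                                vdim=ctx_dim, batch_first=True)
        self.norm3 = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(nn.Linear(dim, dim * 4), nn.GELU(),
                                 nn.Linear(dim * 4, dim))

    def forward(self, x, ctx, ctx_mask=None):
        # self-attention over the image latent tokens
        h = self.norm1(x)
        x = x + self.self_attn(h, h, h, need_weights=False)[0]
        # cross-attention into the T5 embeddings; key_padding_mask is
        # where the "pad + attention mask" bucketing idea comes in
        h = self.norm2(x)
        x = x + self.cross_attn(h, ctx, ctx, key_padding_mask=ctx_mask,
                                need_weights=False)[0]
        return x + self.mlp(self.norm3(x))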
>>
File: ComfyUI_33842_.png (1.25 MB, 1280x720)
>>
>>102367058
unironically chatgpt has helped me solve several long standing physical irritations where doctors were just like 'live with it bro'
>>
>>102367082
Eat right, get sleep, exercise.
>>
File: SDXL20246.jpg (218 KB, 1256x1256)
I've been really sick so I haven't had the mental fortitude to come up with any decent gen ideas, but I'm feeling a little better and am working on some now.
>>
File: 00046-892027988.jpg (52 KB, 477x463)
> 20/20 [00:45<00:00, 2.28s/it]
> 20/20 [00:45<00:00, 2.39s/it]
How can I make Flux schnell actually be schnell on a 16GB GPU?
>>
>>102367311
You could have just recorded your feverish ramblings and used those as prompts.
Why should prompts come from orderly minds only?
The best musicians, painters and writers were mentally deranged, on drugs, or at plain old varying levels of insanity during their most prolific periods.
Why should we, modern artists, be any different?
>>
is there no flux training rentry yet?
>>
File: 1695268332467266.png (3.77 MB, 2048x2048)
>>
>>102367397
And you could have not posted this but you did
>>
>>102367413
just use this, it's pretty easy https://github.com/cocktailpeanut/fluxgym
>>
>>102367459
>python -m venv env
>env\Scripts\activate
really? They still need this bullshit?
>>
>>102367438
It's an important lesson that creativity is not something you must only do when you are feeling well. Feeling unwell and doing something you normally enjoy can be part of the recovery process.
Thanks for prompting me to clarify it all for anons.
>>
File: 00052-2951739421.png (986 KB, 896x1152)
>>
>>102363896
NTA but ty
>>
File: 1723074713770759.png (2.98 MB, 2048x2048)
>>
Come and get your daily bread...
>>102367811
>>102367811
>>102367811
>>
great
>>
You're welcome



