/g/ - Technology


Thread archived.
You cannot reply anymore.




File: the longest dick general.jpg (3.62 MB, 3264x1534)
Discussion of free and open source text-to-image models

Previous /ldg/ bred : >>102744592

Chink Edition

>Beginner UI
Fooocus: https://github.com/lllyasviel/fooocus
EasyDiffusion: https://easydiffusion.github.io
Metastable: https://metastable.studio

>Advanced UI
Forge: https://github.com/lllyasviel/stable-diffusion-webui-forge
reForge: https://github.com/Panchovix/stable-diffusion-webui-reForge
Automatic1111: https://github.com/automatic1111/stable-diffusion-webui
ComfyUI: https://github.com/comfyanonymous/ComfyUI
InvokeAI: https://github.com/invoke-ai/InvokeAI
SD.Next: https://github.com/vladmandic/automatic
SwarmUI: https://github.com/mcmonkeyprojects/SwarmUI

>Use a VAE if your images look washed out
https://rentry.org/sdvae

>Model Ranking
https://imgsys.org/rankings

>Models, LoRAs & training
https://aitracker.art
https://huggingface.co
https://civitai.com
https://github.com/Nerogar/OneTrainer
https://github.com/derrian-distro/LoRA_Easy_Training_Scripts
https://github.com/kohya-ss/sd-scripts/tree/sd3

>Flux
https://replicate.com/black-forest-labs/flux-1.1-pro
https://huggingface.co/spaces/black-forest-labs/FLUX.1-schnell
https://comfyanonymous.github.io/ComfyUI_examples/flux

>Pixart Sigma & Hunyuan DiT
https://huggingface.co/spaces/PixArt-alpha/PixArt-Sigma
https://huggingface.co/comfyanonymous/hunyuan_dit_comfyui
Nodes: https://github.com/city96/ComfyUI_ExtraModels

>Index of guides and other tools
https://rentry.org/sdg-link
https://rentry.org/rentrysd

>Try online without registration
txt2img: https://www.mage.space
img2img: https://huggingface.co/spaces/huggingface/diffuse-the-rest
sd3: https://huggingface.co/spaces/stabilityai/stable-diffusion-3-medium

>Maintain thread quality
https://rentry.org/debo

>Related boards
>>>/aco/sdg
>>>/aco/aivg
>>>/b/degen
>>>/c/kdg
>>>/d/ddg
>>>/e/edg
>>>/h/hdg
>>>/tg/slop
>>>/trash/sdg
>>>/u/udg
>>>/vt/vtai
>>
Blessed thread of frenship
>>
>>102764387
What's bigma?
>>
Quit hallucinating.
>>
File: 1705772579350143.png (1.26 MB, 1024x1024)
>>
>>102764413
pixart bigma
>>
https://github.com/AIFSH/PyramidFlow-ComfyUI?tab=readme-ov-file
How much VRAM does it ask for?
>>
>>102764413
i dunno wassa bigma with you?
>>
>>102764575
>>102760652
>https://github.com/jy0205/Pyramid-Flow/issues/12#issuecomment-2404752801
>>The 384p version requires around 26GB of memory, and the 768p version requires around 40GB (we do not have the exact number because of the cache mechanism on the 80GB GPU)
>>
>>102764387
>/ldg/ returning to its chang roots
nature is healing
>>
File: 0.jpg (269 KB, 832x1216)
>>
AMD unveils new AI chips to compete with Nvidia.
>>
File: ComfyUI_temp_uyzyp_00040_.png (2.23 MB, 1072x1880)
>>
>>102764727
we're more likely to see a completely new AI company making hardware from China than AMD seriously competing in AI
>>
>>102764727
it's useless, they'll always be below Nvidia because of CUDA
>>
>>102764768
Make the chips compatible with CUDA. Simple. Right?
>>
>>102764785
they've been trying that for more than 5 years; they got somewhere but it's still not close
https://github.com/vosen/ZLUDA
>>
File: file.png (3.58 MB, 1287x1788)
Babe wake up, they improved SDXL
https://huggingface.co/comin/IterComp
https://civitai.com/models/840857/itercomp
>>
>>102764930
>SDXL
I sleep. Why are people wasting so much money on an objectively bad architecture?
>>
>>102764595
GGUF when?
>>
>>102765034
Ikr, today we got that video model that uses SD3 (to be fair they said they're retraining everything from scratch) and now this IterComp for SDXL; it's Flux that needs love, not deprecated models
>>
>>102765061
I wouldn't say no to someone retraining SD3 or just making a 3B model.
>>
File: 1712031233757673.png (1.08 MB, 896x1152)
>>
https://github.com/jy0205/Pyramid-Flow
https://huggingface.co/spaces/Pyramid-Flow/pyramid-flow
there's a demo now
>>
File: 1717434688440756.png (1.6 MB, 896x1152)
>>
File: 1724849575722123.png (752 KB, 896x1152)
>>
File: 1700165491476736.png (757 KB, 896x1152)
>>
File: file.webm (368 KB, 1280x768)
>>102765147
If you go for 24fps you'll only get 1 sec lol
>>
File: fs_0082.jpg (66 KB, 920x920)
>>102765147
>>102765236
tried a few times to get this to look at the camera but it just sorta wiggled like yours each time :/
oh well not paying gpu minutes to try more, will wait for it to run in under 24gb
>>
File: file.webm (741 KB, 1280x768)
>>102765236
>>102765147
went for 8 fps and... kek
>>
>>102765061
wow, surprise, turns out all the people who know what they're doing came to the conclusion that flux is rigid, overhyped, and not worth the training costs. it's simply not a 12b-tier model. bloated with synthetic garbage and still requires sdxl refiner to unslop. bake again
>>
File: file.png (78 KB, 2230x498)
>>102765279
>will wait for it to run in under 24gb
this model will be history anyway, they're retraining it from scratch to get the best model possible
https://github.com/jy0205/Pyramid-Flow
>>
bigma will save us
>>
>>102765304
>turns out all the people who know what they're doing came to the conclusion that flux is rigid, overhyped, and not worth the training costs.
and so for you going for the most broken base model ever (SD3M) was a good idea? get the fuck out of here
>>
>>102765304
>flux is rigid, overhyped, and not worth the training costs. it's simply not a 12b-tier model. bloated with synthetic garbage
All the CFG antiburners are cope too
>>
>>102765339
>All the CFG antiburners are cope too
good thing we don't need any CFG antiburners anymore with the undistilled models
https://huggingface.co/nyanko7/flux-dev-de-distill
https://huggingface.co/ashen0209/Flux-Dev2Pro
>>
>>102765279
what is putting this thing so high VRAM? The models seems all under 10GB
>>
>>102765305
Wow, SD3 is so shit that even the CCP rejects this shit.
>>
File deleted.
>>
>>102765370
because making pictures takes VRAM, anon, and 24fps + 10 sec means 240 pictures that have to be rendered at the same time; it's like running a 240 batch size on SD models
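The batch-size analogy above can be put in numbers. A back-of-envelope sketch of why video gen VRAM scales with frame count: every frame's latent is alive at once, like a giant batch. The numbers here are illustrative assumptions (16-channel latent, 8x VAE downscale, fp16), not Pyramid Flow's real figures, and the attention activations on top of the latents are what actually hit the quoted 26-40GB.

```python
# Estimate the latent-tensor memory for a clip; all parameters are
# assumed, illustrative values, not measured Pyramid Flow numbers.
def latent_gb(frames, height, width, channels=16, down=8, bytes_per=2):
    vals = frames * channels * (height // down) * (width // down)
    return vals * bytes_per / 1024**3

one = latent_gb(1, 768, 1280)
clip = latent_gb(24 * 10, 768, 1280)   # 24 fps x 10 s = 240 frames
print(f"single frame {one:.4f} GB, 10s clip {clip:.2f} GB ({clip/one:.0f}x)")
```

The latents themselves are small; the point is the 240x multiplier that every per-frame buffer and activation inherits.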
>>
>>102765365
once someone makes a killer full real fintune then ill be interested
>>
>>102765083
>I wouldn't say no to someone retraining SD3 or just making a 3B model.
I would take retrained 1.5 at this point. V-prediction if possible
>>
>>102765481
Just train Pixart Sigma then.
>>
>>102765481
> retrained 1.5
why? unet is definitely inferior to a DiT architecture

>V-prediction
what's that?
>>
>>102765402
gotcha. I have two cards and have been looking for a way to split the VRAM requirements, but I'm not seeing anywhere that it's supported. Models on one card and processing on the other seems like it could get you to 26GB pretty easily; moving the two larger text encoders alone would be enough to drop it below 24GB.
>>
>>102765537
>The larger two text encoders are enough to drop it below 24GB.
what text encoders are they using? T5?
>>
>>102765548
I have no idea. There are just folders named text_encoder_1, text_encoder_2 and text_encoder_3. I don't see them used in the code either so I am not sure what is going on. I assume you need them, but I haven't dug that far.

Hopefully another anon will know.
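The two-card split described above can be sketched as a placement problem: put the big DiT on one GPU and the text encoders plus VAE on the other. The component sizes (GB) below are illustrative guesses, and `plan_placement` is a made-up helper, not Pyramid Flow's API.

```python
def plan_placement(components, budgets):
    """Greedy: assign the largest component first, each time onto the
    device with the most free memory. components/budgets are in GB."""
    free = dict(budgets)
    plan = {dev: [] for dev in budgets}
    for name, size in sorted(components.items(), key=lambda kv: -kv[1]):
        dev = max(free, key=free.get)          # device with most free GB
        if free[dev] < size:
            raise RuntimeError(f"{name} ({size} GB) fits nowhere")
        free[dev] -= size
        plan[dev].append(name)
    return plan

components = {"dit": 12.0, "text_encoder_1": 0.5, "text_encoder_2": 1.5,
              "text_encoder_3": 9.0, "vae": 0.3}
placement = plan_placement(components, {"cuda:0": 16.0, "cuda:1": 16.0})
print(placement)
```

With these guessed sizes the DiT lands alone on one 16GB card and everything else fits on the second, which is the split the anon is hoping for.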
>>
File: file.webm (845 KB, 640x384)
>>102765147
>based on SD3M
yeah I can see that
>>
File: mrnu3fzcavtd1.jpg (154 KB, 1178x706)
better start saving up bros
>>
>>102765642
who the fuck is gonna buy the 5080 and the 5070? Do they pretend the 3090 and the 4090 don't exist?
>>
Is there any video model that doesn't do the thing where when you give it a painting, it just kind of does a Ken Burns slow panning effect on it instead of animating it?
>>
>>102765680
Minimax actually animates shit but it's not a local model so...
>>
>>102765673
aren't they discontinuing the 4090 already?
>>
>>102765733
why would they keep manufacturing 4090s?
>>
>>102765746
easy money
>>
>>102765781
you clearly have never run a business
what happens when they release the 5090?
the factory has capacity limits you know?
why would they sell new 4090s and 5090s side by side?
can you do a business plan that doesn't involve you, as a greedy poorfag, getting a new 4090 for $1000?
>>
>>102765805
what are you on about?
>>
File: meh.webm (577 KB, 1280x768)
Pyramid 8 fps img2vid: A middle aged female scientist watches a fantastic machine that spins and whirrs with sparks until a piece of fried chicken falls out from the glowing blue middle of the machine

Seems you get 1 gen then have to wait.
>>
>>102765833
I personally want to win a billion dollars
>>
File: meh-interp.webm (1.34 MB, 1280x768)
>>102765842
not bad with interpolation
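Frame interpolation like the gen above can be sketched in its most naive form: turn an 8 fps clip into 24 fps by inserting blended frames between each source pair. Real interpolators (RIFE, FILM, etc.) estimate motion instead of linearly blending; this toy uses tiny lists of floats as stand-in "frames".

```python
def lerp(a, b, t):
    """Linear blend between two flat frames at fraction t."""
    return [pa * (1 - t) + pb * t for pa, pb in zip(a, b)]

def interpolate(frames, factor=3):
    """Insert factor-1 blended frames between each consecutive pair."""
    out = []
    for cur, nxt in zip(frames, frames[1:]):
        out.append(cur)
        for k in range(1, factor):
            out.append(lerp(cur, nxt, k / factor))
    out.append(frames[-1])
    return out

clip = [[0.0, 0.0], [3.0, 3.0], [6.0, 6.0]]   # 3 tiny "frames"
smooth = interpolate(clip)                     # 3 frames -> 7 at 3x
print(len(smooth))
```

A 3x factor is exactly the 8 fps to 24 fps jump being discussed.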
>>
>>102764387
>pic
>authors: ching chong ping pong suk mai ding dong
dropped
>>
>>102765885
like it or not anon, but the chinks are the kings of video models, Kling, Minimax, CogVideo, Pyramid...
>>
>>102765781
that's what the 5080/5070 is for
>>
>>102765885
might as well drop this entire hobby then lol
>>
do you think any of them are cute chinese girls?
>>
>>102765949
i like to imagine the sweat and juices of many underpaid chinese jade beauties that touched my nvidia gpu during production
>>
which version of pytorch should i use with comfy? i remember seeing a comparison image that showed some versions are better than others but forgot which
or is it all just placebo?
>>
>>102764930
Been testing this. Seems pretty decent.
>>
>>102765895
It's because boob jiggle triggers all the safety teams
>>
>>102765877
I'd duplicate the space and try native 24fps but im not paying for it.
>>
File: IMG_7785.png (3.18 MB, 1248x1824)
>>102766001
do you happen to have any examples that arent super sloppafied like picrel
>>
>>102766044
we can finally move on from flux
>>
>>102765323
yeah and they'd rather train their own model from scratch than use flux
>>
>>102766115
that's because flux is too big, their 3b model already asks for fucking 40gb of vram
>>
>>102766001
>Been testing this. Seems pretty decent.
care to show some examples
>>
>Most of the sample pictures on all loras are done with controlnet/img2img so expect different results if you trying to remix with the civitai generator.

You stupid buzz farming asswipes. Documentation is the most important part of all of this.

t. personal blog man
>>
>>102766057
"finally"
hasn't it only been out like 2 months
>>
>>102766057
>we can finally move on from flux
so you heard one comment from a single anon (that has no images on top of that) and that's it? it's enough for you to make this insane conclusion?
>t. the least disingenuous Flux hater
>>
>>102766057
i kekd
>>
File: ComfyUI_00174_ - Copy.png (1.1 MB, 1680x960)
>>
>>102766217
hello saar, how much did the black forest labs pay you?
>>
>>102766001
is this command line only at this point?

>>102766196
the hype cycle has been at least 8 weeks. I want to say it started when comfyanon got shit canned (yes, that is bait).
>>
>>102766249
I ask you this question saar, how much did SAI pay you to smear Flux like that?
>>
Bigma
>>
>>102766266
no need to pay me anything to smear flux saar, if you want smear just generate realistic gen with base flux, skin already look smear saar
>>
File: 9900.png (1.68 MB, 1680x960)
>>
>>102766287
Explain why Flux is so hyped even though for you it's the worst model ever, Lykon.
>>
File: 9902.png (1.04 MB, 1680x960)
>>
File: grid-0007.jpg (1.04 MB, 2560x2560)
>>102766170
Here's something

>>102766263
>is this command line only at this point?
I'm using the safetensor conversion
>>
>>102766332
>I'm using the safetensor conversion
on comfyUi? Forge?
>>
>>102766310
>hyped
that's all it is saar, hype. people used it during a great image gen drought, were impressed by the prompt understanding and text capabilities, then saw through its cracks and got bored. it's been months and nothing has happened. flux isn't even open source.
>>
>>102766345
>flux isn't even open source.
Schnell is Apache 2.0, SD3 has a shit licence, nice bait saar
>>
File: 00015-1922665712.png (3.36 MB, 1536x1536)
>>102766342
>on comfyUi? Forge?
reforge
>>
File: ComfyUI_temp_uyzyp_00094_.png (1.69 MB, 1072x1880)
>>
>>102766358
>Schnell
Sch-BRAAAAAAAAAAAAAAAAAAP 8 step unfinetunable distilled BRAAAAAAAAAAAAAAAAAAP
>>
File: file.png (538 KB, 1198x1148)
>>102766382
>unfinetunable distilled
Uh oh...
https://huggingface.co/ostris/OpenFLUX.1
>>
>>102766399
spoken like a true saar!
>they left us their dookie doo doo to eat
i'll be waiting for progress!
>>
File: file.png (182 KB, 500x500)
>>102766382
https://huggingface.co/stabilityai/stable-diffusion-3-medium
>Downloads last month 42,476
https://huggingface.co/black-forest-labs/FLUX.1-dev
>Downloads last month 1,130,973
lmao
>>
File: grid-0010.jpg (582 KB, 1728x2304)
>>
>>102766425
looks overcooked as fuck, maybe your CFG is too high
>>
>>102765949
no. girls should stay far far away from this area. they'll simply fuck everything up by lobotomizing the models to make them "safe for women". we need the undivided attention of touch-starved chinks to fuel progress and women will, at best, be a major distraction.
>>
>>102766467
>no. girls should stay far far away from this area. they'll simply fuck everything up by lobotomizing the models to make them "safe for women"
this, we've seen the disaster when women went into the video game industry; they made every female MC ugly because they're jealous of beautiful women
>>
>>102766467
this. and if you're desperate just i2i a picture of your face
>>
>>102765949
>do you think any of them are cute chinese girls?
I don't really care who's behind this, the only thing that matters to me is the result, I just want a good product at the end.
>>
>>102766587
but it would be cooler if some of the ones behind it were cute girls who are cute to look at
>>
File: grid-0014.jpg (401 KB, 1728x2304)
>>
File: 00056-1922665714.png (1.82 MB, 1080x1440)
>>
>>
People love to scrutinize the small details in AI images. So don't give them any. You need to be blurmaxxing
>>
>>102767125
The perspective is fucked up which is ironic because the blur makes it even more apparent
>>
>>102764930
>they improved SDXL
Can this be used on Flux aswell?
>>
>>102767125
based and blurpilled
>>
>>
File: ComfyUI_temp_uyzyp_00107_.png (1.45 MB, 1072x1880)
>>
>>
>>102767125
>>102767217
>>102767273
>generating supersized thumbnails
but why
>>
>>102767317
I am assuming /sdg/ is shitposting/spamming the thread
>>
>>102767349
why would that be your first assumption?
>>
>>102767366
hes retarded
>>
>>102767366
there is a history of them trolling the thread, and it has been stupid women shit, flux vs sd things, and more images than this thread usually supports. If it smells like a duck and it's clearly underage, /sdg/ wants to fuck it.
>>
>>102767317
There is no such thing as style. Style IS content. An image is whole, contiguous, a fully-connected network of latent layers.
>>
>>102767317
Do you not know how latent space works?
>>
>>102765642
stop posting slop rumors you gossipy troon
>>
File: file.png (118 KB, 256x256)
>/ldg/ gens a few months from now
yo guys check out my gen!
>>
>>102767486
>/ldg/ gens a few months from now
at the current rate it's optimistic to predict that there will be /ldg/ gens a few months from now
>>
>1.5: lacks the prompt coherence of later models
>XL: lacks the level of detail present in later models
>Pixart Sigma: lacks enough training
>Kolors: lacks comprehension of the english language
>HunyuanDiT: lacks non asian girl selfie dataset
>SD3: lacks anatomy
>Flux: lacks reasonable hardware requirements
It will never be as good as it once was.
>>
File: file.png (128 KB, 256x256)
>>102767515
bigma will save us im sure of it
>>
>>102766622
>>102766703
>>102766425
someone please fucking fix the AI lighting problem already. i've seen more realistic shit on deviantart
>>
File: ComfyUI_temp_uyzyp_00112_.png (1.97 MB, 1072x1880)
>>102767515
only the strongest will survive
>>
File: file.png (36 KB, 128x128)
>>
File: file.png (11 KB, 64x64)
>>
>>102767515
5090 and Titan AI will make bespoke 1B-3B models very common very soon.
I'm hoping the new Pixart architecture is friendly to this but if not Pixart Sigma is more than capable. I'll likely make a pretrained 16 channel VAE that is designed for training on 5090s for the purpose of truly having interesting full fine tunes rather than stacking Loras.
>>
File: ComfyUI_temp_uyzyp_00114_.png (1.84 MB, 1072x1880)
>>
>>102765147
Did you know that the guys who open sourced Pyramid flow are the same guys who made Kling?
https://www.youtube.com/watch?v=GD6qtc2_AQA
>>
File: 0.jpg (127 KB, 832x1216)
>>
File: file.png (2.54 MB, 1024x1024)
>>
>>102767747
Pyramid - Zhicheng Sun - Peking University - Haidian, Beijing, China
Kuaishou AI - Haidian District, Beijing

You got anything to backup this bullshit claim?
>>
>>102767952
I should have said that the only thing connecting these is that they exist in the same location.
>>
File: 0.jpg (261 KB, 832x1216)
>>
File: file.png (223 KB, 2591x704)
>>102767952
there's some guy from Kuaishou Technology; that's the company that made Kling, innit?
>>
>>102767877
she's cute
>>
File: 0.jpg (294 KB, 832x1216)
>>
>>102768016
funding a uni project is far different from being the guy who made Kling. He will probably be working for Kling shortly, but I can't find anything that says he does now or has in the past.
>>
>>102768046
>funding a uni project is a far different being the guy who made Kling.
they're not just funding it, there are literally guys from the company that made Kling who participated in this paper, what else do you want?
>>
>>102768057
linkedin or Chinese equivalent.

Zhicheng Sun seems legit. I could be hoping that he doesn't have such ties to corporate ideals.
>>
>>102768016
In China it's the government that directs everything; there are no real companies.
>>
File: file.png (748 KB, 1510x900)
>>102768172
So you're telling me that it's Xi Jinping who decided to give us all good local models for free? Damn he's based! I love china now!
>>
>>102768197
Yes. Also, with the Western restrictions, they cannot sell their models as openly as JewAI does, so their response is to make their models open and free, reducing the Jews' gains.
>>
>>102768219
Who would've guessed that the chinks would be the ones who would save us all during this AI clown circus show? Not me, I'm pleasantly surprised, any help is good help
>>
File: file.png (923 KB, 3076x1466)
https://sihyun.me/REPA/
this shit is interesting, it makes the model learn concepts way faster than the usual
>>
>>102768437
you have a link that I can trust?
>>
>>102768478
Sure
https://github.com/sihyun-yu/REPA
>>
>>102768437
Going to pull this apart, buddy. I'm dying to do a new diffusion model. 17x is insane
>>
>>102768487
thanks. Looks promising enough to ignore the python3.9 version.
>>
>>102768497
>17x is insane
not just that, the final loss function is even lower at the end, so your model will be even better with that technique
>>
File: file.jpg (481 KB, 3076x1516)
>>102768437
I love those papers; the more we improve the training process, the more accessible it'll be for everyone. At some point we won't have to rely on multi-million-dollar companies to make good shit
>>
File: 2698_.jpg (913 KB, 2688x3456)
Is MeshGraphormer still the go-to for hands?

>>102768721
mod approved edit. Stupid accidental cameltoe
>>
File: ComfyUI_temp_cpnsr_00002_.png (3.12 MB, 1126x1452)
>>
>>102764387
>>
>>102767832
Very cool
>>
File: 0.jpg (286 KB, 832x1216)
>>
>>102768571
you will because you still need the huge datasets that we don't have.
>>
File: 0.jpg (118 KB, 832x1216)
>>
>>102767545
>flux lacks reasonable hardware req
512x512 flux-dev-nf4 works fine with midrange cards
>>
>>102770061
>you still need the huge datasets that we don't have.
it's not hard to get a dataset, you use Laion, you scrape some of it off the internet...
>>
>>102770786
wasn't laion taken down? due to CSAM or something?

I'll always be haunted by the time I CLIP searched Laion for "pretty college girl cleavage" and a literal picture of my old next-door neighbor was in the results
>>
>>102770815
>was't laion taken down?
no they brought that back recently after cleaning it
>>
>>102770722
"works fine" more like "cool to see for the first time, then you realize it's not worth it"
>>
>>102770786
laion being garbage is the reason sd1.5 and XL are so rudimentary.
>>
>>102770835
now you're making a different complaint. one I disagree with
>>
>>102770844
i should have phrased it differently. recommended hardware requirements. it's a big model. quants don't really improve speed just space optimization.
>>
>>102770815
I scraped millions of images using duckduckgo, it's not hard. Just get ChatGPT to generate thousands of search queries and download everything high resolution.
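The "thousands of search queries" step above can be sketched without an LLM at all: brute-force combine subjects, styles, and qualifiers. The word lists are made up for illustration, and this only builds query strings; it does not scrape anything.

```python
from itertools import product

subjects = ["mountain landscape", "city street", "portrait photo"]
styles = ["at night", "in the rain", "golden hour"]
qualifiers = ["high resolution", "4k wallpaper"]

# Cartesian product: every subject x style x qualifier combination.
queries = [" ".join(parts) for parts in product(subjects, styles, qualifiers)]
print(len(queries))
print(queries[0])
```

Three small lists already give 18 queries; a few dozen entries per list gets you into the thousands the anon describes.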
>>
>>102770901
>it's a big model. quants don't really improve speed just space optimization.
it's true, I wish it were faster to render a single image on Flux, especially when I'm CFGmaxxing
>>
File: ComfyUI_34282_.png (1.4 MB, 848x1024)
>>
File: ComfyUI_34283_.png (989 KB, 848x1024)
>>
>>102771230
lol the one on the left bed
>"Sir you need to put your blankets over your lower body..."
>>
File: ComfyUI_34288_.png (1.26 MB, 848x1024)
>>
File: ComfyUI_34293_.png (1.25 MB, 848x1024)
>>
>>102764413
bigma ballz
>>
File: file.png (1.31 MB, 3013x1574)
>>102765147
https://pyramid-flow.github.io/
I have a serious question here, why are the scores so close to each other? Kling is miles ahead of that Pyramid model yet the numbers suggest they're on the same level, that's complete bullshit
>>
>>102771632
it only took you two years to realize benchmarks are meaningless
>>
>>102770904
>and download everything high resolution.
And why would I want to limit myself that way?
>>
Time for some REPA of ass. Trying 16 channel VAE training, too bad it's based on just a 256px crop model.
>>
>>102771829
can REPA be used for finetunes aswell? we would improve Flux a lot with it
>>
>>102771856
You're essentially using CLIP as a regularization technique when computing losses, so yes. I'm sure there are other ways to apply it too.
>>
>>102771867
>You're essentially use CLIP as a regulation technique when computing losses
imagine if you use T5, goddam the possibilities are endless
>>
>>102772076
They're using the image features, so it would be more like using Florence to create losses.
>>
File: HomoUI_00001_.jpg (30 KB, 682x512)
>>102767515
Preach it sister!
>>
File: 57964.png (2.38 MB, 1440x3120)
is this whole gay ass fucking website dead? maybe the nukes started flying in the mid east and we didn't hear about it yet?

>>102772216
hell yeah brother, comfysisters btfo
>>
I uploaded the Q8_0 version of dev2pro (another undistilled dev model)
https://huggingface.co/TheYuriLover/Flux-Dev2Pro-GGUF/tree/main
I still prefer de-distill but it's not that bad
>>
File: 57970.png (1.01 MB, 1440x1440)
heh
>>
>>102772246
>is this whole gay ass fucking website dead?

A lot of people have been banned.
>>
>>102772746
probably for the best, but i'm dismayed at the apparent attrition in our diffusion threads.
>>
Pyramid is saved! (after the comfy integration FUCKED some people's Comfy setups, mine included lol: downgrade numpy, then use a checkpoint to repair things/uninstall problem nodes, delete the integration and re-add the broken nodes)
(really bad release so far but they are retraining to shake off the SD3 sauce)
>>
File: 57977.png (1.48 MB, 1440x3120)
i waste this gen on you lot
not because i must,
but because i can,

ps 49 times...
>>
File: 00036_.jpg (452 KB, 1792x2304)
>>102772912
I lost. Fuck that guy. Fuck anyone who just posts solutions at random.
>python 3.8 is history
End of life was Monday. Fuck him and his fucking waste of resources that he causes.
>>
>>102772912
I'm not gonna go down that path; they're retraining their model, so I'll wait until they get the best one out.
>>
>>102773170
The code is there on github, why doesn't he just rewrite it to be compatible with 3.10?
Oh yeah i forgot, he's a money-grabbing, women-like (complain, don't offer a solution or do any coding work towards it, then hold up a sign behind a paywall that says "I made this" while pointing to the work of others you complained about) grifter.
>>
>>
>>102764387
ahh so thats what sailor moon would look like if she had downs syndrome
>>
File: 4010243045.jpg (3.34 MB, 2688x3456)
>>
>>102765642
I could believe the 5090 but the others seem implausible. Anything less than 16GB for a xx70 seems pointless, and $1k+ for a 16GB card also seems like a hard sell. I could imagine them nickel and diming with 20GB for the xx80 though.
>>
>>102773815
>$1k+ for a 16GB card also seems like a hard sell.
don't forget those are graphics cards, you don't need much more than 16gb to run the latest games, so people won't mind if it gets better speed than the 4090
>>
>>102773821
I don't know about that, you don't really need more than 16GB of RAM for games either, but people still buy 64GB
>>
File: 1722412552754502.png (144 KB, 400x712)
>>
>>102767562
Is this AI? Model/catbox?
>>
>>102771148
Love this
>>
File: 2412708583.png (3.65 MB, 2304x1792)
>>
File: pumpkin.jpg (1.5 MB, 5040x2480)
I tried to get flux to do some lazy halloween costumes. I created an image and then got flux to do i2i. Leftmost is where I let flux have high denoise, and the amount of flux used lowers from there. Is this because I had Halloween words in there and it wanted to make it animated, or simply a skill issue?

I saw the strap. I don't care if I am testing.
>>
File: file.webm (841 KB, 1200x720)
https://xcancel.com/cubiq/status/1844332817767072128#m
kek, Pyramid Flow looks fun to play with, too bad it's asking for too much VRAM though
>>
File: 00122-1757856519.png (1.26 MB, 1024x1024)
>>102773913
sorry i didn't bother saving that specific gen but this should have the same prompt and settings i used
>https://files.catbox.moe/t1y61h.png
it's illustriousXL_smoothftSPO with 10 sampling steps, downscaled to 256x256 to make it extra blurry and appealing for the average flux user. i stole most of the prompt from an anon in /h/
>>
>>102774404
>i stole most of the prompt from an anon in /h/
>>>8251510
this one
>>
>>102774460
nice
>>
>>102774460
how do i crosspost
>>>/h/8251510
>>
>>102774492
>how do i crosspost
yeah it worked anon, everyone is talking about that illustrious model, did it deprecate pony or is it not cooked enough yet?
>>
File: file.png (1.65 MB, 1024x1024)
>>
>>102774502
>did it deprecate pony or it's not cooked enough yet?
for me, both. it looks way better than ponysloppusion, but i don't do hardcore sex gens so i'm not sure about that. it's also undercooked, kind of unstable, but the smoothftspo tune helps a lot with that. it knows a lot of artists, but because it's undercooked they're only really useful for mixing styles. i recommend it.
>>
File: ComfyUI_34296_.png (980 KB, 848x1024)
>>
File: ComfyUI_34302_.png (858 KB, 848x1024)
>>
File: ComfyUI_34307_.png (1.24 MB, 848x1024)
>>
File: ComfyUI_34316_.png (1.05 MB, 848x1024)
>>
File: file.png (1.76 MB, 1024x1024)
>>
File: IMG_0565.png (918 KB, 1024x1024)
>>102766382
>dedistill
>finetune
>enjoy
>>
File: file.png (1.8 MB, 1024x1024)
>>102774788
ftfy
>>
>>102774891
Did you just inpaint the nipples
>>
>>102775030
yeah
>>
>>102775107
Nice
>>
> | 7/8 [15:33<02:36, 156.20s/it]
> | 7/8 [15:25<02:33, 153.38s/it]
My 16GB on Flux Q8 ;_;
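Quick sanity math on those progress bars: "s/it" means seconds per iteration (not iterations per second), so the rates shown imply roughly a 20-minute gen before any model-loading overhead. A minimal sketch:

```python
def gen_minutes(steps, sec_per_it):
    """Total sampling time in minutes for a run at a given s/it rate."""
    return steps * sec_per_it / 60

# ~156 s/it at 8 steps, matching the progress bars above
print(f"{gen_minutes(8, 156.2):.1f} min")
```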
>>
>>102774693
Nice
>>
>>102775463
is your batch size higher than one?
also use a lower quant; Q6_K should be about as good, and the K-quants use a higher-quality quantization scheme
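Rough GGUF file sizes for a ~12B-parameter model (Flux-dev scale) follow directly from bits per weight. The bpw figures below are the nominal llama.cpp-style values; real files differ somewhat because some tensors stay in higher precision.

```python
def quant_gb(params_billions, bits_per_weight):
    """Approximate file size in GB for a quantized model."""
    return params_billions * 1e9 * bits_per_weight / 8 / 1e9

# Nominal bits-per-weight for common llama.cpp-style quant types
for name, bpw in [("Q8_0", 8.5), ("Q6_K", 6.5625), ("Q4_0", 4.5)]:
    print(f"{name}: ~{quant_gb(12, bpw):.1f} GB")
```

Dropping from Q8_0 to Q6_K saves roughly 3 GB at this scale, which is often the difference between fitting in 16GB with the text encoders or swapping to system RAM.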
>>
>>102775640
Batch size is a mere 1.
>>
>>102775664
are you loading T5 and clip on cpu or gpu?
>>
>>102775691
t5xxl fp16,
>clip on cpu or gpu
?
Swap location: Shared
>>
>>102775729
forge?
try the other swap location
>>
>>102775760
>the other swap location
That's slow (4.3something s/it) on Q4 and NF4 already.
>>
>>102775765
i think swap location might be for the model layers then, and not the text encoders
im actually not sure where it loads T5 and clip, and if you have an option to change it
maybe check your memory stats while everything is loading so you can identify what goes where, and consider trying it with comfy instead or just swapping to a lower quant
also, if that 16GB happens to be an AMD card, i think it is going to be slower regardless and you should look online for how other people deal with it
>>
File: 0.jpg (208 KB, 832x1216)
>>
>>102775824
The card is a 4060TI. And I don't think it lets me set where to load T5 and Clip, according to console it looks like it puts everything in VRAM.
>Skipping unconditional conditioning when CFG = 1. Negative Prompts are ignored.
>[Unload] Trying to free 13464.34 MB for cuda:0 with 0 models keep loaded ... Done.
>[Memory Management] Target: JointTextEncoder, Free GPU: 14539.60 MB, Model Require: 9569.49 MB, Previously Loaded: 0.00 MB, Inference Require: 1024.00 MB, Remaining: 3946.11 MB, >All loaded to GPU.
>Moving model(s) has taken 24.69 seconds
>Distilled CFG Scale: 3.5
>[Unload] Trying to free 17053.25 MB for cuda:0 with 0 models keep loaded ... Current free memory is 4883.03 MB ... Unload model JointTextEncoder Done.
>[Memory Management] Target: KModel, Free GPU: 14530.14 MB, Model Require: 12125.39 MB, Previously Loaded: 0.00 MB, Inference Require: 1024.00 MB, Remaining: 1380.74 MB, All loaded to GPU.
>Moving model(s) has taken 69.72 seconds
>100%| | 8/8 [18:50<00:00, 141.33s/it]
>[Unload] Trying to free 4495.77 MB for cuda:0 with 0 models keep loaded ... Current free memory is 3353.52 MB ... Unload model KModel Done. | 8/8 [18:42<00:00, 167.05s/it]
>[Memory Management] Target: IntegratedAutoencoderKL, Free GPU: 14528.17 MB, Model Require: 159.87 MB, Previously Loaded: 0.00 MB, Inference Require: 1024.00 MB, Remaining: >13344.30 MB, All loaded to GPU.
>Moving model(s) has taken 171.20 seconds
>Total progress: 100%| | 8/8 [21:35<00:00, 161.89s/it]
>>
>>102775869
looks like it does but also unloads them and then fully loads the Q8 into vram so there's no way it should be that slow
are you trying to gen images in 4k or something?
>>
>>102764387
Never tough id say this, but that downie is looking kind of hot XD
>>
>>102775897
>unloads them and then fully loads the Q8 into vram
Possibly there's a leftover from the previous gen with the NF4 model. Loading the Q8 took more than the 24s shown in the paste.
>are you trying to gen images in 4k or something?
Just 1MP with Forge's default preset for Flux, at 896x1152
>>
>>102775928
if you switched from NF4 to this Q8 without restarting once ever since, unironically try turning it off and on again
forge is not free of bugs unfortunately
>>
>>102776031
I don't know, I'd rather not restart Forge because one of its bugs is that after a restart it removes the generated image from the UI once a new gen finishes. I'd need to reload the UI too, which resets all the current parameters and the prompt back to the default preset.
However, switching from Flux to SDXL and back, or swapping between different Flux models, doesn't impact the speed of either Flux or XL.
>>
>>102776106
it might remove it from the UI but should still be in the outputs folder
there is a PNG info tab that you can drag your image into and then click "Send to txt2img"
you can also store your settings as a preset with a plugin, look it up
>>
Noob question. But when I use a lora. Does the lora eat the steps in my settings or does it use its own steps?
>>
>>102776179
the weights get merged onto your model during inference, before the steps, so yeah same settings
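to spell it out: a LoRA is just two low-rank matrices whose product gets folded into the base weights before sampling starts, so your step count and sampler settings are untouched. a minimal numpy sketch (all names hypothetical, not any trainer's actual API):

```python
import numpy as np

def merge_lora(base_weight, lora_down, lora_up, alpha=1.0):
    """Fold a LoRA delta (up @ down, scaled by alpha) into a base weight matrix.

    base_weight: (out, in), lora_down: (rank, in), lora_up: (out, rank)
    """
    return base_weight + alpha * (lora_up @ lora_down)

# toy example: rank-1 LoRA applied to a 4x4 layer at strength 0.5
W = np.zeros((4, 4))
down = np.ones((1, 4))   # (rank, in)
up = np.ones((4, 1))     # (out, rank)
W_merged = merge_lora(W, down, up, alpha=0.5)
```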
>>
>>102776163
Yeah I know, but it's still annoying that I have to do that: open the file browser, navigate to the folder, drag it into the PNG info tab... I'd rather set the three numbers I changed again and copy the prompt before reloading. But then the reload itself takes a while too.
>>
>>102776237
just try it once to see if it fixes the issue
>>
When will Nvidia increase the number of threads?
>>
I fucking hate python's dependency system and conda
I finally got Pyramid Flow running on my computer
>>
>>102776429
>python's dependecy system and conda
And they're unaware that they suck at package management.
>>
File: pyram.png (3 KB, 1082x22)
3 KB
3 KB PNG
>>102776429
Not as slow as I thought it would be
>>
>>102776429
>filtered by python -m venv venv
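for anyone actually filtered, the stdlib route needs no conda at all (a sketch; assumes python3 on PATH, activate path differs on Windows):

```shell
# create an isolated environment in ./venv so project deps can't clash
python3 -m venv venv
# activate it (Linux/macOS; on Windows it's venv\Scripts\activate)
. venv/bin/activate
# then install the project's pins into this venv only:
#   pip install -r requirements.txt
```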
>>
File: 1708843867582408.jpg (26 KB, 446x446)
26 KB
26 KB JPG
>>102764387
https://civitai.com/models/836888/flux1-schnell-fp8

This one is roughly 16 GB

https://civitai.com/models/622579/flux1-dev-fp8

This one is around 11 GB

https://huggingface.co/city96/FLUX.1-schnell-gguf/tree/main

And these range from two gigs to 20 gigs plus.

What would be the best one to use if file size reduction and generation speed are priority for you? Also how are people even pruning these models? Does anyone know how to do that?
>>
>>102776557
The packages that came on requirements.txt weren't compatible with each other and I had to modify the code to make it work because these chinks don't know how numpy arrays work
And conda was a pain in the ass to set up
>>
>>102776546
Nvm, the s/it blew up over the next few steps and now it's at 52s/it on the 12th iteration
>>
>>102776578
>bloo bloo bloo it's not compatible with my bastard comfyui setup with dozens of custom modules with their own requirements
>>
>>102776602
I just want things to werk, I got things to do besides modifying retarded code and figuring out which combination of versions of 30 modules makes the retarded code work.
>>
File: output.webm (519 KB, 1280x768)
519 KB
519 KB WEBM
Wanted to test how well the model knows real-life physics. It's better than I expected, but I asked for the avocado to fall into a bucket full of water, not for the water to fall into a bucket full of avocados
>>
>>102776641
Your expectations don't align with the cutting edge software you're working with. Whether you like it or not you're not working with consumer tools or software. Feel free to come back in 10 years when it's all packaged into an app for your phone.
>>
File: ComfyUI_temp_movuq_00005_.png (3.41 MB, 1177x1518)
3.41 MB
3.41 MB PNG
>>
1girl supremacy
>>
>>102776429
What are you running to make that possible?

>>102776576
The Q1 version, though be aware you asked a speed, size, and quality tradeoff question.
>how are people even pruning these models?
Quantization. Yes, there are many how-to-quant guides out there.

>>102776818
262/86 ratio with low/no 1girl yous. Plus the asswipe flooding the thread with blurred pics. I weep for the lack of 1girl supremacy.
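for the curious, the core idea behind those quants boils down to storing low-bit integers plus a scale; sketched here as plain symmetric int8 over a whole tensor (real GGUF formats are blockwise and fancier):

```python
import numpy as np

def quantize_int8(w):
    """Symmetric int8 quantization: store int8 weights plus one fp scale."""
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover an approximation of the original fp32 weights."""
    return q.astype(np.float32) * scale

w = np.array([0.5, -1.0, 0.25, 1.0], dtype=np.float32)
q, s = quantize_int8(w)
w_hat = dequantize(q, s)  # close to w, at a quarter of fp32 storage
```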
>>
>>102776882
>asswipe flooding the thread with blurred pics
you wouldn't get it
>>
>>102776882
>What are you running to make that possible?
3090, it's using 23.5GB
>>
File: ComfyUI_temp_movuq_00010_.png (2.58 MB, 1177x1518)
2.58 MB
2.58 MB PNG
>>
File: ComfyUI_temp_movuq_00011_.png (2.56 MB, 1177x1518)
2.56 MB
2.56 MB PNG
>>
File: ComfyUI_temp_movuq_00012_.png (2.64 MB, 1177x1518)
2.64 MB
2.64 MB PNG
>>
Gunna REPA the Sigma in the butt
>>
>>102776578
It's very telling that the chink devs CANNOT construct a requirements.txt that works in a new environment.
Personally I do not trust this project; they seem to have the skill level of undergrads who have copied someone else's work and really have no idea how to present it to the outside world.
>>
txt2vid in pyramid is surprisingly good, kudos to the creators
but the img2vid is very bad
>>
File: ComfyUI_temp_movuq_00016_.png (2.51 MB, 1177x1518)
2.51 MB
2.51 MB PNG
>>
>>102776576
>generation speed
They don't speed up inference like that unfortunately. Flux will always be a monster.
>>
>>102777189
the model itself is pretty good, and I don't think a bunch of undergrads would have access to 20k hours of A100. Maybe they were using some other version of numpy or torch or whatever, but they should pin that imo
>>
>>102777218
you're talking to a seething no coder whose experience with software is downloading apps on Android
>>
>>102777228
that's me you fucking retard
you can't even follow the order in a conversation, how would you feel if you hadn't eaten breakfast?
>>
>>102777254
I don't care, you're both retarded.
>someone made a model I really really want to use
>but they must be incompetent though
Can you at least be a tad more intelligent? Or are you really just an entitled faggot who's mad that people giving him things for free aren't doing it to the standards of his silver-spoon life?
>>
we were never meant to have local video gen, it's too powerful an idea
>>
>>102777296
I'm telling you, I had to fix their own code because they were trying to convert a plain Python list to a tensor using a numpy method
You sound underage, go back to wherever you came from
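the class of bug in question (a hypothetical reconstruction, not their actual line): numpy methods live on ndarrays, not on Python lists, so you have to convert first:

```python
import numpy as np

frames = [[0.1, 0.2], [0.3, 0.4]]           # plain Python list of lists
# frames.astype(np.float32)                 # AttributeError: lists have no numpy methods
arr = np.asarray(frames, dtype=np.float32)  # fix: make it an ndarray first, then cast
```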
>>
>>102777296
>Basic intelligence is a gift you pig!
Maybe in your world, not in AI land, your world being Chinese land btw.
>>
>>102777320
clearly the code works on their system
I don't care
I'm more inclined to believe you are a retard
>>
>>102777333
Feel free not to use the model since China is le ebil, but it makes me laugh how you have to use it
>>
>>102777336
>works on my machine
so you are the retarded nocoder? fucking hell leave 4chan you sound new and tryhardy
>>
>>102777367
I know you must be retarded but "it works on my machine" basically says it's an ID-10-T error. Troubleshoot the problem between the chair and the computer. After being here long enough I've realized you people can't follow basic instructions.
>>
>>102776882
>the asswipe flooding the thread with blurred pics
blurred 1girl pics*
>>
File: file.png (41 KB, 409x371)
41 KB
41 KB PNG
Holy shit, REPA just werks
>>
>>102764387
What local model is 100% privacy friendly, not letting anything leave your computer?
>>
>>102777687
>which CSV is 100% privacy friendly
>>
>>102777600
HYPE
>>
>>102777739
I wonder what happens if you stack perceptual loss on top: since you're already doing CLIP, which requires images, you could put a perceptual loss on it as well, and you'd probably get some great results and alignment.
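since REPA's alignment term is essentially cosine similarity against frozen encoder features, bolting another feature-space term on top is a one-liner. a hedged numpy sketch (the weights and the extra perceptual term are this anon's speculation, not the paper's recipe):

```python
import numpy as np

def cosine_alignment_loss(h, t):
    """Negative mean cosine similarity between projected diffusion features h
    and frozen encoder features t (the REPA-style alignment term)."""
    h = h / np.linalg.norm(h, axis=-1, keepdims=True)
    t = t / np.linalg.norm(t, axis=-1, keepdims=True)
    return -np.mean(np.sum(h * t, axis=-1))

def total_loss(diff_loss, h, clip_feats, percep_feats, w_repa=0.5, w_percep=0.5):
    # stack an extra perceptual-feature alignment term next to the REPA one
    return (diff_loss
            + w_repa * cosine_alignment_loss(h, clip_feats)
            + w_percep * cosine_alignment_loss(h, percep_feats))

h = np.array([[1.0, 0.0], [0.0, 1.0]])
loss = cosine_alignment_loss(h, h)  # identical features -> -1.0
```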
>>
I miss titty elfs
>>
>>102775916
>Never tough id say this, but that downie is looking kind of hot XD
Like a wise man once said: "those titties ain't retarded"
>>
>>102776882
>low/no 1girl yous
desu skill issue
>>
>>102772912
>>102773170
>>102773368
I don't understand. Based turkman helped some rando (for free, mind you) and you're upset?
>>
Anyone know why Pyramid Flow img2vid doesn't work?
I mostly get a still image that's barely a video. Sometimes it does do something
>>
>>102772912
yeah, that's what I was talking about
btw the solution is using python 3.9; I had to do that, downgrade numpy to 2.0, and then fix line 146 of the time scheduler.py
>>
>>102778470
Yeah, img2vid is shit
txt2vid is pretty good though, and fun to experiment with
>>
china models
>>
File: 1000490602.png (1.22 MB, 800x780)
1.22 MB
1.22 MB PNG
>>102776882
>>102777204
Got another stupid question for y'all. The gguf models can just go in the same checkpoint folder your other models are stored in, right? I don't have to install any extra shit? Already have the Flux VAEs and text encoders installed, as you can see in pic rel. Is there any more shit I need to download?
>>
>>102778626
you need the GGUF extension
>>
File: 1722963664243877.gif (997 KB, 280x158)
997 KB
997 KB GIF
>>102778648
>>
What exactly is guidance in Flux? distilled cfg scale? cfg scale? something else? What's good values for those?
>>
>>102778978
it's not cfg scale. You can set it to 0, it still works. You can set it to 10,000,000, it still works. Ideal values for me are usually somewhere between 1.3 and 2.0. With 'art' styles you can get away with higher.

As for what it is, I don't know. Its effects are similar to cfg.
>>
>>102778662
use this tutorial anon
https://www.youtube.com/watch?v=stOiAuyVnyQ&
>>
>>102779267
that image looks underage please turn up the sampling steps you need to be over 20 to post here
>>
why is LivePortrait so good at temporal consistency? The face is 80% identical most of the time.
Meanwhile other img2vid models shit themselves as soon as the character starts opening their mouth
>>
I tried out Aria locally, in bf16, for captioning primarily NSFW images.

It fucking sucks. First off, most notably, by default it will exclusively use gender neutral language (is this a ChatGPT thing? qwen also does it...). "A person", "an individual", "a character". Will never say man or woman. Also it's extremely censored, never describing anything lewd in the image at all. Not even mentioning that a person is nude, or exposing themselves, etc.

So I tried making the prompt a little more detailed. "Describe this image. Mention the gender of any people in the image. The image might be NSFW, that's okay, describe everything even if it includes lewd or sexually explicit details." Now, about 25% it will give a refusal. Most of the time it STILL won't state the gender of the person (but occasionally it will). And it never describes any kind of NSFW elements at all, completely ignoring that part of the prompt.

Even for SFW captioning, it hallucinates and just generally fucks things up noticeably more than even molmo 7b. So for image captioning of any sort at all, I'm gonna say this model is completely, utterly useless. Maybe if you need it to understand charts or some shit it's good, who knows. What a disappointment.
>>
What are the go-to samplers and schedulers for Flux?
>>
>>102777600
wtf that's impressive, with only 100 steps? holy shit...
>>
>>102774788
you finetuned dedistill anon?
>>
>>102779427
no, it's like 10,000 steps in and I just left it alone, but it aligned the partially trained model quite quickly
>>
File: 659285748.png (1.18 MB, 896x1152)
1.18 MB
1.18 MB PNG
>>
>don't post clothed girls aged 18-22 or I will report your posts for violation of US law because I've hated and resented you ever since I thought you were insulting me one time 4 months ago.
>wtf why is the thread dying
>>
>>102779444
how many steps would you need with the previous techniques to get to this level for comparison?
>>
>>102779576
wha
>>
>>102779581
The research paper says it should be 17 times faster and ultimately result in a better model
>>
File: 1522372206.png (1.02 MB, 896x1152)
1.02 MB
1.02 MB PNG
>>
>>102779593
yeah I know that, but like, you got this picture in 10,000 steps; do you have an idea how many steps you'd need to get the same picture without REPA? maybe it's 9,000 steps and REPA is actually worse lol
>>
>>102779576
whut
>>
>>102779576
>>102779582
>>102779630
he's talking about this >>102779319
>>
>>102779647
that's a joke about how blurry the gen is; by over 20 I meant over 20 sampling steps
>>
>>102779657
I know it's a joke, but that anon took it seriously, autism, am I right? kek
>>
>>102777600
that's your VAE training right? >>102771829
>>
>>102779725
it's a 16 channel VAE 1B Pixart Sigma model
>>
>>102779657
my bad, I assumed it was the same anon who posted this >>102767420
>>
>>102779615
bitch is fucked UP
>>
>>102779676
>>102779783
did you /g/irls laugh at my joke atleast
>>
>>102779752
you used CLIP as a regularization technique?
>>
baker-san...
>>
>>102779916
I'm not gonna lie to you anon, I didn't laugh
https://www.youtube.com/watch?v=lcsXGHl_hwg
>>
Fresh

>>102779929
>>102779929
>>102779929
>>
any good local AI upscaler for video?

Also, I tried a few online services.
tensorpix.ai seems good, what do they use? Topaz AI is also good, but you need to tune it manually.


