/g/ - Technology




File: the longest dick general.jpg (2.4 MB, 2000x1201)
Discussion of free and open source text-to-image models

Previous /ldg/ bred : >>102804738

Hellish Fate Edition

>Beginner UI
Fooocus: https://github.com/lllyasviel/fooocus
EasyDiffusion: https://easydiffusion.github.io
Metastable: https://metastable.studio

>Advanced UI
Forge: https://github.com/lllyasviel/stable-diffusion-webui-forge
reForge: https://github.com/Panchovix/stable-diffusion-webui-reForge
Automatic1111: https://github.com/automatic1111/stable-diffusion-webui
ComfyUI: https://github.com/comfyanonymous/ComfyUI
InvokeAI: https://github.com/invoke-ai/InvokeAI
SD.Next: https://github.com/vladmandic/automatic
SwarmUI: https://github.com/mcmonkeyprojects/SwarmUI

>Use a VAE if your images look washed out
https://rentry.org/sdvae

>Model Ranking
https://imgsys.org/rankings

>Models, LoRAs & training
https://aitracker.art
https://huggingface.co
https://civitai.com
https://github.com/Nerogar/OneTrainer
https://github.com/derrian-distro/LoRA_Easy_Training_Scripts
https://github.com/kohya-ss/sd-scripts/tree/sd3

>Flux
https://huggingface.co/spaces/black-forest-labs/FLUX.1-schnell
https://comfyanonymous.github.io/ComfyUI_examples/flux
Quants: https://huggingface.co/TheYuriLover/flux-dev-de-distill-GGUF/tree/main

>Pixart Sigma & Hunyuan DIT
https://huggingface.co/spaces/PixArt-alpha/PixArt-Sigma
https://huggingface.co/comfyanonymous/hunyuan_dit_comfyui
Nodes: https://github.com/city96/ComfyUI_ExtraModels

>Index of guides and other tools
https://rentry.org/sdg-link
https://rentry.org/rentrysd

>Try online without registration
txt2img: https://www.mage.space
img2img: https://huggingface.co/spaces/huggingface/diffuse-the-rest
sd3: https://huggingface.co/spaces/stabilityai/stable-diffusion-3-medium

>Maintain thread quality
https://rentry.org/debo

>Related boards
>>>/aco/sdg
>>>/aco/aivg
>>>/b/degen
>>>/c/kdg
>>>/d/ddg
>>>/e/edg
>>>/h/hdg
>>>/tg/slop
>>>/trash/sdg
>>>/u/udg
>>>/vt/vtai
>>
>>102826540
I like your collages OP, you put some meme images or some non AI related pictures, but those are interesting nonetheless
>>
>>102826540
you dropped >>102826381
>>
File: ComfyUI_01919_.png (1.9 MB, 1024x1024)
>>
You know, if you have two GPUs, you can block swap to the second GPU instead of using the CPU for full finetuning Flux on Kohya, which is slightly faster. The microstutter is awful though, it makes my audio stutter so I can't watch Youtube without a stutter every second.
>>
File: ComfyUI_01926_.png (3.89 MB, 1920x1080)
>>
>>102826540
>a whole row just for my gens
baker is trying to turn me into a mommy guy...
>>
>>102826540
that collage must smell foul
>>
>>102826999
>Hellish Fate Edition
>>
No more big old bitches for a while, I really must apologize again.
>>
File: file.png (2.72 MB, 1408x1539)
https://github.com/THUDM/CogView3
>2024/10/13: We have adapted and open-sourced the CogView-3Plus-3B model in the diffusers version. You can experience it online.
https://huggingface.co/spaces/THUDM-HF-SPACE/CogView3-Plus-3B-Space
Cool that they released it, it looks all right, not at the Flux level but it's a fucking 3B model so I didn't expect better results lol
>>
File: file.png (2.05 MB, 1852x1685)
>>102827697
bruh it doesn't know what a photo is or something?
>>
File: ComfyUI_01930_.png (1.43 MB, 1024x1024)
>>
are quantized flux models as good as the original?
>>
File: ComfyUI_01931_.png (1.52 MB, 1024x1024)
>>
>>102827816
Q8 is extremely close to lossless, though if you do side by sides with FP16 you will notice some pixels shuffled around. 4bit introduces noticeable errors.
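
A toy illustration of why 8-bit lands so much closer to the original than 4-bit. This is a minimal sketch, assuming plain symmetric round-to-nearest quantization with one scale per block of 32 weights; it is not the actual GGUF Q8_0 or Q4 layout, and the random "weights" are made up for the demo.

```python
import random

def quantize_block(values, bits):
    """Symmetric round-to-nearest quantization with a single scale per block."""
    qmax = 2 ** (bits - 1) - 1                      # 127 for 8-bit, 7 for 4-bit
    scale = max(abs(v) for v in values) / qmax or 1.0
    # Quantize to integers, then immediately dequantize back to floats.
    return [round(v / scale) * scale for v in values]

def mean_abs_error(weights, bits, block_size=32):
    """Average absolute difference between original and dequantized weights."""
    total = 0.0
    for start in range(0, len(weights), block_size):
        block = weights[start:start + block_size]
        restored = quantize_block(block, bits)
        total += sum(abs(a - b) for a, b in zip(block, restored))
    return total / len(weights)

random.seed(0)
# Hypothetical stand-in for a weight tensor; real checkpoints are not Gaussian noise.
weights = [random.gauss(0.0, 0.02) for _ in range(4096)]

err8 = mean_abs_error(weights, 8)
err4 = mean_abs_error(weights, 4)
print(f"8-bit mean abs error: {err8:.6f}")
print(f"4-bit mean abs error: {err4:.6f}")
```

With 8 bits there are 255 usable levels per block against 15 with 4 bits, so the rounding step is much finer, which is consistent with "some pixels shuffled around" versus "noticeable errors".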
>>
File: file.jpg (2.04 MB, 7961x2897)
>>102827816
>are quantized flux models as good as the original?
only Q8 is really close, you can more easily notice the difference with the others
>>
>>102827834
are people finishing with flux or is it XL for final i2i?
>>
>>102827804
>>102827828
Did you give her nails on purpose?
>>
File: ComfyUI_01935_.png (1.35 MB, 1024x1024)
>>102827925
actually no I did not
>>
>>102827901
I think some people do an i2i pass with an XL model at the end to reduce the slop look. Personally though I find that using a good lora is enough to de-slop it.
>>
>>102827946
Oh, okay... I was just curious, sorry.
>>
>>102827955
>good lora
I only visit civit so I wouldn't know.
>>
>>102827970
kek... fucking kek....
>>
File: ComfyUI_01932_.png (1.33 MB, 1024x1024)
>>102827958
I guess that's how the LoRA learned it from the training material.
Aika always has her nails done
>>
I'm still using an Automatic1111 stable diffusion installation I setup late June of 2023.
I didn't upgrade as soon as I could have so I'm a little behind of where I should be. Should I be upgrading to SDXL or Flux?
>>
>>102827759
desu you could make that style interesting
>>102828253
dont skip xl
>>
>>102828253
>Should I be upgrading to SDXL or Flux?
flux, definitely flux
>>
>>102828253
are you asking the right question?
A1111 upgrades to A1111, forge or reforge.

SDXL and Flux are models. If you are asking this then >>102828261
>>
>>102828261
>>102828265
>>102828284
Okay... not to sound like a complete idiot, but I never really read up on SDXL. Do I have to start from scratch or can I use my existing setup and actually upgrade from SD to SDXL?
>>
>>102828527
Just drop in the XL models of your choice and you are good to go
>>
>>102828527
i recommend you ditch a1111 and switch to either forge or reforge, they are forks of a1111 so the ui should feel very familiar and your extensions should work fine. i personally use forge for flux and reforge for xl/sd 1.5
>>
>>102828527

Install Forge in a new folder or bite the bullet and learn ComfyUI.
>>
>>102828589
You dog
>>
>>102828621
>bite the bullet and learn ComfyUI.
learning ComfyUI is easy when all you do is download workflows and load them lol
>>
>>102828624
What
>>
>>102828612
>>102828621
Alright. Thanks for your time and advice.
>>102828589
I actually did that when SDXL was pretty new and quickly realized that was probably not going to work because I was being dumb.
So I just kept using SD with the weird model I was using that kept giving me really nice results.
>>
>>102828674
Ok
>>
File: ComfyUI_01937_.png (1.43 MB, 1024x1024)
>>
File: ComfyUI_01939_.png (1.86 MB, 1920x1080)
>>
File: ComfyUI_01950_.png (2.62 MB, 1920x1080)
>>
De-distill allows cfg>1, so what's your negative prompt bros?
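
For anyone wondering why the negative prompt only matters at cfg>1: a minimal sketch of the standard classifier-free guidance mix, assuming the sampler combines the two predictions the usual way. The three-element lists are made-up toy numbers, not real model outputs.

```python
def apply_cfg(cond, uncond, cfg_scale):
    """Classifier-free guidance: start at the negative-prompt prediction and
    move toward (and past) the positive-prompt prediction by cfg_scale."""
    return [u + cfg_scale * (c - u) for c, u in zip(cond, uncond)]

cond = [0.8, -0.2, 0.5]     # toy prediction for the positive prompt
uncond = [0.1, 0.4, 0.5]    # toy prediction for the negative prompt

at_one = apply_cfg(cond, uncond, 1.0)    # equals cond up to rounding
at_high = apply_cfg(cond, uncond, 3.5)   # pushed away from the negative prompt

print(at_one)
print(at_high)
```

At a scale of 1 the negative term cancels out, so whatever you type into the negative prompt changes nothing; above 1 the result is pushed away from it.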
>>
>>102826558
What are some things we can do with un-distilled that differentiate it from dev?
>>
File: main.jpg (604 KB, 2131x1196)
Zero-shot editing for flux. At first i was excited but then i saw it's from Google. This will never see the light of day.

https://rf-inversion.github.io/
>>
File: file.png (1.46 MB, 1024x1024)
>>102826558
>>102826626
It got better, i was using 20 steps, by using 40 steps i got a way better image. Had no idea Flux needed so many steps.
>>
>>102829471
>Had no idea Flux needed so many steps.
yeah, flux distilled could get away with less steps, the undistilled version is less forgiving on that
>>
>>102829468
yeah that's sad because that could solve the style issues on Flux, you just put a picture of something you like and there you go it mimics that style, no need for a lora or whatsoever
>>
>>102829465
you're quoting something that is answering the question you're asking lol
>>
>>102829503
Adjectives?
>>
>>102827873
The ball and phone being in the wrong order has me mildly frustrated. Not in particular but in general. I always get swap ups like that for various things and I have to roll the dice until I get lucky.
>>
>>102829526
Who are you quoting?
>>
File: file.jpg (498 KB, 3135x1586)
>>102829468
wtf it's still not converging at 1000 steps?
>>
File: file.png (2.33 MB, 1594x1324)
>>102829468
>This will never see the light of day.
https://rf-inversion.github.io/data/rf-inversion.pdf
we have the paper though so we can make the code by ourselves no? They're talking about a new sampler specialized for Flux (FluxSDE), that looks hella interesting
>>
>>102829643
>Additionally, we extend our framework to design a stochastic
sampler for Flux. Our inversion method allows for state-of-the-art performance
in zero-shot inversion and editing, outperforming prior works in stroke-to-image
synthesis and semantic image editing, with large-scale human evaluations confirming user preference.
K, now I'm waiting for a comfyUi sampler node of this
>>
>>102829492
Didn't PuLID solve this?
>>
>>102829733
pulid is only about using a celebrity's face to make a cool scene with it, it has nothing to do with styles
>>
>>102829577
idk

What improves? I haven't seen dedistilled understanding my prompts better.
>>
>>102829752
>I haven't seen dedistilled understanding my prompts better.
I have, for example if you write something + "pixel art style", distilled can't do it but undistilled can
>>
>>102829733
it was the thing before this anon. Seems everyone forgot because they are too busy cooming.
>>
Inspiration (in-spirit)

https://youtu.be/_hj86TnXpgE
>>
wait i just remembered i have a old rx 570 4gb lying around
think it'd be worth it using it?
btw how do you even run 2 gpus in a consumer motherboard?
>>
>>102830201
https://www.reddit.com/r/StableDiffusion/comments/130otiv/are_6x_amd_rx570_4gbs_useless_for_sd/

If your mobo supports 3 GPUs the manual will tell you.
>>
i will be sincere
i like flux
i ask for a drawing and it gives me something akin to a actual drawing on paper
shame it takes so much
>>
>>102831017
now be insincere
>>
The best part of something other than flux finally hitting this general will be that the mikufag isn't excused for spamming his retarded Miku pictures every thread and reposting slight variations of milu with dreads so frequently that he might as well be considered an SDG tier avatarfag at this point
>>
>>102831114
Why are you guys so full of hate
>>
>>102831114
Now if comfy could be chased out every time for being the utter retard he is as well...
>>
>>102831130
Don't get surprised when people call out your weird Discord ""customs"", we don't respect them because they're weird and you're unpleasant.
>>
>>102831130
having standards is what prevents sdg happening. some of you clearly needed to be bullied a lot harder in your formative years
>>
>>102831171
>some of you clearly needed to be bullied a lot harder in your formative years
ironic
>>
>>102831160
>>102831171
But i only post in /ldg/
I just dont get why you're so full of hate :/
>>
>>102831130
>OH HECKERINO WHY IS 4CHAN NOT MY UWU SAFESPACINO
rope immediately.
>>
>>102831191
I wish I believed you were trolling right now, I truly, genuinely wish I had that amount of faith left in me
>>
File: 00005-534283554.png (1.99 MB, 1128x1512)
>>
>>102829688
The stockings are a bit much
>>
>>102831114
Yeah, I try not to read here when things are slow because thread """personalities""" become really obvious. Made the mistake of checking in just now, mind you.
>>
>>102831114
this isn't your personal blog retard, kys
>>
>>102831257
he's right tho.
>>
>>102831306
>he's right tho.
*soft
>>
I'm tired of seeing miku poster too. they were cross posting with illustrious in hdg as well, it's really obvious and annoying. I shouldn't be able to
>identify
anyone on an anonymous board desu
>>
>schizo general
You deserve to be here in the containment thread, all of you.
>>
>>102831386
you can't contain my girth
>>
>>102831386
Shut the fuck up and go back
>>
>>102831100
i think flux gens are aesthetically pleasing
>>
>>102831114
what about blurfag
>>
blurfag if you're reading this.. FUCK YOU!
>>
File: 1728870488355780.png (1.22 MB, 768x1344)
what is the point of all of this
>>
I love ldg
>>
>>102831444
My brain automatically categorizes his posts as bot/troll tier spam so he might as well not exist
>>
Full finetune does seem to work now on Kohya for Flux on a 24 GB GPU. I don't know about their psychotic settings though. 1e-5 seems more appropriate for this model, and it also seems psycho not to have grad norm set.
>>
only /ldg/ avatar i support is the scott pilgrim art style pony diffusion monster girl poster. where is he? does he still bake?
>>
File: 1697814031846063.png (1.86 MB, 768x1344)
why are we here
>>
*angery tranny noises*
>>
>>102831600
yep, the snowflake whining about avatarfags is probably taking HRT
>>
very debo-esque
>>
the founding changs left this general to rot
>>
>>102831654
Yeah i think we should merge back with /sdg/
>>
>we should go back to posting with the schizo children that spam the same images every day because they're actually, unironically mentally ill
>>
>>102831692
merge conflict
>>
>>102831737
>accept theirs
>>
>>102831692
Not gonna happen, sorry bub
>>
>>102831784
butt why
>>
>>102831771
where'd my balls go?
>>
why does sdg want this threads 3 and 1/2 posters to return to it so badly
>>
>>102831818
Because that will double the amount of people in their thread.
>>
>>102831835
Kek'd
>>
>>102827946
Nice
>>
>>102829468
>This will never see the light of day.
whys that?
>>
>>102832303
maybe the j word
>>
I wrote a molmo-based captioner for flux dev lora training, as an attempt to improve on the one built into fluxgym which used Florence2:
https://huggingface.co/quarterturn/molmo-flux-captioner

There's a command-line and gradio version. Molmo does a much better job than Florence2, it's able to describe nudity and much more subtle details about a scene. I think it improves the quality of lora training.
>>
Bigma soon
>>
>>102832631
two more weeks
>>
File: 30606581.png (804 KB, 1216x832)
>>102826540
Oh cool, my post (which was not actually my gen) got in.
>>
>>102832631
Eh at least we have working full fine tuning for Flux on 24 GB
>>
>>102831517
we all miss him. he made the best collages :(
>>
I'm trying to make an "Anime to Real Life" workflow in comfyui. Is there a way to load a directory of images one after another and add each prompt to the queue? I've tried the Inspire Load Image List from Dir node as well as the one from VideoHelperSuite and they seem to run the prompts in parallel which I don't have enough RAM for.
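
One way around the list nodes is to skip them and queue one prompt per image from outside. A rough sketch, assuming a local ComfyUI server on the default address, a workflow exported in API format, and images already copied into ComfyUI's input folder. The node id `"10"` and the helper names are hypothetical; check your own export for the real LoadImage node id.

```python
import copy
import json
import urllib.request
from pathlib import Path

COMFY_URL = "http://127.0.0.1:8188/prompt"   # assumed default local server address
LOAD_IMAGE_NODE_ID = "10"                    # hypothetical id of the LoadImage node

def list_images(directory, suffixes=(".png", ".jpg", ".jpeg", ".webp")):
    """Sorted image file names found directly inside `directory`."""
    return sorted(p.name for p in Path(directory).iterdir()
                  if p.suffix.lower() in suffixes)

def build_payloads(workflow, image_names, node_id=LOAD_IMAGE_NODE_ID):
    """One request body per image, each with the LoadImage input swapped."""
    payloads = []
    for name in image_names:
        wf = copy.deepcopy(workflow)          # never mutate the template
        wf[node_id]["inputs"]["image"] = name
        payloads.append({"prompt": wf})
    return payloads

def queue_all(payloads, url=COMFY_URL):
    """POST each payload to the server's queue, one request per image."""
    for body in payloads:
        req = urllib.request.Request(
            url,
            data=json.dumps(body).encode("utf-8"),
            headers={"Content-Type": "application/json"},
        )
        urllib.request.urlopen(req).read()

# Toy stand-in for a real exported workflow (a real one has many more nodes).
workflow = {"10": {"class_type": "LoadImage", "inputs": {"image": "placeholder.png"}}}
payloads = build_payloads(workflow, ["a.png", "b.png"])
# queue_all(payloads)  # uncomment with a ComfyUI server running
```

Since every image becomes its own queue entry, only one image's worth of work should be in memory at a time, as far as I know the queue runs entries one after another. `list_images("path/to/ComfyUI/input")` can supply the names.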
>>
File: 1693153131654296.png (390 KB, 660x796)
Does 3080 Ti 12gb and 32ram fit flux Q8 or do I need to cuck myself to Q6K and lower.
>>
>>102832408
is there no .gguf versions of vision models?
>>
File: file.png (611 KB, 512x512)
is it just me or flux has absolutely no fucking knowledge of anime?
this is what i get out from "saint seya, from the knights of the zodiac anime."
>>
>>102833130
Not trained on anime
>>
>>102833130
It was gimped during training, so out of all anime characters it only knows Donald Trump and Hatsune Miku, you can also trick it into generating Goku from Dragon Ball if you describe him without naming.
>>
>>102833095
There probably is. This was just a "get it working" version. I'll add the ability to use a quantized model eventually.
>>
>>102833076
It fits but expect around a minute gen time
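
Back-of-envelope numbers for why Q8 on 12 GB is tight. A minimal sketch, assuming roughly 12B parameters for Flux and the nominal bits-per-weight of the llama.cpp-style quant formats. Real GGUF files keep some tensors at higher precision, and the text encoder, VAE and activations need memory on top, so treat this as a lower bound.

```python
# Nominal bits per weight, including the per-block scale overhead.
BITS_PER_WEIGHT = {"F16": 16.0, "Q8_0": 8.5, "Q6_K": 6.5625, "Q4_0": 4.5}

def weights_gib(params_billion, quant):
    """Approximate size of the quantized weights alone, in GiB."""
    total_bits = params_billion * 1e9 * BITS_PER_WEIGHT[quant]
    return total_bits / 8 / 2**30

PARAMS_B = 12  # assumed parameter count for Flux dev, in billions

for quant in BITS_PER_WEIGHT:
    print(f"{quant:5s} ~{weights_gib(PARAMS_B, quant):.1f} GiB")
```

By this estimate Q8_0 weights alone sit just under 12 GiB, so on a 12 GB card something has to spill to system RAM, which would explain the slow gens. Q6_K leaves a few GiB of headroom.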
>>
>>102833130
>>102833168
>>102833170
Where is the flux anime tune? Tell the richfags to get on it.
>>
>>102833273
Wait for 5090s
>>
>>102833273
Wait for
>>
>>102833222
version that can be used with 3060 12gb might get quite a few downloads
>>
>>102833307
You could also wait 20 seconds per step on a 4090.
>>
File: file.jpg (332 KB, 1024x1024)
>>102833130
doesn't look that bad
>>
>>102833362
it literally doesn't know what saint seya is. I did a 100 count batch and not ONE got even close to it
all i got was females and females with kemomimi
>>
>all those params
>cant into anime
The absolute state of FLUX.
>>
>>102833505
>hire an office filled with accountants
>none can cook me a lobster thermidor
The state of accounting
>>
>>102833455
what was your prompt
>>
So this seems to be moreso the place to discuss models and technique, where the other thread is moreso where gens are posted, right?
>>
>>102833603
two different generals for two different groups of autism
>>
File: l.png (329 KB, 600x600)
>>102833603
Not really it officially split because Stability AI turned full retard so Anons didn't want to call the thread "Stable" diffusion general anymore.

Unofficial reason is there was some drama and Anons got triggered by other anons posting the same gens and avatarfagging.
>>
>>102833644
It's the same reason why there's an adult table and a kids table.
>>
>>102833656
I sat at the adult table as a kid and it has fucked me up
>>
>>102833603
Look at the collage full of images
>>
>>102832716
>(which was not actually my gen)
where'd you find it?
>>
File: glurrh.png (2.27 MB, 1024x1024)
>>
>>102833822
/sdg/
>>
no new models makes anon antsy as evidenced by this bread
>>
>>102833743
Yeah and not even the better ones
>>
bigma when
>>
File: image-37.jpg (194 KB, 1024x1024)
>>102834029
Can't get it to repeat
>>
>>102834190
YNNART, YNNART!
>>
So I used to mostly believe the people on the /h/ board were generally trolling but I'm starting to think they actually are in many cases people who for some reason really cannot think in a context outside of "current meta checkpoint vs xyz incumbent meta checkpoint" or consider that anyone else might not approach things from that context by default.

Maybe I'll just stick to the /b/ board, the content was always way better there anyways.
>>
>>102833905
Sick
>>
File: sana.png (84 KB, 1727x626)
New Pixart paper?
https://nvlabs.github.io/Sana/
>>
File: pixart news.png (79 KB, 1068x483)
>>102834438
BIGMA PAPER THIS IS NOT A DRILL
>>
wtf we sana now
>>
>>102834438
some of the samples look decent, others look completely terrible. 0.6b so not exactly a 'bigma', about the same size as sigma. i will try it out later and compare. smaller models are cute but i still think a solid 3-4b model without any t5 shit would be ideal.
>>
File: Flux_00898_.png (925 KB, 1024x1024)
>>102833822
CivitAI, there was a LoRA for 2000's alt girls.
>>
File: sana.png (191 KB, 1363x743)
>>102834501
after reading more, there's a 0.6b and 1.6b version as well. if this comparison holds true, it would be a solid base for playing around with. but as we see so often, these benchmarks tend to stretch the truth. hope the release doesnt take forever
>>
>>102834501
seems like there's 2 models, a 0.6b and a 1.6b. i didn't know models this small could generate text, wtf was flux doing with 12b then
>>
>>102834599
Have faith in the chinamen
>>
>>102834438
>>102834599
VRAMLETS WE ARE SO BACK
>>
>>102834619
https://www.youtube.com/watch?v=osoT94KZE3E
BASADOS
>>
dissenters will be drawn and quartered
the day of reckoning is upon us
rejoice
>>
>>102834501
0.6b = actually trainable
And its quality is good enough that you can make your porn pony model on the cheap
>>
imagine the humiliation stability ai must be feeling right now, quite excited for the new sana model. hope it releases soon.
>>
File: ComfyUI_01974_.png (1.83 MB, 1024x1024)
>>
>>102834438
>As a result, Sana-0.6B is very competitive with modern giant diffusion model (e.g. Flux-12B), being 20 times smaller and 100+ times faster in measured throughput. Moreover, Sana-0.6B can be deployed on a 16GB laptop GPU, taking less than 1 second to generate a 1024 × 1024 resolution image.
Goddamn
>>
>>102834686
what the fuck, how is it possible
>>
wtf i love ldg now
>>
>>102834693
In the same way that Florence2 is extremely small yet is really good. Poor architecture causes the weights to be underutilized and the model to struggle finding the optimal solution.
>>
>>102834672
Also ding dong the T5 is dead.
I knew my Pixart boys would deliver.
- lightning sampler to begin with
- better text encoder
- better vae
- tiny model that runs on a laptop GPU (that's without the 8-bit / quantization copes)
>>
ldg returning to its roots, the second coming of pixart is here. at our lowest low, chang came back!
>>
>>102834438
are any of these woman names? do you think they're cute?
>>
>>102834769
if they aren't you can always i2i, soon with blazing fast speeds
>>
>>102826540
somebody explain that equation to me, I was top tier in maths smoking weed and barely studying, now I can't even make a +2 digits divisions on paper
>>
>>102834749
Also it's not "4K" as we would say 4K, it's 4096x4096 images in 9 seconds.
>>
official sana weights waiting room
>>
>>102834783
see >>102815183
>>
/G/entoomen, we are seeing a major rebound
In praise of products for the high brand of diffusion
The, if you will, super-sana era

Chang light the way to our next endeavor
Lo, the fallen models: two have become one
Yet the image, it is singing

A new spirit of diffusion has arrived

In praise of speed
I say unto you, my weary anon:
The super-sana era is here!
>>
>>102834749
>Also ding dong the T5 is dead.
I hope I never have to use language models to unnecessarily expand prompts again. "Natural lighting" vs "Sun queefs its rays across the whole fucking image"
>>
wake me up when they drop the demo
>>
watch this, i am going to make them release the sana weights in 5 minutes.
>>
>>102834831
sounds like they just need basic 3d fluid simulation, shit solved decades ago
>>
File: file.png (311 KB, 622x370)
>>102834811
Also we're looking at huge speedups in training since they're using linear transformers now, which are O(n) rather than O(n^2). There's also no positional encoding, which means the training process is likely even more ratio independent. This also means that the 0.6B of this model is going to be more efficient compared to the 0.6B of Sigma (less network wasted on positioning). The text encoder is Gemma-2, which apparently has better contextual language and internal reasoning / chain of thought compared to T5. They're also using a 32x Latent Representation rather than 8x, the VAE appears to be 32 channels (rather than 16).
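
To put numbers on the 8x vs 32x and O(n) vs O(n^2) points: a minimal sketch of how many tokens a 1024px image turns into, assuming one token per latent position (patch size 1). The `patch` parameter is my assumption, real models may patchify differently.

```python
def token_count(image_px, compression, patch=1):
    """Number of latent tokens for a square image at a given spatial compression."""
    side = image_px // (compression * patch)
    return side * side

for compression in (8, 32):
    n = token_count(1024, compression)
    # Full attention compares every token with every other: n * n pairs.
    # Linear attention scales with n instead.
    print(f"{compression}x: {n} tokens, {n * n} pairs for full attention")
```

Going from 8x to 32x compression cuts the token count by 16x, and linear attention then scales with that count directly instead of with its square.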
>>
File: dnd_highres_00596_.png (3.94 MB, 2400x1680)
>>
>>102834926
>They're also using a 32x Latent Representation rather than 8x, the VAE appears to be 32 channels (rather than 16).
we're eating so good
>>
proud to be a pixartsexual
>>
>>102834926
32 channel VAE is interesting, but ultimately the dataset is what will define the model (that is, if people will even bother using the base model anyways before finetuning it)
>>
Cog and that other shite img2vid model from last week are still shit right?
The ones where the comfyui install broke itself and cog couldnt write a fucking readme to save their lives.
Anything changed or anyone has any comments on the (absolute) state of img2vid?
>>
>>102834987
It's so lightweight you'd finetune it with whatever within a couple of weeks. It's not going to be like Flux, you're probably going to be doing batch 8 on a 4090 at 1024px with 2 seconds per step with this model.
>>
File: 4K-4.jpg (3.97 MB, 4096x4096)
OMG PIXART BROS WE ARE SOOOOO LE BACK!!!!!!!!!
>>
>>102835003
there was another significantly better model that released but i cannot remember what it was called. pyramid or something.
>>
>>102835015
unbridled sovl
>>
File: Screenshot.png (288 KB, 645x570)
>>102834854
> never have to use language models to unnecessarily expand prompts again
idk anon, feels like they just baked it in for you this time
>>
File: 38.png (1.08 MB, 1024x1024)
>>102835015
>>
>>102834438
hope this isnt another cascade
>>
>>102831246
>>
>>102834926
Gemma-2 has been very good for local bot. Easy to guide creative style.
>>
>>102835018
Thanks, Pyramid was the other shite model I was referring to, I'd forgotten its name as it was kinda DOA.
>>
i hope that sana will be our mistral nemo moment
>>
>>102835018
oh, to add, it was DOA to me as they were still training the model (to not be shit) but released the shit version anyway, because social credit points for productivity i guess???
>>
>>102835070
Pixart Sigma was very good, this model appears to fix everything. The architecture truly is what is SOTA and anyone wanting to make a model should probably use it.
>>
is there any anime finetune that isnt danbooru but rather more descriptive?
i find it hard to make these models give me exactly what i want when i am beholden to tags
>>
>>102835024
Wait, it contains system prompt? Can it be changed?
>>
>>102835090
>released the shit version anyway
probably as a way to get funding for the bigger model
>>
>>102835030
in what way?
>>
>>102835224
i think he meant that he hopes it doesn't get ignored like cascade was
>>
>>102835242
Cascade was ignored because SAI said SD3 was the best thing ever and was coming out in two weeks. If Flux didn't come out we'd probably be using Cascade right now but honestly the architecture of Cascade wasn't that great.
>>
>>102835224
that the giga compression doesnt make it impossible to train into something actually usable
also the relatively low param count sucks
>>
>>102835297
Anon, it's a VAE, if it didn't work it wouldn't produce any high quality images.

>also the relatively low param count sucks
>wah why isn't anyone making a porn finetune for Flux
>>
File: peer.png (2.15 MB, 1024x1024)
>Put in a bunch of AI images of various quality, mostly very high quality, into AI image detector
>Detects them with 100% accuracy
>Even put in abstract ones with no possible fingers to count wrong or obvious shit to copy
>Can always determine the human abstract from the AI abstract
>Put in an image I inpainted a bit
>Human rating was higher than the original
>Put in a picture I pored over and inpainted meticulously to make it high quality
>95% human rating
What is the detector detecting, exactly? Is there some uniquely human way of doing things that just comes through, even in the way we edit images? I didn't use an image editing tool on the 95% human one, just a ton of inpainting. It was still all AI, but it still recognized that it was heavily human-touched. Pic unrelated, since the 95% one is NSFW.
>>
>>102835336
AI images have detectable sub-pixel patterns, the detection isn't as complex as you think and is easily confused.
>>
>>102835336
cool gen
>>
>>102834438
>uses Gemma-2 2b as the text encoder
Why not just go for the 9b? You only process the prompt once per image. It's not the bottleneck, even with a relatively large TE. For actual LLM use cases, the 2b tier is just retarded. The model is simply too small. Meanwhile gemma 2 9b is quite good, and known for even punching above its weight. I feel like they left a lot of performance on the table for no real reason.
>>
>>102835350
Ahh, I see! Figured it was something like that, it just seemed odd that the patterns would be so broad, you'd think that such small patterns would be easily detectable even within the small areas of inpainting. Is the fact that those patterns exist something that can be solved, or is it inherent to the architecture?
>>
>>102835401
You're not talking to it, you just want it to know "dog" and ":dogemoji:" are the same thing. It doesn't need to write a coherent five paragraph story.
>>
>>102835317
you think the eye/clothing texture artifacts are ALL just from the unet randomly deciding to make them? go take a look at some cascade images and just how much it loves to take a dump on the details
and clearly if less than 2b params was enough we would all be using pixart sigma or other memes rather than everyone moving on to either sdxl or flux
>>
File: 407660.jpg (552 KB, 1428x1400)
>>102835388
ayy, thanks! it's really fun mashing styles together and totally forsaking quality tags, it makes some super neat stuff. Girls go apeshit when you mix them with Charlotte from Madoka and album covers (picrel), turns out.
>>
>>102835437
I think you 12B fags will be forever waiting for a model lmao
The 1.6B model is going to be the porn model everyone uses, guaranteed.
>>
>>102835350
Are the patterns human-detectable? I know images that were img2img'd were visible to the human eye if you just looked a little closely, they had this really subtle, odd texture to them that couldn't really be removed.
>>
>>102835482
Probably not human detectable and who really cares. The pattern is going to be from the VAE decoding images from latent space.
>>
>anon brings up Cascade again
>>
>>102835032
Stockings with sneakers is a travesty
but I like the style
>>
>>102835420
I mean maybe, but I'd still like to see how much improvement you get using an even larger text encoder. Like look at prompt comprehension of pixart sigma vs SD1.5. The diffusion model is of comparable size, the gains are purely coming from pixart using T5-XXL. The Sana paper even has chart showing that gemma-2 2b is slightly worse than T5-XXL (they claim it is "comparable"). So there's no real improvement on the TE front. Imagine if the jump from T5 to a larger modern LLM is as big as the jump from CLIP to T5. But we'll never know.
>>
Will Sana's compression cause it to shit out on details like Cascade? Cascade sucked ass at fine details, eyes always blurry etc
>>
>>102835676
It compensates by being 32 channels.
>>
File: file.png (1.85 MB, 704x1312)
>>102835691
Also Cascade was undertrained just like Pixart Sigma, so we don't know what its final form could've been.
>>
If Flux was smart they'd release their Pro 1.1 weights.
>>
>>102835691
no it doesnt, all that matters is the latent size, sdxl is 4x128x128 for a 3x1024x1024 input image
this compresses the latents to 32x32x32 which is still half the size of sdxl
maybe they have found some super efficiency gains to get them through this and it will look great, then thats awesome, but im skeptical
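
The arithmetic behind "half the size", using the shapes given above for a 1024x1024 RGB input. Just a sanity check of the numbers in this post, nothing model-specific.

```python
def numel(shape):
    """Total number of values in a tensor of the given shape."""
    n = 1
    for dim in shape:
        n *= dim
    return n

pixels = numel((3, 1024, 1024))       # RGB input image
sdxl_latent = numel((4, 128, 128))    # 8x spatial compression, 4 channels
sana_latent = numel((32, 32, 32))     # 32x spatial compression, 32 channels

print("sdxl latent values:", sdxl_latent, "compression:", pixels // sdxl_latent)
print("sana latent values:", sana_latent, "compression:", pixels // sana_latent)
print("sana / sdxl:", sana_latent / sdxl_latent)
```

So the extra channels do not fully make up for the heavier spatial compression in raw value count. Whether that shows up as lost detail depends on how well the autoencoder uses those values.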
>>
>>102834934
Nice action
>>
>>102835770
We're talking about a 2D image being represented as a 32x32x32 cube. What you understand about compression doesn't apply to a (V)AE.
>>
sana-samas.... the wait is killing me... is it here yet?
>>
>>102835805
yeah right, we will see in practice, because the examples they gave sure dont look too hot
>>
sanasexual pride month
>>
>>102835858
The Sigma examples weren't great either but the model itself was capable of good images. I don't believe it's going to be the AE holding back Sana. And maybe its going to end up equivalent to SDXL for overall fidelity of fine details, but that might be worth being four or eight times faster at inference and training.
>>
>>102835100
most animu models will give a little to regular prompts as opposed to booru tags but the nature of animu necessitates using said tags considering booru is really the only database for it
perhaps merging an animu with non-animu checkpoint will get you closer to what you want
>>
>>102833505
>>cant into anime
>The absolute state of FLUX.
But there's tons of anime LoRAs for it.
I did these myself:
https://huggingface.co/quarterturn/cute-yuki-mix-adorable-lora-v2
https://huggingface.co/quarterturn/bing-anime-style-flux-lora
If anything filters people from Flux, it's probably that it started with ComfyUI, which isn't as easy as automatic1111.
>>
File: ass.jpg (596 KB, 3375x1108)
>>102827873
Alright someone spoonfeed a babby because I'm genuinely too retarded.
Why does it refuse to listen to the prompt? Did I download some retarded meme version of flux? Because I'm getting nf4 tier slop results when I downloaded the Q8.
12gb 3080 Ti and 32gb system ram. Does the model go retarded because it can't load it all with vram? Am I missing some bullshit text encoder?
>>
>>102836084
>https://huggingface.co/quarterturn/cute-yuki-mix-adorable-lora-v2
>Chubby miku guy
Please tell me you put some cute chubbers in there. I'd kill for a high quality chubby anime flux LORA.
>>
>>102836123
Well, if I had to guess, it's probably that a lot of anime data was scrubbed, so even if it can do Miku, it doesn't "know" how to do a lot of shit with anime-style figures in different situations, since it has basically zero anime in the training data.
>>
genned 1500 flux 1girls overnight, here's the best failed gen of the batch
>>
>>102836159
but >>102827873
is doing it right there. Thats flux dev Q8 right? So why is mine being retarded.
>>
>>102836123
what's fluxfusion?
>>
>>102836178
https://civitai.com/models/630820?modelVersionId=936976
>>
>>102836183
holy shit it's so slop
for a second I thought you linked a SDXL model
what went wrong?
>>
>>102836161
Love it
>>
>>102836161
brap
>>
>>102836183
maybe that's the reason it's not listening to your prompts, try base flux. flux finetunes and merges are still just memes
>>
>>102836196
>>102836206
Yeah that was my guess I'll try the direct Q8 conversion of base flux since no chance of running the full model.
>>
>>102836123
The dedistilled dev model has better prompt adherence.
>>
I'm going to need the best team of prompt scientists working for 3 days exclusively on Stephanie from Lazy Town
>>
>>102836220
>The dedistilled dev model has better prompt adherence.

That can't run on Forge yet though or can it? I remember some Anon yesterday couldn't run it.
>>
>>102836084
Oh hi miguanon.
There's no use arguing with that one. It's an eldeegy shitposter just like the antilocal/antiopensource shitter on elemgy.
>>
File: 0.jpg (300 KB, 1344x768)
>>
/ldg/ morale status?
>>
>>102836388
we have no morals
>>
>>102834287
They have a cargo cult mindset
>>
File: file.png (1.26 MB, 1024x1024)
>>102836388
>>
>>102836388
rock hard
>>
>>102835691
I can see it, the details are very detailed (the 32ch VAE) but the general structure in which the details happen is extremely messy (the super-compression)
>>
>>102836398
illiterate
>>
Anyone else having problems with loras working on forge? It's like forge is sometimes 'forgetting' to apply them, they show up in the lora tab and everything, but only some seem to be working. All those loras previously worked on A1111.
>>
>>102837003
did you try updating?
>>
Is Comfy really the best way to run flux?
>>
>>102837145
yeah
>>
File: 00037-2962381111.png (2.09 MB, 1024x1536)
>>
I'm creating a lora. Should I make it fp16 or bf16? It sounds like I should only choose fp16 so that people with 20 series cards can use it. Is that a large audience? Should I prioritize compatibility?
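For the numeric side of this question only (not the hardware compatibility side), here is a rough sketch of what the two formats trade off. It assumes the standard bit layouts: fp16 has 5 exponent bits and 10 mantissa bits, bf16 has 8 exponent bits and 7 mantissa bits. The helper name is made up for illustration.

```python
def float_format_stats(exp_bits, man_bits):
    """Return (largest finite value, machine epsilon) for an IEEE-style
    binary float with the given exponent and mantissa widths."""
    bias = 2 ** (exp_bits - 1) - 1
    max_finite = (2 - 2 ** -man_bits) * 2.0 ** bias
    epsilon = 2.0 ** -man_bits  # gap between 1.0 and the next representable value
    return max_finite, epsilon

fp16_max, fp16_eps = float_format_stats(exp_bits=5, man_bits=10)
bf16_max, bf16_eps = float_format_stats(exp_bits=8, man_bits=7)

print("fp16 max:", fp16_max, "eps:", fp16_eps)
print("bf16 max:", bf16_max, "eps:", bf16_eps)
```

fp16 keeps more precision but has a much smaller range, while bf16 has roughly fp32 range with coarser steps. This says nothing about which cards run which format quickly.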
>>
>>102833644
>Stability AI turned full retard so Anons didn't want to call the thread "Stable" diffusion general anymore.
I haven't genned anything since august 2023, and haven't looked back into the space until today. What happened?
>>
>>102837463
The usual muh safety and ethics bullshit so SD3 got lobotomized so badly it forgot human anatomy.
>>
File: 0.jpg (296 KB, 1344x768)
>>
>>102837463
What brought you back?
>>
I hope the Sana model will be decent, it will be so nice to go back to a faster model again.
>>
>>102837463
If you remember SD 2.0, SAI did it again but worse with SD3 except this time it couldn't even make pictures of women lying in grass. Then what happened was multiple things: the company has been fragmenting apart since the beginning of the year, a group of those researchers made Flux (the new hot model), SD3 came out, was absolute shit, didn't even come out with proper training tools so no one even tries finetuning it.

Of course outside of SAI you have multiple other models (particularly from China) which have been the real innovation in the space including now Sana that was announced today which is likely going to be a SOTA model especially for small, efficient models which will come out soon. These alternative models caused the general split because SDG is full of a bunch of literal children (and their groomers) who are incapable of change and just want to spam 1girls.
>>
File: 00007-98318545.png (2.12 MB, 1024x1536)
>>
>Gemma2 license
>SANA DOA
>>
>>102837516
the fat girl in the OPs photo not gonna lie. I'm fiercely horny today
>>
>>102837592
>>102837498
thank you for the QRD. I wish I was here for that, it must've been a shitstorm.
I really can't tell what possesses these companies into moralfagging their way into failure.
>>
>>102837647
>don't commit crimes with it otherwise we don't give a fuck
Okay, really the hill you're dying on huh?
>>
will sana finally put an end to the 1girl menace plaguing /ldg/?
>>
>>102837739
maybe not if we can do 1girl 50x faster than flux, 50x more waifus
>>
>>102837739
Nothing will end the 1girl menace until low effort AI content is banned.
>>
Nothing will end the 1girl menace until high effort models are made
>>
so until then post more 1girls
>>
>when ldg died
>>
ldg stay kil until sana comes out and saves us all. if sana bad /ldg/ die for ever, sorry bud.
>>
I might need to ask for help making 1girl slop. I made a lora that I want to post on civitai and I figure having 1girl as the main image will be most helpful since that's what people use SD for most of the time. I don't usually make generic anime girls so this is new to me.
>>
>>102838346
If it's anything like Sigma it's going to rock and if it's more efficient compared to Sigma like they claim then it's going to be the king mini model. The only question is how the Gemma text encoder handles new concepts and particularly nsfw, hopefully the prompt enhancer doesn't do bullshit like "sorry I cannot do this". But since it's 2B, I guess you can fix that with a finetune too.
>>
>>102838401
post some pics
>>
File: ComfyUI_01992_.png (1.06 MB, 1024x1024)
>>
>>102838586
if thats flux i wanna know what loras you used
>>
Too good to be true
>>
Too true to be good
>>
>>102836379
could you share prompt pls?
>>
File: ComfyUI_01991_.png (1.06 MB, 1024x1024)
>>102838605
yes its Flux and the LoRa I used is my Aika LoRa that was banned from Civitai.
>>
>>102838653
banned why?
>>
>>102838653
isnt that a character lora? shes nice but im mostly impressed by the artstyle
wondering if you would share the prompt/methods
>>
File: ComfyUI_01859_.png (1016 KB, 1024x1024)
>>102838666
>Hi there! I'm writing because we've been contacted by T-Powers, an entertainment agency, requesting the removal of model (https://civitai.com/models/694163/jav-aika) and a number of other models featuring people whom they represent. I'm removing that content per our policy allowing real people (or their representatives) to request the removal of their likeness, and I appreciate your understanding! Thanks.
>>
LDG never die
>>
File: ComfyUI_00940_.png (1.33 MB, 1024x1024)
>>102838675
>isnt that a character lora?
yes
>wondering if you would share the prompt/methods
its nothing fancy actually, here see for yourself
https://files.catbox.moe/t6je96.png
>>
>>102831547
Just to suffer?
>>
File: catbox_k79428.png (1002 KB, 832x1216)
>>102838505
The lack of arms is intentional, that's the purpose of the lora.

I'm using autismmix ponyxl and I wish it didn't look so halfway between anime and 3d. I've been having a lot of issues with the faces looking way too young as well, this one here is the best I could get after several rerolls.

I ended up uploading the LoRA with this image here, but I think the cover image could be so much better: https://civitai.com/models/858871?modelVersionId=960964
>>
>>102837748
god i hope
>>
>>102838606
>>102838622
Is it?
>>
>>102838710
interesting, thanks
>>
>>102838733
that's cool, pic looks fine to me but i've never used pony before
>>
Fresh

>>102838795
>>102838795
>>102838795
>>
>>102837708
Yes
>>
>>102837656
based and fatbitchpilled
>>
File: 2024-10-15_00003_.png (1.83 MB, 1024x1024)
>>102838686
Wow, crazy. Glad I downloaded a bunch of loras lol
>>
File: ComfyUI_34287_.png (1.35 MB, 848x1024)
>>
File: ComfyUI_34330_.png (1.49 MB, 848x1024)
>>
File: ComfyUI_34336_.png (1.43 MB, 848x1024)
>>
File: file.png (6 KB, 781x65)
the fuck does this even mean??
>>
>>102839314
>>102839358
care to share the catbox?
>>
File: ComfyUI_34334_.png (1.3 MB, 848x1024)
>>102839484
https://files.catbox.moe/4f5axf.png
https://files.catbox.moe/vycmq5.png
With loras from here https://mega.nz/folder/mtknTSxB#cGzjJnEqhEXfb_ddb6yxNQ
>>
>>102839539
ebin
thank you anon, looks sick as fuck




All trademarks and copyrights on this page are owned by their respective parties. Images uploaded are the responsibility of the Poster. Comments are owned by the Poster.