/g/ - Technology






File: the longest dick general.jpg (2.8 MB, 2137x3264)
Discussion of free and open source text-to-image models

Previous /ldg/ bred : >>102940941

2 4 8 16 32 64 128 Edition

>Beginner UI
Fooocus: https://github.com/lllyasviel/fooocus
EasyDiffusion: https://easydiffusion.github.io
Metastable: https://metastable.studio

>Advanced UI
Forge: https://github.com/lllyasviel/stable-diffusion-webui-forge
reForge: https://github.com/Panchovix/stable-diffusion-webui-reForge
Automatic1111: https://github.com/automatic1111/stable-diffusion-webui
ComfyUI: https://github.com/comfyanonymous/ComfyUI
InvokeAI: https://github.com/invoke-ai/InvokeAI
SD.Next: https://github.com/vladmandic/automatic
SwarmUI: https://github.com/mcmonkeyprojects/SwarmUI

>Use a VAE if your images look washed out
https://rentry.org/sdvae

>Model Ranking
https://imgsys.org/rankings

>Models, LoRAs & training
https://aitracker.art
https://huggingface.co
https://civitai.com
https://tensor.art/models
https://liblib.art
https://github.com/Nerogar/OneTrainer
https://github.com/kohya-ss/sd-scripts/tree/sd3

>SD3.5
https://huggingface.co/stabilityai/stable-diffusion-3.5-large
https://replicate.com/stability-ai/stable-diffusion-3.5-large

>Sana
https://github.com/NVlabs/Sana
https://8876bd28ee2da4b909.gradio.live

>Flux
https://huggingface.co/spaces/black-forest-labs/FLUX.1-schnell
https://comfyanonymous.github.io/ComfyUI_examples/flux
DeDistilled Quants: https://huggingface.co/TheYuriLover/flux-dev-de-distill-GGUF/tree/main

>Index of guides and other tools
https://rentry.org/sdg-link
https://rentry.org/rentrysd

>Try online without registration
txt2img: https://www.mage.space
img2img: https://huggingface.co/spaces/huggingface/diffuse-the-rest

>Maintain thread quality
https://rentry.org/debo

>Related boards
>>>/aco/sdg
>>>/aco/aivg
>>>/b/degen
>>>/c/kdg
>>>/d/ddg
>>>/e/edg
>>>/h/hdg
>>>/tg/slop
>>>/trash/sdg
>>>/u/udg
>>>/vt/vtai
>>
File: 00002-1975399016.png (3.33 MB, 1280x1920)
>>102949088
Oh yes

More
>>
>>102949167
You can't do that and make a good model ever
>>
>>102949205
>You can't do that and make a good model ever
this, at this point I'm just waiting for China to deliver the goods, the west has lost the AI battle
>>
billions must gen
>>
>>102949193
I've got a very old gen from you
>>
>>102949211
China gives no shits about IP, but they have their own problems around nsfw, so I'm not sure they'd do nice things either.
Though at least the base quality would probably be leagues better than never using anything copyrighted, only "allowed" "safe" content.
The consent obsession in the west is turning everything into shit.
>>
>>102949238
>China has a no shit given about IPs, but they have their own problem around nsfw,
desu they aren't much more cucked on nsfw than the west, for example Moshi can do female nudes just fine
>>
File: 00011-2688074747.png (3.38 MB, 1280x1920)
>3 of my gens made it into the OP
Excellent

>>102949229
Great stuff, flux?
>>
>>102949245
>Moshi can do female nudes just fine
kino piercel
https://files.catbox.moe/t6276z.mp4
>>
>>102949245
Oh the difference is just that individuals in the chinese companies have no problems with nsfw used in their training, while the law is very anti nsfw in general, so they always tread lightly, at least in public.

In the west it's the opposite: nothing nsfw is illegal, so the individuals censor out of conviction. It's kind of sad really.
>>
File: chwnol.png (2.35 MB, 1152x2052)
>>
File: 1679020662560538.png (3 MB, 1152x2052)
>>102949252
SDXL
>>
>>102949265
>In the west it's the opposite: nothing nsfw is illegal, so the individuals censor out of conviction. It's kind of sad really.
amen anon, fucking amen... I'm just glad China exists at this point because if we had to only rely on the west to get good shit I would end up depressed kek
https://www.youtube.com/watch?v=XZcN6lIVmSo

To be fair, SD3.5 got a little better in that department, it can do female nudes now, I guess removing Emad the prude from the team was a good idea after all
>>
>>102949252
since you're still using webui/forge, you should play with this extension https://github.com/muerrilla/sd-webui-detail-daemon
I remember using it but it never got ported to comfy, thanks to the migu poster for posting about it in the last thread
>>
>>102949318
>I remember using it but it never got ported to comfy, thanks to the migu poster for posting about it in the last thread
any equivalent for comfy?
>>
>>102949295
>Emad the prude
my view is that most of them are prudes, they all write the same "safety" shit (which always means no nsfw)
>>
>>102949346
https://www.reddit.com/r/comfyui/comments/1g9wfbq/simple_way_to_increase_detail_in_flux_and_remove/
>>
File: 00000-2710090048.png (2.84 MB, 1280x1920)
>>102949284
What model, it looks good

>>102949318
I will look into it, thanks mate
>>
>>102949371
thanks man
>>
>>102949318
that dev is so good, this is another extension that doesn't have a comfyui port
https://github.com/muerrilla/stable-diffusion-NPW
>>
File: 00178-3093194836.jpg (439 KB, 1248x1824)
>>
File: file.jpg (3.88 MB, 5884x3188)
>>102949371
Idk about that method, I can't find a good value between 0.95 and 1, it's not consistent, what value consistently looks the best to you anon?
>>
>>102949477
they all look great anon, 0.95 seems to be the perfect value
>>
File: 00023-1651728872.png (1.77 MB, 1024x1536)
>>
File: cryingpepe.png (1.85 MB, 1120x1120)
that one pic in the collage with the miku tranny is savage
>>
>>102949522
>they all look great anon
look at the prompts, it's really hit or miss, sometimes they miss the text that's supposed to be displayed, sometimes it doesn't understand it as well, I mean for you that's all right, you go for simplistic 1girl images, but if you go past that I'm not sure it's a good deal
>>
>>102949536
He will never be a Migu, many such cases :(
>>
File: 00024-558990164.png (1.88 MB, 1024x1536)
>>102949318
This detail daemon extension seems to work wonders, thanks mate
>>
File: file.png (599 KB, 960x467)
kek
>>
>>102949546
well, you could try re-formatting your prompt, have you tried using an LLM bot to rewrite your prompts? that can help with formatting and grammar
>>
Can you multigpu on the video models that just got released?
I have 1 3090 and 1 3080...
>>
OK but why would you do any of this
>>
File: file.png (394 KB, 2453x1347)
>>102949601
because it's on ComfyUI, you can put the text encoder on the 2nd gpu, but that's all, I don't think you can do inference parallelism
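if you're curious what that looks like outside of ComfyUI, here's a minimal sketch in plain torch/transformers (repo id and prompt are just placeholder assumptions): run the text encoder on cuda:1 and ship the embeddings to cuda:0 where the diffusion model lives

import torch
from transformers import T5EncoderModel, T5Tokenizer

# big text encoder on the second gpu, so gpu 0 keeps its vram for the diffusion model
tok = T5Tokenizer.from_pretrained("google/t5-v1_1-xxl")
enc = T5EncoderModel.from_pretrained("google/t5-v1_1-xxl", torch_dtype=torch.float16).to("cuda:1")

ids = tok("a woman dancing in a hotel room", return_tensors="pt").input_ids.to("cuda:1")
with torch.no_grad():
    emb = enc(ids).last_hidden_state.to("cuda:0")  # hand the embeddings over to gpu 0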
>>
>>102949575
don't thank me, thank migu, I totally forgot about that extension until he posted about it
>>
>>102949613
thanks, is that faster doing that or is there no point?
I've also seen that :

https://github.com/victorchall/genmoai-smol
But it's monogpu.
>>
File: file.png (27 KB, 2823x91)
>>102949622
>thanks, is that faster doing that or is there no point?
I guess that's faster the moment you want to change the prompt, it won't unload and reload the text encoder, but desu it's not that much of a deal, making a video is fucking long so you won't care if you saved 10 sec on the text encoder kek
>>
File: j2zouwj2jlwd1.webm (173 KB, 600x600)
>>102949477
here's another example
>>
>>102949585
>50s horror (makeup)
>80s horror (soul)
>modern horror (cgi)
>>
>>102949642
I see, sad
>>
File: merged_image.jpg (3.74 MB, 5632x2048)
>>102948557
I did some blind comparisons with flux lite. I've got four sets of images of the same 10 seeds of this knight guy.

Lite plus artstyle lora vs Dev Q8 plus artstyle lora:
Results: 1 vote for lite, 9 votes for dev.

The artstyle lora does mostly work, but it's noticeably capturing only like 90% of the style.
And here's the second comparison without the artstyle lora:

Lite with no lora vs Dev Q8 with no lora:
Results: 3 votes for flux lite, 7 votes for dev q8

So in conclusion: flux lite is 23% worse in exchange for being 23% faster.
>>
File: file.png (2.04 MB, 2048x1024)
>>102949648
>>102949618
https://www.reddit.com/r/comfyui/comments/1g9wfbq/comment/lte0rdg/?utm_source=share&utm_medium=web2x&context=3
seems like they improved on the Sigma thing with this "LyingSigmaSampler" node, it adds details without changing the overall picture
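for anyone wondering what it's actually doing under the hood: it just scales the sigma the model is told about inside a step window, so the model thinks the image is cleaner than it really is and compensates with extra detail. a minimal sketch of that idea, assuming a k-diffusion style denoiser (my guess at the mechanism from the reddit post, not blepping's actual node code):

import torch

def make_lying_model(inner_model, dishonesty_factor=-0.05,
                     start_percent=0.1, end_percent=0.9,
                     sigma_max=14.6, sigma_min=0.03):
    # wraps a k-diffusion style denoiser: inner_model(x, sigma) -> denoised
    def wrapped(x, sigma, **kwargs):
        # crude progress estimate from the current sigma (assumption: the real
        # node converts the percents via the sampler's own timestep schedule)
        s = float(sigma.max()) if torch.is_tensor(sigma) else float(sigma)
        progress = (sigma_max - s) / (sigma_max - sigma_min)
        if start_percent <= progress <= end_percent:
            # the lie: tell the model there's less noise left than there really
            # is, so it sharpens instead of smoothing
            sigma = sigma * (1.0 + dishonesty_factor)
        return inner_model(x, sigma, **kwargs)
    return wrapped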
>>
>>
>>102949674
>Lite with no lora vs Dev Q8 with no lora:
>Results: 3 votes for flux lite, 7 votes for dev q8
ok so that's a fucking nothingburger, Q8 is 12gb big whereas flux lite is 16gb big, and Q8 wins lol
>>
>Moderated: QUALITY
That's a new one
Still lets you download the gen though, just doesn't show it to you online for some reason
>>
File: 00069-1911574376.png (3.16 MB, 922x2765)
>>
File: 02131.jpg (2.31 MB, 1664x2432)
>>
File: 00074-670133703.png (3.73 MB, 922x2765)
>>
File: 00003-24852473.png (3.43 MB, 2400x960)
>>
File: 00076-2964468965.png (2.32 MB, 2592x864)
>>
File: file.png (356 KB, 2482x1222)
https://github.com/kijai/ComfyUI-MochiWrapper
Ok gentlemen, I just did a bf16 vs fp8 comparison, they have the exact same settings (pircel image)

>A 22 years old woman dancing on the Hotel Room, she is holding a Pikachu plush
bf16
https://files.catbox.moe/fubqwj.webm
fp8
https://files.catbox.moe/92gksm.webm
>>
>>102949717
>>102949730
>>102949738
How does one achieve this
>>
File: Untitled.png (36 KB, 1026x251)
>>102949295
3.5 Medium is apparently multi-res while Large isn't, so it might be better even
>>
>>102949764
Not use models with zero aesthetic like flux trained on synthetic slop and instead use models trained on actual art
https://civitai.com/models/833294/noobai-xl-nai-xl
>>
>>102949772
>3.5 Medium is apparently multi-res while Large isn't, so it might be better even
there's no way a 2b model is gonna be better than an 8b model, right? I can feel SD3.5M will be a distilled version of SD3.5-8b
>>
>>102949675
it's amazing how webui extension devs mog the comfyui ones: adetailer, npw, detail daemon, resharpen, hires-fix-tweaks, webui-controlnet, I could go on... meanwhile custom nodes, all they do is pollute your comfyui with schizo options and tweaks, the only good custom node dev is kijai, the rest are really autistic
>>
File: 02133.jpg (1.59 MB, 1664x2432)
>>102949717
>>102949730
im in awe
>>
>>102949795
You assume everyone's purpose is to create "quality". Some of us want to generate an image that corresponds to the prompt.
>>
File: 00004-449654167.png (1.02 MB, 832x1216)
>>
>>102949755
We were so hyped after those previews... Minimax at home they said...
>>
>>102949675
How's that work?
>>
>>102949816
>You assume everyone's purpose is to create "quality". Some of us want to generate an image that corresponds to the prompt.
if this was true, SD3M would be a popular model and not a meme, because it follows the prompt well
>>
>>
File: 00023-3053294472.png (3.88 MB, 1037x3110)
>>
>>102949823
>We were so hyped after those previews... Minimax at home they said...
we'll get Minimax at home anon, it'll be the HD version, that's the one they probably use in their demos
https://www.genmo.ai/blog
>Today, we are releasing our 480p base model, with Mochi 1 HD coming later this year.
>>
File: 00008-3694283219.png (985 KB, 832x1216)
>>
>>
File: 00000-2196781888.png (3.86 MB, 1152x2016)
>>
File: file.png (491 KB, 3402x1562)
https://github.com/kijai/ComfyUI-MochiWrapper
Has anyone managed to make sage attention work on windows? got those weird ass errors
>>
File: 00078-3780410220.png (2.86 MB, 922x1843)
>>
>>102949889
These look nice.
>>
>>102949823
Genmo falls off very quickly as soon as you stray from the training data. Heavy cherry picking is needed too. But let's not lie to ourselves: img2video is the only use case valuable for actual production companies and larger projects since you need some way to control subject consistency.

>>102949847
Even if the HD version is also underwhelming and suffers from the same issues as the 480p version, if it's Apache 2 we'll just have to wait for PonyGenmo in 2025. I highly doubt many people will be using the HD version locally given the VRAM requirements though, even if there's a god-tier 4bit quant of the model.
Hopefully the eventual fine-tunes improve prompt adherence too because after being spoiled by the natural language understanding of Flux and the Chinese video models it's really frustrating when genmo doesn't listen. The only consolation is I wasted 20 cents of some Silicon Valley incubator's seed funding on the failed gen
>>
File: 00014-1700103974.png (1.11 MB, 832x1216)
>>
>>102949800
no, it's literally a different model apparently. He's saying 3.5 Large is just a finetune of old 3.0 Large, but 3.5 Medium was done up from scratch which let them add new stuff. So he believes that's why it has multi-res training but Large doesn't. It definitely won't be distilled like Large Turbo, also.

I think it's possible for it to have worse prompt adherence but better image quality, at least.
>>
>>102949908
>Even if the HD version is also underwhelming and suffers from the same issues as the 480p version, if it's Apache 2
that's my concern, maybe it won't be apache 2 for the HD version
>>
>>102949935
Handsome guy. At first I thought he was a lady.
>>
>>102949908
>highly doubt many people will be using the HD version locally given the VRAM requirements
I'm sure that's possible, the 480p + fp8 version only asks for 12gb of vram during inference and only 10gb of vram during vae decoding if you go for vae tiling
>>
>>102949960
In that case we cope with 480p or just wait 6 months for the new hotness base model. As soon as a company finds a way to be profitable with AI video, competition will increase, and Minimax beating out kling and now the open source release of genmo (with the paper coming out soon allegedly) shows that there's no moat for video models
>>
>>102950000
You underestimate the VRAM of the average normie, anon
Even an enjoyable SDXL experience is still too out of reach for a lot of the population since they're on 8GB or less cards
>>
>>102950017
The vram issue is because inference software sucks.
>>
vram is waaaayyyyy too large, for ai.

The reason to have loads of vram, for games, is that players can be very random.
>>
>>102950070
>vram is waaaayyyyy too large, for ai.
true, but we have no other choice, you can't get a good model with 1b, maybe if we improved the transformer architecture that would be possible, we'll see about that
>>
>>102950017
SDXL is totally fine in Comfy with Nvidia Turing or higher arch cards even at 6GB
>>
>>102950017
How many normies do you think are running AI models locally? Their interest in AI is playing for 5 minutes with whatever the new shilled thing (DALLE3/Suno/Minimax/etc.) is and moving on
>>
>>102950129
>How many normies do you think are running AI models locally?
a lot, there's a lot of discussion in the US about destroying the local AI ecosystem by making laws that would make it impossible to train uncucked local models
>>
File: 00019-3822636690.png (1.08 MB, 832x1216)
>>
>>102950165
>there's a lot of discussion in the US about destroying the local AI ecosystem by making laws that would make it impossible to train uncucked local models
Huh? Where?
>>
>>102950182
https://leginfo.legislature.ca.gov/faces/billNavClient.xhtml?bill_id=202320240SB1047
In Commiefornia especially, a lot of AI companies are there unfortunately
>>
File: file.png (290 KB, 2808x1526)
I FUCKING HATE BUILDING BINARIES IT NEVER WORKS FUUUUUUUUUUUUUUCK
>>
File: 02135.jpg (3.13 MB, 1792x2304)
>>
>he's actually trying to build flash attention
Give up lol, you're wasting your time. You have already spent more time than you'd ever save by slightly faster gens.
>>
>>102950218
>slightly faster gens
flash attention is useful for memory, which is the most important resource in AI
>>
>>102950017
According to the latest steam survey around 25% of users have 12gb or more of VRAM, which is pretty good considering a big portion of steam users are poverty Russians and Sudacas just playing DOTA2 on shitboxes
>>
>>102950194
>>102950165
>make impossible to train uncucked local models
Technically if you're using less than $10,000,000 in compute and keeping the model to yourself it's exempt. Still gay desu.
>>
>>102950257
>According to the latest steam survey around 25% of users have 12gb or more of VRAM
That sounds crazy when 12GB+ vram only really started existing at all after the RTX 2000 series
>>
>>102950334
>if you're using less than $10,000,000 in compute and keeping the model to yourself it's exempt. Still gay desu.
If the US doesn't want to do it, someone else will, and that's exactly why China will win the AI race: they don't want to kill AI progress, quite the opposite, they want to make the best AI possible, oh well.
>>
>>102950370
it's not that surprising, games nowadays aren't optimised anymore so you need a shit ton of vram, and the 3060 has a lot of vram and is cheap as fuck because Nvidia was making money off the crypto grifters during the crypto boom in 2021 kek
>>
>>102950113
If you're okay with a single pass and no hiresfix and consider that "totally fine" sure

>>102950381
Video games being unoptimized doesn't give thirdies money they can spend on new GPUs though. I guess it doesn't matter since the future of inference is with NPU/IPUs anyways
>>
>>102950420
>I guess it doesn't matter since the future of inference is with NPU/IPUs anyways
what's that?
>>
>>102950443
Neural/Inference processing units. Basically dedicated hardware for running inference of models. Right now they're just embedded into the CPU and used for small things like helping blur your webcam when on a zoom call etc but both AMD and Nvidia (as well as a lot of startups probably) are working on larger discrete accelerators
source: I worked at AMD for a bit on a project related to NPUs
>>
>>102950494
those NPUs will be able to do graphics stuff as well? Like video games and shit? because the advantage of a GPU is that it can do both video games and AI
>>
>>102950494
>Neural/Inference processing units. Basically dedicated hardware for running inference of models.
how much faster will it be? Let's take a comparison: how much faster would an NPU be than a 3090, for example, on inference?
>>
>>102950498
No, they're only for running inference. The advantage of an NPU is that it'll be much much cheaper than a $3k 4090 for the same ML performance, which is interesting to me because I don't really care about gayming

>>102950519
>how much faster will it be? Let's take a comparison, how much faster a NPU would be against a 3090 for example on inference?
Depends on the NPU. It'll be more efficient for sure. I wasn't working on the ML side of things but more the hardware side so I can't give you any estimates but I'm assuming it'll be similar to ASICs for crypto mining where they completely mog GPUs at the same price point since they're optimized for that specific task
>>
https://github.com/kijai/ComfyUI-MochiWrapper
Ok I managed to make sage attention work, here are the steps:
1) Install triton with those binaries
https://github.com/woct0rdho/triton-windows/releases/tag/v3.1.0-windows.post5
2) Install python 3.11.9 on your computer
https://www.python.org/downloads/release/python-3119/
3) Go to C:\Users\Home\AppData\Local\Programs\Python\Python311 and copy the "libs" and "include" folders
4) Paste those folders onto ComfyUI_windows_portable\python_embeded
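5) sanity check that it all works (assuming the sageattn entry point from the sageattention README):

import torch
from sageattention import sageattn

# dummy attention inputs: (batch, heads, seq_len, head_dim)
q, k, v = (torch.randn(1, 8, 128, 64, device="cuda", dtype=torch.float16) for _ in range(3))
out = sageattn(q, k, v, is_causal=False)
print(out.shape)  # torch.Size([1, 8, 128, 64]) -> triton + sage are working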
>>
>>
>>102950544
forgot step 0) -> install sage attention -> pip install sageattention
>>
>>102950544
>>102950555
>sage attention
>Quantized Attention that achieves speedups of 2.1x and 2.7x compared to FlashAttention2 and xformers, respectively
Neat
>>
>>102949411
>my chatbot at the start
>by the third reply
>>
1girl, river, walking
>>
>>102950597
OMG IT TETO
https://www.youtube.com/watch?v=dDlljvDSLSg
>>
https://github.com/comfyanonymous/ComfyUI/commit/f82314fcfcc4d83b307f30f06e77db44e95686cf#diff-ff903427b64d57103d983ee5eeb6c33ffb5ab760526a48a35ad42d5afafdf2fbR360-R361
Interesting, so the beta had an issue on ComfyUI, maybe that could explain why that one tends to overburn the image compared to the other samplers
>>
>>102950542
>Depends on the NPU. It'll be more efficient for sure.
what about the memory though? it's the most important thing, is it easier to add memory onto an NPU compared to a GPU?
>>
File: ComfyUI_34401_.png (1.5 MB, 848x1024)
>>
>>102950597
Now do 1girl, river, wanking
>>
>>102950742
nice style
>>
File: file.png (3.64 MB, 2638x1452)
>>102950748
>wanking
>>
>>102950724
>what about the memory though? it's the most important thing, it's easier to add memory onto a NPU compared to a GPU?
Memory is cheap anon, don't let Nvidia and AMD trick you into thinking it's not because of their ridiculous prices at the data center tier. They could both sell us a 32GB vram graphics card for under $1000 if they wanted to. There's just no reason to ever do that because they know people will pay 4x the price or more for it due to the gold rush and their fiduciary duty to shareholders to maximize profit
>>
>>102950761
the problem is that those are the same greedy companies (Nvidia and AMD) that will make those NPUs, so yeah maybe NPUs will be fast, but if they only have 16gb of vram you're just as fucked as if you had 16gb of gpu vram
>>
>>102950420
You can do hi-res-fix up to like 1.5x with 6GB VRAM + 16GB RAM on an SDXL model no problem. Again the card has to be Nvidia and it has to be Turing or later, though.
>>
File: ComfyUI_34406_.png (1.43 MB, 848x1024)
>>
so many 1girl sloppu ;_;
>>
Remember this?
https://blackforestlabs.ai/up-next/
I thought they would never release it to the public, but now that we got Mochi, maybe they'll try to enter into competition with them, that would be cool desu
>>
>>102950788
Sure but my point is that a professional or prosumer tier NPU will be cheaper than the equivalent GPU for equivalent inference performance

>>102950906
>the card has to be Nvidia and it has to be Turing or later, though.
I stand corrected then. How long would a 1024x1024 gen hiresfixed to 1536x1536 take on 6GB+16GB? I consider anything longer than 2 minutes per gen "unusable"
>>
>>102951259
>my point is that a professional or prosumer tier NPU will be cheaper than the equivalent GPU for equivalent inference performance
Idk about that, if only Nvidia makes good ones, they'll just make them expensive as fuck like their overpriced GPUs
>>
flux gave this bitch a amulet
>>
>>
File: file.png (944 KB, 735x856)
>>102951284
I read it as "a mulet" so I was expecting this kek
>>
File: file.png (129 KB, 2808x653)
>>102950544
>>102950200
btw with this method you'll be able to build your own binaries of flash attention
>>
>>102951284
It didn't fill her other equipment slots though
>>
File: 00038-1638469766.png (763 KB, 1152x896)
>>102951140
These models are all trained really well for 1girl slop, usually if I try to make anything else it's shit. Yes there's a skill issue component, but it's easier to go with the flow and produce 1girls.
>>
>>102950568
>>Quantized Attention that achieves speedups of 2.1x and 2.7x compared to FlashAttention2 and xformers, respectively
I wanted to verify by myself, so here you go:

>Donald Trump mocks and laughs at a kneeling, weeping Kamala Harris
>121 frames, 64 steps, seed 42

>Sage: 35:28<00:00, 33.26s/it, 16.2gb VRAM during inference
https://files.catbox.moe/dyi3cm.webm
>flash_att: 42:49<00:00, 40.14s/it, 16.2gb VRAM during inference
https://files.catbox.moe/dqxrat.webm

Sage is definitely faster, use it >>102950544
>>
File: 00003-4037236168.png (2.02 MB, 1344x768)
I wanted a fisherman catching a bass. Instead I get... a bass fisherman? I mean, if I'd written the words bass fisherman in the prompt, I'd get it.
>>
>>102952099
So it's faster, but not 2.7x faster
I wonder how long it'll take to go from 30+ minutes to under 5 between optimizations and hardware improvements
>>
File: 3380426930.png (1.11 MB, 896x1152)
>>
File: 2611978228.png (1.03 MB, 896x1152)
>>
File: file.png (105 KB, 2773x619)
>>102952099
the neat part is that sageattention is also used on image models like flux, and it's also faster than going for the sdpa optimisation, I used to get 3.6s/it on sdpa, now I'm at 2.91, let's go dude!
>>
>>102952128
>So it's faster, but not 2.7x faster
*2.1x, and 2.7x was for sageatt vs xformers, but yeah of course they went for the most extreme of cases to get those numbers (as all researchers do kek), a 1.2x speedup is still really cool though, and when I look at those 2 videos I notice the quality on sage is better, so it's a win/win situation there
>>
Does anyone know what cuda malloc does? When I disable it I don't notice any difference
>>
>>102949755
Thanks for the comparison.
I'll give bf16 a go today. I think you had an "unlucky seed" for your test btw.
>>
>>102952265
>I'll give bf16 a go today. I think you had an "unlucky seed" for your test btw.
probably yeah, it was on fp8 too, unfortunately you can't go too far on bf16, when I go over 60 frames it overflows, I wish there was something in between fp8 and bf16 so that you get the quality and enough room to stack up the frames
>>
File: 405816733.png (1.22 MB, 896x1055)
>>
>>102952287
>I wish there was something in between fp8 and bf16 so that you'll get the quality and enough space to stack up the frames
Why does bf8 not exist for fp8 like how bf16 exists for fp16?
>>
File: 3562288793.png (1.1 MB, 896x1152)
>>
File: bComfyUI_132649_.jpg (1.34 MB, 3072x1536)
>>
kek, so this general just devolved into coomposting too. it's just sdg 2.0 now
>>
>>102952412
These are great, having a human for scale is a nice touch
>>
So can you run that mochi thing on 2x3090, on comfy (on linux)?
>>
>>102949193
>>102949229
>>102949252
>>102949318
>>102949358
>>102949371
>>102949409
>>102949522
>>102949599
are these real or AI? I don't have my glasses right now
>>
>>102950544
https://reddit.com/r/StableDiffusion/comments/1gb07vj/how_to_run_mochi_1_on_a_single_24gb_vram_card/
Just made a long ass tutorial for those who weren't able to make Mochi work on their computer
>>
File: screenshot.png (52 KB, 2406x1896)
>>102945840
ok I've made this program (claude did) to help simplify the second half of the process:
https://github.com/rainlizard/EasyQuantizationGUI/releases

For anyone on Windows who wants to convert the 24GB flux1-dev.safetensors file to 12GB .gguf it should be pretty easy now.
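if anyone wonders what the conversion actually does to the weights, this is roughly the math of Q8_0-style quantization (a toy sketch of the idea, not the real gguf byte layout the tool writes):

import torch

def quantize_q8_0_toy(w: torch.Tensor, block: int = 32):
    # each block of 32 weights becomes 32 int8 values + one fp16 scale,
    # ~34 bytes instead of 64, which is how 24GB lands near 12GB
    flat = w.flatten().float()
    pad = (-flat.numel()) % block
    flat = torch.cat([flat, flat.new_zeros(pad)]).view(-1, block)
    scale = flat.abs().amax(dim=1, keepdim=True) / 127.0
    scale = torch.where(scale == 0, torch.ones_like(scale), scale)
    q = torch.round(flat / scale).clamp(-127, 127).to(torch.int8)
    return q, scale.half()

def dequantize_q8_0_toy(q: torch.Tensor, scale: torch.Tensor):
    return (q.float() * scale.float()).flatten()  # what the loader does at runtime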
>>
>>102952977
>So can you run that mochi thing on 2x3090, on comfy (on linux) ?
you can make it work with a single 3090 >>102953060
>>
File: mochi.png (4 KB, 1081x22)
guys I finally got the "room heater" comfyui extension
>>
>>102953224
>150 steps
what? you can go for lower, the default value for Mochi is 64 steps
>>
>>102953068
>For anyone on Windows who wants to convert the 24GB flux1-dev.safetensors file to 12GB .gguf it should be pretty easy now.
nice job anon, I'll keep this link in mind, I wanna make ggufs of other flux variant models
>>
>>102953278
I think the examples used by mochi use 200, and I can wait. I can't fucking install sageattn so I'm using pytorch's default attention, how much slower does it make the process?
>>
>>102953302
>I think the examples used by mochi use 200
I've heard it was 64, how did you get that number?

>I can't fucking install sageattn so Im using pytorch's default attention
Why? It works on both windows and linux >>102950544
>>
>>102953325
>how did you get that number?
half-remember reading it somewhere like yesterday, maybe I made it up idk
>>
>>102953325
I know but triton gives me a message about not being able to find Windows Kit/10/Include and I've been fiddling with pip shit for an hour now so I've given up
>>
>>102953357
>triton gives me a message about not being able to find Windows Kit/10/Include
did you do this? >>102950544
>3) Go to C:\Users\Home\AppData\Local\Programs\Python\Python311 and copy the "libs" and "include" folders
>4) Paste those folders onto ComfyUI_windows_portable\python_embeded
>>
>>102953372
Whoops, no. I'll do it after I get this gen, thanks
>>
>>102953060
Why specify 24gb when it works on a 16gb card, well at least in Linux it does.
>>
>>102953525
>Why specifiy 24gb when it works on a 16gb card
true, but my tutorial is for both fp8 and bf16 and your 16gb card can't handle the bf16
>>
File: ComfyUI_SD35L_0302.jpg (160 KB, 1152x896)
Flux 8B model
https://huggingface.co/Freepik/flux.1-lite-8B-alpha
>>
>>102953753
it's still distilled right?
>>
The Russian teens are middle eastern indians now wtf. What garbage captioner did the genmo team use or is the website secretly modifying prompts?
>>
Here's the better of the two to show it wasn't just a fluke
>>
>>102953771
>>102953782
>1696x960
that's the official resolution you got when downloading those videos? if yes then it means their demo is using the HD version, we've only got the 480p model locally so far
>>
>>102953545
I tried, you are right, much sadness :(
Good reddit post though.
>>
>>102953835
>I tried, you are right, much sadness :(
it's ok, the fp8 isn't that different from the bf16, you don't lose much and you can still run Mochi
>Good redit post though.
thanks :3
>>
>>102953790
That's the official resolution yeah. That would explain why the catbox webms of the 480p gens look so much worse.
>>
>>102953848
All my gens with fp8 have had considerable "wavy mirage" effects on motion, I'd hoped the bf16 would reduce that; facial coherence and generation is sometimes as spot on as FaceDetailer for sdxl but it's wildly unpredictable through the 8 or so gens I've done so far.
This is the first time since I bought my GPU last Christmas (4060ti) that I've wanted a new card due to the processing time, it was good while it lasted.
>>
File: file.png (65 KB, 716x676)
>>102953918
>All my gens with fp8 have had considerable "wavy mirage" effects on motion, I'd hoped the bf16 would reduce that,
I think it does yeah >>102949755, but the quality is still not on par with their demo (which probably uses the HD version), I'm trying to increase the resolution to see what it does kek (this resolution and fp8 ask for 16gb of vram)
>>
>>102953918
>All my gens with fp8 have had considerable "wavy mirage" effects on motion
maybe Comfy's new scaled fp8 could fix this, it's supposedly better in quality than the regular fp8, dunno how to make those though
https://huggingface.co/comfyanonymous/flux_dev_scaled_fp8_test
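no idea how comfy generates them either, but "scaled" presumably means storing a per-tensor scale next to the fp8 weights so the full e4m3 range gets used instead of casting blindly. a toy sketch of that reading (an assumption on my part, needs torch >= 2.1 for the float8 dtype):

import torch

def to_scaled_fp8(w: torch.Tensor):
    # map the largest weight onto e4m3's max representable value (448);
    # a blind .to(float8) would waste most of the range on small tensors
    scale = w.abs().max().clamp(min=1e-12) / 448.0
    return (w / scale).to(torch.float8_e4m3fn), scale

def from_scaled_fp8(q: torch.Tensor, scale: torch.Tensor):
    return q.to(torch.float16) * scale  # dequantize at load/inference time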
>>
She's looking at me judgingly because of my promptletness

The online site's text prompt moderation works like Luma's, where one prompt works but a very similar one with one word changed doesn't. In this case "bathroom" was blocking the prompt until I changed it to "apartment"
>>
File: 3948116548.png (937 KB, 1152x896)
>>
>>102954000
>...dunno how to make those though
I don't think Comfy would do it specifically for mochi but some techniques might be applicable for people in that area to use if they wanted to try.
Also, I think captcha is about to break, had a few just now where the slider doesn't change the image.
>>
Remember that meme? Good times.
https://xcancel.com/__theben/status/1829554120270987740
>>
File: ComfyUI_01567_.png (1.92 MB, 1024x1024)
>>
>>102953070
thanks anon
>>
File: ComfyUI_01569_.png (1.42 MB, 1024x1024)
SD3.5L is pretty decent so far
>>
>>102954405
anatomy is very bad for me and it doesn't work well with resolutions outside of 1024x1024, but they said they used an improved architecture for sd 3.5 medium so i'm waiting for that
>>
File: file.webm (813 KB, 1712x960)
Yep, their demo is definitely the HD one, because the demo renders at 1696x960, and when I use that resolution on the local 480p model I get this blurry shit kek
>>
>>102954433
>they said they used a improved architecture for sd 3.5 medium so i'm waiting for that
desu they should've used the improved architecture for the 8b as well, they could've beaten flux that way, their 2b model won't move the needle I'm afraid
>>
>>102954448
im assuming the 8b was just released as a way to generate hype and the medium model is what they put most of their work into. if they released the medium model by itself it would have been met with disappointment ig
>>
File: file.png (105 KB, 360x521)
>>102954466
it's just a 2b model, they shouldn't focus on this little shit, they had the chance to beat flux with their 8b and they didn't go for it, why are they so fucking retarded??
>>
>>102953068
Just curious, are we able to quantize any of the general image diffusion models like sd 3 or pony and run it from sd webui forge?
>>
>>102954490
they are a collapsing company, i doubt they have the resources to focus on the 8b anymore. back when sd3 released, the old 8b was the one they locked behind the api to generate some coin out of desperation and even that failed because they were outclassed by the competition
>>
what do you jabronis put in your negative when genning with flux? Bad hands, bad image, ugly, deformed? Does any term in the negative improve overall image quality?
>>
File: ComfyUI_01577_.png (1.4 MB, 1024x1024)
>>102954433
Okay, I'll keep having a good time though
>>
>>
>>
>>102950544
>>102953060
Do I have to use portable Comfy? I assume dropping those Python311 libraries into my Python 312 install won't work?
>>
>>102954569
>is happy generating white plates with strawberries and chocolate splattered around
I guess I'll envy you.
>>
>>102954529
No, the only use is if your gen has something you didn't want, add that to negs and regen.
>>
File: ComfyUI_temp_xgpij_00025_.png (1.94 MB, 1088x1360)
>>
>>102954490
>they had the chance to beat flux with their 8b and they didn't go for it
They tried, a Base Model with naked women is the proof they really tried.
The bad thing with incompetence is that you can't do it no matter how hard you try.
>>
File: 00030-2621750539.png (1.21 MB, 832x1216)
>>
>>102954707
>I assume dropping those Python311 libraries into my Python 312 install won't work?
I guess you can install python 312 on your computer and copy those folders over, just don't forget to also download the triton build made for python 312
>>
File: ComfyUI_temp_xgpij_00029_.png (2.26 MB, 1088x1360)
>>
>>102954940
I mean, they will be using a better architecture for SD3.5-2b; if they had also done that for the 8b, I'm sure they would've caught up with Flux
>>
>>102954707
>Do I have to use portable Comfy?
If you know where the python.exe that runs your Comfy is located, you can remix this tutorial to make it work I guess
>>
File: ComfyUI_temp_xgpij_00035_.png (2.17 MB, 1088x1360)
>>
https://github.com/kijai/ComfyUI-MochiWrapper
https://huggingface.co/Kijai/Mochi_preview_comfy/blob/main/mochi_preview_dit_GGUF_Q4_0_v1.safetensors
>GGUF.safetensors
didn't know it could work like that :d
>>
>>102955078
can I run this shit on my 3060 12gb?
>>
>>102955097
without any issues, fp8 was asking for a bit more than 12gb of vram
>>
>>102955078
ok, I'm mildly aroused now.
>>
>>102955078
this mf transformed a gguf into a safetensor, that's a fucking genius move if you ask me, maybe if we do the same for the flux ggufs and transform them into safetensors, they'll be as fast as fp8 and won't be slow motherfuckers when we add loras on top of them?
>>
File: 00009-170385697.jpg (996 KB, 1280x1920)
>>
>>102955168
neat
>>
File: 3603366327.png (1.16 MB, 1152x896)
>>
>>102955078
it's crazy to think of where we were at the end of 2022 compared to now
>>
File: file.jpg (2.13 MB, 7262x1795)
it's impressive how much of an impact this tiny node has
https://reddit.com/r/comfyui/comments/1g9wfbq/simple_way_to_increase_detail_in_flux_and_remove/
>>
File: file.jpg (931 KB, 1763x2304)
https://civitai.com/models/883426/verus-vision-10b?modelVersionId=988886
This is the first finetune of dedistill, I like the skin texture, it doesn't look like plastic anymore
>>
File: file.png (227 KB, 2123x959)
>>102955467
>This is the first finetune of dedistill
>finetune
not even close, it's just a Lora merge
>>
File: file.png (24 KB, 918x227)
They'll never let us down
>>
File: file.png (61 KB, 927x677)
A little bread from heaven
>>
>>102955532
they know what to do, a bigger model will make their shit relevant, we're waiting BIGMA my dear Mr. Lawrence
https://www.youtube.com/watch?v=jElCDsfptVU
>>
>>102955554
I know you love to burn money doing inefficient shit, but efficiency makes it possible to train big models. Or maybe you like having big models restricted to people with $100k in hardware?
>>
>>102955566
>Or maybe you like having big models restricted to people with $100k in hardware?
why are you whining like that? the llm fags are dealing with fucking 70b models, you have no idea how good you have it
>>
>>102955593
Yeah, must be nice relying 100% on Meta lmao
>>
File: file.png (397 KB, 960x876)
>>102955593
>>102955598
a tale as old as time
>>
>>102955548
goddamnit just give us the model from the demo I don't give a fuck give me new toys
>>
>>102955610
you're already tired of the SD3.5 toy from yesterday??
>>
>>102955619
you should've stopped caring about SAI a while ago
>>
File: test3.jpg (2.82 MB, 3072x1440)
>>102955467
I tested it a few threads ago, it doesn't really improve over regular dedistill while removing a lot of flexibility. I believe any perceived skin detail increase is just the extra grain Verus adds.
Left to right, Verus>DeDistillfp8>Distilledfp8
>>
>>102955643
I kinda do, last time I gave a fuck about them was during the SD3M fiasco, it was funni, and then they released SD3.5, I tested it out for an hour yesterday and quickly noticed it wasn't at the level of flux, put it in the trash and went on with my life, I'll care about them for the next release though, if it's still inferior to flux it'll be 1 more hour, if not then they'll have succeeded in their redemption arc kek
>>
>>102954992
>>102955029
What model is that?
>>
>>102955655
I see, what I notice though is that the dedistilled version seems to have more realistic humans than vanilla flux dev, but it's just one picture so it's hard to make a definitive conclusion about that
>>
>>102955078
https://huggingface.co/Kijai/Mochi_preview_comfy/tree/main

Which one is recommended? The Q4? The fp8? The bf16?
Is there a difference in quality?
>>
File: file.jpg (2.04 MB, 7961x2897)
>>102955731
>Which one is recommended? The Q4? The fp8? The bf16?
>Is there a difference in quality?
so far we only have image models as a baseline, but fp8 seems to add some blur glitches during movement >>102949755 >>102953918
>>
>>102955665
I immediately saw potential in XL despite the initial reaction from anon but I also saw potential in sigma which, to be fair, got some love but never became the meta. regardless, the trajectory of SAI and the people who left for BFL seems to be larger and larger models which is not something I want. I do like flux but I'm not going to delude myself into thinking its ecosystem will take any less time to flesh out than XL - in fact, we all know it'll take MUCH longer. I'm sure some are okay with waiting and don't mind the hardware requirements but I think it's idiotic to completely forgo the idea of making smaller, better models. flux is bloated as hell and sd3 appears to suffer from le safety demon. it's like software in general being more about "lets put more shit in it" rather than "how can we optimize" or whatever.
I could simply be poorfag coping but I don't think I am.
>>
>>102955775
thanks anon, the videos are both very blurry anyway, I hope all local gens aren't that bad
I'll probably try both anyway
>>
>>102955885
we won't get the same quality as their API demo because they are using the HD version, and we are not
https://www.genmo.ai/blog
>Today, we are releasing our 480p base model, with Mochi 1 HD coming later this year.
>>102954440 >>102953790 >>102953884
>>
>>102955838
I just don't get how people don't see that Pixart gets decent results with 600m, much better than SD 1.5 or base SDXL, so why wouldn't a 1.6B model with an extremely efficient architecture be something that could be very good, especially for a niche model? We don't need every model to be the kitchen sink, in fact Pony is a perfect example of a niche model that everyone likes. There's no reason to believe that Sana 1.6B can't be the next hyper specific booru anime model.
>>
>>102955899
oh I see, and I'd guess the hd one would be unusable locally unless you get it to work on multiple h100s...
>>
>>102955775
q4 produced very blurred output and needed the frame_batch cranked right down. Testing again on a small gen rn using the default prompt and seed
>>
>>102955926
>oh I see, and I'd guess the hd one would be unusable locally unless you get it to work on multiple h100s...
not at all, I was able to make a high resolution video on my 3090, of course the result looks like absolute shit because the model was never intended for it, it was just to test whether the VRAM would be enough, and it is: fp8 asks for 16gb of VRAM at that resolution, we're good o/ >>102954440
>>
>>102955958
oh very cool, hopefully the hd one is released soon then
>>
>>102955902
I think base sigma being so undertrained compared to the competition did it. Perhaps some of it was skill issue, and likely the audience for base models doesn't care about aesthetics (which it had in spades). I could schizo-babble about the west conspiring to kill any chinese competition by means of making their models look bad or something, but I think the real reason is something else.
I know for a fact that, somehow, many were filtered by its install. Somehow they didn't know where to place the files for sigma but then when Flux/SD3 arrived, magically they remembered or figured it out.
I wish I knew the answer to your question, but I don't.
>>
>>102955939
I retract this, seems it was just a bad gen idk...
iterate, iterate, iterate.
>>
>>102956002
It's the retarded bigger number meme. There's a reason why the Xbox 2 was called the Xbox 360 (because of the PS3). But I also think it was the timing, Pixart is too small of a model and SAI was promising a Flux-like super model so no one wanted to switch and even to this day ComfyUI doesn't natively support Pixart.
>>
>>102956036
>It's the retarded bigger number meme.
so much of a meme that the SOTA local model is the biggest one
>>
>>102956036
>ComfyUI doesn't natively support Pixart.
i have a suspicion that this is due to some dumbass politics between comfy and city et al.
>>
>>102956045
The problem with Flux in particular is no one challenges the idea that 12B is anything but a means of preventing local competition. It's what I would've done if I was monetizing in this space: you purposely make base models impossible to train and you get to build a loyal audience of window lickers while ensuring no one will make a Pony model to compete against you. We already see the results of this, BFL gets way too much respect despite ghosting us.
>>
File: file.png (46 KB, 827x417)
https://www.reddit.com/r/comfyui/comments/1g9wfbq/comment/lte0rdg/?utm_source=share&utm_medium=web2x&context=3
Ok now that's impressive
https://imgsli.com/MzExNjQ2
>>
>>102956074
>BFL gets way too much respect despite ghosting us.
I think you overestimate the % of trainers in the ecosystem, this model is currently being downloaded more than a million times per month, the very vast majority of people are just using models, not training them, to them Flux is excellent and they'll never see its shortcomings in terms of training because they will never do something like that, so their feelings towards BFL can only be overall positive, they got in their hands a model that is consistently good and that's it
>>
>>102956079
 dishonesty_factor 
is probably the funniest name for a setting I've seen thus far
>>
>>102956122
Just sucks because this will arrest local development for a year or more.
>>
>>102955467
>>102955508
>verus-vision was lauded as the first flux finetune by anon
>it's not actually a finetune
holy kek
>>
>Hi everyone, yes, we are still alive! Thank you for your attention to SANA, our latest work on efficient text-to-image generation. It was developed jointly by people from NVIDIA, MIT, and Tsinghua University.
>We are preparing to open source SANA recently (waiting for the company's approval process, but whether it can be open source depends on the company's approval result). If you have any suggestions, you can leave a message directly in the channel, send an email to the SANA team (enzex@nvidia.com/junsongc@nvidia.com), or fill in this Google sheet
>https://docs.google.com/spreadsheets/d/1rQWGYdswcl8O6V5Vu3AqtBh9PotkmTkxN2inI_njDy0/edit?gid=0#gid=0
anything you guys wanna tell them?

>We initially plan to support ControlNet and expand to Video generation. We hope that with community feedback, SANA will get better and better.
i had a feeling they were planning on using sana as a base for a video gen model, only way the super compressed vae would make sense
>>
>>102956249
Super compressed VAE is good for both video and high resolution images. Also less tokens required for the model to learn which means per parameter efficiency goes up.
>>
>>102956079
https://imgsli.com/MzExNjYx
this shit is really amazing, I always felt flux was a bit empty on its image, that node fixes that perfectly
>>102956140
kek
>>
>>102956283
it's not good when it wrecks eyes and hands and any other small detail
>>
>>102956295
You can use Flux I don't care
>>
>>102956249
https://github.com/NVlabs/Sana/issues/3#issuecomment-2434357814
>The released version will be further trained. This is a prototype demo for experience.

>>102956283
>Also less tokens required for the model to learn which means per parameter efficiency goes up.
you are right, i completely forgot about that
>>
>>102956249
>anything you guys wanna tell them?
yes, why do they want to compress the VAE so much? their model is small enough, a good quality VAE makes all the difference, especially on details, that's not something you can just overlook and compress
>>
>>102956310
>duh why woood u wunt efficansy
>>
>>102956295
you also have to take into account how much that model is trained / the webp compression from the demo. not saying the vae DOESN'T contribute to this but there are other factors at play
>>
>>102956300
might as well just close your eyes while you generate if you want the model to be good no matter the reality
>>
>>102956310
>yes, why do they want to compress the VAE so much?
this is probably why
>expand to Video generation. We hope that with community feedback, SANA will get better and better
they want to use sana's research to dip their toes into video gen, it would make sense to min-max performance then
>>
>>102956317
they're not efficient at all, it looks like shit, that's the problem anon
>>
>>102956345
No it doesn't, and I know this for a fact given I've been posting many Sana gens and no one has said anything :)
Almost like you're just a dumb BFL employee
>>
>>102956365
>Almost like you're just a dumb BFL employee
says the Sana employee
>>
>>102956344
It always makes sense to min-max performance because it's impossible to predict what happens when things run 8 times faster
>>
>>102956244
at least that means that the result is dissapointing only because he didn't really finetune the model, what a shame
>>
now now you guys, let's not fight, we are all big adults here. if you can, please make sure to give the sana guys some good constructive criticism. i've never seen anyone else ask the community for advice like this before
>>
>>102956376
For example, there are multiple training loss techniques no one uses because the overhead is way too high (i.e. perceptual loss) despite them objectively improving convergence and final output
>>
>>102956338
that's true, but after seeing the effects of the sdxl/1.5 vae and cascade I'm definitely super skeptical of any model that does this super compression shit
>>
File: ComfyUI_temp_xhnhk_00015_.png (2.18 MB, 1088x1360)
>>
File: ComfyUI_temp_xhnhk_00016_.png (2.1 MB, 1088x1360)
>>102956288
gj miqu anon
>>
>>102956405
There are pros and cons for everything, but please keep demanding your filet mignon and acting like that's the only thing in the world.
>>
File: AnimateDiff_00001.webm (566 KB, 872x488)
10 fucking minutes
>>
>>102956441
>AnimateDiff
I thought you used Mochi for that one lol
>>
File: 00081-2227089261.png (1.11 MB, 832x1216)
>>
>>102956453
Oops, that is Mochi. I forgot to change the filename prefix
>>
>>102956023
retracting this (again), I get blurry low coherence with the Q4 compared to fp8. the default 168 frames json has an error where I have to rebuild the video_combine node or it doesn't save the image or vid
(Some Time Later...)
fp8
https://files.catbox.moe/sj6ecc.mp4
ggufq4
https://files.catbox.moe/e1cg5r.mp4
idkwtf is going on.
>>
File: ComfyUI_temp_xhnhk_00019_.png (2.05 MB, 1016x1280)
>>
>>102956440
the (sdxl mainly) compression together with low parameters and bad te IS one of the main things bottlenecking image generation right now
have u ever tried just taking a normal image and encoding it and decoding it? im sure u will hecking love the result
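the roundtrip is a few lines with diffusers if anyone wants to see the damage for themselves (assuming the stabilityai/sdxl-vae repo id; small faces and text are where it falls apart):

import torch
from diffusers import AutoencoderKL
from diffusers.utils import load_image
from torchvision.transforms.functional import to_tensor, to_pil_image

vae = AutoencoderKL.from_pretrained("stabilityai/sdxl-vae").to("cuda").eval()
img = to_tensor(load_image("input.png").convert("RGB")).unsqueeze(0).cuda() * 2 - 1  # to [-1, 1]
with torch.no_grad():
    latent = vae.encode(img).latent_dist.sample()   # 8x smaller spatially
    recon = vae.decode(latent).sample.clamp(-1, 1)  # back to pixel space
to_pil_image((recon[0] * 0.5 + 0.5).cpu()).save("roundtrip.png")  # compare against input.png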
>>
>>102956474
>idkwtf is going on.
it's not complicated, Q4 is too aggressive to be usable
>>
>>102956482
The biggest bottleneck is training speed and requirements. We can survive just fine on SDXL VAE quality if it means 8 times faster training and 8 times lower total requirements.
>>
File: ComfyUI_temp_xhnhk_00024_.png (1.78 MB, 1016x1280)
>>
>>102956474
Ok so basically going this low is unusable.
>>
>>102956498
sure training speed is a big problem but if the absolute quality ceiling is not good then there is no point in training in the first place
>>
>>102956523
You people were happy to slurp up shitty SD 1.5, I think you are just moving around requirements arbitrarily. And again, 100% VAE recreation is not the be-all-end-all of a model, in fact it's a very superficial requirement.
>>
>>102956523
>if the absolute quality ceiling is not good then there is no point in training in the first place
this, case closed

>>102956549
>You people were happy to slurp up shitty SD 1.5
are you a retard? we had no other choice back then, now we have
>>
>>102956497
It's probably this, I'm overthinking, searching for answers from the perspective of "I don't know enough about all the parameters in front of me".
>>102956516
Seems so, but I will polish the turd for a bit and see.
>>
>>102956565
It's okay, you can use Flux. I'll use Sana. We'll see who gets bored first, I'd imagine Flux will get boring when you realize Loras are extremely limited. Also as always, I never see the Flux apologists ever posting gens.
>>
File: ComfyUI_temp_xhnhk_00029_.png (2.1 MB, 1088x1360)
>>
File: ComfyUI_02466_.png (1.65 MB, 1024x1024)
>>102956079
can you pls post a workflow
>>
https://www.reddit.com/r/StableDiffusion/comments/1gb07vj/comment/ltjdvlm/?utm_source=share&utm_medium=web2x&context=3
>Apparently 200 steps is the official number too, I haven't dared to go that high yet.
HOLY SHIT ARE THEY SERIOUS?? THATS ALREADY TOO SLOW WITH 50 STEPS AAAAAAA
>>
Yeah I'm starting to get really disappointed in genmo, prompt adherence and world knowledge is really shit compared to minimax or kling
And I've only been using the website
Maybe I should be prompting using tags and commas instead of natural language?
>>
File: file.png (597 KB, 1963x612)
>>102956614
what workflow do you need? it's just one new node to add in between, it's not rocket science lol
>>
File: ComfyUI_temp_xhnhk_00033_.png (1.86 MB, 1088x1360)
>>
>>102956371
Sana is a company?
>>
File: ComfyUI_temp_xhnhk_00034_.png (2.31 MB, 1240x1424)
>>102956637
have you tried prompting in chinese?
>>
>>102956637
yeah their HD version isn't that good, but that's probably why they're not releasing it now but at the end of the year? They're probably trying to improve it I guess
>>
So am I getting this right that neither SDXL nor Flux offer any benefit to anime generations, which have already been perfected with SD1.5? I mean if you want to use a Pony you need to use SDXL, sure, but inherently it doesn't seem to add anything.
>>
>>102956685
Pony does actual porn with accurate genitals. SD 1.5 cannot do that.
>>
File: ComfyUI_temp_xhnhk_00036_.png (2.7 MB, 1240x1424)
>>
>>102956685
ive been liking some pony models for anime quite a bit compared to other ones i've tried
is there a better low res model to look at? kinda stopped doing ai gens for a while when I swapped from my 3090 and rocm didn't yet support rdna3 for a while so I missed out on a lot between sd1 and now
>>
File: but where.jpg (298 KB, 3088x1636)
>>102956645
but where exactly?
my shit looks like picrel
>>
File: file.png (21 KB, 2785x77)
>>102956618
https://youtu.be/4lVUuuuJU7c?t=5
>>
Cutest 1girl so far but unfortunately she's 2young

>>102956670
kek I haven't but I've seen examples on the website using Arabic, Cyrillic etc. and it seems to understand those languages. Genmo is a US company so English should be best

>>102956679
>They're probably trying to improve it I guess
Yeah their pricing page implies you get early access to new models so hopefully the new model is better AND it makes BFL release theirs in some way too
>>
File: 1715241882023243.jpg (1.14 MB, 1536x2048)
>>
File: ComfyUI_temp_xhnhk_00038_.png (2.61 MB, 1240x1424)
>>
File: ComfyUI_07742_.png (1.2 MB, 1024x1024)
>>102956718
yeah it won't work with the KSampler, take this workflow: https://files.catbox.moe/a0snom.png
and also I modified the script so that it allows for more decimals on the values: https://files.catbox.moe/4gxohm.py
>>
File: ComfyUI_temp_xhnhk_00040_.png (2.57 MB, 1240x1424)
>>
File: ComfyUI_temp_xhnhk_00041_.png (2.79 MB, 1568x1568)
>>
>>102956613
>>102956744
>>102956772
unrealistic, no girls post here nor would they wear merch
>>
>>102956714
check out noobxl
>>
File: wut are theset.jpg (108 KB, 1676x972)
>>102956759
thanks bro, wut are these red ones?
>>
>>102956826
It's too bad illustrious has those issues. I choose models based on their names. This is why I will be using Sana over Flux.
>>
>>102956854
some nodes to put the text encoder on my second gpu, you don't need them I guess, just go with the regular loaders you're currently using
>>
>>102956826
>noobxl
when proxl?
>>
File: ComfyUI_temp_xhnhk_00046_.png (1.71 MB, 1080x1344)
>>102956815
>he doesnt know about LDG brand clothing
>>
how much vram do you recommend for a beginner UI?
>>
Keep it rolling

>>102956911
>>102956911
>>102956911
>>
>>102956826
i think i did try their prerelease stuff but would have to totally rework my prompts that im using on pony atm
the early access one did generate an image but it does seem to be missing data that models like pony have for specific niches
>>
File: ComfyUI_temp_xhnhk_00048_.png (1.5 MB, 1080x1344)
>>
File: watdo.jpg (33 KB, 1085x615)
>>102956879
ok makes sense, but now I get the error in pic related.
do I have to change any settings?
>>
>>102956958
did you update comfyui?
>>
>>102956936
noobxl v5 pred has e621 images in its dataset so that should help fill in some of the more niche tags.

>totally rework my prompts that im using on pony atm
yes, if you use pony score tags on an illustrious model you will be labeled a jeet
>>
>>102955958
There is another option: they're just using upscalers, I mean 1696x960 is literally double 480p. The resolution looks like 480p but upscaled.
>>
>>102956977
I did
>>
>>102957013
Idk man, their API demo output look way better than what we have locally, something else is going on
>>
>>102957006
i swapped all of that out for the typical danbooru quality tags they suggest (and I already use typically) but still wasn't really getting what I was asking for from it
its in progress and still training or whatnot so will probably wait to fully judge it once the full model releases
ive had issues trying to get certain tags to show up in anime-only/anime-primary trained models so it might just be because of that in general when that's their focus
those models also seem to overcook quite a bit earlier at higher cfg, at least that's how illustrious seems to be
>>
>>102957016
oh what node did you get that error? ComfyUi colors the culprit node
>>
File: file.png (1.93 MB, 1024x1024)
>>
>>102957063
"SamplerCustomAdvanced"
>>
>>102955508
That explains a lot: even this guy doesn't have the hardware to train a full Flux model, and you think SD3.5 Large trains easily when it's just 4B smaller? I doubt it. The reality is his frankenmerge is better than the finetunes we've had till now, which shows that de-distilled is the path. Sadly, not many people have the hardware to do it, and the pony creator won't train a real model. Can't we do a fucking crowdfunding to rent cloud compute for a month and train our own model?
>>
any1 know some good chinese artists? maybe sana has lots of that in the dataset and would look really cool
>>
>>102957100
can you show a screen of your workflow? something's weird here
>>
File: watdooot.jpg (476 KB, 3592x1980)
>>102957124
>>
>>102957158
you haven't used my custom script, your LyingSigmaSampler doesn't have all the decimals, and my workflow was going for values that your current node cannot reach, that's probably why you got an error
>>
>>102957198
>you haven't used my custom script
but I did.
I opened it up with the editor and pasted it in and saved it.
>>
>>102957255
then why do you have 0.1 and not 0.01? you should restart ComfyUI to get it working
>>
File: error2t.jpg (345 KB, 3363x1982)
>>102957280
I did and I still get the same error
>>
>>102957301
can't help you further, I suggest you use a workflow that works for you and then reconstruct everything from it, instead of using my workflow which is somehow incompatible with yours
>>
>>102957048
Is call prompt enhacer, the same with replika and flux.dev and some faggot that have skill issue saying the same moths ago.
>>
>>102949176
Could you give me a picture of the blonde?
>>
>>102956067
I mean it doesn't natively support Kolors either TBF. Only Hunyuan for some reason I think.
>>
>>102954448
I think it'll see interest if it has strong baseline resolution support / image quality, even with the almost certainly worse prompt adherence vs 3.5 Large
>>
File: file.png (2.22 MB, 2590x1227)
https://xcancel.com/OpenAI/status/1849139783362347293
>We are sharing a new approach, called sCM, which simplifies the theoretical formulation of continuous-time consistency models, allowing us to stabilize and scale their training for large scale datasets. This approach achieves comparable sample quality to leading diffusion models, while using only two sampling steps.
Really interesting
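for context, the two-step recipe from the original (discrete-time) consistency models paper, which sCM's continuous-time formulation builds on: the network $f_\theta(x, \sigma) \approx x_0$ jumps straight from any noise level to a clean estimate, so a second step is just a partial re-noising plus one more jump:

$$x_1 = f_\theta(x_T, \sigma_{\max}), \qquad \tilde{x} = x_1 + \sqrt{\sigma_s^2 - \sigma_{\min}^2}\, z, \quad z \sim \mathcal{N}(0, I), \qquad x_0 = f_\theta(\tilde{x}, \sigma_s)$$

each step is a single forward pass, hence "two sampling steps" total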



All trademarks and copyrights on this page are owned by their respective parties. Images uploaded are the responsibility of the Poster. Comments are owned by the Poster.