/g/ - Technology






File: the longest dick general.jpg (2.55 MB, 3264x1545)
Discussion of free and open source text-to-image models

Previous /ldg/ bred : >>102930087

Can't Run It at Home Edition

>Beginner UI
Fooocus: https://github.com/lllyasviel/fooocus
EasyDiffusion: https://easydiffusion.github.io
Metastable: https://metastable.studio

>Advanced UI
Forge: https://github.com/lllyasviel/stable-diffusion-webui-forge
reForge: https://github.com/Panchovix/stable-diffusion-webui-reForge
Automatic1111: https://github.com/automatic1111/stable-diffusion-webui
ComfyUI: https://github.com/comfyanonymous/ComfyUI
InvokeAI: https://github.com/invoke-ai/InvokeAI
SD.Next: https://github.com/vladmandic/automatic
SwarmUI: https://github.com/mcmonkeyprojects/SwarmUI

>Use a VAE if your images look washed out
https://rentry.org/sdvae

>Model Ranking
https://imgsys.org/rankings

>Models, LoRAs & training
https://aitracker.art
https://huggingface.co
https://civitai.com
https://tensor.art/models
https://liblib.art
https://github.com/Nerogar/OneTrainer
https://github.com/kohya-ss/sd-scripts/tree/sd3

>SD3 Large
https://huggingface.co/stabilityai/stable-diffusion-3.5-large
https://replicate.com/stability-ai/stable-diffusion-3.5-large

>Sana
https://github.com/NVlabs/Sana
https://8876bd28ee2da4b909.gradio.live

>Flux
https://huggingface.co/spaces/black-forest-labs/FLUX.1-schnell
https://comfyanonymous.github.io/ComfyUI_examples/flux
DeDistilled Quants: https://huggingface.co/TheYuriLover/flux-dev-de-distill-GGUF/tree/main

>Index of guides and other tools
https://rentry.org/sdg-link
https://rentry.org/rentrysd

>Try online without registration
txt2img: https://www.mage.space
img2img: https://huggingface.co/spaces/huggingface/diffuse-the-rest

>Maintain thread quality
https://rentry.org/debo

>Related boards
>>>/aco/sdg
>>>/aco/aivg
>>>/b/degen
>>>/c/kdg
>>>/d/ddg
>>>/e/edg
>>>/h/hdg
>>>/tg/slop
>>>/trash/sdg
>>>/u/udg
>>>/vt/vtai
>>
File: tmpu9l28owi.png (891 KB, 768x960)
>>
File: tmp3mdrnegg.png (1.1 MB, 1024x1024)
>>
biggerma waiting room
>>
File: 2024-10-22_00012_.png (1.11 MB, 720x1280)
>>
>>102934111
>>102934133
combine prompts
>>
>>102934135
whad you call me?
>>
>>102934163
based
>>
File: 1723950135982094.webm (830 KB, 720x720)
>>102934088
chekt

HANDS FREE SLOPPING:
> Text -> Word salad
> Word salad -> Image
> Image -> Video
> Edit/Subtitle
> Convert to WebM
> ?????

OLD MEME GUIDES:
https://files.catbox.moe/3az283.jpg
https://files.catbox.moe/e5mzsc.png
https://files.catbox.moe/5ix69v.png
https://dallery.gallery/the-dalle-2-prompt-book
>>
File: FUCK_STABILITY_2.png (3.56 MB, 2047x1024)
I’m strongly stirring shit: I messaged Lykon, I posted a comparison on Reddit, and I posted a side-by-side on my X account.
I’m prepared to be, and would love nothing more than to be, totally wrong about the accusation that there are different versions of 3.5 Large floating around.
Being wrong would certainly be better than the alternative: that SAI has separate weights and is intentionally obscuring that fact.
>>
>>102934226
am I supposed to know which is which?
>>
>>102934226
what if no one responds to you
>different versions floating around
that's exactly what happened with the last version, right?
>>
>>102934256
Yup.
>>102934240
Left is replicate right is local
>>
What's the difference between libre flux and dedistilled? Practically speaking of course.
>>
>>102934226
why are the eyes always retarded
>>
Anyone try SD 3.5? How does it compare to Flux Dev?
>>
File: fertility.jpg (781 KB, 1776x1776)
>>
>>102934300
it’s different. It’s good and bad. Less waxy
>>
>>102934300
Flux is better but SD 3.5 is trainable.
>>
>>102934303
Turn your guidance down a little bit
>>
>>102934300
SD3.5 doesn't feel like much of an improvement over SD3. Flux Dev still wins in terms of human anatomy by a landslide.
>>
>>102934339
SD3.5 has better nipples but Flux and SD3 both have AI nipples. You can tell they don't train on the good stuff.
>>
>>102934226
Look at this first:
https://old.reddit.com/r/StableDiffusion/comments/1g9vm10/comparison_of_all_samplers_schedulers_for_sd_35/
>>
https://github.com/genmoai/models/issues/3#issuecomment-2430501267

Well, that was fast. Quantization to 8-bit would save another 5GB of VRAM. Mochi is basically confirmed working on a single 4090; we just need to keep optimizing. A quanted and optimized context-parallel implementation would probably run at the full video length on just 2x4090.
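The VRAM saving from weight quantization is just arithmetic on the parameter count. A rough sketch of that math, assuming a hypothetical 10B-parameter model and counting only the weights (activations and attention buffers, which the issue's ~5GB figure also reflects, are ignored):

```python
# Back-of-the-envelope VRAM cost of model weights at a given precision.
# The 10B parameter count below is a placeholder, not Mochi's real size;
# activations and attention buffers are ignored, which is part of why the
# issue quotes ~5GB saved rather than the full weight delta.

def weight_vram_gb(params_billion: float, bits_per_weight: float) -> float:
    """GiB occupied by the weights alone."""
    return params_billion * 1e9 * bits_per_weight / 8 / 1024**3

for bits in (16, 8):
    print(f"{bits}-bit weights for a 10B model: {weight_vram_gb(10, bits):.1f} GiB")
```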
>>
>>102934275
"Local" looks better here. Regardless I don't see how this is different from models on CivitAI genning different results than local on the same seed, it comes down to the workflow and how it impacts the noise
>>
>>102934339
I disagree a lot, as someone who has actually trained hardcore multi-person NSFW concept loras for Flux
>>
>>102934404
I kneel
>>
all I know for certain is that once you get used to how people and bodies look in Flux it's very hard to go back to an SD model. Flux bodies feel more whole, like every part of them was drawn with the other parts in mind. Not perfectly, mind you, but with SD it feels like when the body parts match up you just got lucky, and they're still a little... off
>>
>>102934493
Flux bodies look like AI slop, it's got that Dreamshaper stamp of approval.
>>
>>102934493
>drawn
Please stop doing that
>>
>>102934501
What Flux makes depends on how you ask it.
>>
>>102934524
>just add a blur filter
Okay, but that doesn't really solve the problem.
>>
>>102934542
you don't know what you're talking about and you're not a seasoned Flux prompter if you think I'm "adding a blur filter"
>>
>>102934551
Realism isn't just doing "low quality digital picture from 1998 taken by someone with Parkinson's". Your protests only prove my point. Now do a 4K image.
>>
>>102934308
>Flux is better but SD 3.5 is trainable.
Flux is also trainable
https://huggingface.co/nyanko7/flux-dev-de-distill
>>
>>102934632
>the model that everyone says is worse
>that was last updated a month ago
>is an example of how well Flux training is going
Okay buddy. I'll believe you when I see your finetune using this model.
>>
File: sd35.png (1.12 MB, 1024x1024)
sd3.5 is not bad. should i try it with XL loras?
>>
>>102934404
>A quanted and optimized context-parallel implementation probably would run at the full video length on just 2x4090.
that's even better, take a look at this
>73 frames and otherwise default settings takes 26GB VRAM (848x480), so 67 or 61 frames might even fit into 24GB.
It means a 4 sec video would work on a single 3090, let's fucking go duuude
>>
>>102934647
>the model that everyone says is worse
you're the only one who said something like that; the quality is on par with vanilla flux
>I'll believe you when I see a finetune using this model.
https://huggingface.co/SG161222/Verus_Vision_1.0b
>>
>>102934561
if you choose to prompt 4kslop you're gonna get the same shit images you keep asking for. Maybe next model will fix your skill issues
>>
>>102934654
>posts a model that looks worse than a Lora trained on Dev
>>
File: file.png (529 KB, 638x747)
>>102934657
>>
>>102934671
Post proof. Same prompt using dev and your model. Make sure there's some text too so we can make sure you didn't lobotomize the model's capabilities.
>>
>>102934663
damn anon you're some special kind of retard, at least you realized your mistake and deleted your post
>>
>>102934551
Why don't women like this exist anymore
>>
>>102934677
you're the one who made the claim it's worse, therefore you're the one who has the burden of proof, hope that helps.
>>
>>102934688
no proof
probably because you're a poorfag that can't even run it
you always post about how great it is, but never any images with your posts
it's really strange
>>
>>102934693
damn that's a lot of words, won't remove your burden of proof though
>>
I'll save both of us the trouble
/Verus_?Vision/is
>>
>>102934682
imagine her 30 lbs heavier in cheaper clothes with her hair unkempt
>>
>>102934654
>This version of the model does not have a variety of styles and is intended for realism and photorealism.
>There may be issues with anatomy and text.

Like always, de-distillation degrades the model and training degrades the model.
I'm guessing this same guy will shift to SD3.5 in less than a week.
>>
>>102934719
>Like always, de-distillation degrades the model and training degrades the model.
Talking out your ass again. Unless you have evidence that's the case, you're just full of shit
>>
>>102934649
>848x480
that's the SD version; the HD version they'll release will probably be unrunnable lol
>>
>>102934099
From /lmg/, someone got mochi-1 inference working on a single GPU with 24GB of VRAM. 60 frames (~2.4 seconds at 25fps) max for 24GB.

>Two beautiful young teen Russian girls, dressed in sleek white latex outfits, ride in the back of a modern car. The scene is bathed in vibrant neon lights as they look at each other and then stare at the camera
Mochi apparently doesn't know what Russian teen girls look like, and there's no image2video, so the base model is completely worthless for my use case unless I got unlucky in my 8 or so attempts or this is just a prompt/skill issue. I probably would've subbed to the 8 dollar tier on 4 different accounts if this weren't the case.
(I might still do this because, like that other anon said the other day, generative AI is my only source of joy and novelty in life.)
>I'm assuming this is a result of the NSFW filtering they did.
>>
>>102934780
you made that video locally or you used the demo?
>>
File: file.png (1.87 MB, 928x1120)
>>
>>102934780
Most video AI look like they're exclusively trained on stock videos. We're not in the wild west of Youtube video scraping yet.
>>
>>102934780
>young teen
>they look mid 20s
why admit to this prompt if this is all you have to show for it?
>>
File: file.png (1.4 MB, 928x1120)
>>
>>102934802
>We're not in the wild west of Youtube video scraping yet.
the Chinese don't give a fuck about that; MiniMax has probably been trained on a shit ton of movies, and it shows
>>
>>102934813
That example is absolutely a stock video.
>>
>>102934818
I never said it wasn't, but pretending that no one has already scraped copyrighted videos is objectively wrong; MiniMax is the evidence of that
>>
>>102934793
demo, it's 5 seconds long
There's also an NSFW endpoint, and my other gen with the same prompt got hit by it, so I didn't receive it. Even though the site said I had 1 gen remaining, I couldn't use it. So moderated gens use your quota too right now, which is another reason I can't justify paying

>>102934802
>We're not in the wild west of Youtube video scraping yet.
China is, unfortunately. Even Kling which is months old tech at this point had better knowledge of ethnicities and better prompt adherence to ages

>>102934803
>young teen
>they look mid 20s
modern white women amirite
>why admit to this prompt if this is all you have to show for it?
I'm literally complaining that this is all I have to show for it anon
>>
>>102934802
In the early days of stable diffusion people were frustrated that they kept getting stock photo styles, but it's because of how they were prompting. Stock photos and stock videos are tagged comprehensively and descriptively, and naïve prompting will reproduce this style of captioning.

I don't know anything for certain because I don't play around with the video AI stuff as much. Is the new one better than Hailuo? I had fun with that one
>>
Reminder that based Sana is the reason models released sloppily in one day.
>>
>>102934861
Why do all the videos look like spastic retards?
>>
sana-samas stay winning
>>
File: 2024-10-22_00013_.png (1.25 MB, 720x1280)
>>102934149
>>
GGUF model lets fucking gooo
https://huggingface.co/city96/stable-diffusion-3.5-large-gguf
https://huggingface.co/city96/stable-diffusion-3.5-large-turbo-gguf
>>
>>102934845
>I'm literally complaining that this is all I have to show for it anon
my bad I didn't read the post as it turns out
>>
File: file.png (1.2 MB, 928x1120)
>>102934868
It's impossible for chang to lose.
>>
>>102934892
I have Flux, why would I use that?
>>
>>102934905
>I have Flux, why would I use that?
this, if a model can't surpass Flux in image quality it's fucking useless, that's how our standards are today
>>
>our
>>
can flux do inpainting?
>>
>>102934648
I wish I was this naive
>>
>>102934861
Mochi is the best local video AI and, in most people's opinion, appears to be the first usable, (in theory) locally-runnable text-to-video AI.
Minimax is probably better since, like that anon said, the Chinese don't care about copyright or ethics and shove all the movies and thirst traps into their datasets.
>>
>>102934861
This is promising, you can tell it would be able to do good handjobs
>>
What's the best age slider for pony?
>>
>>102935054
that video is from MiniMax, not from the local model
>>
>>102934905
Not now, but in a few weeks we will probably all switch to SD 3.5 or Sana or something else, because Flux is simply too heavy and does not finetune well (way too much bleeding etc).

So if 3.5 finetunes well we will see some cool models.
>>
>>102935067
yeah I know, I just mean we will get there eventually. Nice to see a close example; the way she's talking and moving, I can imagine her talking dirty while giving a HJ.

Sorry about being perverted. I'll move on
>>
>>102935079
thank you for being perverted about my gen, I appreciate it, that's what 1girl is for
>>
>>102935092
No thank you.
>>
>>102935069
>16gb download for each conceivable prompt category
no thanks
>>
adamw8bit is based
>>
>>102935063
The ones not on civitai but instead on the website for the Bureau of Alcohol, Tobacco, and Firearms
>>
>>102935130
The download page is the report a crime interface on homeland security's website.
>>
>>102935130
I'm just trying to age my dog, dog.
>>
>>102935144
I support the CIA auditing the NSA.
>>
>>102935136
Apparently the FBI had to ask people to stop reporting AI children to them because it was clogging up their tip line for actual crimes
source: I heard it on the internet somewhere, maybe here

>>102935144
I wonder why doggirls are so much less popular than catgirls especially when most anime nerds would love a doggirl's personality
>>
>>102935193
Dogs are generally stinkier and not as cute, although there are some cute dogs.

Better question is why do people want to fuck animals in the first place, NO THANKS
>>
>>102935248
Maybe because I'm a dog.
>>
>>102934470
Can you show some examples?
>>
File: file.webm (356 KB, 600x340)
https://www.genmo.ai/blog
for a local model that's fucking impressive, and it's only the 480p version
>Today, we are releasing our 480p base model, with Mochi 1 HD coming later this year.
>>
File: 1709407070499708.jpg (78 KB, 960x753)
>>102935264
>>
File: file.png (244 KB, 1920x1145)
>>102935292
So that's the 480p model that has those insane benchmarks; imagine when we get the HD version, damn they're good
>>
>>102935292
The fuck did we do to deserve such a powerful model? It's really close to MiniMax and it's a fucking Apache2.0 licence, I thought we would get a local video model of this level in 5 years, not now
>>
>>102935292
>>102935864
Who can even run this????
you need two 3090s at least
>>
>>102935885
>you need two 3090s at least
not anymore
https://github.com/genmoai/models/issues/3#issuecomment-2430501267
>67 or 61 frames might even fit into 24GB.
>>
https://civitai.com/models/652699?modelVersionId=828456
Is the v4 still the best lora for realistic pictures?
>>
File: 489654165465465.gif (1.47 MB, 422x422)
>>102935901
close fit...
>>
>>102935985
if you go for 15fps, you can get 4 seconds of vid (60 frames), that's not bad at all
>>
>>102935989
12 fps anime bros...
>>
>>102936011
if you go for 12fps anime you can get 5 seconds kek
>>
>>102936019
the average film cut is 3 seconds, the future is now slop-bros....
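The frame-rate trade-off in the posts above is simple division. A sketch using the 61-frame budget floated in the linked GitHub issue (the budget itself is their estimate, not a measured number):

```python
# Seconds of video a fixed frame budget buys at different frame rates.
# 61 frames is the 24GB-card budget estimated in the linked GitHub issue.

def clip_seconds(frame_budget: int, fps: int) -> float:
    return frame_budget / fps

for fps in (25, 15, 12):
    print(f"{fps:>2} fps -> {clip_seconds(61, fps):.2f} s")
```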
>>
File: file.png (152 KB, 500x300)
>>102936034
>the future is now slop-bros....
NOT SO FAST WEEB
https://www.genmo.ai/blog
>Mochi 1 is also optimized for photorealistic styles so does not perform well with animated content.
>>
>>102936072
>>
>>
I just tested SD3.5. Let's go over the positives first:
- It's way better than SD3M on anatomy, which is expected because we're comparing an 8B against a 2B model lol
- I like the variety of outputs I got; it feels less rigid than Flux and isn't afraid to add relevant fluff to the picture
- Nudity is good, we finally have nipples yayy, for once SAI decided to stop acting like an old nun (I have nothing against nuns, but they shouldn't be working on image models kek)
- The license is fine, you can make your own money as long as you don't surpass 1 million dollars, and that's not gonna happen for a lot of people
- It's not distilled and is smaller than Flux dev, so it'll be easier to train
- It's faster to run, which is refreshing, especially after I spent 2 months working with Flux, waiting minutes to get a single picture!

Now that that's out of the way, let's get to the issues... And of course I'll have to compare it to the actual local sota model, Flux dev:
- I said the anatomy is better, but it's still not good enough; I often got monstrosities where Flux works fine 80% of the time
- There are moments where the image gets oversaturated as fuck; I didn't touch the cfg at all, first time I'm seeing this
- The prompt adherence is worse than Flux, it just is; they both use T5, but Flux seems to get the nuances more
- The quality is inconsistent; Flux is consistent and rigid, whereas SD3.5 is diverse but inconsistent, and I feel consistency is more important
- The NSFW censorship is still there; yeah it can do booba, but it can't do vaginas and penises, that's a shame really
- Lykon said they used a VLM to caption all the pictures, so of course it doesn't know styles and celebrities; that's a tie with Flux on that one

I feel they stepped in the right direction. Imo they should go for an SD4; if that one is close enough to Flux while having a better license + undistilled + a smaller model, it could definitely replace it, but for the moment I'm sticking with Flux dev
>>
File: 1709308391287219.jpg (243 KB, 1024x1024)
>>102936190
Just remember that if it ever gets overly frustrating you can go outside and touch grass, or moss even!
>>
File: file.png (2.19 MB, 1232x1304)
>>102936286
>go outside and touch grass
If you say so Lykon... there you go!
>>
>>102936307
OH NO NO NO NOT LIKE THAT
>>
>>102936307
>upside down
can flux do it?
>>
File: 1729668047.png (5 KB, 753x47)
I guess I won't get my waifu model :/
>>
File: file.png (1.77 MB, 1024x1024)
>>102936509
It can do it better yeah
>>
>>102936521
what are you training anon?
>>
>>102936542
flux lora of a girl i'm in love with
>>
https://xcancel.com/genmoai/status/1848820965813162306#m
>Our new model, Mochi, is entirely text to video.
DOA
>>
>>102936784
That's it? After 16 hours they only got 2 replies? From the same guy?
I'm glad he was around, imagine just getting crickets and tumbleweeds.
>>
>>102936966
>After 16 hours they only got 2 replies?
he responded to that question 4 hours after the release, and there's a lot of questions answered on that twitter post, mostly about technical issues on the demo though
>>
>>102936981
Oh, I'm not familiar with twitter; they should show the entire thread like here, or like old reddit does.
>>
File: ComfyUI_04236_.png (1.6 MB, 1024x1024)
Oh no.
>>
File: ComfyUI_04254_.png (1.45 MB, 1024x1024)
>>
File: file.png (28 KB, 688x368)
wait what? I didn't know you could download the training data of someone's Lora on Civitai, kek
>>
File: COMPARISON.png (2.07 MB, 1536x1024)
Um... the fuck?
Top left: AliveAndKicking.safetensors
Bottom left: Pineapple Anime Mix
(to show what the prompt should look like)
Right: Stable Diffusion 3.5
Is this what people meant by "inconsistent"? Just putting a cat head on the girl?
>A realistic painting of a CUTE CHIBI playing videogames, ponytail, potted plants, and classical antiquities by raphael, by ROSSDRAWS and alberto vargas, highly detailed and intricate, cinematic lighting 4k
>>
>>102937118
Everyone has the option to share their training data, most choose to keep it private.
>>
>>102934133
I can see a pretty clear checkerboard pattern on this, what controlnet did you use?
>>
File: ComfyUI_04257_.png (1.87 MB, 1024x1024)
hehehe
>>
>>102937135
>Is this what people meant with "inconsistent"?
no, this is what it means >>102930706
>>
File: ComfyUI_04262_.png (1.88 MB, 1024x1024)
Sora has trouble genning full bodies
>>
>>102937195
human anatomy is really hard to master, that's why Flux is so special, it just gets it
>>
>>102937166
Okay, I like the diversity of outputs, but putting a cat head on people is going too far.
Well, I remember a big branch of SD1.5 models created just to put animal heads on human bodies; I guess they'll be happy to know SD3.5 does that automatically.
>>
>>102937195
What the fuck is Sora?
>>
File: ComfyUI_04269_.png (1.69 MB, 1024x1024)
>>102937216

lol i meant sd 3.5l

Yesterday they announced too many new models
>>
File: COMPARISON2.png (2.05 MB, 1536x1024)
Left: Stable Diffusion 3.5
Top right: 365.safetensors
Bottom right: AliveAndKicking.safetensors
Okay, it's outdoing SD1.5 models. I shared this one because she clearly has the exact same face as top right, but I still like bottom right better; too bad it got bad hands.
>Pretty detailed CUTE LITTLE Girl, Cartoon, sitting on a computer monitor, holding antique TV, DETAILED CHIBI EYES, gorgeous detailed hair, Magazine ad, iconic, 1940, sharp focus. Illustration By KlaysMoji and artgerm and Clay Mann and and kyoani and Dave Rapoza
>>
>>102937243
Well, looks like it can make full bodies with CHIBI in the prompt; if it's like old image models you can get somewhere in between by moving it to the right of the prompt.
>>
do any of the gui backends support omnigen yet? i wanna try it
>>
>>102937292
Did you try it here?
https://huggingface.co/spaces/Shitao/OmniGen
The idea is, if it's not worth the download and setup time, this should let you know.
>>
>>102937118
only if they upload it
>>
File: file.jpg (1.59 MB, 3264x1456)
https://github.com/Mann-E/Mann-E-Flux
>Mann-E Flux : The last AI image model you'll test
Lawl
>In the past few months, FLUX models became a trend on the internet and specially the AI community. We, at Mann-E, tried to develop FLUX-inspired models (with similar architecture) and this repository is for tools and codes related to the Mann-E Flux model.
>This model, follows the same architecture as flux and can be used with diffusers and we currently use it in our private systems. Thanks to RunPod we managed to make this model in less than 480 hours on A100 and H100 GPU's!
Ok you got my attention
>>
File: file.png (1.32 MB, 889x876)
>>102937492
>The images used in training, like our Mann-E Dreams model, are from midjourney.
DOA
>>
>>102937492
so it's a new model they trained from scratch using a flux-like architecture?
>>
>>102937507
https://www.reddit.com/r/StableDiffusion/comments/1ga5o41/comment/ltbd3b2/?utm_source=share&utm_medium=web2x&context=3
>It's mixture of fine-tune, LoRA merge and knowledge distillation on Dev model. I may release a blog post soon explaining what we've done.
I have no idea what he's talking about
>>
>>102937518
the whole thing gives off big grifter vibes, barely any information about the models themselves too. i'm going to remain skeptical until they release it
>>
File: sigmas rats.jpg (3.4 MB, 4506x2048)
Here's 50 winged rats.
Flux sigmas are interesting. I think it might be following the art style of the prompt more when you lower the sigma. For example, "whimsical and surreal atmosphere" is in this prompt which could explain the strange colours.
But I am using an art style lora here so that might be interfering.
>>
>>102936190
>Imo they should go for a SD4
I'm not sure I see the point if they use the same kind of limited VLM in the dataset, and the same censorship.
>>
>>102937492
I want prompt for 1st image
>>
File: sigmas knight.jpg (3.62 MB, 9012x912)
>>102937592
This prompt includes "The setting is a grassy field filled with exotic flora, unusual botany, and strange vegetation." but I'm surprised monsters start appearing.
>>
>>102937592
>Flux sigmas are interesting.
what's a flux sigma? and how you can change that value on ComfyUi?
>>
File: COMPARISON3.png (1.88 MB, 1536x1024)
>>102937274
And this one.
SD3.5 has this problem where characters stare at you in the most boring possible pose imaginable.
I guess this is the point I throw it into the trash, but I already had Flux in the trash, so I guess I'll compare them at some point.
>little girl in red shorts, yellow shirt, fashion shoes, DETAILED CHIBI EYES, 1940, magazine ad, iconic, Soup shop in a busy market, cars on the street, james gilleard, print, game art
>>
>>102937610
https://www.reddit.com/r/comfyui/comments/1g9wfbq/simple_way_to_increase_detail_in_flux_and_remove/
>>
File: file.png (91 KB, 784x217)
>>102937620
Oh ok so you use this, and you reduce the sigma by 5% for each step, interesting, I'm gonna tinker with it later, thanks anon
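If the node really is just a per-step multiply, as the Reddit post suggests, the trick reduces to scaling every value of the sampler's sigma schedule by a constant. A minimal sketch of that assumption (the toy schedule below is illustrative, not what any ComfyUI scheduler actually emits, and the real custom node may expose more options):

```python
# Minimal sketch of the "multiply sigmas" trick: scale every value of the
# sampler's noise schedule by a constant factor such as 0.95. This assumes
# the custom node is a plain element-wise multiply.

def multiply_sigmas(sigmas: list[float], factor: float = 0.95) -> list[float]:
    return [s * factor for s in sigmas]

# Toy descending schedule standing in for a real scheduler's output.
sigmas = [14.6, 7.2, 3.5, 1.6, 0.7, 0.0]
print(multiply_sigmas(sigmas))
```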
>>
Why huge companies that has millions of GPUs don't train image models?
>>
File: xyz_grid-0002-1131927144.jpg (496 KB, 4000x1342)
what's the best flux model today?
>>
>>102937641
because image models are one of the most dangerous things to make if you care about your company's PR; images have always been a powerful tool to create propaganda and shit, and they don't want their company to be associated with that
>>
>>102937639
yeah but it's looking like 0.95 makes a massive difference. 0.96 or 0.97 might be better.
>>
File: 1706203941791375.png (3.01 MB, 2048x1024)
>>102937620
the difference is kind of insane
>>
>>102937672
I think beta + sigma 0.95 is a good idea because in my experiments I always found beta to give the best prompt understanding in flux, but it was burning the image a bit so I didn't like it too much; maybe that could be the solution
>>
>>102937620
>https://www.reddit.com/r/comfyui/comments/1g9wfbq/simple_way_to_increase_detail_in_flux_and_remove/
Okay, this is extraordinary; it seems to bring out the style you previously had to change CFG, or use the de-distill, to get on regular Flux Dev! Isn't this already looking better than Flux 1.1 Pro?
Now all we need is a space like this:
https://huggingface.co/spaces/FilipeR/FLUX.1-dev-UI
With sigmas set to 0.95!
After so many disappointing releases yesterday THIS could be what makes us so back!
>>
>>102937647
The one without shiny skin and buttchins, of course.
>>
>>102937708
>After so many disappointing releases yesterday THIS could be what makes us so back!
are you serious? damn, can you show some pictures to show how well it improves shit? this is the github to get that node btw >>102937639
https://github.com/Extraltodeus/sigmas_tools_and_the_golden_scheduler
>>
>>102937708
>be me SAI
>spent months repairing my PR disaster on SD3M by making SD3.5-8b
>It's still not close to flux dev, people noticed it
>One day later they improved flux dev to pro level thanks to some schizo node from a custom Github repo
>before I was on the top, then I had to catch up to flux dev, and now I have to be at pro level
>feelsbadman
kek, the jokes write themselves
>>
File: file thumbnail.png (71 KB, 284x216)
>>102937708
There is a slight downside in that it makes the image a little bit more flat. For example in this thumbnail from a distance you can see the top row looks like noise, whereas the bottom row has very strong silhouettes.

But maybe there's other ways to manipulate the noise as well to affect this.
>>
Anyone else noticed how /ldg/ is behaving like a cult when it comes to SD3.5?
>>
>>102937759
maybe the sweet spot is between 0.95 and 1
>>
>>102937667
>it's looking like 0.95 makes a massive difference
You mean "it's looking like 0.95 makes a massive improvement."
>>
>>102937680
No, it'd just be hyperdetailed burning.
I have used the word hyperdetailed in hundreds of prompts but it's my first and second times using it on a sentence.
>>
>>102937795
>No, it'd just be hyperdetailed burning.
what's that?
>>
File: ComfyUI_SD35L_0292.jpg (205 KB, 1152x896)
>>
>>102937759
I never liked the silhouettes and flat backgrounds so all the bottom ones look poor to me.
Default Flux's outputs looking poor to me now, jonesaid is the hero we'll never be able to deserve.
>>
>>102937820
>jonesaid is the hero we'll never be able to deserve
it's funny how many little tricks we discovered in just 2 months, just because we love flux so much and wanted to fix some of its shortcomings kek
>>
>>102937801
The same burning, but with lots of detail.
To see what I mean, take a burned picture and run it through image-to-image with a very detailed model.
If we could do that, people would not mind burning because they'd just fix it in a second step, and we could use Flux with higher CFG to get better styles because we would fix the burning in the aftertouch.
Unfortunately, once something is burnt, not even MacGyver can save it; that's why people have spent so much time creating things to avoid burning.
>>
>>102937708
is there any way in comfyui to see previews while it's generating when using SamplerCustomAdvanced?

The only reason why I still use KSampler is because of the mid gen previews
>>
>>102937832
I now wonder if the shortcomings were a good thing, if they had released Flux pro in the open we'd not have all this tech and discoveries.
BFL made the right decision after all.
>>
File: file.png (2.46 MB, 2901x1504)
>>102937857
>The only reason why I still use KSampler is because of the mid gen previews
you get mid-gen previews with the SamplerCustomAdvanced node
>>
>>102937492
Wake me up when November ends
>>
>>102937852
yeah, I don't think burning is a good thing at all; there's a point where you get better prompt understanding when it's a little burned, but if you burn it too much the prompt understanding starts to degrade. There's probably a perfect equilibrium somewhere, and we might have gotten closer to it thanks to that MultiplySigmas trick; that's my 2 cents
>>
>>102937870
you use such a high cfg, are you sure 3.5 doesn't look better?
>>
>>102937900
I'm using DynamicThresholding, it's cfg 6 + mimic 3
https://reddit.com/r/StableDiffusion/comments/1g2luvs/my_suggested_best_settings_for_fluxdevdedistill/
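For reference, a rough NumPy sketch of the clamp-and-rescale idea behind dynamic thresholding: sample at a high CFG, clip outliers, and rescale the result toward the dynamic range a lower "mimic" scale would roughly have produced. The function name, percentile default, and the linear-rescale assumption are all mine; the actual extension is far more configurable:

```python
import numpy as np

def mimic_cfg(denoised: np.ndarray, cfg: float, mimic: float,
              percentile: float = 99.5) -> np.ndarray:
    """Clamp outliers of a high-CFG result, then rescale its dynamic
    range toward what the lower 'mimic' scale would roughly produce."""
    s = np.percentile(np.abs(denoised), percentile)   # high-CFG range
    target = s * (mimic / cfg)                        # assumed mimic-scale range
    return np.clip(denoised, -s, s) / s * target

# Stand-in latent for a CFG-6 sample; real inputs come from the sampler.
rng = np.random.default_rng(0)
latent = rng.normal(size=(4, 64, 64)) * 6.0
out = mimic_cfg(latent, cfg=6.0, mimic=3.0)
```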
>>
>>102937881
They haven't released this one but they had released many before:
https://huggingface.co/mann-e
Were they any good? I'll give you a hint: have you heard of them before?
>>
>>102937937
>Were they any good? I'll give you a hint: have you heard of them before?
this, 100% this; if a model is unknown it means it's shit, it's not rocket science. We're starving for good models, so when a good one comes out we jump on it like enraged dogs kek
>>
>>102937708
So is detail daemon as good?
>>
File: file.jpg (2.47 MB, 8171x1795)
>>102937620
>Hatsune Miku with dreadlocks and a black skin skateboarding in New York, she has a hat that has "MIGU" written in it, anime style
Really interesting, you can see the bokeh starts to come back at 0.97
>>
>>102938022
All of these are the same seed?
>>
>>102938026
yes, the exact same settings for each picture
>>
>>102935885
Progress. Won't be able to comment for another 15 mins due to the new timers (thanks hiroshomoot :/)
>>
>>102938077
Kijai is a fucking goat, not gonna lie, but let's not get ahead of ourselves: the model they gave us is the 480p model. they'll also release an HD version, and that one probably won't fit on a single 24gb vram card
https://www.genmo.ai/blog
>Today, we are releasing our 480p base model, with Mochi 1 HD coming later this year.
Now I'm starting to wonder if they used the HD version for their trailer or not
https://xcancel.com/genmoai/status/1848762405779574990#m
>>
>>102937964
Yeah, I've been using it for ages.
>>
>>102938094
480p with one of the many upscalers made specifically for AI-genned video (the cog upscaler?) might be alright for now.
>>
File: chrome_McKirLaBvt.png (37 KB, 1190x489)
>>102938096
What's best for flux? all set to 0.95?
>>
>>102937947
It's quite unfortunate that mann-e has the resources to do all these finetunes but not the talent to do them well.
If they had the talent they could be Black Forest level; instead, you only need to use it once to go "no thanks."
>>
>>102938131
increasing the resolution isn't just about that. try making a Flux image at 512x512 and another at 1024x1024: the 512 one isn't just a downscaled version of the 1024 one, it's an objectively worse picture. so it's important that we can go for higher resolutions, because the quality will just be better
>>
>>102938133
read the github page bro
>>
>>102938144
>It's quite unfortunate that mann-e has the resources to do all these finetunes but not the talent to do them well.
how did they manage to get all those resources though? they haven't made a single good thing, right?
>>
>>102938147
I understand, I suppose I was thinking more about coherence than resolution, but your point is fair.
>>
>>102938173
Not your "bro" faggot.
>>
File: FluxVsFlux.png (278 KB, 512x256)
>>102938147
You've got it the wrong way around: the 512x512 is the superior picture (on the right), its problem is just that it's small, and comparisons have been done before.
Flux produces a certain level of detail, so the smaller the picture, the more detail per pixel area.
>>
File: file.jpg (2.28 MB, 8171x1795)
>>102938022
>>102937620
>A woman skateboarding in New York, she is Holding a Pikachu plush, subtitle text on the bottom of the image that reads: "This is a test about the sigma multiplier."
Only sigma 0.99 managed to not mess up the text lol
>>
>>102938182
With this:
https://mann-e.com/pricing
In the end, if you sell it, people will buy it.
People vote with their wallets and half of them have sub-100 IQs, so you can sell image services and make money even if your models are subpar (just look at Craiyon now: it has the quality of base SDXL with the censorship of Dalle 3, and they're doing well.)
>>
File: file.png (1.69 MB, 2006x1414)
>>102938259
if that were the case we'd all do 512x512 inference instead of waiting way longer for 1k pictures, but it's not the case: when it's too small, the anatomy gets bad and shit. those models need more pixels
>>
>>102938210
not your "faggot" nigger.
>>
Is it possible to make forge ignore resolution data when sending to img2img? I'm sick of constantly having to adjust it whenever I get a good new t2i result and want to reroll it at higher res a few times.
>>
File: ComfyUI_04325_.png (1.75 MB, 1024x1024)
As she says

I've also downloaded the multiplier and I'm using it at 0.99; I'm noticing a substantial detail difference
>>
File: 512Vs1024.png (1.55 MB, 1536x1024)
>>102938334
I was going to tell you "I'm talking about level of detail and you're using a toon link lora that kills the detail for the style", but my prompt didn't do better (other than, I think, my chessboard looking better than yours on both sides.)
A photograph captures two young women playing chess in an outdoor café. The woman on the left is Hatsune Miku, with long teal hair, wears a grey tank top and is focused on her moves. The woman on the right is Sailor Moon, with long blonde hair, wears a pink lace top and is in mid-move. The chessboard is black and white, positioned on a wooden table. In the blurred background, people walk and sit, suggesting a busy urban setting. The atmosphere is casual and vibrant.
>>
>>102938416
nvm found it
>Send size when sending prompt or image to another interface
>>
>>102938436
kek, I wish 512x512 pictures gave better quality, but it's not the case. that would make Flux way faster if it were the meta, that's for sure
>>
>https://arxiv.org/abs/2410.17146
>https://github.com/wang-kee/LiNeS
could this work with image models? supposed to prevent catastrophic forgetting
>>
>how do I do x?
>what is <some inane term from two years ago>?
Nice technical discussion.
>>
>>102938473
>could this work with image models? supposed to prevent catastrophic forgetting
it could; LLMs are based on the transformer architecture, and that's also the case for DiT models like Flux or SD3/3.5
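For what it's worth, the core trick in LiNeS is simple enough to sketch. This is my reading of it (the exact schedule is an assumption): scale the fine-tuning delta linearly with layer depth, so shallow layers stay near the pretrained weights and forget less.

```python
import numpy as np

def lines_merge(pre, ft, alpha=0.0):
    """Blend pretrained and fine-tuned weights with a linear layer-wise
    schedule: layer 0 keeps the pretrained weights, the deepest layer
    keeps the full fine-tune.

    pre, ft: lists of per-layer weight arrays, ordered shallow to deep.
    alpha: minimum fraction of the fine-tune kept even at layer 0.
    """
    depth = max(len(pre) - 1, 1)
    return [p + (alpha + (1 - alpha) * i / depth) * (f - p)
            for i, (p, f) in enumerate(zip(pre, ft))]

pre = [np.zeros(4) for _ in range(3)]
ft = [np.ones(4) for _ in range(3)]
merged = lines_merge(pre, ft)  # deltas scaled by 0, 0.5, 1 across depth
```

nothing in there cares whether the transformer is an LLM or a DiT, which is why it could plausibly port to Flux/SD3.5 finetunes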
>>
File: Boo0.png (1.34 MB, 1024x1024)
>>102938436
This is SD3.5 for laughs and giggles.
>>
>>102938501
I think that's the biggest issue with SD3.5. why is it making those oversaturated pictures? that's so weird
>>
File: ComfyUI_04340_.png (1.42 MB, 1024x1024)
This is flux, i'll try with sd3.5
>>
I want to train a LoRA on my 1080ti in Comfy. I'm just not sure about the settings (I'm not a pro).
It works, but I feel like I could reduce the numbers.
>>
File: file.jpg (2.41 MB, 8171x1795)
>>102938285
>A woman smoking a pipe outside, it's sunny and there's a blue sky, she is wearing a MAGA hat that reads: "This isn't a hat", subtitle text that reads: "This isn't a subtitle", she wears a green skirt and red pumps
This is so all over the place, hard to find the sweet spot when they all act so differently
>>
cfg 1.0
>>
File: ComfyUI_04341_.png (1.63 MB, 1024x1024)
sd 3.5 .. euler simple and 30 steps
>>
>>102938506
I don't mind the saturation; I mind that it doesn't even draw their eyes, and I've seen better chessboards on SD1.5 models.
It's so bad in general that it'd still be bad even without the oversaturation problem.
Also
>>102938638
SD3.5 would never.
>>
>>102938672
looks fine at cfg 1.0, if I do the same on dev undistilled it'll look horrible lol
>>
https://www.reddit.com/r/StableDiffusion/comments/1g9n9kf/comment/lt8spp5/?utm_source=share&utm_medium=web2x&context=3
>Yeah I don't know what that's about, already ran this under 20GB in fp8 and tiled VAE decoding, the VAE is the heaviest part, will wrap to Comfy nodes tomorrow for further testing.
let's goo, we'll truly get MiniMax at home!
https://xcancel.com/genmoai/status/1848762405779574990#m
>>
I'm already back to using SDXL, reality is bleak
>>
>https://github.com/centuryglass/IntraPaint
very nice alternative if you don't want to use gimp/krita for inpainting
>>
>>102938739
>let's goo, we'll truly get MiniMax at home!
>>102938750
>reality is bleak
The duality of an anon.
>>
File: nomiku.png (687 KB, 1280x720)
Ouch, it seems Flux fails to draw Hatsune Miku if you have too many girls in there.
This is a colorful digital drawing in an anime style, featuring four young girls playing a chess game on a pink table in a bedroom. The girls are dressed in school uniforms with white sailor collars and blue skirts. The girl on the left is Sailor Moon and has long blonde hair tied into twin ponytails, the girl in the center has pink hair styled in pigtails, the girl on the right has dark blue hair, the girl in the bottom right is Hatsune Miku, and there's a small black cat sitting on the bed on the far right. They are all sitting on the floor, focused on the game. Behind them, there is a large bed with a blue and yellow striped blanket. The room has pastel-colored walls with a window that shows a bright blue sky. The overall atmosphere is playful and cheerful, with bright colors and simple, clean lines typical of anime art.
>>
>>102938820
it's bleak for vramlets
>>
File: ComfyUI_04347_.png (1.98 MB, 1024x1024)
Trying artstyles.
>>
All I know is that for generating hotties, SD 1.5 is still king.
They don't need to have functioning hands.
>>
>>102938739
>already ran this under 20GB
And it only took 60 hours for a 5 second clip!
>>
File: ComfyUI_04185_.png (1.35 MB, 1024x1024)
>>102938855

3.5 has problems understanding the number of things or people you want in the picture when it's >=3

look at this:
Kamala, Trump, Biden and Putin should have appeared, but Biden has Putin's face (lol)
>>
>>102938892
he went for fp8, so he's saying he was able to run it without any cpu offloading
>>
File: nomiku2.png (889 KB, 1024x1024)
>>102938855
And here's SD3.5 which, besides also missing Miku, just ignores the 4th girl.
Where is my better prompt adherence?
>>
>>102938879
Some Analog Madness version had the hotties and a good rate of good hands.
Also, FennPhoto delivered good hands and general coherence, at the cost of composition, so they all have the same pose.
>>
File: file.jpg (546 KB, 3361x1582)
>>102938919
>Where is my better prompt adherence?
dalle3 still the goat
>>
>>102938963
Please, you just make me very sad as I crash back into reality.
>>
>>102938983
it is sad indeed, can't beat a year-and-a-half-old model on prompt adherence. I wouldn't say it's that surprising: GPT4V is still the best caption model and neither BFL nor SAI are using it to pretrain their models
>>
File: ComfyUI_00169_.png (1.94 MB, 1280x960)
Damn, the only time it got the size difference almost right it got every other detail wrong
>>
File: flux dev dalle3.jpg (760 KB, 2026x1024)
>>102938963
I think that Flux dev is still almost as good.
Both pictures below share the same prompt.
The one on the right is Dall-E 3. The one on the left is flux-dev1
The tech is out there, it's just that prompt adherence is very expensive.
>>
>>102939090
>The tech is out there, it's just that prompt adherence is very expensive.
imho it's easier to inpaint than chase the perfect gen
>>
>>102939090
>Dalle3
Warm, textured, cozy, inviting
>Flux
Digital, cold, bland, boring
>>
File: ComfyUI_04233_.png (1.38 MB, 1024x1024)
1.38 MB
1.38 MB PNG
I shouldn't have posted this.
>>
>>102939159
that's true too. The style is bland, but the scene is perfectly set, you can't deny that
>>
>>102939169
why?
>>
File: ComfyUI_04239_.png (1.43 MB, 1024x1024)
1.43 MB
1.43 MB PNG
>>102939194
Because.
>>
>>102939242
everybody has their own fetishes anon, no need to be ashamed.
>>
>>102939170
It's not just the style. look at the background and foreground in comparison to Dalle3: it's a completely dull image to look at
>>
File: ComfyUI_04252_.png (1.71 MB, 1024x1024)
1.71 MB
1.71 MB PNG
>>102939253
It was just a byproduct of a failed prompt, mind you

sd3.5 merged the Elf and Trump. Pretty cool tho
>>
>>102939090
Huh? Your left part looks like garbage from here.
In fact, I'd be able to pick out a dalle 3 gen from a pair every time because of the rich detail and style it has.
It's like realizing all the open and free models out there are pure cope because I can't have Dalle 3 at home.
>>
File: cs.jpg (6 KB, 241x209)
>>102937867
>BFL made the right decision after all.
>Can't do anything with the model except what the devs allow you to do.
You will own nothing and you will be happy.
>>
>>102939393
he's trolling
>>102937867
you're trolling, right?
>>
File: 202410230931-628866315.png (1.22 MB, 1152x896)
>>
>>102939442
SD3.5?
>>
>>102939422
I should have formatted it like this:
I now wonder if
<the shortcomings were a good thing, if they had released Flux pro in the open we'd not have all this tech and discoveries.
BFL made the right decision after all.>
The last part is what I'm wondering about, too.
Remember, if SD3.0 had been as good as SD3.5 we might not have gotten Flux; it was created to fill the space of a new base model that everyone would adopt.
So it was good that SD3.0 was trash, because we got Flux.
>>
File: U_SUCK.png (1.83 MB, 1024x1024)
>>102939442
You know the whole point of such pictures was to hide a message that can only be read on the thumbnail and it disappears when you open it, right?
>>
File: zunfat.png (1.05 MB, 832x1216)
Using AI for evil!
>>
File: file.png (546 KB, 960x540)
>>102939484
>>
>>102939484
Joke's on you, I can still read it if I squint
>>
>>102938739
>we'll truly get MiniMax at home!
doesn't seem to be made for image2vid, how do you make it happen? do you change the code, or train it more, or something?
>>
File: ComfyUI_01789_.png (1.72 MB, 1280x1024)
>>102939489
Hahaha oh no!
>>
I said a few hours ago that the guy who made:
Verus_Vision_1.0b Flux Dev (de-distill) (https://huggingface.co/SG161222/Verus_Vision_1.0b)

would switch to SD3.5 in less than a week...
Well, it took less than a day:
https://huggingface.co/SG161222/RealVis_Large_V1.0
kek
>>
>>102939555
Did this guy ever make a good model in the first place?
>>
>>102939555
I tested his model and I don't really understand the point. It barely improves anything at all over the dedistilled model while making it much less flexible, and it still can't do nipples
>>
File: 3x_upscale_video_00016.webm (1.95 MB, 1440x960)
>>
>>102939603
CogVideoX?
>>
>>102939622
Yes, img2video
>>
>>102939641
CogVideoX is fine but I can't wait to test out Mochi 1. seems like we'll be able to run it on a 24gb card; if we're lucky we'll get the kino tomorrow >>102938739
>>
>>102939660
Should be good, text2video only though, right?
>>
>>102939568
The Realistic Vision models were some of the best SD1.5 models of their time, and the #1 ranked SDXL finetune was RealVisXL before Pony hit.
That's what made VerusVision so disappointing.
>>
>>102939660
>>102939687
https://github.com/kijai/ComfyUI-MochiWrapper
It's already here!
>text2video only though right?
yeah sadly, but it's possible to change the code to make it work with image2vid, no?
>>
>>102939518
Should be easy: change the code so it skips generating the initial image and uses a user's image instead, as if it were one the model made.
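Something like that would presumably be an SDEdit-style hack; purely a hypothetical sketch with stand-in functions (`encode` below is a toy placeholder, not Mochi's real VAE or sampler API):

```python
import numpy as np

def encode(image):
    # placeholder for the real VAE encoder (image -> latent); the real
    # one is a network, this only stands in for the shape of the idea
    return image * 0.18215

def init_latent_from_image(image, strength, rng):
    """Blend the encoded image with noise. strength=1.0 is plain
    text2video (pure noise); lower values anchor the clip on the image."""
    latent = encode(image)
    noise = rng.standard_normal(latent.shape)
    return (1.0 - strength) * latent + strength * noise

rng = np.random.default_rng(0)
frame = np.ones((8, 8, 3))
start = init_latent_from_image(frame, strength=0.6, rng=rng)
# the sampler would then denoise from `start` instead of pure noise
```

whether that actually produces coherent motion without img2vid training is another question; models never conditioned on a clean first frame often just drift away from it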
>>
File: 202410230959-1761253436.png (1.32 MB, 1152x896)
1.32 MB
1.32 MB PNG
>>102939459
Flux dev
>>102939484
"such" pictures?
>>
File: file.png (33 KB, 2495x165)
33 KB
33 KB PNG
>>102939701
>It's already here!
I don't think it's finished, let the man cook kek
>>
>>102939701
I was told a 16GB card was suitable for amateur AI video production. We were lied to.
(but maybe in another 24hrs it'll be true!)
>>
>>102939764
it depends on the number of frames you want to go for >>102939701
>so far highest I've done is 97 with the default tile size 2x2 grid.
97 frames is almost 5 sec at 20 fps; maybe you can go for fewer seconds or a lower fps to make it work on your 16gb card
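The arithmetic, since frames and fps keep getting mixed up in this thread:

```python
def clip_duration(frames: int, fps: float) -> float:
    """Playback length in seconds for a generated clip."""
    return frames / fps

print(clip_duration(97, 20))  # 4.85 -- just under 5 seconds
print(clip_duration(49, 20))  # 2.45 -- roughly half the frames, half the cost
```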
>>
File: hidden.png (468 KB, 640x629)
468 KB
468 KB PNG
>>102939510
Because you saw the thumbnail; the idea is to trick people who don't know you're insulting them.
Anyway, the point is that anon used AI to generate a pic of people walking around and it didn't spell anything with their clothes, so it was disappointing.
>>
>>102939776
It'd be nice if he mentioned how long it took
>>
File: notpepe0.png (647 KB, 576x512)
647 KB
647 KB PNG
>>102939746
Yeah, ones with hidden messages, except it's just random people walking without the messages.
>>
File: 202410230908-239481783.png (1.34 MB, 1152x896)
1.34 MB
1.34 MB PNG
>>102939789
It's an added bonus I agree, but it's not the whole point. I'm just trying to see what realism I can pull off with Flux without over-describing it
>>
>>102939818
>It'd be nice if he mentioned how long it took
if I had to guess I'd say twice as slow as CogVideoX-5b, because Mochi is a 10b, but I'm probably talking out my ass kek
>>
File: 02366-3441573212-1.png (841 KB, 768x768)
841 KB
841 KB PNG
>>102939827
Can flux do it? I've done it with earlier models
>>
>>102939833
At that point you can find the image you want by google searching. I think we passed "Wow! That looks so real!" a few months ago.
>>
>>102939701
https://huggingface.co/Kijai/Mochi_preview_comfy/tree/main
can we do GGUF of video models?
>>
>>102939848
No idea, but the code exists, so it could.
>>
>>102939848
>Can flux do it?
of course, flux can do image2image like every other models
>>
>>102939854
actually mostly i get is ad, stock or ai images
>>
>>102939836
I have a 16gb card and 50 steps on CogVideoX takes around 8 minutes.
16 minutes would be okay with me (assuming it fits in my VRAM) if it's the same quality as the previews; one of the problems with Cog is it spits out garbage 80% of the time
>>
>>102939865
This is done using controlnet with a QR controlnet model
>>
File: file.png (528 KB, 3209x1258)
528 KB
528 KB PNG
>>102939701
it doesn't specify the fps, do we know how many fps this model is running at?
>>
File: file.png (227 KB, 2815x1306)
227 KB
227 KB PNG
>>102939994
kek, got an OOM. maybe I fucked something up, I have a 24gb card
>>
File: file.png (14 KB, 799x300)
14 KB
14 KB PNG
>>102940025
tf you mean OOM it hasn't even reached half of my capacity
>>
File: REALIMAGE.png (2.92 MB, 1536x1024)
2.92 MB
2.92 MB PNG
>>102939871
Learn to use negatives when searching.
I mean, would you be awed if Flux came up with this picture? But why? What's the point if I can find billions of pics like this online? Coming from an AI doesn't make them special.
Realism does nothing; the point is generating something you can't find.
>>
>>102940043
Memory isn't allocated a couple of megabytes at a time; big operations will attempt to allocate gigabytes at once.
>>
File: file.png (149 KB, 2106x808)
149 KB
149 KB PNG
>>102940124
Ok I guess? But I thought it would be possible on a 24gb card, that's what he said lol
>>
File: ComfyUI_04368_.png (1.85 MB, 1024x1024)
1.85 MB
1.85 MB PNG
Hmm, do they train all those new models on the same "database"? (forgive my terminology, I'm not an expert)
It seems that all recent models know a bit about Reimu (red-white, black hair), but are clueless about all the other Touhou characters, just to give an example
>>
>>102940146
There's a lot of settings you can tweak to reduce memory usage. all of the memory goes into the sequence length, which is impacted by:
- the dimensions of the video
- the number of frames
These models are quadratic in memory cost, which means doubling the number of frames roughly quadruples the attention memory you need.
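Back-of-the-envelope for why frame count bites so hard; `tokens_per_frame` below is a made-up illustrative number, not Mochi's actual latent patch count:

```python
def attn_bytes(frames, tokens_per_frame=1500, bytes_per_el=2):
    """Rough size of one full self-attention matrix: O(seq_len^2)."""
    seq = frames * tokens_per_frame
    return seq * seq * bytes_per_el

print(attn_bytes(26) / attn_bytes(13))  # 4.0 -- double the frames, 4x the memory
```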
>>
File: file.png (417 KB, 2550x1509)
417 KB
417 KB PNG
>>102940208
I don't get the OOM during inference, but at the very beginning when it loads the fp8 model
>>
>>102940195
>It seems that all recent models know a bit about Reimu (red white , black hair), but are clueless about all the other Touhou characters,
that's because Reimu is the main character, the most well-known touhou character, so it's not surprising that we see her even in boring datasets
>>
>>102940238
Make sure you don't have any VRAM reserved by other processes; if people say it fits in 24 GB of VRAM then it should fit whenever that much is actually free.
>>
>>102940286
maybe it has to do with flash attention, I don't have it installed yet
https://github.com/kijai/ComfyUI-MochiWrapper
>Requires flash_attn !
>>
>>102940303
https://github.com/Dao-AILab/flash-attention/releases/tag/v2.6.3
>only linux binaries
Uh oh, that looks bad...
>>
>>102940327
the windows binaries are there, it only goes up to cu123 though
https://github.com/bdashore3/flash-attention/releases/
>>
>>102940195
Most models know the "normie" type of characters, the ones where you could ask a random stranger on the internet and he'd go: "Oh, it's that sexy android lady"
>>
>>102939746
nice
>>
>>102940369
I installed one of those wheels (torch 2.4.0 + cu123) even though I'm on torch 2.5.0 + cu126, and obviously it didn't work. too fucking lazy to downgrade though... god I hate python packages so much
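That kind of mismatch is checkable before downloading; a small helper, assuming the `+cuXXXtorchX.Y.Z` naming scheme those prebuilt-wheel repos use (the pattern is an assumption, verify it against the actual filenames):

```python
import re

def wheel_matches(wheel_name: str, torch_version: str, cuda_version: str) -> bool:
    """True if a prebuilt flash-attn wheel was built against this exact
    torch + CUDA combo, e.g. 'flash_attn-2.6.3+cu123torch2.3.1...whl'."""
    m = re.search(r"\+cu(\d+)torch(\d+\.\d+\.\d+)", wheel_name)
    if not m:
        return False
    wheel_cuda, wheel_torch = m.groups()
    # torch reports CUDA as '12.3'; the wheel tag squashes it to '123'
    return wheel_torch == torch_version and wheel_cuda == cuda_version.replace(".", "")

name = "flash_attn-2.6.3+cu123torch2.3.1cxx11abiFALSE-cp311-cp311-win_amd64.whl"
print(wheel_matches(name, "2.3.1", "12.3"))  # True
print(wheel_matches(name, "2.5.0", "12.6"))  # False
```

compare against `torch.__version__` and `torch.version.cuda` locally before grabbing a wheel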
>>
>>102939746
>"such" pictures?
No matter how perfectly your gen captures exactly what you wanted, there will always be some faggot who accuses you of failing at something completely different that exists only in his mind: "this isn't even the right style at all."
>>
File: file.webm (197 KB, 848x480)
197 KB
197 KB WEBM
>>102940486
I made it work (I installed pytorch 2.3.1 + cu121) and went for this flash attention wheel (pytorch 2.3.1 + cu123), and I managed to make this (purely a 1 sec test) kek
>>
By default, the attitude of most of the "AI enthusiasts" in this thread toward AI images is total disdain. "Why are you wasting my time with this AI slop?"

Look no further for your explanation of why the thread is dying. We just can't be bothered to pretend we actually like any of this stuff.
>>
>>102940647
not him, but you're right. this is true of all conversations on the internet that involve any degree of human opinion, including "score" charts ranking models by human appreciation of the image. i hate normies so much, they should be kept away from devs to avoid infecting them with their stink, it's repulsive!
>>
File: file.png (15 KB, 2775x101)
15 KB
15 KB PNG
>>102940710
for 37 fps I have this speed, not bad
>>
>you guys don't really care about making AI pictures
huh?
>>
>>102940724
>37 fps
*37 frames
>>
>>102940724
>37fps
what?
>>
>>102940727
they care about making their own images, and regard everyone else's as a waste of their time and attention. There are exceptions, I merely say this is the default attitude of the average poster.
>>
>>102940737
>>102940733
>>
>>102940740
I mean, that's the whole appeal of genning, you can do what you want without having to rely on anybody else.
>>
>>102940751
he's just mad nobody likes his gens
>>
>>102940740
>and regard everyone else's as a waste of their time and attention
better to have anon call your image shit so you maybe try a different approach than to be constantly inundated with "wauw so cool! :P" and never grow
also what >>102940751 said
>>
File: 162946_00001.webm (196 KB, 848x480)
196 KB
196 KB WEBM
>>102940710
Only 13 frames at defaults. 16gb card, ~70% utilisation during the sampler, 90% during decode.
>>
File: file.png (152 KB, 849x1477)
152 KB
152 KB PNG
>>102940806
I tried 25 frames but it overflowed my 24gb card, did you touch anything in there? I tried to activate VAE tiling but got a python error lol
>>
>>102940806
>13 frames
>2 seconds
so that model gives 6fps videos? LOOOOOOOL
>>
>>102940822
I OOMed on decode the first run so I changed the decode frame batch size to 4 or 2, I can't recall; everything else was untouched.
>>102940836
No, I set the video combine to 6fps; you can set it to whatever you want based on the number of frames you generate.
>>
>>102940766
You don't understand me at all. Telling someone their gen needs improvement is good. It means you looked at it and formed a judgment and thought it was worth the time to write out that judgment.

The hugbox is worse. Hugboxers don't care about the gens, they're networking with forced pleasantries. They disdain criticism because they can't imagine the images themselves being more important than the feelings of the creators.

The common practice here is to ignore everything, assuming at the outset that it's worthless. Probably it is, but openness to the possibility of being wrong no matter how many times you're disappointed is what keeps the thread alive.
>>
File: 00010-378968932.png (1.75 MB, 1024x1536)
1.75 MB
1.75 MB PNG
>>
>>102940855
>No, i set the video combine to 6fps, you can set it to whatever you want based on the bumber of frames you generate.
yeah, but if you increase that framerate it'll simply speed up the video. I feel like its default speed is 6fps, could be wrong though
>>
Fresh

>>102940941
>>102940941
>>102940941
>>
>>102940765
My gens have done pretty well. I have no complaints. It just bothers me whenever I see the attitude that it's almost rude to post gens and expect people to see them. I am seeing the growth of a very anti-gen atmosphere here, a sort of "why did you post this? fuck off" vibe, in its early stages. On the rest of the internet most people would flip a switch to make all AI "slop" disappear if they could. If that feeling catches on here too then we might as well not have this thread at all.
>>
>>102940900
Oh, I get what you were aiming at. I don't know, it may say in their paper. I'd guess it's like 24 or 30 by default judging by the clips they've put out.
>>
>>102940718
>why the thread is dying
who said that?
>>102940948
>anti-gen
no, it's anti "you've seen my images posted many times before so I'll attach a recognizable image to this post so that anon thinks I'm a thread authority on this matter"
the only time this is merited is when you're showing an example of something for instance a new approach or sampler etc
>>
>>102934088
wtf man
>>
>>102939323
>because I can't have Dalle 3 at home.
More like because we can't create porn with dalle 3



All trademarks and copyrights on this page are owned by their respective parties. Images uploaded are the responsibility of the Poster. Comments are owned by the Poster.