/g/ - Technology

Discussion of Free and Open Source Text-to-Image/Video Models and UI

Prev: >>106457557

https://rentry.org/ldg-lazy-getting-started-guide

>UI
ComfyUI: https://github.com/comfyanonymous/ComfyUI
SwarmUI: https://github.com/mcmonkeyprojects/SwarmUI
re/Forge/Classic: https://rentry.org/ldg-lazy-getting-started-guide#reforgeclassic
Chromaforge: https://github.com/maybleMyers/chromaforge
SD.Next: https://github.com/vladmandic/sdnext
Wan2GP: https://github.com/deepbeepmeep/Wan2GP

>Checkpoints, LoRAs, Upscalers, & Workflows
https://civitai.com
https://civitaiarchive.com/
https://tensor.art
https://openmodeldb.info
https://openart.ai/workflows

>Tuning
https://github.com/spacepxl/demystifying-sd-finetuning
https://github.com/Nerogar/OneTrainer
https://github.com/kohya-ss/sd-scripts/tree/sd3
https://github.com/derrian-distro/LoRA_Easy_Training_Scripts
https://github.com/tdrussell/diffusion-pipe

>WanX
https://rentry.org/wan22ldgguide
https://github.com/Wan-Video
https://alidocs.dingtalk.com/i/nodes/EpGBa2Lm8aZxe5myC99MelA2WgN7R35y

>Chroma
https://huggingface.co/lodestones/Chroma1-Base
Training: https://rentry.org/mvu52t46

>Illustrious
1girl and Beyond: https://rentry.org/comfyui_guide_1girl
Tag Explorer: https://tagexplorer.github.io/

>Misc
Local Model Meta: https://rentry.org/localmodelsmeta
Share Metadata: https://catbox.moe | https://litterbox.catbox.moe/
GPU Benchmarks: https://chimolog.co/bto-gpu-stable-diffusion-specs/
Img2Prompt: https://huggingface.co/spaces/fancyfeast/joy-caption-beta-one
Samplers: https://stable-diffusion-art.com/samplers/
Txt2Img Plugin: https://github.com/Acly/krita-ai-diffusion
Archive: https://rentry.org/sdg-link
Bakery: https://rentry.org/ldgcollage

>Neighbours
https://rentry.org/ldg-lazy-getting-started-guide#rentry-from-other-boards
>>>/aco/csdg
>>>/b/degen
>>>/b/realistic+parody
>>>/gif/vdg
>>>/d/ddg
>>>/e/edg
>>>/h/hdg
>>>/trash/slop
>>>/vt/vtai
>>>/u/udg

>Local Text
>>>/g/lmg

>Maintain Thread Quality
https://rentry.org/debo
>>
Cake.
>>
>>106464276
you forgot to read anistudio
>>
File: AnimateDiff_00265.mp4 (3.65 MB, 720x1040)
>>
Cursed thread of hatred and animosity
>>
Blessed thread of frenship
>>
File: ComfyUI_temp_knjof_00001_.png (2.67 MB, 1024x1024)
>50 steps euler/simple
Yeah radiance needs to stay in the oven
>>
>>106464308
TO VALHALLA
>>
>>106464345
I am curious about the final result, but to me the entire concept seems backwards. A big reason we use a VAE is to cut computational cost; the very first models didn't use one and were a pain to train and gen with.
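
For scale, a rough sketch of the savings (assuming an SD-style VAE with 8x spatial downsampling and 4 latent channels; channel counts vary by model):

pixels = 1024 * 1024 * 3                 # RGB image the user sees
latent = (1024 // 8) * (1024 // 8) * 4   # what the denoiser actually processes
print(pixels / latent)                   # -> 48.0, ~48x fewer values per step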
>>
so chroma.... what happened there???
>>
File: 1745435907365688.png (860 KB, 900x716)
Can someone share a good workflow to upscale images while keeping them sharp?
What I tried so far looked like shit.
>>
Is it hard to set up a local model? I've tried to watch a few guides and it seems quite overwhelming.
And would a 9070xt be good for generating videos/images?
>>
File: IMG_2925.jpg (3.28 MB, 5712x4284)
>>106464424
at least yours is INSIDE your case lmao. This is what i get for trying to upgrade a prebuilt. Card don't fit. psu don't work. bought a second replacement psu but it doesn't have enough PCIE plugs so i have to keep using the original psu as well.
but it works. so
>>
>>106464510
>AMD
it's 10x harder than the plug n' play Nvidia. If you thought it was hard after watching videos it's not going to be a good time
>>
File: 1740252647672142.png (1.63 MB, 991x1562)
>>
>>106464404
too much the diaper
>>
File: AnimateDiff_00259.mp4 (2.15 MB, 720x1280)
milk
>>
File: ComfyUI_temp_nkjid_00006_.png (3.77 MB, 1536x1152)
>>
>>106464643
Are you pretending that you can't prompt with 12gb vram? You must be some /saasdg/ fag who has never used local.
>>
File: AnimateDiff_00268.mp4 (3.14 MB, 720x720)
>>106464404

A bit of a disappointment. I had high hopes for a model with a large character database like Noobai, with the advantage of natural language and the variety of styles.

Unfortunately, it only recognizes a few more characters than Flux or Qwen, but the advantages end there.

Since Qwen arrived, I can easily create a scene, add a background, and add characters described just with a prompt. Then I edit the characters with Noobai InPaint.
(pic related)
If they don't do something, Chroma will soon be forgotten.
>>
File: 1743836343824727.jpg (1.55 MB, 2016x1152)
>>106464736
It's interesting that qwen has almost no traction on civit compared to chroma.
>>
>>106464510
If you can follow instructions on a github repo, installation shouldn't be a problem. Coming from an AMD user: you should avoid AMD if possible. It's gonna be more of a headache to set up than an Nvidia card. If you already have the card though, and are on Windows, I've had the best luck with this repo: https://github.com/patientx/ComfyUI-Zluda
>>
>>106464749
Probably because qwen is like 20gb and vramlets can't use it.
>>
>>106464345
skill issue.
you're using either the wrong settings or you are a promptlet. try euler simple 24 steps
>>
File: 1740294979677564.jpg (1.04 MB, 1824x1248)
>>106464797
I use Q4 with a 12GB card. The loss of quality is absolutely negligible and the speed is comparable to Chroma.
>>
>>106464811
retard
>>
>>106464711
vramlet cope
>>
>>106464812
>the speed is comparable to Chroma.
I haven't used chroma but if that's true then it's slow as all fuck. takes several minutes for one image on qwen q4 on my 3060 12gb.
>>
>>106464819
damn you really got me there didn't you.
pathetic. hate on a model because it's bad, not because you are incapable of using it.
>>
>>106464835
It's the same prompt I use on normal chroma and it is perfectly fine there. And fewer steps won't make it better.
>>
qwen has nunchaku so it's vramlet friendly. chroma doesn't.
>>
>>106464859
Still doesn't have lora support.
>>
im not fond of the model but for more nuanced reasons than you
>>
>>106464749
i can't figure out how to train it. the aitoolkit method is like 2bit or some nonsense and kohya's musubi seems way too complicated. i have 24gb (bare minimum btw).

but the reason qwen, hidream, etc get no traction on civit is because nobody can run/train it. sdxl is cheap and easy to use so it has tons of content. even flux, which was hailed as the savior of local and better than dalle3, barely received any resources in comparison to XL.
>>
>>106464842
post the prompt. all my usual chroma prompts work fine on 20~30 steps. not even normal chroma uses 50 steps so you are going full retard here.
>>
I am trying to get an anime character to take off their wristwatch in wan2.2, but it is outright not playing ball.

Did the chinese developers just not include that in the model's training? I would have thought it would be fairly simple since it can do clothes stripping well enough
>>
>>106464749
what are you talking about? qwen has far more lora than chroma on civitai.
>>
>>106464918
i don't get why people don't realize you can rent a 40GB A40 for $0.40/hour and train a lora for like $1
>>
sars
>>
>>106464812
>The loss of quality is absolutely negligible
lol
>>
>>106464301
that level of titty squish up needs to be mandatory dress for all hot women.
>>
>>106464964
It is tho. The only loss is with the text capabilities. The image is barely touched.
>>
>>106464964
Oh, go on, prove me wrong.
>>
>>106464941
it's not a 1-click setup. The majority of lora trainers just use civitai to train loras, or the free-tier google colab (emphasis on the free part). Even $1 is too high a barrier to entry.
>>
>>106464977
>>106464984
you people clearly have low standards and/or are blind so no point showing examples
>>
>>106464999
So all bark no bite, eh? Fuck outta here.
>>
quality loss only kicks in at the level below whatever quant i can run
>>
>>106465006
Q4 looks just like Q8 looks just like FP16 looks just like FP32 I swear guys I swear!
>>
>comparing Qs to FPs
retard alert
>>
>generates plastic fluxslop at Q2
saaaarssss where cellphone camera lora for fix output
>>
>>106465015
>got blown the fuck out so badly she just started pissing and shitting herself all over the thread
>>
>>106465030
>thinks quantization is magic
>>
>>106465033
indians have a tendency to fall for cheap magic tricks
>>
>>106465033
Post several comparisons across different quants. Assuming you can even run any of them, attention whore.
>>
>>106465049
you must be a newfag if you haven't seen the endless quant comparisons already
>>
>>106464276
Nano banana is insane for genning subject LoRAs without a lot of pics. There's nothing that comes close. When will local catch up?
>>
>>106465020
>implying fp8 looks better than Q quants > Q4
>>
why did they call it nano banana though
>>
>>106465059
qwen edit is the only contender really
>>
>>106465057
threads recently have been overrun with saars trying to hop on the video bandwagon. they don't like to accept the fact that trying to fit a 20b param model into 12gb requires corners to be cut.
>>
>>106465069
most engineers at Google have nano penis. Asian genetics aren't very kind
>>
>>106465015
>Q8 looks just like FP16
To be fair, it's really, really close.
>>
>>106465057
Still see no comparisons from you
>>
>>106465069
because it is under 1B parameters
>>
File: Chroma_00010_.png (1.76 MB, 920x1536)
>>
File: test.jpg (813 KB, 2784x2496)
Testing out my fashion wildcards on Qwen and a few other models right now.
Which one of these do you guys like more?
Mostly in terms of face.
Prompt adherence is pretty damn good on Qwen, though, gotta admit.
Here's the Yume Kawaii prompt I used:
>A highly detailed photograph of a young japanese woman. She sports an oversized white t-shirt dress with a magical girl transformation sequence print in pastel gradients, worn over a lavender tulle petticoat that peeks out below the hem. Her feet are wrapped in platform boots covered in pearl white holographic material with chunky 4-inch soles and lace-up fronts adorned with star-shaped charms. A sleeping mask accessory rests on her head like a headband - pale pink satin with gold embroidery reading "Sweet Dreams" and dangling pearl chains. Her hair is a long pastel lavender wig with bangs, styled in loose waves and decorated with tiny LED star clips that twinkle softly. Her face showcases pale blue circle lenses, white eyeliner drawn in star patterns at the outer corners, cotton candy pink blush applied generously under her eyes, and glitter tears made from iridescent gems.
>>
Can we fast forward to when NetaLumina or one of its derivatives is good enough to replace Noob
>>
>>106465072
Close, but no dice. Photorealism just isn't there yet.
>>
how do we fast forward to a timeline that doesn't exist?
>>
>>106465125
right looks better. eyes on the left look digitally fake
>>
>>106465059
>When will local catch up?
Comparing local against a huge new model that couldn't run on consumer hardware even if it weren't proprietary: are you retarded or just a saas shill?
>>
>>106465125
neither are showing pantsu so both are bad
>>
>>106465132
Quite honestly i'm already using it a lot. Very fun to play with because of the insane prompt compliance.
>>
File: AnimateDiff_00104.mp4 (2.99 MB, 1280x720)
>>
>>106465184
Not enough artist knowledge desu, and I'd rather not do a second pass with Noob. Not saying you can't get pleasing results, just seems like it needs a bit more time in the oven.
>>
>>106465173
Most of their "huge" parameter count is only being put to use for prompt following. For simple ID copy/style we do not need so many parameters; Qwen Edit and Kontext Pro/Max should tell you that (Kontext Pro/Max is API only, but Dev is the same size). Local is behind, but not that far behind. We just need better models.
>>
>>106464749
Ehh? I barely see chroma LoRAs. Qwen is getting a lot more. Still, flux and illustrious seem to be the most common though. Did you just look up "Chroma" and see there was more? lol
>>
>>106465223
nta, but prompt adherence is the most important thing to me personally. I am very much bored of noob/illustrious compositions, so I'm completely on the Neta train.
>>
>>106464736
You have no idea what you're talking about. Chroma (the model base) is still more powerful than any other model we've seen so far. An additional finetune would help it learn styles/characters, but the caveat is that it requires compute that even small companies do not have. Neta Lumina on the other hand is quite good with anime/styles and it should be a breeze for anime tunes. Chroma is still SOTA for local photorealism NSFW.
>>
>>106465287(me)
Also Qwen for more interesting prompts Neta can't do. I was hoping to use chroma, but man is it bad at anime. No idea why: photoreal is fine, but as soon as the word "anime" is anywhere in the prompt, everything about the model takes a giant nosedive.
>>
>>106465237
Shit you just made up

Qwen Edit, which is far behind Nano, is still a pain to run locally due to its resource demands. There's no magic that will make the gap disappear between SOTA saas running on extremely expensive hardware and local models meant for consumer hardware

But local is still better overall, because you can finetune those models to be much better than SAAS models for specific concepts, and even concepts that SAAS models will never allow due to NSFW censorship

But for pure technical prowess, SOTA saas would have to be totally incompetent for local consumer models to compete
>>
>>106465335
>Qwen Edit still a pain to run on local due to its resource demands
Works easily and fast on a $500 3090, with Q8 at basically fp16 quality
>>
>>106465335
Perceived nano superiority is due to slightly better adherence in any given scenario. Again, local can catch up given a decent unfiltered/non-distilled model.
>>
>>106465335
Nano is shit. I can't say I got even one good result out of it. Colorize? It misses half the image. "Show the character from behind"? The damn thing gives some weird combo of the front of the character's clothes that makes no sense. "Complete the rest of the image"? I get either a dwarf, since you can't change the aspect ratio of the image, or just some nonsensical design. I genuinely believe this is some sort of weird gaslighting from google. There have been multiple comparisons on reddit too, and almost all of them show how much nano fucks up the image. The censorship is just the cherry on top.
>>
File: Chroma_00022_.png (1.94 MB, 1152x1304)
>>
>>106465287
For sure, and to be clear it can do SOME artists but I'm tired of models that are technically good with the caveat of needing a lora for styles that other models can do OOTB.
>>
File: edit leaderboard.png (131 KB, 1846x1265)
>>106465059
sure qwen is behind nano banana, but it's not that far behind 4o, which was considered to be an impossibly massive monolithic 300b gigasaas model. i wouldn't discredit local so soon, especially since it's roundhouse kicking BFL's faggot API license shit right in the face. plus qwen have hinted that they're already working on a v2
>>
File: QwenChromfaceWan_00005_.jpg (322 KB, 1392x2496)
>>106465170
Alright, thought so as well. Eye colors are really weird on base Qwen. Tends to lean hard into extreme colors.
>>106465176
Now that's just lewd.
>>
>>106465388
This Wicked City, Ninja Scroll anime style still holds up
>>
>>106465343
Define “fast”. Everything seems to be getting progressively slower than sd1.5, which is “fast” in my mind
>>
File: Chroma_00025_.png (2.02 MB, 1152x1304)
>>106465445
yeah it's timeless
>>
>>106465395
>or styles that other models can do OOTB
Those models are finetunes specifically trained on art styles though, not really comparable to a base model which needs to know some of everything
>>
>>106465471
I was thinking Cyber City Oedo 808 has this style as well but I'm not sure.
>>
>>106465472
Which is why it needs more time in the oven
>>
>>106465457
less than a minute for an edit that would have taken much longer with inpainting, photoshop, regenning or anything similar
>Everything seems to be getting progressively slower than sd1.5
hardware didn't improve much while the models did; no shit toy models from 3 years ago will run faster than modern models that are an order of magnitude better
>>
>>106465457
because hardware hasn't kept up. 4090->5090 isn't as much of an improvement as 3090->4090. the vram gain is worthless because they didn't want to cut into their (already dated) a100s which are still selling for 5 figures. pray for chinese asic or something idk
>>
>>106465343
where are you finding 3090's for $500
I just searched ebay and the lowest price is $750~$800
>>
Is there anywhere I can download a decent dataset of a few thousand (at minimum) random photos? I want to train a realism lora, a good one, so I need thousands of images.

I've tried searching Huggingface. It's a disaster. Anything sourced from Pexels is unusable: the images are mostly "artsy" type stuff, bokeh out the ass, just a weird slopped look. There is a dataset that claims it is 120M images from Flickr, but it is actually 3000 unique images duplicated millions of times each (yes, really; 500 downloads per month btw). There are various Flickr30k, Flickr8k, etc. datasets, but they are resized to low resolution with no original URLs.

I'd really like to avoid resorting to scraping some website myself, but it looks like that might be the only option.
>>
>>106465502
there are quite a few, like private eye goku, vampire hunter d, etc. they had pretty high budgets back then
>>
>>106465543
why would anyone collect a great dataset for no reason? the dataset collection is the bottleneck, not the lora training, fire up that yt-dlp and gallery-dl
>>
>>106465395
If you are curious, here is a list someone did of the artists Neta can do. I don't think they went through the danbooru ones though, but yeah, overall it still needs more cooking time.
https://neta-lumina-style.tz03.xyz/
>>
>>106465567
>why would anyone collect a great dataset for no reason?
To share with the community so everyone can improve their models? That's the whole point of HF. There are plenty of image datasets there; they are just complete ass for various reasons. Either Pexels slop, or downsampled Flickr, or "whoops I fucked up and duplicated each image tens of thousands of times teehee sorry"
>>
>>106465621
>To share with the community so everyone can improve their models?
then you wouldn't need to train a lora on those images
>>
>>106465184
What does a neta prompt look like? Last time I looked through their user guide, their prompts were the most convoluted shit I have ever seen since Pony and the score_7, score_8, score_9 days.
Did you train any loras for it?
>>
>>106465610
based should be in OP and replace the khroma section
>>
>>106465610
some of these are good but a lot are scarily bad, especially how it quickly replaces basic animals with humans
>>
>>106465471
>>106465388
>>106465113
>>106464736
These belong in the Anime Diffusion Thread.
>>
>>106465723
Cope and seethe
>>
File: AnimateDiff_00279.mp4 (2.41 MB, 832x832)
>>106465132
>>106465184

oh shi- forgot about Neta lumina. Too busy with wan 2.2 and qwen, but i'll try it again.
>>
>>106465738
>tranimetard is retarded
basically every time
>>
>threadshitting faggot is a tourist
basically every time
>>
File: ComfyUI_temp_cxpmt_00001_.png (1.65 MB, 1152x1152)
If I have a dataset with hundreds of pics, can I get away with training fewer steps/epochs?
>>
>>106465757
>tranimetroony so mad and castrated by his hrt hes to scared to even reply
>but still has to reply in the thread to lash out for attention like a child he is
uh oh, not again tranimesisters, maybe... maybe if we say "tranime website!!!" on cue now we can recover?? lmao
>>
>>106465765
no. the more images you have, the more steps you should train for with a lower learning rate
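
As a toy illustration of that rule of thumb (every constant below is an assumption for illustration, not a recommendation):

import math

def rough_plan(num_images, batch_size=4, repeats=10, base_lr=1e-4, ref_size=100):
    # more images -> more total steps
    steps = math.ceil(num_images * repeats / batch_size)
    # assumed inverse-sqrt scaling of the learning rate with dataset size
    lr = base_lr * math.sqrt(ref_size / num_images)
    return steps, lr

print(rough_plan(50))   # small set: fewer steps, higher LR
print(rough_plan(800))  # hundreds of pics: more steps, lower LR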
>>
what caused the absolute seething meltdown demonstrated above?
>>
>if i call the guy btfoing me mad i dont have to engage with the argument
>no, i am NOT a child btw
tranimetards really are embarrassing...
>>
File: Chroma_00039_.png (2.15 MB, 1456x992)
>>
BIG stinky
>>
File: GeYnhsKX0AAT7BK.jpg (361 KB, 2048x1024)
>solves the fluxplastic problem
>vanishes
what was their endgame?
>>
>>106465539
I don't know why but people keep repeating this bullshit. It's just straight up false. Maybe you can find one on craigslist for $500 and go that route if you're a giga retard.
>>
File: file.png (402 KB, 823x1045)
>>106465539
>>106465849
huh?
>>
>>106465543
Flickr URLs are a standard format with no secret up to 1024
https://www.flickr.com/services/api/misc.urls.html
If they have the original ID you can grab them easily
Other than that scraping flickr is easy, you don't need to register for their api, just grab the key from the site and use the site's endpoint instead. Go for the Explore page on a few dates, should be enough for your needs and inherently high quality because they're featured
t. scraped 5b from flickr (no you can't have it)
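
If you go that route, a minimal sketch of rebuilding image URLs from API fields (the live.staticflickr.com pattern and the "b" 1024px size suffix are from the API docs linked above; the helper itself is illustrative):

import requests

def flickr_url(server, photo_id, secret, size="b"):
    # "b" = up to 1024px on the longest side; original size needs a separate secret
    return f"https://live.staticflickr.com/{server}/{photo_id}_{secret}_{size}.jpg"

def download(server, photo_id, secret, path):
    r = requests.get(flickr_url(server, photo_id, secret), timeout=30)
    r.raise_for_status()
    with open(path, "wb") as f:
        f.write(r.content)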
>>
>>106465871
>pounds
RETARD
E
T
A
R
D
>>
>>106465871
anon said $, as in USD
£500 is $669.38
>>
>>106465889
fair enough, i didn't realize how weak the dollar is recently so yeah you're right they're about 700 usd
>>
>>106465871
oi you got a loicense to post here?
>>
>>106465849
browse other sites, more local sites
>>
>>106465909
>oi you got a loicense to post here?
yes but probably not for long. enjoying my final days before the govt bans 4chan
>>
>>106465960
aren't kiwifarms and 4chan suing them? lmao
>>
>>106465879
Ok thank you anon. I was also finally able to find the LAION highres subset on HF, which appears to be decent. I'm currently scraping a subset of those URLs.
>>
There was someone in the previous thread who asked why corporations aren't doing what radiance is doing. The answer is that radiance is based on a paper that is about a month old.
https://arxiv.org/abs/2507.23268
>>
>>106466134
https://huggingface.co/datasets/madebyollin/megalith-10m
You could try this but it's cc0 trash so probably low quality aesthetically
>>
>>106466144
Yes.
But if the furryfag really wants to go VAE-less, then he should also go the route of sub-quadratic-complexity attention
>>
>>106466144
I don't know the details of the approach radiance is taking, but it has been explored before a while back.
https://github.com/ShoufaChen/PixelFlow
>>
My return is on the horizon
Just wait for me to get back up to speed and master chroma
>>
>>106466430
big if true
>>
>>106466226
>sub-quadratic complexity attention
Is this a speedup?
>>
>>106466439
It will be. I apologize in advance: the know-nothing hamsters that shit up the general and can't be happy with containment will go nuclear
>>
File: ComfyUI_temp_rpeaj_00042_.png (2.15 MB, 1152x1152)
>>
File: Jubileebakuhatsu1.jpg (1.83 MB, 3000x1537)
Any way to gen a girl causing an energy explosion out of her body?
>>
>>106466545
try with a regular prompt, but in my exp it will probably need to be rewritten to be more detailed by some AI; then try that
>>
File: fixing images 1.png (1.65 MB, 2048x2048)
How does inpainting on comfyui work? The official comfyui workflow doesn't work as well as the A1111/reforge method.
>>
File: ComfyUI_temp_rpeaj_00005_.png (3.66 MB, 1664x1152)
>>
>>106466582
>part 2 - 1
do you have the other parts?
>>
File: fixing images 2.png (3.31 MB, 2048x2048)
>>106466631
>>
File: fixing images 3.png (3.18 MB, 2048x2048)
>>106466631
>>106466642
>>
>>106466582
>How does inpainting on comfyui work?
It doesn't :^)
>>
imagine getting filtered by more knobs and buttons
>>
Can I assume that 95% here generate nsfw but are over the post-only phase and are now larping as photographers prettifying their photos of the year?
>>
>>106466679
You have never made anything
>>
>>106466679
Inpainting in comfy sucks tho
>>
File: AnimateDiff_00270.mp4 (3.23 MB, 720x960)
>>
>>106466710
only for subhuman mouth breathers that are probably using windows or apple anyway not like their opinion matters
>>
>106466729
stop damage controlling for comfy lil bro. Anyone who has ever used any graphics program can tell it sucks. Shitty mask editor with a clunky UI, no filters or other tools. Nothing.
>>
>>106466578
I can do Jubilee in a hoodie, easy
Bald Jubilee after the explosion and the little outfit she is given
It's the explosion itself that's the hard gen, glowing skin, floating, clothes disintegrating, center of energy or energy ball in center, flowing hair, glowing eyes, glitter, fireworks, abstract background, bright background,
>>
>>106466755
>too pussy to actually reply
concession humbly accepted on behalf of comfychad and his asian QT gf
>>
File: 1731843127812647.png (106 KB, 771x526)
>>106466710
I agree it was pretty bad before, but get this node in impact pack; it gets you pretty much 90% of a1111 functionality. The only thing missing is using an upscaler model for the upres before inpainting.
>>
>>106466784
He'd rather seethe than learn something new.
>>
>people now optimize everything around light loras, so using regular 20 step settings, normal cfg etc. actually produces worse results with a lot of loras now
It is what it is, light is worth it, but it's grim that for a lot of loras, movement quality is basically better in 2.1 until better loras are published
>>
>>106466784
Or I can just use Krita and have a normal human gui.
>>
File: In paint 2024 04.jpg (1.42 MB, 2500x2900)
>>106466784

This node worked wonderfully. Thanks for the tip.
>>
>>106466835
I only use light loras for posting memes/garbage for other anons. I'd never use it for my own content. It's actually not that bad for anime since anime doesn't tend to need high motion fidelity anyway.
>>
>>106460743
If you're still around can you upload a PNG to catbox? Jpeg doesn't preserve metadata
>>
>https://www.illustrious-xl.ai/sponsor
These guys are sitting on trained models and just let the hype die. Big brain.
>>
>>106467247
Are they at least trying to graft an llm onto those models or is it all clip?
>>
nice collage
>>
>>106465502
>>106465445
>>106465557
My niggas have taste

>>106465723
Nah, that thread is for straight up pedos, above can stay

>>106465871
Buy 2 for training
>>
>>106467247
By the time those release we'll have a model more widely used than SDXL, making it completely pointless.
>>
>>106467247
I am thankful we got what we got. Praying for a leak. Expecting nothing.
>>
File: 1728921925257796.jpg (901 KB, 1344x768)
>>
>>106467347
i kneel
>>
comfy should be dragged out on the street and shot
>>
so, anything new in the world of image/video genning?
>>
>>106467347
Composition on this is great, really deserves an inpaint pass on the faces and hands to add that extra detail
>>
does qwen nunchaku work with loras?
>>
>>106467275
No clue. Newer versions have way better compatibility with natural language.

>>106467328
>By the time those release we'll have a model more widely used than SDXL, making it completely pointless.
Idk man I think we might be stuck with current models for a while, which would suck

>>106467338
>I am thankful we got what we got
Yeah me too, I even put in $50 for their stardust, but I guess it doesn't matter
>>
File: 1731123809069982.jpg (1.8 MB, 2016x1152)
>>106467394
This is inspired by Henry Fuseli and quite honestly his own faces and hands/legs are quite messy too.
>>
>>106467433
>Yeah me too, I even put $50 for their stardust, but I guess it doesn't matter
$50 out of the $300,000 they need.
>>
How much does Qwen edit suffer from quanting?
>>
>>106467459
I don't think quanting hurts it as much as the speed LoRAs do. Up to Q4 should be alright
>>
>>106467459
>>106467501
Q8 or bust for image edit models if you want to keep quality over multiple passes
>>
>>106467516
Assuming you properly mask your edits it shouldn't matter. Multiple vae passes probably hurt the image way more than the quants would.
>>
>>106467516
Oh, the Q8 is only ~20GB. I thought it was gonna be far worse. Is the template workflow in comfy fine?
>>
>>106467437
Neat. Did you use a cnet or i2i as well?
>>
File: 1731036954930137.jpg (1.75 MB, 1248x1824)
>>106467555
Nope. Chroma is just a good model for these styles and all sorts of weird compositions.
>>
>ComfyUI: v0.3.50
is it safe to update to 0.3.54? any noticeable regressions?
>>
>>106467437
This is garbage. I get it. The renaissance era was full of people finally figuring out how to paint, but larping as a classical fag as if you have this deep connection to the art world, especially when using AI, makes you look like a twat.
>>
>>106467584
Memory is still fucked; other than that, nothing really special one way or another.
>>
File: screenshot.1756852177.jpg (125 KB, 674x493)
>>106467584
0.3.54? we're on 0.3.56 now. to answer your question, it's perfectly fine.
>>
>>106467582
how did you upscale it? looks nice
>>
>>106467692
I just natively prompted it at this resolution. Chroma can easily stretch to nearly 2k without losing coherency.
>>
>>106467692
Chroma can rawdog 1080p
>>
File: elf hugger_00431_.png (1.48 MB, 824x1600)
I like krea.
>>
>>106467437
Yeah, I looked at his art and although he paints great details in many pieces, he also draws many with undefined features much like in this image
>>
>>106467433
>man I think we might be stuck with current models for a while, which would suck
If you're just doing big titty 1girls, how would it suck? The newer, bigger models are just slower and don't particularly make anime, even in different styles, better than noob/illus models. I can't imagine how a "better" model would actually improve the output much.
>>
File: 1729280580258672.jpg (379 KB, 1024x1024)
>>
>>106467753
> I can’t imagine how a “better” model would actually improve the output much.
prompt adherence + better handling of multiple subjects. If all you care about is 1girl standing in a basic pose, then yeah noob/illus will be fine for ages.
>>
>>106467542
https://www.reddit.com/r/StableDiffusion/comments/1myr9al/use_a_multiple_of_112_to_get_rid_of_the_zoom/
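
The trick from that link is snapping the output resolution to multiples of 112 so Qwen Image Edit doesn't zoom/shift the result; a quick helper (the rounding logic is illustrative, the 112 figure comes from the linked post):

def snap_to_112(width, height):
    snap = lambda x: max(112, round(x / 112) * 112)
    return snap(width), snap(height)

print(snap_to_112(1024, 1024))  # -> (1008, 1008)
print(snap_to_112(1248, 720))   # -> (1232, 672)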
>>
>>106467768
learn to regional prompt and inpaint
>>
>>106467845
Already know them. This isn't about me.
>>
>>106467845
learn that that's no replacement for a model that can do multiple subjects without bleeding
>>
>>106467902
and current-day AI isn't even safe from VAE loss, let alone autoregressive models, let alone a brain-computer interface reading your thoughts and putting them into pixels, and yet you can create 99.99% of everything you want all the same
>>
>>106467407
my niggas...
>>
>>106467635
I updated a day or two ago. It was fine.
>>
File: WanVideo2_2_I2V_00297.webm (220 KB, 1248x720)
GM
>>
>>106467845
>adding a turbo to VW golf will make it a porsche !!1!
>>
>>106468175
post an example change that's basically impossible with current models plus inpainting, regional prompting and controlnet tools
>>
File: blondyyy.jpg (289 KB, 1017x1240)
>>106468131
he seems like he smokes a LOT of cigs lately ;c
>>106465871
>founders
that fat brick won't even fit in my pc case (hotbox)
>>
the joke is that modern models are better at it without needing to mindbreak them with a dozen external tools
>>
Since there are some nice chroma users in this thread and I want to try chroma on my vramlet setup:
How well does Chroma run on 8gb vram + 16gb system ram?
Which version and quant should I use?
should I just go for the smallest possible quant, regardless of the newest version?

I'm guessing this is the right place to find it: https://huggingface.co/silveroxides/Chroma-GGUF/tree/main

There are so many versions. Are there big differences between them? I can see they are frequently updated.
>>
>>106468195
>this job needs to be done with an all in one tool instead of using whatever is best for the particular task because... it just does, ok!?
>>
>>106468228
Step 1: buy more ram
>>
>>106468228
Order more ram while you wait for https://github.com/nunchaku-tech/nunchaku/issues/431
>>
>>106468131
>Hercules going through his teen angst emo phase
Probably posted nudes of himself on /b/
>>
>>106468228
those are old versions. here are the "final" versions:
https://huggingface.co/QuantStack/Chroma1-Base-GGUF/tree/main
https://huggingface.co/QuantStack/Chroma1-HD-GGUF/tree/main

>8gb vram
oof
>>
>>106468228
No idea, but you should be using https://huggingface.co/rocca/chroma-nunchaku-test
At least until the official nunchaku Chroma drops

Alternatively you can try 4bit GGUF of Chroma Flash HD (you'd use the recommended 8 heun steps for that model)
>>
>>106468235
Yes? I mean by that logic might as well go back to the good old days of 1.5 and spend 20 hours inpainting one shitty pic lol
>>
>>106468285
>ad absurdum
npcs are incapable of logical thought
>>
File: hero_.jpg (1.07 MB, 2616x2000)
>>
>>106468285
>why can't i just attach a saw blade to my drill instead of HAVING to use a circular saw??? THIS TECH IS TRASH
>>
>>106468193
catbox?
>>
>>106468298
my boy Spyro grew up
>>
>>106468314
you think he puts his benis into her???
>>
>>106468296
luddites are incapable of looking ahead, so I guess we are at a stalemate lol. Hey moron, inpainting and regional prompting aren't magically going away because these models are getting better at prompt adherence; you can still jerk yourself using them, it's just that the base image will be made faster and need fewer edits.
>>
>>106468228
You could *probably* train Chroma in OneTrainer using nfloat_4 for both model and text, but I don't know if the quality drop would be worth it; that said, with 8gb vram you will have to accept a quality drop.

If you use OneTrainer, you don't download any special quantized model; you select quantization (in your case nfloat_4) in the trainer's model section and it will quantize the full model on the fly.

OneTrainer has a 8gb vram Chroma preset, but I doubt it can run with just 16gb system ram.
>>
>>106468334
i never said the models shouldn't get better, nor that they won't, even to the point where one day most things will be automated and it might as well all be "1 thing" or even mostly just 1 ai model
i argued against the cope that, with all the modern tools we have even today, like regional prompting and good inpainting uis, you can't do things like multiple subjects without bleeding

and given that the retard cope continued without being able to actually post a single example where this is impossible >>106468185
i accept your concession. don't bother coping further without posting proof, luddite retard
>>
>>106468264
>>106468265

What's the difference between the base, the hd and the nunchaku versions?
I want an all-round model that's as versatile as possible with the most context awareness possible.
>>
Alright, why does Comfy load the face detailer model as soon as the workflow starts, even though it's only used after the first inference is done?
Why doesn't it just load it sequentially?
>>
>>106468399
>Nunchaku is a high-performance inference engine optimized for 4-bit neural networks, as introduced in our paper SVDQuant.
best vramlet cope tier for fags like you with no vram, wait for this to support chroma, then
>between the base, the hd
use hd
>>
>>106468399
The Nunchaku one is an outdated hack and not really for general use. Since you're vram limited I would suggest trying lower resolutions (768 and below) with Base. HD loves blurring outputs but you can try it too if you want.
>>
>>106468399
base is 48 epochs trained at 512 resolution

hd is those 48 epochs + a lot of epochs trained at 640, 768, 1024, 1152 resolutions as well

nunchaku is a quantization method that yields the overall best results compared to other methods, and it's fast since it's q4
>>
>>106468422
>>106468458
>>106468462

ok. thanks.
And then I need to use one of the text encoders, and this vae, right?
They're linked from the official lodestones repo: https://huggingface.co/comfyanonymous/flux_text_encoders/
and the vae from here, also linked from the official lodestones repo: https://huggingface.co/lodestones/Chroma/tree/main
>>
>>106468458
Yes, good point, the 'nunchaku' version here is NOT a real nunchaku quantization

That will happen once they're done with the Wan nunchaku
>>
>>106468470
Yes, you need the Flux vae and the t5xxl llm; typically they are named:

ae.safetensors
t5xxl_fp16.safetensors

There's a t5xxl_fp8_e4m3fn.safetensors you can use instead if you don't have enough vram
>>
>>106468470
The T5 encoder is fine. You might have more luck with an fp8 version since it's smaller.
>>
>>106468495
>>106468507

perfect. thanks a lot guys. that's exactly what I needed to hear.
>>
>>106468511
Np, let us know if something fucks up.
>>
>>106468333
In her butte
>>
File: WanVideo2_2_I2V_00299.webm (1.18 MB, 1248x720)
>>
>>106468507
>>106468470
Do NOT use scaled text encoders.
>>
>>106468550
What happens if you do?
What's the difference?
>>
>>106468554
It makes the model more retarded. Offload it to CPU in the loader node if you must.
>>
>>106468565
how much difference are we talking?
>>
>>106468577
yes
>>
>>106468577
Also I can recommend the GNER fp16 version.
https://huggingface.co/wikeeyang/GNER-T5-xxl-encoder-only/tree/main
Avoid FLAN since there's some schizo myth circulating around that you should use it, but it's so bad.
>>
>>106468577
there is no significant difference for qwen's TE at least
>>
>>106466715
Oh no!
>>
File: 1725766727174114.png (290 KB, 1215x590)
>>106467162
>Jpeg doesn't preserve metadata
yes it does
>>
>>106464276
newbie here, so I managed to get the wan22 i2v workflow working, but how do you modify it to include a form of reference image?
>>
>>106468663
https://github.com/kijai/ComfyUI-WanVideoWrapper/tree/main/example_workflows

https://github.com/kijai/ComfyUI-WanVideoWrapper/blob/main/example_workflows/wanvideo2_2_I2V_A14B_example_WIP.json

Just copy from the examples.
>>
File: kroahmah.jpg (108 KB, 1106x890)
Is this where the hidden chroma gems truly are? Anyone tried any of these? If I can make even wonkier stuff, I'm in
>>
>>106468577
It's minor, but it's there. Some people are anal about it though. Your ram is low, which is why I suggested the fp8 version. If you want to try the fp16, go ahead though. There will be a node in comfyUI that says "load CLIP". Make sure it's set to CPU in the device section, otherwise you might run out of memory.

>>106468599
Also true. Gner is worth experimenting with, but flan is just t5 but worse.
>>
>>106468704
Looks a bit like Adam Hughes' style?
>>
>>106468735
>but flan is just t5 but worse
I thought flan was supposed to do better on text?
>>
>>106468734
no. those are all models meant for testing. look at the dates. they are all old as shit
>>
>>106468734
These are mostly abortions.
>>106468763
Who tf uses chroma for text?
>>
>>106468763
It's possible. I don't ever really generate text though so I wouldn't know.
>>
>>106468599
>>106468645
>>106468735
thanks
>>
>>106468779
>Who tf uses chroma for text?
Not me, but I'm sure there's some weirdo out there who does
>>
>>106468399
Nunchaku is basically almost q8 quality (or q8 if you have a 5090), at the size of q4 but 4x faster than a regular q4. If you don't know what these quants mean: q8 is a gguf quant that is almost lossless at half the size of the fp16 model (though not necessarily faster if you can fit fp16 into memory), and q4 is half of q8 again, which means a larger quality loss but is about as far as you can shrink before you start losing too much quality; it is also not faster than fp16.
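
Back-of-the-envelope math for those sizes (the bits-per-weight figures are rough assumptions; real GGUF files carry extra overhead):

def approx_gb(params_billions, bits_per_weight):
    # params * bits / 8 bits-per-byte; the factors of 1e9 cancel out
    return params_billions * bits_per_weight / 8

for name, bpw in [("fp16", 16.0), ("Q8_0", 8.5), ("Q4", 4.5)]:
    print(f"20B model at {name}: ~{approx_gb(20, bpw):.0f} GB")
# fp16 ~40 GB, Q8 ~21 GB, Q4 ~11 GB: Q8 about half of fp16, Q4 about half of Q8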
>>
>>106468754
It's a Jubilee lora but all I have prompted is Joel Jurion and Traditional medium
>>
is this what people do at mardi gras?
>>
File: AnimateDiff_00110_.mp4 (2.94 MB, 1280x720)
someone at the wan dataset team REALLY likes blue archive
to the point it pollutes all high res anime style gens
>>
File: korma.jpg (14 KB, 859x80)
>>106468770
>>106468779
I will set sail for the seven epochs
>>
>>106468845
I see, nice style
>>
File: Radiance.png (1.62 MB, 1072x1072)
>>106468898
That's a special model experiment that uses pixel space instead of a VAE; you need a special hacked version of Comfy to try it since it's still training

I would wait until it's done unless you really want to try bleeding edge stuff, that said it seems to be progressing quite nicely
>>
File: WanVideo2_2_I2V_00300.webm (752 KB, 1248x720)
>>
>>106468930
The stupid furfag should leave bleeding-edge research stuff to actual data scientists and just train the inevitable qwen model already. I'm tired of people doing the mental gymnastics to convince themselves this is anything more than a waste of time.
>>
File: 388416.mp4 (3.52 MB, 832x1056)
>>
>>106468930
>you need a special hacked version of Comfy
Seriously? Well that's no fun...
>>
>>106468948
not bad this one
>>
>>106468954
>people doing the mental gymnastics into convincing themselves this is anything more than a waste of time.
It is, but it is a waste of his time, and it's very likely he is an actual researcher. Why do you want him to finetune qwen, anyway? The person with the relevant nsfw photographic dataset is a bigasp guy, not lodestone.
>>
>>106468957
I've got the nude version of this BTW.

https://files.catbox.moe/xtbq2u.jpeg
>>
>>106468856
looks accurate to me, only it's missing people dressed as crows
>>
File: 1733542626615002.gif (569 KB, 400x266)
>mfw all the jumpscare ai videos coming in october
>>
>>106468957
The unaltered Tifa image is a benchmark for genning 3d pinups, I want to get a good look like that in Illustrious
>>
>>106468954
>inevitable qwen model
Qwen is too slow and seemingly trains too poorly to be of any real interest; you'd be better off finetuning t2i Wan, but even that will be slow and EXPENSIVE

I doubt it will happen, but we'll see
>>
>>106468948
kek, based
>>
>>106468987
>>106469081

Quit replying to bait, you knuckleheads
>>
what is this weird miku cult
>>
>>106469131
But I... yes, you are likely correct
>>
>>106469132
Why are you new?
>>
>>106469149
i dont care about miku. i dont get the appeal
>>
>>106469081
>trains too poorly
?
>>
>>106468930
what's pixel space
>>
>>106469132
It's not a cult.
>>106469155
>i dont care about miku. i dont get the appeal
I'm so fucking tired of this internalized unconscious transphobia from newfags.
>>
>>106469182
People threw their bs XL settings onto the model and decided it didn't work and ignored multiple people who have made great LoRAs for it already. So same as early flux days lol
>>
File: 1750315798740291.png (591 KB, 1125x900)
>upload girl to nano banana
>"change her shirt into a different shirt"
>it massively downsizes her tits from a C cup to an A cup
why
>>
>>106469218
Do you feel safe? The girl you were editing was clearly very unsafe for your health. Good guy google.
>>
>>106469218
it takes your gen's breast milk as payment. they have a full data center for AI breast milk
>>
File: 1747324221633417.png (602 KB, 1024x1024)
>>106469218
(and if you're curious why this is in the local diffusion general, it's because it's a reminder that corpo models, even when smart, are still retarded, because the industry leaders will make them retarded before you get to use them)

>>106469228
lol. god forbid she walks outside with that unsafe amount of fully clothed breasts in regular daywear
>>
>>106469182
Having done a few Qwen lora training tests: it's hard to get good results when you try new concepts that are largely unfamiliar to the base model, including just different art styles. I've been hearing the same from other people who also train; it's most likely due to the model being very overtrained.

Also it is very slow to train, meaning few will experiment to find the potential 'best settings', but perhaps they will emerge.
>>
File: WanVideo2_2_I2V_00301.webm (830 KB, 1248x720)
>>
>>106466582
>>106466642
Hero of the thread
>>
>>106469218
>Using Google AI
>Why is it censoring everything?
...
>>
>>106469287
still less censored than chyatgpt
>>
>>106469287
Sure, but that's not an outright rejection. Nor is there obscenity in the prompt or the image. It's just wild to me the lengths these people will go to. And for what?

I don't know how else to put it. It's wild that a sfw image is censored into a sfw image, completely randomly.
>>
>>106469258
It's kind of insane that you can generate this stuff on local

Yes, it loses likeness as the video progresses, but still, it's impressive; also, continued likeness could be improved by training a lora of the show
>>
>>106469294
lowest bar ever
>>
File: migu.mp4 (1.9 MB, 1280x720)
>>106469155
y-you don't like miku, anon?
>>
>>106469132
Asch effect
- Normative social influence
(fear of rejection in a social group)
- Informational social influence
(if everyone does the ritual, it must be right)

think of it as an endless cycle of insecure newfags. they see it here and adopt it. Oldfags leave, new ones come, and so the continuity is created.
>>
>>106469303
yeah thats what im saying
>>
File: AnimateDiff_00278.mp4 (2.77 MB, 720x960)
>>106468654
>>
File: Screenshot_.jpg (81 KB, 470x457)
>>106469258
"holy shit it's sonic's wife"
>>
>>106465845
>what was their endgame?
I am guessing they want to switch over because Lumina 2.0 came out and blew what Lumina-Next did out of the water, and to solve the remaining issues they probably encountered in the background. It's not like they disappeared, but they have gone dark since late last year. Their public domain dataset is still up, so even if they are gone, we still have a good dataset from them for training a foundational diffusion model that doesn't produce slop.
https://huggingface.co/datasets/Spawning/PD12M
>>
>>106469378
>first she has to suck a penis
>then she drops BB-8
>then BB-8 runs away from her
Man, it's been a rough first day of school for Jenny today.
>>
>>106469378
That didn't look like an accident...
>>
>>106469469
That's based on a real chick?
>>
new
>>106469492
>>106469492
>>106469492
>>106469492
>>
>>106469477
?
she's just surprised it came to life
>>106469487
yeah
>>
>>106469503
LMAO hope she doesn't see the stuff that menace with the oral insertion Lora was going around with. Is she an actress or something?
>>
>>106469551
no, she's a jewtuber. Jenny Nicholson
>>
>>106469571
>jewtuber
Ok I’m interested
>Jenny Nichols is a video essayist…
Boner killed. Gay.
>>
>>106469630
i just checked her page and she uploaded once in the past three years. this general is actually the premier source of Jenny content now lol
>>
>>106469571
>Jenny Nicholson
The likeness of your LoRA is very impressive, anon. I'm kinda into her now.... I might have to... I dunno... gen her without her clothes on.... maybe use the facialinsertion LoRA.... I dunno we'll see...
>>
>>106469253
I also agree with this. I haven't had much luck training styles, although I'm noticing now that higher dim values than expected are producing better results.
>>
File: ComfyUI_16567.png (3.08 MB, 1200x1600)
>>106469551
>Is she an actress or something?
She's the cutest girl in the whole wide world!

>>106469666
>this general is actually the premier source of Jenny content now lol
Jenny's super-burned out atm (you can hear her almost wanting to cry at getting nothing done) and struggling to produce a "short" video about one of her favorite bad movies. No timeline for that; my guess would be around Halloween, but she completely gave up on the Barbie movie ranking video she was working on up until July. So even her current project isn't a sure thing.
>>
>>106464677
nice
>>
>>106466582
nubcake here, is inpainting in comf actually worse?


