/g/ - Technology




File: tmp.jpg (780 KB, 3264x3264)
780 KB
780 KB JPG
Discussion of free and open source text-to-image models

Previous /ldg/ bread : >>102418421

>Beginner UI
EasyDiffusion: https://easydiffusion.github.io
Fooocus: https://github.com/lllyasviel/fooocus
Metastable: https://metastable.studio

>Advanced UI
Automatic1111: https://github.com/automatic1111/stable-diffusion-webui
ComfyUI: https://github.com/comfyanonymous/ComfyUI
Forge: https://github.com/lllyasviel/stable-diffusion-webui-forge
InvokeAI: https://github.com/invoke-ai/InvokeAI
SD.Next: https://github.com/vladmandic/automatic
SwarmUI: https://github.com/mcmonkeyprojects/SwarmUI

>Use a VAE if your images look washed out
https://rentry.org/sdvae

>Model Ranking
https://imgsys.org/rankings

>Models, LoRAs & training
https://civitai.com
https://huggingface.co
https://aitracker.art
https://github.com/Nerogar/OneTrainer
https://github.com/derrian-distro/LoRA_Easy_Training_Scripts
https://github.com/kohya-ss/sd-scripts/tree/sd3

>Flux
https://huggingface.co/spaces/black-forest-labs/FLUX.1-schnell
https://comfyanonymous.github.io/ComfyUI_examples/flux

>Pixart Sigma & Hunyuan DIT
https://huggingface.co/spaces/PixArt-alpha/PixArt-Sigma
https://huggingface.co/spaces/Tencent-Hunyuan/HunyuanDiT
https://huggingface.co/comfyanonymous/hunyuan_dit_comfyui
Nodes: https://github.com/city96/ComfyUI_ExtraModels

>Index of guides and other tools
https://rentry.org/sdg-link
https://rentry.org/rentrysd

>Try online without registration
txt2img: https://www.mage.space
img2img: https://huggingface.co/spaces/huggingface/diffuse-the-rest
sd3: https://huggingface.co/spaces/stabilityai/stable-diffusion-3-medium

>Maintain thread quality
https://rentry.org/debo

>Related boards
>>>/h/hdg
>>>/e/edg
>>>/c/kdg
>>>/d/ddg
>>>/b/degen
>>>/vt/vtai
>>>/aco/sdg
>>>/u/udg
>>>/tg/slop
>>>/trash/sdg
>>>/aco/aivg
>>
i heard this is the cozy thread
>>
>>102434655
many people are saying this
>>
>>102434662
>Flux is fundamentally fucked
True but not because it can't do styles (it can). It's fucked because it's too large to run comfortably on the average anon's card.
>>
>>102434682
>It's fucked because it's too large to run comfortably on the average anons card.
it can be run on 8GB VRAM cards though if you use quants, it's not asking for that much when you think about it
>>
>>102434694
Even with quants it's painfully slow.
>>
>>102434723
you can use schnell and get your image in 4 steps, but yeah, I prefer to wait more for the quality dev is giving
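if anyone wants to try schnell outside of Comfy, something like this works through diffusers (a minimal sketch, assuming a recent diffusers build with FluxPipeline; schnell is distilled, so CFG stays off and 4 steps is the intended count):

import torch
from diffusers import FluxPipeline

pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-schnell", torch_dtype=torch.bfloat16
)
pipe.enable_model_cpu_offload()  # offload so it fits on smaller cards

image = pipe(
    "a cozy log cabin in a snowy forest at dusk",
    num_inference_steps=4,   # schnell is a 1-4 step distilled model
    guidance_scale=0.0,      # distilled models ignore CFG
    height=1024,
    width=1024,
).images[0]
image.save("schnell_test.png")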
>>
>>102434568
All of those images are ai?
>>
>>102434780
yeah, that's the point
>>
>>102434799
Neat!
I never visit these threads, you guys seem cool!
>>
>>102434833
thanks :3
>>
>>102434329
I think I made a mistake, but I'm going to let the training finish before retrying. It seems that if you specify --network_args in additional parameters, it will go with defaults for certain settings, even if there are no arguments.
>>
https://github.com/THUDM/CogVideo/issues/291#issuecomment-2354409848
>We have not yet released the I2V model, it is expected to be released this month, and the final preparations for the model are underway
I thought we could already use it locally, or there's something I've missed
>>
So I added Pixtral as an option to my captioning scripts (only took a few lines of code, which was nice). Testing it out on both anime and real images. It's genuinely so fucking bad. Like it makes me think something is broken, but the text it generates is completely coherent. Literally 80% of what it describes in the image is wrong. Is it really that bad? It can't be, right? I'm using the Transformers implementation, not the mistralai one, if that matters. Maybe it's subtly broken somehow?
>>
>>102435216
no no, it's completely shit, that's all there is to it lol
>>
>>102435038
Being generous, even with local image gen we can say we're "close to" SaaS but with video we're not even in the same universe.
>>
>>102435309
>video we're not even in the same universe.
we can also add text to music/songs to that list, we're 10 universes away from suno/udio kek
>>
>>102435236
How do they even fuck up that bad then? Joycaption is two nn.Linear layers in a trained projector between a pretrained, frozen CLIP model and a pretrained, frozen LLM. Literally two fucking weight matrices. And it mogs Pixtral. Just embarrassing.
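for reference, the projector really is about this much code (my own sketch with made-up dims and the usual GELU in between, not JoyCaption's actual repo):

import torch
import torch.nn as nn

class TwoLayerProjector(nn.Module):
    # maps frozen CLIP image features into the frozen LLM's token-embedding space;
    # these two Linear layers are the only trained weights in the whole setup
    def __init__(self, clip_dim=1024, llm_dim=4096, hidden=4096):
        super().__init__()
        self.fc1 = nn.Linear(clip_dim, hidden)
        self.fc2 = nn.Linear(hidden, llm_dim)

    def forward(self, clip_features):
        return self.fc2(torch.nn.functional.gelu(self.fc1(clip_features)))

proj = TwoLayerProjector()
clip_features = torch.randn(1, 577, 1024)   # stand-in for frozen CLIP patch embeddings
soft_tokens = proj(clip_features)           # (1, 577, 4096), prepended to the LLM's text embeddings
# the frozen LLM then decodes the caption conditioned on these soft tokens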
>>
File: __00202_.png (2.83 MB, 1560x2280)
2.83 MB
2.83 MB PNG
>>
File: 00008-715896776.jpg (156 KB, 1080x1280)
156 KB
156 KB JPG
>>
>>102435355
Knowing MistralAI, they got roasted so hard by the LLM community for "selling their soul to the devil by keeping their best models as an API when they promised full open source releases" that they'll throw any kind of failed experiment shit at us as a way to say "see, we're still thinking of you peasants" :^)
>>
File: ComfyUI_01557_.png (1.5 MB, 1024x1024)
1.5 MB
1.5 MB PNG
so what comes after Flux?
>>
>>102435402
>so what comes after Flux?
Flux video
https://blackforestlabs.ai/up-next/
>>
>>102435402
Salvation from this Mortal Coil
>>
>>102435407
woah better than I expected.
what about image generators tho?
>>
>>102435431
>what about image generators tho?
I don't want to doom, but we'll probably never get a base model with the quality of Flux, the best we can do now is to finetune that mf to make it even better I guess
>>
It's all matrices all the way down.
Matrices shaped into other matrices viewed into other matrices all turned into a loss function.
>>
File: 00073-4237509925.png (1.01 MB, 1024x1024)
1.01 MB
1.01 MB PNG
>>
File: 00030-2814665992.jpg (887 KB, 1080x1280)
887 KB
887 KB JPG
>>
File: 00010-2034227891.png (1.25 MB, 1024x1024)
1.25 MB
1.25 MB PNG
>>
>>102435444
trust the plan
>>
File: 00113-2577591480.png (1.32 MB, 1024x1024)
1.32 MB
1.32 MB PNG
>>
File: amputate the mutant.png (1.59 MB, 1344x768)
1.59 MB
1.59 MB PNG
Good Comfy workflows to automate hand fixes? The Impact pack to detect and fix faces works great, but the automatic hand detection/fixing seems to have little or no positive impact
>>
File: 00197-1997104261.png (1.22 MB, 1024x1024)
1.22 MB
1.22 MB PNG
>>
File: 00032-3559231248.jpg (779 KB, 1080x1280)
779 KB
779 KB JPG
>>
File: bidler2.png (3.92 MB, 2000x2000)
3.92 MB
3.92 MB PNG
>>
>>
File: 00008-464414774.jpg (752 KB, 1080x1280)
752 KB
752 KB JPG
>>
File: 00010-2797418403.jpg (903 KB, 1080x1280)
903 KB
903 KB JPG
>>
File: 00063-1422458154.png (1.07 MB, 1024x1024)
1.07 MB
1.07 MB PNG
>>
>>102435407
shits gonna be slow as fuck even on a 4090
>>
>>102435777
yeah, for example CogVideoX-5b: 8 fps + 50 steps took 2 min on an H100, we're fucked lol
>>
File: 00744.png (861 KB, 1024x1024)
861 KB
861 KB PNG
>>
File: temp_vcmlh_00012_.png (1.05 MB, 832x1216)
1.05 MB
1.05 MB PNG
>>
File: yodalady.png (895 KB, 752x896)
895 KB
895 KB PNG
>>
File: rattigieg1.png (462 KB, 512x672)
462 KB
462 KB PNG
>>
Miraculous thread
>>
File: campaign.png (3.87 MB, 2048x1288)
3.87 MB
3.87 MB PNG
>>
Dare I say... us?
>>
>>102435841
kek, really based image not gonna lie
>>
File: 00005-376976035.png (1.55 MB, 1024x1024)
1.55 MB
1.55 MB PNG
>>
File: 00021-3845168184.jpg (867 KB, 1080x1280)
867 KB
867 KB JPG
>>
File: 00000-376976030.png (1.36 MB, 1024x1024)
1.36 MB
1.36 MB PNG
>>
>>102435850
I was at work and nearly kek'd out loud when my nigga sent me that pic
>>
File: 00017-2754352315.png (1.58 MB, 1024x1024)
1.58 MB
1.58 MB PNG
>>
File: 00080.png (1.03 MB, 896x896)
1.03 MB
1.03 MB PNG
>>
File: 00000-3227682127.png (1.34 MB, 1024x1024)
1.34 MB
1.34 MB PNG
>>
>>102435841
No, I like manual artists too
>>
>>102435916
I like artists who respect AI, those who want AI dead won't be missed if they lose their jobs lol
>>
File: 00003-3498046325.png (1.35 MB, 1024x1024)
1.35 MB
1.35 MB PNG
>>
Next one is an actual gen I promise lol
>>
File: 00000-24400397.png (1.53 MB, 1024x1024)
1.53 MB
1.53 MB PNG
>>
>>102435923
Fair but I'd always prefer the olive branch first
>>
File: 00001-2456067307.png (1.48 MB, 1024x1024)
1.48 MB
1.48 MB PNG
>>
>>
File: 00000-2731582637.png (997 KB, 1024x1024)
997 KB
997 KB PNG
>>
File: 00021-2397995919.png (885 KB, 1024x1024)
885 KB
885 KB PNG
>>
File: 00009-2456132599.png (945 KB, 1024x1024)
945 KB
945 KB PNG
>>
File: 00017-2674112797.png (960 KB, 1024x1024)
960 KB
960 KB PNG
>>
File: 00005-3322961658.png (1.32 MB, 1024x1024)
1.32 MB
1.32 MB PNG
>>
>>102436032
would
>>
File: ComfyUI_temp_aydfz_00891_.png (3.59 MB, 1434x1434)
3.59 MB
3.59 MB PNG
>>
>>102436072
shameless motherfucker
>>
File: 00037-2412469408.png (1.26 MB, 1024x1024)
1.26 MB
1.26 MB PNG
>>
>>102435782
>+ 50 steps
what if flux video can do with fewer steps?
if we only need like 4 it could be okay
>>
File: -.png (1.48 MB, 1344x768)
1.48 MB
1.48 MB PNG
>>
>>102435695
I think everyone forgot about hand detailers now that flux gets it right most of the time
>>
File: ComfyUI_temp_aydfz_00719_.png (2.49 MB, 1434x1434)
2.49 MB
2.49 MB PNG
>>
>>102436172
Haven't heard good things about flux for anime
>>
File: ComfyUI_temp_aydfz_01095_.png (3.75 MB, 1434x1434)
3.75 MB
3.75 MB PNG
>>
>>102436192
Its default style isn't the most pleasant but there are several style loras that are pretty good
>>
File: 00030-345097352.png (1.43 MB, 1024x1024)
1.43 MB
1.43 MB PNG
>>
File: ComfyUI_temp_ueozj_00085_.png (2.13 MB, 1434x1434)
2.13 MB
2.13 MB PNG
>>
File: ComfyUI_temp_aydfz_00961_.png (3.38 MB, 1434x1434)
3.38 MB
3.38 MB PNG
>>
>>102436215
It doesn't get booru tags, though, from what I've heard, which really limits what you can do

Maybe I'll check it out if a proper full finetune is released, but it seems pretty inferior right now
>>
File: ComfyUI_temp_plsgq_00177_.png (1.48 MB, 1024x1024)
1.48 MB
1.48 MB PNG
>>
File: ComfyUI_temp_ueozj_00028_.png (3.7 MB, 1434x1434)
3.7 MB
3.7 MB PNG
>>
File: ComfyUI_temp_plsgq_00046_.png (1.16 MB, 1024x1024)
1.16 MB
1.16 MB PNG
>>
File: ComfyUI_temp_aydfz_01072_.png (3.89 MB, 1434x1434)
3.89 MB
3.89 MB PNG
>>
File: ComfyUI_temp_aydfz_00916_.png (2.55 MB, 1434x1434)
2.55 MB
2.55 MB PNG
>>
File: konosuba flux dev v1.jpg (826 KB, 3072x1024)
826 KB
826 KB JPG
>>102436245

A booru-tag style dataset works. Just cook your own waifu LoRA with booru tags.
>>
File: ComfyUI_temp_aydfz_00988_.png (3.41 MB, 1434x1434)
3.41 MB
3.41 MB PNG
>>
File: 00000-575325872.png (1.59 MB, 1024x1024)
1.59 MB
1.59 MB PNG
>>
File: 00000-1280067345.png (1.34 MB, 1024x1024)
1.34 MB
1.34 MB PNG
>>
File: 00000-1778453302.png (1.62 MB, 1072x1072)
1.62 MB
1.62 MB PNG
>>
File: 00001-2333064770.png (1.56 MB, 1024x1024)
1.56 MB
1.56 MB PNG
>>
>>102436317
I'm the opposite - I prefer to cook my own LoRas for style, and I don't really care about the characters it recognizes, but more the concepts from boorus. How well it understands specific concept tags like "bob cut", "doll joints", "goo girl", "yandere", etc. is what's really important, to give random examples

I'm sure it can infer those to some extent, but the reports I've heard say that it's far inferior to the SDXL models in that area at the moment
>>
File: 00004-3527511055.png (1.6 MB, 1024x1024)
1.6 MB
1.6 MB PNG
>>
File: 00009-125602019.png (1.52 MB, 1024x1024)
1.52 MB
1.52 MB PNG
>>
File: 00003-3016189836.png (1.49 MB, 1024x1024)
1.49 MB
1.49 MB PNG
>>
File: 00011-3199069743.png (1.43 MB, 1024x1024)
1.43 MB
1.43 MB PNG
>>
File: 00006-2785877804.png (58 KB, 768x768)
58 KB
58 KB PNG
>>
File: 00005-42579165.png (1.36 MB, 1024x1024)
1.36 MB
1.36 MB PNG
>>
File: 00007-3347182046.png (1.56 MB, 1024x1024)
1.56 MB
1.56 MB PNG
>>
File: 00431.png (958 KB, 1024x1024)
958 KB
958 KB PNG
>>
>>102436479
Beautiful
>>
File: ComfyUI_temp_cupog_00068_.png (1.36 MB, 1024x1024)
1.36 MB
1.36 MB PNG
>>
File: 00444.png (1.2 MB, 1024x1024)
1.2 MB
1.2 MB PNG
>>
File: dino_00163_.png (1.05 MB, 1024x1024)
1.05 MB
1.05 MB PNG
for training, do you guys prefer adafactor, adamw8, LION, prodigy, or something else?
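they're mostly drop-in swaps in a training loop, something like this (rough sketch, assumes the usual bitsandbytes / lion-pytorch / prodigyopt / transformers packages; Prodigy wants lr around 1.0 since it adapts its own step size):

import torch

def make_optimizer(params, name="adamw8bit", lr=1e-4):
    # rough sketch -- swap these in your trainer and compare runs on the same dataset
    if name == "adamw":
        return torch.optim.AdamW(params, lr=lr, weight_decay=0.01)
    if name == "adamw8bit":
        import bitsandbytes as bnb
        return bnb.optim.AdamW8bit(params, lr=lr, weight_decay=0.01)
    if name == "adafactor":
        from transformers.optimization import Adafactor
        return Adafactor(params, lr=lr, scale_parameter=False, relative_step=False)
    if name == "lion":
        from lion_pytorch import Lion
        return Lion(params, lr=lr / 3, weight_decay=0.01)   # Lion usually wants a lower lr
    if name == "prodigy":
        from prodigyopt import Prodigy
        return Prodigy(params, lr=1.0)   # Prodigy adapts its own step size, start at 1.0
    raise ValueError(f"unknown optimizer: {name}")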
>>
File: ComfyUI_temp_cmdfs_00445_.png (1.31 MB, 1024x1024)
1.31 MB
1.31 MB PNG
>>
File: ComfyUI_00014_.png (1 MB, 832x1216)
1 MB
1 MB PNG
I couldn't get flux to run in comfy using the github documentation. Is there a manual for what everything is doing? I really don't want to be doing the pull-it-together-from-a-billion-workflows bullshit again. I understand basic concepts, but have no idea why there are over a dozen sampler options as part of comfy core.
>>
File: ComfyUI_temp_aydfz_00154_.png (1.68 MB, 1024x1024)
1.68 MB
1.68 MB PNG
>>
File: chrome_fynJTPGFRx.gif (2.17 MB, 237x186)
2.17 MB
2.17 MB GIF
is there an automatic1111 implementation of the technique in the new Steve Mould video? https://www.youtube.com/watch?v=FMRi6pNAoag
i'd love to make some twisty squares pics
>>
>>102436568
no more half measures walter
>>
File: workflow.jpg (203 KB, 1873x780)
203 KB
203 KB JPG
>>102436535
this is my basic ass workflow
>>
File: ComfyUI_temp_aydfz_00639_.png (3.66 MB, 1291x1291)
3.66 MB
3.66 MB PNG
>>
File: 00001-430385072.png (1.67 MB, 1024x1024)
1.67 MB
1.67 MB PNG
>>
File: 00010-2461924006.png (1.86 MB, 1024x1024)
1.86 MB
1.86 MB PNG
>>
File: temp_vcmlh_00046_.png (1.03 MB, 640x1536)
1.03 MB
1.03 MB PNG
>>
>>102436479
>>
File: 00014-1077822227.jpg (2.87 MB, 2912x1840)
2.87 MB
2.87 MB JPG
Do any other anons ever get sad when the seasons change
>>
>>102436535
euler_cfg_pp is the only sampler that matters
>>
>>102436610
she looks like you crossed nick cage and scarlet johansson
>>
>just train layers 7 and 20 single blocks at 128 dim, bro
>resulting lora is a blurry mess, likeness isn't captured well, and looks much uglier than a lora trained at 8 dim
>>
File: ComfyUI_33521_.png (939 KB, 736x1024)
939 KB
939 KB PNG
>>102432426
7 styles and 2 characters.
https://mega.nz/folder/mtknTSxB#cGzjJnEqhEXfb_ddb6yxNQ
https://mega.nz/folder/ekklHRgY#pH4JkFk-kFj4c09r-EtmpQ
>>
>>102435373
Nice
>>
>>102437186
Yeah you can't trust advice in anything ML dev related because nobody who is being loud knows what they're doing. The only way to make anything work is to experiment yourself.
>>
File: ComfyUI_33944_.png (1.67 MB, 1280x720)
1.67 MB
1.67 MB PNG
>>
File: ComfyUI_33905_.png (1.38 MB, 768x1024)
1.38 MB
1.38 MB PNG
>>
File: ComfyUI_33912_.png (1.5 MB, 768x1024)
1.5 MB
1.5 MB PNG
>>
File: temp_vcmlh_00017_.png (1.01 MB, 832x1216)
1.01 MB
1.01 MB PNG
>>
File: ComfyUI_00032_.png (1.46 MB, 1216x832)
1.46 MB
1.46 MB PNG
>>102436703
Robert Frost - Reluctance, or most of his stuff. You aren't alone anon.

>>102436996
this caused pixelization and garbage.
>>
File: ComfyUI_33965_.png (1.26 MB, 1280x720)
1.26 MB
1.26 MB PNG
>>
Is there any way to raise flux adherence to the prompt? I can't get more than about a dozen poses, and god help me if I want something I wouldn't find on a stock image site
>>
File: ComfyUI_Flux_14047.jpg (390 KB, 832x1216)
390 KB
390 KB JPG
>>
File: bComfyUI_114630_.jpg (1.61 MB, 3072x1536)
1.61 MB
1.61 MB JPG
>>
File: 1167752492635494298-SD.png (1.68 MB, 896x1152)
1.68 MB
1.68 MB PNG
best party evaaaaaaaaaaa
>>
Well it's working, but fuck it's slow
>>
>>102435692
whats with all the sudden love for sandniggers? i say fuck them both.
also, go back.
>>
https://github.com/aigc-apps/CogVideoX-Fun/tree/main/comfyui

Has anyone tried this yet? Looks sus af, don't wanna get some malware
>>
>>102440072
workflow link?
>>
>>102440170
https://comfyanonymous.github.io/ComfyUI_examples/flux/
I got it from here, I don't really know what I'm doing though
>>
>>102440180
>I don't really know what I'm doing though
welcome to the club
>>
>>102440072
just so you know, a 4090 with that exact workflow is only double the speed... which is to say the per iteration speed is still slow in comparison to SDXL and obviously SD1.5. In terms of money it's not too bad since an XTX is about half the cost of a 4090, so getting half the speed is okay.
>>
>>102440245
That's good to know, I was kind of curious where this sat in terms of performance.
>>
>>102440155
No idea, it's the 2B parameter model according to the screenshots, not the 5B one that is out on huggingface and modelscope, and img2vid is due in maybe a day, maybe two weeks.
I'm going to use the online ones (when huggingface is back up and/or I can be bothered to register at modelscope with its rune language)
https://huggingface.co/spaces/THUDM/CogVideoX-5B-Space (currently down)
https://modelscope.cn/studios/ZhipuAI/CogVideoX-5b-demo
(rune language)

Some other links:
https://github.com/kijai/ComfyUI-CogVideoXWrapper
https://modelscope.cn/models/ZhipuAI/CogVideoX-5b
>>
>>102440072
Are you using the zluda comfy?

Try this workflow

https://openart.ai/workflows/onion/flux-gguf-q8-12gb/X5HzyhrKjW2jqHVCTnvT

unet: flux1-dev-Q8_0.gguf
clip: t5-v1_1-xxl-encoder-f16.gguf
clip: ViT-L-14-BEST-smooth-GmP-TE-only-HF-format.safetensors


https://huggingface.co/city96/FLUX.1-dev-gguf
https://huggingface.co/city96/t5-v1_1-xxl-encoder-gguf
https://huggingface.co/zer0int/CLIP-GmP-ViT-L-14/tree/main
>>
>>102440505
>Are you using the zluda comfy?
Yes

I'll give it a try tomorrow it's 1AM and it basically took me 4-5 hours to get this working after faffing around with a1111
>>
>>102440245
How long does it take on a 4090 to generate 20 steps with flux dev?
>>
>>102440354
The 5b model is linked right on the GitHub page
>>
>>102440542
15 seconds for me using the same settings
>>
File: 1491272745.png (1.43 MB, 1024x1024)
1.43 MB
1.43 MB PNG
>>
File: file.png (105 KB, 2304x467)
105 KB
105 KB PNG
https://github.com/THUDM/CogVideo/tree/CogVideoX_dev
I2V CogVideoX will be released on huggingface tomorrow
>>
>>102440505
Alright I did it anyway and it is working but the clip gguf doesn't show up, the unet does but if I unselect it I can't choose it again
>>
I wonder why the chinese are so involved in machine learning. Is it just a byproduct of their investments into scientific research in general, or could it be a more directed effort?
>>
>>102441125
What about this? Seems like there's a branch of CogVideo that allows for every resolution or something, I wish it would also allow for more than 8fps lol
https://huggingface.co/spaces/alibaba-pai/CogVideoX-Fun-5b
https://github.com/aigc-apps/CogVideoX-Fun
>>
>>102441187
>I wonder why the chinese are so involved in machine learning.
AI is the most important tool of the 21st century, and it's a good way to get investor money right now, so of course people are going for it
>>
File: tmpm41y86p5.png (1.14 MB, 896x1152)
1.14 MB
1.14 MB PNG
>>
>it's just the mathematical average of all the images it was trained on!
>no it's not
Okay, explain.
>>
>>102441309
>inb4 it's the machine spirit, I ain't gotta explain shit
>>
>>102441187
This field only needs a couple dozen people to care, and most of the gimmicks of AI are great for social media engagement.
>>
>>102441190
cant install it
 error: subprocess-exited-with-error

× Getting requirements to build wheel did not run successfully.
exit code: 1
─> [20 lines of output]
[WARNING] Unable to import torch, pre-compiling ops will be disabled. Please visit https://pytorch.org/ to see how to properly install torch on your system.
[93m [WARNING] [0m unable to import torch, please install it if you want to pre-compile any deepspeed ops.
DS_BUILD_OPS=1
Traceback (most recent call last):
File "G:\ComfyUI\venv\lib\site-packages\pip\_vendor\pyproject_hooks\_in_process\_in_process.py", line 353, in <module>
main()
File "G:\ComfyUI\venv\lib\site-packages\pip\_vendor\pyproject_hooks\_in_process\_in_process.py", line 335, in main
json_out['return_val'] = hook(**hook_input['kwargs'])
File "G:\ComfyUI\venv\lib\site-packages\pip\_vendor\pyproject_hooks\_in_process\_in_process.py", line 118, in get_requires_for_build_wheel
return hook(config_settings)
File "<string>", line 155, in <module>
AssertionError: Unable to pre-compile ops without torch installed. Please install torch before attempting to pre-compile ops.
[end of output]

note: This error originates from a subprocess, and is likely not a problem with pip.
error: subprocess-exited-with-error

× Getting requirements to build wheel did not run successfully.
exit code: 1
─> See above for output.

note: This error originates from a subprocess, and is likely not a problem with pip.


damn it chang
>>
>>102441581
>AssertionError: Unable to pre-compile ops without torch installed. Please install torch before attempting to pre-compile ops.
looks like you have to install torch before doing this process?
>>
>>102441642
too much effort
>>
>>102441689
it's working on this comfyUi node too, maybe it's easier
https://github.com/kijai/ComfyUI-CogVideoXWrapper
>>
>>102441703
>Note that while this one can do image2vid, this is NOT the official I2V model yet, though it should also be released very soon.
I wonder what the difference will be between this one and the tomorrow one
>>
>>102441154
Did you place the Q8 in "unet" folder and the clips in "clip" folder?

Make sure you have ae.safetensors in the "vae" folder too

https://huggingface.co/black-forest-labs/FLUX.1-dev/tree/main

You have to then install the missing nodes, restart comfyui and select the unet, clip and vae in each node
>>
>>102441759
And remove the dualcliploader and add the dualcliploader (gguf) if you can't see the gguf clip
>>
>>102441642
I already have torch installed, that chang repo just sucks, I'll go with kijai instead

"(venv) G:\ComfyUI\custom_nodes\CogVideoX-Fun>python -c "import torch; print(torch.__version__)"
2.4.1+cu124"
>>
>>102441836
>ill go with kijai better
yeah, you know you won't get weird ass errors when going for kijai's repos, this guy is really good
>>
>>102435692
based
>>
File: ComfyUI_33976_.png (456 KB, 1280x720)
456 KB
456 KB PNG
>>
File: ComfyUI_33907_.png (1.52 MB, 768x1024)
1.52 MB
1.52 MB PNG
>>
File: ComfyUI_33946_.png (1.65 MB, 1280x720)
1.65 MB
1.65 MB PNG
>>
File: ComfyUI_00735_.png (1.41 MB, 1024x1024)
1.41 MB
1.41 MB PNG
>>102441125
Nice.
>>
File: cogvideo.png (90 KB, 895x507)
90 KB
90 KB PNG
>>102441854
well here i go, vramlets are not going to like this img2video model lel
>>
>>102442031
how many GB of VRAM does it need? there's also the fp8 version if it's too big? which one are you using?
>>
File: cogvideo2.png (79 KB, 451x721)
79 KB
79 KB PNG
>>102442042
it's using 16.9 GB of 24 (I have a 4090)
and I'm using the CogVideoX-Fun-5b-InP model, I already downloaded it from the fun repo
>>
>>102442089
compared to Flux it's not asking for that much, and like I said, it can still be quantized to Q8, the quality will be similar and it will only ask for probably 9 GB of memory
>>
File: cogvideo3.png (135 KB, 1663x691)
135 KB
135 KB PNG
>>102442042
>>102442089
fuckkk, all that electricity used, down the drain
>>
File: file.webm (85 KB, 512x512)
85 KB
85 KB WEBM
>>102441190
>https://github.com/aigc-apps/CogVideoX-Fun
kek
>>
Anyone know what kind of "language" you're supposed to use with SD 1.5?
I've been using Pony for a month now and it's come out fairly good, but trying SD 1.5 and the ai is clearly confused as fuck by what I'm trying to tell it. It looks like an acid trip during a fever dream.
>>
File: 00012-3850524567.png (2.18 MB, 1120x1440)
2.18 MB
2.18 MB PNG
>>
File: CogVideotest.webm (407 KB, 640x384)
407 KB
407 KB WEBM
>>102442156
Ok, 512 base resolution with 24 steps, not bad
>>
File: 00031-1208650182.jpg (738 KB, 1600x960)
738 KB
738 KB JPG
>>102442268
base image
>>
File: file.png (226 KB, 1799x1343)
226 KB
226 KB PNG
>>102442031
>10 mn of wait to get a 5 sec + 8fps bad video on a 4090
it's over, the only way to make this viable (if we were being serious about reaching MiniMax level) would be to create a BitNet model, the inference would be way faster because your GPU only does additions instead of matrix multiplications
https://arxiv.org/abs/2402.17764
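the whole trick is that ternary {-1, 0, +1} weights turn a matvec into signed sums, roughly like this (toy emulation on ordinary tensors, not the paper's actual kernel):

import torch

def ternary_matvec(W, x):
    # W entries are in {-1, 0, +1}, so each output element is just a signed sum of
    # activations -- on real BitNet kernels this is adds/subs only, no multiplies
    pos = (W == 1).float()
    neg = (W == -1).float()
    return pos @ x - neg @ x

W = torch.randint(-1, 2, (4, 8)).float()   # pretend these are the quantized weights
x = torch.randn(8)
assert torch.allclose(ternary_matvec(W, x), W @ x, atol=1e-5)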
>>
>>102442268
did you use the original prompt for the first frame as you did in the img2vid process?
>>
When I use inpainting to fix eyes, do I redo the prompt for fixing eyes or do I just keep using the same prompt and make 20 copies until it makes good eyes?
>>
so what's the next step after becoming a prompt engineer
>>
>>102442402
Learn photoshop to touch up the outputs
>>
>>102442339
Nope, just used "fashion week footage, model walking down the ramp"

>>102442315
512 base with 24 steps takes 90 seconds, not bad at all

768 base makes me OOM when VAE decoding, I'm trying to use 768 base with a resized image, hopefully there will be a solution like tiled vae decoding or something
>>
>>102442402
Earning a degree in synthology
>>
>>102442375
I typically only put character, eye color, and a few style tags. You can also check out adetailer, it will automatically img2img the face with the generation which usually clears up any eye issues.
>>
>>102442415
Please try it again with the original prompt and the modifications you want to happen during the clip. Anecdotally I got better results when the huggingface version was active than just feeding it an image and letting her rip with hardly any guidance.
>>
>>102442428
I see. I'll check that out thanks.
>>
File: file.png (80 KB, 2099x222)
80 KB
80 KB PNG
>>102442415
>512 base with 24 steps takes 90 seconds, not bad at all
but when you look at that >>102442031
it's 5 min for 768 base (24 steps), how can it be this much slower when going from 512 to 768 lol
>>
File: cogvideo4.png (43 KB, 351x475)
43 KB
43 KB PNG
>>102442415
768 base resolution gives me OOM again, I'm trying fewer frames now
>>
>>102442461
maybe you should go for fp8 instead? can GGUF be a thing for video models?
>>
File: CogVideotest2.webm (445 KB, 912x624)
445 KB
445 KB WEBM
768 base with 25 steps worked

>>102442435
I'm testing quality right now

>>102442449
Because my original screenshot had 50 steps, that's why it took like 8 minutes; since then I'm just using 24 steps
>>
>>102442491
>Because my original screenshot had 50 steps, thats why it took like 8 minutes, since then im just using 24 steps
that's what I said, it took 1:32 + 8:04 = 9:36 for 50 steps -> ~4:30 for a 24-step inference at 768 base, and then at 512 base it's only 1:30? damn that's a big difference if you ask me
>>
>>102442491
meant 25 frames**
>>
>>102442491
>quality
Ok, but with the original prompt or without?
>>
File: cogvideo5.png (108 KB, 951x763)
108 KB
108 KB PNG
fuck this gay earth, now all it gives me is OOM errors, VRAM leak?
>>
>>102442627
kek there is a vae tiling option in the decode node
>>
In my adventures with 16-channel VAEs I may have got Pixart Sigma compatible with the Osiris one. Also I want to try playing with Dynamic Routing in the patches, maybe teach them to pay attention to useful patches.
>>
File: CogVideotest3.webm (389 KB, 976x584)
389 KB
389 KB WEBM
>>
>>102442683
Big if true
>>
>>102442702
honestly tolerable for fapping
>>
>>102442683
>the Osiris one
the what?
>>
File: file.png (592 KB, 1361x735)
592 KB
592 KB PNG
>>102442712
I mean it's noise and the learning rate was going down. So theoretically yes. Going to do a regular 600m VAE training test.

>>102442725
https://huggingface.co/ostris/vae-kl-f8-d16
>>
>>102442713
I feel that, back then porn videos were only 360p so I'm used to low quality coom videos, that's good enough for me kek
>>
File: file.png (59 KB, 1783x626)
59 KB
59 KB PNG
>>102442736
interesting, seems like the SD3 one is the best of them all, now I want to see if the Flux VAE beats that
>>
File: gen3.webm (1.87 MB, 1280x768)
1.87 MB
1.87 MB WEBM
>>102442702
runwayml gen3 for comparison
>>
>>102442766
# this is how I load it (config.vae_models_dir and device come from my own config)
import torch
from diffusers import AutoencoderKL

vae = AutoencoderKL.from_pretrained(config.vae_models_dir, allow_pickle=True).to(device).to(torch.float16)


anything that is compatible with this should "work"
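a quick roundtrip sanity check along those lines (my own sketch -- assumes the ostris repo linked above loads directly as a diffusers AutoencoderKL):

import numpy as np
import torch
from PIL import Image
from diffusers import AutoencoderKL

# load the 16-channel f8 VAE and push one image through encode/decode
vae = AutoencoderKL.from_pretrained("ostris/vae-kl-f8-d16").to("cuda", torch.float16)

img = Image.open("test.png").convert("RGB").resize((512, 512))
x = torch.from_numpy(np.array(img)).permute(2, 0, 1).float() / 127.5 - 1.0   # scale to [-1, 1]
x = x.unsqueeze(0).to("cuda", torch.float16)

with torch.no_grad():
    latents = vae.encode(x).latent_dist.sample()   # (1, 16, 64, 64) for a 16ch f8 VAE at 512px
    recon = vae.decode(latents).sample

print("mean abs reconstruction error:", (recon.clamp(-1, 1) - x).abs().mean().item())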
>>
>>102442778
I prefer the low res over smeared HD
>>
>>102442736
It appears that everyone forgot about trying to do this once Flux released but I hope you get further.
>>
>>102442713
yes, and the cool thing is that it allows img2video. i've used every img2video online service there is and they all suck, it's like playing a coin slot machine and there's also the fucking censorship, they suck
>>
>>102442778
do we know if MiniMax will do an I2V at some point in time? if yes it will crush the competition so hard...
https://reddit.com/r/StableDiffusion/comments/1fjvyc5/would_you_listen_to_this_band/
>>
>>102442801
It's obvious we're not fine tuning Flux any time soon and there's room for truly open < 2B models
>>
File: CogVideotest4.webm (542 KB, 1216x728)
542 KB
542 KB WEBM
960 base resolution worked, 1024 gave me OOM right away, now I'm gonna test more steps and schedulers for quality
>>
File: file.png (1.75 MB, 1024x1024)
1.75 MB
1.75 MB PNG
https://civitai.com/models/772407/flux-gas-room-lora?modelVersionId=863913
OY VEY
>>
>>102442766
flux is the best one
it was flux>aura>sd3>ostris when i was trying them
>>
>>102442977
also aura can sometimes beat flux, it's like flux >= aura
>>
>>102442977
Is it AutoencoderKL compatible? Do you have a link? I can swap out my osiris one.
>>
File: CogVideotest5.webm (513 KB, 1216x728)
513 KB
513 KB WEBM
>>102442848
50 steps, don't see much improvement for the longer genning time desu
>>
>>102443005
https://huggingface.co/black-forest-labs/FLUX.1-schnell/tree/main/vae for flux
https://huggingface.co/AuraDiffusion/16ch-vae for aura
and it should be its what i was using
>>
>>102442848
>>102443021
looks like some ps1 3d glitch it usually had https://youtu.be/x8TO-nrUtSI?t=74
>>
>>102440088
So true! We folks should donate another gorrilion $$$ to Israel - our greatest ally!!!
>>
>>102443054
just looks like a different scaling factor is all
>>
>>102443021
the official weights were released a few minutes ago
https://huggingface.co/THUDM/CogVideoX-5b-I2V/tree/main
https://github.com/THUDM/CogVideo
>>
>>102443109
>Apache-2.0 license
that's pretty based, even though the model kinda sucks, it's the best one we have so far and we can do whatever we want with them
>>
>>102443109
next step is I2V2I, once you can use key frames it's ogre
>>
>>102443270
this, I absolutely loved the I2V2I shit on Luma, the transitions are so smooth
https://www.youtube.com/watch?v=KshorouF0s4
>>
>>102443109
cant wait for the porn finetunes
>>
>>102443109
are these the same weights as those ones, or? >>102441190
>>
File: rife.webm (587 KB, 640x384)
587 KB
587 KB WEBM
>>102442268
looks pretty rough alright
>>
File: CogVideotest7.webm (606 KB, 840x1024)
606 KB
606 KB WEBM
>>
>>102443656
that's really rough, either 5b isn't enough to get something interesting, or they still haven't mastered this technology (which is completely fair)
>>
>>102443681
chud wants minmax/gen3 level of video quality in his shitty local computer lel
>>
File: qwen2-72b-vl.png (275 KB, 1484x1921)
275 KB
275 KB PNG
>>102434568
Qwen2-VL-72B released, it's a vision model.
https://huggingface.co/Qwen/Qwen2-VL-72B-Instruct
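if you want to script it for captioning, something like this should work through transformers (rough sketch -- assumes transformers >= 4.45 with the Qwen2-VL classes, uses the 7B checkpoint as a stand-in since 72B won't fit on one card, and the official example handles image preprocessing slightly differently):

import torch
from PIL import Image
from transformers import AutoProcessor, Qwen2VLForConditionalGeneration

model_id = "Qwen/Qwen2-VL-7B-Instruct"   # 72B needs multi-GPU; 7B as a stand-in
model = Qwen2VLForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)
processor = AutoProcessor.from_pretrained(model_id)

messages = [{
    "role": "user",
    "content": [
        {"type": "image"},
        {"type": "text", "text": "Describe this image in detail for a training caption."},
    ],
}]
prompt = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
image = Image.open("sample.png")

inputs = processor(text=[prompt], images=[image], return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=256)
caption = processor.batch_decode(
    out[:, inputs["input_ids"].shape[1]:], skip_special_tokens=True
)[0]
print(caption)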
>>
>>102443873
that's possible if we use GGUF quants, before Flux we thought it was impossible to get something at the same level as dalle or MJ locally
>>
>>102443933
is there a demo we can try somewhere? because I can't run a fucking 72b model, they're tripping
>>
>>102443873
trust the plan anon
https://blackforestlabs.ai/up-next/
>>
>>102443958
here you go, all you have to do is search HF's spaces section https://huggingface.co/spaces?sort=trending&search=qwen2
>>
File: file.png (802 KB, 800x600)
802 KB
802 KB PNG
>>102444026
https://huggingface.co/spaces/Qwen/Qwen2-VL
>The image depicts an anime-style character sitting on a wooden surface, likely a desk or a table, near a window. The character has short, brown hair and is wearing a white shirt with rolled-up sleeves, green pants, and a black tie. The character is barefoot and appears to be in a relaxed or contemplative pose, with one leg bent and the other extended.
>The character is holding a fork in their right hand, which is positioned near their mouth, suggesting they are about to take a bite. In front of the character, on the wooden surface, there is a small tray with a piece of food on it, possibly a cake or a pastry. The background shows a window with a view of greenery outside, indicating that it might be daytime. The overall atmosphere of the image is calm and serene, with soft lighting and a warm color palette.

>The character is holding a fork in their right hand
it's fucking over, GPT4V remains the king, only that model understands she's holding her fork with her left foot
>>
>>102444117
can't you just manually change the caption of such an autistic image?
>>
>>102444239
>every image that isn't about a 1girl standing in front of the picture is autistic
Sorry anon, but there's a reason we want good vision models: not all the pictures we caption are "1girl, standing". we're not simpletons like you, sometimes we have complex pictures that should be captioned well
>>
>>102444239
You're everything that's wrong with this field. if image models can't do complex poses and scenes, that's because they were trained on pictures that were captioned wrongly, there's no excuse to be made, it should be accurate in every situation
>>
>>102444263
the rest of the caption is really good, the shitty nitpick you're pointing out is just an autistic sperg, the point of captioning is for training/gen purposes
>>
>>102444332
>the point of captioning is for training/gen purposes
yes? and if it makes mistakes then you'll get bad training, are you retarded or something? it should do complex poses and not just your average coomerbrain 1girl standing, you fucking moron
>>
>>102444313
not enough people eating food with their feet in their dataset i guess
>>
File: file.png (542 KB, 1425x1228)
542 KB
542 KB PNG
>>102444356
yet GPT4V does it good, you reek of mediocrity, do you really need OpenAI to tell you that everything is possible?
>>
>>102444354
those kinds of mistakes can be easily fixed, what kind of "complex" pose is that anyway, it's just an autistic image
>>
>>102444413
>those kind of mistakes can be easily fixed
yeah sure, when you pretrain a model it has hundreds of millions of pictures, what's the point of fixing them all manually if you can instead have a good model that is reliable enough? do you get the point of having those models in the first place? They exist so that you don't have to do the work manually, goddamn you're fucking retarded anon
>>
>>102444434
>pretrain a model, it has hundreds of millions of pictures

lmao people here can barely train a shitty lora and you're sperging out about >muh millions of pictures
really anon, are you really training a model with "millions of pictures"
>>
>>102443109
plebbit says they are not the official weights for img2vid as they said they were releasing them tomorrow, ofc it is already tomorrow in the Middle Kingdom, so maybe they are.
Anyone support this theory about them being the official weights?
>>
So the difference between this and /sdg/ is /sdg/ doesn't care about privacy and /ldg/ is strictly open source and not inherently network connected? New to this.
>>
>>102444469
why do you believe Flux is the last base model we'll ever get? The pretraining is the most important part of making a good model, the finetuning is just the cherry on top. we need good vision models so that the people who will work on a new base model get quality training data; to get quality training you need good captions, and to get good captions you need a good vision model. I hope that helped
>>
>>102444470
it's really confusing to me, like there's 1 day of difference between those weights and the official weights, I'm sure they are the same, we'll get the truth tomorrow I guess
>>
>>102444470
they are already released anon, it's china, their tomorrow is today for us

https://huggingface.co/THUDM/CogVideoX-5b-I2V
>>
>>102444604
yeah but there was this that released the models before
https://github.com/aigc-apps/CogVideoX-Fun?tab=readme-ov-file#model-zoo
https://pai-aigc-photog.oss-cn-hangzhou.aliyuncs.com/cogvideox_fun/Diffusion_Transformer/CogVideoX-Fun-5b-InP.tar.gz
I guess they're the same weights as the official ones?
>>
File: possed.png (68 KB, 1297x419)
68 KB
68 KB PNG
>>102444636
its pozzed
>>
>>102444808
who are those aigc-apps guys anyway? they finetuned the "original" weights to make them work at multiple resolutions or something? Do they work with THUDM? They appeared out of nowhere and improved this shit, I won't complain though, that's cool it can do multi resolution kek
>>
File: CogVideotest8.webm (629 KB, 840x1024)
629 KB
629 KB WEBM
>>102444808
CogVideoX-Fun can do anything

the official one can't do shit
>>
>>102444857
>CogVideoX-Fun can do anything
>the official one can't do shit
so far, the only thing "Fun" can do that the official can't is multiresolution, is there some other stuff that it can do that the official can't?
>>
File: CogVideoX-I2V_00002.webm (1.55 MB, 720x480)
1.55 MB
1.55 MB WEBM
>>102444840
yeah, with the "fun" one I can generate at any resolution. the official THUDM one is resolution locked and giving me shitty results, but I think it's the wrapper's fault rather than the original repo
>>
Anyone used fast training on civitai?
>>
>>102444905
>i think its the wrapper fault
you should update the wrapper, he seems to have made it work with the original weights
https://github.com/kijai/ComfyUI-CogVideoXWrapper/commit/6729faa4717c5049aba9031ecb67ff245e63121f
>>
>>102444857
What if it comes out that this AI was just a bunch of cam girls doing shit that looks like AI at low res. Lol
>>
>>102444636
>I guess they're the same weights as the official ones?
if you know how to read chink there's the technical report of "Fun" on how they managed to make it work on different resolutions
https://blog.csdn.net/weixin_44791964/article/details/142205114
>>
>>102444931
I updated it but it needs time, the fun one seems better already, it can even run on fp8

THUDM got cucked on their release day lol
>>
>>102445240
>THUDM got cucked on their release day lol
that's so weird, I'm sure they had no idea some other company had their weights and improved on them, or if they knew, maybe they didn't expect them to release the weights. there's no logical explanation for why they would officially release the "bad weights" if they knew something better was being cooked
>>
https://github.com/aigc-apps/CogVideoX-Fun?tab=readme-ov-file#model-zoo
there's no fp8 quant of that one right?
>>
File: file.png (136 KB, 632x500)
136 KB
136 KB PNG
>>102435355
>Literally two fucking weight matrices. And it mogs Pixtral. Just embarrassing.
B-but... le heckerino youtuber said it was good!!
https://www.youtube.com/watch?v=7aGTKJJMb5w
>>
>>102441309
>human art is just the mathematical average of all the art the human has experienced before!
>no it's not
Okay, explain.

>inb4 "soul"

Both statements are dramatic oversimplifications of the process. Human brains are very complex and difficult to explain. Neural nets as well, to a slightly lesser extent right now.
>>
File: file.png (33 KB, 381x204)
33 KB
33 KB PNG
>>102445355
you download the model and you can choose the precision on the node
https://github.com/kijai/ComfyUI-CogVideoXWrapper
>>
>>102443933
Any other 96GB VRAMchads get this to run locally? I tried the 8-bit GPTQ version; when loading the model it complains that all the bias weights were not initialized. Then running inference immediately fails because it complains some value was NaN (caused by the aforementioned weights not loading? idk). Currently downloading the fp16 version, will try that with bitsandbytes quantization.
>>
>>102445585
Neural nets are not difficult to explain. The math is simple. AI is also deterministic and can only produce autocompletions of patterns it has encountered in the training. The human brain, on the other hand, can actually conjure up new concepts completely foreign to the training dataset. Someone figured out the wheel. Someone figured out electromagnets. Don't let your atheist m'lady hat stop you from critically thinking especially when you're attempting to humanize these dumb autocomplete algorithms.
>>
File: 0.jpg (158 KB, 1024x1024)
158 KB
158 KB JPG
>>
>>102445654
Neurons are not difficult to explain. The math is simple. It's all just voltage. The brain works deterministically based on learned patterns and initial input conditions.
>>
>>102445736
>Neurons are not difficult to explain. The math is simple.
that's not the point, we have no idea what the role of each weight is, so we can't control that shit. like imagine you want to remove censorship from a model: if you knew how it worked you would deactivate the bad weights, but we can't, because we have no idea how it works. it's just a black box, a black box that does simple calculations to transform an input into an output
>>
>>102445830
Explainable AI is a thing anon, you're a bit behind the times.
>>
>>102445862
never said the field doesn't exist, but they haven't found much from it. like I said, if for example I asked you to identify the weights that hold the concept of Miku in Flux and to remove that concept, you wouldn't be able to do it, because we don't really know how to do something like that
>>
>>102445830
You are simultaneously describing the human brain and a neural network.
>>
>>102441187
Technology for automation will always be useful.
>>
>>102445736
actually retard, they are very difficult to explain given they run on the piece of cheese you ate for breakfast this morning, do yourself a favor and grow up and mature, being a 14 year old atheist on the internet worshipping an autocomplete algorithm is pathetic and you'll always be a retard when you don't see AI for what it is
it's not a magic box
>>
>>102445885
Find the part in the human brain describing "Hatsune Miku" and completely remove it without damaging anything else

A really good brain surgeon might be able to find part of it during a very long, difficult, and risky surgery, but completely removing all memory and concept of Hatsune Miku without leaving a trace, and without removing anything else? Good luck
>>
>>102445885
that's irrelevant when we literally understand how to put together AI neural networks: they're just stacked-up matrices trained with gradient descent, flowing information forward and trying to autocomplete a given pattern from the data samples they were trained on, based on a set of known inputs and outputs
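stripped to its bones, that's all it is (toy two-matrix net fit with gradient descent, nothing else going on):

import torch

# two weight matrices trained to map known inputs to known outputs -- that's the whole story
torch.manual_seed(0)
X = torch.randn(256, 8)                     # known inputs
Y = torch.sin(X.sum(dim=1, keepdim=True))   # known outputs (the pattern to autocomplete)

W1 = torch.randn(8, 32, requires_grad=True)
W2 = torch.randn(32, 1, requires_grad=True)

for step in range(2000):
    pred = torch.tanh(X @ W1) @ W2          # information flows forward through the matrices
    loss = ((pred - Y) ** 2).mean()         # how far off the autocompletion is
    loss.backward()                         # gradient of the loss w.r.t. every weight
    with torch.no_grad():                   # gradient descent: nudge each weight downhill
        W1 -= 0.01 * W1.grad
        W2 -= 0.01 * W2.grad
        W1.grad.zero_()
        W2.grad.zero_()

print("final loss:", loss.item())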
your brain doesn't even work close to this way
>>
>>102445995
>your brain doesn't even work close to this way
never said it works the same way, my point is that both of them are black boxes: we cannot modify shit manually. I wish we could for neural networks, imagine the possibilities, censorship removal, concept removal...
>>
File: Diddy.webm (708 KB, 512x384)
708 KB
708 KB WEBM
in 1992, Rare paid $275,000 for each SGI workstation to make the prerendered graphics in Donkey Kong Country
nowadays that tech is trivial
some day we're going to be able to immediately make feature-length movies with a click of a button
>>
>>102445995
The electricity screams into the void, proudly claiming itself to be more than electricity
>>
>>102446014
It's not a black box, you are just exaggerating because you have a biased worldview, so you pretend an autocomplete algorithm is the same as your brain, which processes a trillion times more context and memory in a trillionth of the time using a trillionth of the power, while also being capable of learning on the fly and INVENTING things. We literally understand how AIs work. We are not even one iota close to understanding how the human brain works. It's always funny too, because you people always end up saying soul (trying to be ironic), but it's just an admission.
>>
>>102446095
*$275K is already accounting for 2024 inflation
>>
>>102446095
back then, GPUs weren't even close to the architecture limit, now we got 4nm shit and we can't go much lower
>>
>>102446113
>We literally understand how AIs work.
we don't know shit, and the fact we can't do any manual manipulation on the network proves my point
>>
>>102446132
Manually manipulate a jet engine with your bare hands
>>
My GPU just blew up. I am in Lebanon.
>>
We have to come to terms with the fact that we simply lack the required efficiency to push parameter counts high enough. Some neurons in the cortex require around 1k artificial neurons to emulate, even at billions of parameters we're still orders of magnitude off.
>>
we know how AI do but do we know why AI do?
>>
>>102446343
We're far away from even just having models train while also being able to evaluate, the whole process is pure brute force.
>>
>>102446305
send more lebanese women to texas bro
>>
>>102446381
Even harder, we need a solid reward function to even know if training is progressing correctly, and that model would itself somehow need to be tuned. Kind of a recursive issue.
>>
>>102445995
>Stacked up matrices using gradient descent
Neurons
>they flow information from 0-1
Neurotransmitters
>trying to autocomplete a given pattern
Electricity flows through neurons in ways that were advantageous for natural selection, flows through neural nets in ways that were advantageous to being selected as favorable by humans. Neither neurons nor neural nets are "trying" to do anything, they're just naturally fulfilling the patterns they evolved on
>given data samples it was trained on based on a set of known inputs and outputs
Human evolutionary behaviors and lived experiences
>>
>>102446403
AGI is just contemporary cold fusion.
>>
File: file.png (225 KB, 570x727)
225 KB
225 KB PNG
you vill learn the flux latents
>>
>>102446343
Maybe analog computers will be the next big AI breakthrough
>>
File: 00076-453299283.png (2.33 MB, 1120x1440)
2.33 MB
2.33 MB PNG
>>
File: file.png (1.68 MB, 3400x1579)
1.68 MB
1.68 MB PNG
>rescaled width 720
>rescaled height: 480
>base_resolution: 512
>Final video: width = 608 + height = 416
what the fuck?
>>
>>102446555
sd3 and resize your image manually for your first run.
>>
>>102446499
Maybe, they definitely introduce some more stochastic behavior into the system, probably important to reach a certain level of dynamic behavior.
>>
>>102446628
I don't think I understand, can you elaborate anon
>>
Another bread, has arrived...
>>102446651
>>102446651
>>102446651
>>
>>102446660
resize your image manually for your first run, and sd3.
>>
File: 1726690182703685_edit.png (1.06 MB, 3400x1579)
1.06 MB
1.06 MB PNG
>>102446660
I am not sure about the SD3 thing, but it doesn't match the example on the website.

The image resize thing just adds complexity while you are figuring things out.
>>
File: file.png (897 KB, 2384x1549)
897 KB
897 KB PNG
>>102446829
>I am not sure about the SD3 thing, but it doesn't match the example on the website
that's the example from the repo though
https://github.com/kijai/ComfyUI-CogVideoXWrapper
>>
>>102446861
yup and they are using fp8. It is probably fine, but you are getting errors you don't like.
>>
>>102435695
If it's that hand fixing method that uses MeshGraphormer, the way it's meant to work is hand detection (bounding box) -> fitting a hand mesh to the deformed hand as well as possible with MeshGraphormer -> rendering the depth map of the fitted hand mesh, which is thresholded to also provide a mask -> inpainting the region defined by the dilated mask, using a depth ControlNet with the hand depth map as an input.
For anime, based on what I've seen using the hand refinement stuff in the Acly Krita plugin, either the bounding box hand detector or the MeshGraphormer fails to detect/fit the hand, because neither was trained on anime I think.
While training a bounding box detector for anime hands would be doable and might already exist, a specialized MeshGraphormer would be harder. Maybe using a synthetic dataset with toon-shaded anime models, perhaps augmented with light img2img.
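in rough pseudocode the whole flow is something like this (every helper name here is a hypothetical placeholder for the corresponding node, not a real Impact/ComfyUI API):

def fix_hands(image, inpaint_pipe, depth_controlnet):
    # hypothetical sketch of the MeshGraphormer hand-fix flow described above;
    # detect_hand_boxes / fit_hand_mesh / render_depth / dilate are placeholders, not real functions
    for box in detect_hand_boxes(image):              # bounding-box hand detector
        crop = image.crop(box)
        mesh = fit_hand_mesh(crop)                    # MeshGraphormer fits a hand mesh to the deformed hand
        depth = render_depth(mesh)                    # depth map of the fitted mesh
        mask = dilate(depth > 0)                      # thresholded depth -> dilated inpaint mask
        fixed = inpaint_pipe(image=crop, mask=mask,
                             controlnet=depth_controlnet,
                             control_image=depth)     # inpaint the masked region, guided by the hand depth
        image.paste(fixed, box)
    return image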
>>
>>102441782
Oh yup that was where I was stuck, it was working with the default one though



All trademarks and copyrights on this page are owned by their respective parties. Images uploaded are the responsibility of the Poster. Comments are owned by the Poster.