/g/ - Technology


Thread archived.
You cannot reply anymore.




File: long dick general.jpg (3.95 MB, 3264x2790)
General dedicated to creative use of free and open source text-to-image models.

Previous /ldg/ bread : >>101292106

Renaissance Edition

>Beginner UI
Fooocus: https://github.com/lllyasviel/fooocus
Metastable: https://metastable.studio
EasyDiffusion: https://easydiffusion.github.io

>Advanced UI
Automatic1111: https://github.com/automatic1111/stable-diffusion-webui
StableSwarmUI: https://github.com/Stability-AI/StableSwarmUI
InvokeAI: https://github.com/invoke-ai/InvokeAI
ComfyUI: https://github.com/comfyanonymous/ComfyUI

>Auto1111 forks
SD.Next: https://github.com/vladmandic/automatic
Forge: https://github.com/lllyasviel/stable-diffusion-webui-forge
Anapnoe UX: https://github.com/anapnoe/stable-diffusion-webui-ux

>Use a VAE if your images look washed out
https://rentry.org/sdvae

>Models, LoRAs & Training
https://civitai.com
https://huggingface.co
https://github.com/Nerogar/OneTrainer
https://github.com/derrian-distro/LoRA_Easy_Training_Scripts

>Pixart Sigma & Hunyuan DIT
https://huggingface.co/spaces/PixArt-alpha/PixArt-Sigma
https://huggingface.co/spaces/Tencent-Hunyuan/HunyuanDiT
Comfy Nodes: https://github.com/city96/ComfyUI_ExtraModels
*Also supported by SD.Next

>Animation
https://rentry.org/AnimAnon
https://rentry.org/AnimAnon-AnimDiff
https://rentry.org/AnimAnon-Deforum

>Index of guides and other tools
https://rentry.org/sdg-link
https://rentry.org/rentrysd

>View and submit GPU performance data
https://vladmandic.github.io/sd-extension-system-info/pages/benchmark.html
https://docs.getgrist.com/3mjouqRSdkBY/sdperformance

>Try online without registration
txt2img: https://www.mage.space
img2img: https://huggingface.co/spaces/huggingface/diffuse-the-rest
sd3: https://huggingface.co/spaces/stabilityai/stable-diffusion-3-medium

>Share image prompt info
https://rentry.org/hdgcb
https://catbox.moe

>Related boards
>>>/h/hdg
>>>/e/edg
>>>/d/ddg
>>>/b/degen
>>>/vt/vtai
>>>/aco/sdg
>>>/trash/sdg
>>
Blessed thread of frenship
>>
File: 0.jpg (438 KB, 2048x1024)
>>
>>101301739
nice collage
>>
File: 00001-3183164047.jpg (554 KB, 1344x2016)
>>
File: tmp0_i6_9_9.png (1.14 MB, 1344x768)
>>
File: cce3.jpg (115 KB, 1024x1024)
>>
File: 00214-3467391479.jpg (398 KB, 1176x1764)
>>
>>101301617
Anybody know?
>>
>>101302023
youtube has a tutorial on it
>>
File: 00232-3467391481.jpg (721 KB, 1260x1680)
>>
File: 00261-3467391478.jpg (381 KB, 1260x1680)
ripley
>>
>>101301625
please put the prompt in SD3 and post the results. Let's see if there is any "improvement".
And of course there is a large memory requirement. You need to swap CLIP for a beefy LLM in order to get good textual understanding, and LLMs require lots of memory. There will be optimizations down the line, but the truth is simply that the next generation of image models requires significantly more memory than Stable Diffusion 1.4 or SDXL.

If you are looking at buying GPUs and you want to play with the best local models in the future, you will need to invest in something that has at least 16GB of VRAM. It's the price you pay for superior textual understanding. If not, then you can continue to use Stable Diffusion 1.4 or whatever on your 4GB card. These researchers are always targeting the 90-class card for their releases: something that a normal consumer could get access to.
If the 5090 has 32GB of vram, then you can bet your ass that the next generation of diffusion models from research labs are targeting that.
>>
File: 1713153346418885.png (24 KB, 751x415)
https://github.com/plemeri/InSPyReNet
i'm looking for an .onnx for this model but i cant for the life of me find it, if it even exists. i've found a .pth but the imagesegmentation on comfyui requires it to be .onnx as i understand it.
>>
File: 2356215439.jpg (76 KB, 896x768)
>>
File: 00316-3467391479.jpg (618 KB, 1260x1680)
>>
File: kolor vs sd3.jpg (607 KB, 1696x1732)
>>101301625
>>101302212
>>
File: 00473-663714552.jpg (330 KB, 941x1260)
>>
>>101302279
can this model run the llm first, then do the inference to save vram like SD3? or since it's unet, both have to be used at once?
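A back-of-the-envelope way to compare the two loading strategies asked about here. This is only a sketch: the function name and the sizes are made up for illustration, it counts weights only (no activations, VAE or framework overhead), and it assumes the prompt embeddings are computed once before denoising starts, in which case UNet vs DiT shouldn't change the picture.

```python
def peak_weight_vram_gb(text_encoder_gb: float, denoiser_gb: float, sequential: bool) -> float:
    """Rough peak memory for model weights only (hypothetical helper)."""
    if sequential:
        # Encode the prompt, free the text encoder, then load the denoiser.
        return max(text_encoder_gb, denoiser_gb)
    # Both sets of weights are resident at the same time.
    return text_encoder_gb + denoiser_gb


# Hypothetical sizes, chosen only for illustration.
both_loaded = peak_weight_vram_gb(12.0, 5.0, sequential=False)
one_at_a_time = peak_weight_vram_gb(12.0, 5.0, sequential=True)
print(both_loaded, one_at_a_time)
```

Loading one stage at a time trades VRAM for extra load time, since the weights have to be moved in and out between stages.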
>>
File: 240707_122.jpg (63 KB, 512x768)
it's not local but fun. cant stop generating since yesterday.
I want strong machine
>>
File: 00004-663714549.jpg (1.1 MB, 1890x2520)
>>101302875
fun hobby, can't recommend enough
>>
File: 00005-299241702.jpg (1.03 MB, 1890x2520)
>>
File: 00006-2111909322.jpg (1.08 MB, 1890x2520)
>>
For some reason this model likes to use yellow background
>>
Any way to run Kolors with comfy yet?
>>
File: canvas (1).png (1.56 MB, 840x1256)
>>101302949
>>101303002
>>101303107
Model?
>>
File: PA_0001.jpg (795 KB, 2560x1536)
>>
>>101303904
https://github.com/kijai/ComfyUI-KwaiKolorsWrapper
>>
File: PA_0002.jpg (763 KB, 2560x1536)
>>
File: PA_0003.jpg (812 KB, 2560x1536)
>>
File: PA_0023.jpg (763 KB, 2560x1536)
>>
>>101304302
Sick pixel art
>>
File: PA_0026.jpg (685 KB, 2560x1536)
>>
File: PA_0027.jpg (634 KB, 2560x1536)
>>101304326
Thanks
>>
>>101304378
Np, I just realized this one >>101304116 is also very cool
>>
>>101304093
Where is quantized text encoder?
>>
File: PA_0029.jpg (942 KB, 2560x1536)
>>101304116
>>101304135
This prompt always gens awesome things
>>
>>101304420
what was the prompt? still using the new bunline? really awesome
>>
File: KOLORS_00001_.png (3.71 MB, 1920x1088)
>>
Is kolors a model created from scratch or is it just a hypertuned XL?
>>
>>101304504
That's what creates pixels all over it.

Noita is a magical action roguelite set in a world where every pixel is physically simulated. Fight, explore, melt, burn, freeze and evaporate your way through the procedurally generated world using spells you've created yourself.
>>
>>101304558
all we need is a comfy node, the UNET is about the same size as XL.
>>
File: PA_0034.jpg (681 KB, 2560x1536)
>>
>>101304595
You forgot the gigabytes of text encoder. That's the big problem for vramlets!
>>
File: PA_0035.jpg (860 KB, 2560x1536)
>>
File: PA_0036.jpg (993 KB, 2560x1536)
>>101304504
It's a mix of Bunlinev8 and booruMadness
>>
>>101304618
can't you run that on RAM? or do it in a similar way to sd3? also >>101304419
>>
>>101304647
>booruMadness
you try https://civitai.com/models/505948/pixart-sigma-1024px512px-animetune ?
>>
>>101304618
Can you use that text encoder for PixArt?
>>
File: PA_0037.jpg (1 MB, 2560x1536)
>>
File: PA_0038.jpg (986 KB, 2560x1536)
>>101304665
No, I don't think I have. Don't usually gen 1girl unless the thread derails
>>
>>101304419
can you use the SD3 T5 on this model?
>>
>>101304678
i imagine you might be able to coax some non-1girl out of it but im just speculating ive never tried myself
>>
>>101304669
From the workflow image it seems different. It's called ChatGLM3 model.
>>
File: PA_0039.jpg (917 KB, 2560x1536)
>>
File: KOLORS_00013_.png (1.96 MB, 1024x1024)
KOLORS works pretty well right out of the box. Really impressive for a base model.
I think they really overstated the prompt adherence though.
>>
File: PA_0041.jpg (628 KB, 2560x1536)
>>101304689
I like that the model maker posted some Training stats

Below I will list the GPU and training time I used for my training. Please use it as a reference for your training!

If you want to know the exact settings, please download the onetrainer data.

GPU: RTX 4060 Ti 16GB

■ 512px

Batch size: 48

70,000 / 48 ≈ 1,500 steps

1 epoch: 5 hours

15 epochs: 75 hours

GPU usage: 13GB
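
Quick sanity check of those numbers as a small sketch. It assumes the 70,000 figure is the number of training images (the stats above don't say so explicitly), and the variable names are made up for illustration.

```python
import math

dataset_size = 70_000   # assumption: number of training images
batch_size = 48
hours_per_epoch = 5
epochs = 15

# One optimizer step per batch; the last, partial batch still counts as a step.
steps_per_epoch = math.ceil(dataset_size / batch_size)
total_hours = epochs * hours_per_epoch

print(steps_per_epoch)  # 1459, which the stats above round to 1,500
print(total_hours)      # 75
```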
>>
>>101304716
the model is uncensored, but what about copyright, can you throw some anime characters in there? maybe some artist names?
>>
File: KOLORS_00028_.png (962 KB, 768x1024)
>>
>>101304788
how are you making these?
>>
File: KOLORS_00029_.png (1.12 MB, 768x1024)
>>101304785
Give me a prompt and I'll see what it spits out
>>
please keep your fatjak on his leash, he's running free in sdgs yard again
>>
File: KOLORS_00033_.png (1.24 MB, 1024x1024)
>>101304801
idk, it was posted here
https://github.com/kijai/ComfyUI-KwaiKolorsWrapper
>>
>>101304841
thanks i will check it out
>>
For 2D gens, DPM++ 2S a is probably not mandatory
>>
>>101304803
NTA
90s retro anime screencap, an illustration inspired by the anime 'Ghost in the Shell', Depict a woman in a stasis tank, capturing the iconic aesthetic and atmosphere of the anime.
>>
i will wait for the quant 8/4 version and a proper implementation before trying it. the life of a 8gb VRAMlet is not easy.
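
Rough numbers on what an 8-bit or 4-bit quant buys you for the weights alone. This is a sketch under loud assumptions: the 6 billion parameter count is a placeholder for a ChatGLM3-sized text encoder, not a figure from the Kolors release, and real quantized files carry some extra overhead (scales, unquantized layers) that is ignored here.

```python
BYTES_PER_PARAM = {"fp16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_size_gib(n_params: float, dtype: str) -> float:
    """Size of the weights alone in GiB for a given storage type (hypothetical helper)."""
    return n_params * BYTES_PER_PARAM[dtype] / 2**30


N_PARAMS = 6e9  # assumption: ~6B-parameter text encoder

for dtype in BYTES_PER_PARAM:
    print(dtype, round(weight_size_gib(N_PARAMS, dtype), 1))
```

Under those assumptions only the 8-bit and 4-bit versions leave room for anything else on an 8GB card.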
>>
File: KOLORS_00061_.png (1.51 MB, 1024x1024)
>>101304903

I regret to inform you that it bungled the assignment
>>
File: KOLORS_00056_.png (1.46 MB, 1024x1024)
>>101304939

Another one
>>
File: PA_0042.jpg (257 KB, 2560x1536)
>>101304689
Didn't do anime for me.
>>
File: KOLORS_00063_.png (1.51 MB, 1024x1024)
>>101304953
>>101304939
Here's one after putting things like 3D, CGI etc in the negatives.
>>
File: KOLORS_00075_.png (1.59 MB, 1024x1024)
I just don't think it gets what a stasis tank is.
>>
>>101304939
>>101304953
>>101304966
I don't see the resemblance, it does understand retro/90s though
>>
File: KOLORS_00084_.png (1.49 MB, 1024x1024)
>>101305016

I think it suffers from the same issue pixart has in that most of the captioning is done by an LLM and the LLM is just describing what it sees. So while a person captioning might say, "That's ghost in the shell" the LLM will say "That's a 90's anime that appears to depict a woman with purple hair" etc etc
>>
>>101305016
Maybe you can save some tokens if you put

>timeless 1girl
>>
I can't go back to unet after trying out dit thodesu
>>
File: file.jpg (880 KB, 1664x2304)
>>101301739
Nice
>>
>>101305034
>most
i thought it was more like 50/50?
>>
File: file.jpg (995 KB, 1664x2304)
>>101303930
maybe https://civitai.com/models/550737?modelVersionId=612841
>>
>2023-12-26
1.5 just HITS DIFFERENT
>>
>>
>>
>>
>>101305536
nice
>>
>>101304959
are you using booru tags?
>>
File: PA_0001.jpg (497 KB, 2560x1536)
>>101305770
It's a weird model, had to use the maker's workflow to test things out (didn't go well) so I'm back to my original one, modified a bit to look like his. Finally getting some results.
>>
File: PA_0011.jpg (446 KB, 2560x1536)
>>
File: file.png (3.7 MB, 1664x2304)
>>
Does the new update to a1111 make it as good as forge? Has anyone tried?
>>
File: PA_0012.jpg (333 KB, 2048x2048)
>>101305986
>>101305822
>>
File: file.png (3.06 MB, 1664x2304)
>>101306210
nice sigma anime
>>
File: PA_0391.png (1.3 MB, 1024x1024)
https://files.catbox.moe/n348s1.png

All the progress I've made with it as a workflow.

I would also suggest using a full description (about a paragraph) of what you're trying to make.
For example:
Gandalf is described as a tall, slender man with a long white beard and bushy eyebrows that stick out beyond the brim of his hat. He wears a tall pointed blue hat, a long grey cloak, and a silver scarf. He is often depicted as having a wise and authoritative presence.
>>
File: kolors.png (1.69 MB, 1024x1024)
tldr; on kolors? tried it on a huggingface space and it's looking pretty good! is there a way to offload the text encoder to cpu like with pixart or sd3? so far we now have this and hunyuan. there's also the new lumina and pixart bigma models to be released. seems like /ldg/ will be eatin good soon.
>>
>>101306419
samples look good but I can't run it either
>>
File: ComfyUI_temp_eclgh_00030_.png (1.65 MB, 1024x1024)
My thoughts on kolors so far: It's extremely good for a base model. Like really good.
As for prompt adherence, it's kinda hit or miss. It doesn't understand brands or people or IPs unless they're ludicrously famous. Jesus, pikachu etc. Don't type in megumin and expect to get megumin. Even if she's in the dataset, she probably wasn't tagged as such because it looks like it was tagged by an LLM.
>>
>>101306759
hands lookin good my man
>>
>>101306759
if its prompt adherence is so so, what do you like about it?
>>
File: file.png (3.68 MB, 1664x2304)
>>
>>101306778
I think the outputs look good on their own, but it's held back by its prompt adherence. It honestly might be because it was trained on Chinese as well as English. Like is a strong word though. I'm lukewarm on it.
>>
what matters prob just 'vintage'
>>
>>101306759
Characters and concepts can be added later easily. When I look at a base model, all I care about are just three things:
>Prompt following
How complicated can my prompt be with multiple elements/concepts
>Image quality
How nice looking and crisp are the images.
>Anatomy
Are humans and animals anatomically correct with no extra or merged limbs.

The only thing I don't like about Kolors is the prompt following. It feels like it does better in Chinese, and always translating your shit is a hassle, but doable. It has pretty basic image quality, lower than base SDXL.
It does, however, have superior anatomical understanding. It's on par with Dalle 3 in that regard.
I wonder if you could swap the LLM out for something more basic and still keep the anatomical understanding. English speakers don't need the Chinese understanding, and the prompt following just isn't there, so if the image quality and anatomy stay the same without it, the LLM isn't very useful and is pure bloat.
>>
I hold a doctorate in applied synthography
>>
>>101306960
Applied Synthology
>>
>>101306960
pfff thats nothing compared to promptsmithing.
Get a real job loser
>>
>manual image generation
>>
File: file.png (3.57 MB, 1664x2304)
>>
>>101303930
>>101305494
yeah that's the one
>>
>>101306936
Too true, I think anyone who complains about characters or artists is silly, because an individual can cook up an artist lora or character lora so easily on their own and share it with others.
>>
File: HANDS.png (2.43 MB, 2048x1024)
>>101304716
>>101302279
Yeah, that's probably the best local model we got there, it has good anatomy, is uncensored, can do hands! Too bad it's fucking unet and not DiT though...
>>
File: file.png (1.65 MB, 1024x1024)
man, 1girl posters would love this model... if they could run it. fingers crossed they release a version where you can offload the llm onto cpu.
>i could not gen it with her fingers crossed
>>
File: file.jpg (988 KB, 1920x2176)
>>
>>101307281
>unet and not DiT
Does it matter if it works? Is DiT more efficient?
>>
>>101307360
>Is DiT more efficient?
It's way more efficient, look at Sora for example, that's an example of a good DiT model

Ultimately, Kolors could possibly be the step forward from SDXL, but we can do even better with DiT models (unfortunately we got shit DiT models like SD3, pixart, hunyuan)
>>
>>101307375
Isn't pixart's only "fault" being under trained? I don't think that puts it on the same level of suck as the other two.
>>
File: 16.jpg (32 KB, 344x281)
>>101307375
>unfortunately we got shit DiT models
surely there is no correlation between DiT and bad quality
>>
>>101307391
Did you purposely stop reading at the part where I said that Sora is a DiT model?
https://www.youtube.com/watch?v=h37A4zocIFg
>>
>>101302279
>>101307281
>>101304716
the only mildly interesting images ive seen from that model
>>
>>101307469
For a base model that's insane, way better than anything we ever got, finetuning this shit will be a blast
>>
>>101307476
fair, the system reqs sound a bit harsh however
>>
File: file.jpg (825 KB, 1920x2176)
1girl
>>
File: file.png (1.08 MB, 1024x1024)
kolors does some pretty good expressions like base pixart, doesn't feel soulless.
>>
>>
Kolors has a fucky licensing situation going on. It claims to be apache-2.0
>The code of this project is open-sourced under the Apache-2.0 license
and then next sentence is
>We sincerely urge all developers and users to strictly adhere to the open-source license
>https://huggingface.co/Kwai-Kolors/Kolors/blob/main/MODEL_LICENSE
And in that new license they forbid any commercial usage and require you to contact them for a "new license" if you intend to use it commercially.

All the projects have apache-2.0 tags. They claim that the code only is under apache and that the model is under their own license. I wonder if it is all legally sound.
>>
>>101307568
i'm sorry to break your enthusiasm anon, but there's nothing about its expressions that speaks sovl
>>
>>101307595
sovl
>>101307568
sovlless
>>
>>101307604
im talking compared to the plastic dogshit sd models usually shit out
>>
>>101307603
>And in that new license they forbid any commercial usage and require you to contact them for a "new license" if you intend to use it commercially.
Caught my attention as well. That's basically the SAI license terms with chink characteristics.
>>
>>101307622
They're both equally unremarkable in terms of sovl. Even this >>101307616 anon gets it.
>>
File: license.png (22 KB, 669x356)
>>101307624
I had GPT-4o read through their new license and analyze it. These are the main issues it points out.
>>
File: file.png (615 KB, 1024x1024)
>>101307639
i was planning on genning an image with extra sovl to make you eat your words, but i ended up just proving you right instead
>>
>>101307693
i do indeed enjoy that one
>>
File: file.png (1.99 MB, 1024x1024)
>>
>>101307700
i like it too, but it cannot compare to what base pixart gave me
>>
>>101307603
>code has license A
>model has license B
it's not a difficult concept
>>
File: file.png (1.21 MB, 1024x1024)
>>
But can it do booba
>>
>>101307720
But there is more. I looked up the questionnaire that you have to fill out and send to them, and you are also required to accept some new agreements.

https://kolors.kuaishou.com/agreement
https://kolors.kuaishou.com/policy

These fucking people.
>>
>>101307761
What did you expect? It's tech demo bait, same as SD3
>>
>>101307805
>What did you expect? It's tech demo bait, same as SD3
the problem is that it's a GOOD demo bait, SD3 sucks ass so we don't bother with this shit, but we'd love to tinker with Kolors
>>
File: awdefsrd.png (370 KB, 606x678)
>>101307761
>https://kolors.kuaishou.com/agreement
>https://kolors.kuaishou.com/policy
>>
>>101307822
kek
>>
File: file.png (1.72 MB, 1024x1024)
>>101307745
yeah.. er.. meow?
>>101307822
it's all so tiresome
>>
>>101307896
Just curb your enthusiasm, curiously observe and adapt. It's simple as.
>>
File: 00722-2857992323.jpg (365 KB, 1260x1680)
>>101307595
>>
in pony prompts are you supposed to use underscores or not? what people do doesn't seem consistent because you see the typical score_9, score_8_up, etc, but you also see all other tags like "green eyes," which have no underscore and yet they're trained on booru tags which actually do have underscores on those websites. but I also see a lot of "source_cartoon, source_anime".
>>
File: 00049-2079579033.png (1.57 MB, 1024x1024)
>>
>>101307925
>in pony prompts are you supposed to use underscores or not?
You only really want to use them for two specific cases. The full score_schizo prefix, and source_mongolian. When it comes to regular prompts, with the way the tokenizer seems to work, it makes a very marginal difference. I just ran a couple of quick comparisons and it's very hit and miss in terms of quality. I'd say it's not worth the bother of adding underscores to prompts/tags. It understands them just as well without them.
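
If you paste tags straight from a booru, that rule of thumb can be applied automatically. A minimal sketch, assuming comma-separated tags; the function names and the prefix list are made up for illustration and are not part of any UI.

```python
# Prefixes that keep their underscores (the score_ and source_ tags).
KEEP_UNDERSCORE_PREFIXES = ("score_", "source_")


def normalize_tag(tag: str) -> str:
    """Swap underscores for spaces unless the tag is a score_/source_ tag."""
    tag = tag.strip()
    if tag.startswith(KEEP_UNDERSCORE_PREFIXES):
        return tag
    return tag.replace("_", " ")


def normalize_prompt(prompt: str) -> str:
    """Normalize every comma-separated tag and drop empty ones."""
    return ", ".join(normalize_tag(t) for t in prompt.split(",") if t.strip())


print(normalize_prompt("score_9, score_8_up, green_eyes, source_anime"))
```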
>>
>>101304716
Yeah, prompt adherence is SDXL level at best. Really disappointed trying this out myself after having read the paper.
>>
File: 00796-2857992323.jpg (378 KB, 1075x1613)
Euler A AYS seems decent

>>101307972
castle by hr giger?
>>
File: tmpovqfgi61.png (1.72 MB, 2048x1182)
Out of curiosity I'll also do a couple of comparisons for score_schizo without the underscore.
>>
>>101302140
Blade Runner vibes. Very nice.
>>101308027
Good to know. Pony prompting is very strange. Do you know how often one should use BREAK?
>>101308075
Is that some new sampler?
>>
>>101308072
the worst part is that it's asking for a shit ton of VRAM to run the LLM that is supposed to make it "good" at prompt understanding when it's not good at all
>>
>>101308086
>Do you know how often one should use BREAK?
I never BREAK, especially ever since I started to use Pony, since it seems to impact performance, and I'm on too tight a VRAM budget to test it properly. I experimented with breaking back in SD 1.5 and I still don't understand how it works.
>>
>>101308086
>Is that some new sampler?
not new, just didn't bother testing

https://research.nvidia.com/labs/toronto-ai/AlignYourSteps/
>>
File: file.jpg (985 KB, 1920x2176)
>>
>>101308075
Nope. Gateway to hell by Beksinski and Haeckel.
>>
File: tmpj8b1uekm.png (2.87 MB, 2048x1182)
>>101308086
>Pony prompting is very strange.
Aside from having to add the score_prefix, not really. I never really found the need for source_prefix, since I rely on style loras. Otherwise I just prompt with more or less natural language, be it a mix of short tags or longer descriptions. I only really had to hop off the vanilla pony model for the sake of autismmix, since the regular version was sometimes a pain in the ass to get good results from.

Protip: don't experiment with changing the score_prefix, leave it at the recommended score_9, score_8_up, score_7_up, score_6_up, score_5_up, score_4_up. Putting any of the scores in negatives, or leaving any of them out does impact quality, detail and prompt adherence.
>>
File: tmpgaqo6gm3.png (2.51 MB, 2048x1182)
>>
>>101308171
nice
>>
File: file.png (317 KB, 444x453)
Could anyone throw me a bone?

Whenever I do inpainting, I get bruising. For example, I tried on a drawing of mine, and the blonde hair turned into this pink/purple-ish mess. Likewise, whatever I inpaint on starts to turn purple and loses a huge amount of detail.

Why the fuck does this happen? I'm using Autism DPO and the sampler is DPM++ SDE Karras. Never happened on SD 1.5 models.
>>
>>101308182
thx for the tip and cool gen
>>
>>101308182
I agree with not changing "score_9, score_8_up, score_7_up, score_6_up, score_5_up, score_4_up" however according to my tests it does seem like a good idea to put that whole string at the end of the prompt, because if it's at the start then it seems to hurt the strength of the other tags.
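
For anyone who wants to A/B this, a tiny helper that keeps the score string fixed and only moves it around. Just a sketch: the function name and the `score_position` parameter are made up for illustration.

```python
SCORE_TAGS = "score_9, score_8_up, score_7_up, score_6_up, score_5_up, score_4_up"


def build_prompt(body: str, score_position: str = "prefix") -> str:
    """Attach the unmodified score string before or after the rest of the prompt."""
    body = body.strip().strip(",").strip()
    if score_position == "prefix":
        return f"{SCORE_TAGS}, {body}"
    if score_position == "suffix":
        return f"{body}, {SCORE_TAGS}"
    raise ValueError("score_position must be 'prefix' or 'suffix'")


print(build_prompt("castle, dutch angle, from below", "suffix"))
```

Generating both variants with the same seed and settings is the only way to see whether the placement matters for a given prompt.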
>>
File: 00875-2857992324.jpg (375 KB, 1075x1613)
>>
>>101308205
thanks anon
>>
File: tmp8gpcqc86.png (2.04 MB, 2048x1182)
>>101308213
Hard to say what exactly causes this behaviour, so you can only play around with the settings and see what works for this particular case:

Try changing the sampler. Increasing steps to ~45 can help with blending and preserving original content. Gradually lower the denoise, CFG or masked padding. At some point it should budge.

DPM++ SDE Karras is a bit of a peculiar sampler, so that alone might be reason enough. Some samplers behave very differently with img2img and its settings.

>>101308242
>it does seem like a good idea to put that whole string at the end of the prompt
That's actually a very good point. Wouldn't surprise me, since prompts at the beginning always get more attention. I guess the only difference would be what your priorities are. Do you want more emphasis on quality and have it loosely inspired by the prompt, or are you more concerned with prompt adherence, even if at the cost of aesthetics? Damn, might want to test it myself now.
>>
>>101308281
Which sampler do you recommend? If I try with Euler A for example, it gets way worse.
>>
>>101308086
>Do you know how often one should use BREAK?
I'm relatively new to all of this, but I believe it works like this. Typically a prompt's word strength is by the order the words appear in so if we have 8 words:
word, word, word, word, word, word, word, word, word, word
You could imagine their strengths as:
1.0, 0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1,
But if you put a BREAK in the middle, their word strengths would become like this:
1.0, 0.8, 0.6, 0.4, 0.2 BREAK 1.0, 0.8, 0.6, 0.4, 0.2

Someone correct me if I'm wrong.
>>
>>101308297
I've fallen behind on samplers, and I've been using Euler A lately myself. I remember DPM++ 2s a / Karras being good at img2img refining, but it might be dated and slow. You could try DDPM I guess?
>>
>>101308312
meant to say 10* words
>>
>>101308312
75 tokens per chunk, using break allows you to define your own chunks
essentially yeah
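
The chunking idea can be sketched like this. Loud assumptions: whitespace-separated words stand in for real CLIP tokens (a real tokenizer counts differently), and this is not the actual webui code, just an illustration of "fill up to 75, or cut early at BREAK". The function name is made up.

```python
import re

CHUNK_SIZE = 75  # tokens per chunk, as stated above


def split_into_chunks(prompt: str, chunk_size: int = CHUNK_SIZE) -> list[list[str]]:
    """Cut the prompt at every BREAK, then cut each piece into chunk_size pieces."""
    chunks = []
    for segment in re.split(r"\bBREAK\b", prompt):
        tokens = segment.split()  # stand-in for real tokenization
        for start in range(0, len(tokens), chunk_size):
            chunks.append(tokens[start:start + chunk_size])
    return chunks


long_prompt = " ".join(f"w{i}" for i in range(100))
print([len(c) for c in split_into_chunks(long_prompt)])
print([len(c) for c in split_into_chunks("a b c BREAK d e")])
```

Without BREAK the cut lands wherever token 75 happens to fall; with BREAK you decide where a chunk ends.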
>>
when will they invent ai smell generators? i want to know what my waifu smells like
>>
File: file.jpg (849 KB, 1920x2176)
>>
>>101308426
>when will they invent ai smell generators? i want to know what my waifu smells like
Wake up, anon: https://civitai.com/product/odor
>>
>>101308518
s-s-sniffa sniffa?
>>
>>101308613
c-cringe
>>
File: ComfyUI_01745_.jpg (366 KB, 2048x1024)
>>
>>101309037
wtf model is that
>>
File: ComfyUI_Kolors_00101_.png (1.55 MB, 1216x832)
I'm impressed by Kolors so far.

Prompt:

 Outside of a supermarket in a cyberpunk dystopian future, a wizard casts a spell to open a portal to the world of infinite eggs. It is stormy, windy and thundering. Magical portal. Photography. 
>>
>>101309054
that's really good. How about cartoon banana that is stripping itself like the old meme
>>
>>101309052
most of the style comes from this lora https://civitai.com/models/84527/chinese-style-illustration
>>
>>101302279
how do i run kolor locally? does it run in comfyui?
>>
File: ComfyUI_Kolors_00116_.png (1.26 MB, 1216x832)
>>101309070
Kind of? It doesn't do peeled bananas, it looks like.

>>101309107
https://github.com/kijai/ComfyUI-KwaiKolorsWrapper
>>
>>101309054
can you try complex poses like handstand or backflip?
>>
File: ComfyUI_Kolors_00134_.png (1.34 MB, 1216x832)
>>101309158
 A man doing a handstand. 
>>
>>101309186
that's not bad at all, probably the best base model of them all, can you do the same for a woman in a bikini? :^) just to see the anatomy, yes the anatomy :^)
>>
File: ComfyUI_Kolors_00140_.png (1.37 MB, 1024x1024)
>>101309186
It can't do backflips, though.
>>
File: long dick general (2).jpg (3.6 MB, 2365x3264)
Can't say I'm a fan of using score as suffix over prefix, it's hit and miss. The differences are very marginal, but it does adhere to prompts slightly better. Then again, devil is in the details, so there might be some merit to it. There were a couple of cases where the suffix version did have better quality, more detail or interesting composition.

Overall there isn't a big discrepancy in their behaviour between steps or CFG. Even longer and shorter prompts tend to behave similarly.
>>
>>101309220
one day i'll be able to smell her
>>
File: ComfyUI_Kolors_00161_.png (1.08 MB, 1216x832)
>>101309206
I tried to gen this north of 30 times I think. This was the only time it gave me a woman actually doing something that looked like a handstand. It was usually just frontal shots of women with flat chests or outright manly pectorals in bikinis.
>>
>>101309337
lmao it looks like a deflated sex doll
>>
File: ComfyUI_Kolors_00164_.png (1.3 MB, 832x1216)
>>101309337
Sometimes stuff like this.
>>
File: 0.jpg (358 KB, 1024x1024)
>>
>>101309337
>>101309351

It's really good at wallpaper material though.
>>
File: long dick general.jpg (3.19 MB, 2306x3264)
>>
File: ComfyUI_Kolors_00188_.png (1.73 MB, 1216x832)
>>101309445
Forgot my pic. Whoops!
>>
File: 0.jpg (292 KB, 1024x1024)
>>
File: ComfyUI_Kolors_00205_.png (1.55 MB, 1216x832)
>>101309456
>>
>>101309220
>>101309448
mind if i ask for a catbox/prompt? how do you get dynamic pictures like that? sd loves to be symmetrical
>>
File: file.png (1.35 MB, 1323x634)
>>101308314
No luck with anything. I'm tempted to just re-install.

Maybe it's a problem with forge? If anyone's ever experienced this, pls reply I'm going nuts here.
>>
File: kolors_00105_.png (1.38 MB, 1024x1024)
>>
File: ComfyUI_Kolors_00222_.png (1.85 MB, 1216x832)
Kolors generates faster than Pixart Sigma and SD3, too, which is a real bonus.
>>
File: ComfyUI_Kolors_00229_.png (1.98 MB, 1216x832)
>>
>>101309545
the hands look really really good, I think we found the next model to move forward, thank god
>>
File: kolors_00137_.png (1.49 MB, 1024x1024)
Anyone else notice kolors really having trouble producing images of people below the shoulders?
>>
>>101309680
This will always be the problem with models obsessed with aesthetics metrics.
>>
File: long dick general (1).jpg (3.05 MB, 2306x3264)
>>101309538
>Maybe it's a problem with forge?
Doubt it. Hell, you could try using the soft inpainting mode it has pre-installed. Can you give me an example of your settings, mask, prompt? Anything and everything.
>>101309535
>how do you get dynamic pictures like that?
Honestly, no clue. Tough luck with catbox, but I can throw in a hint or two. Autismmix (Pony), 1344x768, 25/45 Euler A steps, 7 CFG, so nothing out of the ordinary. As for prompts, it's pretty much "random bullshit go". I think the combo of horizontal resolutions coupled with dutch angle, from above/below tends to give gens a lot of depth. Sometimes I barely have a prompt in there, other than what I just mentioned, and it does well either way. I think horizontals in general just tend to be more dynamic due to bias in training material. Same might be the case for verticals? 1:1 just tends to be simpler in composition. Also good style loras go a long way.
>>
>>101309337
Artificial filter layers that when activated will fuck up the image perhaps? Chinese cucks. Nothing that can't be removed of course.
>>
File: kolors_00143_.png (1.34 MB, 1216x768)
>>101309692
It's true, the images look REALLY good, but it's hard to wrangle this thing to produce anything other than a portrait.
>>
>>101309680
If you can, try changing the resolution to something asymmetrical, see if that changes anything. Resolution ratios REALLY make a huge impact.
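
If you want to try other ratios without blowing up the pixel count, something like this gives you a starting point. It's a sketch under assumptions: the roughly one-megapixel target and snapping both sides to multiples of 64 are common conventions for SDXL-class models, not something confirmed for Kolors, and the function name is made up.

```python
def resolution_for_ratio(ratio_w: int, ratio_h: int,
                         target_pixels: int = 1024 * 1024,
                         multiple: int = 64) -> tuple[int, int]:
    """Pick width/height near target_pixels for a ratio, snapped to a multiple."""
    scale = (target_pixels / (ratio_w * ratio_h)) ** 0.5
    width = max(multiple, round(ratio_w * scale / multiple) * multiple)
    height = max(multiple, round(ratio_h * scale / multiple) * multiple)
    return width, height


print(resolution_for_ratio(1, 1))   # (1024, 1024)
print(resolution_for_ratio(16, 9))  # (1344, 768)
print(resolution_for_ratio(9, 16))  # (768, 1344)
```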
>>
>>101309680
Yeah, looks like you have to use some faggot verbose prompt to get what you want
https://huggingface.co/Kwai-Kolors/Kolors/discussions/7#668a52edcf56ff052f2886b9
>>
File: kolors_00145_.png (1.73 MB, 1216x768)
>>
>>101309728
>it's your bad prompt.
I'm starting to dislike this model even more with every post itt.
>>
>>101309744
he sounded like Lykon not gonna lie: "hurdur skill issue"
>>
File: kolors_00147_.png (1.76 MB, 1216x768)
>>
>>101309744
It's a retarded take because it ultimately ignored the pope keyword which means it's bad at prompt adherence.
>>
File: kolors_00153_.png (1.57 MB, 1216x768)
>>
File: ComfyUI_Kolors_00234_.png (1.74 MB, 1216x832)
>>101309680
It's not so bad if you're generating dudes, but with women it's painful trying to get anything other than a face shot.
>>
File: kolors_00160_.png (1.48 MB, 1216x768)
>>
>>101309755
this, I hate those mf who gaslight people into writing a fucking bible to describe mundane stuff everyone would understand anyway
>>
>>101309753
>>101309755
A base model indeed shouldn't require much thought to give decent output by default. Quality tags like "masterpiece, good quality, highly detailed, score_9" or whatever should already be the default, never mind prompt adherence. Demanding longer prompts doesn't even make sense, since the more you write, the more of it gets lost and ignored along the way.
>>
>>101309787
guess that women flooded the internet with their useless selfies, and they trained the model with only that kek
>>
can someone generate a big pair of breasts? i'd like to look at some
>>
>>101309852
Surprisignly, Kolor can do that, the chinks are way less prude than the western fags, that's how far the west has fallen, we're loosing to a communist country that banned porn
>>
>>101309848
No, if you only train on images with high aesthetic scores you start filtering out even images of people standing, because portraits are going to score higher. I think it's one of the things holding base models back, especially SAI's, because they set their filters incompetently.
>>
>>101309707
Do you mind if it's a NSFW image? I'm willing to post my WiP image with metadata on catbox.
>>
>>101309922
shoot
>>
>>101309738
>>101309754
>>101309818
damn great
>>
>>101309857
>chinks are way less prude than the western fags
chinks are pretty horny, their fetishes are always on the more extreme side
>>
File: 00007-2857992324.jpg (552 KB, 1344x2016)
>>
>>101309934
https://files.catbox.moe/1blbj3.png
Sorry for the wait.
I'd post the mask but it's literally visible given it became a discolored blob on her abdomen.
>>
>>101309934
>>101310151
Oh yeah, and my settings on forge are default except Eta noise seed delta at 31337 and clip skip 2.
>>
>>101310151
Good stuff. Simplify your inpainting workflow and slowly build up from a minimum. Chances are you overcomplicated your prompt and so the inpainting gets lost with your input.

Try removing the BREAK, reduce your prompt to essentials like the score_prefix and something like dark shiny/glossy skin, naked, maybe laying back. Get rid of the negatives, score especially shouldn't be there. I'm not sure what the original input looks like, but you can try lowering denoise to 45 or 35.
>>
File: 00974-2857992327.jpg (644 KB, 1344x1781)
>>
File: file.png (363 KB, 636x608)
>>101310291
Thanks. Original input looks like pic related.
I'll try what you said, hope to god it works. Even if it doesn't, thank you for your help so far. I appreciate it, anon.
>>
File: 0.jpg (412 KB, 1024x1024)
>>
File: 0.jpg (406 KB, 1024x1024)
>>
This is what kolors thinks nipples look like.

https://files.catbox.moe/k0vps6.png
>>
File: 01004-2857992326.jpg (590 KB, 1344x1781)
>>
File: PA_0014.jpg (411 KB, 2048x2048)
>>101309489
Very nice
>>101310527
>>
>>101310557
looks fine to me
>>
File: 00014-1847830168.jpg (238 KB, 1400x1008)
>>101309818
Nice
>>
>>101310557
WTF????
>>
>>101310557
can confirm. looks like a nice base for finetunes though

https://files.catbox.moe/jgsrs2.jpg
>>
>>101310835
yeah, only the nipples look weird, the rest of the anatomy is really great, sounds easy enough to fix
>>
File: ComfyUI_Kolors_0025.jpg (193 KB, 1024x1024)
>>
For everyone wanting to test Kolors out, you can use this ComfyUi wrapper
https://github.com/kijai/ComfyUI-KwaiKolorsWrapper
>>
File: ComfyUI_Kolors_0035.jpg (210 KB, 1024x1024)
>>
It should be criminal to ship a new model without fine tuning code.
>>
File: ygwfy21ec3bd1.png (2.23 MB, 768x1344)
You can test Kolors on this huggingface demo:
https://huggingface.co/spaces/gokaygokay/Kolors
>>
File: BRUH.jpg (56 KB, 1096x799)
>>101310993
https://github.com/Kwai-Kolors/Kolors
Holy fuck it's not even on their plan
>>
>>101310291
It's the fucking noise multiplier for img2img and vae. It was at 0.7.

For some reason it's still "tinted" towards purple/pink on forge but normal on A1111.
>>
File: grege.png (1.85 MB, 1024x1024)
>>101310995
Noice, I have trouble removing the blur though, even with "bokeh, blur" in the negative prompt
>>
>>101311010
Chads
>>
File: image (2).png (1.71 MB, 1024x1024)
>>101310995
>Hatsune Miku shaking hands with Gawr Gura, studio ghibli drawing style
It doesn't know who Gura is :(
>>
>>101307281
Could I get a prompt please?
>>
>>101311233
it was "drawing of hands"
>>
>>101310557
>>101310835
It's hit or miss, but it can generate decent nipples:
https://files.catbox.moe/vmjuus.webp
https://files.catbox.moe/d5iigy.webp
>>
>>101311333
For a base model that's insane how good it looks, I knew the chinks would save us from cucked SAI
>>
>>101309337
Try writing it in chinese
>>
>>101307925
>>101308027
Aren't underscores useful for making something a single term, so the model doesn't confuse the definitions?

Like if I use feather_duster, I'm going to get feather dusters. But if I use feather duster with a space in between, I might see rogue feathers in my images, or someone will be wearing a long jacket.
>>
File: image (3).png (1.62 MB, 1024x1024)
>>101310995
I'm really impressed by this model, finally we can be free of cucked SAI
>>
official pixart bigma and lumina 2 waiting room
>>
>>101311557
I can't be 100% sure, but that would mean "doggy style" would create dogs wouldn't it? I think it identifies it as a booru tag.
>>
>>101311073
Weird, I'm also using Forge and the multiplier is at 1.0 for me by default.
>>
>>101311561
without finetunes? not really.
>>
File: Kolors.jpg (1.48 MB, 2532x2572)
>>101311612
Kolors is probably the best base model we ever had, that will probably be the model going forward for finetunes and shit
>>
>>101311623
50 yuan has been deposited to your account
>>
>>101311623
Curb your enthusiasm, anon, lest you end up disappointed.
>>
>>101301938
Ha, I did a few gens of something like this (though my version was more "monstrous", with claws and a tail)
>>
>>101311623
It looks like one of the better bases, but it was also published without model creation / finetuning code.
>>
>>101311623
the model is good but i have this bad feeling
>>
>>101311637
>>101311647
Seriously, look at that, the potential is here, it's completely uncensored and the anatomy is good, remember this is just a base model, imagine with a nice finetune on top of that >>101311333
>>
>>101311665
>Seriously, look at that
That, anon, is an extremely small sample from a model barely any of us can run or finetune locally, with questionable licensing and arguable quality. I'm having Hunyuan flashbacks already.
>>
>>101311557
>>101311583
I don't think that particular example is likely, because there would be a lot of examples in its training data for "doggy style", whereas doggy and style on their own are going to be less represented in a porn data set

I think it happens if one of your terms is more common than both combined. Like feather would surely be more common than feather duster.
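If anyone wants to sanity check that guess on their own caption files, a toy counter like this would do. The captions below are invented for illustration and `term_counts` is a hypothetical helper. It only compares how often the full phrase shows up against how often its first word shows up alone, which says nothing definite about how a given model tokenizes tags:

```python
import re
from collections import Counter


def term_counts(captions, phrase):
    """Count captions containing the full phrase vs. only its first word."""
    first = phrase.split()[0]
    counts = Counter()
    for cap in captions:
        # Treat underscores as spaces so feather_duster and "feather duster" match.
        text = cap.lower().replace("_", " ")
        words = re.findall(r"[a-z]+", text)
        if phrase in " ".join(words):
            counts["phrase"] += 1
        elif first in words:
            counts["first_word_only"] += 1
    return counts


if __name__ == "__main__":
    captions = [  # made-up sample captions
        "a feather on a table",
        "feather_duster on shelf",
        "bird with a feather, closeup",
        "maid holding feather duster",
    ]
    print(term_counts(captions, "feather duster"))
```

If "first_word_only" dwarfs "phrase" in a real dataset, that would at least fit the idea that the more common single word bleeds into the gen.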
>>
>>101311719
Hunyuan isn't that good compared to Kolors though, people will work hard to optimise Kolors, like they did for the SAI models, that's how it always worked in the imagegen ecosystem
>>
Has anyone been able to cover up nudity with objects? Like the subject holding flowers, a doll or something to their chest?
>>
>>101311740
I'll give you that at a glance I see more quality in Kolors compared to Hunyuan, but I remain sceptical nonetheless. Licensing remains questionable, and hardware requirements are an obstacle.
>>
File: 0.jpg (384 KB, 1024x1024)
>>101310581
thanks, wizard.
>>
>>101311790
inpainting, regional prompting. or you can train a concept lora
>>
>>101311810
What even are the hardware requirements to train Kolors?
>>
>>101311810
it asks for 20GB of VRAM because of the model + LLM. The LLM can be quantized and put on the CPU, and the model can be run at 8-bit without much accuracy drop, so the optimisation will be easy to do. For the licence, yeah, it needs clarification: it says it's MIT but it's actually not
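Back-of-the-envelope version of the above, weights only. The parameter counts are assumptions (an SDXL-sized U-Net at about 2.6B and a ChatGLM3-6B-sized text encoder at about 6.2B), not figures from this thread, and activations, VAE and framework overhead are ignored:

```python
BYTES_PER_PARAM = {"fp32": 4.0, "fp16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_gib(params_billion, dtype):
    """Memory for the weights alone, in GiB (no activations, no VAE)."""
    return params_billion * 1e9 * BYTES_PER_PARAM[dtype] / 2**30


# Assumed parameter counts, in billions (not confirmed by the thread).
UNET_B = 2.6  # SDXL-sized U-Net
LLM_B = 6.2   # ChatGLM3-6B-sized text encoder

if __name__ == "__main__":
    for unet_dtype, llm_dtype in [("fp16", "fp16"), ("int8", "int8"), ("int8", "int4")]:
        unet = weight_gib(UNET_B, unet_dtype)
        llm = weight_gib(LLM_B, llm_dtype)
        print(f"unet {unet_dtype}: {unet:.1f} GiB, llm {llm_dtype}: {llm:.1f} GiB, "
              f"both on GPU: {unet + llm:.1f} GiB, llm on CPU: {unet:.1f} GiB")
```

Under those assumptions the text encoder is the bigger half, so moving it off the GPU matters more than quantizing the U-Net.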
>>
>>101311834
The thing about training, let alone finding a lora is I don't even know what that sort of thing is even called
>>
>>101311790
haven't tried it but "convenient censoring" is well tagged on danbooru
>>
>>101311860
we don't know, we don't have the training code
https://github.com/Kwai-Kolors/Kolors/tree/master
>>
>>101311866
are you new to this?
>>
>>101311866
search for "censor" on civitai, they have loras for tails, hair, steam, soap, a general convenient censorship, etc
>>
>>101311623
>that will probably be the model going forward for finetunes and shit
Stop repeating this.
>>
>>101311931
https://www.youtube.com/watch?v=yWULCfJ2PGA
>>
can kolors generate girls with armpit hair
>>
>>101311876
>no training code
So it's as useful to anon as SD3. Nice.
>>
File: PA_0016.jpg (894 KB, 2560x1536)
>>
>>101311995
try it
https://gokaygokay-kolors.hf.space/
>>
File: PA_0017.jpg (633 KB, 2560x1536)
>>
File: Kolors-Miku.png (2.09 MB, 1024x1024)
>>101312025
I'm sure some autist will write the training code, the hardest part was getting a good base model
>>
>>101312072
Is it possible to create training code without the weights?
>>
>>101312093
Why do you ask this question, we have the weights already
https://huggingface.co/Kwai-Kolors/Kolors
>>
>>101312072
>I'm sure some autist will make a training code
just like AMD expects their userbase to make up for their shortcomings?
>>
>>101312107
Fair enough I am mildly retarded
>>
>>101311333
Holy shit that looks decent
>>
>>101312112
you could have chosen a better example, like SAI asking us to fix their shitty base model since the beginning (2022), we've been polishing their turds for 2 years, nothing new under the sun
>>
Asking again: is Kolors a model trained from scratch or is it simply a hypertuned XL?
>>
>>101312128
fair nuff
>>
>>101312138
it's a model trained from scratch, they said so in their paper
https://github.com/Kwai-Kolors/Kolors/blob/master/imgs/Kolors_paper.pdf
>>
>>101312118
it's all right kek
>>
Kolors added to...
>>101312179
>>101312179
>>101312179
>>
File: PA_0023.jpg (858 KB, 2560x1536)
>>
File: file.png (1.42 MB, 1024x1024)
>>101312040
no hair :(
>>
>>101311876
Let's see if that changes. Team Sigma / Lumina is already ok tho.
>>
>>101312194
yeah I just tried with 4 pictures and it doesn't show armpit hair either, it will be fixed with some finetunes though, not the hardest thing in the world to add
>>
>>101312194
nice pits but yeah.. looks like mostly trained with chinese-typical pretty but non-fetish art?
>>
>>101312288
>non-fetish art?
they can't, porn is illegal in China
>>
File: 0.jpg (174 KB, 1024x1024)
>>
File: 0.jpg (235 KB, 1024x1024)
>>
>>101312332
>>101312642
i respect the grind



All trademarks and copyrights on this page are owned by their respective parties. Images uploaded are the responsibility of the Poster. Comments are owned by the Poster.