/g/ - Technology






File: ComfyUI_01791_.jpg (3.32 MB, 2048x2048)
Discussion of free and open source text-to-image models

Previous /ldg/ bread : >>102013088

>Beginner UI
EasyDiffusion: https://easydiffusion.github.io
Fooocus: https://github.com/lllyasviel/fooocus
Metastable: https://metastable.studio

>Advanced UI
Automatic1111: https://github.com/automatic1111/stable-diffusion-webui
ComfyUI: https://github.com/comfyanonymous/ComfyUI
Forge: https://github.com/lllyasviel/stable-diffusion-webui-forge
InvokeAI: https://github.com/invoke-ai/InvokeAI
SD.Next: https://github.com/vladmandic/automatic
SwarmUI: https://github.com/mcmonkeyprojects/SwarmUI

>Use a VAE if your images look washed out
https://rentry.org/sdvae

>Model Ranking
https://imgsys.org/rankings

>Models, LoRAs & training
https://civitai.com
https://huggingface.co
https://aitracker.art
https://github.com/Nerogar/OneTrainer
https://github.com/derrian-distro/LoRA_Easy_Training_Scripts

>Flux
https://huggingface.co/spaces/black-forest-labs/FLUX.1-schnell
https://comfyanonymous.github.io/ComfyUI_examples/flux

>Pixart Sigma & Hunyuan DIT
https://huggingface.co/spaces/PixArt-alpha/PixArt-Sigma
https://huggingface.co/spaces/Tencent-Hunyuan/HunyuanDiT
https://huggingface.co/comfyanonymous/hunyuan_dit_comfyui
Nodes: https://github.com/city96/ComfyUI_ExtraModels

>Index of guides and other tools
https://rentry.org/sdg-link
https://rentry.org/rentrysd

>GPU performance
https://vladmandic.github.io/sd-extension-system-info/pages/benchmark.html
https://docs.getgrist.com/3mjouqRSdkBY/sdperformance

>Try online without registration
txt2img: https://www.mage.space
img2img: https://huggingface.co/spaces/huggingface/diffuse-the-rest
sd3: https://huggingface.co/spaces/stabilityai/stable-diffusion-3-medium

>Maintain thread quality
https://rentry.org/debo

>Related boards
>>>/g/sdg
>>>/h/hdg
>>>/e/edg
>>>/c/kdg
>>>/d/ddg
>>>/b/degen
>>>/vt/vtai
>>>/aco/sdg
>>>/u/udg
>>>/trash/sdg
>>
Blessed thread of frenship
>>
File: ComfyUI_00935_.jpg (2.1 MB, 2048x2048)
>>
File: ifx168.png (1.34 MB, 1024x1024)
>>
File: FLUX_00018_.jpg (486 KB, 2016x1152)
>>
File: 1698822159771336.jpg (25 KB, 525x384)
Ideogram launched 2.0 in a panic. It's barely any different from the last version.
>>
File: SuperMetroidScreenshot.png (393 KB, 618x464)
Challenge: find a captioner that, when fed this picture, produces a prompt that recreates it when fed to Flux.
This is now a benchmark for how close we can get. Once we pull it off, the sky is the limit.
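For anyone automating attempts, a minimal sketch of the round-trip with diffusers' FluxPipeline (caption_image is a placeholder for whatever captioner you're testing; dimensions are snapped to multiples of 16 as Flux requires):

import torch
from diffusers import FluxPipeline

def caption_image(path: str) -> str:
    # plug in your captioner of choice (JoyCaption, Florence-2, GPT-4o, ...)
    raise NotImplementedError

pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16
).to("cuda")

prompt = caption_image("SuperMetroidScreenshot.png")
# 618x464 snapped to the nearest multiples of 16
image = pipe(prompt, width=624, height=464,
             guidance_scale=3.5, num_inference_steps=28).images[0]
image.save("recreation.png")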
>>
>>102017385
we GAN now?
>>
>>102017385
do you know how diffusion works
>>
>>102017385
https://huggingface.co/spaces/fancyfeast/joy-caption-pre-alpha
>This image is a still frame from a video game, specifically a scene from a side-scrolling action game. The setting is a dark, cavernous environment with a green, rocky floor and ceiling. The floor is covered in small, sharp spikes and jagged rocks. On the left side of the image, a character dressed in a yellow, armored suit is seen, with a red visor and a gun in their hand, standing near a metallic, cylindrical object that resembles a door or entrance. The character is facing a massive, green, reptilian creature with a large, round head, red eyes, and sharp teeth. The creature is standing in the middle of the frame, towering over the character, and appears to be about to attack. The creature's skin is rough and scaly, with spikes protruding from its back and shoulders. The background is mostly black, with occasional greenish hues, enhancing the eerie and intense atmosphere. The top of the image features various energy bars and health indicators, indicating the game's status. The image is rendered in a pixelated, retro style, typical of early 90s video games, with vibrant colors and sharp contrasts, creating a visually dynamic and immersive scene.
>>
>>102017428
>The soulless cash grab remake
>>
File: fs_0142.jpg (372 KB, 1536x2048)
>>
File: ComfyUI_00625_.png (1.99 MB, 896x1344)
>>
>>102017300
no collage but thats a nice image
>>
File: ComfyUI_00578_.png (1.9 MB, 1024x1024)
>>
>>102017428
https://huggingface.co/spaces/SakanaAI/Llama-3-EvoVLM-JP-v2
>The image is a scene from a video game, specifically a classic game with pixel art graphics. The game is set in a jungle environment, with lush greenery and a dense forest that fills the background. In the foreground, there is a large, green, and slimy monster that is the main subject of the scene. Its body is covered in moss and dirt, with thick, green leaves on its head and neck. The monster's mouth is open, revealing sharp teeth, and it appears to be roaring or shouting something. The monster is standing on a platform, which is a solid gray color and has a slightly raised edge.
lol
>>
File: FD_00057_.png (2.37 MB, 1024x1536)
>>
Lower your guidance or whatever removes the sloppa nowadays
>>
>>102017504
that's hard with fuxxed apparently
>>
>>102017515
For you
>>
>>102017504
and then it stops looking like what you're trying to prompt
>>
>>102017530
>no image
>>
>>102017484
https://aistudio.google.com/app/prompts/new_chat (Google Gemini)
>The image is a screenshot from a video game. It is a side-scrolling platformer where the player controls a character named Samus Aran, who is a bounty hunter. The game is set in a dark, futuristic world. Samus is in the lower left corner of the screen. She is facing right. Her power suit is light green and yellow, and she has a helmet on. She is holding a weapon in her right hand. Samus is standing in front of a large, circular object. The object is a giant metal pipe or tube. Above Samus and the metal tube, there are a series of green, rectangular blocks. In the center of the top of the screen, there are two icons. Both are brown. The one on the left is rectangular and has a number '67' in it. The one on the right is square and has a '4' in it. To the right of the center of the screen, there is a very large, green alien creature that looks like a dinosaur. It is a light green and has brown markings on its face. It has a large, round belly. The creature's head is very large, with a long, flat snout, and very big eyes. The alien's upper jaw is open. The creature is partially obscured by green, spiky objects that resemble giant thorns. It is hard to tell what these objects are. The background is dark, with a black sky and green ground. The bottom of the image is a green, spiky surface. There are many large, green, flat blocks, and many smaller, round blocks. The player can walk on all of these surfaces. The image is a screenshot from a video game and is very stylized. The graphics are pixelated and the colors are bright and colorful.
>>
File: _mLpMwsav5eMeNcZdrIQl.png (1.11 MB, 3960x2378)
>>102017428
I'm still confused about people applying JoyCaption to everything as a VLM when it isn't SOTA. The only valid use case is making uncensored captions, but some of the Chinese models out there can be retrofitted with less censored finetunes, which effectively does the same thing. MiniCPM-V is the best small model, but even that is no match for the best one out there, the 76B model from InternVL2. There's no use case other than speed, and even for training you'd want to take the extra time to label all these images properly.
>>
>>102017554
While I agree with you, and I've modified my JoyCaption script to run smaller versions of InternVL, there are serious practical hurdles to running a 76B model locally.
>>
Why are the flux training sample images just noise? I'm already at 500 steps.
>>
File: 3d_conv.jpg (465 KB, 1826x2048)
>>102017300
this is annoying and I don't want to do any more. I'll call it at 2.5D

Things I should have noticed:
-Resolution is difficult and I should have outpainted immediately
-There are a few things about this image that are really special and therefore hard to keep as-is. I could have protected the hair accessory, but I thought the color was better.
-I don't know what the ridges on the dress are called, and inpainting loved to eat them.
-I could have removed the necklace, but I like it.
-There is no way to keep the blush and eyeliner at such strong levels without it looking more artificially generated, at least to me
-The size of her left arm is unnatural and should be fixed, but I attempted to keep it the same

As for tips: when inpainting, leave the arm band out; it doesn't survive any touching. Using a ControlNet IP-Adapter helped, but not greatly. Upscaling/downscaling tricks didn't really work. I had to GIMP the ridges of the dress back in for the reasons previously mentioned.
>>
File: fs_0160.jpg (54 KB, 512x512)
>>
>>102017646
should have stopped at around 20 steps
>>
>>102017300
why is sdg still in the op?
>>
>>102017646
It's broken mate. Stop it and fix it
>>
File: 4.jpg (2.7 MB, 3072x1280)
>>
>>102017663
>this is annoying and I don't want to do anymore. I'll call it at 2.5D
What is it?
>>
>>102017704
neat
>>
>>102017696
why do you care?
>>
>>102017706
not realistic, but getting there.

I linked the wrong thing. The request was to convert this >>102014576
I thought it would be a fun challenge.
>>
File: ComfyUI_00818_.png (1.31 MB, 896x1152)
https://imgsli.com/Mjg5OTQ1
A comparison between training Flux LoRAs at 1024,1024 vs 512,512. Same epochs, repeats, steps, etc. The 512 one I trained at a higher dim and without the split-network optimization argument.
24 images, 10 repeats, 10 epochs.
1024,1024 took around 5 hours.
512,512 took me 1 hour.
Pretty stoked on the speed and quality of training at 512.
>>
File: flux_00394_.png (1.06 MB, 1160x936)
>>
>The more images, the better
>You can train a lora with just 20 images
w-which is it
>>
>>102017742
why not throw a few 1024x1024 in with the 512x512 for some diversity
>>
>>102017742
Nice, I'm having good luck training at 512, too. How did you caption your images?
>>
>>102017385
https://aichatonline.org/gpts-2OToA97Vhr-Describe-Image (GPT4o)
This went so hard that I had to pastebin it:
https://pastebin.com/Y4CGtRyH
>>
>>102017646
It's ogre. Your training parameters or dataset were bunk and the loss went parabolic.
>>
>>102017781
the more UNIQUE images the better. If you have the same person in the same outfit then stop at 20.
>>
File: ComfyUI_00819_.png (2.04 MB, 1536x1152)
>>102017788
The input images are of varying sizes; I just mean the --resolution launch parameter. I didn't resize/crop anything, unless the bucketing resizes automatically. Still very new to training LoRAs.

>>102017789
I just used joycaption
>>
Is it possible to train a Flux LoRa in 8GB VRAM?
>>
>>102017781
More images makes the lora more flexible.
>>
>>102017742
>512
I was skeptical at first too, but it seems like any potential quality loss from the smaller images is made up for with buckets and higher dimensions.
I've trained at 1024 and 512 and I did not notice a quality drop or increase except for, like you said, speed.
I'll wait for more people to try various settings first, but at least on a visual level, training LoRAs at 512 seems a-okay
>>
>>102017859
Possible? Maybe? Practical? No.
>>
>>102017880
So what would be a decent amount of VRAM to train a Flux LoRa?
>>
>>102017742
What dims?
>>
>>102017899
the devs have been assuming that you should have 24GB since cascade.
>>
>>102017899
Realistically? 24gb. You might be able to get away with 16, but your LoRAs will take forever to train and can't take advantage of higher dimensions
There is an anon here claiming he does it on 12GB, and there are scripts to confirm that on the Kohya page, but he effectively bricks his PC for 12 hours while it trains.

I don't think you could realistically train on 8GB in a manner that's practical.
>>
>>102017742
>>102017869
so input size doesn't matter? i guess i've been "pre" bucketing to 1024x1024 on mine
>>
>>102017856
>I just mean the --resolution launch parameter
well you'd have to change the logic of it to sometimes use 1024 images instead of 512 all the time
>>
File: flux_00452_.png (1 MB, 1200x936)
>"five six, one hundred and twenty pouns, three years as mall security night shift and five years of krav maga..."
>"well, uh... thank you for your application, but it's just that our ninja enterprise has a different profile in mind..."
>"we hope you understand"
>>
File: 2024-08-22_00109_.png (2.32 MB, 1024x1280)
>>
File: ComfyUI_00821_.png (1.76 MB, 1152x1536)
>>102017906
The 1024 was dim 4, using the default Kohya settings. For 512,512 I did 16, figuring I had more juice to spare, after looking through a few config files people were posting in the bmaltais Kohya GUI GitHub discussions.
>>
File: 00036-3114675299.png (1.87 MB, 1024x1440)
>>102017922
I've been training on 16gb. 512x512, 3000 steps, 16 dim, takes around 1 hour 40 minutes
>>
>>102017382
Yeah, I'm a few tests in and it's nowhere near as good as Flux.
Flux has realism LoRAs which look better than what that model thinks realism looks like. Sure, it may know more concepts, as closed models tend to due to their sheer size, but it is not as polished in the details. Aside from that, it also doesn't seem to follow the prompt as well as Flux, at least not as consistently (and it fucked up in one of my tests which Flux passed with flying colors). Even with Magic Prompt turned off it does its own thing.
>>
File: 1723857565887861.jpg (3.84 MB, 7961x2897)
Anyone has an updated version of this?
>>
>>102018003
>512x512
Is that good enough? Is it because you can't fit larger images on 16GB?
>>
>>102018003
>3000 steps
repeats, epochs?
>>
>>102017922
12-hours anon here: the training used 7GB VRAM at most (512x512, dim 8) and I was able to comfortably work in Photoshop at the same time. Anyway, it only took so long because I was using a very low learning rate.
>>
>>102018009
Q6_K takes less, around 11.1GB during inference and 7.3GB during idling.
>>
File: FD_00060_.png (2.11 MB, 1024x1536)
>>
File: FD_00063_.png (2.24 MB, 1024x1536)
>>102018003
She looks like a fat capitalist. She needs some good manual labour and a diet of borscht.
>>
File: CUI_iced_00348_.jpg (790 KB, 1440x2560)
>>102017990
>>102017471
>>102017329
love these
>>
File: 00050-3146234069.png (1.42 MB, 1024x1440)
>>102018019
>512x512
>Is that good enough?
Seems to be the consensus. I've noticed finer skin details like moles not being picked up but that could be my settings. I'm still testing various configs and captions
>>102018033
30 images, so 10 repeats, 10 epochs.
>>
How's Flux for landscape paintings and porn?
>>
>>102018006
And it's kind of strange. I noticed so many shills on Reddit claiming the model is really really good, but it's not better than Flux dev.
>>
>>102017300
It's 2024
Still can't spread diffusion model VRAM across GPUs
>>
File: FD_00065_.png (2.25 MB, 1024x1536)
>>102018079
I'd have gone for 20 repeats, 10 epochs. But it's obvious who she is.
Can you try some close ups?
>>
Just trained a lora on my own 3d render style and damn flux is really good at it. Though autocaptioners seem like complete shit. Florence-2 just kept making shit up and repeating itself over and over again. I wound up having to just rewrite most of them myself. Is there any info on what Flux used to caption? I feel like captioners are still the main limiter and the improvement for image models can just keep growing as long as captioning improves.
>>
File: flux_00461_.png (1.01 MB, 1200x936)
>>
File: 00000-3023947591.png (1.17 MB, 1024x1024)
>>102018123
Like I said, skin detail is lost; it totally failed to pick up her cheek mole despite me tagging it. You think 20 repeats would've helped?
>>
SD 3.1 soon, any carers?
>>
File: fs_0202.jpg (99 KB, 2048x1344)
>>
>>102017587
Yeah, I understand the issues with the hardware requirements, but even InternVL2 at a smaller size is going to be better than something that is still closed source. Until we know where it lands in terms of performance, it seems a bit too hasty to apply it to everything. The only issue with smaller models like MiniCPM or InternVL2 is that they're built on InternLM or other Chinese base models, and the finetunes for these aren't great: with Qwen2 7B used as the base for MiniCPM-V 2.6, pickings are slim, it's either Dolphin or Einstein. I should probably go back to 2.5 so I can use any Llama 3 model of choice, but its vision CLIP part is worse.
>>
Comfy/Auto when? https://github.com/ShaochengShen/MegaFusion/
>>
File: 00051-AYAKON_12481768768.jpg (1.94 MB, 3840x1600)
made some more small edits to this, really like what can be done with Pony and Flux, mixing parts of the process
>>
>>102018196
Nope.
>>
>>102018204
prompt?
>>
File: flux_00465_.png (926 KB, 1200x936)
>>
>>102018196
Pressing X to doubt about 2 times.
Strongly doubting a .1 release can fix the issues.
>>
>>102018233
>a still image from a weatherstation camera overlooking a small town at dusk, the sky is darkened with a serious looking storm. The photo captures the awe-inspiring split second that lightning arcs across the entire landscape. Barely visible in the distant clouds, an ominous looming monstrous terrifying kaiju with glowing eyes
>>
File: fs_0206.jpg (77 KB, 2048x1344)
>>
File: flux_00022_.png (1.45 MB, 1160x896)
>>102018196
SAI can fuck off and die.
>>
>>102018196
look, if the model is by some miracle better than Flux (99.9% chance it won't be) I will be back on board the Stable Diffusion train
Realistically it will probably be mediocre and we'll get a ton of bullshit about safety and respecting 3rd worlders' 2GB VRAM PCs
>>
File: 16767674178512926.png (363 KB, 735x581)
>the future of stability is focusing on as low spec as possible for the indian market
>>
File: 3.jpg (734 KB, 896x1152)
>>102017715
thank you anon
>>
File: FLUX.jpg (154 KB, 1496x1168)
Still getting the hang of prompting Flux.
>>
>>102018292
>if this new free thing is better than this other free thing I'll take the better free thing
Well yeah
>>
File: flux_00470_.png (1.05 MB, 1200x936)
>>
File: Capture.png (2.29 MB, 1493x1168)
Why is flux so prone to creating arches and pathways like this
>>
>>102018340
https://old.reddit.com/r/StableDiffusion/comments/1exw2m4/sd_31_is_coming/ljad3lk/
>>
>>102018349
Very common in concept art. Try to include a viewing angle, might help it.
>bird's eye view
>side view
>from above
etc
>>
>>102018316
Beautiful
>>
I have a schizo idea. We know that flux is trained with guidance distillation. Meaning that instead of predicting the noise, it is trained to predict the CFG difference from the teacher model, i.e. cfg*noise_pred - (cfg-1)*unconditional_noise_pred. And the guidance vector tells it what CFG was used when constructing that target.

Now the problem is that when finetuning flux, we don't have a teacher model. So we can't train on the distillation loss, since we don't have the unconditional half of that expression. Instead it seems best to just train on noise prediction alone (corresponding to CFG=1) and set the guidance vector to 1.

But why can't we just use the unconditional prediction from the model itself when making the guidance distillation target? Just predict the noise with an empty prompt and guidance=1, to get uncond_noise_pred. Now the target is just cfg*real_noise - (cfg-1)*uncond_noise_pred. The model itself was used to get the unconditional half of the CFG difference. You could use caption dropout as well to make sure the model learns to make good unconditional predictions. Call this "self distillation" or some shit.

why wouldn't this work
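A minimal sketch of that loss in PyTorch, with a hypothetical model(x_t, t, cond, guidance) signature (adapt to whatever trainer you use):

import torch
import torch.nn.functional as F

def self_distillation_loss(model, x_t, t, text_emb, empty_emb, real_noise, cfg):
    with torch.no_grad():
        # the student's own unconditional prediction: empty prompt, guidance = 1
        uncond_pred = model(x_t, t, empty_emb, guidance=torch.ones_like(t))
    # CFG-difference target built from the real noise and the model's own
    # unconditional branch: cfg * real_noise - (cfg - 1) * uncond_pred
    target = cfg * real_noise - (cfg - 1) * uncond_pred
    # train the conditional pass with the guidance vector set to cfg
    pred = model(x_t, t, text_emb, guidance=torch.full_like(t, float(cfg)))
    return F.mse_loss(pred, target)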
>>
File: CUI_iced_00379_.jpg (644 KB, 1440x2560)
>>102018389
madman, try it out, if it works ask a llm to write you a research paper and drop it and become a legend
>>
>>102018389
>why wouldn't this work
Sir I just type in "boobs" and hit generate
>>
Anyone else notice that painting LoRAs still manage to make images that look too realistic/CGI instead of painting? In particular this one
https://civitai.com/models/649868/leonardo-davinci-style?modelVersionId=727069
and
https://civitai.com/models/669566/style-of-rembrandt-flux-135

and many others. This is not an issue when the style is heavily stylized, but with painted portraits like these it tends to happen.
>>
File: titty lora stack.png (41 KB, 428x886)
holy fucking SHIT now THAT's a LORA STACK
https://civitai.com/images/25026595
>>
File: 1.jpg (2.47 MB, 1792x2304)
>>102018382
No, you are beautiful anon
>>
>>102018019
512x512 generates elongated bodies like here >>102018079
This has already been documented.
>>
>>102018520
>175 steps
>1000 steps
poorfags shouldn't be baking loras. crank that up to 12000 and it will be fine.
>>
>>102017805
https://www.repixify.com/tools/image-description-generator
>The image is a screenshot from a video game. The player character, a yellow and red figure with a helmet, is standing on a platform to the left of the screen. A large green and yellow monster with a large mouth is in the foreground. The monster is facing the player character and appears to be about to attack. The background is a dark, pixelated landscape with green foliage. The game's interface is visible at the top of the screen, showing the player's energy level, score, and other information. The image evokes a sense of danger and excitement, as the player character faces a formidable opponent.
>>
>>102018389
isn't flux the teacher model?
>>
File: CUI_iced_00397_.png (3.8 MB, 1440x2560)
>>102018706
>>102017805
i use 4o as well for all my prompts now too lol, we truly are fucked in a decade
>>
>>102018706
https://asuo-ai-labs.streamlit.app/Image_Accessibility
>A screenshot from a retro video game depicts a battle scene between a small human character and a large, menacing green creature. The human character appears to be wearing a suit and helmet, standing on a platform composed of blue and black blocks on the left side of the screen. The platform is slightly raised above a bed of sharp spikes. On the right side of the screen, the large green creature with multiple eyes and sharp spikes on its back is facing the human character. The creature appears to be attacking, as there are small explosions and smoke near its body. The background of the scene consists of dark colors, with green vegetation at the top and bottom. At the top left corner of the screen, there are indicators for "ENERGY 67," and below it, the numbers "020 04." In the top right corner, there is a small grid. Retro video game screenshot showing a human character facing a large green creature in a battle scene.
I found it surprising that generating these descriptions takes longer than it takes Flux to draw them.
>>
>>102018701
I trained a LoRA at 7000 steps last night and it wasn't even particularly overbaked.
>>
I have been going crazy trying to hack together a solution for local image tagging. I had this crazy idea that I could just feed an LLM a single image's CLIP embeddings of a character and it would be able to identify them when captioning data; instead it said almost everyone was that character.
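For what it's worth, the more conventional route is nearest-neighbor matching on CLIP image embeddings instead of handing raw embeddings to an LLM. A minimal sketch with transformers' CLIP (the model id, file names, and the 0.85 threshold are just example choices):

import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model_id = "openai/clip-vit-large-patch14"
model = CLIPModel.from_pretrained(model_id)
processor = CLIPProcessor.from_pretrained(model_id)

def embed(paths):
    inputs = processor(images=[Image.open(p) for p in paths], return_tensors="pt")
    with torch.no_grad():
        feats = model.get_image_features(**inputs)
    return feats / feats.norm(dim=-1, keepdim=True)  # L2-normalize for cosine sim

refs = embed(["char_a_1.png", "char_a_2.png"])   # a few reference images per character
query = embed(["unlabeled_001.png"])             # image to tag
score = (query @ refs.T).max().item()            # best cosine similarity
print("character A" if score > 0.85 else "unknown")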
>>
>>102018520
Undertrained LoRAs. This also happens with 1.5 and SDXL ones. From my experience with training, faces are usually the last thing to change and need the most training to lose their baked-in look.
>>
File: image.png (1.52 MB, 800x1400)
>>
>>102018805
What was the learning rate?
>>
>>102018797
So far it's the only one capable of recognizing that it's Super Metroid, Samus Aran, and even Kraid the boss and all the things.
>>102018800
https://huggingface.co/spaces/llava-hf/llava-4bit
>The scene is set on an alien planet, where two characters are engaged with each other amidst the greenery of their surroundings. One character appears to be fighting or interacting closely while standing next to another large creature that seems more like some kind of monster than its own species member. This creates tension between them as they face off against one another within what could possibly resemble outer space terrain filled mostly by plants rather than typical landscapes found elsewhere such as grassy fields forests etc..
So far the furthest from the target.
>>
>>102018880
Nta but just use Prodigy and you don't have to worry about the learning rate.
>>
>>102018901
>Prodigy
Do I just set the optimizer to Prodigy or are there other parameters I need to adjust?
>>
>>102018901
nta, but does prodigy work with flux? I thought it ate a huge amount of vram.
>>
>>102018889
Sometimes these are so bad I find them amusing.
https://huggingface.co/spaces/gokaygokay/FLUX.1-dev-with-Captioner (Florence 2)
>The image is a screenshot from a video game. It shows a scene from the game, where the player is in the middle of a battle. The player is wearing a yellow outfit and is standing on a platform with a blue background. On the right side of the screen, there is a large green dragon with sharp teeth and red eyes. The dragon is attacking the player with its mouth open, as if it is about to attack. The background is black, and there are various enemies and enemies scattered around the scene. At the top of the image, there are two buttons, one labeled "Energy 67" and the other labeled "020 04".
Whoever is using Florence 2 to caption your training data, please reconsider.
>>
can someone link me the training scripts for Flux with Kohya? I can't find them
>>
>>102018960
Use prodigy and set learning rate to 1.

>>102018976
Uses the same vram as adamw8bit. I haven't completed training yet, so I don't know if VRAM scales up later on.
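If you're writing your own loop rather than using Kohya, standalone usage is roughly this (a sketch: pip install prodigyopt; network.parameters() stands in for your LoRA params):

from prodigyopt import Prodigy

# with Prodigy the lr acts as a multiplier on its internal step-size
# estimate, so the recommended setting is simply 1.0
optimizer = Prodigy(network.parameters(), lr=1.0, weight_decay=0.01)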
>>
>>102018978
https://huggingface.co/spaces/gokaygokay/KolorsPlusPlus (Long Captioner - to generate the prompt you have to generate a picture)
>a detailed close-up captures a scene from the video game "metroid," featuring a green monster with large teeth and sharp claws. the monster's body is covered in a mix of green, yellow, and red hues, while its head and hands are adorned with black spikes. the background features a pink tunnel, a gray platform, and a blue water surface. the player character, wearing a green suit, is positioned on the left side of the screen, facing the viewer. the words "energy 67" are prominently displayed at the top of the screen, adding depth to the image.
>>
>>102018988
https://github.com/bmaltais/kohya_ss/issues/2701#issuecomment-2297761417
I've been using this. Might want to bump the learning rate up a little
>>
>>102018181
>skin detail is lost
That's just Flux though, turns them all into smoothskins
>>
>>102018901
Prodigy will never match the quality of optimizers like CAME and Adam with proper settings. But at the same time, you won't get failures, and you will most likely get a civitai-quality LoRA out of it. You just can't improve it or fix it when it goes wrong.
>>
>>102019069
And here's the picture I had to generate by Kolors to get the prompt. Hmmm, is it better than Flux's rendition?
>>
Can someone put a guide together on how to train LoRAs for Flux?
>>
File: ComfyUI_09432_.png (1.2 MB, 800x1400)
>>
I've yet to find a style Flux LoRA that doesn't degrade anatomy without being undertrained. Is it just me who's noticed this?
>>
What should be the average loss when training flux loras? One anon said his was around .4 but is that the "correct" value?
>>
>>102019191
das what i'm gettin
>>
>>102019117
Sure, once we get our shit straight
>>
File: Capture.png (2 KB, 541x27)
>>102019191
Here is my latest batch
>>
>>102019117
You just go to Kohya github or Ai toolkit github and follow the steps.
>>
>>102018360
>reddit
Didnt read
>>
>>102018521
People have no lives.
>>
File: 00090-3902273944.png (1.35 MB, 896x1152)
>>
>>102019232
that seems like a lowish lr
>>
>>102018521
Overkill for such a generic image
>>
I'm having a hard time wrapping my head around training a LoRA using Kohya. I didn't have issues with ai-toolkit.

Like, am I supposed to run these JSON files through the GUI or through the command prompt?
I can't find simple documentation.
>>
File: 00107-4288387873.png (1.38 MB, 896x1152)
>>102019303
I'm still experimenting with parameters. The results seem quite good though.
>>
>>102019333
kohya wrote sd-scripts; bmaltais wrote a GUI frontend for it called kohya_ss.

sd-scripts, despite being a buggy mess, more or less works for Flux out of the box; you run it from the command line. bmaltais' GUI is currently a half-updated hash of kohya's library and I haven't gotten it to work at all.
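For reference, a Flux LoRA run with sd-scripts is launched from the shell roughly like this (a sketch based on the flux/sd3 branch; flag names have been shifting between commits, so verify against the repo README):

accelerate launch flux_train_network.py \
  --pretrained_model_name_or_path flux1-dev.safetensors \
  --clip_l clip_l.safetensors --t5xxl t5xxl_fp16.safetensors --ae ae.safetensors \
  --network_module networks.lora_flux --network_dim 16 \
  --resolution 512,512 --train_data_dir ./dataset --output_dir ./output \
  --learning_rate 1e-4 --max_train_steps 3000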
>>
File: Capture.png (31 KB, 1451x277)
>>102019333
>>
>>102017805
>>102018797
Oh, okay, looks like we have a new champion. It even managed to get Samus in the picture, unlike all the other captioners; I didn't even know that was possible o_O
https://huggingface.co/spaces/Quardo/gpt-4o-mini (gpt-4o-mini-2024-07-18 - it has a hidden queue, so you have to wait for several minutes before it starts processing you, unless you're third or later in the queue, which increases the waiting.)
Again, the prompt is too long for a post so it's pastebinned:
https://pastebin.com/xUXWBDy2
>>
>>102019232
My is at .42 but it seems to be going down. The first sample image looks okay.
>>
File: ComfyUI_32705_.png (1.1 MB, 1024x1024)
>>
>>102019333
LoRA_Easy_Training_Scripts on github
>>
>>102017385
joycaption

>This is a detailed, colorful screenshot from a video game, specifically from a side-scrolling, action-adventure game. The scene is set in a dark, cavernous environment with a green, rocky floor and a ceiling covered in spiky, green vegetation. On the left side, a yellow, armored character with a red helmet and a gun is visible, standing near a large, metallic door. The character appears to be a human, wearing a suit with a muscular build and a protective helmet.
>In the center-right of the image, a massive, green, reptilian monster with sharp, spiky horns and a large, gaping mouth is attacking. The creature's skin is textured with scales and its eyes are red and glowing. It is mid-action, with its clawed hand reaching out towards the character, and its mouth open wide, emitting a burst of green, smoky particles.
>The background is dark, with the only light coming from the green glow of the monster and the character's flashlight. Above the monster, there are various icons and indicators, including a health bar, energy meter, and a timer. The overall style is reminiscent of classic, pixelated 16-bit graphics, typical of early 1990s video games.
>>
>>102010022
Thank you Anon, this is exactly what I wanted
>>
>>102019411
>>102019408
>>102019455

Never mind, I figured it out on my own. I think when I updated the dependencies it switched back to the master branch. Please don't call me a dumbass. I'm average at worst.
>>
File: ComfyUI_00830_.png (3.51 MB, 2048x2048)
>>
>>102019421
And I guess that finishes the challenge. Is it me, or does it look BETTER than the original picture? >>102017385 I could use the new one instead; that Kraid Samus monster thing looks badass.
I'm just going to compare these to the best we used to have in the past, the CLIP interrogators, which, while not reproducing pictures at all, seem fun and cool nonetheless.
https://huggingface.co/spaces/fffiloni/CLIP-Interrogator-2 (Best 24 max flavors)
>a video game with a dragon attacking a man, pixel art, inspired by katsuya terada, stalagmites, turtles, 2 4 0 p footage, 1988 video game screenshot, metroid, gameplay video, fangs and slime, donatello, screenshot from a movie, 240p, sofubi, from berserk, protagonist
>>
File: 00124-1750270787.png (1.06 MB, 896x1152)
>Civitai Flux Training Contest
>https://civitai.com/articles/6797
>>
>>102019525
Hands off my magic rock, bitch
>>
>>102019421
>>102019545
nice, props for the tests anon, some neat gens and good info
>>
Is it just me, or do you get more accurate and precise results, with fewer fails, the more batches you run with the same prompt and settings in Flux?
>>
File: 00323-2024-08-21-cJak.jpg (3.17 MB, 2048x2688)
>>
>>102019545
https://huggingface.co/spaces/pharmapsychotic/CLIP-Interrogator (Fast. It always gave me the most fun prompts.)
>a video game with a dragon attacking a man, metroidvania, epic boss fight, epic boss battle, metroid, contra, bossfight, royo, 16bits videogame, boss battle, cacodemon, snes screenshot, koopa, 8bits videogame, super mario bros 1 9 8 5, boss fight, dangerous swamp, snes graphics, 1 6 - bit
>>
File: 2.jpg (650 KB, 832x1216)
>>
File: 18881.png (545 KB, 1280x768)
>>102019636
:D
I tend to do these things in private; I'm glad I shared. I had no idea we were at this level already, I thought "maybe next year", but apparently Flux is all about the prompts.
To close: I used seed 18881 in all my samples, and at one point I hit enter before pasting the prompt and sent a blank prompt. Back in the SD 1.5 days that would produce garbage, but here I got picrel.
>>
>>102017554
can internvl2 be quanted?
>>
File: ComfyUI_32709_.png (1.28 MB, 1024x1024)
>>
File: 1724296462.png (13 KB, 740x323)
>>102019820
guess i should have kept reading the page before asking
>>
>>102019858
I'm not on the page right now, but I distinctly remember them saying not to quant it because it fucks it up, but that may have just been for the bnb quants
>>
>>102019896
Quants will make most VLMs make subtle mistakes like confusing left and right. Florence honestly is the best bang for your buck VLM, it's almost uncensored.
>>
So do people always train at network dimensions that double the previous one (8, 16, 32...)? Why can't I just train at the biggest dimension I can fit?
>>
File: 104308-tmp.png (2.73 MB, 1536x1728)
>>
>>102019916
In my rough experience the bigger the better. It's definitely more accurate to crank it up.
>>
>>102019914
I've spent most of this week trying to tard-wrangle several VLMs into giving decent output on the first go. My conclusion is that anything you can run at home is mid at best, and automatic captions + manual editing is king.
>>
>>102019896
page says 4bit quants fuck it up but nothing about 8bits causing issues
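For the record, an 8-bit load via transformers + bitsandbytes looks like this (a sketch; repo id shown for the 8B variant as an example, and InternVL2 needs trust_remote_code):

import torch
from transformers import AutoModel, AutoTokenizer, BitsAndBytesConfig

model_id = "OpenGVLab/InternVL2-8B"
model = AutoModel.from_pretrained(
    model_id,
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),  # 8-bit, not 4-bit
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)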
>>
>>102019932
manual editing only works for baby datasets
>>
Nobody respond to it, it will go back to /sdg/ on its own.
>>
File: file.jpg (24 KB, 768x768)
>>
>>102019938
Interesting. I have 1000 baby Indians in a sweatshop captioning my niche porn.
>>
>>102019950
florence + wdv large v3
ezpz
>>
>>102019916
I remember reading an AI article that said a network dim of 8 is the sweet spot. Going above that gives diminishing returns and reduces flexibility. It also increases the VRAM required.
>>
File: FD_00124_.png (1.65 MB, 768x1344)
>>
File: delux_me_00049_.jpg (343 KB, 896x512)
>>102019949
>what she sees
>>
File: 103991-tmp.png (2.73 MB, 1536x1728)
>>
>>102019997
>remember reading an AI article that says network dim of 8 is the sweet spot.

That sounds like absolute bullshit vramlet cope from the days of 1.5.
No offense.
>>
File: 00145-2024-08-21-cJak.jpg (2.94 MB, 2048x2688)
>102020013
So ugly
>>
File: ComfyUI_09504_.png (2.15 MB, 1400x800)
Who's the retard who started the entire "merge dev with schnell" trend? All he did was fuck up dev models.
>>
>>102020032
It's just people trying to figure out quick fixes to complex problems. It was bound to happen once this model got released. Remember that "pony merge" that came out like on the day of release?
>>
>>102020020
It was from the creators of the algorithm. The original LoRA article.
>>
File: 5.jpg (1.95 MB, 1568x2016)
>>
I got some questions about flux training on Kohya.
I'm getting a lot more it/s on Kohya compared to AI-toolkit. Anyone else?
Also, I noticed the config I was using was training in 8-bit. Was that always the case for 24GB cards?
>>
File: FD_00136_.png (1.61 MB, 768x1344)
>>
>>102020064
That doesn't really rebut my claim that it's outdated advice from the 1.5 days, but I'd assume the guy who created LoRAs probably isn't a vramlet.

I think bigger dimensions on a bigger model make sense anyway. It was true, though, that if you trained at 128 dim on XL you'd generally deep-fry the model.
>>
>>102019570
Whoever uncucks flux will win all prizes, even the ones from different categories.
>>
File: ComfyUI_09520_.png (2.27 MB, 1400x800)
>>102020051
I get why it was tried; what I'm confused about is why people are still doing it. There are zero benefits to it, only negatives.
>>
>>102020086
>That doesn't really rebut my claim

What's your claim?
>>
>>102020096
>>102020032
w cheese atv
>>
>>102020117
That you'll get a better-quality LoRA that resembles the subjects and visuals if you use a higher dimension, and that the larger the model, the larger the dimension you can use while still getting good results.
And that a dim of 8 is for vramlets.
>>
https://reddit.com/r/StableDiffusion/comments/1ey6hss/kohya_ss_gui_flux_lora_training_on_rtx_3060_lora/
>Kohya SS GUI FLUX LoRA Training on RTX 3060 - LoRA Rank 128 - uses 9.7 GB VRAM - Finally made it work. Results will be hopefully tomorrow training at the moment :)
Excuse me?
>>
Someone get Flux to make a girl in a straitjacket. I give up.
>>
File: 19240467.jpg (21 KB, 460x460)
>>102020225
Total cerfuckin' victory.
>>
>>102020225
>CeFurkan
Ignore this faggot grifter
>>
>>102017300
What are some cool things to do combining Flux and SD?

There are so many versions of SD; I'm checking out waifu-diffusion Radiance. In the example, which has to be changed a bit to get it working, it just takes a KSampler output over to an upscaler, then to another KSampler.

Can Flux handle this? Don't I have to convert it to a format Flux likes?
>>
File: cunt.png (320 KB, 396x387)
>>102020268
You sure you wanna be talking shit about the guy who rides a dinosaur?
>>
>>102020225
>Finally made it work
By whining in Kohya's issues and making someone else do the fixing and then appropriating results.
>>
>>102020225
>LoRA Rank 128
Into the trash it goes.
>>
>>102020225
Noooo you can't train it on anything less than a 2 A100s! I mean 1 A100! I mean a 4090! I mean a 3090! I mean a 4080! I mean a 3080! I mean... I mean... I mean
>>
>>102020225
Don't forget to subscribe to his patreon for an exclusive early access to the tutorial
>>
>>102020305
Drives me fucking mad and then he'll have the nerve to shill his fucking patreon to sell (you) the results (you) just helped him with
>>
>>102020321
Before you go about reddit clapping for
>Muh community
Take a good look at who's posting it and realize it's probably got some massive asterisk attached.
>>
>>102020323
Nobody will do shit because 99% of people never even set foot in GitHub, let alone look at the discussions. As far as they're concerned, he's THE top developer for LoRAs.
>>
>>102020225
Do your part and downvote this faggot.
>>
>>102020157
Anyone can test more dims on Flux and see side by side that more dims are better. We're seriously talking about a 10x bigger model. Also, SDXL is a piece of shit that doesn't learn anything and deep-fries even on a good day.
>>
>>102020225
>23 seconds per it
>>
>>102020347
I 100% agree, which is why I said a network dimension of 8 is retardedly small.
>>
>>102020310
128 is not enough? I never trained models so I don't know the standards
>>
Fried is an aesthetic look and shouldn't be tossed out immediately, especially since it might produce interesting gens, objects that could be used with further processing.
>>
>>102020157
If you're just training styles and faces, you can probably go very high with the network dimension. But if you're training new concepts, you'll have less flexibility with a higher network dimension.
>>
File: 104337-tmp.png (2.62 MB, 1536x1728)
>>
File: ComfyUI_09583_.png (2.32 MB, 1400x800)
Can't wait for a complete multimodal model (everything in / everything out). You could give it the title of a comic and have it actually create the comic, instead of just gibberish. The full version of GPT-4o is supposed to be like this, but OpenAI hasn't/won't release it.
>>
File: 00172-966204905.png (1.12 MB, 896x1152)
>>
>>102020405
>full version of GPT-4o is supposed to be like this but OpenAI hasn't/wont release it
Anon.. that's just something saltman tells investors to get more capital. Like telling kids that if they're good little boys and girls they'll get a present.
>>
Has anyone managed to make Skimmed_CFG work on Flux? For me it acts as if it doesn't exist
>>
>>102020416
it's all smoke and mirrors anyway; at best it's just a bunch of workflows stapled together. there's never going to be a magic does-it-all model
>>
>>102020425
Not as long as the transformer is the backbone of it all.
>>
>>102020437
yeah, the model you're asking for requires billions to train and run, and they're running out of runway
>>
File: FLUX00016.png (2.32 MB, 1536x1248)
>>
This is fun. waifu-diffusion (sd 1.5 based?)

not sure why it thinks this is a fat frog. Maybe I'm word-frying it with too many negatives from the old prompt.

https://files.catbox.moe/v5y9mx.png
>>
File: 00002-1555449755.png (1.57 MB, 1024x1024)
>>
>>102020467
>waifu-diffusion
Is it 2022 again?
>>
File: 00003-3031908286.png (1.23 MB, 1024x1024)
>>
>>102020479
My goal isn't successful renders, necessarily :^)

Ever try swapping vae?
>>
>>102020467
>waifu-diffusion
>>
File: 00005-3346765107.png (1000 KB, 1024x1024)
>>
Anyone training on Kohya? I'm currently only using 18/24gb and I think that's a little low? I just wanna make sure I'm not using any unnecessary optimizations.
Currently training at rank 32 on 512x512
>>
>>102020514
more batch size bb
>>
>>102020519
True, I forgot about batch size. I remember getting into the weeds trying to find out whether bigger batch sizes were good or bad for the model, and I think the consensus was that they actually help the model generalize?
>>
File: Griffter.jpg (44 KB, 471x501)
>>102020514
don't worry anon, that man is gonna save you >>102020225
>>
File: tod.png (2.44 MB, 1018x1018)
>>
>>102020542
batch size is more fasterer, everything else is dumb retard shit from people who train within the margin of noise
you actually think they train the models with batch size 1 or batch size 128?
>>
>>102020548
No, I don't want to be saved by cerfuckin. I want to make sure one of his shortcuts isn't fucking up the potential quality of my models.
>>
File: ComfyUI_00086_ sm.png (1.92 MB, 1280x720)
>>102020506
a 1.5 finetune.

Here's an example of possible output.
>>
>>102020566
How new are you
>>
File: ComfyUI_00087_sm.png (2.23 MB, 1510x850)
>>102020566
Here I have changed the prompt a bit.
>>
>>102020580
>>102020566
NTA, but everyone knows what waifu-diffusion is. Everyone is acting puzzled because they genuinely cannot fathom why you are using an outdated proof-of-concept model in the year 2024.
>>
>>102020566
you can do that with Flux by training a lora with 1e3 learning rate of pictures of frogs
>>
>>102020566
>>102020580
looks like shit
>>
>>102020600
waifu sisters it's over for us
>>
File: ComfyUI_00088_sm.png (1.29 MB, 1280x720)
>>102020577
One problem with Fusion is it refuses to draw some things, for example deformed arms.

>>102020585
Artistically, instead of middleclass "artistically". The problem with Flux is it's extremely mid, so far.

"to break it"
>>
>>102020618
that's not artistic, that's what every model does when you fry the fuck out of it. it's about as technically impressive as kicking over a can of paint on your carpet
>>
and if you can't see why
>>102020618
is the best gen of the thread, then you aren't qualified to comment on aesthetics.
>>
>>102020618
This has to be a bot or something
>>
>>102020618
You're just generating with a 1.5 model at resolutions it was not trained on. That's why it's outputting garbage.
>>
File: 00009-1426206558.png (1.02 MB, 1024x1024)
>>
>>102020632
not gonna lie. I was thinking the same thing. Its replies are nonsense.
>>
File: 00009-1053315402.png (1.27 MB, 896x1152)
Behold, real art
>>
>>102020646
Yeah idk what's going on
>>
>>102020632
>>102020646
You are basically uncultured. What is the best art magazine, in your opinion. Both respond please.
>>
>>102020643
>not a school girl outfit
>>
>>102020656
art magazine, lol
>>
>>102020656
That's cool, for me my favorite dessert is Strawberry Short Cake
>>
>>102020653
your images are too white, you should ditch Dynamic Thresholding for Tonemap or AutomaticCFG
>>
>>102020654
It kind of makes sense. The choice of a low overhead model like waifu diffusion is probably to save vram for its word salad.
>>
>>102020660
no, it's a professor outfit
>>
File: 00016-1541856161.png (1.25 MB, 896x1152)
real art
>>
File: 00013-2692419392.png (1.23 MB, 1024x1024)
>>
>>102020670
Is there a way to do that in Forge? I have Comfy working, but Forge just seems way faster.
>>
>>102020674
that's the problem with loading multiple LoRAs, one will take over the other. George's face is everywhere in that picture kek
>>
>>102020656
Shonen Jump
>>
>>102020656
Ideogram isn't good at resolution like it's been hyped up. Artistically you might say that <|endoftext|>
>>
>>102020684
george billrapekillstanza
>>
>>102020689
Too on the nose.
>>
File: Capture.png (77 KB, 693x750)
>>102020683
I don't think Forge has Tonemap or AutomaticCFG unfortunately. If you want to stick with Forge, at least use these parameters with DynamicThresholding; they work the best for Flux.
>>
File: delux_hh_00027_.png (2.32 MB, 1024x1344)
>>102020653
shits fried yo
>>
File: 00201-350718977.png (1.24 MB, 896x1152)
>>
>>102020698
Thank you, yeah, mine were a little different from that. I'll give it a shot. I'll give Comfy another shot too... was definitely getting better outputs even with normal dynamic thresh.
>>
File: 00018-110857331.png (1.18 MB, 1024x1024)
>>
>>102020225
So he's been squatting here plundering and piecing together advice from anon, right?
>>
Just thought I'd share a sample image from my first test LoRA with Kohya. It's from a really badly captioned dataset of like 4 anpanman images and 3 Simpsons images I was using to test the captioner.
>>
>>102020514
Why aren't you training at a higher resolution? The high rank is pointless when the resolution is so small.
>>
>>102020729
>The high rank pointless when the resolution is so small.
Can you explain the relation of rank and resolution as if I were a retard?
>>
>>
File: 00021-5001803.png (1.36 MB, 1024x1024)
>>
>>102020729
Supposedly training at 512 takes about a quarter of the time it does at 1024 (512x512 has a quarter as many pixels as 1024x1024, so compute scales roughly with it).
Anon here shows a comparison of the results of 1024 vs 512 training:
>>102017742
>>
File: 00210-1936981994.png (1.35 MB, 896x1152)
>>
File: 00024-266248679.png (1.24 MB, 1024x1024)
>>
>>102020758
I think that anon is claiming that training at 512 at such a high rank somehow isn't making use of that rank. But I'm not sure what he means by that.
>>
File: P08174_10.jpg (202 KB, 1103x1536)
>>102020697
>>102020689
Rate this.
>>
File: 00028-2530809537.png (1.39 MB, 1024x1024)
>>
File: ComfyUI_00091_sm.png (1.14 MB, 1280x720)
>>102020752
Do you understand why yours is crap as art? And why this is not?
>>
File: 00030-1684899914.png (1.47 MB, 1024x1024)
>>102020789
>>
>>102020782
Cool, how did you do it? You conquered the auto-whore.
>>
File: 1694708342248886.jpg (130 KB, 1264x708)
>>102018104
It has the best prompt understanding, but the older version was also very good at that. This new update doesn't add anything new.
>>
>>102020738
Network rank represents the maximum amount of information that can be learned. The resolution is the max size of the training images; if your dataset has images larger than the max resolution, they will be resized down. There's only so much detail that can be learned from a small image, so a high rank won't be beneficial.
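Concretely, a LoRA on a weight of shape (out, in) learns two factors B (out x r) and A (r x in), so its parameter count grows linearly with rank r. Quick arithmetic (3072 is Flux's transformer width, if memory serves):

def lora_params(out_dim: int, in_dim: int, rank: int) -> int:
    # delta_W = B @ A, with B: (out_dim, rank) and A: (rank, in_dim)
    return rank * (out_dim + in_dim)

for r in (4, 8, 16, 32, 128):
    print(r, lora_params(3072, 3072, r))
# 4 -> 24,576 params per layer; 128 -> 786,432 (32x more capacity to fill with detail)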
>>
>>102020795
i wrote some words and hit generate
>>
File: FD_00010_.png (822 KB, 1024x1024)
>>102020789
Do you understand why yours is crap as art? And why this is not?
>>
>>102020806
Because you are poor.
>>
File: 00037-1694111930.png (1023 KB, 1024x1024)
you dumb
>>
File: SD3_13624_00069_.png (1.7 MB, 1024x1024)
>>102020789
>>102020806
>>
>>102020548
isn't this guy a professor in AI at some Turkish university or something, yet he knows jack shit about AI?
>>
>>102020791
Add trash to the yard, remove eye contact, and make her frown. There should be a pit bull.

>>102020815
You're improving.
>>
File: 00040-1914404962.png (1.33 MB, 1024x1024)
>>
>>102020814
You think she looks pretty, but she looks ugly.
>>
File: 00044-3105816695.png (1.31 MB, 1024x1024)
>>
>>102020839
"she"
>>
>>102020825
He's the monster that haunts your github repository.
>>
File: 00049-3374216342.png (1.14 MB, 1024x1024)
>>
>>102020867
Sometimes Flux is Middest Journey
>>
File: 00043-3496507613.png (1.22 MB, 1024x1024)
>>
>>102020548
Pony lora of this guy when?
>>
File: ComfyUI_00095_.png (17 KB, 96x96)
In small images, we see Flux being strange.
>>
>>102020975
Of all things posted, this is the first thing to make me really upset that I can't run Flux
>>
I think I'm starting to get results. The captions are really important, which means tard-wrangling each individual auto-generated caption...
>>
File: ComfyUI_00006_.png (824 KB, 1024x1024)
Thank you anon who recommended automatic cfg.
>>
>>102021027
you're welcome o/
>>
>>102020771
awesome
I would buy a fine print, bro
rly
>>
>>102021013
>auto-generated caption
did you generate the caption through a vlm?
>>
>>102021013
>I think I'm starting to get results. The captions are really important
Well, if you feed the AI garbage data, it will spit out garbage output, so yeah, data quality is really, really important
>>
Freshly baked loaf of...
>>102021045
>>102021045
>>102021045
>>
>>102020997
Thanks!

>>102021038
I like it.

https://www.tate.org.uk/art/artists/gerard-schneider-1906
>>
>>102017856
>I just used joycaption
Did you at least read some of them to see what you are training for?
You people are the reason slop loras are so prevalent on civitai.
Garbage in, garbage out.
Curate by hand. Caption by hand (or at least review automatic captioning by hand).
>>
>>102018044
It’s also slower than Q8_0
>>
>>102021690
>>102021712
New
>>102021045
>>102021045
>>102021045
>>
>>102018701
>poorfags shouldn't be baking loras
This. And use 1024x1024. Don’t settle for “good enough” that produces mediocrity.
>>
>>102018851
This is good.
>>
>>102017742
Nice, but you can see some nonsense with the books being blank notebooks in the 512 version. Needs more testing to see how much detail you are sacrificing. Flux is able to produce very coherent backgrounds and you might be fucking it up.


