/g/ - Technology






File: ComfyUI_01791_.jpg (3.32 MB, 2048x2048)
Discussion of free and open source text-to-image models

Previous /ldg/ bread : >>102013088

>Beginner UI
EasyDiffusion: https://easydiffusion.github.io
Fooocus: https://github.com/lllyasviel/fooocus
Metastable: https://metastable.studio

>Advanced UI
Automatic1111: https://github.com/automatic1111/stable-diffusion-webui
ComfyUI: https://github.com/comfyanonymous/ComfyUI
Forge: https://github.com/lllyasviel/stable-diffusion-webui-forge
InvokeAI: https://github.com/invoke-ai/InvokeAI
SD.Next: https://github.com/vladmandic/automatic
SwarmUI: https://github.com/mcmonkeyprojects/SwarmUI

>Use a VAE if your images look washed out
https://rentry.org/sdvae

>Model Ranking
https://imgsys.org/rankings

>Models, LoRAs & training
https://civitai.com
https://huggingface.co
https://aitracker.art
https://github.com/Nerogar/OneTrainer
https://github.com/derrian-distro/LoRA_Easy_Training_Scripts

>Flux
https://huggingface.co/spaces/black-forest-labs/FLUX.1-schnell
https://comfyanonymous.github.io/ComfyUI_examples/flux

>Pixart Sigma & Hunyuan DIT
https://huggingface.co/spaces/PixArt-alpha/PixArt-Sigma
https://huggingface.co/spaces/Tencent-Hunyuan/HunyuanDiT
https://huggingface.co/comfyanonymous/hunyuan_dit_comfyui
Nodes: https://github.com/city96/ComfyUI_ExtraModels

>Index of guides and other tools
https://rentry.org/sdg-link
https://rentry.org/rentrysd

>GPU performance
https://vladmandic.github.io/sd-extension-system-info/pages/benchmark.html
https://docs.getgrist.com/3mjouqRSdkBY/sdperformance

>Try online without registration
txt2img: https://www.mage.space
img2img: https://huggingface.co/spaces/huggingface/diffuse-the-rest
sd3: https://huggingface.co/spaces/stabilityai/stable-diffusion-3-medium

>Maintain thread quality
https://rentry.org/debo

>Related boards
>>>/g/sdg
>>>/h/hdg
>>>/e/edg
>>>/c/kdg
>>>/d/ddg
>>>/b/degen
>>>/vt/vtai
>>>/aco/sdg
>>>/u/udg
>>>/trash/sdg
>>
Blessed thread of frenship
>>
File: ComfyUI_00935_.jpg (2.1 MB, 2048x2048)
>>
File: ifx168.png (1.34 MB, 1024x1024)
>>
File: FLUX_00018_.jpg (486 KB, 2016x1152)
>>
File: 1698822159771336.jpg (25 KB, 525x384)
Ideogram launched 2.0 in a panic. It's barely any different from the last version.
>>
File: SuperMetroidScreenshot.png (393 KB, 618x464)
Challenge: find a captioner that, when fed this picture, produces a prompt that recreates it when fed to Flux.
This is now a benchmark for how close we can get. Once we pull it off, the sky is the limit.
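For anyone automating attempts, a minimal sketch of the round-trip with diffusers' FluxPipeline (caption_image is a placeholder for whatever captioner you're testing; dimensions are snapped to multiples of 16 as Flux requires):

import torch
from diffusers import FluxPipeline

def caption_image(path: str) -> str:
    # plug in your captioner of choice (JoyCaption, Florence-2, GPT-4o, ...)
    raise NotImplementedError

pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16
).to("cuda")

prompt = caption_image("SuperMetroidScreenshot.png")
# 618x464 snapped to the nearest multiples of 16
image = pipe(prompt, width=624, height=464,
             guidance_scale=3.5, num_inference_steps=28).images[0]
image.save("recreation.png")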
>>
>>102017385
we GAN now?
>>
>>102017385
do you know how diffusion works
>>
>>102017385
https://huggingface.co/spaces/fancyfeast/joy-caption-pre-alpha
>This image is a still frame from a video game, specifically a scene from a side-scrolling action game. The setting is a dark, cavernous environment with a green, rocky floor and ceiling. The floor is covered in small, sharp spikes and jagged rocks. On the left side of the image, a character dressed in a yellow, armored suit is seen, with a red visor and a gun in their hand, standing near a metallic, cylindrical object that resembles a door or entrance. The character is facing a massive, green, reptilian creature with a large, round head, red eyes, and sharp teeth. The creature is standing in the middle of the frame, towering over the character, and appears to be about to attack. The creature's skin is rough and scaly, with spikes protruding from its back and shoulders. The background is mostly black, with occasional greenish hues, enhancing the eerie and intense atmosphere. The top of the image features various energy bars and health indicators, indicating the game's status. The image is rendered in a pixelated, retro style, typical of early 90s video games, with vibrant colors and sharp contrasts, creating a visually dynamic and immersive scene.
>>
>>102017428
>The soulless cash grab remake
>>
File: fs_0142.jpg (372 KB, 1536x2048)
>>
File: ComfyUI_00625_.png (1.99 MB, 896x1344)
>>
>>102017300
no collage but thats a nice image
>>
File: ComfyUI_00578_.png (1.9 MB, 1024x1024)
>>
>>102017428
https://huggingface.co/spaces/SakanaAI/Llama-3-EvoVLM-JP-v2
>The image is a scene from a video game, specifically a classic game with pixel art graphics. The game is set in a jungle environment, with lush greenery and a dense forest that fills the background. In the foreground, there is a large, green, and slimy monster that is the main subject of the scene. Its body is covered in moss and dirt, with thick, green leaves on its head and neck. The monster's mouth is open, revealing sharp teeth, and it appears to be roaring or shouting something. The monster is standing on a platform, which is a solid gray color and has a slightly raised edge.
lol
>>
File: FD_00057_.png (2.37 MB, 1024x1536)
>>
Lower your guidance or whatever removes the sloppa nowadays
>>
>>102017504
that's hard with fuxxed apparently
>>
>>102017515
For you
>>
>>102017504
and then it stops looking like what you're trying to prompt
>>
>>102017530
>no image
>>
>>102017484
https://aistudio.google.com/app/prompts/new_chat (Google Gemini)
>The image is a screenshot from a video game. It is a side-scrolling platformer where the player controls a character named Samus Aran, who is a bounty hunter. The game is set in a dark, futuristic world. Samus is in the lower left corner of the screen. She is facing right. Her power suit is light green and yellow, and she has a helmet on. She is holding a weapon in her right hand. Samus is standing in front of a large, circular object. The object is a giant metal pipe or tube. Above Samus and the metal tube, there are a series of green, rectangular blocks. In the center of the top of the screen, there are two icons. Both are brown. The one on the left is rectangular and has a number '67' in it. The one on the right is square and has a '4' in it. To the right of the center of the screen, there is a very large, green alien creature that looks like a dinosaur. It is a light green and has brown markings on its face. It has a large, round belly. The creature's head is very large, with a long, flat snout, and very big eyes. The alien's upper jaw is open. The creature is partially obscured by green, spiky objects that resemble giant thorns. It is hard to tell what these objects are. The background is dark, with a black sky and green ground. The bottom of the image is a green, spiky surface. There are many large, green, flat blocks, and many smaller, round blocks. The player can walk on all of these surfaces. The image is a screenshot from a video game and is very stylized. The graphics are pixelated and the colors are bright and colorful.
>>
File: _mLpMwsav5eMeNcZdrIQl.png (1.11 MB, 3960x2378)
>>102017428
I'm still confused about people applying JoyCaption to everything as a VLM when it isn't SOTA. The only valid use case is making uncensored captions, but some of the Chinese models out there can be retrofitted with less censored finetunes, which effectively does the same thing. MiniCPM-V is the best small model, but even that is no match for the best one out there, the 76B model from InternVL2. There's no use case other than speed, and even for training you'd want to take the extra time to label all these images properly.
>>
>>102017554
While I agree with you, and I've modified my JoyCaption script to run smaller versions of InternVL, there are serious practical hurdles to running a 76B model locally.
>>
Why are the flux training sample images just noise? I'm already at 500 steps.
>>
File: 3d_conv.jpg (465 KB, 1826x2048)
>>102017300
this is annoying and I don't want to do any more. I'll call it at 2.5D

Things I should have noticed:
-Resolution is difficult and I should have outpainted immediately
-There are a few things about this image that are really special and therefore hard to keep as-is. I could have protected the hair accessory, but I thought the color was better.
-I don't know what the ridges on the dress are called, and inpainting loved to eat them.
-I could have removed the necklace, but I like it.
-There is no way to keep the blush and eyeliner at such strong levels without it looking more artificially generated, at least to me
-The size of her left arm is unnatural and should be fixed, but I attempted to keep it the same

As for tips: when inpainting, leave the arm band out; it doesn't survive any touching. Using a ControlNet IP-Adapter helped, but not greatly. Upscaling/downscaling tricks didn't really work. I had to GIMP the ridges of the dress back in for the reasons previously mentioned.
>>
File: fs_0160.jpg (54 KB, 512x512)
>>
>>102017646
should have stopped at around 20 steps
>>
>>102017300
why is sdg still in the op?
>>
>>102017646
It's broken mate. Stop it and fix it
>>
File: 4.jpg (2.7 MB, 3072x1280)
>>
>>102017663
>this is annoying and I don't want to do anymore. I'll call it at 2.5D
What is it?
>>
>>102017704
neat
>>
>>102017696
why do you care?
>>
>>102017706
not realistic, but getting there.

I linked the wrong thing. The request was to convert this >>102014576
I thought it would be a fun challenge.
>>
File: ComfyUI_00818_.png (1.31 MB, 896x1152)
https://imgsli.com/Mjg5OTQ1
A comparison between training Flux LoRAs at 1024,1024 vs 512,512. Same epochs, repeats, steps, etc. The 512 one I trained at a higher dim and without the split-network optimization argument.
24 images, 10 repeats, 10 epochs.
1024,1024 took around 5 hours.
512,512 took me 1 hour.
Pretty stoked on the speed and quality of training at 512.
>>
File: flux_00394_.png (1.06 MB, 1160x936)
>>
>The more images, the better
>You can train a lora with just 20 images
w-which is it
>>
>>102017742
why not throw a few 1024x1024 in with the 512x512 for some diversity
>>
>>102017742
Nice, I'm having good luck training at 512, too. How did you caption your images?
>>
>>102017385
https://aichatonline.org/gpts-2OToA97Vhr-Describe-Image (GPT4o)
This went so hard that I had to pastebin it:
https://pastebin.com/Y4CGtRyH
>>
>>102017646
It's ogre. Your training parameters or dataset were bunk and the loss went parabolic.
>>
>>102017781
the more UNIQUE images the better. If you have the same person in the same outfit then stop at 20.
>>
File: ComfyUI_00819_.png (2.04 MB, 1536x1152)
>>102017788
The input images are of varying sizes; I just mean the --resolution launch parameter. I didn't resize/crop anything, unless the bucketing resizes automatically. Still very new to training LoRAs.

>>102017789
I just used joycaption
>>
Is it possible to train a Flux LoRa in 8GB VRAM?
>>
>>102017781
More images makes the lora more flexible.
>>
>>102017742
>512
I was skeptical at first too, but it seems like any potential quality loss from the smaller images is made up for with buckets and higher dimensions.
I've trained at 1024 and 512 and I did not notice a quality drop or increase except for, like you said, speed.
I'll wait for more people to try various settings first, but at least on a visual level, training LoRAs at 512 seems a-okay
>>
>>102017859
Possible? Maybe? Practical? No.
>>
>>102017880
So what would be a decent amount of VRAM to train a Flux LoRa?
>>
>>102017742
What dims?
>>
>>102017899
the devs have been assuming that you should have 24GB since cascade.
>>
>>102017899
Realistically? 24gb. You might be able to get away with 16, but your LoRAs will take forever to train and can't take advantage of higher dimensions
There is an anon here claiming he does it on 12GB, and there are scripts to confirm that on the Kohya page, but he effectively bricks his PC for 12 hours while it trains.

I don't think you could realistically train on 8GB in a manner that's practical.
>>
>>102017742
>>102017869
so input size doesn't matter? i guess i've been "pre" bucketing to 1024x1024 on mine
>>
>>102017856
>I just mean the --resolution launch parameter
well you'd have to change the logic of it to sometimes use 1024 images instead of 512 all the time
>>
File: flux_00452_.png (1 MB, 1200x936)
>"five six, one hundred and twenty pouns, three years as mall security night shift and five years of krav maga..."
>"well, uh... thank you for your application, but it's just that our ninja enterprise has a different profile in mind..."
>"we hope you understand"
>>
File: 2024-08-22_00109_.png (2.32 MB, 1024x1280)
>>
File: ComfyUI_00821_.png (1.76 MB, 1152x1536)
>>102017906
The 1024 was dim 4, using the default Kohya settings. For 512,512 I did 16, figuring I had more juice to spare, after looking through a few config files people were posting in the bmaltais Kohya GUI GitHub discussions.
>>
File: 00036-3114675299.png (1.87 MB, 1024x1440)
>>102017922
I've been training on 16gb. 512x512, 3000 steps, 16 dim, takes around 1 hour 40 minutes
>>
>>102017382
Yeah, I'm a few tests in and it's nowhere near as good as Flux.
Flux has realism LoRAs which look better than what that model thinks realism looks like. Sure, it may know more concepts, as closed models tend to due to their sheer size, but it is not as polished in the details. Aside from that, it also doesn't seem to follow the prompt as well as Flux, at least not as consistently (and it fucked up in one of my tests which Flux passed with flying colors). Even with Magic Prompt turned off it does its own thing.
>>
File: 1723857565887861.jpg (3.84 MB, 7961x2897)
Anyone has an updated version of this?
>>
>>102018003
>512x512
Is that good enough? Is it because you can't fit larger images on 16GB?
>>
>>102018003
>3000 steps
repeats, epochs?
>>
>>102017922
12-hours anon here: the training used 7GB VRAM at most (512x512, dim 8) and I was able to comfortably work in Photoshop at the same time. Anyway, it only took so long because I was using a very low learning rate.
>>
>>102018009
Q6_K takes less, around 11.1GB during inference and 7.3GB during idling.
>>
File: FD_00060_.png (2.11 MB, 1024x1536)
>>
File: FD_00063_.png (2.24 MB, 1024x1536)
>>102018003
She looks like a fat capitalist. She needs some good manual labour and a diet of borscht.
>>
File: CUI_iced_00348_.jpg (790 KB, 1440x2560)
>>102017990
>>102017471
>>102017329
love these
>>
File: 00050-3146234069.png (1.42 MB, 1024x1440)
>>102018019
>512x512
>Is that good enough?
Seems to be the consensus. I've noticed finer skin details like moles not being picked up but that could be my settings. I'm still testing various configs and captions
>>102018033
30 images, so 10 repeats, 10 epochs.
>>
How's Flux for landscape paintings and porn?
>>
>>102018006
And it's kind of strange. I noticed so many shills on Reddit claiming the model is really really good, but it's not better than Flux dev.
>>
>>102017300
It's 2024
Still can't spread diffusion model VRAM across GPUs
>>
File: FD_00065_.png (2.25 MB, 1024x1536)
>>102018079
I'd have gone for 20 repeats, 10 epochs. But it's obvious who she is.
Can you try some close ups?
>>
Just trained a lora on my own 3d render style and damn flux is really good at it. Though autocaptioners seem like complete shit. Florence-2 just kept making shit up and repeating itself over and over again. I wound up having to just rewrite most of them myself. Is there any info on what Flux used to caption? I feel like captioners are still the main limiter and the improvement for image models can just keep growing as long as captioning improves.
>>
File: flux_00461_.png (1.01 MB, 1200x936)
>>
File: 00000-3023947591.png (1.17 MB, 1024x1024)
>>102018123
Like I said, skin detail is lost; it totally failed to pick up her cheek mole despite me tagging it. You think 20 repeats would've helped?
>>
SD 3.1 soon, any carers?
>>
File: fs_0202.jpg (99 KB, 2048x1344)
>>
>>102017587
Yeah, I understand the issues with the hardware requirements, but even InternVL2 at a smaller size is going to be better than something that is still closed source. Until we know where it lands in terms of performance, it seems a bit too hasty to apply it to everything. The only issue with smaller models like MiniCPM or InternVL2 is that they're built on InternLM or other Chinese base models, and the finetunes for these aren't great: with Qwen2 7B used as the base for MiniCPM-V 2.6, pickings are slim, it's either Dolphin or Einstein. I should probably go back to 2.5 so I can use any Llama 3 model of choice, but its vision CLIP part is worse.
>>
Comfy/Auto when? https://github.com/ShaochengShen/MegaFusion/
>>
File: 00051-AYAKON_12481768768.jpg (1.94 MB, 3840x1600)
made some more small edits to this, really like what can be done with Pony and Flux, mixing parts of the process
>>
>>102018196
Nope.
>>
>>102018204
prompt?
>>
File: flux_00465_.png (926 KB, 1200x936)
>>
>>102018196
Pressing X to doubt about 2 times.
Strongly doubting a .1 release can fix the issues.
>>
>>102018233
>a still image from a weatherstation camera overlooking a small town at dusk, the sky is darkened with a serious looking storm. The photo captures the awe-inspiring split second that lightning arcs across the entire landscape. Barely visible in the distant clouds, an ominous looming monstrous terrifying kaiju with glowing eyes
>>
File: fs_0206.jpg (77 KB, 2048x1344)
>>
File: flux_00022_.png (1.45 MB, 1160x896)
>>102018196
SAI can fuck off and die.
>>
>>102018196
look, if the model is by some miracle better than Flux (99.9% chance it won't be) I will be back on board the Stable Diffusion train
Realistically it will probably be mediocre and we'll get a ton of bullshit about safety and respecting 3rd worlders' 2GB VRAM PCs
>>
File: 16767674178512926.png (363 KB, 735x581)
>the future of stability is focusing on as low spec as possible for the indian market
>>
File: 3.jpg (734 KB, 896x1152)
>>102017715
thank you anon
>>
File: FLUX.jpg (154 KB, 1496x1168)
Still getting the hang of prompting Flux.
>>
>>102018292
>if this new free thing is better than this other free thing I'll take the better free thing
Well yeah
>>
File: flux_00470_.png (1.05 MB, 1200x936)
>>
File: Capture.png (2.29 MB, 1493x1168)
Why is flux so prone to creating arches and pathways like this
>>
>>102018340
https://old.reddit.com/r/StableDiffusion/comments/1exw2m4/sd_31_is_coming/ljad3lk/
>>
>>102018349
Very common in concept art. Try to include a viewing angle, might help it.
>bird's eye view
>side view
>from above
etc
>>
>>102018316
Beautiful
>>
I have a schizo idea. We know that flux is trained with guidance distillation. Meaning that instead of predicting the noise, it is trained to predict the CFG difference from the teacher model, i.e. cfg*noise_pred - (cfg-1)*unconditional_noise_pred. And the guidance vector tells it what CFG was used when constructing that target.

Now the problem is that when finetuning flux, we don't have a teacher model. So we can't train on the distillation loss, since we don't have the unconditional half of that expression. Instead it seems best to just train on noise prediction alone (corresponding to CFG=1) and set the guidance vector to 1.

But why can't we just use the unconditional prediction from the model itself when making the guidance distillation target? Just predict the noise with an empty prompt and guidance=1, to get uncond_noise_pred. Now the target is just cfg*real_noise - (cfg-1)*uncond_noise_pred. The model itself was used to get the unconditional half of the CFG difference. You could use caption dropout as well to make sure the model learns to make good unconditional predictions. Call this "self distillation" or some shit.

why wouldn't this work
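A minimal sketch of that loss in PyTorch, with a hypothetical model(x_t, t, cond, guidance) signature (adapt to whatever trainer you use):

import torch
import torch.nn.functional as F

def self_distillation_loss(model, x_t, t, text_emb, empty_emb, real_noise, cfg):
    with torch.no_grad():
        # the student's own unconditional prediction: empty prompt, guidance = 1
        uncond_pred = model(x_t, t, empty_emb, guidance=torch.ones_like(t))
    # CFG-difference target built from the real noise and the model's own
    # unconditional branch: cfg * real_noise - (cfg - 1) * uncond_pred
    target = cfg * real_noise - (cfg - 1) * uncond_pred
    # train the conditional pass with the guidance vector set to cfg
    pred = model(x_t, t, text_emb, guidance=torch.full_like(t, float(cfg)))
    return F.mse_loss(pred, target)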
>>
File: CUI_iced_00379_.jpg (644 KB, 1440x2560)
>>102018389
madman, try it out, if it works ask a llm to write you a research paper and drop it and become a legend
>>
>>102018389
>why wouldn't this work
Sir I just type in "boobs" and hit generate
>>
Anyone else notice that painting LoRAs still manage to make images that look too realistic/CGI instead of painting? In particular this one
https://civitai.com/models/649868/leonardo-davinci-style?modelVersionId=727069
and
https://civitai.com/models/669566/style-of-rembrandt-flux-135

and many others. This is not an issue when the style is heavily stylized, but with painted portraits like these it tends to happen.
>>
File: titty lora stack.png (41 KB, 428x886)
holy fucking SHIT now THAT's a LORA STACK
https://civitai.com/images/25026595
>>
File: 1.jpg (2.47 MB, 1792x2304)
>>102018382
No, you are beautiful anon
>>
>>102018019
512x512 generates elongated bodies like here >>102018079
This has already been documented.
>>
>>102018520
>175 steps
>1000 steps
poorfags shouldn't be baking loras. crank that up to 12000 and it will be fine.
>>
>>102017805
https://www.repixify.com/tools/image-description-generator
>The image is a screenshot from a video game. The player character, a yellow and red figure with a helmet, is standing on a platform to the left of the screen. A large green and yellow monster with a large mouth is in the foreground. The monster is facing the player character and appears to be about to attack. The background is a dark, pixelated landscape with green foliage. The game's interface is visible at the top of the screen, showing the player's energy level, score, and other information. The image evokes a sense of danger and excitement, as the player character faces a formidable opponent.
>>
>>102018389
isn't flux the teacher model?
>>
File: CUI_iced_00397_.png (3.8 MB, 1440x2560)
>>102018706
>>102017805
i use 4o as well for all my prompts now too lol, we truly are fucked in a decade
>>
>>102018706
https://asuo-ai-labs.streamlit.app/Image_Accessibility
>A screenshot from a retro video game depicts a battle scene between a small human character and a large, menacing green creature. The human character appears to be wearing a suit and helmet, standing on a platform composed of blue and black blocks on the left side of the screen. The platform is slightly raised above a bed of sharp spikes. On the right side of the screen, the large green creature with multiple eyes and sharp spikes on its back is facing the human character. The creature appears to be attacking, as there are small explosions and smoke near its body. The background of the scene consists of dark colors, with green vegetation at the top and bottom. At the top left corner of the screen, there are indicators for "ENERGY 67," and below it, the numbers "020 04." In the top right corner, there is a small grid. Retro video game screenshot showing a human character facing a large green creature in a battle scene.
I found it surprising that generating these descriptions takes longer than it takes Flux to draw them.
>>
>>102018701
I trained a LoRA at 7000 steps last night and it wasn't even particularly overbaked.
>>
I have been going crazy trying to hack together a solution for local image tagging. I had this crazy idea that I could just feed an LLM a single image's CLIP embeddings of a character and it would be able to identify them when captioning data; instead it said almost everyone was that character.
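For what it's worth, the more conventional route is nearest-neighbor matching on CLIP image embeddings instead of handing raw embeddings to an LLM. A minimal sketch with transformers' CLIP (the model id, file names, and the 0.85 threshold are just example choices):

import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model_id = "openai/clip-vit-large-patch14"
model = CLIPModel.from_pretrained(model_id)
processor = CLIPProcessor.from_pretrained(model_id)

def embed(paths):
    inputs = processor(images=[Image.open(p) for p in paths], return_tensors="pt")
    with torch.no_grad():
        feats = model.get_image_features(**inputs)
    return feats / feats.norm(dim=-1, keepdim=True)  # L2-normalize for cosine sim

refs = embed(["char_a_1.png", "char_a_2.png"])   # a few reference images per character
query = embed(["unlabeled_001.png"])             # image to tag
score = (query @ refs.T).max().item()            # best cosine similarity
print("character A" if score > 0.85 else "unknown")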
>>
>>102018520
Undertrained LoRAs. This also happens with 1.5 and SDXL ones. From my experience with training, faces are usually the last thing to change and need the most training to lose their baked-in look.
>>
File: image.png (1.52 MB, 800x1400)
>>
>>102018805
What was the learning rate?
>>
>>102018797
So far it's the only one capable of recognizing that it's Super Metroid, Samus Aran, and even Kraid the boss and all the things.
>>102018800
https://huggingface.co/spaces/llava-hf/llava-4bit
>The scene is set on an alien planet, where two characters are engaged with each other amidst the greenery of their surroundings. One character appears to be fighting or interacting closely while standing next to another large creature that seems more like some kind of monster than its own species member. This creates tension between them as they face off against one another within what could possibly resemble outer space terrain filled mostly by plants rather than typical landscapes found elsewhere such as grassy fields forests etc..
So far the furthest from the target.
>>
>>102018880
Nta but just use Prodigy and you don't have to worry about the learning rate.
>>
>>102018901
>Prodigy
Do I just set the optimizer to Prodigy or are there other parameters I need to adjust?
>>
>>102018901
nta, but does prodigy work with flux? I thought it ate a huge amount of vram.
>>
>>102018889
Sometimes these are so bad I find them amusing.
https://huggingface.co/spaces/gokaygokay/FLUX.1-dev-with-Captioner (Florence 2)
>The image is a screenshot from a video game. It shows a scene from the game, where the player is in the middle of a battle. The player is wearing a yellow outfit and is standing on a platform with a blue background. On the right side of the screen, there is a large green dragon with sharp teeth and red eyes. The dragon is attacking the player with its mouth open, as if it is about to attack. The background is black, and there are various enemies and enemies scattered around the scene. At the top of the image, there are two buttons, one labeled "Energy 67" and the other labeled "020 04".
Whoever is using Florence 2 to caption your training data, please reconsider.
>>
can someone link me the training scripts for Flux with Kohya? I can't find them
>>
>>102018960
Use prodigy and set learning rate to 1.

>>102018976
Uses the same vram as adamw8bit. I haven't completed training yet, so I don't know if VRAM scales up later on.
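If you're writing your own loop rather than using Kohya, standalone usage is roughly this (a sketch: pip install prodigyopt; network.parameters() stands in for your LoRA params):

from prodigyopt import Prodigy

# with Prodigy the lr acts as a multiplier on its internal step-size
# estimate, so the recommended setting is simply 1.0
optimizer = Prodigy(network.parameters(), lr=1.0, weight_decay=0.01)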
>>
>>102018978
https://huggingface.co/spaces/gokaygokay/KolorsPlusPlus (Long Captioner - to generate the prompt you have to generate a picture)
>a detailed close-up captures a scene from the video game "metroid," featuring a green monster with large teeth and sharp claws. the monster's body is covered in a mix of green, yellow, and red hues, while its head and hands are adorned with black spikes. the background features a pink tunnel, a gray platform, and a blue water surface. the player character, wearing a green suit, is positioned on the left side of the screen, facing the viewer. the words "energy 67" are prominently displayed at the top of the screen, adding depth to the image.
>>
>>102018988
https://github.com/bmaltais/kohya_ss/issues/2701#issuecomment-2297761417
I've been using this. Might want to bump the learning rate up a little
>>
>>102018181
>skin detail is lost
That's just Flux though, turns them all into smoothskins
>>
>>102018901
Prodigy will never match the quality of optimizers like CAME and Adam with proper settings. But at the same time, you won't get failures, and you will most likely get a civitai-quality LoRA out of it. You just can't improve it or fix it when it goes wrong.
>>
>>102019069
And here's the picture I had to generate by Kolors to get the prompt. Hmmm, is it better than Flux's rendition?
>>
Can someone put a guide together on how to train LoRAs for Flux?
>>
File: ComfyUI_09432_.png (1.2 MB, 800x1400)
>>
I've yet to find a style Flux LoRA that doesn't degrade anatomy without being undertrained. Is it just me who's noticed this?
>>
What should be the average loss when training flux loras? One anon said his was around .4 but is that the "correct" value?
>>
>>102019191
das what i'm gettin
>>
>>102019117
Sure, once we get our shit straight
>>
File: Capture.png (2 KB, 541x27)
>>102019191
Here is my latest batch
>>
>>102019117
You just go to Kohya github or Ai toolkit github and follow the steps.
>>
>>102018360
>reddit
Didnt read
>>
>>102018521
People have no lives.
>>
File: 00090-3902273944.png (1.35 MB, 896x1152)
>>
>>102019232
that seems like a lowish lr
>>
>>102018521
Overkill for such a generic image
>>
I'm having a hard time wrapping my head around training a LoRA using Kohya. I didn't have issues with ai-toolkit.

Like, am I supposed to run these JSON files through the GUI or through the command prompt?
I can't find simple documentation.
>>
File: 00107-4288387873.png (1.38 MB, 896x1152)
>>102019303
I'm still experimenting with parameters. The results seem quite good though.
>>
>>102019333
kohya wrote sd-scripts; bmaltais wrote a GUI frontend for it called kohya_ss.

sd-scripts, despite being a buggy mess, more or less works for Flux out of the box; you run it from the command line. bmaltais' GUI is currently a half-updated hash of kohya's library and I haven't gotten it to work at all.
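For reference, a Flux LoRA run with sd-scripts is launched from the shell roughly like this (a sketch based on the flux/sd3 branch; flag names have been shifting between commits, so verify against the repo README):

accelerate launch flux_train_network.py \
  --pretrained_model_name_or_path flux1-dev.safetensors \
  --clip_l clip_l.safetensors --t5xxl t5xxl_fp16.safetensors --ae ae.safetensors \
  --network_module networks.lora_flux --network_dim 16 \
  --resolution 512,512 --train_data_dir ./dataset --output_dir ./output \
  --learning_rate 1e-4 --max_train_steps 3000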
>>
File: Capture.png (31 KB, 1451x277)
>>102019333
>>
>>102017805
>>102018797
Oh, okay, looks like we have a new champion. It even managed to get Samus in the picture, unlike all the other captioners; I didn't even know that was possible o_O
https://huggingface.co/spaces/Quardo/gpt-4o-mini (gpt-4o-mini-2024-07-18 - it has a hidden queue, so you have to wait for several minutes before it starts processing you, unless you're third or later in the queue, which increases the waiting.)
Again, the prompt is too long for a post so it's pastebinned:
https://pastebin.com/xUXWBDy2
>>
>>102019232
My is at .42 but it seems to be going down. The first sample image looks okay.
>>
File: ComfyUI_32705_.png (1.1 MB, 1024x1024)
>>
>>102019333
LoRA_Easy_Training_Scripts on github
>>
>>102017385
joycaption

>This is a detailed, colorful screenshot from a video game, specifically from a side-scrolling, action-adventure game. The scene is set in a dark, cavernous environment with a green, rocky floor and a ceiling covered in spiky, green vegetation. On the left side, a yellow, armored character with a red helmet and a gun is visible, standing near a large, metallic door. The character appears to be a human, wearing a suit with a muscular build and a protective helmet.
>In the center-right of the image, a massive, green, reptilian monster with sharp, spiky horns and a large, gaping mouth is attacking. The creature's skin is textured with scales and its eyes are red and glowing. It is mid-action, with its clawed hand reaching out towards the character, and its mouth open wide, emitting a burst of green, smoky particles.
>The background is dark, with the only light coming from the green glow of the monster and the character's flashlight. Above the monster, there are various icons and indicators, including a health bar, energy meter, and a timer. The overall style is reminiscent of classic, pixelated 16-bit graphics, typical of early 1990s video games.
>>
>>102010022
Thank you Anon, this is exactly what I wanted
>>
>>102019411
>>102019408
>>102019455

Never mind, I figured it out on my own. I think when I updated the dependencies it switched back to the master branch. Please don't call me a dumbass. I'm average at worst.
>>
File: ComfyUI_00830_.png (3.51 MB, 2048x2048)
>>
>>102019421
And I guess that finishes the challenge. Is it me, or does it look BETTER than the original picture? >>102017385 I could use the new one instead; that Kraid Samus monster thing looks badass.
I'm just going to compare these to the best we used to have in the past, the CLIP interrogators, which, while not reproducing pictures at all, seem fun and cool nonetheless.
https://huggingface.co/spaces/fffiloni/CLIP-Interrogator-2 (Best 24 max flavors)
>a video game with a dragon attacking a man, pixel art, inspired by katsuya terada, stalagmites, turtles, 2 4 0 p footage, 1988 video game screenshot, metroid, gameplay video, fangs and slime, donatello, screenshot from a movie, 240p, sofubi, from berserk, protagonist
>>
File: 00124-1750270787.png (1.06 MB, 896x1152)
>Civitai Flux Training Contest
>https://civitai.com/articles/6797
>>
>>102019525
Hands off my magic rock, bitch
>>
>>102019421
>>102019545
nice, props for the tests anon, some neat gens and good info
>>
Is it just me, or do you get more accurate and precise results, with fewer fails, the more batches you run with the same prompt and settings in Flux?
>>
File: 00323-2024-08-21-cJak.jpg (3.17 MB, 2048x2688)
>>
>>102019545
https://huggingface.co/spaces/pharmapsychotic/CLIP-Interrogator (Fast. It always gave me the most fun prompts.)
>a video game with a dragon attacking a man, metroidvania, epic boss fight, epic boss battle, metroid, contra, bossfight, royo, 16bits videogame, boss battle, cacodemon, snes screenshot, koopa, 8bits videogame, super mario bros 1 9 8 5, boss fight, dangerous swamp, snes graphics, 1 6 - bit
>>
File: 2.jpg (650 KB, 832x1216)
>>
File: 18881.png (545 KB, 1280x768)
>>102019636
:D
I tend to do these things in private; I'm glad I shared. I had no idea we were at this level already, I thought "maybe next year", but apparently Flux is all about the prompts.
To close: I used seed 18881 in all my samples, and at one point I hit enter before pasting the prompt and sent a blank prompt. Back in the SD 1.5 days that would produce garbage, but here I got picrel.
>>
>>102017554
can internvl2 be quanted?
>>
File: ComfyUI_32709_.png (1.28 MB, 1024x1024)
>>
File: 1724296462.png (13 KB, 740x323)
>>102019820
guess i should have kept reading the page before asking
>>
>>102019858
I'm not on the page right now, but I distinctly remember them saying not to quant it because it fucks it up, but that may have just been for the bnb quants
>>
>>102019896
Quants will make most VLMs make subtle mistakes like confusing left and right. Florence honestly is the best bang for your buck VLM, it's almost uncensored.
>>
So do people always train at network dimensions that double the previous one (8, 16, 32...)? Why can't I just train at the biggest dimension I can fit?
>>
File: 104308-tmp.png (2.73 MB, 1536x1728)
>>
>>102019916
In my rough experience the bigger the better. It's definitely more accurate to crank it up.
>>
>>102019914
I've spent most of this week trying to tard-wrangle several VLMs into giving decent output on the first go. My conclusion is that anything you can run at home is mid at best, and automatic captions + manual editing is king.
>>
>>102019896
page says 4bit quants fuck it up but nothing about 8bits causing issues
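For the record, an 8-bit load via transformers + bitsandbytes looks like this (a sketch; repo id shown for the 8B variant as an example, and InternVL2 needs trust_remote_code):

import torch
from transformers import AutoModel, AutoTokenizer, BitsAndBytesConfig

model_id = "OpenGVLab/InternVL2-8B"
model = AutoModel.from_pretrained(
    model_id,
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),  # 8-bit, not 4-bit
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)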
>>
>>102019932
manual editing only works for baby datasets
>>
Nobody respond to it, it will go back to /sdg/ on its own.
>>
File: file.jpg (24 KB, 768x768)
>>
>>102019938
Interesting. I have 1000 baby Indians in a sweatshop captioning my niche porn.
>>
>>102019950
florence + wdv large v3
ezpz
>>
>>102019916
I remember reading an AI article that said a network dim of 8 is the sweet spot. Going above that gives diminishing returns and reduces flexibility. It also increases the VRAM required.
>>
File: FD_00124_.png (1.65 MB, 768x1344)
>>
File: delux_me_00049_.jpg (343 KB, 896x512)
>>102019949
>what she sees
>>
File: 103991-tmp.png (2.73 MB, 1536x1728)
>>
>>102019997
>remember reading an AI article that says network dim of 8 is the sweet spot.

That sounds like absolute bullshit vramlet cope from the days of 1.5.
No offense.
>>
File: 00145-2024-08-21-cJak.jpg (2.94 MB, 2048x2688)
>102020013
So ugly
>>
File: ComfyUI_09504_.png (2.15 MB, 1400x800)
Who's the retard who started the entire "merge dev with schnell" trend? All he did was fuck up dev models.
>>
>>102020032
It's just people trying to figure out quick fixes to complex problems. It was bound to happen once this model got released. Remember that "pony merge" that came out like on the day of release?
>>
>>102020020
It was from the creators of the algorithm. The original LoRA article.
>>
File: 5.jpg (1.95 MB, 1568x2016)
>>
I got some questions about flux training on Kohya.
I'm getting a lot more it/s on Kohya compared to AI-toolkit. Anyone else?
Also, I noticed the config I was using was training in 8-bit. Was that always the case for 24GB cards?
>>
File: FD_00136_.png (1.61 MB, 768x1344)
>>
>>102020064
That doesn't really rebut my claim that it's outdated advice from the 1.5 days, but I'd assume the guy who created LoRAs probably isn't a vramlet.

I think bigger dimensions on a bigger model make sense anyway. It was true, though, that if you trained at 128 dim on XL you'd generally deep-fry the model.
>>
>>102019570
Whoever uncucks flux will win all prizes, even the ones from different categories.
>>
File: ComfyUI_09520_.png (2.27 MB, 1400x800)
>>102020051
I get why it was tried; what I'm confused about is why people are still doing it. There are zero benefits to it, only negatives.
>>
>>102020086
>That doesn't really rebut my claim

What's your claim?
>>
>>102020096
>>102020032
w cheese atv
>>
>>102020117
That you'll get a better-quality LoRA that resembles the subjects and visuals if you use a higher dimension, and that the larger the model, the larger the dimension you can use while still getting good results.
And that a dim of 8 is for vramlets.
>>
https://reddit.com/r/StableDiffusion/comments/1ey6hss/kohya_ss_gui_flux_lora_training_on_rtx_3060_lora/
>Kohya SS GUI FLUX LoRA Training on RTX 3060 - LoRA Rank 128 - uses 9.7 GB VRAM - Finally made it work. Results will be hopefully tomorrow training at the moment :)
Excuse me?
>>
Someone get Flux to make a girl in a straitjacket. I give up.
>>
File: 19240467.jpg (21 KB, 460x460)
>>102020225
Total cerfuckin' victory.
>>
>>102020225
>CeFurkan
Ignore this faggot grifter
>>
>>102017300
What are some cool things to do combining Flux and SD?

There are so many versions of SD; I'm checking out waifu-diffusion Radiance. In the example, which has to be changed a bit to get it working, it just takes a KSampler output over to an upscaler, then to another KSampler.

Can Flux handle this? Don't I have to convert it to a format Flux likes?
>>
File: cunt.png (320 KB, 396x387)
>>102020268
You sure you wanna be talking shit about the guy who rides a dinosaur?
>>
>>102020225
>Finally made it work
By whining in Kohya's issues and making someone else do the fixing and then appropriating results.
>>
>>102020225
>LoRA Rank 128
Into the trash it goes.
>>
>>102020225
Noooo you can't train it on anything less than a 2 A100s! I mean 1 A100! I mean a 4090! I mean a 3090! I mean a 4080! I mean a 3080! I mean... I mean... I mean
>>
>>102020225
Don't forget to subscribe to his patreon for an exclusive early access to the tutorial
>>
>>102020305
Drives me fucking mad and then he'll have the nerve to shill his fucking patreon to sell (you) the results (you) just helped him with
>>
>>102020321
Before you go about reddit clapping for
>Muh community
Take a good look at who's posting it and realize it's probably got some massive asterisk attached.
>>
>>102020323
Nobody will do shit because 99% of people never even set foot in GitHub, let alone look at the discussions. As far as they're concerned, he's THE top developer for LoRAs.
>>
>>102020225
Do your part and downvote this faggot.
>>
>>102020157
Anyone can test more dims on Flux and see side by side that more dims are better. We're seriously talking about a 10x bigger model. Also, SDXL is a piece of shit that doesn't learn anything and deep-fries even on a good day.
>>
>>102020225
>23 seconds per it
>>
>>102020347
I 100% agree, which is why I said a network dimension of 8 is retardedly small.
>>
>>102020310
128 is not enough? I never trained models so I don't know the standards
>>
Fried is an aesthetic look and shouldn't be tossed out immediately, especially since it might produce interesting gens, objects that could be used with further processing.
>>
>>102020157
If you're just training styles and faces, you can probably go very high with the network dimension. But if you're training new concepts, you'll have less flexibility with a higher network dimension.
>>
File: 104337-tmp.png (2.62 MB, 1536x1728)
>>
File: ComfyUI_09583_.png (2.32 MB, 1400x800)
Can't wait for a complete multimodal model (everything in / everything out). You could give it the title of a comic and have it actually create the comic, instead of just gibberish. The full version of GPT-4o is supposed to be like this, but OpenAI hasn't/won't release it.
>>
File: 00172-966204905.png (1.12 MB, 896x1152)
>>
>>102020405
>full version of GPT-4o is supposed to be like this but OpenAI hasn't/wont release it
Anon.. that's just something saltman tells investors to get more capital. Like telling kids that if they're good little boys and girls they'll get a present.
>>
Has anyone managed to make Skimmed_CFG work on Flux? For me it acts as if it doesn't exist
>>
>>102020416
it's all smoke and mirrors anyway; at best it's just a bunch of workflows stapled together. there's never going to be a magic does-it-all model
>>
>>102020425
Not as long as the transformer is the backbone of it all.
>>
>>102020437
yeah, the model you're asking for requires billions to train and run, and they're running out of runway
>>
File: FLUX00016.png (2.32 MB, 1536x1248)
>>
This is fun. waifu-diffusion (sd 1.5 based?)

not sure why it thinks this is a fat frog. Maybe I'm word-frying it with too many negatives from the old prompt.

https://files.catbox.moe/v5y9mx.png
>>
File: 00002-1555449755.png (1.57 MB, 1024x1024)
>>
>>102020467
>waifu-diffusion
Is it 2022 again?
>>
File: 00003-3031908286.png (1.23 MB, 1024x1024)
>>
>>102020479
My goal isn't successful renders, necessarily :^)

Ever try swapping vae?
>>
>>102020467
>waifu-diffusion
>>
File: 00005-3346765107.png (1000 KB, 1024x1024)
>>
Anyone training on Kohya? I'm currently only using 18/24gb and I think that's a little low? I just wanna make sure I'm not using any unnecessary optimizations.
Currently training at rank 32 on 512x512
>>
>>102020514
more batch size bb
>>
>>102020519
True, I forgot about batch size. I remember getting into the weeds trying to find out whether bigger batch sizes were good or bad for the model, and I think the consensus was that they actually help the model generalize?
>>
File: Griffter.jpg (44 KB, 471x501)
>>102020514
don't worry anon, that man is gonna save you >>102020225
>>
File: tod.png (2.44 MB, 1018x1018)
>>
>>102020542
batch size is more fasterer, everything else is dumb retard shit from people who train within the margin of noise
you actually think they train the models with batch size 1 or batch size 128?
>>
>>102020548
No, I don't want to be saved by cerfuckin. I want to make sure one of his shortcuts isn't fucking up the potential quality of my models.
>>
File: ComfyUI_00086_ sm.png (1.92 MB, 1280x720)
>>102020506
a 1.5 finetune.

Here's an example of possible output.
>>
>>102020566
How new are you
>>
File: ComfyUI_00087_sm.png (2.23 MB, 1510x850)
>>102020566
Here I have changed the prompt a bit.
>>
>>102020580
>>102020566
NTA, but everyone knows what waifu-diffusion is. Everyone is acting puzzled because they genuinely cannot fathom why you are using an outdated proof-of-concept model in the year 2024.
>>
>>102020566
you can do that with Flux by training a lora with 1e3 learning rate of pictures of frogs
>>
>>102020566
>>102020580
looks like shit
>>
>>102020600
waifu sisters it's over for us
>>
File: ComfyUI_00088_sm.png (1.29 MB, 1280x720)
>>102020577
One problem with Fusion is it refuses to draw some things, for example deformed arms.

>>102020585
Artistically, instead of middleclass "artistically". The problem with Flux is it's extremely mid, so far.

"to break it"
>>
>>102020618
that's not artistic, that's what every model does when you fry the fuck out of it. it's about as technically impressive as kicking over a can of paint on your carpet
>>
and if you can't see why
>>102020618
is the best gen of the thread, then you aren't qualified to comment on aesthetics.
>>
>>102020618
This has to be a bot or something
>>
>>102020618
You're just generating with a 1.5 model at resolutions it was not trained on. That's why it's outputting garbage.
>>
File: 00009-1426206558.png (1.02 MB, 1024x1024)
>>
>>102020632
not gonna lie. I was thinking the same thing. Its replies are nonsense.
>>
File: 00009-1053315402.png (1.27 MB, 896x1152)
Behold, real art
>>
>>102020646
Yeah idk what's going on
>>
>>102020632
>>102020646
You are basically uncultured. What is the best art magazine, in your opinion. Both respond please.
>>
>>102020643
>not a school girl outfit
>>
>>102020656
art magazine, lol
>>
>>102020656
That's cool, for me my favorite dessert is Strawberry Short Cake
>>
>>102020653
your images are too white, you should ditch Dynamic Thresholding for Tonemap or AutomaticCFG
>>
>>102020654
It kind of makes sense. The choice of a low overhead model like waifu diffusion is probably to save vram for its word salad.
>>
>>102020660
no, it's a professor outfit
>>
File: 00016-1541856161.png (1.25 MB, 896x1152)
real art
>>
File: 00013-2692419392.png (1.23 MB, 1024x1024)
>>
>>102020670
Is there a way to do that in Forge? I have Comfy working, but Forge just seems way faster.
>>
>>102020674
that's the problem with loading multiple LoRAs, one will take over the other. George's face is everywhere in that picture kek
>>
>>102020656
Shonen Jump
>>
>>102020656
Ideogram isn't good at resolution like it's been hyped up. Artistically you might say that <|endoftext|>
>>
>>102020684
george billrapekillstanza
>>
>>102020689
Too on the nose.
>>
File: Capture.png (77 KB, 693x750)
>>102020683
I don't think Forge has Tonemap or AutomaticCFG unfortunately. If you want to stick with Forge, at least use these parameters with DynamicThresholding; they work the best for Flux.
>>
File: delux_hh_00027_.png (2.32 MB, 1024x1344)
>>102020653
shits fried yo
>>
File: 00201-350718977.png (1.24 MB, 896x1152)
>>
>>102020698
Thank you, yeah, mine were a little different from that. I'll give it a shot. I'll give Comfy another shot too... was definitely getting better outputs even with normal dynamic thresh.
>>
File: 00018-110857331.png (1.18 MB, 1024x1024)
>>
>>102020225
So he's been squatting here plundering and piecing together advice from anon, right?
>>
Just thought I'd share a sample image from my first test LoRA with Kohya. It's from a really badly captioned dataset of like 4 anpanman images and 3 Simpsons images I was using to test the captioner.
>>
>>102020514
Why aren't you training at a higher resolution? The high rank is pointless when the resolution is so small.
>>
>>102020729
>The high rank pointless when the resolution is so small.
Can you explain the relation of rank and resolution as if I were a retard?
>>
>>
File: 00021-5001803.png (1.36 MB, 1024x1024)
>>
>>102020729
Supposedly training at 512 takes about a quarter of the time it does at 1024 (512x512 has a quarter as many pixels as 1024x1024, so compute scales roughly with it).
Anon here shows a comparison of the results of 1024 vs 512 training:
>>102017742
>>
File: 00210-1936981994.png (1.35 MB, 896x1152)
>>
File: 00024-266248679.png (1.24 MB, 1024x1024)
>>
>>102020758
I think that anon is claiming that training at 512 at such a high rank somehow isn't making use of that rank. But I'm not sure what he means by that.
>>
File: P08174_10.jpg (202 KB, 1103x1536)
>>102020697
>>102020689
Rate this.
>>
File: 00028-2530809537.png (1.39 MB, 1024x1024)
>>
File: ComfyUI_00091_sm.png (1.14 MB, 1280x720)
>>102020752
Do you understand why yours is crap as art? And why this is not?
>>
File: 00030-1684899914.png (1.47 MB, 1024x1024)
>>102020789
>>
>>102020782
Cool, how did you do it? You conquered the auto-whore.
>>
File: 1694708342248886.jpg (130 KB, 1264x708)
>>102018104
It has the best prompt understanding, but the older version was also very good at that. This new update doesn't add anything new.
>>
>>102020738
Network rank represents the maximum amount of information that can be learned. The resolution is the max size of the training images; if your dataset has images larger than the max resolution, they will be resized down. There's only so much detail that can be learned from a small image, so a high rank won't be beneficial.
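Concretely, a LoRA on a weight of shape (out, in) learns two factors B (out x r) and A (r x in), so its parameter count grows linearly with rank r. Quick arithmetic (3072 is Flux's transformer width, if memory serves):

def lora_params(out_dim: int, in_dim: int, rank: int) -> int:
    # delta_W = B @ A, with B: (out_dim, rank) and A: (rank, in_dim)
    return rank * (out_dim + in_dim)

for r in (4, 8, 16, 32, 128):
    print(r, lora_params(3072, 3072, r))
# 4 -> 24,576 params per layer; 128 -> 786,432 (32x more capacity to fill with detail)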
>>
>>102020795
i wrote some words and hit generate
>>
File: FD_00010_.png (822 KB, 1024x1024)
>>102020789
Do you understand why yours is crap as art? And why this is not?
>>
>>102020806
Because you are poor.
>>
File: 00037-1694111930.png (1023 KB, 1024x1024)
you dumb
>>
File: SD3_13624_00069_.png (1.7 MB, 1024x1024)
>>102020789
>>102020806
>>
>>102020548
isn't this guy a professor in AI at some Turkish university or something, yet he knows jack shit about AI?
>>
>>102020791
Add trash to the yard, remove eye contact, and make her frown. There should be a pit bull.

>>102020815
You're improving.
>>
File: 00040-1914404962.png (1.33 MB, 1024x1024)
>>
>>102020814
You think she looks pretty, but she looks ugly.
>>
File: 00044-3105816695.png (1.31 MB, 1024x1024)
>>
>>102020839
"she"
>>
>>102020825
He's the monster that haunts your github repository.
>>
File: 00049-3374216342.png (1.14 MB, 1024x1024)
>>
>>102020867
Sometimes Flux is Middest Journey
>>
File: 00043-3496507613.png (1.22 MB, 1024x1024)
>>
>>102020548
Pony lora of this guy when?
>>
File: ComfyUI_00095_.png (17 KB, 96x96)
In small images, we see Flux being strange.
>>
>>102020975
Of all things posted, this is the first thing to make me really upset that I can't run Flux
>>
I think I'm starting to get results. The captions are really important, which means tard-wrangling each individual auto-generated caption...
>>
File: ComfyUI_00006_.png (824 KB, 1024x1024)
Thank you anon who recommended automatic cfg.
>>
>>102021027
you're welcome o/
>>
>>102020771
awesome
I would buy a fine print, bro
rly
>>
>>102021013
>auto-generated caption
did you generate the caption through a vlm?
>>
>>102021013
>I think I'm starting to get results. The captions are really important
Well, if you feed the AI garbage data, it will spit out garbage output, so yeah, data quality is really, really important
>>
Freshly baked loaf of...
>>102021045
>>102021045
>>102021045
>>
>>102020997
Thanks!

>>102021038
I like it.

https://www.tate.org.uk/art/artists/gerard-schneider-1906
>>
>>102017856
>I just used joycaption
Did you at least read some of them to see what you are training for?
You people are the reason slop loras are so prevalent on civitai.
Garbage in, garbage out.
Curate by hand. Caption by hand (or at least review automatic captioning by hand).
>>
>>102018044
It’s also slower than Q8_0
>>
>>102021690
>>102021712
New
>>102021045
>>102021045
>>102021045
>>
>>102018701
>poorfags shouldn't be baking loras
This. And use 1024x1024. Don’t settle for “good enough” that produces mediocrity.
>>
>>102018851
This is good.
>>
>>102017742
Nice, but you can see some nonsense with the books being blank notebooks in the 512 version. Needs more testing to see how much detail you are sacrificing. Flux is able to produce very coherent backgrounds and you might be fucking it up.


