/g/ - Technology
File: coll.jpg (3.31 MB, 3264x3264)
Discussion of free and open source text-to-image models

Previous /ldg/ bread : >>102079096

>Beginner UI
EasyDiffusion: https://easydiffusion.github.io
Fooocus: https://github.com/lllyasviel/fooocus
Metastable: https://metastable.studio

>Advanced UI
Automatic1111: https://github.com/automatic1111/stable-diffusion-webui
ComfyUI: https://github.com/comfyanonymous/ComfyUI
Forge: https://github.com/lllyasviel/stable-diffusion-webui-forge
InvokeAI: https://github.com/invoke-ai/InvokeAI
SD.Next: https://github.com/vladmandic/automatic
SwarmUI: https://github.com/mcmonkeyprojects/SwarmUI

>Use a VAE if your images look washed out
https://rentry.org/sdvae

>Model Ranking
https://imgsys.org/rankings

>Models, LoRAs & training
https://civitai.com
https://huggingface.co
https://aitracker.art
https://github.com/Nerogar/OneTrainer
https://github.com/derrian-distro/LoRA_Easy_Training_Scripts

>Flux
https://huggingface.co/spaces/black-forest-labs/FLUX.1-schnell
https://comfyanonymous.github.io/ComfyUI_examples/flux

>Pixart Sigma & Hunyuan DIT
https://huggingface.co/spaces/PixArt-alpha/PixArt-Sigma
https://huggingface.co/spaces/Tencent-Hunyuan/HunyuanDiT
https://huggingface.co/comfyanonymous/hunyuan_dit_comfyui
Nodes: https://github.com/city96/ComfyUI_ExtraModels

>Index of guides and other tools
https://rentry.org/sdg-link
https://rentry.org/rentrysd

>GPU performance
https://vladmandic.github.io/sd-extension-system-info/pages/benchmark.html
https://docs.getgrist.com/3mjouqRSdkBY/sdperformance

>Try online without registration
txt2img: https://www.mage.space
img2img: https://huggingface.co/spaces/huggingface/diffuse-the-rest
sd3: https://huggingface.co/spaces/stabilityai/stable-diffusion-3-medium

>Maintain thread quality
https://rentry.org/debo

>Related boards
>>>/h/hdg
>>>/e/edg
>>>/c/kdg
>>>/d/ddg
>>>/b/degen
>>>/vt/vtai
>>>/aco/sdg
>>>/u/udg
>>>/trash/sdg
>>
File: delux_hh_00043_.png (2.27 MB, 1024x1344)
>mfw
>>
File: ComfyUI_02389_.png (1.3 MB, 960x1344)
>>
Has anyone tried LoRA training with CLIP-L training enabled? I think SimpleTuner is the only trainer that currently supports it, but I can't be downloading like 5 different venvs just to train LoRAs.
>>
File: ComfyUI_02322_.png (1.59 MB, 1024x1024)
>>
File: 2024-08-26_00032_.jpg (1.71 MB, 3840x2160)
>>102083367
ty baker, also Gundam
>>
File: 1696554017300009.png (1.82 MB, 1024x1024)
Trying a LoRA trained on 68 DALL-E 3-generated images intended to have a boomer shooter aesthetic. Minus trigger word...
>>
File: flux.jpg (813 KB, 1552x1200)
>>102083422
Love this
>>
File: 1707846905856831.png (1.82 MB, 1024x1024)
>>102083452
Plus trigger word, 500 steps
>>
File: 1702463862874427.png (1.82 MB, 1024x1024)
>>102083460
1000 steps
>>
File: 00002-3156892793.jpg (421 KB, 1552x1200)
>>102083452
>boomer shooter aesthetic
What
>>
File: 1695760119164526.png (1.86 MB, 1024x1024)
>>102083467
1500 steps, starting to see a bit of pixelation and more texture-like repetition
>>
File: 1720632053302116.png (1.87 MB, 1024x1024)
>>102083475
And 2000 steps, suddenly video game looking, albeit still 3D and not 2.5D. 2000 steps more to go, and I should probably mess with weights as well. The prompt also makes no reference to video games, pixel art, Doom/Duke Nukem, etc while the input captions do, so this is kind of "hard difficulty"
>>
File: 1724517206431867.jpg (250 KB, 1024x1024)
>>102083470
Like picrel. I haven't seen any Stable Diffusion or FLUX model or lora that can pull it off like DALL-E 3 can. I'm currently taking a bunch of 1920x1080 screenshots in classic boomer shooters and wads to crop and caption as well, in case this first attempt is a total failure.
>>
>>102083452
Not to be a dick, but there are already plenty of boomer shooters out there from which you can gather images, why feed the model slop?
>>
You could easily condense these into, say, a collage or multiple images
>>102083519
Because he likes the look of slop
>>
>>102083535
>You could easily condense these into, say, a collage or multiple images
Worried that you have to post a new worthless general 10 minutes earlier than usual?
>>
>>102083519
For better or worse, DALL-E 3's boomer shooter outputs have a pixel-artness and variety that real boomer shooters can't easily replicate, so there's a potential aesthetic advantage that I want to try.
>>102083535
>You could easily condense these into, say, a collage or multiple images
I spent an hour trying to do that in Comfy and gave up. Collages are one thing that A1111 does much better. Also, the last thread hit the bump limit very early and still hasn't even hit the image limit yet.
>>
>>102083556
But AI images also contain a lot of nonsense, if you put AI images back into AI training you're just going to magnify the nonsense.
>>
>>102083550
Not who you think I am
>>102083556
There are external programs for combining images, anon
>image limit
Who cares
>>
What is a good flux test prompt?
>>
>>102083556
>and still hasn't even hit the image limit yet.
Why did newfags make a new thread anyway?
>>
>>102083572
That's entirely dependent on what you're testing
>>
>>102083582
The thread was over the bump limit so the thread would have died of old age regardless of the image limit.
>>
File: 1696275900772118.png (1.51 MB, 1024x1024)
>>102083494
And here it is with a prompt with references to Doom, video games, etc. Getting there.
>>
>>102083604
>so the thread would have died of old age regardless of the image limit.
So will this thread, fucking retard. Why do you make a new thread while you can still post in the old one? I don't get you retarded newfags.
>>
>>102083604
Usual protocol is to keep using the thread until either image limit is reached or it expires
>>
>>102083632
>>102083630
When a thread goes over the bump limit, nothing can be done to prevent it from falling off the catalog and 404ing; it can happen slowly or quickly. Best practice is to move on to a new thread when that happens, because the users lose all control over keeping the thread from 404ing
>>
>>102083658
What control, you stupid bitch? You can make a new thread when it 404s. Until then you can post in the thread just fine.
>>
>>102083516
Cool, I've been playing Turbo Overkill lately. Amid Evil also looks awesome with Ray tracing. So nice to play a game without cutscenes and dialogue. Boomer shooters > call of duty movieslop.
You could potentially create a full blown ad with screenshots of an amazing-looking boomer shooter that doesn't exist, lmao
>>
File: file.jpg (89 KB, 768x768)
>>
>>102083658
It would be nice if people attached their latest decent gen to the "discussion" because it often devolves into a conversation between two schizos that no one else cares about
>>
File: 3x3 doomguy tours america.jpg (3.15 MB, 3072x3072)
>>102083694
>You could potentially create a full blown ad with screenshots of an amazing-looking boomer shooter that doesn't exist, lmao
I'd post fictional wads in the /v/ DALL-E 3 generals and usually get a fair number of (You)'s, people hunger for more Duke Nukem 3D experiences exploring real life areas in a 2.5D world
>>102083716
This is great, love the first-person can grab, lora or just prompting?
>>
>>102083746
>boomer shooter screenshot, In a pixelated, neon-lit corridor that screams retro '90s video game aesthetic, a rugged, blocky character clad in a leather jacket and combat boots stands ready for action, his arm extended in a classic first-person shooter view. But instead of the usual arsenal of firearms, he’s gripping a can of Diet Coke, its iconic red and silver label shimmering under the harsh, fluorescent lighting. The can looks out of place against the backdrop of grimy metal walls, flickering CRT screens, and distant explosions echoing through the labyrinthine halls. The contrast is jarring, almost comical, as if this caffeine-powered elixir is his weapon of choice against the alien invaders or mutant hordes lurking just around the corner. His other hand, visible at the bottom of the screen, is clenched into a fist, ready to crush the empty can once the last drop of soda is gone, his health bar visibly increasing with each sip, defying the dystopian chaos around him
1000 steps in a Lora using real screenshots of random boomer shooters
>>
File: 1695696049007485.png (1.12 MB, 1024x1024)
>>102083778
Nice, how many images did you train it on?
>>
>>102083870
about 200
>>
>>102083689
>What control, you stupid bitch?
When users post something, the thread is raised to the top of the board; going over the bump limit disables this function. It's always better to have the next thread prepared for visibility rather than making it after the old one 404s.
But you're clearly a schizo so I'm done talking to you about it.
>>
>>102083720
can't gen, training.
>>
>>102083910
you're talking to an llm that tries to find something negative to bitch about every thread
>>
File: 2024-08-26_00048_.jpg (1.16 MB, 3840x2160)
attack of the bikini clone 1girls
>>
File: 2024-08-26_00050_.jpg (749 KB, 3840x2160)
>>
>>102083910
I know how 4chan works. But apparently you don't. You can post in a thread even if it has reached bump limit. It won't be "closed" (using terminology even a newfag can understand) until it leaves page 10. The thread won't suddenly get closed if it's on page 9 or 8. The old thread is still on page 8. Do you know where the page counter is, or do I need to tell you something this basic?
Why don't you make a new thread when we have reached 100 posts? Because we could reach the bump limit in another few hundred posts!
You retarded shits are inaccessible to basic logic.
>>
File: 1714668170062214.png (819 KB, 824x824)
>AI gives people the power to generate anything their heart desires
>anons just use it to generate generic instagram bimbos as if we don't have enough of those in the world
humanity was a mistake
>>
>>102084016
don't pretend art and music don't exist, most of them are about women too
>>
File: 00356.png (1.07 MB, 1280x720)
>>102083746
nice, sdxl is pretty good with game screenshot prompts on its own tbqh.
>>
File: 2024-08-26_00052_.jpg (1.51 MB, 3840x2160)
>>
File: 00024.png (1.71 MB, 1024x1024)
>>102083995
>that effortpost
I might just have to bake a couple more new threads now just to make you mad
>>
File: chick.png (400 KB, 1024x1024)
>>102083367
>3D render of a cute asian chick with a voluptuous figure on a dark blue background, digital art
Very funny Flux. Verrry funny.
>>
>>102084087
try to prompt some big black cocks ... beautiful animal these
>>
>>102083933
Yeah I'm just ignoring him now.
>>
File: mikuquestion2.jpg (989 KB, 1710x1779)
Is R-ESRGAN 4x+ Anime6B still the best upscaler for anime gens or is there now something higher quality, or equivalent quality and faster?
>>
File: ComfyUI_32977_.png (1.59 MB, 1024x1024)
>>
File: ComfyUI_32983_.png (1.43 MB, 1024x1024)
>>
>>102084100
i tried big black cooks earlier just gave me white cooks tho
>>
A chick with a Desert Eagle riding a big black cock
>>
I have no reason to believe captioning images can currently introduce new concepts to the model. All LoRA training seems to be doing right now is guiding the model in a direction it already knows; new concepts aren't being introduced.
Which makes sense given that we currently do not train the clip model when training LoRAs.
>>
File: 2024-08-26_00057_.png (1.08 MB, 1280x720)
>>102084126
>A very clear photo with no bokeh and no blur, shot with a Canon EOS f/22. There is no visible transition between front and back of the picture, everything in the background is crystal clear and not blurred, a realistic view of ...
>some big black cocks
>>
File: fs_0054.jpg (74 KB, 1024x1024)
>>
>>102084215
sadly the annoying part is that FLUX completely ignores asking for a clear picture
>>
>>102084193
lol
>>
File: bigblack.png (702 KB, 1024x1024)
>>102084143
>A chick with a Desert Eagle riding a big black cock
I didn't know the desert eagle was a weapon.
>>
File: FLUX_00042_.png (1.31 MB, 1024x1024)
bang bang
>>
File: 00386-3785725608.png (2.22 MB, 1024x1440)
>>
So are loras basically like mods you can add to models to get pictures more inline with what you're looking for?
>>
>>102084240
It's driving me mad, nobody seems to notice or care.
>>
>>102084278
loras are basically microsoft you insert into a slot behind your ear to instantly learn Spanish.
>>
>>102084290
It's wrong. The model definitely doesn't have pictures of my anus, yet when I train the LoRA on my anus it matches exactly.
>>
>>102084322
Now, put a picture of an alien anus in the dataset and give it the activation word of alienanus and watch your LoRA get jumbled up between the two concepts.
If I have two characters or concepts in a dataset, I should be able to flip between those concepts with a simple activation phrase and it's not working. People are just making dumb coomer LoRAs so most of them haven't noticed yet.
>>
File: 1719485343484526.png (1.51 MB, 1024x1024)
>>102083620
2500 steps now. Getting quite close to a sprite look, though I probably need to experiment with pixel art lora's to get rid of this overly smooth look.
>>
>>102084356
my anus is alien to the model
>>
>>102084290
because you're wrong, sometimes when you say something and no one else has your problem it's because you did something wrong
>>
>>102084380
>no one else has your problem
They do, holy shit, how can nobody see this?
>>
File: 1367455286278.jpg (96 KB, 615x593)
>>102084360
>>
You're all celebrating your images looking like your training data without properly testing what's going on under the hood.
>>
>>102084426
See
>>102084360
That fucked up Doom HUD cannot be generated in Flux natively, it required my lora to do it
>>
>>102084426
>Why can nobody else see the demons!?
>>
>>102084426
no they don't but please tell me your special concept that can't be trained
the problem ultimately is you are traumatized by SD and SDXL and you're used to resorting to gimmicks to train a model
>>
>>102084439
Look, if I make a Debbie from Summertime Saga Lora and it generates a photorealistic version of her I don't need to know how it works under the hood. And really, none of us understand how it works under the hood, we're basically practicing magic.
>>
File: FLUX_00009_.png (906 KB, 896x1152)
>>102084448
I've been trying to do cigarettes to no real avail
but I don't know what I'm doing, so take that into heavy consideration
>>
File: 2024-08-26_00065_.jpg (1.64 MB, 3840x2160)
>>
>>102084533
Why does everyone here beat around the bush? Post a training image and its caption, and state, specifically, the goal of your lora.
>>
>>102084550
I'd rather you prove me wrong
that's win-win
>>
>>102084583
>no I won't say what I want you need to prove me wrong though
Okay, so you're just a schizo.
>>
File: 2024-08-26_00067_.jpg (1.85 MB, 3840x2160)
>>
File: Untitled.png (7 KB, 784x312)
>>102084448
>>102084445
>>102084443

You're all fucking idiots who don't know the difference between the model working with concepts it already knows and new concepts being added in through tags, and I'm going to prove it: I'm going to train this model on Konosuba characters using WD14 captions; watch as it completely ignores the very unique tags for each character and jumbles them up.
>>
>>102084607
*I'd rather
you need to start paying attention
>>
File: 00221-916362361.png (2.06 MB, 1024x1536)
1st good pic of this lacklusterdg thread, comin at cha
>>
>>102084549
Nice
>>102084616
Ewwww
>>
>>102084646
>I'm going to train a natural language model schizo tag prompting
>>
File: 2024-08-26_00070_.png (1.24 MB, 1280x720)
>>102084663
lol.. its the same prompt "intergalactic peace conference"

pic related is "illuminati meetup" tho
>>
>>102084646
4 repeats?
>>
File: file.png (1013 KB, 1280x896)
>>
File: ComfyUI_00882_.png (1.26 MB, 1024x1024)
>already got 104 downloads on my Kasia LoRa and 16 thumbs up

feels good man
>>
>>102084701
lol
>>
>>102084701
i sit like this
>>
File: ComfyUI_00659_.png (1.81 MB, 1024x1024)
>>102084716
>>
>>102084678
Once again missing the point: as long as the activation phrases are being trained, the output SHOULD match the physical features of the character in that dataset.

Sample output is a very simple:
>ダクネス, An anime style image of a woman standing in a living room setting

I chose this prompt for one reason, I want to confirm if the activation tag is functioning

Learning rate is 1e-4 on AdamW8bit
Rank is 64 (overkill, but I want to prove the point that no matter what happens these tags won't be utilized)
>2500 steps, 4 repeats per image.

You're all calling me a schizo but I know I'm right.
>>
File: 2024-08-26_00072_.jpg (1.6 MB, 3840x2160)
>>
>>102084768
>use jibber jabber characters on an English model
Oh no, it's retarded. It's always funny to see blatant acts of retardation.
Keep on going anon, train that T5 and Clip.
>>
>>102084768
To add, batch size is 4 and training at 512x512
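For reference, the full invocation is roughly this (paths shortened and flag names from memory, so double-check against the sd-scripts flux docs before copying anything):

accelerate launch flux_train_network.py \
  --pretrained_model_name_or_path flux1-dev.safetensors \
  --clip_l clip_l.safetensors --t5xxl t5xxl_fp16.safetensors --ae ae.safetensors \
  --network_module networks.lora_flux --network_dim 64 \
  --optimizer_type AdamW8bit --learning_rate 1e-4 \
  --train_batch_size 4 --resolution 512,512 --max_train_steps 2500 \
  --train_data_dir dataset --output_dir output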
>>
File: ComfyUI_33010_.png (1.1 MB, 1024x1024)
>>
>>102084784
It's getting nervous. It knows I'm right and the fact that the unique tag I assign to a concept won't be trained is a problem that will cause concept bleed.
>>
>>102084646
If you go to civitai you will see that people are already noticing those things that, at least right now, can't be trained without degrading the image quality or generating abominations, but no one dares to be very vocal about it because it's still a sin against the religion.
>>
>>102084707
Imagine how many loads have already been dumped to your lora
>>
>>102084798
Anon, it's just as likely your trainer strips your chink language. I already know you're wrong but thanks for demonstrating it because I know I can ignore you from now on. You can't even do the obvious like, I dunno, using XXFUCKYOU as the activation tokens. No, you got to use foreign characters.
>>
>>102084803
I know, it's why with previous models the text encoder was also trained along with the model. I just want to prove to these people that a good deal of their unique captioning is being ignored.
>>
>>102084793
soul
>>
>>102084817
>it's just as likely your trainer strips your chink language.
>Kohya, the script developed by a Japanese developer and with code commented in Japanese strips Japanese.

Alright dude, find me the line of code that strips the fucking activation phrase if it's not English
>>
File: ComfyUI_00870_.png (1.31 MB, 1024x1024)
>>102084816
My next LoRa will make dudes jizz 24/7.
>>
>>102084803
Yeah anon, you can't train something like penises and vaginas with a Lora without destroying the existing knowledge of the model. Certain concepts must be done as a full finetune, trained not just on the new knowledge but also on images that preserve the existing knowledge.
>>
Yeah I'll wait until Furkan Gozukara weighs in on the issue. btw anyone got $5 to spare?
>>
File: 1714660143764982.png (1.41 MB, 1024x1024)
>>102084360
>>102084436
3000, getting pretty baked
>>
>>102084833
Yeah, go all in on the shit-code trainer that doesn't even claim to work well. Usually when you're mentally ill and claiming something stupid, you try other solutions, like, I dunno, trainers that are proven to work.
>>
>>102084860
The Kohya LoRA script is based off the AI toolkit trainer script.
Both are proven to train a LoRA. If you don't know what I'm arguing, shut up and wait for me to show you, because you're confusing a LoRA altering the output of a generation with training new concepts into a LoRA.
>>
>>102084880
Kohya is shit and support is experimental at best. It doesn't even sample while training. But keep at it slugger, I'm sure the problem is you need to train the T5.
>>
>>102084890
>I'm sure the problem is you need to train the T5
Yeah this alone is telling that you're just ignoring me. I'm talking about the clip model.

I don't even know why you're denying this is happening. It's fairly well documented now. People are not having great success with multi character LoRAs
>>
File: file.png (1.05 MB, 832x1216)
Activation tokens don't work
>nereirfpnxl, frieren, elf, pointy ears, green eyes, white hair, twintails, shirt, long sleeves, jewelry, earrings, striped, capelet, striped shirt, white skirt, black pantyhose, closed mouth, standing, collarbone, cowboy shot, hand on hip, contrapposto, abstract background,
>>
>>102084917
Now do the same seed again but without the activation phrases.
>>
>>102084923
>bro are you telling me loras rape the weights????
>>
>>102084930
This is like pulling teeth, I swear to god.
If I have a LoRA with three characters and I use the activation phrase for one of them, the outputs should look like that character; in practice, however, they do not.

This is a problem, this is not intended functionality.
>>
>>102084947
>bro are you telling me I can't just train it like SDXL this is garbage!
>>
>>102084963
>Moving the goalposts
>>
>>102085017
I believe your issue is a failure to experiment and being incapable of understanding what best practices for training Flux is.
>>
What's a good consistent sampler for img2img that preserves the hands? It's for sdxl
>>
>>102085037
I think you're wrong and I think activation tokens are not working properly. If I have a folder of red characters, blue characters and yellow characters, the yellow character should appear when I use its activation phrase. I am not the only person with this issue and if you have proof to the contrary you should show me.
>>
>>102085057
The model was not trained on things like "CharacterName standing in room"
>>
File: 1711896934397384.png (573 KB, 768x768)
is there any way to get flux to generate an image of a girl walking on a sidewalk? every time i try it generates something like this with her standing in the center of the image.
>>
>>102085070
Well it's a good thing I'm training a LoRA of a character so I can prompt it standing in a room, let's see if the training goes as intended.
>>
>>102085057
https://desuarchive.org/g/thread/102067488/#102069590
This anon got separate costumes to activate on use of their token with his lora
>>
>>102085093
Wow you're sure going to show me training on shitty SDXL captions.
>>
>>102085103
I don't buy his approach, without seeing the exact prompt he used, he may have just prompted in the clothing and falsely attributed it to the activation phrase.

>>102085107
I'm right, you know I'm right and so does anyone else who doesn't suck buzz out of Indian dick to train LoRAs.
>>
>>102085133
no one who has had this issue/the anon who didn't have this issue has told us what trainers they used. for all we know this is all due to a specific trainer's code having something fucked up, but we'll never find out because you guys keep talking about it but refuse to offer any actual information towards the potential source of the issue. you and the other anons were also all asked if you had specified keep tokens or not, and you never answered. how can we troubleshoot if the only thing being said is "its working" or "its not working"?
>>
>>102085189
>you guys keep talking about it but refuse to offer any actual information towards the potential source of the issue
The clip isn't be trained. I've said this multiple times.
>how can we troubleshoot if the only thing being said is "its working" or "its not working"?
I'm literally training a model right now with three very unique characters to prove that the basic approach is not working

Like I said, the clip needs to be trained along with the model.

I'm not asking for solutions. I'm trying to prove to you that it's not functioning as intended.
>>
Okay I was here earlier this morning and I think I've nailed down an extremely potent combination for getting good captions for flux.

My images have booru-style tags that capture the core concepts in the image but are otherwise not very descriptive. These tags were done manually; these are real images, not anime.

So anyway, caption1 from InternVL2 prompted with the tags as additional context. caption2 from joycaption, also prompted with the tags as context. Then caption1 + caption2 + tags into Mistral-123b, with detailed instructions on how to combine all the info. You can tell it to treat the tags as 100% accurate, treat the captions as possibly inaccurate, always include info from all the tags, write in a certain style, use certain words (pussy instead of genitals), etc. And the GigaMistral model is powerful enough to follow all that consistently. The result appears to be extremely good captions.

Currently running my script to do InternVL2 captions on 2000 images, eta 23 hours. Then comes the other phases. Takes a while and requires 4 3090s to run this locally, but I dunno any other way to get this level of quality.
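The glue code is trivial; the part that matters is the combine prompt for Mistral. Roughly this shape (illustrative sketch only, not my exact wording, and the inputs here are dummies):

def build_combine_prompt(tags: str, cap1: str, cap2: str) -> str:
    # Tags are ground truth; the two VLM captions are hints that may be wrong.
    return (
        "Merge the following into one natural-language image caption. "
        "Treat TAGS as 100% accurate and include every tag. Treat the two "
        "CAPTIONS as possibly inaccurate and drop anything that contradicts "
        "the tags. Write one plain paragraph.\n"
        f"TAGS: {tags}\n"
        f"CAPTION 1 (InternVL2): {cap1}\n"
        f"CAPTION 2 (JoyCaption): {cap2}"
    )

print(build_combine_prompt(
    "1girl, red dress, beach, sunset",
    "A woman in a crimson gown stands near the ocean at dusk.",
    "A lady wearing red walks along a sandy shore.",
))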
>>
So from "you can't train on 12 GB" we came to "kohya doesn't sample while training"? What's next, "the water is dry"?
>>
>>102085189
because he doesn't want his schizo theory disproved; instead he'll post his schizo theory multiple times a day and do nothing to troubleshoot it
>>
>>102085229
I don't think the theory is schizo at all. Nobody has shown me a demonstrably functional multi concept LoRA and the one I'm currently training isn't looking so hot either.

This character should be the yellow one. Let's let it cook a little longer.
>>
File: aseet.jpg (20 KB, 542x375)
>>102085220
>The result appears to be extremely good captions.
>>
>>102085217
if you keep refusing to answer these basic questions:
1) what trainer
2) are you using the keep tokens ARG
then I can't in good faith accept your "proof" as anything. you don't need to train clip to introduce trigger words, that's the magic of t5. if the trigger words aren't working, there is something else wrong! it's that simple
>>
>>102085220
post examples, i've been screwing around with a machine centipede of generating captions with joy-caption then feeding them into the model without any loras to tinker with until the outputs resemble the training picture well enough
>>
did all of /h/ just get purged
>>
>>102085248
>>102085307
Sorry don't have examples. When I was testing the process earlier today I did each step manually. Then built the prompt for mistral using text-generation-webui. Didn't save any of the results. I can post here again with examples once I've finished the automated runs for all the images. But that will be a few days lol.
>>
File: walking_along.png (2.19 MB, 1600x1200)
>>102085073
cap
>>
>>102085351
i will wait warmly
>>
>>102085325
/c/ and /aco/ too
>>
File: 07366-1876201437.png (2.75 MB, 1904x2192)
>>
>>102085270
Kohya, and I don't even think keep tokens is an argument in the documentation.
>>
>>102085358
I must be doing something wrong.
>>
>>102085384
>I don't even think keep tokens is an argument in the documentation
well, there we go, this very well might be your issue; you will need to use it in your next test to find out. next time you go to train, add a nonsense arg like --gdhfkgh and the available training args will populate in your command prompt, which includes the keep tokens one. iirc it's --keep_tokens, and if you have datasets that will have different amounts of keep tokens you can set them via the dataset_config toml as per:
https://github.com/kohya-ss/sd-scripts/blob/main/docs/config_README-en.md

is my belief that if you are not caption shuffling, in theory, it shouldn't matter this isn't set and the first word should be seen as a keep token anyway, but it is very much possible that isn't the case for whatever reason so it can't be ruled out to be the issue until tried
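to be clear, the shape of that toml is roughly this (placeholder paths and values, see the linked doc for the full option list):

[general]
caption_extension = ".txt"
shuffle_caption = false

[[datasets]]
resolution = 512
batch_size = 4

[[datasets.subsets]]
image_dir = "C:/train/char_main"    # captions start with: charname,
keep_tokens = 1

[[datasets.subsets]]
image_dir = "C:/train/char_outfit"  # captions start with: charname, outfitname,
keep_tokens = 2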
>>
>>102085445
>is my belief
meant its my belief, tired
>>
do people get sued for putting character models on civit
I'm wondering how I'm the first to do this specific person, who may or may not be particularly litigious
>>
>>102085445
Oh cool, someone who wants to investigate instead of calling me a schizo and dismissing me.
I'll take a look at this when this test model is baked enough. Still not hopeful.
>>
>>102085443
Why would she not walk in the middle of the street if there are no cars? Are you stupid?

If you prompt with her as the focus she'll be in the middle of the frame. And the middle of the street is the easiest to photograph and frame. You should go outside with a real camera (or phone) and practice framing to get the idea
>>
File: ComfyUI_33020_.png (1.12 MB, 1280x720)
>>
>>102085478
I am sincerely curious about it too, but I don't have any good datasets ready for characters that would have distinct enough outfits to test it myself right now. if your results are inconclusive I'll tag something quick and test a few things, too. it is unfortunate the anon who had them working didn't offer more information on his trainer/params so we could compare the potential differences. his images did appear to be distinct character outfits, but more info including his prompt for reference would've helped narrow stuff down a lot..
>>
File: 1693840905270837.jpg (206 KB, 850x595)
>>102085482
How do you prompt with her not as the focus? Is there a specific wording to not get that? I tried moving the setting to the front of the prompt but that didn't change anything. Do I need to add random shit to the foreground just to move her to the background? I want something like pic related so I might just upload it to a caption node and copy whatever it gives me.
>>
>>102085477
Not currently, but deepfakes and ai porn of real people are in a sketchy spot right now so I wouldn't risk uploading a lora of a real person
https://www.rollingstone.com/politics/politics-news/aoc-deepfake-porn-bill-senate-1235067061/
>>
>>102085477
if it's donald trump don't bother, flux is already perfect at him without any training
>>
>>102085545
N
>>
>>102085478
because you aren't even trying in good faith, you faggot; you've literally been bitching all day about it and you haven't even tried another trainer
>>
>>102085477
Wouldn't upload any loras of individuals without being behind a few proxies
>>
File: Capture.png (3 KB, 455x130)
>>102085384
There's a setting in the pleb Kohya gui fork
>>
File: Untitled.png (64 KB, 796x611)
>>102085445
So here's how the set looks when starting training. I'm not really sure how keep tokens works; should I be setting that to 1 or something?
>>
If you upload a lora of a real person on Civit you unironically deserve to have criminal charges filed against you for harassment, defamation and damages.
>>
>>102085572
I think building a test dataset and roasting my GPU to confirm something is amiss qualifies as good faith. You're just angry because I seem to be having a problem for some reason, and I suspect it's not isolated to me.
>>
File: 1703520112821474.png (1.19 MB, 1895x1300)
>>102085539
you know what, I'm just gonna gen the background separately and inpaint the girl into the image. this is ridiculous and I'm fucking retarded.
captcha: ART MD
>>
>>102085585
Thanks I'll check it out.
>>
File: ComfyUI_00020_.png (2.84 MB, 1824x1248)
>>
>>102085545
so it's only women?
I'm good then, it's a dude
>>
File: sidewalk.png (150 KB, 1024x1024)
>>102085073
Look for a picture in google with the concept you want to draw in Flux, save the pic and paste it here:
https://huggingface.co/spaces/fancyfeast/joy-caption-pre-alpha
Tweak the prompt and make Flux draw it.
>The image is a digital drawing depicting a girl crossing and a bus stop. The background is a simple white wall, creating a clean and minimalistic look. In the foreground, a pedestrian crossing strip extends horizontally across the image, with bold, white stripes painted on a gray asphalt road. The road itself is bordered by a curb with alternating black and white stripes. To the right of the crossing, there is a blue and white bus stop sign with the word "BUS STOP" written in blue letters. The sign is mounted on a metal pole. To the left of the image, a girl is walking across the sidewalk. >She is dressed in a dark gray hoodie, light brown pants, and black shoes. The overall style of the drawing is flat and minimalistic, with clean lines and solid colors, typical of vector art. The textures are smooth, with no detailed shading or intricate patterns, focusing on simplicity and clarity. The image effectively conveys the message of a typical urban setting, with clear pedestrian and bus stop infrastructure.
This prompt produces what you want (in this case, a girl walking on a sidewalk) and you go from there.
>>
File: power-of-iteration.png (2.01 MB, 1280x1280)
>>102085539
>>102085443
Try iterating faster

also you better not be cranking the hog to these photos without using blood as lube
>>
File: sweat.png (1.96 MB, 1280x1280)
>>102085657
mfw
>>
File: 1715502052249916.png (1007 KB, 1024x768)
>>102085657
i'm on a 4070 (non-ti) i can't iterate any faster sir
and no I'm not jerking to these pictures I want it for a...jpop AMV

>>102085651
I don't really wanna use a lora because I'm stubborn. I'll try taking that prompt and minimizing it as much as possible to get something that works. Thanks for the prompt bro.
>>
>>102085594
ok so the way keep tokens works is it's literally specifying how many words at the very start of your caption you want it to 'keep' (or turn into trigger words). then it learns those words = the things that were not tagged in the attached image, generally

so if each of your captions has 2 'trigger words'
>the characters name
>the characters outfit
you'd make keep tokens = 2
and at the very start of every captioned image that has the character + that outfit, you'd have those two words

example: I have images of goku supersaiyan
I want activation words for both "goku" and "supersaiyan"
my caption is set up like this:
goku, supersaiyan, [everything else]
and my keep tokens are set to 2 for that folder, because I want to 'keep' goku and supersaiyan

then maybe I have some images that are only goku, no outfits, and have no further trigger words. those ones need to be put into their own folder and in the toml config, set to just 1 keep token. caption would look like:
goku, [the rest of your caption]
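visualized, the dataset ends up looking something like this (made-up names):

dataset/
  goku_ssj/
    img001.png
    img001.txt -> goku, supersaiyan, standing, arms raised, blue sky  (keep tokens = 2)
  goku_base/
    img050.png
    img050.txt -> goku, sitting at a table, indoors  (keep tokens = 1)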

let me know if that doesn't make sense, I'll try to explain better
>>
>>102085702
When you say tokens are you talking about words separated by commas or the llm style token where each character has a certain value assigned to it?
>>
>>102085568
he doesn't look old and tired enough, model needs to be updated
>>
File: cloned.png (2.09 MB, 1280x1280)
>>102085701
> sidewalk, tearful fatigued sickly beautiful young woman with bandages on her wrists walking towards the camera, demonic dark aura smoky Dementors are standing behind her, akihabara sidewalk, empty streets, photographic realism, foggy weather haunted ambience, pedestrians across the street are staring in horror, empty roadway

apparently black people are dementors, cool

it's hit and miss with placement, you just gotta take more shots
>>
>>102085618
you don't even know how to use your tool and you're passing blame like a retard, hopefully this will be a lesson to actually try harder testing in the future instead of jumping to dumbass conclusions like "I need to train the T5". You weren't even doing your training settings correctly.
>>
>>102085702
Okay, so based on that explanation, I'm going to train again with keep tokens set to 1, which should take the activation phrase (the folder name) as the token to keep, correct?
>>
>>102085794
>I need to train the T5
You keep saying I said that and I keep telling you I didn't.
I'm not even convinced that the new settings I'm using will fix the problem. You are the least helpful person of all.
>>
File: cheese.png (1.9 MB, 1280x1280)
>>102085786
>>102085701
> sidewalk, wide angle photo showing empty street and pedestrians on opposite sidewalk, smiling young japanese woman with bandages on her wrists walking towards the camera, smoky horned smiling demonic mystical figures follow her, next to restaurant window on akihabara sidewalk, empty roadway to the right, beautiful sad emotionally charged photo, dark but realistic, dark crowd of pedestrians across the street far away

captcha: KAWP
>>
>>102085702
I should add that if you aren't familiar with character loras, it is best practice to not describe in the caption whatever the 'keep token' is supposed to describe.
example: if my keep token is 'goku' and my image is of goku, I describe only the traits that aren't permanent. I wouldn't describe his hair, eyes, so on. I would describe the background, his pose, clothing (if its not becoming an outfit trigger word), etc.

this way it comes to believe that the word 'goku' = this guy in the image
>>102085744
in this case, I mean words separated by commas. if you look at your config in >>102085594 it's whatever the 'caption_separator' is set as. by default that is a comma
ie:
>goku, supersaiyan,
with 2 keep tokens is effectively the same idea as
>goku, super saiyan,
(a multi-word phrase between commas still counts as one keep token)
>>102085797
the folder name is not the keep token; in fact the folder name could be 'dig bicks' or something entirely random and it should not impact your training whatsoever. the very first tag in your .txt caption is the keep token, if keep token = 1
if keep token = 2, its the first two tags (or two phrases separate by a comma)
>>
>>102085809
lol
every time you post now, you just get an lol
the lol of incompetence
>>
>>102085815
okay this is pretty rad stealing this prompt
>>
>>102085828
just gotta type sidewalk a bunch of times bro boomer prompting is the way
>>
>>102085823
Okay, this is what's confusing me: in Kohya, the folder format is (number of repeats)_class token.

What's the point of the class token? That's the one that doesn't seem to be doing anything when prompting for it.
>>
>>102085823
I believe the folder name gets used if your dataset lacks captions
>>
>>102085855
maybe this will help clear that up:
https://github.com/bmaltais/kohya_ss/blob/master/docs/image_folder_structure.md
>If a file with a .txt or .caption extension and the same name as an image is present in the image subfolder, it will take precedence over the concept name during the model training process. For example, if there is an image file named image1.jpg in the 30_cat subfolder, and there is a corresponding text file named image1.txt or image1.caption in the same subfolder, the concept name used during training will be determined by the content of that text file rather than the subfolder name.

so to be more specific, the (number of repeats)_class token thing only applies if you have no captioning whatsoever, which really wouldn't work out well with a character lora since you have to caption things to be able to tell the ai you want it to learn, well, a character vs everything else in the images
>>
>>102085886
that is correct, yes. apologies for any confusion, I misspoke
>>
File: ComfyUI_00888_.png (1.35 MB, 1024x1024)
>>
>>102085917
are those midgets or old guys on scooters
>>
>>102085888
Okay, that makes sense, so what I did now was added the trigger words as the first token in all of my captions. Let's see if that fixes it. Thanks for the pointer.
>>
>>102085998
perf, just don't forget to add the --keep_tokens 1 arg too, as I'm not 100% sure if it's necessary with non-shuffled captions or not
>>
So. Flux for outpainting. What's the best way to go about this? Say I have the top half of a woman's photo and want to outpaint the rest downwards with Flux. What are my options?
>pad image and do normal inpaint (high denoise): the inpainted part doesn't match up at all with the body, terrible result
>pad image and do normal inpaint (lower denoise): the grey padding gets made into an object or affects the image in some other way
>padding + inpaint + depth controlnet (denoise 0.8-0.99): sort of works, helps the generated part match the body but suffers the same problem as above
>padding + inpaint + depth controlnet + a modified depth map that contains a generic outline of the lower body (denoise 0.8-0.99): i can't draw the perspective right and it ends up looking off, plus the controlnet is sensitive to details so hands/sleeves might be off, also the grey padding ruins it
>padding + inpaint + depth controlnet + custom depth map + lightly colourised padding (denoise 0.8-0.99): sort of fixes the colour issue above but the rest of the issues remain
The amount of effort required to even get something MILDLY serviceable is insane, is there anything that could ease the process? Or other options I'm not aware of? (other controlnet types aren't as easy to draw, sadly)
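For reference, the padding prep behind the first few options is just this (minimal PIL sketch, sizes arbitrary; the actual inpainting pass still happens in the UI):

from PIL import Image

src = Image.open("in.png").convert("RGB")   # the top-half photo
w, h = src.size
pad = h // 2                                # outpaint downward by 50%

canvas = Image.new("RGB", (w, h + pad), (128, 128, 128))  # the grey padding
canvas.paste(src, (0, 0))

mask = Image.new("L", (w, h + pad), 0)      # black = keep as-is
mask.paste(255, (0, h - 64, w, h + pad))    # white = regenerate; 64px overlap so the seam can blend
canvas.save("padded.png")
mask.save("mask.png")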
>>
>>102086019
Done and done, I'll report back when it's done. Just don't expect a masterpiece LoRA, this was made on the fly.
>>
>>102086036
no worries, will be a good potential test to check if it's working or not with kohya regardless
>>
File: 00049-171646211_cleanup.png (3.04 MB, 1280x1920)
>>
File: fs_0108.jpg (133 KB, 1024x1024)
>>
In ComfyUI, how can I use multiple images for an IP adapter reference?

In Automatic1111 you can just drag and drop a batch of images into the UI to get higher accuracy. I can't seem to figure out how to properly use, for example, 10 different reference images. I have the ComfyUI IPAdapter plus nodes but unsure how to do this.
>>
>>102086142
Have you updated your ipadapter nodes? The newer version, which is unfortunately harder to use, allows you to chain unlimited reference images together.
>>
File: file.png (292 KB, 768x1024)
aww that's cute
https://civitai.com/models/686921/peekingflux?modelVersionId=768790
>>
File: 00003-1190563778_cleanup.png (2.93 MB, 1280x1920)
>>
File: 1707215926883422.png (96 KB, 815x802)
>>102086142
you can just use a batch uploader and direct it to a folder.
that said I've never tried this workflow myself so who knows if it works.
>>
File: 00005-1566629179_cleanup.png (3.27 MB, 1280x1920)
>>
File: ron.png (1.82 MB, 1024x1024)
I love America.
>>
>>102086354
kek, that's the grifter's face on top of Trump's right?
>>
Is there someone who claimed he's making a real finetune of flux right now? Because so far we only got lora merges, fuck that.
>>
File: 1704702640549102.png (1.68 MB, 1280x1280)
>>
>>102086424
there are a couple people doing some home experimenting I think, but no true massive dataset tier project yet to my knowledge
>>
>>102086440
you used a lora for this?
>>
>>102086354
that looks absolutely nothing like trump. i could fix it with 1 simple SDXL mask but im too lazy
>>
>>102086465
I have no idea how he fucked up Trump's face, that's probably the best thing Flux can do, Trump
>>
File: ComfyUI_02282_.png (1.29 MB, 1024x1024)
>>102086442
I know of a person who is working on Flux fine tuning. But you're not gonna like it...
>>
File: 1776trump.png (2.15 MB, 1024x1024)
>>102086472
in my experience SDXL does better trumps
>>
I'm looking at dalle's thread and their pictures are really creative, dunno if we can replicate any of them on flux yet
>>102027227
>>
File: S.jpg (107 KB, 1024x1024)
>>102086424
we have top men working on it right now
>>
File: savedalt.png (2.89 MB, 1536x1536)
>>
>>102086482
if this mf manages to make a top tier finetune of Flux, then it'll be his best redemption arc
>>
File: T.jpg (89 KB, 1024x1024)
>>102086500
throw the image into joycaption or florence and put the prompt into flux and see what you get
>>
File: 1724512071910931.jpg (2.43 MB, 3600x3104)
>>102086500
I really like that one for example. I don't think Flux can make this style unfortunately; maybe with a lora, but such a lora doesn't exist yet
>>
>>102086455
Yes, see >>102083452
>>
>>102086486
you think? his face looks weird on your example kek
>>
File: ComfyUI_02384_.png (865 KB, 1120x704)
>>102086515
He'll burn it like everything else he touches and sell the ashes to his loyal patreons
>>
File: _tod___.png (2.23 MB, 1018x1018)
>>102086531
>cats

not that impressed with DALL-E. lately ive been composing with flux and upscaling/inpainting with SDXL. maybe i just hate DALL-E because i refuse to log in to anything
>>
File: 42.png (1.14 MB, 832x1216)
I love looking through the early samples
>>
>>102086424
>Is there someone who claimed he's making a real finetune of flux right now? Because so far we only got lora merges, fuck that.
How many gb of VRAM do you need? I'd like to try it out but I "only" have 2 gpus, a 3090 and a 3060, dunno if that's gonna be enough
>>
File: goldenear.png (1.96 MB, 1024x1024)
>>102086543
yeahh thats an old one didnt pick the best example
>>
>>102086365
No it's just flux being lazy. I think it's funny how fat he looks though
>>
>>102086562
Technically should be possible on a single 3090 with 64gb of system ram. In practice it's looking tough to actually do.
>>
>>102086569
>golden ear
kek, I'd totally see him do something like that to fix his ear
>>
File: Thisisnew.png (19 KB, 401x371)
>>102086526
>throw the image into joycaption or florence
I made a study and this is the best captioner of them all:
https://huggingface.co/spaces/Quardo/gpt-4o-mini
You need a prompt asking for a very detailed and long description of everything in the picture.
>>
>>102086562
50 gb vram https://github.com/bghira/SimpleTuner/blob/main/documentation/quickstart/FLUX.md
>>
>>102086594
we all know GPT4V is the best captioner, but we're looking for something local that would do NSFW stuff
>>
>>102086608
>Flux requires a lot of system RAM in addition to GPU memory. Simply quantising the model at startup requires about 50GB of system memory.
the 50GB cited is about the RAM and it's about quantising, nothing to do with finetuning I guess?
>>
I'm generating cool background images every minute for 0.05 cents of electricity on a loop in the background. Flux is the dopest model ever invented period
>>
File: dead.gif (56 KB, 220x165)
>>102086625
>>Flux requires a lot of system RAM in addition to GPU memory. Simply quantising the model at startup requires about 50GB of system memory.
>the 50GB cited is about the RAM and it's about quantising, nothing to do with finetuning I guess?
> Unlike other models, AMD and Apple GPUs do not work for training Flux.
macfags btfo
>>
File: 1711847644437275.png (1.21 MB, 768x1152)
>>
>>102082604
that's precisely why local is the best, you can't nerf local, we have Pepe on flux (lora) and no one is gonna take it from us
>>
File: ULTIMATEPEPE.png (3.98 MB, 1536x1536)
>>102086660
kee rect
>>
File: BIKE3.png (3.23 MB, 1536x1536)
>>
File: 1713044492045621.png (846 KB, 1024x1024)
>>
File: file.png (2.32 MB, 768x1344)
Impressive
https://civitai.com/models/684646/lyhdalleanime?modelVersionId=766519
>>
File: fs_0136.jpg (77 KB, 1024x1280)
>>
>>102086914
>Your flux model can now look like 1.5 slop

Nah
>>
File: file.png (2.31 MB, 1024x1024)
>>
File: 1702035658148362.png (387 KB, 640x480)
>>102086837
I knew I was doing something wrong. 640x480... home...
>>
File: file.png (2.52 MB, 1024x1024)
>>
Flux is very impressive with its competent prompt following, but humans don't describe scenes in the level of detail it requires (we abstract a ton of information in speech or think in imagery), so you kind of need to use an LLM as an intermediary, which dampens the fun somewhat.
>>
all this shit looks terrible and only appeals to brainless 70 iq pajeets
>>
>>102086997
care to show an example of what you can't do with words?
>>
n slur
>>
File: 1723912330645471.png (709 KB, 824x824)
>>102087005
I genuinely think Flux is turning the tide in AI image generation. Everything looks way too professional and "cinematic" and it's fucking soulless and ugly. We're all waiting for someone to come up with a decent prompt for images that don't have that Netflix/Instagram look because so far it's just endless over-produced slop.
>>
File: 1703631570408911.jpg (623 KB, 1344x1728)
>>102087028
Just use loras bro
>>
>>102087034
nta, which lora did you use for this one?
>>
File: file.png (2.29 MB, 1024x1024)
>>102087028
>Everything looks way too professional and "cinematic" and it's fucking soulless and ugly.
the loras are here to save the day
https://civitai.com/models/652699/amateur-photography-flux-dev
>>
>>102087034
needing to wait for someone to make a LoRA is dumb. I should be able to get any image I want using just the base model. it's just ridiculous that the model is biased towards this extremely unnatural "style" that needs extremely specific prompting just to get a natural looking photo instead of a movie scene. it's like they got everything completely backwards.
or more realistically they built the model intending it to be used for commercial purposes and didn't think that the average customer doesn't want super high contrast lighting in every single image.
>>
>>102086914

>Base Model: Flux.1 S

Disappointed.
>>
>>102087057
I completely agree with you anon, the model should've had more concepts in it and shouldn't be as biased towards the generic ones. I hope the finetunes will fix that
>>
>>102087054
has anyone used this lora? got any of your own example pics?
>>
>>102087045
I didn't make that one but it says <lora:Ralph_McQuarrie_FLUX:0.5> <lora:flux_dev_frostinglane:0.25> <lora:rogi:0.2>
>>
File: 1704051940344779.jpg (264 KB, 832x1216)
>>102087057
>wait
There are hundreds on Civitai already, many of which are quite good
>>
File: 1705362023925883.png (425 KB, 640x480)
>>
>>102086615
>that would do NSFW stuff
But why? Once I had it I realized the prompt was the same and it added the naughty stuff, so a normal SFW prompt modified to add the naughty stuff worked for me.
And apparently what I wanted was being able to generate that; part of the appeal was being unable to do it. Once I was able, it became a boring subject like the SFW stuff. I guess once I can truly gen whatever I want with ease I'll quit, because I'll get bored.
>>
>>102086970
What was the prompt for this style?
>>
>>102087114
it's a lora style
https://civitai.com/models/667307/flux-y2k-typeface
>Y2K style cover art with a low poly 3D render of:
>Hatsune Miku, the beloved virtual pop star with her iconic turquoise twin-tails, is standing on a small stage in a cozy comedy club. The stage is illuminated by a spotlight, casting a warm glow on Miku as she holds a microphone in one hand. She is dressed in a stylish, casual outfit that includes a graphic tee and jeans, giving her a relaxed and approachable look. The audience, a mix of excited fans and curious newcomers, is seated at round tables with drinks and snacks, eagerly watching her performance. Miku’s face is animated with expressions of humor and joy as she delivers her punchlines, her body language lively and engaging. Behind her, a brick wall adorned with comedy club posters adds to the authentic atmosphere. The room is filled with laughter and applause, creating a vibrant and cheerful scene.
>Y2K style text at the bottom: "Comedy"
>>
>>102086970
>destroyed hands and anatomy
that's the problem with loras, they rape the weights and make flux worse on other shit like human anatomy; that's why loras are just a band-aid and not the real solution, a real finetune is all you need
>>
>>102087131
Thanks! So now I can answer this:
>>102087008
>care to show an example of what you can't do with words?
A picture in this style >>102086970 without LoRAs. I realized the most fun I had with SD1.5 was style exploration: the eyes didn't match, the hands were horrible and the NSFW was nightmare-inducing, but it feels like I've already fully explored Flux's style capabilities, and every time a novelty appears it needs a LoRA.
>>
>>102087147
could you stop using the word rape and use a descriptive word? What are you actually trying to say? Does it zero out the weights, add too many nodes, increase the weights out of a normal range? What?
>>
>>102087147
>>destroyed hands and anatomy
I didn't even notice and it didn't bother me; it turns out this is the way, and I thank Stable Diffusion for it. I would rather have this than perfect anatomy and hands in an equivalent picture that was just more AI slop.
>>
>>102087194
>I didn't even notice and it didn't bother me
it's just sad to destroy flux's ability to do good anatomy just to get a style it doesn't know; it shouldn't be that way, that was my point
>>
>>102087178
yeah I agree with that, it's way more fun to just prompt a style on the model and get what you want rather than having to go to civitai, download a lora, and rape the base model with its weights; it's slower, and if you stack multiple loras they won't work well...
>>
File: 1706820802625322.png (436 KB, 640x480)
>interpreted "fashion magazine" to mean extended-capacity
>>
File: 1695625205761192.png (422 KB, 640x480)
>>102087308
>>
File: DalleVFlux.png (1.25 MB, 1400x700)
>Wants to recreate picture on the left by Dalle 3
>Puts it through a descriptor
>Gets 6 long paragraphs
>Feeds them to Flux-dev
>Gets image on the right
This makes me want to die, for some reason. Is this what they call dysphoria?
>>
>>102087345
Your prompt had spelling mistakes that caused your distress.
>>
File: ComfyUI_05314_.png (1.36 MB, 1024x1024)
>>102087345

Looks like you have skill issues. Prompting basics still apply. Not going to waste any more time on this:

Style Prefix

Background

Descriptions.

https://huggingface.co/spaces/fancyfeast/joy-caption-pre-alpha
>>
Why are these threads so dead now? Have the tourists finally gone?
>>
>>102087399
It's early morning in yurope and the middle of the night in the US
>>
>>102087407
And I am in the middle of prime aussie hours. Fuck me.
>>
File: 1696438765810362.png (417 KB, 640x480)
>>102087337
>>
File: 1710960090700536.png (398 KB, 640x480)
>>102087519
>>
>>102086930
Even the ancient Greeks would say that ass is too manly.
>>
>>102087578
>Even the ancient Greeks would say that ass is too manly.
and they would love that shit, they were a bunch of faggots back then
>>
File: 1706996653312665.png (370 KB, 640x480)
>>102087560
>>
>>102087560
>>102087602
>>102087519
too smooth
>>
File: 1711382012396910.png (438 KB, 640x480)
>>102087634
I am emphasizing her smooth legs in the current prompt, can try to adjust things
>>
>>102087634
this, looks like they applied a linear gaussian filter or some shit
>>
>>102087596
nta.. but anon.. that..that was the joke... that it'd be too manly even for a bunch of faggots...
>>
>>102087690
yeah I know it's a joke, but it doesn't make any sense, for a faggot, the manlier the better kek
>>
>decide to rent a gpu
>uploading dataset
>my internet loses a bar out of nowhere
>if I reset my router in hopes it resolves itself, I have to wait while my rent runs
>if I don't, I might have to wait forever for it to upload
>if I stop the runpod, I waste the time I spent uploading things already and setting up kohya
rrrrrEEEEEEEEEEEE
>>
>>102087715
imagine using wifi lmao
>>
>>102087715
You will own nothing and you will be happy
>>
File: 1700369141998278.png (175 KB, 640x480)
>>102087640
>>
>>102087740
my cat chewed through my cord and the replacement hasn't shown up yet. worst part is I took the risk of resetting my router and it's still a bar down. fucking jeets bought out my local provider and it's been downhill ever since
>>
File: fluxcomic.png (1.13 MB, 1024x1024)
I guess Joy Caption missing the shotgun that the mouse had ruins the joke.
>This image is a digital cartoon drawing in a humorous style. The scene depicts a maze with dense green hedges, creating a complex and confusing path. The maze is located on a grassy field with no other visible objects or structures. In the foreground, a woman stands with her arms raised, seemingly lost or frustrated. She is dressed in a white lab coat over a red shirt and blue pants, suggesting she might be a scientist or a professor. She has a serious expression on her face, and her hair is drawn in a simple style. To the left of her, a small, cartoonish mouse is holding a sign that reads, "YOU HEARD ME PROFESSOR! GET IN THERE!" The sign is in black text on a yellow background, and the mouse is holding it with both paws. The background of the maze is filled with tightly packed green hedges, creating a maze-like structure that is visually dense and intricate. The overall mood of the image is humorous and playful, with a touch of irony.
>>
File: 1694226410627142.png (190 KB, 640x480)
>>102087763
>>
Training a LoRA of my cat.
>>
>>102087870
based
>>102087715
>>102087785
>dig out old cord I knew I had buried somewhere
>direct connection
>its not uploading any faster than the 1 bar less wifi
I'm gonna sudoku, it's over
>>
File: 1699091013801129.png (1.21 MB, 1024x1024)
>>102087519
prompt? are you saying DOOM PC ui and interface?
>>
File: flux.png (866 KB, 1024x1024)
>>102087383
>Not going to waste any more time on this:
Please, I beg you! Why are my Joy Caption and Flux giving me entirely different results from what you're getting?
>This is a vibrant, digitally drawn illustration depicting a joyful scene of two children playing basketball in a sunny outdoor park. The foreground features a young boy and girl, both with dark skin and expressive features. The boy, on the left, has short black hair and is wearing a bright yellow T-shirt, navy blue shorts, and black sneakers. He is dribbling a basketball with his right hand, smiling widely. The girl, on the right, has long, braided dark hair and is wearing a pink short-sleeved shirt with a white collar, a pleated blue skirt, and blue sneakers. She is also smiling, and her right hand is reaching out to the basketball.
>In the background, there is a basketball hoop with a red rim and a white backboard, positioned slightly to the left of the boy. Behind the hoop, a third child, also with dark skin, is standing with a neutral expression, wearing an orange shirt and blue shorts, observing the game. The background features a clear blue sky with a bright sun shining in the upper left corner, casting warm light over the scene. The park has a few trees with green leaves, and the ground is a light brown, suggesting a grassy area. The overall style is cartoonish, with smooth lines and
>>
File: 1713204187964010.png (504 KB, 713x753)
>>102087914
No, it's a LoRA I trained using 68 DALL-E 3 generated images as inputs, combined with two other LoRA's (EnvyFluxPixelArt and Dreamy Floating Flux). In addition to prompting coom bait I'm captioning actual screenshots from classic boomer shooters/wads, and plan on training another LoRA to see how it stacks up.
>>
>>102087946
nice, looks good: what are you using for lora training? I wanted to try kohya with a 4080 even if it takes a while, but there seems to be no flux option, it's just 1.5 or SDXL. Or I'm missing something obvious in the main lora menu. did a git pull and everything, it's up to date.
>>
File: help.png (2.11 MB, 1920x1080)
hello,
only when i upscale i get this blue-ish miscoloring, as if overexposed on a camera, always on shiny places filled with light or on wall edges.
What the hell is this?? Can't see the cfg do anything, removing the lora doesn't do anything; it only happens on some models, even when i'm using the "right" sampler stated by the model's author.
Changing the denoise strength does eventually change it, but only once the image has become something completely different at 0.7.
>>
>>102087914
>>102087946
Here's the catbox btw

https://files.catbox.moe/kicvzy.png

Not ready to share the LoRA yet but I used JoyCaption for all the inputs, and edited them to fix glaring inaccuracies.
>>102087964
I can't do it locally yet (2060 12GB), it cost me 50 Google Colab credits and took nearly 8 hours to get 4000 steps. I'll probably train on Civitai in the future since that's apparently more cost-effective.
>>
retard here, what is the difference between this thread and the stable diffusion general?
>>
>>102087978
looks good, a deus ex (2000) lora would be neat too imo but that doom one looks great
>>
>>102087981
we are unstable
>>
File: 1699765620633656.png (1.14 MB, 723x719)
>>102087983
>a deus ex (2000) lora
Someone said they considered doing that a couple days ago, but I haven't heard anything of it yet. It's something I'd like to do, have messed with Deus Ex models since creating a Looking Glass model trained on face textures.
>>
>>102087968
You need to use a VAE
>>
>>102088003
a VAE is always used
>>
>>102088003
when you don't use a VAE, is it using one by default that's integrated into ComfyUI/Forge?
>>
Fresh cooked homemade bread...
>>102088021
>>102088021
>>102088021
>>
>>102088010
>that's integrated on ComfyUi/Forge
That's part of the model. The whole "use a vae" thing comes from 1.5 when there was an alternative vae. There aren't any alternative vaes for later models.
>>
>>102087194
>I didn't even notice and it didn't bother me
Copium is a meme term created by combining two words: cope and opium. It is used satirically to describe a fictional drug that one consumes after suffering a loss, defeat, or disappointment.
>>
>>102088639
this, I hate people who wallow in mediocrity like that, you don't advance with those guys
>>
File: ifx134.jpg (340 KB, 1024x1024)