/g/ - Technology

Officially Edition

Discussion of Free and Open Source Diffusion Models

Prev: >>107896297

https://rentry.org/ldg-lazy-getting-started-guide

>UI
ComfyUI: https://github.com/comfyanonymous/ComfyUI
SwarmUI: https://github.com/mcmonkeyprojects/SwarmUI
re/Forge/Classic/Neo: https://rentry.org/ldg-lazy-getting-started-guide#reforgeclassicneo
SD.Next: https://github.com/vladmandic/sdnext
Wan2GP: https://github.com/deepbeepmeep/Wan2GP

>Checkpoints, LoRAs, Upscalers, & Workflows
https://civitai.com
https://civitaiarchive.com/
https://openmodeldb.info
https://openart.ai/workflows

>Tuning
https://github.com/spacepxl/demystifying-sd-finetuning
https://github.com/ostris/ai-toolkit
https://github.com/Nerogar/OneTrainer
https://github.com/kohya-ss/musubi-tuner
https://github.com/tdrussell/diffusion-pipe

>Flux Klein
https://huggingface.co/collections/black-forest-labs/flux2

>Z Image Turbo
https://huggingface.co/Tongyi-MAI/Z-Image-Turbo

>LTX-2
https://huggingface.co/Lightricks/LTX-2

>Chroma
https://huggingface.co/lodestones/Chroma1-Base
https://rentry.org/mvu52t46

>NetaYume
https://huggingface.co/duongve/NetaYume-Lumina-Image-2.0
https://nieta-art.feishu.cn/wiki/RZAawlH2ci74qckRLRPc9tOynrb

>Illustrious
https://rentry.org/comfyui_guide_1girl
https://tagexplorer.github.io/

>Misc
Local Model Meta: https://rentry.org/localmodelsmeta
Share Metadata: https://catbox.moe | https://litterbox.catbox.moe/
GPU Benchmarks: https://chimolog.co/bto-gpu-stable-diffusion-specs/
Img2Prompt: https://huggingface.co/spaces/fancyfeast/joy-caption-beta-one
Txt2Img Plugin: https://github.com/Acly/krita-ai-diffusion
Archive: https://rentry.org/sdg-link
Bakery: https://rentry.org/ldgcollage

>Neighbors
>>>/aco/csdg
>>>/b/degen
>>>/r/realistic+parody
>>>/gif/vdg
>>>/d/ddg
>>>/e/edg
>>>/h/hdg
>>>/trash/slop
>>>/vt/vtai
>>>/u/udg

>Local Text
>>>/g/lmg

>Maintain Thread Quality
https://rentry.org/debo
https://rentry.org/animanon
>>
I don't think klein has any notion of what "image 1" and "image 2" are; I don't see anything in the workflow passing that information.
>>
>>107897866
>>107897880
kek
>>
File: 1737619414755880.png (1.5 MB, 832x1248)
the woman in image 1 is dressed as the anime girl in image 2 who is wearing black stockings

blanc from nikke winter outfit:
>>
>>107897891
bro almost made me download 4B
>>
>>107897886
The klein docs have it; you have to click from the hf model card -> bfl blogpost -> klein docs to get there without explicitly searching, though. I've also been a little unclear on how strongly it understands "image 1" vs "the image with the girl in pink", and it's dodgy enough in both cases that it's hard to figure out which is more reliable.
>>
File: Klein 9b.png (3.79 MB, 2521x1088)
>>107897900
works well with locations too
https://www.youtube.com/watch?v=efMqb9-2bEY
>>
File: 1760425625287696.png (1.41 MB, 832x1248)
>>107897892
bunny ade, also nikke
>>
File: 1757680929451260.png (503 KB, 802x449)
https://xcancel.com/spectatorindex/status/2011884192075038998#m
Germany won't have electricity in the future because they're a bunch of retards. BFL knows it, and since they're aware it'll be the last model they'll be able to train, they made it based. That's my theory lul
>>
>>107897921
if you were an LLM you'd hallucinate non stop
>>
>>107897909
I'll go with descriptions until it's clearer, the way everyone uses "image 1" makes no sense to me.
>>
File: 1747962290698697.png (1.45 MB, 832x1248)
>>107897912
summer elegg, actually worked
>>
>>107897931
>model trained on doing referential edits can do referential edits
woaw
>>
File: this.png (197 KB, 640x421)
>>107897930
>the way everyone uses "image 1" makes no sense to me.
if it works it works
>>
File: 1740834823610328.png (1.87 MB, 1280x1280)
>>
>>107897937
That's the thing, half the time it does the opposite of what I asked for, especially when both images are really alike.
>>
why do edit models bring out all these retards who absolutely have to share ALL of their gens here?
>lol fent floyd edit XD
>cia guy
>gosling
>generic asian in 200 clothes
what's this mental illness called?
>>
File: 1768334598553345.png (2.23 MB, 2343x1214)
>>107897936
at least it doesn't slopify it like Qwen Image Edit
>>
>>107897947
they see this place as their only form of community, please understand
>>
>>107897949
QIE is more consistent but slopped
>>
>>107897947
>generic asian in 200 clothes
I'm fine with this, it's nice eye candy, I'd rather have people experiment with that than the billionth trump or epstein or hitler or floyd.
>>
>>107897947
kys, fentman mikunig is carrying these threads
>>
>>107897947
>>gosling
I only saw him once in the last 10 threads, you must have a hard crush on him to immediately notice his gen and remember it
>>
>>107897947
Be the change you want to see anon, show your high quality gens, don't let them win!
>>
Imagine if BFL released Klein back then instead of 1 and Schnell. All that time and compute lost...
>>
will 4b loras work with 9b do you think?
>>
>>107897978
you don't get klein without flux 1 first, they needed to experiment and improve on their skills
>>
File: file.jpg (359 KB, 1616x1616)
I used a blowjob image as a test with the picrel girl on klein.
It does properly swap the girls, and it even makes her skin worse to look like the porn actress, but it clearly doesn't understand what a penis is:
(nsfw)
https://files.catbox.moe/4b6ur1.jpg
>>
why is memory management such shit in comfyui now? trying the 4b and it arbitrarily decided to offload half the model despite leaving half of my vram unused
>>
Imagine if BFL released Blitz back then instead of 2 and Klein. All that time and compute lost...
>>
>>107897983
absolutely not, they're not the same models
>>
>>107897983
no, think of them as completely different things
>>
>>107897989
unironically use the MultiGPU nodes to load your model; their memory management script replaces Comfy's
https://github.com/pollockjj/ComfyUI-MultiGPU
>>
>>107897978
I'd rather imagine them releasing everything with apache 2.0 license and not writing paragraphs about safety bullshit.
>>
File: 1755375953669550.png (13 KB, 109x117)
>cant train 9b loras
it's figuratively over
>>
>>107897988
This is 1-2 loras (penis, vagina) away from being usable for nsfw.
>>
>>107897947
fr cuh all these wack ahh edits finna make me crash out :skull::skull::skull:
>>
>>107898008
Ram Torch

>>107898012
No it's not
>>
File: 1762761297593585.png (2.74 MB, 1024x1472)
>>
How many images can you use for image edit before it falls apart? Tried 3 completely different images and it works well so far.
>>
>>107897861
Where are the Klein LoRAs/finetunes?
>>
the way "ReferenceLatent" node are used for multiple images in klein official wf gives me a headache
>>
tested the klein 4B nsfw claim
as expected, it's bullshit
>>
>>107898038
go for that node instead
https://github.com/BigStationW/ComfyUi-TextEncodeEditAdvanced
>>
File: 1747765083338629.png (1.32 MB, 912x1136)
fun with flux klein: first lady of california btw with her real ex, harvey weinstein

2 steps:

the blonde girl is kneeling on the floor in front of a black couch. she has a small amount of vanilla pudding dripping on her forehead and chin. keep her expression the same.

2 img: the man in image 2 is sitting on the couch behind the woman in image 1, smoking a cigar.

qwen edit did a worse job of maintaining the original face in edits at times; not a problem any more!
>>
training a 4b lora right now and it seems to be learning extremely fast. 1000 steps is probably enough
>>
>>107898047
?
>>
File: 1737560179523025.png (1.32 MB, 912x1136)
>>107898053
>>
>>107898062
look at the previous thread, a troll wanted to make us believe klein could render dicks out of the box
>>
what's the best "make this look realistic" prompt for klein?
>>
File: F2K4bGyate.jpg (44 KB, 512x512)
>>107898037
The first few trainers (ai-toolkit, AFAIK also OneTrainer) added support like half a day ago.

I am trying to train tho and it seems pretty easy comparatively.
>>
File: F2K4bGyate.png (655 KB, 1024x1024)
>>
>>107898067
I don't know if it can generate penises but you can make the edit mode zoom in on an existing penis in an image
>>
>>107898047
if anyone actually cared about "muh NSFW" they'd train 17B Hunyuan Image 2.1 because it's the ONLY recent model that's ACTUALLY "muh hckin uncensored" by the definition of retarded zoomers / ESL jeet retard faggots on /r/stablediffusion who believe that "uncensored" means "actively trained on pornography with proper captions".
>>
Pussy is changed to a zipper.
>>
>>107898067
it can't, but neither can anything that isn't Hunyuan Image 2.1. Thus complaining about muh censorship for any of these recent models basically makes you a faggot, they're ALL very similar.
>>
>>107898049
ok I'll check
>>
File: 1765448380440875.png (2.16 MB, 1088x1424)
>>
>>107898083
Bet someone has that fetish
>>
>>107898083
almost like an innie, my fav
>>
File: radiance.jpg (104 KB, 848x1488)
>>
File: LTX2_00014-audio-1.mp4 (3.77 MB, 1280x1280)
Using ltx2 as an upscaler for old stuff is very impressive. Hopefully by 2.5 they'll have solved the faces changing.
>>
>>107898079
>they'd train 17B Hunyuan Image 2.1
who's gonna train such a big model? chroma is 8.9b and it cost lodestone 200k
>>
>>107898106
wonder if itll be possible to use it for upscaling only, that'd be great
>>
>>107898096
> radiance.jpg
> 2026
>>
can you use a negative prompt with klein?
>>
File: 1761477937559242.png (1.4 MB, 912x1136)
>>
>>107898116
its not done yet so prob 2026, yea
>>
>>107898117
Only with base, or whenever NAG gets supported
>>
>>107898128
> yet
>>
File: radiance.jpg (189 KB, 848x1488)
>>107898116
yep, nice model. it's getting cleaner too even on his consumer hardware training

of course if the same dataset is trained on a more powerful base/edit model i won't mind either, but it is good
>>
>>107897911
>2000s CG model enjoyer
I See You.
>>
>>107898137
looks like sd 1.5
>>
File: Untdfsdfsd.png (92 KB, 942x713)
he says he will not stop training it till its done
>>
File: Flux2-Image_00257_.png (1.69 MB, 976x896)
>>
File: 1767630246211213.png (233 KB, 768x432)
>>107898138
based
>>
File: LTX2_00017-audio-1.mp4 (3.46 MB, 512x512)
Doing t2v with no prompts makes 100% jeet content.
Doing v2v with no prompt creates 100% italian audio.

>>107898115
I think I did that earlier, it changes too much.
>>
I miss Neta Yume
>>
>>107898146
what a retard, radiance is only really relevant on edit models, and there's no shame in abandoning a base model when a better one just got released
>>
>>107897947
Shit, I still can't get it to keep the likeness, and I've jumped through more than a few hoops: removing the resize option from the first (base) image, calculating the MP for the second, combining the text encoder from shards (which was a pain in the ass because the first few options I tried were for older versions of Python) to get completely off of fp8. All that work only got me a quality increase; it'll still almost entirely change any person used.

I'll keep at it, but not too sure where to look next.
>>
File: 1767700962155846.jpg (366 KB, 2396x1034)
>>107898038
Use set and get nodes from kjnodes and line them up, and it'll become easy to understand.
>>
File: radiance.jpg (131 KB, 848x1488)
>>107898140
your memory is tricking you
>>
File: LTX2_00020-audio-1.mp4 (3.58 MB, 896x896)
Fuck me, I nearly vomited from the laughter, I just finished breakfast.
>>
>>107898129
>Only with base, or whenever NAG gets supported
Hopefully NAG support will come soon. I'm getting extra fingers and all that crap
>>
>>107898161
if you didn't know, he does not give a fuck about people using his models. he does it as proofs of concept in order to write his papers

btw, early klein 4B nsfw dan / e621 lora, posting it again
https://files.catbox.moe/3n5lui.png
https://files.catbox.moe/9a8g4q.safetensors

still far too early for proper sex but it's learning super fast:
https://files.catbox.moe/lnn42y.png
>>
>>107898187
>it's learning super fast
based, I can feel this model's potential already
>>
>>107898187
nice, I hope people will do one for 9B too
>>
File: radiance.jpg (148 KB, 848x1488)
>>107898146
based, 4b should get results faster than most of the earlier models too

>>107898161
>radiance is only really relevant on edit models
on edit models? what?

either way it's also an actually good proof of concept for the pixel space technique used and ramtorch
>>
>>107898177
no it genuinely looks like sd 1.5, that pic included. NAI V1 vibes
>>
File: 1764593490554549.png (141 KB, 249x259)
>>107898170
you have to frame the photo like this
>>
>>107898215
>on edit models? what?
because on edit models, if you use a vae you end up compressing the image input; a radiance edit model would fix that
>>
File: 1756779178007016.png (1.44 MB, 1149x641)
the asian girl is riding a jetski in the ocean, with a beach visible in the distance. keep her pose the same and the perspective the same.
>>
>>107898234
thank you for showing me that the edit model can indeed edit images when provided with an image and a prompt
>>
>>107898234
asians are so fucking disrespectful. showing your ass like that
>>
File: 1738265716356769.png (1.46 MB, 960x1072)
>>107898234
replace the asian woman in image 1 with the anime girl in image 2.
>>
>>107898234
>>107898261
Why do npcs post like this?
>>
>>107898251
>What is editing quality
>>
>>107898234
>>107898261
kek, nice
>>
>>107898230
well yea, a vae is sort-of a lossy compression technique acting both on the latent and pixel space... but that also happens with non edit mode, just imagegen

the theoretical advantages of doing things in pixel space are the same even if you just train and generate without editing?
>>
File: LTX2_00027-audio-1.mp4 (3.43 MB, 1152x1024)
It seems to understand the overall style of the video it's given very well. The audio turns cartoonish for a cartoony image, and for this one it produced italian soap opera audio, subtitles and all.
>>
File: 1746540558981165.png (1.62 MB, 1360x752)
replace the text "RUSH HOUR" with "DEI HOUR". make the black man very fat. give the asian man black skin like a black man.
>>
>>107898269
>that also happens with non edit mode, just imagegen
fair enough
>>
>>107898280
https://github.com/Rolandjg/LTX-2-video-extend-ComfyUI/tree/main

use this workflow and set the intro cap to like 33 or 60 frames, it can clone the audio of a clip just set a good starting point
>>
Using img2img and condition strength isn't having any effect?
>>
replace the tanks with hello kitty. the anime girl in image 2 is standing on the street on the left.

you see? nothing bad at all happened.
>>
File: 1766203310538079.png (1.37 MB, 1360x768)
>>107898305
helps if I add the image.
>>
File: LTX2_00029-audio-1.mp4 (3.61 MB, 1024x768)
"Why are we here..? Just to suffer..?"

>>107898302
So when I feed it the entire framecount, it replicates the entire video?
>>
File: 1742416107057936.png (1.36 MB, 1360x768)
>>107898308
my mistake, there were no tanks. -10000 social credit points.
>>
>>107898313
it will extend whatever is after the max frame count you set could be 10 frames could be 100, just pick a spot

entire count would dupe the video I think
>>
>>107898222
It won't get the outfit like that, would adding a third image work?
>>
>>107898318
kek
>>
File: Flux2-Concat_00261_.jpg (919 KB, 2858x2048)
Reframe the camera to a slightly lower angle and wider field of view while keeping the subject’s proportions anatomically consistent.
Change the background from a flat green studio backdrop to a dim, industrial interior with depth and atmospheric perspective.
Shift the lighting to a single strong rim light from behind and above, with softer fill lighting on the front of the body.
Rotate the subject’s head a few degrees toward the camera while maintaining natural neck and shoulder alignment.
Replace the knit top’s material with distressed leather, preserving seams, straps, and realistic surface wear.
Modify the metal armor pieces to show scratches, edge wear, and subtle oxidation consistent with use.
Adjust skin shading to a cooler overall tone while preserving natural highlights and subsurface scattering.
Introduce a secondary colored light source that casts a faint green or amber glow onto nearby surfaces.
Change the leg pose so the raised knee lowers slightly, redistributing weight believably through the hips and core.
Add subtle motion cues such as loose fabric tension and slight hair displacement without global motion blur.
Insert a reflective floor surface beneath the subject that correctly mirrors lighting and nearby forms.
Apply a cinematic color grade that increases contrast and mood while avoiding crushed shadows or blown highlights.
>>
how do you set ai-toolkit to use int8 instead of bf16?
>>
>>107898384
there's quantization for transformer and text encoder on the right side when creating a job, right?
>>
File: 1742831569832612.png (14 KB, 444x406)
>>107898397
yeah i saw these but there's no int8 option. also i'm running out of vram no matter what i put them at.
>>
>>107898412
oh i see. yeah in my experience ai-toolkit is massively vram hungry
>>
>>107898234
nvm the haters this is great
>>
File: 1767897271779496.png (1.34 MB, 1248x832)
replace the man in image 1 with the anime girl in image 2 in the same pose, sitting in the chair counting money.
>>
File: ai7.png (569 KB, 768x833)
guys we've come so far, this was mid-2022
>>
>>107898463
og dall-e mini was peak
>>
Z-image-base is going to feel like SD1.5 when it comes out.
>>
so ai-toolkit is the only real option for flux2 klein training rn?
>>
File: 1765751808228347.png (1.38 MB, 1376x752)
remove the cartoon man in image 1 and replace them with the anime girl in image 2 sitting on a chair, in the same pose. change the text "MARIO PAINT" to "MIKU PAINT".
>>
>>107898484
nah
>>
File: 1758922412877480.png (1.88 MB, 1248x832)
>>107898463
>guys we've come so far, this was mid-2022
and with edit models we basically solved celebrities/characters loras lul
>>
>>107898489
what else is there? dont think onetrainer supports it yet
>>
File: 1749516130477566.png (1.33 MB, 1376x752)
>>107898486
and with a painting prompt added:
>>
File: Flux2-Image_00264_.png (1.09 MB, 1008x688)
>>
>>107898146
OUUUGGH
SEEEXXXXX
>>
>>107897983
If flux loras work with chroma at least partially I don't see why not.
>>
File: 1749613144217481.png (2.14 MB, 1850x768)
https://www.youtube.com/watch?v=4PzetPqepXA
>>
It's amazing how confidently wrong some people are: https://www.reddit.com/r/StableDiffusion/comments/1qftepq/comment/o09c3t6/?context=1
>>
File: Flux2-Image_00270_.png (1.87 MB, 992x1216)
>>
>>107898530
>https://www.reddit.com/r/StableDiffusion/comments/1qftepq/comment/o09c3t6/?context=1
bruh this Pyros guy is a retard, he's been spreading this BS forever
undercaptioning makes you a jeet retard, end of story, there are no downsides to using proper NLP captions on recent models
>>
File: 1743863698980938.png (77 KB, 1168x761)
>>107898550
>>107898460
>>107898499
it would be good if you also showed the original input so we can see what you've changed; you can use this node to do so
https://github.com/Eagle-CN/ComfyUI-Addoor
>>
>>107898556
fuck off retard
>>
What's the use case for Klein 4B when 9B exists?
>>
>>107898563
sdxl sisters are coping that they can use their 1060 6gb card to train their loras there
>>
>>107898563
muh license
>>
>>107898556
Imagine needing a (((((custom node))))) for stitching images together
>>
>>107898562
take a rope and hang yourself
>>
>>107898563
license
>>
File: Flux2-Concat_00119_.jpg (1.09 MB, 3072x2048)
>>107898556
I always do, I just don't post them because sometimes they're too big, and sometimes I just don't feel like it you nyo what I mean?
>>
>>107898573
you first fag
>>
>>107898571
native comfy is so lackluster
>>
Is an RTX 5070 with 12GB VRAM and 32GB of RAM good enough to run Flux locally? Grok got censored and I really need img2img deepfakes for gooning... anybody got a good comfyui workflow?
>>
klein 4b is too fucking fast. 4 second gens at 1024x1546 is retarded
>>
>>107898563
speed I guess
4B + 4B faster than 9B + 8B
>>
>>107898576
nice gen
>>
>>107898604
Thanks
>>
>>107898585
>5070
>12gb
Should've bought 5060ti 16gb if you couldn't afford 5070ti
>workflow
Latest comfyui has workflows for the latest Flux 2 Klein, it should work fine, and if it doesn't then look up gguf quants on huggingface
>>
>>107898563
>What's the use case for Klein 4B when 9B exists?
the licence: because of it, people who make expensive finetunes like lodestone won't be able to take donations, so they'll opt for the smaller model (4b has the perfect Apache 2.0 licence)
>>
>>107898598
imagine being poor
>>
>>107898585
I have 12GB VRAM and 32GB RAM too; I can run Flux2 klein 4B but it OOMs at higher resolutions. Haven't tried 9B yet.
>>
Are there any 3dcgi/y2k 3d style loras for zit?
>>
>>107898598
9B-nvfp4 is faster and better than 4B-fp8/16. But 4B-nvfp4 is pushing it.
The real reason is license.
>>
so klein loras were an absolute flop.
>>
It's actually kind of annoying how bad Flux Klein is with anatomy. I don't think even Qwen Image Edit was this bad with limbs and proportions. Hopefully z-image edit will btfo this model
>>
>>107898645
substantiate your claim.
>>
File: Flux2-Image_00296_.png (810 KB, 912x1136)
>>
>>107898647
Agreed. Flux is awful with anatomy. Feels like their training dataset was cucked only a little less than SD3. How can a 9B model gen three legs and four arms?
>>
File: 1760434626470012.jpg (2.93 MB, 3500x2947)
Mr President, another edit model has hit the tower.
https://riko0.github.io/VIBE/
>>
File: ffqfqfqwfq-1.mp4 (3.3 MB, 1620x1080)
Wan 2.2 left, ltx2 right.

"the man is looking into the camera for a moment while smiling. he then begins to crawl across the floor towards the camera very fast. he struggles to squeeze through under the toilet stall door as he is obese while still looking into the camera. the man begins to reach and claw towards the camera as he is stuck under the toilet stall door. the camera moves as if it is a handheld camera."
>>
>>107898669
umm sir bobs and vagene? i need to know the potential of bobs and vagene sir. does it into bobs and vagene tremendously? bobs and vagene
>>
>>107898669
>GPU Memory: 24 GB
>>
>>107898676
you could make like $2000 off impressions from a video like this kek
>>
>>107898676
wan was just making shit up in the end lol
>>
>>107898269
I'm assuming the point is that no vae step means that if your edit model is also well-behaved about the regions it modifies, it's much easier to perform edits where you keep the bulk of the image pixel-perfect or near pixel-perfect identical to the original. Gold medal being 1:1, silver medal being near enough that you can paste the edited region onto the old image, bronze medal being that but needing a transparency fade at the border to hide the change (any of these is still a good achievement).
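For what it's worth, the bronze-medal composite is a few lines of PIL/numpy; a minimal sketch, assuming same-size images and hypothetical file names/coordinates:
[code]
# Paste an edited region back onto the original with a feathered alpha
# border to hide the seam (file names and box are placeholders).
import numpy as np
from PIL import Image, ImageFilter

orig = Image.open("original.png").convert("RGB")
edit = Image.open("edited.png").convert("RGB")   # same size as orig
box = (256, 256, 512, 512)                       # region the model changed

# Hard mask over the edited box, then blur it for a soft border.
mask = np.zeros(orig.size[::-1], dtype=np.uint8)
mask[box[1]:box[3], box[0]:box[2]] = 255
soft = Image.fromarray(mask).filter(ImageFilter.GaussianBlur(radius=16))

# Edited pixels where the mask is white, original pixels elsewhere.
Image.composite(edit, orig, soft).save("merged.png")
[/code]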
>>
>>107898647
use moar than 4 steps faggot
also SD3 being bad because of censorship is a meme only retards believe in, it was bad because the noise scheduling and embeddings were literally fucking broken in obvious ways
>>
>>107898669
>>107898680
>first Kandinsky
>now this
Russians sure are the masters of optimization
>>
>>107898669
>based on Sana 1.5
probs shite
>>
>>107898645
wat
>>
File: Klein 9b.png (2.19 MB, 1635x864)
>>
>>107898653
>>107898717
likeness is god awful
>>
>>107898704
and it wont be 12x slower why?
>>
>>107898722
likeness of whomst
>>
>>107898723
the pixel-space methods now use JiT, which basically makes them as fast as a vae; I tried a radiance model and it was as fast as regular chroma
https://github.com/LTH14/JiT
>>
File: 1756705255011226.png (531 KB, 686x386)
>>107898720
you could have made it funny but you didnt
>>
>>107898722
source? better not be a jeet who thinks training anything at 512x512 is sensible
>>
>>107898723
I don't know anything about speed, I'm pointing out why being able to avoid the vae step is more meaningful for an edit model / i2i than it is for standard t2i workflows. It'll be nice if/when we get any functional models and workflows that accommodate that, even if no practically usable ones currently exist on the radar.
>>
>>107898669
>based on Sana1.5-1.6B
so its shit. Got it
>>
>>107898686
You can make money memeing with AI? Where?

>>107898687
Wan was most true to the prompt, I haven't bothered learning it for ltx2.
>>
>>107898722
substantiate your claim.
>>
do we have some vram estimates for 4b and 9b training at 1024?
>>
File: Klein 9b.png (1.9 MB, 1451x864)
>BFL be like after releasing Klein and BTFO the Alibaba fucks
Your move, chang.
>>
File: 1665046905653142.png (1.1 MB, 1939x1473)
This might be a stupid question, but since the kleins are still flux2, weren't they trained at much larger resolutions? Do people still train on 512??
>>
>>107898770
Probably an example of the weakness of the model here. It gave Mao Hitler's outfit.
>>
File: Flux2-Image_00322_.png (945 KB, 816x1024)
>>
File: hmm.jpg (12 KB, 417x201)
lads
I have a 6 GB GPU and 16 GB of RAM on my laptop
I can run flux but not Wan
I'm thinking of upgrading my RAM
Should I go all in on 64 GB or is 32 GB RAM enough?
>>
Today I had a redditor try and convince me you don't need a large GPU because you can just offload to vram. Actually gave me the ick.
>>
>>107898814
*offload to RAM
>>
File: 1763666662167500.jpg (2.39 MB, 1664x2432)
>>107898585
I have a 3060 12gb and I can prompt with the gguf 9b klein just fine.
>>
>nvfp4 or fp8 for klein
is the difference huge?
>>
>>107898813
go for 64gb if you get a reasonable price
I OOMed constantly with 32gb (due to certain tools having no memory management whatsoever)
>>
>>107898829
q8
>>
>>107898840
but I'm on 5070 ti and I want speed
>>
>>107898845
Nigga, you can fit the entire model. Literally why bother.
>>
>>107898550
Gem
>>
>>107898849
because you gen more? what dumb fucking question is that?
>>
Saars, send help.

When I concat the prompt output from these two angle nodes, it gives me a new image for each amount of light I have added. One light for each new image.

But if I manually add the prompt from each light, it combines them, which is what I want. A workaround is to add a new light with a whole new duplicate light node which is bad.

Saar, please, what can I do?
>>
>>107898849
NTA but speed I presume. 5070ti is fast enough for distill, but I would say it can be worthwhile for base.
>>107898829
nvfp4
FP8 is worthless if you have nvfp4 acceleration.
Quality/speed ratio of nvfp4 is insanely good.
Either go for bf16 for precision or nvfp4.
>>
>>107898845
I'm using the raw flux2klein 9B with the qwen3 8B q8; on a 5070TI I'm getting 1.9s/it on ~1MP i2i. Even spamming 3-5 outputs to hunt for the best result doesn't take long, and sacrificing model quality for the speed to spam more than that is likely to just be a worse process. Still, everyone's workflow is different, so if those speeds don't sound good enough to you, thankfully >>107898873 has answered your question.
>>
File: 1672655319922495.png (1.12 MB, 1024x768)
>>107898550
OMGSISA
>>
>>107898813
It will cost a fortune at current prices, but I'd say either go for 64 or don't upgrade.
>>
File: Flux2-Concat_00352_.jpg (1.55 MB, 3132x1760)
Took a few tries, but it was more about being specific with the prompting
>>
File: Flux2-Klein_00052_.png (1.75 MB, 1280x720)
>>
cant get 9b base to train on 16gb and ai-toolkit
always oom
ugh
>>
Do the base templates from comfy for Flux2 fail to randomize the seed properly or am I going crazy?
>>
File: Flux2-Klein_00062_.png (1.4 MB, 1280x720)
>>
I need an hf account to download klein 9b
what a great start
>>
>>107898976
yeah they do. Go into the subgraph, delete the "random noise" node entirely, and hook the SamplerCustomAdvanced's "noise" input up to a new entry on the input list on the left; then use a normal random-noise-from-seed node on the top level, which is much more user friendly. Also nuke the "ImageScaleToTotalPixels" node while you're in there.
>>
>>107898989
Embrace the glorious Chinese culture and use modelscope
>>
>>107898976
>>107898983
>>107898989
what's klein? What does it do?
>>
>>107898993
Isn't this model 1MP?
>>
>>107899018
Flux 2, but it can be run on a normal GPU with sane speeds and doesn't suck dick.
>>
File: 1760645350995373.jpg (2.7 MB, 1664x2432)
>>107899018
It sneeds
>>
File: 1738774785588101.png (2.66 MB, 1608x880)
>>107899018
>What does it do?
really high quality edits, it's so far the best local edit model
https://huggingface.co/black-forest-labs/FLUX.2-klein-9B
>>
File: Flux2-Klein_00010_.jpg (1.59 MB, 2048x2048)
>>107899019
It's whatever your vram can handle, though it does get a bit of weird noise at higher resolutions
>>
File: Flux2-Klein_00067_.png (2 MB, 1024x1024)
>>
>>107899030
>>107899032
>>107899034
will give it a try
I take breaks because SD is too addictive
>>
>>107899019
works totally fine at other resolutions for me, but in particular if you have something that's like, 0.9MP or 1.1MP, it's really dumb to have that slightly resized for the i2i, easily causes nasty artifacts in certain image types (i had some issues with the black borders on cartoonish art)
>>
flux2 q4m or klein 9b bf16? which way?
>>
>>107899072
Small one is cope. Just get 9b.
>>
>>107899072
flux2 is better at some things, but not a whole lot
do I need to list the reasons why klein is superior? especially against a gimped flux2
>>
>>107899072
The large Flux 2 just sucks. The cope quant will also run a lot slower than 9b bf16. The turbo lora is also 8 steps compared to klein's 4 step distill.
>>
>>107899101
4 steps is shit though, super sloppy and sometimes bad anatomy. 8+ is needed
>>
>>107899072
it's been so long and there's not a lot of loras for dev
>>
>>107899108
>8+ is needed
which is fine because even at 8 steps it's fast as fuck
>>
>>107898790
The "people" as in civitai jeets love 512p training. But you should train at 1024p for best results.
(Z-Image seems to be partially an exception as it can go crazy training at 1024p under some conditions. But still even with that I would go for 768p if I don't want to risk it)
>>
>>
dont bother training 4b loras
>>
>>107899125
I hope you understand that all major models are trained 80% or so at like 256 or 512 for the sake of cost. Only the late-stage training is high res.
>>
>>107899137
Not useful commentary without telling why.
>>
I am training klein 9b on ugly jp manko right now, and it actually manages to make it look anatomically correct in ~4 hours of compute.
It is crazy how fast it learns when compared to the 40 hours I wasted on zimage. Hell, I wasted about 2 months of compute on flux1 dev and it never learned it well either.
>>
>>107899142
because I said so
>>
>>107899137
TOO LATE!
>>107898187
>>107898146
>>
>>107899133
man obese people are so lucky because they can go butt naked without being fined for public indecency
>>
>>107899143
>40 hours I wasted on zimage.
I don't know why anyone bothers to train on distilled models at all after Chroma.
>>
>>107899141
Yes I know this already.
But if you want the lora to learn finer details of your dataset properly you should train at 1024p.
>>
should i always increase LR with batch size? e.g. 4e-4 for batch size of 4
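No hard rule, but the usual heuristics are the linear scaling rule (LR ∝ batch size, which is exactly that 4e-4 at batch 4) and the gentler sqrt rule (LR ∝ √batch); sketched out:
[code]
# Two common rules of thumb for scaling LR with batch size; lora training
# often does better with the softer sqrt rule than full linear scaling.
base_lr, base_bs = 1e-4, 1
for bs in (2, 4, 8):
    print(f"bs={bs}: linear={base_lr * bs / base_bs:.1e}, "
          f"sqrt={base_lr * (bs / base_bs) ** 0.5:.1e}")
# bs=4 -> linear=4.0e-04 (the 4e-4 above), sqrt=2.0e-04
[/code]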
>>
>>107899143
sample output after 3 hours of training:
https://files.catbox.moe/zmo7be.png
>>
>klein = small
TIL
>>
>>107898867
why do your nodes look like that?
>>
>>107899150
desperation
>>107899143
yeah, 9b is probably the perfect model to train with, but its shitty licence kills all momentum; there will be no serious finetune of it
>>
>>107899169
klein dick energy.
>>
>>107899143
>>107899168
512? 1024? how much vram kind sir?
>>
I think I've found a good way to get ComfyUI working on a Framework Desktop. I will report back once my hardware arrives.
>assemble Framework Desktop
>install Fedora 43 or later
# add yourself to the groups that own the GPU device nodes (log out/in to apply)
$ sudo usermod -a -G render,video <your username>
$ sudo dnf install rocm
$ docker pull corundex/comfyui-rocm
# /dev/kfd and /dev/dri expose the AMD GPU to the container
$ docker run -d \
    --device=/dev/kfd \
    --device=/dev/dri \
    --group-add=video \
    -p 8188:8188 \
    -v <host folder for models>:/workspace/ComfyUI/models \
    -v <host folder for output>:/workspace/ComfyUI/output \
    corundex/comfyui-rocm:latest
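Once the container is up, ComfyUI should be reachable at http://localhost:8188 on the host; that's what the -p 8188:8188 mapping is for.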
>>
>rocm
stopped reading there
>>
>>107899189
Cheaper than any novidya GPU with 128GB VRAM.
>>
>>107899168
How many samples have you tried? Asking whether you're cherry-picking good results, as far as beating nsfw into sfw models with just a lora goes. May I also ask rank, dataset size, total steps, and training res?
>>
File: 1533423826134.jpg (100 KB, 466x380)
So is anyone gonna post Klein lora training settings? I'm not trial-n-erroring random jeet settings from googling.
>>
>>107899196
128GB of completely useless "vram" (it's not vram)
It's slow as fuck: an image will take you maybe 10 minutes, a video a few hours. It runs shitty old llms like 70B at like 3 tk/s.
>>
>>107899209
That sounds a lot like you didn't actually install ROCm and were running on the CPU like a retard.
>>
>>107899207
then you have to wait until other people figure it out for you
>>
>>107899186
>>107899196
Inference speeds?
Also you are not filling that 128gb with anything besides MoE LLMs while keeping sane speeds. It is roughly only as powerful as a 60 series card.
Seems like a bad buy for anything besides glm air or gpt oss 120b to me.
>>
>>107899198
I know that it's relatively "easy" to overtrain a lora on 50 pussy images. I don't want that, so for this run the dataset is ~5k images, and it's only 4 epochs in.

>>107899182
256; it's using a lot of vram because I made the lora huge, but it could fit in 16gb with reasonable settings, 8gb with offloading.
>>
>>107899226
go ahead and show me your iterations per second then with a full size image
>>
rocm socm retards
>>
File: 1762157407096402.jpg (3.08 MB, 1664x2432)
>>
>>107899287
>>
Has anyone tried the current OneTrainer PR for Flux Klein? Does it work?
>>
File: Untitled.png (92 KB, 725x677)
I'm actually starting to get excited about ace step 1.5

I thought we were getting another eternal two weeks, but looking through their discord I think it actually is going to launch on January 27th.

I'm usually distrustful of Chinese announcements like this, but they have a concrete date and have openly said they plan to stay open source until they surpass Suno. We might be about to get an actual good music model.
>>
File: Flux2-Klein_00069_.png (1.09 MB, 1024x1024)
>>107899241
>>
>>107899348
china is the only hope for open music models. the music industry here would literally kill you
>>
File: Klein 9b.png (3.08 MB, 1984x1040)
https://www.youtube.com/watch?v=hzFDTtjBmmU
>>
>>107899365
>the music industry here would literally kill you
I do not doubt it. Like not even joking. There are some evil people in that industry.
>>
>>107899365
it's so funny that this is why we don't have any good local audio models. whoever produces one will get raped in court
>>
File: 1752578174593632.png (189 KB, 560x407)
>>107899411
it can be "leaked" on 4chan like llama 1 or novelai
>>
do I need to use a custom node to load klein in nvfp4?
>>
do I need anything special for nvfp4? I tried using it and it gave me a weird looking output that looks halfway like the latent image before VAE decoding
>>
>>107899227
>then you have to wait until other people figure it out for you
Well they are already training so where's the sauce?
>>
>>107899424
Don't you need to convert it with TensorRT? Or was that something else?
>>
>>107899424
no, but you need a cuda13 pytorch build installed, otherwise it won't work
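Quick way to check what your venv actually has (real torch attributes; the example outputs are just what I'd expect, nvfp4 wanting a CUDA 13 build plus a Blackwell-class card):
[code]
import torch
print(torch.__version__, torch.version.cuda)  # e.g. "2.9.0+cu130", "13.0"
print(torch.cuda.get_device_capability(0))    # e.g. (12, 0) on RTX 50 series
[/code]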
>>
>>107899430
oh I'm retarded I got a 40 series card
>>
wait what? there is a new edit model?
>>
>>107899333
>NotImplementedError: Loading of single file Flux2 models not supported. Use the diffusers model instead. Optionally, transformer-only safetensor files can be loaded by overriding the transformer.
Yeah I am not gonna bother downloading all that.
>>107899430
Just a 5000 series GPU. Driver version/cuda/torch too old?
The outputs are only moderately different from 16bit.
>>107899440
That has nothing to do with that. No idea what you are thinking.
>>
File: 1768567198322198.jpg (2.53 MB, 2000x2850)
https://xcancel.com/ModelScope2022/status/2012752195939168522#m
Babe wake up, a real finetune of Qwen Image has been made
>>
>>107899484
ARE THOSE..OMG..BENCHMARK IMAGES??
>>
File: Klein 9b.png (3.77 MB, 2174x928)
>>
>>107899492
holy deepfry
>>
>>107899484
>Much thanks to kanttouchthis for providing the quantization script and the sdnq_uint4_r128 quantized model. The image quality can reach above Q6, nearly Q8.
sdnq is that good?
>>
>>107899492
>pure slop outfit on Ghost of Kiev.
Look at your gen for a second before spamming.
>>
File: 1761402726619829.jpg (459 KB, 859x925)
>>107899492
insert this guy into the image
>>
>>107899498
Isn't that the shit nunchaku uses?
>>
>>107899209
>an image will take you maybe 10 minutes
NTA, but even on a Strix Point laptop, I get a 1024x1024 SDXL image in about a minute (24 steps). 1280x1600 takes about two minutes. Strix Halo would be much faster.

I can also run Z-Image and Klein, but loading/unloading the big text encoder is the bottleneck. (Wish I could throw it on the NPU or something, but ComfyUI doesn't support it yet.) 128GB would not have that issue.
>>
>>107899465
oh, that's why I don't see a speed boost between fp8 and fp4. I'm on torch 2.8.0dev cu128
>>
File: Flux2-Klein_00071_.png (2.14 MB, 1024x1024)
>>
>>107899484
a finetune of a finetune of a finetune... i think qwen image is already too fried, that's why you get weird line artifacts and shit.
they need to start over and stop using synthetic data. a model with that many params shouldn't perform that bad.
>>
File: 1749733693045434.png (2.03 MB, 1360x768)
>>107899501
all right lol
>>
File: Flux2-Klein_00073_.png (1.74 MB, 720x1280)
>>
Interestingly, editing on 9B-distill with Q8 at 8 steps takes about as long as 9B-base with nvfp4 at 20 steps.
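(Back of the envelope: 20 nvfp4 steps in the time of 8 Q8 steps means an nvfp4 step runs roughly 2.5x faster here.)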
>>
>>107899498
I used nunchaku (which sdnext borrowed the idea from, minus the fused-kernel speedup) quite a bit with flux, and while I like it, no, that's a fucking cope. Q6 is closer.
Loras also get "strong-willed" at higher ranks: coherency improves, but they deviate more strongly than with Q8.
>>107899509
Yes, but the sdnext implementation is slower.
>>
File: 1532138146477.png (129 KB, 250x418)
so uhh everyone forgor about glm image already?
>>
>>107899529
truth nuke. the BFL fags improved their product by going for a better text encoder and a better vae; you have to work smarter, not insist on an outdated architecture
>>
>>107899545
There was nothing to remember. It was a shitty autoregressive model that brought nothing to the table and ate up obscene amounts of memory.

The only noteworthy thing about it is that it was trained entirely on Chinese made GPUs. Why they chose to train THAT of all things, I have no idea.
>>
>>107899545
its utter trash.
>>
>>107899553
>Chinese made GPUs
Frankensteined Nvidias, or those homebrew cuda-emulating cards?
>>
>>107899553
>Why they chose to train THAT of all things, I have no idea.
you experiment, you think it's gonna work, and it doesn't; and instead of throwing it in the garbage, they released it to the public and told humanity "this is what you should NOT do", which is a valuable asset desu
>>
>>107899545
the only thing that thing is for is generating semi-legible text infographics on an RTX Pro 6000
>>
what sampler/steps do you guys use for klein?
>>
File: Flux2-Klein_00076_.png (3.06 MB, 1024x1488)
>>
>klein-test-v3: 31%|###1 | 940/3000 [11:09<24:27, 1.40it/s, lr: 1.0e-04 loss: 3.270e-01]
Klein 9B training on 512x512 is so fast you can shit out a 3000-step lora in 35 minutes on a 5090
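(Sanity check: 3000 steps / 1.40 it/s ≈ 2143 s ≈ 36 minutes, which matches the 11:09 elapsed + 24:27 remaining in the log line.)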
>>
>>107899563
I think autoregressive models are a scam by larger AI companies to trick others into wasting compute desu.
>>
File: Klein 9b.png (2.45 MB, 1699x768)
>>107899492
Ok that one is pretty good
>>
>>107899566
I am going with default euler + flux2scheduler for now. I will try more combos when I am bored.
Preliminary testing seems to show that 8 steps can help anatomy a bit. But maybe it is worth it over just rolling for another seed.
>>
>>107899587
>But maybe it is worth it over just rolling for another seed.
that's what I noticed with Klein distill, it has good variety; when I hit a wall on Z-image turbo I know I'm fucked because I can't try the seed gacha. ZiT is overtrained whereas Klein is a bit undertrained, I would say
>>
>>107899596
yes, it's so nice to be able to just reroll seeds again; with z-image you'd need to rewrite the prompt or hook in an LLM to do it for you, which slows things down considerably
>>
>>107899596
Seed variety was something I was missing lately.
>>
>>107899578
I can tell there will be a huge wave of Klein loras in the next few weeks, this model is perfect for training
>>
Ready when fresh

>>107899633
>>107899633
>>107899633
>>
>>107899596
>>107899601
Not having to load/run the text encoder again saves a bunch of time for me, so yeah, meaningful seed rerolls are nice.
>>
>>107897892
looks like Anri Okita with a little cracker mixed in


