/g/ - Technology

Spatial Frequencies Edition

Discussion and Development of Local Image, Video, and Music Models

Previous: >>109024741

https://rentry.org/ldg-lazy-getting-started-guide

>UI
ComfyUI: https://github.com/comfyanonymous/ComfyUI
SwarmUI: https://github.com/mcmonkeyprojects/SwarmUI
SDWebUI: https://rentry.org/ldg-lazy-getting-started-guide#the-stable-diffusion-web-ui-lineage
Wan2GP: https://github.com/deepbeepmeep/Wan2GP

>Checkpoints, LoRAs, & Upscalers
https://civitai.com
https://civitaiarchive.com/
https://openmodeldb.info

>Tuning
https://github.com/spacepxl/demystifying-sd-finetuning
https://github.com/ostris/ai-toolkit
https://github.com/Nerogar/OneTrainer
https://github.com/tdrussell/diffusion-pipe
https://github.com/kohya-ss/sd-scripts
https://github.com/kohya-ss/musubi-tuner

>Z
https://huggingface.co/Tongyi-MAI/Z-Image

>Anima
https://huggingface.co/circlestone-labs/Anima
https://tagexplorer.github.io/
https://animadex.net

>Qwen
https://huggingface.co/collections/Qwen/qwen-image

>Klein
https://huggingface.co/collections/black-forest-labs/flux2

>Wan
https://github.com/Wan-Video/Wan2.2

>LTX-2.3
https://huggingface.co/collections/Lightricks/ltx-23

>Chroma
https://huggingface.co/lodestones/Chroma1-Base
https://rentry.org/mvu52t46

>Misc
Local Model Meta: https://rentry.org/localmodelsmeta
Share Metadata: https://catbox.moe | https://litterbox.catbox.moe/
Txt2Img Plugin: https://github.com/Acly/krita-ai-diffusion
Archive: https://rentry.org/sdg-link
Collage: https://rentry.org/ldgcollage

>Neighbors
>>>/aco/csdg
>>>/b/degen
>>>/gif/vdg
>>>/d/ddg
>>>/e/edg
>>>/h/hdg
>>>/trash/slop
>>>/vt/vtai
>>>/u/udg

>Local Text
>>>/g/lmg

>Maintain Thread Quality
https://rentry.org/debo
https://rentry.org/animanon
>>
>>109028009
shit op cos im not in it again
>>
>>109028009
MAKIMA AWOOOOGAAA
>>
>>109028009
excellent op since i'm in it
>>
Blessed thread of frenship
>>
Just made the collage phew.
>>
>>109028022
What do you mean?
>>
>>109028009
how did anon get so good at generating pictures?
>>
this shit is demonic i cant stop cumming to ai slop fuck you guys
>>
How long can NL captions be for anima? My current datasets have 200-500 token captions.
>>
>>109028042
i put satan into the negative prompt
>>
File: z-image-turbo_00092_.png (1.14 MB, 1344x1344)
1.14 MB PNG
>>109028049
I notice that anything above 250 gets iffy during usage. Probably the same during training.
>>
File: ComfyUI_00773_.png (1.1 MB, 896x1152)
1.1 MB PNG
>>
File: matou.png (898 KB, 1024x1024)
898 KB PNG
>>
when is tdruss releasing animagram preview 1?
>>
>>109028092
I can fix her
>>
File: cheeky wank.png (141 KB, 608x439)
141 KB PNG
>1boy, me, 1girl, oneitis, penetration, penis in vagina
how many pics do I need to train such LoRA?
>>
>>109028148
0
>>
I think Ideogram 4 might be the worst local model I've ever used.
The sad part is the community is just taking it.
Safety filters built into the model, grainy output slop, went through SFT on NBP outputs for benchmarks.
Things are getting grim ngl.
>>
>>109028176
You haven't used it because you can't run it
>>
>>109028161
please tell me, I want to make a ZIT LoRA
I've 21 images of her after she turned into an adult and kicked me to the curb
>>
>>109028199
>tell me how to make the deepfake porn saar
Go away
>>
best realism model for i2i from anime rn? can handle complex cosplay tier shit?
>>
>>109028206
prolly Klein
>>
>>109028204
>>109028199
>For a real-person LoRA on Z Image Turbo:

>Goal Images
>Quick test 15–20
>Good likeness 30–40
>Very good likeness 50–80

I'd start with: 40-50 photos

That's usually enough.

Image Types

Try to include:

50%

Face / headshots

30%

Upper body

20%

Full body

And vary:

clothing
lighting
backgrounds
camera angles

Keep image quality reasonably high.

Captions

For Z Image Turbo character training, I would NOT use long natural-language captions. Just a LoRA trigger word
>>
>>109028199
let me see so i know if it is enough data
>>
>>109028049
I aim around 100 words and it seems to work OK. No idea the maximum limit though.
>>109028092
ow the edge
>>
Anon, I know you want to make sexy time pictures of your old crush but I implore you to stop and think before you act. This is not a road you wish to go down, it will only drive you further into depression and loneliness. You simply have to accept the fact that it will never happen. You will never have sex or have any sort of intimate contact with her again. It's better to accept this now than to wake up in 30 years wondering why you are still alone and still jerking off to a girl who hasn't thought of you since 30 years ago. Heed my advice anon.
>>
Tdrussell short for turdrussell?
>>
>>109028250
ok :(
>>
>>109028250
Happily married with kids but I phone a photo of my ex on my iPhone and started genning images of her as a bbw.
>>
>>109028148
Klein can already do this with NSFW lora and reference pictures.
Needs some seed lottery for good likeness but it can be done.
>>
>>109028250
trvthnvke
>>
I scraped a bunch of pics from a couple of my crushes instagram accounts from when they were in highschool and it feels so good to jerk off to the loras i made of each of them
>>
I need help, I am new to all this.

Would "(tight:1.2) shorts," as a tag give me tighter shorts or do the closed brackets around tight make it its own tag?
>>
>>109028176
also trained on gemini outputs

at least it seems to train well so there is hope
>>
File: ComfyUI-upscaled_00003_.jpg (2.62 MB, 3168x3168)
2.62 MB JPG
Oooohhh I'm upscaaaling
>>
>>109028287
Don't split the word like that. Just (tight shorts:1.2) or something
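
For reference, a rough sketch of how the `(text:1.2)` weighting syntax reads, assuming A1111-style emphasis. As far as I know the parentheses only set a weight on the text inside them and do not start a new tag, so "(tight:1.2) shorts" is still one tag with extra weight on "tight". `parse_weights` is a made-up helper for illustration, not the actual parser from any UI, and it ignores nesting, plain `(text)` and `[text]`.

```python
import re

# Matches "(some text:1.2)"; nested parentheses are deliberately not handled.
_WEIGHTED = re.compile(r"\(([^():]+):([0-9]*\.?[0-9]+)\)")

def parse_weights(prompt):
    """Split a prompt into (text, weight) pieces; unweighted text gets 1.0."""
    pieces = []
    pos = 0
    for m in _WEIGHTED.finditer(prompt):
        if m.start() > pos:
            # Plain text before the weighted span keeps the default weight.
            pieces.append((prompt[pos:m.start()], 1.0))
        pieces.append((m.group(1), float(m.group(2))))
        pos = m.end()
    if pos < len(prompt):
        pieces.append((prompt[pos:], 1.0))
    return pieces

print(parse_weights("(tight:1.2) shorts,"))
print(parse_weights("(tight shorts:1.2),"))
```

The first call weights only "tight" and leaves " shorts," at 1.0; the second weights the whole phrase.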
>>
>>109028315
Okay thanks, I have been trying both but wasnt sure about the general rules for tags.
>>
>>109028287
>>109028315
In my experience, something you want to make smaller does benefit from emphasizing just the adjective. Because emphasizing the noun tends to make that thing bigger.
>>
>>109028333
I have found that, yes, but what I am going for is only the shorts being tight; I want other things loose.
>>
hello nigbos

what the fuck are the new captchas?!!? im writing a whole ass exam

Which idiot came up with this shit
>>
File: 1773482753626723.png (1.2 MB, 1024x1024)
1.2 MB PNG
>>109028398
>>
>>109028398
>>109028406
cant even post my ideogram grill coz of some ip shit
>>
>>109028406
weird to flex that you are so retarded, you have to pay not to solve captchas
>>
>>109028420
if you solve enough captchas it reduces from 3/3 to 1/1 captchas and then 2 no captcha posts after solving 1 captcha
>>
>>109028398
I am still getting the scroll bar below to find the image that is not like the others captcha.
Do you have a fresh cookie?
>>
>>109028420
>pay
>>
>>109028420
>retarded
>>
I pay hookers to prompt my futa gens for me
>>
File: 13252554.gif (2.2 MB, 300x229)
2.2 MB GIF
>not using the solver
oh no no no no no
>>
has anyone played around with the Ideogram 4 prompt builder KJ node?

it has photo, aesthetics, lighting, medium, parameters. How important are these? What kinds of inputs do you use for them?
>>
>>109028412
>iMaGE bLOckeD bY sAFetY fILteR

files. catbox. moe / n0zy7h. png

>>109028406
paypig

>>109028455
sharee it
>>
>>109028459
post some kinos first
>>
>>109028472
cant post here, hence the catbx
>>
>>109028477
i didn't see any nuclear explosions in that link
>>
File: 577043181091757.png (2.13 MB, 1152x1600)
2.13 MB PNG
>>
File: 1770656698200790.jpg (679 KB, 1248x1824)
679 KB JPG
>>
File: 1761668499831994.jpg (1.22 MB, 1248x1824)
1.22 MB JPG
>>
Can I train a LoRA on civitai but not publish it?
>>
File: 1750556101500141.jpg (1.13 MB, 1248x1824)
1.13 MB JPG
>>
>Florence-2 Large
>JoyCaption Alpha Two
>Qwen2.5-VL
which one should I use for natural language tagging? Quick.
>>
>>109028605
I don’t know why you just wouldnt use any other gpu service if you absolutely must rent a gpu
>>
>>109028653
Gemma 4
>>
>>109028660
I have some buzz I wanna put to good use
>>
>>109028605
Yes
>>
File: Ideogram_0066.jpg (420 KB, 1264x1680)
420 KB JPG
>>
niggerjack including herself in the OP again and also making a shit collage and also hijacking the thread once again.
>>
File: 44564.png (18 KB, 666x132)
18 KB PNG
this is too much. the great kino purge is coming soon
>>
>negative prompts were being bypassed for last hour

how didnt i realize
are negatives a total fucking meme what the hell
>>
>>109028991
>are negatives a total fucking meme what the hell
no. i found it to be a very useful thing when i realized i could use it to remove random things from appearing in my generations, like wrist watches and finger rings. typing "no wrist watch" into the main prompt never worked
>>
>>109028991
Negatives don't work on distilled models unless you're using NAG
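
A minimal sketch of why, assuming the usual classifier-free guidance formula and that distilled models are run at CFG 1: the negative prompt only enters through the unconditional term, and at scale 1 that term cancels out. `cfg_combine` is a hypothetical helper working on plain lists of floats standing in for model predictions. NAG works differently and is not covered by this sketch.

```python
def cfg_combine(cond, uncond, scale):
    """Classifier-free guidance: uncond + scale * (cond - uncond), elementwise."""
    return [u + scale * (c - u) for c, u in zip(cond, uncond)]

cond = [1.0, 2.0]          # prediction for the positive prompt
negative_a = [5.0, -3.0]   # prediction for one negative prompt
negative_b = [0.0, 0.0]    # prediction for a different negative prompt

# At scale 1 both negatives give the same result: the positive prediction.
print(cfg_combine(cond, negative_a, 1.0), cfg_combine(cond, negative_b, 1.0))

# At a higher scale the negative prompt changes the output.
print(cfg_combine(cond, negative_a, 3.0), cfg_combine(cond, negative_b, 3.0))
```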
>>
hand detailer is affecting way more than hands
what do
>>
>>109028991
As long as the model supports negatives, they are far more useful than generic masterpiece positives.
>>
>>109029022
funny cos i dont even know what a distilled model is i just gen 1girls with sex toys
>>
>>109028256
Yeah I prefer klein over zit because I am doing more i2i than t2i, every i2i thing I see for zit isn't true i2i but rather how to adjust one zit image using another zit image.
>>
>>109029032
many such cases
>>
>>109028009
thot party las thread huh? nice nice
>>
>download random workflow from civit
>it has a bypassed "anus detailer" part
>>
File: Ideogram_0069.jpg (345 KB, 1936x1088)
345 KB JPG
>>
>>109029061
did you plug it back in?
>>
>109027240
>109027514
>>109027606
>>109027713
>>109027715
>>109027720
>>109028781
the raped
>>
>>109029129
welcome back, raped one
>>
File: Ideogram_0070.jpg (433 KB, 1936x1088)
433 KB JPG
>>
>>10902914
turn your monitor on, Julien
>>
>>109029155
what does this mean? how is he supposed to read this if the monitor was off?
>>
>>109029171
If you haven't noticed, tran is basically a low IQ retard for keeping this retarded crying about his lover up for years
>>
>singular schizo anon theory
>>
>>109029213
I would be very sad with humanity if there were two schizos that want Julien's dick up their bum and continuously screech about it
>>
>>109029171
>>109029181
>>109029228
Whatever you say, you are still a worthless unemployed raped retard that should kill itself ASAP
>>
>>109029240
shut up drama tranny
https://rentry.org/LDG_vital_info

post discussion and kinos
>>
>>109028305
ideogram? looks cool
>>
don't forget the flip julien
>>
>>109029256
funny how you never post that when the most subhuman attention-starved worthless avatartranny makes xir posts
especially when you're already an expert on replying to yourself
>>
>>109029152
reminds me of batman beyond
>>
>>109026151
>>negpip
>>cfg normalization
>>shift scheduling
>>a good prompt
>desu all u need
How do I do normalization and shit in forge neo? I just move the bars.
>>
Anyone know a good background model for replacing backgrounds I'm tired of anima's backgrounds and want to change it in another pass
>>
>>109029441
just remove the background then gen a background only
>>
>>109029446
How do you guys remove backgrounds? I try to use rembg and other tools but I always get leftovers and shit on small parts: if the character has gaps showing the background, there is always something that doesn't get removed or that gets stuck on the outline/border
>>
>>109029446
Best node for that and what is a good model?
Klein or something?
The scaling from anima is poor for backgrounds so I need a more robust model
>>
>>109028009
https://rentry.org/LDG_vital_info
>>
On the third pass in ltx 2.3 I'm getting color shifts, but it's not the tiled VAE decode, as it's the same with just regular decode. Send help.
>>
>>109029521
what do you mean third pass?
>>
>>109029477
We have the same guy that passed malware multiple times post his spam daily anon. Why are you trying to protect anons from a repeat offender?
>>109029446
wanted to ask if you know of any small image diffusion models that specialize in text
>>
>>109029524
Three passes to generate the video. It solves a lot of issues, worked for me the past weeks but now it's shifting in brightness only on the third pass.
>>
File: 5845266.webm (3.98 MB, 420x291)
3.98 MB
3.98 MB WEBM
>>109029537
oh. i don't know. i only do single pass generations because people said it gives better quality results. sounds like something in your software got updated if it was working earlier
>>
how many epochs before my LoRA starts resembling the character?
>>
File: 2564786.webm (3.72 MB, 420x291)
3.72 MB
3.72 MB WEBM
>>
>>109029470
use rembg or sam3 (latter is better). just gen the character on a simple background, one color. remove the background then just layer the char on another gen that's just a background
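
For the leftover fringe on the outline that was asked about earlier, one common trick is to shrink (erode) the alpha mask by a pixel or two before layering. A minimal pure-Python sketch, assuming the mask is a 2D list of 0-255 alpha values; `erode_alpha` is a hypothetical helper, not something rembg or sam3 provides. With Pillow, running `ImageFilter.MinFilter(3)` on the alpha channel should do roughly the same job.

```python
def erode_alpha(alpha, radius=1):
    """Shrink an alpha matte: each pixel takes the minimum alpha in its neighbourhood."""
    h, w = len(alpha), len(alpha[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            # Neighbourhood is clipped to the image, so image borders are not treated as background.
            ys = range(max(0, y - radius), min(h, y + radius + 1))
            xs = range(max(0, x - radius), min(w, x + radius + 1))
            row.append(min(alpha[j][i] for j in ys for i in xs))
        out.append(row)
    return out

# 5x5 mask with an opaque 3x3 block in the middle.
mask = [[255 if 1 <= y <= 3 and 1 <= x <= 3 else 0 for x in range(5)] for y in range(5)]
for row in erode_alpha(mask):
    print(row)
```

The tradeoff is that thin details like hair strands get eaten too, so keep the radius small.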
>>
>>109029578
>>109029559
did you train a LoRA of this girl? How do you get the same face in every gen?
>>
How to prompt pubic stubble?
>>
>>109029576
Not enough information
>>
>>109029528
>know of any small image diffusion models that specialize in text
no I don't. Sorry
>>
>>109029588
36 images character ZIT LoRA
>>
>>109029576
it should be fairly fast, 30 epochs
>>
File: 233245.gif (3.58 MB, 300x208)
3.58 MB GIF
>>109029582
no, i just put her character sheet as the first few frames and then terminate it with a black screen so the actual text prompt can kick in. you have to generate at a higher resolution for the face to be replicated correctly
>>
File: 1776203788023910.png (256 KB, 680x976)
256 KB PNG
>>109029606
>I had only enough buzz for 5 epochs
>>
>>109029619
how much buzz was it?
>>
>>109029625
500
>>
>>109029152
Comfy...
>>
>>109029630
it didnt have repeats for the dataset? did you try it?
>>
>>109029630
I put in dataset as test. 144 images, got 10 epochs with 2 repeats, cost 1k buzz. using the settings it gave as standard
>>
File: ComfyUI-upscaled_00001_.jpg (1.88 MB, 3104x3104)
1.88 MB JPG
>>109029263
Yeah+lora&upscaling
>>
>>109029663
>>109029731
how many epochs and repeats should I set for
>character
>LoRA
If I have around 100 images
this was my first time training a LoRA and I shit the bed. There's no resemblance in the output
>>
>>109029816
The civitai testrun hasn't finished yet so I'm not sure if it even works. Looks like it's using batch 1, lr 0.0001, for 1024 resolution. I don't have my hopes up
>>
File: Ideogram_0078.jpg (1.1 MB, 1936x1088)
1.1 MB JPG
>>
File: 25434565.webm (3.86 MB, 420x291)
3.86 MB
3.86 MB WEBM
>>
File: ComfyUI_00774_.png (1.05 MB, 896x1152)
1.05 MB PNG
>>
File: Ideogram_0080.jpg (1.07 MB, 1936x1088)
1.07 MB JPG
>>
Improve my prompt?

> A beautiful blonde woman with long wavy hair and bright pink painted toenails, wearing a tight white tank top and tiny pink lace panties, sits on the edge of a luxurious bed and uses her soft bare feet to tease and then deliver repeated firm ballbusting kicks and stomps to a naked man lying on his back on the floor in front of her. She smiles playfully while pressing her pink toes down on his cock and balls before kicking him hard, the man groaning and writhing in pain, elegant bedroom with soft lighting, realistic fetish video style, natural indoor lighting, detailed feet and pain reaction.
>>
File: 1259694.gif (3.1 MB, 300x208)
3.1 MB GIF
>>
>>109029992
you forgot to specify your brown penis and favorite holy cow watching
>>
File: too_late.jpg (81 KB, 626x986)
81 KB JPG
>>109030219

too late, already generating
>>
>>109030229
what model?
>>
File: wtf2.jpg (1.59 MB, 5162x1556)
1.59 MB JPG
>>109030271
wan2.2, trying to train my own LoRA on clippings from my... collection
>>
>>109030282
good, keep that coal away from the kinoplexatorium
>>
>>109030294

> kinoplexatorium

what did he mean by this?
>>
>>109030300
>doesn't know what kino is
Suffa bish
>>
>>109030329
I'm sorry. I'm only here to learn and goon at scale. Can you tell me what it means?
>>
>>109028605
Yes
>>
>>109030366
You wouldn't get it... But also your endeavor is noble
>>
File: 303468891816649.png (965 KB, 992x1000)
965 KB PNG
>>
>fat loras stay up
>skinny loras get obliterated
what the fuck is civit's problem? being obese hambeasts is OK but women being a bit too skinny is VERBOTEN?
>>
>>109030413
promotes eating disorders like anorexia, sweetie
>>
File: hfghg.png (416 KB, 600x800)
416 KB PNG
deal with it, chud
>>
>>109028009
haven't been here since last summer or so, tell me we have something better than wan 2.2 and that one model that did pics well, chroma I think it was called? tell me there's been any progress
>>
>>109030457
Anima, Ideogram, LTX
>>
File: 719240989004427.png (1.26 MB, 1536x1024)
1.26 MB PNG
>>
File: 4756445.webm (3.97 MB, 420x291)
3.97 MB
3.97 MB WEBM
>>109030457
>tell me we have something better than wan 2.2
we do
>>
>>109030495
okay 20 seconds is nice
how jewed is it
>>
ComfyUI's blog is AI-generated...
>>
File: ComfyUI_00729.jpg (3.35 MB, 1500x1920)
3.35 MB JPG
>>109028653
Qwen 3.6, the biggest quant you can run.

>>109029061
Bold of him to assume you didn't want detailed anuses.
>>
>>109030524
it can go past that. i have made 2 minute long videos in a single shot before. i just lower it so i can iterate my prompts faster. as for how jewed it is, go look up the company that made it
>>
>>109030568
back to wan then okay
>>
File: 1780816036224028.gif (2.07 MB, 498x404)
2.07 MB GIF
>>109030581
>>
>>109030465
Ltx giving audio at the same time is nice but for video strictly wan 2.2 seems much better
>>
>>109030586
Brother imma need an explanation of your stack like asap. The dancing is mid but like everything else is crazy better than anything I can get. I use LTX though I am assuming this is Wan? Got 64GBs of vram.

I need sauce to reproduce this rn. Give me like a step by step with prompt and a fixed seed and I'll send you a $10 tip in xmr if I can reproduce.
>>
>>109030527
Remember when comfy would write blogs? All the sovl from the project is gone. Comfyui is just another corpo grift
>>
This is my first time ever downloading or trying anything even remotely related to AI on my system as I recently upgraded my computer from a shit rig to an okay rig. Any suggestions, tips or advice I should know before going in?

Mostly gonna be using it for fun, generating tabletop character art and maybe some nsfw so no censors would also be nice.
>>
>>109030704
>This is my first time ever downloading or trying anything
anima

>generating tabletop character art
anima

>maybe some nsfw so no censors would also be nice.
anima
>>
>catastrophic forgetting
>>
>>109030656
i just use wan2gp, so i don't have a workflow or whatever you guys use. the seed and prompt are in the metadata. you can see how i use frame injection to show the model what she looks like, and then the prompt describes her in detail so it knows what those frames were showing. don't forget the KINOSOVL screen, or else it won't work
https://files.catbox.moe/g39ehg.mp4
>>
File: 1780500097412143.png (588 KB, 880x1320)
588 KB PNG
thoughts?
>>
>>109030748
noob filter
>>
>still no proof
>>
whenever i train a lora for flux klein, the results resemble the character decently but the character always comes out looking gaunt, as if they havent eaten in months.
does anyone know how to fix this?
>>
File: 1245757.gif (2.29 MB, 320x222)
2.29 MB GIF
okay i'm going to bed, goodbye
>>
>>109030720
Seems limiting, I've played with NovelAI before and really do want an on computer generated AI system. Seems neato.
>>
File: ideogram4_00011_.png (3.13 MB, 1184x1776)
3.13 MB PNG
>>
File: Ideogram_4.0_00054_.png (2.61 MB, 1456x1456)
2.61 MB PNG
>>
AI toolkit's Ideogram captioner is a goddamn piece of shit! Just locks up randomly, no error, vram still loaded just stops working after a few minutes. 8b, 4b, 2b, doesn't matter it just doesn't function!
>>
>>109031040
>turk patreon scammer produces garbage
w-whoa??
>>
File: Ideogram_4.0_00057_.png (2.25 MB, 1456x1456)
2.25 MB PNG
That's enough of that. Getting some sd15 real vibes since the model likes to put in shit that definitely wasn't asked for. Like lines on the skin from a medical skin marker where the surgeon is going to cut.
>>
>>109031055
I don't think Furkan has anything to do with AI Toolkit. Unless you mean a different turk?
>>
Big Russ when he uploads an Anima update
>>
Fuck this! I give up. I'm going in with my natural language .txt captions
>>
>>109030830
Did she died?
>>
>>109031132
me on the left
>>
>>109028176
ideogram's safety filter is on the same level as deepseek's, how the fuck do you even remember it's there it's easy as hell to bypass
>>
so the boxes where you get to add details, characters are only for ideogram model? not possible with klein?
>>
File: ComfyUI_Anima_04513_.png (1.11 MB, 1344x960)
1.11 MB PNG
>>
>got a taste of seedance 2

Fuck, localbros.. I just hope we get some tech to fit large models and speed shit up and get jailbroken api models.
>>
>>109030748
usually actually avoidable with more bboxes
>>
File: 1768334690444403.png (1.55 MB, 1280x720)
1.55 MB PNG
Do you guys still face anatomy problems or is it just me?
>>
>>109031752
sometimes, especially with fingers
>>
>>109031752
it's normal but the frequency of it happening depends on the model, prompt, lora, resolution etc.
>>
File: 1777492719417525.png (995 KB, 1360x768)
995 KB PNG
>>109031764
>>109031762
editing with flux klein seems particularly bad with this
>>
File: ComfyUI_00784.jpg (3.16 MB, 1500x1920)
3.16 MB JPG
>>109030748
I don't know about everyone else, but i feel safer.

>>109031752
Nah, it happens pretty regularly. Sometimes it's from vague limb placement (not being explicit in prompting what should be where), a pose the model has little to no idea how to create or it has a similar one with more weight, or just plain ol' "noise" spinning off in complementary directions so it decides to pick both. That's why you just keep pulling.
>>
>>109031783
also depends on the edit. but if you for example change the pose/scene entirely it can easily happen, sure.
>>
>https://github.com/zlab-princeton/i1
>https://arxiv.org/pdf/2606.11289
Interesting paper with some bold claims like:
Encoder-decoder models (T5Gemma) outperform decoder-only LLMs (Qwen3, Qwen3-VL) as TE
LLM instruction tuning has minimal impact, not that anyone bothers to release base models anymore
Multi-text-encoder setups (Flux 1, SD 3.5) are useless and the gains don't come from the additional TE
They don't exactly put it this way but small models don't benefit from large TEs to any meaningful degree. (Schizos who claim Anima should have used Qwen 3 4b or whatever might want to read this)
AdaLN is most likely unnecessary for Flow models.
Marginal difference in output quality with training small models on a few million images vs a few dozen million, though again the "quality" here is the benchmeme so...
Not that interesting but worth mentioning:
Long captions improve model performance but provide ass results when used with short prompts. I wonder how much a post-training finetune on shorter prompts would help to make the model more flexible, or varying caption lengths during training.
Some limitations like:
Benchmemes are used heavily for evaluation
Very high cfg inference (though they are based for using rescale cfg)
>>
>>109031783
It's the distill, not necessarily the edit causing it. The distill can fuck up anatomy on t2i too.
It's a tradeoff you make for quick 4 step gen.
>>
when is pony v8 coming
>>
>>109031872
>Long captions improve model performance but provide ass results when used with short prompts.
Why not train with several different captions?
>>
>>109031883
Flux2.dev is better? No one talks about it anymore.
>>
>>109031783
This shit shouldn't happen with edits on klein desu, how are you prompting
>>
Been using nano banana a lot recently, still feels so far ahead of what local offers.
>>
>>109031883
Klein prompted properly literally doesn't change shit beyond what you want it to for edits though, IDK how you'd wind up with crazy anatomy issues.
>>
File: Untitled.png (213 KB, 1468x1240)
213 KB PNG
>>109031872
>Very high cfg inference (though they are based for using rescale cfg)

personally I use a cfg of 2048
>>
File: 1758686423488027.png (701 KB, 1360x768)
701 KB PNG
>>109031825
this shit is straight up cursed
>A woman lays on a black couch with her feet up
>>
File: 1774902321313149.png (587 KB, 1360x768)
587 KB PNG
>>109032057
oops misquoted
meant to quote >>109032016
>>
File: 043.png (1.34 MB, 928x1248)
1.34 MB PNG
>>109032022
who asked
>>
>>109031825
Do you have to be like extremely verbose and explicit?
>>
>>109032008
No I meant the 9b-base vs 9b (step distill)
Few people use Flux 2 dev.
>>
File: ComfyUI_169090_.jpg (407 KB, 1069x1920)
407 KB JPG
>>
>>109032081
I'm using \Flux2_Klein_9b_kv
is that a bad one?
>>
>>109032090
>kv
>kissless virgin
>>
File: Klein9BDistBeforeAfter.png (1.79 MB, 1152x1536)
1.79 MB PNG
>>109032057
If this is an edit, e.g.

"Change the style of photographic image 1 to 8-bit pixel art while keeping the original compsition and layout and character design exactly the same."

4 steps with 9B Distilled.
>>
>>109032090
KV boosts speed but hurts quality.
Either use just 9b (non kv) and reroll seeds for bad gens or switch to base and sit through 20+ steps.
>>
>>109031872
>we replace the small MLP adapter (2.6M parameters) used in the default setup (Section 3) with larger transformer adapters (17.2M parameters/block) with the same width as the backbone blocks
>As shown in Figure 10, despite the marginal increase in parameter count, increasing the adapter capacity consistently improves performance across all backbone architectures
Everyone that's been saying the Anima LLM adapter is a fundamental architecture flaw just got eternally BTFO. It's basically the same thing that the researchers did here.
>>
>>109032032
Not sure if I want to touch that sampler node, but how does skimmed compare to rescale?
>>
>>109032132
It was just the thread schizo and jealous devs of the stillborn Mugen failbake.
Worth mentioning that this is not the precise reason why td created the adapter (he wanted to preserve existing knowledge of Cosmos while taking advantage of a more modern TE) but it ended up working great regardless.
>>
File: Untitled.png (60 KB, 1188x708)
60 KB PNG
>>109032136
Oh I use rescale as well as skimmed.
My own custom one which has some arbitrary values.
>>
File: ComfyUI_21388.png (2.49 MB, 1920x1080)
2.49 MB PNG
>>109032080
No, just very clear. Instead of using "left" and "right" (which can be confusing right off the bat) I like to place "one hand" somewhere and use "the other" to place the opposite hand. It's easier to make the subject cross their body or place one behind their back without those cardinal directions getting in the way (even though it understands left and right for the most part).
>>
>>109032162
Yeah T5-XXL v1.1 had a max context length of only 512 tokens. Also it used a gorillion times more memory than Qwen3 0.6B obviously.
>>
>>109032169
BABY YODA!!!
>>
File: 1751650896034829.png (1 MB, 1360x768)
1 MB PNG
>>109032125
I'm trying to change the composition but keep the style and the identity. Is that simply not possible with a text prompt? It runs for 20 steps now

>Change the pose of the woman to be laying on a white couch with her feet up while keeping the graphical style and the character design exactly the same
>>
>>109032186
20 steps / higher than CFG 1 is bad if you're using the Distilled model (which I find to be better for editing, usually). Does the one you have have Base in the name or no.
>>
>>109032166
So they combine well?
>curved
What's the precise idea here, is the curve for interpolation strength? You want less interpolation on early steps for better prompt adherence, higher interpolation mid steps for less artifacts? And for low sigmas???
I use CFG override to switch to a sane value for lower sigmas. I tried tinkering amateurishly with early sigmas but wasn't satisfied with anything so I use uniform interpolation strength when rescale is on.
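
For anyone following along, a minimal sketch of what the interpolation strength does, assuming the rescale formulation I know of: scale the CFG result so its standard deviation matches the positive prediction, then blend that with the plain CFG result. `rescale_cfg` and `phi` are names picked for illustration, and a flat list of floats stands in for one sample's prediction. This is not the code of any particular node.

```python
from statistics import pstdev

def rescale_cfg(cond, uncond, scale, phi):
    """CFG followed by std rescaling, blended by phi (0 = plain CFG, 1 = fully rescaled)."""
    cfg = [u + scale * (c - u) for c, u in zip(cond, uncond)]
    # Match the spread of the guided result to the spread of the positive prediction.
    factor = pstdev(cond) / pstdev(cfg)
    rescaled = [x * factor for x in cfg]
    return [phi * r + (1.0 - phi) * x for r, x in zip(rescaled, cfg)]

cond = [1.0, -1.0, 0.5, -0.5]
uncond = [0.2, -0.1, 0.0, 0.1]
for phi in (0.0, 0.5, 1.0):
    out = rescale_cfg(cond, uncond, scale=7.0, phi=phi)
    print(phi, pstdev(out))
```

A curved schedule would just make `phi` a function of the step or sigma instead of a constant.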
>>
is bernini working in comfyui master ATM? i at least got errors
>>
>>109032200
(samefag) Here's another example anyways, using the couch image they posted as an edit input. 4 steps on the 9B Distilled model, DPM++2S Ancestral / Simple.

"The literal exact same woman from image 1 is now standing on a tropical beach. Maintain the art style of image 1 precisely."

Inverted the output resolution to 768x1360 from 1360x768 just to show that works also.
>>
File: 1778559284993566.png (1.23 MB, 1360x768)
1.23 MB PNG
>>109032200
I guess not, just flux-2-klein-9b-kv-fp8.safetensors
It's FP8 though. Should I use the one that is 19 GiBs? I don't know how because VRAM is already at 20GB running the 9GB FP8. It's because of the Qwen 3_8b that's already 8GB. Do you have a super GPU or something?

>Change the style of the pixelated image to a real photograph while keeping the composition and the character design exactly the same

>>109032290
Doesn't work on my machine :(
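
On the VRAM question, a back-of-the-envelope sketch, assuming weights-only size is just parameter count times bytes per parameter. The 9B and 8B figures are rounded from the model names and are an assumption; real files run a bit larger, and activations, the VAE and any offloading are ignored. `weights_gb` is a hypothetical helper.

```python
def weights_gb(params_billion, bits_per_param):
    """Weights-only size in GB (1 GB = 1e9 bytes): parameters * bytes per parameter."""
    return params_billion * bits_per_param / 8

diffusion_fp8 = weights_gb(9, 8)     # 9B model stored at 8 bits
diffusion_bf16 = weights_gb(9, 16)   # same model at 16 bits
text_encoder_fp8 = weights_gb(8, 8)  # 8B text encoder at 8 bits

print(diffusion_fp8 + text_encoder_fp8)
print(diffusion_bf16 + text_encoder_fp8)
```

So by this estimate the full-size model plus the text encoder would need several GB more than the FP8 pair unless the text encoder gets unloaded or offloaded first.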
>>
File: 1767319065998422.png (1.26 MB, 1360x768)
1.26 MB PNG
It feels like a waste of time to wait for 3 minutes for an image like this
>>
>>109032308
You are using the Distilled then, but the KV version as opposed to regular. I was using Q8 regular + Q8 Text encoder.
>>
>>109032463
It's impossible to find anything on huggingface really
>>
File: 20260611_124520.jpg (346 KB, 1659x1187)
346 KB JPG
https://xcancel.com/i/status/2065101440088199376
>It is a live reward trainer. Images are generated and you pick you favorite variant and your preference is fed straight into GRPO. It is a very involved process, but should adjust a model to your personal preference.
lets fucking go
>>
>>109032595
It's really a huge fucking shame ai-toolkit gets cool features first. Credit where due I suppose but the tool is a frankenstein monstrosity of barebones low functionality baby trainer for idiots and bleeding edge tech.
>>
>>109032595
Cewl.
>>
>>109032595
>>109032637
I wonder how difficult this would be to vibe implement in something like diffusion-pipe or onetrainer. Can't be that hard, right?
>>
File: Anima-00031-1609793762.jpg (849 KB, 2688x2176)
849 KB JPG
>>109032595
Neat. I hope it arrives to a more relevant tool soon.
>>
>>109032706
Probably not that difficult.
But the problem with vibe implementing shit is that you run the risk of discovering that something wasn't working properly after wasting dozens of hours on multiple runs.
>>
>>109032595
interesting but it seems like snake oil to me.
if your dataset is good enough, theres no need to babysit it, just let it run.
if it sucks, telling the model which images you prefer won't fix anything so im not sure who this is actually for
>>
>>109032132
> >we replace the small MLP adapter (2.6M parameters) used in the default setup (Section 3) with larger transformer adapters (17.2M parameters/block) with the same width as the backbone blocks
> >As shown in Figure 10, despite the marginal increase in parameter count, increasing the adapter capacity consistently improves performance across all backbone architectures
So clip-llm adapters would work too?
>>
File: Anima-00042-1513910340.jpg (751 KB, 2176x2688)
751 KB JPG
>>
AspGuy always trains at 1024px unlike Kekstone so I feel like there's basically no way these won't be pretty decent if he uses his full sized dataset
>>
>>109033090
Catbox anon plz... I'm impressed it's anima
>>
LTX CEO on reddit:
https://www.reddit.com/r/StableDiffusion/comments/1u3a4dp/ceo_thoughts_whats_next_at_ltx/?utm_source=share&utm_medium=mweb3x&utm_name=mweb3xcss&utm_term=1&utm_content=share_button
>>
File: rei4v2_00014_.png (1.26 MB, 768x1376)
1.26 MB PNG
Got it, took some tinkering
>>
File: 00045-1134276568.jpg (507 KB, 2176x2688)
507 KB JPG
>>109033160
https://files.catbox.moe/p0ml3h.png
>>
>>109033202
based forge user. she reminds me a little of the girl on a train but missing the tan corduroy pants
>>
>>109033176
they need to focus on v2v and i2v.
no serious genners bother with t2v anymore, you’re always better off generating a strong starting frame first with something like klein or zit
>>
>women with whore faces and makeup
slop
>cutiepies with normal faces and natural makeup
kino but make sure they still have fat asses. that part is good. and make them a little younger perhaps.
>>
File: Wan21_SCAIL2_00023.mp4 (3.24 MB, 640x1152)
3.24 MB MP4
>>
File: rei4v2_00030_.png (1.51 MB, 768x1376)
1.51 MB PNG
>>109033194
>>
File: Wan21_SCAIL2_00027.mp4 (2.75 MB, 640x1152)
2.75 MB MP4
>>
File: Ideogram_0002.jpg (1001 KB, 1936x1088)
1001 KB JPG
>>
File: 972507224441578.png (1 MB, 832x1216)
1 MB PNG
>>
>>109031710
catbox?
>>
>>109033620
https://files.catbox.moe/lq48k5.png
>>
File: 1769352452961748.png (1.35 MB, 1024x1024)
1.35 MB PNG
klein edit 9b is very fun. glad we have a good local tool, even at 4 steps distilled it works great.

also ltx 2.3 + the extend workflows are fun. it can even clone audio if you extend.
>>
File: 1777139007041328.jpg (102 KB, 1920x1080)
102 KB JPG
>>109030531
>>109031825
>>109032169
detailed jenny anus node when
>>
I wonder if there's a good LLM prompt to create continuations of a comic in ID4. I tried to build the first page of the comic and give it the JSON as an example but the output is always too sparse or has a weird layout.
>>
File: Ideogram_0004.jpg (2.19 MB, 3056x1728)
2.19 MB JPG
>>109033632
LTX is an exceptionally powerful audio model that desperately wants you to believe it is a video model.
>>
Still thinking about that one fat Jenny ass.
>>
File: Ideogram_0005.jpg (640 KB, 1936x1088)
640 KB JPG
>>
>>109033316
>>109033456
Anon, where is the SCAIL-2 release? link?
>>
>>109033740
https://github.com/Comfy-Org/ComfyUI/pull/14373
>>
>>109029766
>>109028305
You convinced me to give ideogram a chance...

If its shite, I'm coming for that booty.
>>
I take back everything bad I said about Anima Turbo lora.
If you're getting gibberish-body-horror results while using a character lora, combine it with the turbo lora and it'll stabilize it completely.
>>
>>109030422
imagine the weight of the moderators
>>
Uh oh. Nodes 2.0 poopie!
https://github.com/Comfy-Org/ComfyUI/commit/33e6ebd0d92b270e9bd79ea74e967f7e23e7d7e8
>>
>looked in the mirror
>my first gray hairs
I'm too old for this hobby, aren't I?
>>
>position of comfy cloud and local keep switching on launcher

what jew did this
>>
>>109033929
>born too early to have an AI waifu
>joined this hobby too late for affordable RAM and GPU
>>
Will AI let me experience teen love at some point?
>>
File: Wan21_SCAIL2_00062.mp4 (1.64 MB, 640x1136)
1.64 MB MP4
>>
>>109033316
does scail run on 16gb?
>>
>>109033970
Will I ever experience the type of shallowness a beautiful woman experiences in her prime?
>>
File: Wan21_SCAIL2_00064.mp4 (1.73 MB, 640x1136)
1.73 MB MP4
>>
>>109033710
tried to make it real lol
>>
it's tough coming to terms with the fact that local will never be at the forefront of technology again, and that API models are exponentially better thanks to datacenter compute while half of local is still on 2020-tier <16gb cards.
>>
>>109033960
Physically? No. Emotionally? It's pretty close.
>>
>>109033960
No. Because arrogant AI companies are gatekeeping the intelligence behind the Government and Socialite classes.

You can have the dumbed down censored models thoughever. Just dont expect them to make you feel good
>>
>>109034030
Local always maxes out at around 80% of the quality of saas. local will always be free and uncensored tho
>>
>>109034030
Did the blog factory explode
>>
>>109034051
Since when truth is an exploded blog factory?
>>109033906
Don't care, still using API
>>
please learn english before posting ITT desu
>>
>>109034048
>local will always be free and uncensored tho
LOL enjoy waiting 3 minutes for a censored ideogram 4 output. using nu-local models is an actual humiliation ritual, all that just to generate brown-tinted inbred gptslop
>>
>>109034048
>be free
Where is my state mandated GPU?
>uncensored tho
Kek good joke
>>
>>109033815
yeah the turbo lora is great. i wish i could get its body stabilization while keeping the artists unique style though. it sort of "compresses" the artstyles making everything look too uniform. i think that's anima's biggest weakness, you either get really fucking atrocious bodies and poses or you get stuff that conforms to sort of a specific art style
>>
>>109034073
Anime website
Chink&Jeet hobby
>>
>>109034073
Fun fact: cloud providers only pay jeets to shill. No white men.
>>
>I FUCKING HATE CLOUD!!!!
>...even though all my favorite local models were made possible by datacenter cloud compute
>>
Is this guy still mad that LDG is the forefront of image generation on /g/ or
>>
>>109034083
I made a lora off the diff between WAI and Anima, gives you solid stability and way more granular weight control without going full WAI
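For reference, the usual way to turn a checkpoint diff into a lora is a truncated SVD of each layer's weight delta. A minimal sketch for a single layer, assuming both checkpoints have the same layer shapes (the function name and the toy matrices are made up for illustration, real extraction scripts loop over every matching layer):

```python
import numpy as np


def extract_lora(w_base, w_tuned, rank):
    """Factor (w_tuned - w_base) into low-rank up/down matrices via truncated SVD."""
    delta = w_tuned - w_base
    u, s, vh = np.linalg.svd(delta, full_matrices=False)
    r = min(rank, s.shape[0])
    sqrt_s = np.sqrt(s[:r])
    up = u[:, :r] * sqrt_s            # shape (out_features, r)
    down = sqrt_s[:, None] * vh[:r]   # shape (r, in_features)
    return up, down


# Toy example: a fake "finetune" whose delta is exactly rank 2.
rng = np.random.default_rng(0)
w_base = rng.standard_normal((8, 6))
true_delta = rng.standard_normal((8, 2)) @ rng.standard_normal((2, 6))
w_tuned = w_base + true_delta

up, down = extract_lora(w_base, w_tuned, rank=2)
approx = up @ down
print(np.abs(approx - true_delta).max())
```

Real deltas are not exactly low rank, so the chosen rank trades file size against how much of the difference survives. Applying the result at a strength below 1.0 is what gives the partial blend between the two models.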
>>
>>109034094
This is why ComfyUI needs to stay in the OP. ComfyCloud allows us to stay on the cutting edge and harness the power of API models with local workflows
>>
>>109034103
sounds interesting. could you post it?
>>
>>109034094
>17 hours general
Usecase for being in the forefront of a dead board of a dead website
>>
>>109034094
clearly he is, based on the two replies to your post kek
>>
>>109034051
apparently
https://github.com/Comfy-Org/ComfyUI/issues/14420
>>
>>109034092
Hate is too strong a word. Apathetic is more apt. I just don't really care that much.
>>
>>109034131
>all that crying about the frontend when he could just vibecode his own
wawa
>>
File: Wan21_SCAIL2_00060.mp4 (1.36 MB, 640x1136)
1.36 MB MP4
>>109034131
>I am an advanced long-time ComfyUI user,

stopped reading right there,
>>
>>109034113
I would but i don't post anime stuff outside its own generals (4chan culture), the process is the same as sdxl lora checkpoints
>>
>>109034122
Strange that you are still here, then?
>>
>>109034147
what makes you say this thread doesnt have anime?
>>
>>109034153
Usecase for being in denial that 4chan is an abandoned imageboard?
>>
>>109034167
>abandoned
But you are still here?
>>
>Finding features that were first deployed in closed-source text-to-image models without any preceding or concurrent academic literature is exceptionally rare. This scarcity exists because the primary closed-source companies (OpenAI, Google, and Midjourney) are heavily staffed by academic researchers or constantly adapt features directly from pre-print servers like arXiv. Core capabilities like text-to-image, inpainting, outpainting, and image-to-image translation all originated in open academic literature long before they appeared as polished consumer tools.
>>
>>109034169
Usecase for Ad Hominem?
>>
>>109034075
>LOL enjoy waiting 3 minutes for a censored ideogram 4 output.
wait unironically are you unaware of how trivial jailbreaking it is? surely enough time has passed that anons techniques have proliferated to the other parts of the web
>>
File: Anima_ZiT_img2img.jpg (70 KB, 1000x1000)
70 KB JPG
>>
>>109034198
Neat anon, catbox? Looks so real lodestone losted!
>>
>>109034188
But anon, I don't need to jailbreak the fridge in my house, so why should I jailbreak a local model?
>>
>>109034181
Thank you for replying to my post.
>>
>>109034219
you couldve just said you dont know how
>>
>>109034011
turned them into mormons lol.
>>
>>109034250
>>109034188
>>
>>109034188
>jailbreaking
i thought this shit was only for apikeks? why is local now resorting to having to jailbreak their own models? what a cope
>>
File: 27742123412854.png (1.93 MB, 1024x1536)
1.93 MB PNG
>>109034075
i prefer brown girls
>>
>>109034261
1 local model affected vs every single cloud model affected
kek
>>
>>109034265
you don't have to jailbreak novelai, and it's a true anime base model
>>
>>109034261
>their own models
In their own houses
>>
CELEB AI IS BACK ON B
I REPEAT
CELEB AI IS BACK ON B
>>
>>109033929
I had gray hairs since I was a teenager
>>
>>109034268
paying for the pleasure of sending your loli desires straight to the alphabet boys
kek
>>
>>109034265
I wouldn't use that as an excuse I remember when lmg said the same thing about Llama 3.
>>
So this is just to make porn then? Or is this used for anything of value?
>>
Friendly reminder: use NovelAI now. Their payment issues have spread even to longtime users. At this rate, that company’s gonna bream, and we will lose one of the best anime models.
>>
>>109034290
i dont remember asking what you would and would not use tho
>>
>>109034295
>So this is just to make porn then?
Sometimes
>Or is this used for anything of value?
Value is subjective
>>
>>109034295
I can't remember the last time I purposefully generated a NSFW image.
>>
>>109034301
Porn is subjective tho
>>
just starting nsfw i2v stuff
give it to me straight, is it A LOT of rerolls generally to get something that is consistent?

i feel like im fighting multiple problems at the same time and not sure where to start
using anime/2.5d style images, outputs are sometimes becoming too real and whole shading look of character changes drastically
getting artifacts and blurry bits on motion
prompt isnt being adhered to enough

thankfully gen time isnt long, like 300s for a 1500x3200 vid, 6s long with 24fps so i am just doing bunch of trial and error bullshit right now but
>>
>>109034295
It sucks at porn too. We should all just stop doing AI.
>>
>>109033632
can you move on to the next normie meme already?
>>
wtf is comfy desktop
>>
>>109034383
The aborted sibling
>>
is there a way to automate changing lora with every next generation?
I've 24 different LoRA I want to test on the same seed
>>
>>109034383
its for those with a learning impairment or developmental disability
>>
>>109034393
load them all, dynamic prompt them in
>>
>>109034393
Yes
>>
File: 1764433050416465.png (494 KB, 427x679)
494 KB PNG
>>109034295
I'm anti-porn so... I made a VN, some minigames, and a couple personal sites, some pure aesthetic things.

But AI is still very hated so I dont get any love.

So really, if you're an "ideas" guy, right now the ceiling is basically 2d games, VNs, etc.
>>
>>109034393
yes there is a way, just vibecode a lora loader node that cycles after each output
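You don't even need a custom node if you export the workflow in API format and patch it from a script. A minimal sketch, assuming the workflow has one `LoraLoader` node and one `KSampler` node (the node ids "10" and "3", the file names, and the `lora_sweep` helper are all made up for illustration, check your own export for the real ids):

```python
import copy


def lora_sweep(workflow, lora_node_id, sampler_node_id, lora_names, seed):
    """Return one patched copy of the workflow per LoRA, all sharing the same seed."""
    jobs = []
    for name in lora_names:
        wf = copy.deepcopy(workflow)  # never mutate the template
        wf[lora_node_id]["inputs"]["lora_name"] = name
        wf[sampler_node_id]["inputs"]["seed"] = seed
        jobs.append(wf)
    return jobs


# Stripped-down stand-in for an API-format export (a real one has many more nodes).
base = {
    "3": {"class_type": "KSampler", "inputs": {"seed": 0, "steps": 20}},
    "10": {
        "class_type": "LoraLoader",
        "inputs": {
            "lora_name": "placeholder.safetensors",
            "strength_model": 1.0,
            "strength_clip": 1.0,
        },
    },
}

names = [f"lora_{i:02d}.safetensors" for i in range(24)]
jobs = lora_sweep(base, "10", "3", names, seed=1234)
print(len(jobs), jobs[0]["10"]["inputs"]["lora_name"])
```

Each dict in `jobs` can then be queued however you normally submit API-format workflows, e.g. by POSTing it to the server's prompt endpoint. Since only the lora name changes, the 24 outputs are directly comparable.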
>>
>>109034415
ty, but why attach your slop? so i don't believe a word you say?
>>
>>109034415
>I made a VN
Help me senpai
what's the best way to retain character likeness across the entire thing?
>>
>>109034446
Make a lora.
>>
>>109034295
no, all types of art and also porn

sure, also "of value", see all the increasing usage in politics and advertising and also work.
>>
File: 1752891259107156.png (3.15 MB, 1888x1050)
3.15 MB PNG
>>109034446
Its hard... very hard.

Right now, making sprites is easy: you just generate the base. I usually generate a full body so I can position it any way I want (usually stands at around 1500x2600 pixels)

then I generate their mouth flaps (2 frames) and their blinking (3 frames)

Expressions are hard because you don't want the hair to change position so I use the auto segmenter or whatever its called when inpainting.

So then you have a sprite, but what about the UI and stuff.
I just use Nano Banana or GPT image 2 to generate the little buttons and such.

Then the hardest part of all. CGs
Getting consistency in CGs is basically the shittiest part about making VNs - because you're not just fighting camera angles, and background consistency but also character consistency, genuinely at this point you just have to open photoshop and learn how to "inpaint draw" which is a mix of drawing and inpainting.

So yeah, its not easy, but at least the coding is a complete afterthought.
>>
>apicucks can't even activate their neurons properly
i pity you
>>
It is always interesting to read anons blockers and witness him blame the tool rather than his own lack of skill and knowledge...
>>
>>109034494
You can probably make UI mockups with ID4 now
>>
>>109034552
How true is this? Can someone post an example? What else game related stuff can it do? I need some topdown tilesets.
>>
>>109034567
You map out each element with a bbox
>>
just use nano banana pro
>>
>>109034612
do you have a free account for me to use?
>>
File: 1765642058930341.jpg (12 KB, 385x304)
12 KB JPG
how to stop prompt bleeding in anima
>>
>>109034612
This is /ldg/ sweaty, we don't like your kind (browns).
>>
Almost time for a new thread. How you feeling about your chances of getting into the collage? I’m feeling pretty confident myself
>>
>>109034683
i have zero chance
>>
File: Wan21_SCAIL2_00002.mp4 (3.81 MB, 1872x528)
3.81 MB MP4
>>109033748
It appears Wan2.1 SCAIL-2 is better than Wan2.2 Bernini for face tracking.
>>
>>109034552
> ID4
Is it really four? Never heard about 1, 2 and 3.
>>
File: igram.jpg (290 KB, 1328x752)
290 KB JPG
>>109034446
lora or the various methods to use a character/face reference (easiest of which is probably just using edit capable image/video models that are good at retaining likeness)
>>
>>109034689
looks good, im gonna test it
>>
fast ideogram when. great model, but seriously slow...
>>
>>
File: igram4.jpg (275 KB, 1328x752)
275 KB JPG
>>109034691
4 is the first one you'd care about


