/g/ - Technology

Discussion of Free and Open Source Text-to-Image/Video Models

Prev: >>107227636

https://rentry.org/ldg-lazy-getting-started-guide

>UI
ComfyUI: https://github.com/comfyanonymous/ComfyUI
SwarmUI: https://github.com/mcmonkeyprojects/SwarmUI
re/Forge/Classic/Neo: https://rentry.org/ldg-lazy-getting-started-guide#reforgeclassicneo
SD.Next: https://github.com/vladmandic/sdnext
Wan2GP: https://github.com/deepbeepmeep/Wan2GP

>Checkpoints, LoRAs, Upscalers, & Workflows
https://civitai.com
https://civitaiarchive.com/
https://openmodeldb.info
https://openart.ai/workflows

>Tuning
https://github.com/spacepxl/demystifying-sd-finetuning
https://github.com/Nerogar/OneTrainer
https://github.com/kohya-ss/sd-scripts
https://github.com/tdrussell/diffusion-pipe

>WanX
https://rentry.org/wan22ldgguide
https://comfyanonymous.github.io/ComfyUI_examples/wan22/

>NetaYume
https://civitai.com/models/1790792?modelVersionId=2298660
https://nieta-art.feishu.cn/wiki/RY3GwpT59icIQlkWXEfcCqIMnQd
https://gumgum10.github.io/gumgum.github.io/
https://huggingface.co/neta-art/Neta-Lumina

>Chroma
https://huggingface.co/lodestones/Chroma1-Base
Training: https://rentry.org/mvu52t46

>Illustrious
1girl and Beyond: https://rentry.org/comfyui_guide_1girl
Tag Explorer: https://tagexplorer.github.io/

>Misc
Local Model Meta: https://rentry.org/localmodelsmeta
Share Metadata: https://catbox.moe | https://litterbox.catbox.moe/
GPU Benchmarks: https://chimolog.co/bto-gpu-stable-diffusion-specs/
Img2Prompt: https://huggingface.co/spaces/fancyfeast/joy-caption-beta-one
Txt2Img Plugin: https://github.com/Acly/krita-ai-diffusion
Archive: https://rentry.org/sdg-link
Bakery: https://rentry.org/ldgcollage

>Neighbors
>>>/aco/csdg
>>>/b/degen
>>>/b/realistic+parody
>>>/gif/vdg
>>>/d/ddg
>>>/e/edg
>>>/h/hdg
>>>/trash/slop
>>>/vt/vtai
>>>/u/udg

>Local Text
>>>/g/lmg

>Maintain Thread Quality
https://rentry.org/debo
>>
comfy should be dragged out on the street and shot
>>
cursed spitebake of misery
>>
>>107238019
kys julien
>>
Blessed thread of frenship
>>
>>107237999
>grok video in faggollage
>>
>>107238050
it's a troll bake
real thread:
>>107237888
>>107237888
>>107237888
>>
>>107238080
No one is going to use tranistudio, not now, not ever. Give up.
>>
Comfyui is fucking dog shit
>>
Can someone please explain to me why julien is off his meds? He has been behaving very erratically over the past week. I know he has autism, but usually he knows better than to try and mess with the /ldg/ OP.
>>
so there's this implementation of kandinsky in comfyui
https://github.com/Ada123-a/ComfyUI-Kandinsky/
but then kandinsky team has their own implementation?
https://github.com/kandinskylab/kandinsky-5/tree/main/comfyui

anyone tried either?
>>
File: ComfyUI_00261_.png (1.18 MB, 1280x1120)
>>
>>107237999
put anistudio in the op so the schizo has to spread across all chan diffusion threads. fill the other thread first
>>
File: ComfyUI_00266_.png (1.01 MB, 1280x1120)
>>107238405
the anti anistudio schizo is a netayume poster, I know this cause I mindbroke him by asking for an anti-netayume poll, and he copied my idea for his anti ani schizo polls
>>
>>107238226
The unofficial one seems a lot better. Try that.
>>
File: 1733915560397991.png (241 KB, 1347x959)
>>107238461
I am but it's taking a while. Oh, I just noticed it's 50 steps. Shiiiieeet. Spoiled by lightx2v
>>
MIGRATE TO COMFY THREAD
>>107238591
MIGRATE TO COMFY THREAD
>>107238591
MIGRATE TO COMFY THREAD
>>107238591
...
>>
Again, what set ani off?
>>
File: 1763203829792991.mp4 (1.92 MB, 736x496)
>>107238557
>Prompt executed in 01:08:46
wew. not worth it
>>
>>107239370
Damn that's rough
>>
I fixed a major bug with my Kandinsky implementation, try again
>>
didn't notice because I mostly did short videos for testing, but full-length videos had noise issues
>>
there is still an issue with windows not liking my torch compile stuff, so windows may still throw error messages btw, but noise should be fully fixed
>>
actually there might be one more issue... this is complicated. I blame their own implementation being rough
>>
>>107239370
what's the gen time for a single image?
>>
File: 1746449791802563.mp4 (1.1 MB, 688x448)
>30 steps
>hatsune miku is sitting at a desk typing on a laptop. the laptop faces away from the camera. hatsune miku turns the laptop to face the camera. on the laptop screen is the black text "/ldg/" on a white background. hatsune miku smiles and does a peace sign with her hand
It... it doesn't know migu
>>
ok, the original repo had a bug with tiled vae decoding which caused big noise issues. I had wrongly thought scheduler_scale was the issue, because the repo's documentation there is bad, with defaults and suggestions not matching
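for anyone wondering how a tiled decode bug turns into noise: if tiles aren't blended across their overlap, every seam shows up as blocky junk. A minimal sketch of what correct overlap blending looks like, assuming a generic SD-style vae.decode() with an 8x spatial scale factor; this is illustrative, not the repo's actual code:
[code]
import torch

# Hedged sketch of overlap-blended tiled VAE decoding (not the repo code).
# Assumes vae.decode(), an 8x spatial scale, and a latent >= one tile per axis.
def decode_tiled(vae, latent, tile=64, overlap=16, scale=8):
    b, c, h, w = latent.shape
    out = torch.zeros(b, 3, h * scale, w * scale, device=latent.device)
    weight = torch.zeros_like(out)
    ramp = torch.linspace(0.0, 1.0, overlap * scale, device=latent.device)
    for y in range(0, max(h - overlap, 1), tile - overlap):
        for x in range(0, max(w - overlap, 1), tile - overlap):
            y0, x0 = min(y, h - tile), min(x, w - tile)  # clamp the last tile
            piece = vae.decode(latent[:, :, y0:y0 + tile, x0:x0 + tile]).float()
            mask = torch.ones_like(piece)
            if y0 > 0:  # cross-fade the top edge into the tile above
                mask[:, :, :overlap * scale, :] *= ramp.view(1, 1, -1, 1)
            if x0 > 0:  # cross-fade the left edge into the tile to the left
                mask[:, :, :, :overlap * scale] *= ramp.view(1, 1, 1, -1)
            ys, xs = y0 * scale, x0 * scale
            out[:, :, ys:ys + tile * scale, xs:xs + tile * scale] += piece * mask
            weight[:, :, ys:ys + tile * scale, xs:xs + tile * scale] += mask
    return out / weight.clamp(min=1e-6)  # normalize wherever masks overlap
[/code]
skip the blending (or the normalization) and you get exactly the kind of full-length noise described above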
>>
>>107239934
Is this kandinsky? Looks like actual anime, not like that 3dslop wan produces
>>
File: 1751974069161743.png (96 KB, 604x832)
>>107240058
>Is this kandinsky?
Yeah. Now trying to describe her appearance and see what happens
>>
schizo holocaust when
>>
>>107239934
>it doesn't know migu
Thank goodness! Based model
>>
>>107239934
I believe at this point all major models instruct whatever they are using to tag images not to tag copyrighted characters or real people.
There is no other explanation for why almost everything released since SDXL struggles to do even the most popular characters.
>>107240076
Won't help too much. It's not the same as knowing the character, for example the facial features will be off.
>>
>>107240223
now
>>
File: 1754328866637162.mp4 (2.08 MB, 688x448)
>it really doesn't know miku
it's over...
>>
>>107240390
the thing is, its 2B, so you could super easily train it who that is with just a few images
>>
File: 1747737947788309.png (1.9 MB, 1120x1440)
>>
>>107240408
that was the 20b q4 gguf
>>
XL until the heat death of the universe
>>
btw, still some bugs in the implementation. I should have kept it hidden until it was finished
>>
>>107240517
It's not bleeding edge unless it cuts. Slap a warning on that bad boy and call it a day
>>
File: 1737341319542217.jpg (1.55 MB, 1248x1824)
>>
File: comfyui__00036_.png (1.02 MB, 1024x1024)
>>
We need Alibaba to keep releasing their models to the community.
>>
>>107240791
Alibaba said no.
>>
They should make a computer that runs not on electricity but from you fucking it with your penis and it giggles
>>
>>107240797
[citation needed]
>>
>>107240818
Trust me, insider sources who wish to remain anonymous have told me this.
>>
>>107240791
Putting all your hopes in a single company is mega retarded doodoo head
>>
>>107240969
No it isn't, you stupid idiot. They're the only company that has been continually releasing its video models up to this point, so it's only natural they are the best hope to keep releasing good video models.
>>
>>107240390
>Zero Japanese knowledge

I would say that's impressive, but given they wanted the model to know Russian, that gives a clue as to why it doesn't know Miku. A shame; on to waiting for something good from China.
>>
>>107237999
Been a bit out of the loop: are AMD cards still shit for local AI, and if not, would the R9700 be a good card if you can't go for the top notch stuff and want to be a bit future proof with the 32gb of ram?
>>
>>107241081
>are AMD cards still shit for local AI
yes
>>
>>107241025
Give me the name of a company who delivered more than once and didn't pivot to closed source. There is not a single lab who released something wildly successful and then followed it up with another. Subsequent releases are always either shit or closed source.
It's not controversial to say someone will take Alibaba's local throne. That's just the way it is, the way it has been since the invention of genai.
>>
>>107241081
>are AMD cards still shit for local AI
y
>the R9700 be a good card
n
>want to be a bit future proof with the 32gb of ram
lol, 32gb struggles now, much less future proof
>>
>>107241093
>>107241109
Well, fuck. Thanks for the quick reply though.
>>
>>107241138
If you use Linux then you can make AMD work. Nvidia is said to have better performance of course but I wouldn't know.
>>
>>107241109
How about when you just wanna do ComfyUI and similar stuff?
>>
Is a regularization dataset supposed to be tagged with simply "a photo of a man" etc?
>>
>>107239695
KandinskyImageToVideoLatent has an extra tab before the latent_frames declaration, so it ends up inside the exception handler.
>>
Asked in wrong thread fuck my life. Anyways

I've been using i2v for wan 2.2 a shit ton, I like the 3d blender type of style used in animations. Is there a local gen model that's actually good at that so I can gen my own base images?
Last I used local imagegen illustrious was the meta and that was awful at any 3d
>>
>>107241801
Nothing spectacular but in my experience Flux and Qwen are the least-worst at generating 3d render style images.
Flux produces cartoony-looking people in that style and Qwen has absolutely zero variety in its outputs, so pick your poison.
Also I am definitely no expert but try playing around with various sampler+scheduler combinations, I think somebody said that some combination of (deis, heun) sampler and (beta, linear_quadratic) scheduler gets decent results in that style. Play around and see what you get.
>>
>>107241790
thank you
pushed some other changes too, and added a preview
>>
>>107241081
>are AMD cards still shit for local AI
they're fine on linux.

>and if not would the R9700 be a good card
for LLMs, yes
https://www.phoronix.com/review/amd-radeon-ai-pro-r9700/2

in image gen benchmarks, 7900 XTX appears to be faster, but that could be due to immature R9700 drivers. I haven't seen a really trustworthy benchmark comparing this. I suggest considering a 7900 XTX, since it's cheaper and still has 24GB VRAM.

>>107241093
>>107241109
njudea FUD
>>
File: ipndm_beta_10step_00001_.png (1.33 MB, 768x1280)
>>
>>107241906
if you want to deal with troubleshooting and non-existent support, have fun. if you just want to gen then get an nvidia
>>
>>107241958
It's true AMD still requires a bit more config and research than nvidia for local gen, but it's nothing crazy if you're not low IQ. and this is cutting edge experimental tech, you will have to troubleshoot issues alone no matter what brand you're using. someone who is scared to use an AMD card shouldn't bother with local gen yet anyway, they'll get frustrated and give up the moment they try to work with comfyui.
>>
>>107241892
Thanks for running the vibe to get this going. The i2v for this model is really fast, with comparable outputs to wan so far.
>>
>>107238261
I'm more triggered by the broccoli head
>>
>>107242016
yep, and the biggest deal is that people will be able to do full finetunes on it since it's only 2B. I think the 20B will be an afterthought. 2B should be the new sdxl, small enough for people to actually bother training
>>
>>107241958
FUN
UUN
NNN
>>
>>107242016
>doesn't post the outputs
>>
File: img_00009_.jpg (773 KB, 1264x1656)
>>
File: 1759604701295889.mp4 (3.55 MB, 640x592)
>kandinsky5lite_i2v_5s
Uhh, I guess the input image is just a suggestion
>>
>>107242253
That's literally me when I hide my power level IRL.
>>
>>107242138
nta, but here is a 2B I2V attempt
https://files.catbox.moe/3dgy3a.mp4
>>
>>107241559
no, the model learns to gen these images
>>
>>107242310
So you don't tag them at all?
>>
File: ComfyUI_00034_.mp4 (476 KB, 480x832)
>>
>>107242325
Bazinga!
>>
>>107242283
another attempt
https://files.catbox.moe/qtd3qm.mp4
>>
>>107242318
i don't use reg images
>>
File: img_00018_.jpg (631 KB, 1264x1656)
>>
File: 1762277945847062.mp4 (3.54 MB, 448x784)
>the woman grabs her breasts. the woman massages her breasts. she sticks her tongue out and sneers at the camera
k, a second woman instant-transmissioned into the frame. kandinsky i2v 2b
>>
>>107242466
did you pull latest? a bit ago I fixed a error with I2V >>107241892
>>
>>107242486
yeah I cloned like 15 minutes ago, I got the preview window
>>
File: ComfyUI_08583_.png (1.94 MB, 1152x1152)
>>
>>107242510
huh, I just used it fine a moment ago.

10.0 scheduler scale, 5.0 cfg, 20-50 steps, 768 x 512 res?
>>
>>107242466
>teleports behind u
>>
>>107239695
anon should add it to OP https://github.com/Ada123-a/ComfyUI-Kandinsky/
>>
>>107242528
What the fuck is that seating arrangement.
Their eyes are fucked up.
The trolley for the refreshments is retarded looking.
Terrible
>>
File: 1753805762102054.mp4 (3.02 MB, 496x736)
>>107242533
>10.0 scheduler scale, 5.0 cfg, 20-50 steps, 768 x 512 res
yeah, this is 20 steps
>the woman turns around and types on the computer keyboard. on the computer monitor appears the text "/ldg/" in black font on a white background. the woman looks back at the camera and smiles
don't know how this is comparable to wan desu
>>
>>107242637
what in the world. Something is wrong here. Mine is not doing that and I'm the one who pushed the changes.
Here is an earlier I2V before I fixed the noise
https://files.catbox.moe/la9r93.mp4
>>
I must have missed something, I'm checking
>>
ok, now pull and try it
>>
File: 1752460198317977.mp4 (3.73 MB, 736x496)
>the anime girl gets inside the car and closes the door. the camera follows the car as the car drives off into the distance
wasn't expecting that. it seems to have trouble comprehending stuff already in the frame. it is only 2b though
>>107242757
will do
>>
>>107242325
>>
>>107242767
I was not fully passing the conditioning for I2V, missed one line
visual_cond_input[:1] = visual_cond_typed[:1]
>>
How do people make videos with high consistency that are longer than 5 seconds?

I can output 9 sec gens on wan 2.2 using the workflow from the rentry in the OP, but anything more and my 5090 runs out of vram and the video loses cohesion toward the end
>>
>>107242805
there are multiple ways, but none of them have high consistency except wan animate, and the quality of those videos sucks donkey dick.
>>
also what scheduler scale is best for I2V is still unknown to me, I have not tested it enough. 10 or 5 maybe
>>
File: WAN22_00048.mp4 (795 KB, 512x768)
>>
ok, this is 2B I2V with the fix
https://files.catbox.moe/qq0m6h.mp4
>>
>>107242892
oh, and I used 5.0 scheduler scale for this one, that might be better for I2V
>>
File: wan22_00060.mp4 (2.08 MB, 480x608)
>>
>>107242892
what are you trying to test or prove anon?
>>
gonna try to make a difference lora so the distill can be used as a lora on the I2V model
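for anyone unfamiliar: a difference lora is made by low-rank-factoring the weight delta between the distilled and base checkpoints. A rough sketch, assuming two state dicts with matching keys; the lora_up/lora_down key names are illustrative, not any trainer's real schema:
[code]
import torch

# Hedged sketch of difference-LoRA extraction: factor (distill - base) into a
# low-rank product. Key naming here is illustrative only.
def extract_diff_lora(base_sd, distill_sd, rank=32):
    lora = {}
    for key, w_base in base_sd.items():
        w_dist = distill_sd.get(key)
        if w_dist is None or w_base.ndim != 2:  # only factor plain 2D matrices
            continue
        delta = w_dist.float() - w_base.float()
        U, S, V = torch.svd_lowrank(delta, q=rank)  # delta ~= U @ diag(S) @ V.T
        lora[f"{key}.lora_up"] = (U * S).contiguous()  # [out_dim, rank]
        lora[f"{key}.lora_down"] = V.T.contiguous()    # [rank, in_dim]
    return lora
[/code]
apply it at some strength on the target model and you get (approximately) the distill behavior as a lora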
>>
>>107242923
>>107242767
>>
>>107242931
wrangle it to i2v?
>>
>4 years and 4,000,000 generations later
>still shit and not worth saving
>>
>>107242956
fixing an error that it had
>>
File: ComfyUI_08601_.png (1.76 MB, 1152x1152)
>>107242593
That is very good for 8 steps and first try of pure txt2img on a Flash model. The eyes are a quirk of my prompting. The alternative with other models is slopness or not being able to follow my prompt at all.
>>
File: ComfyUI_08608_.png (1.61 MB, 1152x1152)
>>
File: ComfyUI_08599_.png (1.96 MB, 1152x1152)
>>
File: 1741707031146034.mp4 (3.88 MB, 496x736)
>the green frog takes out a cigar and zippo lighter from his pockets. he puts the cigar in his mouth and lights the end of the cigar with the lighter. he inhales then exhales a puff of smoke and smiles
meh. back to genning myself hugging hot sluts with wan
>>
>>107242805
It's still not there yet. While there's longcat and svi, they still rely too much on daisy chaining (see the sketch below). The best ones that do "1 minute" gens still suffer from janky movement every 81 frames; take this for example: https://www.reddit.com/r/StableDiffusion/comments/1oh4q3w/wan21_svishot_lora_long_video_test_1min/

There's a simpler method where you don't have to fuck around in another application:
https://github.com/princepainter/ComfyUI-PainterLongVideo
I tried two gens and didn't see any color burn, but it still suffers from the noticeable 81-frame jank.

Best is to use woct0rdho's radial attention/sage/sparge/triton (sadly at fixed dimensions), pusa loras, 245+ frames, and pray you don't oom, kek
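for reference, the naive daisy chain everyone means is just this: feed the last frame of each clip back in as the next i2v start image. Only one frame of context survives each boundary, which is exactly where the jank comes from. A sketch, where generate_i2v() is a hypothetical stand-in for whatever workflow/API you actually call and clips are lists of frames:
[code]
# Naive last-frame chaining sketch. generate_i2v() is hypothetical, a
# placeholder for your actual i2v call; it is NOT a real library function.
def chain_clips(first_frame, prompt, segments=4, frames=81):
    video, start = [], first_frame
    for _ in range(segments):
        clip = generate_i2v(start, prompt, num_frames=frames)  # hypothetical
        video.extend(clip[:-1])  # drop the boundary frame so it isn't doubled
        start = clip[-1]         # the single frame of carried-over context
    video.append(start)
    return video
[/code]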
>>
>>107243056
or use vace
https://markdkberry.com/workflows/research/#vace-22---extending-video-clips
>>
i can't even get a small enough output to post here with kandinsky
>>
>>107243071
Yes, vace can be good too. I've often found there are constant color shifts and inconsistencies (changing background items, body or facial feature changes). bbaudio's nodes seem to do a pretty good job with this, and the issues are less obvious:
https://github.com/bbaudio-2025/ComfyUI-SuperUltimateVaceTools/tree/main

There's the wan2.2 vace fun node he recently added, but man, it is slow switching between high and low noise
>>
File: DiscoElysium_00014_.jpg (1018 KB, 1256x1704)
>>107243091
ask chatgpt etc for a python script that converts video to under 4 MB
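something like this does the job, assuming ffmpeg and ffprobe are installed and on PATH; the video bitrate is derived from the size cap divided by the clip duration, and audio is stripped since board mp4s don't have any:
[code]
import subprocess

# Size-targeted re-encode: fit a clip under the 4 MB upload cap.
# Assumes ffmpeg and ffprobe are on PATH.
def encode_under_4mb(src, dst, limit_mb=3.8):  # leave headroom under 4 MB
    dur = float(subprocess.check_output([
        "ffprobe", "-v", "error", "-show_entries", "format=duration",
        "-of", "default=noprint_wrappers=1:nokey=1", src]).decode())
    kbps = int(limit_mb * 8 * 1024 / dur)  # MB cap -> average kilobits/sec
    subprocess.run([
        "ffmpeg", "-y", "-i", src, "-an",  # -an strips audio
        "-c:v", "libx264", "-b:v", f"{kbps}k",
        "-maxrate", f"{kbps}k", "-bufsize", f"{2 * kbps}k", dst],
        check=True)

encode_under_4mb("input.mp4", "output.mp4")
[/code]
two-pass encoding would squeeze closer to the cap, but single-pass with a maxrate is usually enough for a short clip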
>>
nah.. last time i asked chatgpt for help with anything i lost my whole 1tb of sandboxed games
>>
>>107243071
it still accumulates error
>>
>>
File: ComfyUI_08633_.png (1.77 MB, 1152x1152)
>>107243021
That was pure txt2img again. But imagine, if you will, a proper Chroma edit model.
>>
>>107243091
>can figure out video gen but can't figure out how to re-encode a video
>>
>>107243021
prompt?
>>
>>107243237
t2V? workflow? this result is not bad
>>
>>107243272
>Amateur photograph, split view of a young beautiful Japanese idol woman. Her dark hair neatly pulled back with wisps framing her face. She is dressed in traditional Japanese attire, featuring a flowing white top and a vibrant red pleated skirt. The left side shows a selfie of her face, she is smiling and doing okay sign. The background, softly blurred, shows what appears to be a traditional Japanese building with a dark roof and wooden structures, situated on a bright, paved ground. The right side closeup of only her legs, squatting with the skirt lifted, and panties visible

I engineered it to be exactly >>107243032
But it prefers to show her face in most gens anyway. Though I suppose it's a skill issue, because if I keep mentioning it in the prompt then it's more likely to show it.
>>
>>107243299
thx
>>
>>107243286
i2v from the example included in the git repo
>>
>>107243254
bro this sucks, i'm sorry
>>
>>107243379
>nogen
>>
>>
>>107243430
so far, i havent seen anything that would make me want to use this over wan
>>
>>
>>107243399
come on, the shit is blurry and blocky and looks like shit. if you like the aesthetic, good for you, ignore my post, but that is objectively a bad gen: her nails are dogshit, the chain goes nowhere, the pattern on the doors is insane, and what is even hanging there?
>>
Anyone got any chroma loras? Been kinda bored genning
>>
>>107243559
>Anyone got any chroma loras? Been Kinda bored genning
Your turn to train and share, giddy up!
>>
File: 1733913747226673.png (1.81 MB, 1344x1728)
which is more intellectual, lmg vs ldg?
>>
>>
>>
>>107243755
Snu snu
>>
>>107243755
>>107243653
Far worse than wan2.2. Russians are fucking stupid.
>>
>>107243794
it's a 2b model senpai
>>
>>107243794
>Far worse than wan2.2.
It's blurry and not very detailed, but I don't know if that's the model or just too few steps / too low a resolution.
>>
>>107243822
it is very low resolution but also it takes a long ass time for each of these.. over 6 minutes on a 5090.. generally WAN doesn't take that long even with 20+ steps on a much higher resolution
>>
>>107243844
Sounds like a DOA model.
>>
>>107243851
Does she come in a refrigerated case?
>>
>>
File: ComfyUI_00039_.mp4 (2.98 MB, 1024x1024)
same prompt as ^ but wan
>>
>>107243894
6 1/2 minutes

>>107244044
4 1/2 minutes
>>
>>107244053
Damn. Surely there will be optimizations, hehe.
>>
File: ComfyUI_00040_.mp4 (3.27 MB, 720x1280)
>>107244044
5 minutes 21 seconds, better resolution
>>
>>107242861
kek
>>
>>107244058
no one's gonna bother making optimizations if there's no reason to use it over wan
>>
>>107244044
>>107244075
You're prompting Wan at max res though. Try prompting Wan at lower or perhaps unintended res like you are for kandinsky, it's shit too (practically unusable on 3090 due to this). Kandinsky has way better physics knowledge than Wan.

Basically Wan has a total of two res:
720 x 1280 or 480 x 832

But even 480 x 832 is inferior to 720 x 1280.
Everything else looks like shit.

Kandinsky is probably similar.
>>
no, it sucks
24fps was a retarded decision
>>
>>107244075
Dude, still rocking Sabrina lora? I love slicks
>>
>>107244107
Ran baked this thread. What a sad little man.
>>
>>107244217
upgrade to 64 gigs ram
off load vaedecode to cpu
>>
>>107244397
There was a lora of her for hunyuan but I'm surprised there's no wan 2.2 lora.
So I'm gonna guess this is i2v.
>>
>>107244409
comfy is the only decent UI when it comes to perf and stability. but if you insist on the gradio UI, use neo forge or something. a1111 is out of date
>>
>>107244424
Sorry, I'm new.
Which schizo strawman is that?
>>
>>107244436
Because they didn't receive enough hugs as a child.
>>
File: 1761301899575172.png (290 KB, 460x405)
>>107244449
Okay nice talking to you
>>
>>107244451
Are you barfanon from /v/?
>>
>>107244460
Sorry, I'm new.
Which schizo strawman is that?
>>
>>107244471
What does it mean?
>>
>>107244485
Okay, nice talking to you.
>>
>>107244500
You fucked up by using avatar op
>>
>>107244546
Sorry, I'm new.
Which schizo strawman are you referring to?
>>
>>107244580
What level of schizo does it take to not just wait for a thread that's 1/3 complete to finish? Are you really taking this that seriously? We're in the middle of discussing tech stuff and you're derailing by making another thread?
>>
>>107244602
Hey sorry, but I'm actually new and have never baked on this board.
Where's the thread you're referring to and which schizo personality are you conflating me with?
>>
>>107244617
it's obfuscated. you can't tell me what it's sending but I can tell you it's sending data. maybe go fuck yourself and learn op sec
>>
>>107244632
Sorry but I'm confused.
Which schizo fantasy botscript am I reading right now?
>>
Dare I say all the drama is coming from bots?
>>
you must mean 3.5 if "released 9 days ago" is true
what is the prompt, anyways? and like what sampler / scheduler / etc are you using

no one is saying it's like perfect quite yet anyways but it's definitely annoying to see people dismiss the clear advantages of better architectures. That's how we wind up in this endless cycle of "when new thing" -> "new thing comes out" -> "not nearly enough people make any attempt to work on / with creating resources for it or training it more"
>>
File: Untitledgsgsdgsg-1.mp4 (3.66 MB, 1200x674)
Some styles are so crisp.
>>
>>107245010
I'm not using any artist tags, any recommendations? I am but a humble 1girl gooner trying to generate sexy pictures of smug-looking bitches, which is another limitation I'm running into: either it can't understand facial expressions very well or it can't generate facial expressions that differ from how a given character is usually depicted.
>>
>>107245010
Very cool
>>
File: wan2.2_00167.mp4 (814 KB, 848x480)
>>107245071
Suffering from the usual with loops sadly.
>>
any better alternatives to Local Dream for Hexagon NPU on Android? shit's not FOSS
>>
File: wan2.2_00169.mp4 (993 KB, 496x368)
"the video starts with showing an old crt tv which is displaying a news channel about a girl, the camera then quickly pans out and pans to the left showing a wide angle view of a warehouse facility with a group of villain goons and a man dressed as the joker sitting on a pony and they are all laughing while the pony is chewing on dollar bills."
>>
>>107245071
gonna try this out tomorrow, i got three more things to train today
>>
>>107245079
what about treason for chinese gold
>>
>>107245093
I'm using NetaYume Lumina v3 which was released ~9 days ago according to Civitai. I'm aware of SDXL's limitations, believe me.
Maybe it's just this one particular prompt that it's having trouble with, but the problems seem to boil down to a lack of diversity in training data rather than the strengths of the algorithm itself.
>>
>>107245118
huh, Res Multistep Linear Quadratic (this gen) looks way better than Euler Beta (last one) on the same seed
>>
File: wan2.2_00176.mp4 (1.06 MB, 480x528)
Man, ropes are difficult, huh.
>>
>>107245221
please get an Nvidia gpu with higher vram. With 8gb of vram, you will have annoying issues running normal fp16 non-lightning sdxl models when using hires fix and upscaling. make sure you have 32-64gb of either ddr4 or ddr5 ram anon.
>>
File: ComfyUI_00002_.png (117 KB, 512x512)
Repost from previous thread
Is there any way to remove the noise?
I trained it using Illustrious 0.1
>>
>>107245163
I mean the app, not models
>>
File: wan2.2_00178.mp4 (925 KB, 640x480)
>>
File: 12512616151251.png (95 KB, 1108x1116)
>>107245221
"controversial" data like that is something they don't allow you to generate without jailbreaking the model.
>>
>how well does it handle something like a penis?
haven't tried, assuming not well
>>
>>107245316
Anime penis works, real world penis no. This seems to be the case because "porn" is anti-Chinese, thus they have to censor it. So whatever CPC propaganda says can't be done, can't be done with an AI without bypassing security features.
>>
>>107245266
uhmmm whats this non-freedom nonsense??
>>
remember when comfy posted fennec girl with a bag of money after getting $17M in funding and ani was seething uncontrollably
>>
File: wan2.2_00185.mp4 (1.1 MB, 960x720)
"the woman is looking at the sea, to then turn her head slightly as she thinks she hears something, she turns her head fully and gets surprised and sits up straight then gets happy to see the viewer as she starts to wave her hand hello cheerfully to the viewer. the ocean waves crash calmly at the beach rocks.
oil painting style."

My proompt-fu is getting better.

>>107245282
Fair enough.

>>107245380
Free laptop, bro.
>>
>>107245428
actually no I won't contrarian faggot
>>
File: wan2.2_00187-1.mp4 (3.76 MB, 800x628)
"the camera moves up and forward into the distance revealing a lively futuristic cityscape.
abstract and colorful oil painting style."

Damn, haven't done any cityscape stuff before.
>>
I'm from /ldg/ - Landscape Diffusion General. >>107238591
I see our acronyms are the same and people can get confused.

Request for the baker of this general:
Please change the acronym to /odg/ - OSS Diffusion General to avoid confusion.
>>
>>107245489
Julien should hang xirself
>>
>>107245489
kys
>>
>>107245483
Reminds me of Planetside 2
>>
File: wan2.2_00192-1.mp4 (3.75 MB, 1000x752)
>>
>>107245563
Obsessed schizo.
>>
>>107245797
so true xister
when a retarded niggerfaggot starts annoying everyone, one should stay quiet and do nothing, like a good cuck
>>
>immediate pol schizo meltdown
I see.
>>
remember when an anon here, on /ldg/, posted the fast cancel for comfy and some little redditor reposted it, and then it was officially implemented by comfy
>>
File: wan2.2_00194.mp4 (2.58 MB, 720x928)
"the girl tilts her head up towards the viewer, looking at the viewer, she is full of despair. her skin is that of a cracked paint on an oil painting.
she holds a human skull.
colorful rough oil and watercolor painting style."

Shame, the cracked paint doesn't stick on her skin.
>>
File: wan2.2_00196.mp4 (2.35 MB, 960x720)
"the man is in despair seeing the broken wine bottles, he then bends down and crawls over to the broken wine bottles and starts to lick the wine up from the ground.
colorful rough oil and watercolor painting style."
>>
>>107241081
AMD isn't great. However, if you use rocm from TheRock you get much better speeds; the latest build pretty much cut my gen times in half compared to using zluda. So if you are content with subpar speed compared to Nvidia, it's a lot more viable than in the past.
>>
>>107246167
and?
>>
>>107247060
it's the anti-ani schizo
>>
lodestone said he figured out the reason why chroma did not learn artist styles, and it's already learning them quickly. he needed to train at full fp32
>>
>>107247110
And?
Who here isn't anti-ani?
>>
>>107247242
>chroma
*yawn*
>>
>>107247325
it's the best at complex nsfw stuff and at non-ai art styles. It's basically a local midjourney that can do nsfw
>>
>>107247350
but I only care about anime
noob and neta already cover it for me :)
>>
>>107247360
yea, those are specialized models trained specifically for that with half a million dollars worth of compute
>>
>>107247242
Is he making a finetune or what?
>>
>>107247378
hes grifting as usual
>>
>>107247378
he is still training it from what I know, he just had to get ramtorch working in order to train at full precision
>>
logs over the course of a few weeks

okay FP32 is a must when training a model
the difference is at the basin
bf16 struggled so hard at the basin convergence
you can still do bf16 compute
but the accumulator states has to stay in fp32
so that means the master weights, and optimizer states
grad can stay in bf16 because it's a short accumulator
Feffy — 11/9/25, 10:11 PM
so mixed precision then
Lodestone Rock — 11/9/25, 10:11 PM
yes
but the optimizer has to be in fp32
Feffy — 11/9/25, 10:12 PM
stochastic rounding not good enough?
Lodestone Rock — 11/9/25, 10:12 PM
nope
Feffy — 11/9/25, 10:12 PM
even with kahan summation?
Lodestone Rock — 11/9/25, 10:12 PM
nope
at the basin you want to remove as much noise as possible
so any form of compression is intolerable
you can do bulk compute at bf16 first
but at the final say 10% of training do what you must to make sure the precision is as high as possible
do it in fp64 if you have to

fp32 accumulator is important :catree:
Bunzero (hates VLMs)

— 11/11/25, 3:03 AM
I remain skeptical :furry_gigachad:
Lodestone Rock — 11/11/25, 3:03 AM
radiance suddenly learned a lot of artist tags within a day of training in partial fp32
Bunzero (hates VLMs)

— 11/11/25, 3:04 AM
can the universe let me be right at least once :crying_cat:
Lodestone Rock — 11/11/25, 3:05 AM
im going to make it train at full fp32 accumulator state
as soon i fixed the ram sharing issue
you really cant bargain with the accumulator
well at least we have tools to mitigate this issue
Lodestone Rock — 11/11/25, 3:07 AM
on 8x4090
just to rub the salt on the wound even more
cuz 8xh100 couldn't do it because i need to train it on full bf16
cuz there was no ramtorch back then :synth_derp~1:
Bunzero (hates VLMs)

— 11/11/25, 3:08 AM
can't or couldn't
couldn't :synth_derp~1:
Lodestone Rock — 11/11/25, 3:08 AM
engrish
but yeah
guys train your shit in fp32
you cant do it in NVFP4
you cant do jack shit in NVFP4 lol
Bunzero (hates VLMs)

— 11/11/25, 3:11 AM
>>
but how did OAI do it then
Lodestone Rock — 11/11/25, 3:11 AM
they dont
:synth_derp~1:
they have bajilion of b200
so during training
any long running accumulator has to stay in fp32
so that means master weights, and optimizer state
because those things are literally an integrator
and you know yourself that integrator will accumulate error over time
that's literally control theory 101
during the span of training you literally doing integration of model vector in the model vector field where the vector field is the loss landscape itself
Lodestone Rock — 11/11/25, 3:18 AM
so any non white noise error will cause drift
Talan — 11/11/25, 3:19 AM
wait lode, did you added more danbooru and e621 data to chroma radiance training?
Lodestone Rock — 11/11/25, 3:19 AM
no
the data are identical to the previous run
Talan — 11/11/25, 3:19 AM
i vaguely heard someone said something about it
or me schizoing :mpreg_hydra:
Lodestone Rock — 11/11/25, 3:20 AM
i said i'll add it if i managed to fix the ram sharing issues
some of the states are sharing the ram but not others
the master weights are being shared but for some reason grad is not
or at least that's what i believe is happening

guys i just tried overfitting flow model to one example using bf16
it cant overfit to details like at all

int8 vs bf16
it's official
you no longer need nunchaku
this works on any model
no need calibration
2-4x speedups
metal63 — 11/16/25, 10:16 PM
training? or just inference
Lodestone Rock — 11/16/25, 10:19 PM
should be both
but i havent integrated it to ramtorch backward
im making your consumer gpus as powerful as datacenter gpus
Aura — 11/16/25, 10:20 PM
sorry, it's been forever since i've poked my head in here, what's this?
Lodestone Rock — 11/16/25, 10:20 PM
imagine nunchaku
but for any model
and can be used for training
the speedup is about 2-4x
i need to give amd some love too
need to create kernel that works on amd too
because amd tensor layout is different
>>
File: chroma___0001.png (1.78 MB, 832x1216)
anon please stop this nonsense at once
>>
>>107247527
>you no longer need nunchaku
holy snake oil seller
>>
>>107247676
man, have you even tried ramtorch? this man is doing the real work. I don't doubt him
>>
>repost bot spam is back
>>
>>107247242
and what was the reason?
>>
>>107247616
What is she even eating? Roasted seaweed dipped in some sauce? Is she a single celled organism filter feeding? Who the fuck "eats" that
>>
>>107247759
did you read what you responded to? or the log after it? Not training at full precision
>>
>>107247779
oh, but why would training at full precision suddenly fix the tags, given the model wasn't trained on something insane like fp4 or whatever, and given that the model didn't learn absolute shit when it comes to artist tags during its entire long training run?
would more precision really give it that much more capacity for knowledge being packed in within the same sized model?
>>
>>107247808
its all here:
>>107247507
>>107247527
>>
>>107247808
basically he had to train at full bf16 because that was what there was to work with. He didn't realize until later that the accumulator weights need to be fp32, or else noise in the form of rounding keeps the model from learning past a certain point / past a certain level of accuracy. Now he has been working on ramtorch to make it possible to train at mixed precision, and to train models with a fraction of the vram needed without speed loss. And within a single day chroma radiance started learning stuff it refused to learn at only bf16, like artist tags
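in code, the scheme from the log is roughly the classic mixed-precision loop: bf16 forward/backward, but fp32 master weights and fp32 optimizer moments so the long-running accumulators don't lose small updates to rounding. A toy sketch assuming PyTorch; the Linear is a stand-in for the real network, this is not lodestone's actual code:
[code]
import torch

model = torch.nn.Linear(16, 16).bfloat16()  # stand-in for the real net
# fp32 master copy of every parameter; the optimizer (and its moment
# buffers) only ever sees these, never the bf16 weights.
master = {n: p.detach().float().clone().requires_grad_(True)
          for n, p in model.named_parameters()}
opt = torch.optim.AdamW(master.values(), lr=1e-5)

def train_step(x, target):
    loss = torch.nn.functional.mse_loss(model(x), target)  # bf16 compute
    loss.backward()
    with torch.no_grad():
        for n, p in model.named_parameters():
            master[n].grad = p.grad.float()  # grad is a short accumulator,
    opt.step()                               # so bf16 -> fp32 here is fine
    opt.zero_grad()
    model.zero_grad(set_to_none=True)
    with torch.no_grad():
        for n, p in model.named_parameters():
            p.copy_(master[n])               # round back down to bf16
    return loss.item()
[/code]
the point of the log: the update math and moment buffers live in fp32, and only the weights the forward pass sees get rounded to bf16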
>>
>>107247937
also he said that for testing he tried deliberately overfitting a model at just bf16 and it was impossible to do so because of the bad precision, which explains the small detail issue

thing is, no one else, not even the big ai firms with their own code, tried training this way at this scale before him, so he is learning as he goes
>>
>>107247835
>>107247507
>>107247527
if this improves lora training quality too, i think a good idea would be for him to collaborate with ostris, who already partially implemented ramtorch for training loras in ai-toolkit, so they can properly implement something that works well enough to publish something marketable online and get a lot of eyes on this
>>
File: wan2.2_00212.mp4 (2.05 MB, 640x912)
"the woman turns 180 extending her left arm behind her and faces the camera as she extends her arm holding the katana and points the katana towards the viewer with an extreme up-close shot of the katana's tip."
>>
>>107247507
lol, lmao even

I am one of the "anti Chroma schizos" who, literally months ago, posted a breakdown of the many mistakes Chroma was making during training. One of the top things I pointed out was how using pure bf16 and stochastic rounding was fucking retarded and he should just use mixed precision training like everyone else. At least he finally came around, even if it took $150k flushed down the drain first.

Now let's see if he realizes all the other things that are wrong with the Chroma training setup.
>>
File: flux_0137.png (1.47 MB, 832x1216)
>>107248138
noice
>>
>>107248138
"controversial" data like that is something they don't allow you to generate without jailbreaking the model.
>>
>>107248339
huh, Res Multistep Linear Quadratic (this gen) looks way better than Euler Beta (last one) on the same seed
>>
>>107248308
post desu link or lying
>>
File: wan2.2_00217.mp4 (1.48 MB, 480x480)
"the camera zooms in very fast to the end of the hallway while twisting the camera. very fast and intense motion."

>>107248339
I love getting surprised by how good some stuff looks; the reflections are amazing.
>>
>>107248454
I'm using NetaYume Lumina v3 which was released ~9 days ago according to Civitai. I'm aware of SDXL's limitations, believe me.
Maybe it's just this one particular prompt that it's having trouble with, but the problems seem to boil down to a lack of diversity in training data rather than the strengths of the algorithm itself.
>>
>>107248477
have you seen his training data? it's about as diverse as possible; that is not an issue at all there. That shit is already the most style-diverse model there is atm. The issues are small details and it not learning past a certain point, which were apparently due to bf16 rounding errors
>>
>>107248455
What level of schizo does it take to not just wait for a thread that's 1/3 complete to finish? Are you really taking this that seriously? We're in the middle of discussing tech stuff and you're derailing by making another thread?
>>
>>107248454
https://desuarchive.org/g/thread/104885523/#104888771
>>
use already baked thread when done
>>107237888
>>107237888
>>107237888
>>
>>107248512
comfy is the only decent UI when it comes to perf and stability. but if you insist on the gradio UI, use neo forge or something. a1111 is out of date
>>
>>107248531
well you got me, you should have told him lol
>>
>>107248531
Damn. Surely there will be optimizations, hehe.
>>
>>107248537
it is very low resolution but also it takes a long ass time for each of these.. over 6 minutes on a 5090.. generally WAN doesn't take that long even with 20+ steps on a much higher resolution
>>
>>107248551
come on, the shit is blurry and blocky and looks like shit. if you like the aesthetic, good for you, ignore my post, but that is objectively a bad gen: her nails are dogshit, the chain goes nowhere, the pattern on the doors is insane, and what is even hanging there?
>>
>>107248537
No.
>>
>>107248618
>can figure out video gen but can't figure out how to re-encode a video
>>
though it shouldn't be a 'waste'. He can still just resume training with the accumulator at fp32. Only the time spent after he had 'maxed out' the accuracy bf16 could achieve would be a waste
>>
File: wan2.2_00220.mp4 (1.09 MB, 480x672)
"the camera pans in slowly as the cat walks up to the man and leaps onto his head and sits down on his head while the man reacts to the cat while holding a cigarette."

Damn, this was a cool one, first gen too.
>>
So I'm trying to into comfyui and tried this node and workflow here:
https://github.com/regiellis/ComfyUI-EasyIllustrious
Is it typical that it's just midwittery spaghetti json "code" where there are like 12 different pre- and post-processing effects that don't do anything or are even directly in opposition to each other?
Or am I just using the wrong node/workflow? sd next feels so much better out of the box
>>
File: wan2.2_00222.mp4 (598 KB, 704x480)
"the man, adolf hitler, points at the viewer with his hand and finger, then does a thumbs up as he smiles."

Cool, it doesn't warp the face.
>>
>>107248725
why
why the FUCK would you do this
base comfy has all the nodes needed to start out.
>>
>>107248780
Okay thats why I'm asking. Because it seemed retarded to me as I was doing it but I was just following LLM slop.
>>
>>107248823
just check the OP (1girl guide) it has a lot of basic workflows to start out.
>>
File: flux_0193.png (908 KB, 832x1216)
>you know what???? I"M GONAN SPLIT DA THRED
>>
barf
>>
File: meincumpf.png (19 KB, 717x143)
comfy is based
>>
File: wan2.2_00226-1.mp4 (3.76 MB, 1200x772)
"a group of camels walk across the desert as a massive fire and smoke rages behind them in the distance, heavy winds, fast motion."
>>
>>107249418
Based on what
>>
>>107249687
Python
>>
retard here, why are outputs with dmd/lightning lora better? Shouldn't the image get better with more compute?
>>
>>107248531
What do you think about the results from here https://civitai.com/models/2093591 where the lora description says you can use the qwen image edit lightning lora on the basic qwen image model instead, to kinda fix the low seed variety that qwen image has?

Makes the images a little grainy but seems to work. I guess the edit model's lora changes the base model enough to add seed variety, but not so much that it destroys the output, given the two models are similar.
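mechanically the trick amounts to merging the edit model's lora delta into the base model's matching weights at some strength; it only works because the two checkpoints are close cousins. A hedged sketch, with illustrative key names rather than any loader's real schema:
[code]
import torch

# Merge a LoRA into a checkpoint: W <- W + strength * (up @ down).
# The .lora_up/.lora_down key naming is illustrative only.
def merge_lora(model_sd, lora_sd, strength=1.0):
    for key, w in model_sd.items():
        up = lora_sd.get(f"{key}.lora_up")
        down = lora_sd.get(f"{key}.lora_down")
        if up is not None and down is not None:
            delta = up.float() @ down.float()
            model_sd[key] = (w.float() + strength * delta).to(w.dtype)
    return model_sd
[/code]
turning strength down is how you'd trade the graininess against the added seed variety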
>>
>>107249418
what 'native' block swap nodes are they referring to? kija's?
>>
>>107249717
cringe
>>
>noooo you can't use blockswap
> please make more all in one node packs with 90% useless nodes instead
>>
>>107249418
Which blockswap is he talking about?
>>
>>107249751
>why are outputs with dmd/lighting lora better
they are?
>>
I don't understand why anyone needs a block swap node anyway. UnetLoader from MultiGPU already has an option for putting in how much ram you want swapped.
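for anyone unclear on what these options actually do: block swap parks the transformer blocks in system RAM and shuttles each one onto the GPU only for its own forward pass. A naive sketch with PyTorch hooks; real implementations pipeline the copies so transfers overlap compute instead of stalling it:
[code]
import torch

def _upload(module, args):
    module.to("cuda")  # just-in-time upload for this block's forward

def _evict(module, args, output):
    module.to("cpu")   # immediately evict the weights back to system RAM

# blocks: an iterable of torch.nn.Module transformer blocks
def enable_block_swap(blocks):
    for block in blocks:
        block.to("cpu")  # start off-GPU; activations stay on the GPU
        block.register_forward_pre_hook(_upload)
        block.register_forward_hook(_evict)
[/code]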
>>
why would I swap my ram? my ram works fine! I can't afford to swap ram every gen!
>>
File: wan2.2_00232.mp4 (296 KB, 512x480)
"man, adolf hitler, is playing a video game holding a game controller in his hands, he lets go of it with one of his hands and points to the left laughing as he stomps his leg."
>>
>>107250114
INTERPOLATE WITH FILM VFI NIGGEEEEEEEEEEER
>>
>>107250128
no
>>
>>107250128
That takes longer than the gen, I'm just going through folders.
>>
>>107249967
yes at least for XL
>>
>>107250128
*gimm-vfi
>>
>>107250215
no, film vfi has better physics in its interpolation, basically topaz level for 16 to 32 fps interpolation
>>
>>107249418
wtf is this from
>>
>>107250278
sounds like something a nigger would claim
>>
cool it with the racism, buds. take that to X the racist app.
>>
Is a future with a UI that doesn't have 30GB of python dependencies possible?
>>
>>107250469
We are continuing to investigate this issue. In the meantime we recommend you use AniStudio.
>>
>>107250205
you might be the only one here who feels that
>>107250469
maybe in a decade
>>
>>107250469
shhhh don't say it out loud or the comfyorg goons will derail the thread. there is an anon working on it though
>>
File: wan2.2_00237.mp4 (432 KB, 480x480)
"the cartoon man is dancing. the text "IT'S AN ABSTRACT KIND OF FEEL" remains throughout the video."

Why am I the only one posting gens?
>>
>>107250469
Incoming rust port. It's 30GB+ but it's memory safe.
>>
>>107250469
just buy more storage until we get AGI to fix this issue unironically, nothing else can
>>
>>107250501
>Why am I the only one posting gens?
Sorry I'm training right now
>>
>>107250501
i post my gens in the real thread
>>
>>107250511
storage is going up in price, as are memory and vram. the future sucks



All trademarks and copyrights on this page are owned by their respective parties. Images uploaded are the responsibility of the Poster. Comments are owned by the Poster.