Discussion of Free and Open Source Diffusion Models

Prev: >>107823785

https://rentry.org/ldg-lazy-getting-started-guide

>UI
ComfyUI: https://github.com/comfyanonymous/ComfyUI
SwarmUI: https://github.com/mcmonkeyprojects/SwarmUI
re/Forge/Classic/Neo: https://rentry.org/ldg-lazy-getting-started-guide#reforgeclassicneo
SD.Next: https://github.com/vladmandic/sdnext
Wan2GP: https://github.com/deepbeepmeep/Wan2GP

>Checkpoints, LoRAs, Upscalers, & Workflows
https://civitai.com
https://civitaiarchive.com/
https://openmodeldb.info
https://openart.ai/workflows

>Tuning
https://github.com/spacepxl/demystifying-sd-finetuning
https://github.com/ostris/ai-toolkit
https://github.com/Nerogar/OneTrainer
https://github.com/kohya-ss/musubi-tuner
https://github.com/kohya-ss/sd-scripts
https://github.com/tdrussell/diffusion-pipe

>Z Image Turbo
https://huggingface.co/Tongyi-MAI/Z-Image-Turbo

>WanX
https://github.com/Wan-Video/Wan2.2
https://comfyanonymous.github.io/ComfyUI_examples/wan22/

>LTX-2
https://huggingface.co/Lightricks/LTX-2

>Chroma
https://huggingface.co/lodestones/Chroma1-Base
https://rentry.org/mvu52t46

>Illustrious
https://rentry.org/comfyui_guide_1girl
https://tagexplorer.github.io/

>Misc
Local Model Meta: https://rentry.org/localmodelsmeta
Share Metadata: https://catbox.moe|https://litterbox.catbox.moe/
GPU Benchmarks: https://chimolog.co/bto-gpu-stable-diffusion-specs/
Img2Prompt: https://huggingface.co/spaces/fancyfeast/joy-caption-beta-one
Txt2Img Plugin: https://github.com/Acly/krita-ai-diffusion
Archive: https://rentry.org/sdg-link
Bakery: https://rentry.org/ldgcollage

>Neighbors
>>>/aco/csdg
>>>/b/degen
>>>/r/realistic+parody
>>>/gif/vdg
>>>/d/ddg
>>>/e/edg
>>>/h/hdg
>>>/trash/slop
>>>/vt/vtai
>>>/u/udg

>Local Text
>>>/g/lmg

>Maintain Thread Quality
https://rentry.org/debo
https://rentry.org/animanon
>>
load video as image input, 17 frames (from a mp4), kek:

https://files.catbox.moe/305yaa.mp4
>>
Importing gifs in LTX2 is fun, despite their low resolution fucking up the result.
https://files.catbox.moe/9rp28s.webm
>>
>>107827006
twitch be like:
>slur = ban
>animal torture = good
>>
File: ComfyUI_temp_higpi_00104_.png (3.57 MB, 1824x1248)
https://files.catbox.moe/02v7iz.png
>>
File: x.png.png (1.93 MB, 1536x1024)
>>
https://files.catbox.moe/ewn3t2.png
>>
File: ComfyUI_temp_higpi_00078_.png (3.32 MB, 1824x1248)
https://files.catbox.moe/uy8vjh.png
>>
>>107826985
garbage collage
>>
File: ComfyUI_temp_higpi_00071_.png (3.59 MB, 1824x1248)
https://files.catbox.moe/c9b3pr.png
>>
>>107826985
>>Maintain Thread Quality
>https://rentry.org/debo
>https://rentry.org/animanon
wtf I thought we removed these already? is the troontard back or something?
>>
File: 1766018059432434.png (1.47 MB, 1024x1472)
>>
File: 1763933658700323.png (1.61 MB, 1632x928)
>>
video input with 17/33 frames as the cap is really good for i2v, but it follows the start of the original video.

https://files.catbox.moe/xoj73n.mp4
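for reference, the "load video as image input" part is roughly this under the hood, a minimal opencv sketch (filename is a placeholder): the first N frames of the clip become the conditioning instead of a single i2v start image.

import cv2  # pip install opencv-python

FRAME_CAP = 17  # same idea as capping the load at 17 frames
frames = []
cap = cv2.VideoCapture("source.mp4")  # placeholder path
while len(frames) < FRAME_CAP:
    ok, frame = cap.read()
    if not ok:
        break
    frames.append(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
cap.release()
print(f"loaded {len(frames)} conditioning frames")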
>>
File: file.png (2.18 MB, 1512x1032)
>>107827045
>>
>>107827011
are you using load video with frame_load_cap?
>>
File: file.png (1.76 MB, 1536x1024)
>>107827051
>>
>>107827159
No, I used "Load Video FFmpeg (Upload)", it works well with gifs, didn't test video formats yet.
>>
File: file.png (1.61 MB, 1656x944)
>>107827131
>>
Enjoying your base you fucking retards?
>>
Enjoying your enjoyment you fucking enjoyers?
>>
File: file.png (1.8 MB, 1040x1504)
>>107827130
>>
Need more Chinese commits please
>>
>>107827221
this I need just one more copium injection pretty please
>>
>>107827176
When you set the number of frames to generate, does that get interpreted as frames in addition to the gif, or does it include the gif frames?
>>
Need more Chinese commies please
>>
File: girl on desert.png (1.61 MB, 832x1216)
>>
>>107827221
it's Sunday in China they are in church
>>
baseless thread
>>
>>107827254
They are sleeping.
>>
chang no pity begging western dog
>>
File: 9195251641.png (2.98 MB, 1128x1600)
>>107827238
>>
Is PainterI2V shit? Does it actually improve motion?
>>
>>107827271
imagine how nice they must sleep thinking about all the gwailo they tricked
>>
impressive results with 2511. it blows Kontext and "Grok put her in a bikini" out of the water
>>
Requesting 2girls sitting
>>
remember, you can right click -> unpack subgraph to get rid of that retardation
>>
>>107827304
but can it into nudity? no
also grok has nicer skimpy outfit variety
>>
When is GenJam coming back?
>>
File: ComfyUI_00321_.png (1.59 MB, 832x1216)
>>
>>107827320
indian detected
>>
>>107827284
it does, but it kinda feels like it just exaggerates the motion instead of improving the overall motion in the whole gen, if that makes sense
also if you ramp it up too high it will fuck up the colors
>>
>>107827333
ani was organizing those I think but the thread schizo drove him away
>>
>>107827284
it's ok, hit or miss whether it discolors or not. he released an updated version but I haven't had the chance to test it. could also try the motion scale node https://github.com/shootthesound/comfyUI-LongLook, seems to speed things up, only briefly tested it
>>
>>107827320
well shit i dont know, i dont often prompt for nudity, i try to make complex scenes doing multiple passes with different models. is everyone here a gooner
>>
>>107827346
Does it only work with high-fps content? I wanted to use it for anime gens, so I need it lower than 32fps.

When I reduced the frame rate, the quality suffered a lot and it looked like slowmotion.
>>
>>107827355
you literally have it in reverse. ani was one of the people complaining about the randomly selected themes.
>>
File: 1748790328093526.png (2.09 MB, 928x1632)
>>
>>107827362
it seems to help with any sort of gen but results can vary, try it out and see.
the real way to get proper motion on Wan is to just not use the high noise lora, but that means you need more steps + CFG so you'll gen like 4x slower
>>
>>107827368
I think you are misremembering
>>
is there a v2v setup that doesn't require as much memory as i2v does?
seems like there should be a simple script that compares reference images to the video and ensures some frame-to-frame consistency, because you are basing it on the video frames, not the previously generated frames.
>>
>>107827386
post evidence it was ani organizing.
>>
File: 32GB RAM face.jpg (170 KB, 900x685)
Anyone else use OneTrainer for Z Image Turbo loras? Are the default settings good? What settings did you modify and why?
>>
>>107827386
you dont think much do you
>>
>>107827397
I couldn't get it to work but the results were less shit than AI-Toolkit.
>>
>>107827395
can you do the same for what you are doing?
>>
>>107827397
I increased rank but that's about it. It's all about crap in crap out.
>>
kek, used the first 33 frames of miku fortnite as a source:

"anime girl dances to music"

https://files.catbox.moe/wd2wd4.mp4
>>
>>107827397
change the timestep value or whatever it's called in the training tab to like three and you'll be fine
>>
lol this time it made a miku after the switch:

https://files.catbox.moe/814o9c.mp4
>>
>>107827412
>>107827438
ngl this model is hella uncooperative, when it doesn't want to do powerpoint shit it doesn't want to stay with your input character at all, please chang, save us from jerusalem!!
https://files.catbox.moe/s6h9l2.mp4
>>
>>107827409
you made the first claim tho :]
>>
>>107827451
disagree, ltx is great.

https://files.catbox.moe/89lfb2.mp4
>>
>>107827451
Yeah, it's not kind to lower end computers, plus every video has a smudge or smear quality to it. It's as if you fed wan 2.1 a bunch of sd1.5 images then made a video. I'll pass on ltx (again)
>>
Two more weeks?
>>
File: egg.png (1.08 MB, 1024x1024)
>>
File: 1757869160300468.png (3.16 MB, 1344x1344)
>>
>>107827511
Z base just flew over my house
>>
File: x.png (1.59 MB, 1024x1024)
>>
why does ltxv have these manual sigmas, and why can't you set the number of steps normally?
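as far as I can tell the manual sigma list is just the noise schedule written out explicitly, so the step count is implied by the list length; toy sketch of that relationship below (NOT the actual LTX-2 scheduler, just the general idea):

import torch

def simple_sigmas(steps: int, sigma_max: float = 1.0) -> torch.Tensor:
    # N sampling steps need N+1 sigma values, ending at 0
    return torch.linspace(sigma_max, 0.0, steps + 1)

print(simple_sigmas(8))  # 9 values -> 8 denoising steps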
>>
LTX2 uses way less vram on T2V than on I2V somehow on ComfyUi, dunno if it's normal but the difference is huge
>>
mari from eva as video input source, capped frames at 33 (17 is ideal)

pretty good, can take any video and use the start of it as a gen, like i2v but with a part of it as video. good for motion too.

https://files.catbox.moe/ihu4an.mp4
>>
ltx kinda mid ngl, but sometimes a little t2v kino comes through (im not that autistic hasan/floyd/trump/cia poster, you may proceed in confidence)

https://litter.catbox.moe/k6khbb092rjfe6uf.mp4
>>
File: 1760592179196253.png (131 KB, 317x305)
>>107827571
she should leave him alone he's watching peak rn kek
>>
>>107827564
i've noticed that too. does this happen on Wan2GP too or is the comfy i2v implementation borked?
>>
File: 1741573518978166.png (1.46 MB, 1056x1472)
>>
>>107827284
did you try this one too? https://github.com/princepainter/ComfyUI-PainterI2Vadvanced
>>
Last one from me
Good night anons
>>
File: 1762534109805132.png (1.74 MB, 1408x1088)
>>107827624
oyasumi
>>
>no loras
>80% of the videos are powerpoints
it feels like the early days of SD1.5
At least with wan2.2, you actually get something with your prompt
>>
>>107827568
another example with machu from gundam, 33 frame input, really good imo: just use load video (videohelpersuite) node

whats neat is you are basically doing i2v but the motion of the gif/mp4 also guides the output:

https://files.catbox.moe/34jshm.mp4
>>
File: dbd-3130705007.jpg (90 KB, 600x1067)
I'm training on dedistilled ZiT now and the samples look much better for some reason. Idk how it'll translate into actual gens yet, but the samples stayed somewhat coherent and are seemingly slowly and steadily picking up the concept, instead of the barely recognizable samples that looked like someone put the image in a blender that I got when training with regular bf16 ZiT.
>>
>https://github.com/silveroxides/ComfyUI_Hybrid-Scaled_fp8-Loader
>'dict' object has no attribute 'orig_shape'
I assume it just doesn't work in 3000 series.
>>
>>107827571
prompt? this one looks like sora. nice
>>
>>107827662
i've been preaching training on the de-distilled instead of the adapter lora for ages based on my own tests, and people just don't care
>>
>>107827571
this boy loves cabinets
>>
>>107827659
also, audio has no censorship, clearly:

https://files.catbox.moe/yrbock.mp4
>>
>>107827662
>I'm training on dedistilled ZiT now the samples look much better for some reason.
I mean, that was the goal of that model yeah, to make lora training easier
>>
>>107827662
>>107827676
Does OneTrainer do this automagically?
>>
File: na2.png (1.8 MB, 1824x2000)
>im not that autistic hasan/floyd/trump/cia poster, you may proceed in confidence
>>
i just dropped a huge z-image base in the toilet
>>
>>107827691
>audio has no censorship
sometimes it beeps the swears but it's rare enough to not be bothered by it
>>
>>107827700
yeah, it now downloads De-Turbo when you use a ZiT preset
>>
Messi if he used English:

https://files.catbox.moe/4052v1.mp4
>>
File: 1736990164469274.png (430 KB, 580x711)
>>107827736
Lionel Griezmann lul
>>
>>107827674
First person perspective handheld camera opening the door to a bedroom. A woman's voice is heard in English: "And here's our son's room. Say Hi, Bobby!"

Inside, a 36-year-old obese balding white man with frizzy facial hair sits at his computer desk on a gaming chair wearing a headset, the monitor in front of him displaying an anime girl. The anime girl on the monitor animates wildly and speaks in a high-pitched cadence in Japanese: "愛しています."

The man faces the camera in terror, taking his headset off and throwing it before shouting: "What the FUCK, ma! I told you to knock before coming in!" He rises from his chair and steps toward the camera in a confrontational manner. The room is filthy, filled with food containers, cans of soda and other various clutter. The video is uncensored and raw.


interestingly when I didn't have him wearing a headset LTX gave him desktop speakers that played the girl's voice. Adding that it was an uncensored video stopped bleeping out the fuck.
>>
Mads with some advice:

https://files.catbox.moe/5al0o9.mp4
>>
I could really do with a lot fewer yelling gens. shit sounds like it's coming through a tin can. just awful
>>
>>107827753
based and redpilled
>>
File: comfyu update.png (2.96 MB, 1664x1216)
I just pulled the comfyui update (haven't updated for a month or so).
I re-ran older gens with the same seed and params and I am getting slightly different results.
(In case you can't tell, the ring finger of her right hand is longer and the shadow on her right is narrower)
Tested both Chroma and ZiT.
A bit of a shot in the dark but I am curious, does anyone know what might have changed?
And no it's not custom nodes. No loras. Nothing I know of that could introduce non-determinism.
>>
>>107827773
the image changes if you change the pytorch version, did you do that?
>>
sopranos on sam altman and openAI:

https://files.catbox.moe/xygcrw.mp4
>>
File: lo1.png (2.54 MB, 1360x1928)
>Last one from me
>Good night anons
>>
>>107827773
i think someone mentioned something earlier about having to use ModelComputeDtype node on zit to force it to run in bf16, dunno if that's the issue
>>
>>107827779
kek
>>
>>107827782
n8
>>
File: 1736679117242847.png (2.12 MB, 1200x1200)
>>107827779
I mean, we pay this chink a lot for prompts so...
>>
File: 1750904400055015.png (147 KB, 306x591)
>>107827779
and all you need is this instead of load image, for the LTX2 workflow:
>>
>>107827778
Both were 2.9.1 though iirc
>>107827786
I mean I never used that with ZIT so I don't think that's the variable that changed.
>>
File: ku6.png (3.27 MB, 1360x1928)
>n8
>>
ltx isnt good, especially since it has no nsfw capabilities.

wake me up when we get some real progress
>>
>>107827701
>>107827782
>>107827802
0.o
>>
>>107827799
>I mean I never used that with ZIT
they said that comfy changed the way zit runs in one of the newer patches so now you have to force it back to bf16
>>
>>107827812
>they said that
where?
>>
>>107827812
Okay then let me give it a try.
>>
>>107827820
check under your foreskin
>>
>>107827791
once you have your GPU you can generate a billion times, not the same as paying $1000 for 3 prompts
>>
>>107827830
says the guy running a model made by israel, the country responsible for your gone foreskin lul
>>
kek used mariokart world as a video source: everything after the initial hop is new.

the anime girl drives her motorcycle high into the air and over a cliff into the water, and says "oh no, thats not good."

https://files.catbox.moe/ag8vci.mp4
>>
You know why I like LTX? Because Reddit hates it. They have turned into wan tit suckling freaks and the mere existence of ltx makes them mad
>>
>>107827867
desu it works fine, especially after giving such a low res gif
>>
>>107827874
>Because Reddit hates it.

are you blind? reddit loves it, they are just spamming shitty videos of how they can run this video on their 8g gpus
>>
>>107827045
>>107827055
>>107827065
awesome
>>
>>107827874
really? all I see is LTX spam videos in there
>>
>>107827812
>>107827821
No that's not it.
It seems to be running at bf16 anyway (for me at least) when not using loras.
Tried both fp16 and bf16, different results than before.
Tried SDXL, and while one image came out looking the same, another had a minor difference too.
Again no custom nodes for this testing.
They changed something I guess.
>>
lmao, video input is pretty funny

the anime girl fires a large blue shell forward that makes a huge explosion, and says "oh no, thats not good."

https://files.catbox.moe/lmkpa0.mp4
>>
>>107827895
what if it's you that changed
>>
>>107827889
>>107827894
Read the comments they aggressively compare it to Wan
>>
>>107827900
the anime girl stops her bike and says "I want to take a break, lets relax." as soothing music plays.

KINO, especially with the potato input video quality

https://files.catbox.moe/as6uwj.mp4
>>
File: ComfyUI_00336_.png (1.42 MB, 832x1216)
>>107827903
>>
>>107827874
headcanon
>>
is there anything that can add audio to wan videos?
>>
>>107827967
https://old.reddit.com/r/StableDiffusion/comments/1q916xs/you_can_add_audio_to_existing_videos_with_ltx2/
>>
>>107827284
it's alright. just try it
>>
listen to Miyamoto

the man says "that faggot is not making Miku videos, why?" in a japanese accent.

https://files.catbox.moe/tnl708.mp4
>>
>>107827895
Seed determinism is dead. Forge, ReForge, NeoForge, and Forge Classic all break seeds even when using the same RNG and ENSD; Comfy, Swarm, and Krita do too, and the same happens with Sage Attention. It's over, we'll never be equal again.
>>
File: 1752964862385280.png (46 KB, 386x438)
>>107827551
>>
>>107828009
kino, but don't put the prompt you're spoiling it lol
>>
I maintain that the aggressively vocal anti-NetaYume campaigners that sometimes show up around here are most likely just giga-ESLs who simply hate the idea of ANY model that might sometimes require them to be able to coherently boomer prompt in English to get the best possible results.

There's no other plausible explanation to me as a white Canadian guy in his 30s, I don't see how else you could possibly conclude the model was "shit" if you'd actually used it at all.
>>
>>107828017
What does 'kino' even mean?
>>
>>107828027
it means cinema in a lot of european languages
>>
any E3 fans?

https://files.catbox.moe/3xeycf.mp4
>>
>still no model that can do crowds with explicit details.
>>
>>107828011
Actually no, I tested that too.
I was getting the same results rerunning same seed on ZIT.
It seems to be some commit or something changed in requirements.txt that is causing this.
>>
>>107828025
what I've seen of NetaYume is decent but I also haven't used it since I'm more into realism so idk
>>
File: yes.png (153 KB, 500x500)
>>107828036
I'm more of an E33 fan
>>
>>107828044
>It seems to be some commit or something changed in requirements.txt that is causing this.
maybe it's the comfy kitchen package thing
https://github.com/Comfy-Org/comfy-kitchen
>>
>>107828034
Kino means kinetic touch.
Film is cinema.
Theater is cinema.
You are a bastard child.
>>
real talk, anyone tried to train an ltx2 lora yet? do you need to rent a H100 or is it trainable locally on a 5090?
>>
>>107827409
Anon, you are very clearly a complete idiot.
>>
>>107828025
It has its faults but you are correct, NLP is not one of them nor inherently bad.
>>
>>107828059
>Kino means kinetic touch.
this sounds like a false etymology. even online slang dictionaries relate the modern online slang meaning to the words cinema or film
>>
>>107828054
I see that they added that thing to requirements.txt in between.
Yeah maybe that's it anon, thanks.
It seems to be running slightly faster too but I haven't rigorously tested that.
>>
>>107827998
I did, and the results were completely unusable. I am genning anime, so I need low-FPS. If I lower the fps from 32 to 16, it destroys the quality AND makes it extremely slow-motion.
>>
lmao now it really is sessler

https://files.catbox.moe/b7kk6y.mp4
>>
>>107828027
>pleasing, entertaining, memorable, high quality
>>
File: file.png (35 KB, 714x299)
>>107828060 (me)
the documentation for the official lora trainer is LLM written slop so i don't trust it much
it falsely states that triton is only available on linux, and also says that 80GB VRAM is recommended, but who the fuck knows
>>
>>107828078
Kino is an American nigger slang for cinema.
What was later used for 'kinoing' females when they got closer.
In any case it is sub 90 iq language.
>>
>>107828045
Like I've even compared various complex boomer prompts between NetaYume 3.5 / 4.0 and NovelAI 4.5 before, NovelAI was not always better.

There's simply not anything else in terms of a "pure anime focused" local model right now that has anywhere remotely close to the same level of prompt adherence combined with generally good quality as NetaYume. But (some) people insist this isn't just very obviously true, in a way that's blatantly disingenuous IMO.
>>
>>107828112
didn't the current usage originate on /tv/?
>>
>>107828120
Maybe try reading some books when you get older.
>>
>>107828132
funny coming from someone with zero reading comprehension. i said CURRENT USAGE as in the retarded 4chan slang.
i'm well aware the word has existed for centuries
>>
>>107828113
>There's simply not anything else in terms of a "pure anime focused" local model right now that has anywhere remotely close to the same level of prompt adherence combined with generally good quality as NetaYume.
It should replace Illustrious in OP desu. Not sure why that's still up there but Yume isn't.
>>
>>107828134
Americans are like this. Real King's English is impossible to them.
>>
>>107828134
>hes
saar?
>>
>>107828078
>>107828120
>>107828147
why are you even replying to him hes an ESL retard
>>
miku mcdonalds ad, with 33 frames of the original ad as input:

https://files.catbox.moe/ntgxel.mp4
>>
File: civscreen.png (34 KB, 310x660)
>>107828072
(I'm the person you're replying to)
yeah it's not perfect for sure (although no model is I guess) which is why I was happy to see that the trainer guy apparently is going to still do more updates after V4.0 simultaneously with his new Z-Image-based project
>>
I don't know, wan has its problems but I think its outputs are better than ltx. Unless a wave of new loras are released, I don't think I'll be using it much. I certainly like the crazy speed though
>>
>>107828112
This.
>>107828120
Yes, and all the low-IQs started using the word.
There is no surer sign of low-IQ than when someone types that word in their post.
>>
>>107828162
Is this why you are here?
>>
>>107828179
gen video with wan, use ltx to add audio to it
>>
>>107828167
https://files.catbox.moe/sjj800.mp4
>>
>>107828179
i want the quality of wan and the motion of ltx
>>
>>107828113
>>107828152
no loras, no use
>>
>>107822279
>4ch vae models are garbage yes,
>>107822296
>just the fact that a model has a 16ch vae means it's unquestionably better at everything.

NetaYume and by extension Neta and Lumina2 is the only model that proves a 16ch VAE means nothing if the model is undercooked.
>>
>>107828194
motion?! Like the fucking powerpoint freeze shit? It's faster but way more gens are worthless because they just freeze up. Maybe it's because I'm doing stuff on the lewder side and the model is too censored?
>>
>>107828223
never experienced this powerpoint stuff, i suspect it's a prompting issue
>>
>>107828223
>Maybe it's because I'm doing stuff on the lewder side and the model is too censored?
it is, (((they))) cucked the layers
>>
>>107828025
Dunno why people might be angry and anti-neta, but I do call it a no-choice because it's
1. too raw, it needs a shitton of additonal training to become "illustrious but with fully utilized 16ch vae and modern llm TE"
2. Lumina-Accessory? No. Controlnets? No. Loras? No. >1mp res? No.
3. And it's slow. I can make a gen with ZIT and secondpass it with Illust and get a more appealing result and it'll take the same time.
>>
>>107828191
nta, do i have to update comfy or can i just add the custom ltx2 nodes? really cant be bothered troubleshooting this time if something goes wrong
>>
is there no way to vae-encode a video and audio and pass it into ltx2 sampler?
>>
>>107828261
nope, impossible, can't be done, pipe dream, fantasy
wait, actually yeah it is, encode and replace empty latent
>>
>>107828261
There is.
>>
motivational golshi:

https://files.catbox.moe/iyzod7.mp4
>>
>>107827797
I have never attached a vae to the input of my Load Image node.
>>
>>107828310
oh thats behind the node, all im using is the image output instead of the load image (if you want i2v with a static image, use that).
>>
>>107828215
>NetaYume and by extension Neta and Lumina2 is the only model that proves a 16ch VAE means nothing if the model is undercooked.

the initial release of Neta Lumina 1.0 was definitely somewhat undercooked. IMO the NetaYume finetune of it really isn't especially as of V3.5 and V4.0.

>>107828197
loras for what exactly though?
>>
>>107827773
did you update pytorch/cuda/etc (python cuda stuff)?
because that can cause micro-changes
>>
>>107828256
>1. too raw, it needs a shitton of additonal training to become "illustrious but with fully utilized 16ch vae and modern llm TE"

It's not perfect but I really don't find the last couple versions of NetaYume to be all that raw. That said even the original release of Neta Lumina 1.0 was quite significantly more coherent and "put together" than the original Illustrious 0.1 was.

>3. And it's slow. I can make a gen with ZIT and secondpass it with Illust and get a more appealing result and it'll take the same time.

that sounds weirdly fast to me, 9ish steps of ZiT should be pretty similar to 25ish steps of NetaYume at the same resolution. Z-Image even distilled is (quite expectedly) step-for-step slower than Lumina 2.0 architecturally by a decent amount.
>>
I worked out my grainy video problem. I was trying to run the model at a smaller resolution to trial prompts, this is not the thing to do.

Make the time shorter and stick to around 1024 to 1280. Resolution too low is super bad for video models it seems.

5B manifests as shaky cam / rainbow output, 14B manifests as grainy video.
>>
>>107828113
I will try it again, what's a good way to prompt for it?
Last time I tried it couldn't mix styles well, and the threshold was far higher than NovelAI to get a character out of it.
NovelAI could shit out a character with barely 20-30 pics on danbooru, while Neta couldn't unless it had at least 1000 pics, and even then some details were wrong, which makes it kind of useless because it doesn't have any loras like noob/nai to compensate for this lack of knowledge
>>
>>107828363
(intentional samefag addon to this comment with something I forgot to mention)

>No. >1mp res? No.

I don't get this point either. That's like the opposite of true, I almost always gen directly at 1280x1536, 1536x1280, or 1536x1536 when I use NetaYume as I find it to be more coherent up around there than it is down at more XLish resolutions.
>>
>restart comfyui to make sure it doesn't crash on the next gen
>start a new gen
>it crashes
Th-Thanks...
>>
>>107828391
Maybe my issue is that I never treated it as a real model, only did a bunch of tests and went back to SDXL. I said my piece but didn't intend to defend it.
>>
>>107828390
my only real tips would be to leave the Gemma boilerplate "You are an assistant designed to generate anime images based on textual prompts. <Prompt Start>" in for the start of the positive, and "You are an assistant designed to generate low-quality images based on textual prompts. <Prompt Start>" for the start of the negative.

Artist tags from Danbooru should be prefixed with `@` as I guess they trained them that way to help distinguish them better from other tokens. Using token weights that add up to a total of 1.0 in whatever combination seems to work best when using multiple artists.

And generally I find that (without getting into any more complex RES4LYF sampler stuff) DPM++ 2S Ancestral Linear Quadratic at around CFG 5.5 gives the best results.
>>
I just can't understand why ComfyUI doesn't have a Hires fix like forge has.
>>
>>107828430
just a little bit of manual inpainting is WAY better than ANY automatic upscaling method.
>>
>>107828445
huh what does Hires fix have to do with inpainting?
>>
Anyone have a comfy workflow that does first and last image for ltx? The template doesn't seem to have this option
>>
any tips on counteracting the slideshow effect with i2v ltx? lots of my gens are static for the first several seconds or throughout
>>
>>107828430
what do you mean, highres fix is just a second img2img pass, you can set it up in like 1 minute
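outside of comfy the same idea is only a few lines, e.g. this rough diffusers sketch (checkpoint path, prompt, sizes and denoise strength are placeholders, not a specific recommendation):

import torch
from PIL import Image
from diffusers import StableDiffusionXLPipeline, StableDiffusionXLImg2ImgPipeline

ckpt = "your/checkpoint"  # placeholder
prompt = "1girl, ..."     # placeholder

# first pass: plain txt2img at the model's native resolution
pipe = StableDiffusionXLPipeline.from_pretrained(ckpt, torch_dtype=torch.float16).to("cuda")
low = pipe(prompt=prompt, width=896, height=896, num_inference_steps=28).images[0]

# "hires fix": upscale in pixel space, then a low-denoise img2img pass over the bigger image
up = low.resize((1344, 1344), Image.LANCZOS)
i2i = StableDiffusionXLImg2ImgPipeline(**pipe.components)  # reuse the already-loaded weights
final = i2i(prompt=prompt, image=up, strength=0.35, num_inference_steps=28).images[0]
final.save("hires.png")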
>>
>>107828336
Characters.

I like to gen from characters in their original style, specifically anime characters as they appear in their anime.
With NetaYume, it might be able to faithfully recreate some popular characters from a prompt, but the less popular the anime, the less accurate the result.
>>
File: jmqyyTE.png (363 KB, 1301x903)
>>107828430
Hi Res Fix in Comfy is just something like this, pretty straightforward
>>
File: uppy.png (483 KB, 1656x1888)
>>107828461
hires fix is that you copy paste a second ksampler process (in comfy [take it back to pixel space, don't upscale latent btw]). inpainting manually allows you to upscale much more effectively by masking subject OR background. scaling the background is much more forgiving and you can do it in large chunks; upscaling the subject takes more effort, too large on the infill and you will distort your anatomy very easily.

The picture is effectively a "hires-fix." Initial latent is within the models normal resolution (1024 to 1535) then it is upscaled (in pixel space NOT latent.) Then a much lower denoise pass is taken.

I only work on furry so I can't post anything here HAHAHAHAHA!
>>
>>107828482
yeah fair enough, obscure characters is definitely one thing where loras might be useful
>>
File: x_o2ki2i.png (1.9 MB, 2048x1024)
>>
>>107828513
He is absolutely right. I mean in real way.
>>
>>107828474
>>107828506
All these always run. I don't want them to always run. I want to chose when to run a Hires fix pass.
Forge has a button for that which I click when I like the first low res pass.
>>
>>107828513
Oh god, don't use that scaling method though, holy crap why do i have it nearest exact. LANCZOS ONLY!
>>
>>107828513
I don't know what that noodle mess of yours does but I mask nothing. I just click a button and it gives me a high res version of my picture.
>>
ok so even after vae-encoding the audio and passing it in, the sampler does not keep the original audio.

All I'm doing is taking audio track with the proper length, encoding it with audio vae, and using it in place of the empty audio latent. Any ideas?
>>
Ok. Seems like you are a bit afraid of yourselves. Afraid that someone will come and talk over you...
These threads have been like Charles Dickens' stories lately.
>>
>>107828541
what's denoise at
>>
>>107828506
Btw I dont see what the sense in having two samlers there is instead of just running the target resolution in the first node already.
>>
>>107828525
upscaling without the use of an actual trained ESRGAN / DAT / etc architecture model produces objectively worse results FWIW. The "scale_method" in my pic in >>107828506 refers to what's used to scale it BACK DOWN to a ratio of 1.5x the original AFTER the actual upsizing happens via the model (as the `upscale_model_opt` there was IIRC some 4x DAT-based model)
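in code terms it's basically this (PIL sketch; the 4x call is a plain-resize stand-in so it runs, swap in whatever ESRGAN/DAT runner you actually use):

from PIL import Image

def upscale_4x(img: Image.Image) -> Image.Image:
    # stand-in for a real 4x ESRGAN / DAT model call
    return img.resize((img.width * 4, img.height * 4), Image.BICUBIC)

src = Image.open("gen_1024x1024.png")  # placeholder path
big = upscale_4x(src)                                # 4096x4096
target = (src.width * 3 // 2, src.height * 3 // 2)   # scale BACK DOWN to 1.5x of the original
final = big.resize(target, Image.LANCZOS)
final.save("gen_1536x1536.png")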
>>
euler a is shit
>>
>>107828471
https://github.com/TTPlanetPig/Comfyui_TTP_Toolset/blob/main/examples/LTX_2_First_Last_I2V_ttp_v2.json
>>
>>107828561
whycome? Is your model euler only?
>>
>>107828550
the only way your idea would work is if Z-Image was a model where I could do 16 steps instead of 8 for the initial pass at the "target resolution" without frying it, but it's not. And even then, that's nearly never as good as gen -> upscale with an actual trained upscale model (not a dogshit traditional resizing algo) -> second pass with a KSampler that has literally identical settings other than denoise strength, on the upscaled version.
>>
>>107828565
>>107828561
>>107828548
same poster.
Same interval, same cadence, same aggression.
>>
File: 1762333206901525.png (80 KB, 653x652)
>>107828083
>If I lower the fps from 32 to 16, it destroys the quality
changing the fps setting in wan workflows just... changes the fps and nothing else. wan itself has no fps setting, it just outputs a specified number of frames that are expected to be compiled to a 16fps video. so yeah, I don't know what you mean by "it destroys the quality".

if you have a video you like but it's too high of a framerate you can use vhs nodes to remove frames. load video, select every second frame, then just connect that to a save video node
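outside comfy the same "keep every second frame" trick is a few lines of opencv too, if you'd rather fix an existing file (paths are placeholders):

import cv2

cap = cv2.VideoCapture("input_32fps.mp4")
w = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
h = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
out = cv2.VideoWriter("output_16fps.mp4", cv2.VideoWriter_fourcc(*"mp4v"), 16.0, (w, h))

i = 0
while True:
    ok, frame = cap.read()
    if not ok:
        break
    if i % 2 == 0:  # keep every second frame, halving the effective framerate
        out.write(frame)
    i += 1

cap.release()
out.release()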
>>
>>107828561
for realistic content kinda yeah. Otherwise not usually.
>>
>>107828561
There are way worse sampling methods. For z I mostly use euler but sometimes euler a consistently gives an extra detail that doesn't appear in euler.
>>
>>107828574
nope, silly.
>>
>>107827908
Yeah, you're right, the thing is that LTX results are fucking trash if it's not a close-up video. Plus it sucks that results are so inconsistent between users: I have a 4090 with 64 gigs of RAM and I keep getting OOM errors, and I've tried several fixes but they are just shit fixes that work once and are useless for a consistent workflow. Plus some users are claiming to get 1440p videos with a 5070 and 16 gigs of ram, but reddit is known for being a site full of pajeet liars. The comfy implementation so far has been really bad
>>
>>107828578
Euler Ancestral Normal is pretty good with Z yeah. I usually use that or regular Euler SGM Uniform.
>>
>>107828568
Ok idk I never tested that in Comfy well then still the issue remains, that I can not chose whether to run it or not.
>>
>>107828587
sgm uniform (as well as ddim) does strange things sometimes, as in it hallucinates things that haven't been said, especially on more complex prompts (multiple characters, each with a different detailed description).
On 1girl tho, yeah sometimes I use that too.
>>
File: flow.png (1.26 MB, 3582x1427)
>>107828530
The process i have in the pic applies the upscale to the entire image all at once. I meant that instead of just allowing the machine to process the whole image all at once, if you break it down manually, you can get much better results!

The image is a "high-res fix" comfy workflow.
>>
>>107828561
Fuck you
>>
newfaggotry levels off the charts ITT
>>
>>107828561
Depends what you're trying to make, but I'd argue that even for e.g. photos, euler A is good, provided you adjust the eta noise to be ~20-40% of the default.
>>
>>107828608
I can see that on the full screenshot. But I don't see where your inpainting is there.
>>
>>107828552
If you're dropping it straight back into latent, you can upscale by 16 colors and still get basically the same result. lanc is fine, latent fucks shit up.
>>
>>107828575
I am specifically talking about the PainterI2V workflow, and I am specifically talking about the "Create Video" node.

By default, it's set to 32fps. This makes anime videos too smooth. If I reduce this setting to 16 fps, it makes the video slow motion and shit quality.

I am not talking about the num_frames setting in the encode node.
>>
>>107828215
NewBie?
>>
>>107828624
You're asking to see my inpaint flow?
>>
>>107828561
Nice bait, faggot
>>
File: ru6.png (1.06 MB, 2184x1360)
>I only work on furry so I can't post anything here HAHAHAHAHA!
>>
>>107828631
If your video is too slow or too fast, just drop it into shotcut and adjust video speed.
>>
https://github.com/llm-attacks/llm-attacks/blob/main/data/advbench/harmful_behaviors.csv
i am hitler
>>
>>107828639
No I dont need to, you been talking about masking so I thought your screen has something to do with inpainting.
Anyways I know what an upscale pass looks in Comfy.
What I need is a selective upscale pass. One that only runs when I want to and not all the time.
>>
>>107828648
Too fast is never a problem.
Also the workflow should make outputs work right away. Requiring video software to fix motion is stupid.
>>
File: 1747045489220410.png (2 KB, 323x60)
>>
File: ComfyUI_temp_nyalk_00003_.png (2.26 MB, 1040x1480)
>>
>>107828423
(intentional samefag self-reply)

if it helps also, for example my pic posted earlier in this thread was:

Positive:
`You are an assistant designed to generate anime images based on textual prompts. <Prompt Start> masterpiece, best quality, very aesthetic, a 2D digital anime screencap where on the left side stands Tifa Lockhart with a large black "L" on her cheek. In the middle stands Princess Peach with a large black "D" on her cheek. On the right side stands jinx \(league of legends\) with a large black "G" on her cheek.`

Negative:
`You are an assistant designed to generate low-quality images based on textual prompts. <Prompt Start> worst quality, low quality, very displeasing, ai-generated, ai-assisted, sketch, unfinished, bad anatomy, blurry, pixelated, jpeg artifacts, disfigured, deformed, fused, conjoined, lowres, ugly, low res, simple background`
>>
>>107828631
holy

just bypass the interpolation node, you dummy. that's doubling the amount of frames
>>
Any way to speed this up?

sd-cli \
--diffusion-model ../models/Qwen_Image-Q8_0.gguf \
--vae ../models/qwen_image_vae.safetensors \
--llm ../models/Qwen2.5-VL-7B-Instruct.Q8_0.gguf \
-s -1 \
--cfg-scale 2.5 \
--steps 22 \
--sampling-method euler \
-o $ofile \
-H 512 -W 512 \
--upscale-model $SCALER \
-p "$P"
>>
>>107828675
Idiot.
>>
i fucked around with painter2vadvanced node and the results were trash especially for fucking scenes. they all got destroyed. normal stuff became a little faster but took life away
>>
>>107828561
KYS
>>
>>107828667
dat booba do be getting squished to fancy italics Death alright
>>
>>107828677
Enable flash
>>
>>107828561
Who are you? A res multistep shill?
>>
>>107828712
Will try, thx.
>>
>be wan2gp
>run on a potato
>allow text, image, video, audio as inputs
>allow last frame as input
>never shit the bed and crash computer
wan2gp won
>>
File: inpaint.png (1.14 MB, 3694x1910)
>>107828656
if you dont want it to run just right click a node in the pass and set it to bypass.

PicRel is my inpaint flow, totally different to the very low effort attempt to upscale with "high-resfix"

>>107828660
Using MS paint to blob an input is stupid too then?

>>107828646
Furry is what I use it for. No apology, not sorry, deal with it.
>>
Everyone always talks about how Wan2GP is good, but nobody can ever explain why.
Shit's all in your head.
>>
>>107828755
see >>107828742
>>
>>107828747
>if you dont want it to run just right click a node in the pass and set it to bypass.
And then when I want to run it? The whole gen starts again - with a different seed.
>>
>>107828755
Show me the comfy workflow that allows audio, video and a final frame as input for ltx-2 and doesn't crash my computer. The only thing comfy potentially has over it is the ability to see decent previews, except it can't for ltx-2
>>
>>107828742
it doesn't allow video as input, it extracts the last frame and uses that for i2v, then concats the two
the only 2 features it's missing are audio and video continuation. I dunno if this is a bug or intentional behaviour, but it will truncate the output length to the audio length instead of making the latent the size that you specify
>>
>>107828772
>it extracts the last frame and uses that for i2v
I don't think that's true. I can run a test later. Should be able to do a camera pan right and then continue that with a camera pan left to see if it's the same.
>>
>>107828763
right click and un-bypass.

If you want to preserve seed then click the "control after generate" and select "fixed"
>>
>>107828795
>right click and un-bypass.
The whole gen still runs again from the beginning.
>If you want to preserve seed then click the "control after generate" and select "fixed"
Then I have to switch back to random again, then back to fixed again, back to random...
In forge I click one or two buttons: generate, and then Hires fix when I want to keep the result (or the img2img tab for inpainting etc)...
In Comfy it's just so retarded. And I don't understand why no one has made a cache node yet that keeps the last result and can send it to a different sampler when a boolean button is true.
I just don't get how people can use that voluntarily.
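the flow I'm asking for is literally just this (plain-python illustration only; first_pass/second_pass are made-up stand-ins, not ComfyUI APIs):

last_result = None  # cache of the most recent low-res gen

def generate(seed):
    # always runs: the fast low-res first pass
    global last_result
    last_result = first_pass(seed)  # hypothetical stand-in
    return last_result

def upscale_if_wanted():
    # only runs when you explicitly press the second button
    if last_result is None:
        raise RuntimeError("nothing generated yet")
    return second_pass(last_result)  # hypothetical stand-in for the low-denoise pass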
>>
euler a is all you ever need
>>
File: 1766091298877110.png (987 KB, 2302x678)
/ldg/ on suicide watch
>>
>>107828823
drag and drop the generated image into the browser, it will load the flow including the seed. loading the same seed feels like cope, like you don't really know how to specify something properly.
>>
>>107828853
So I need to download the low res image, drag it into browser, run the workflow again then download the upscaled gen and delete the low res...
I want to click one fucking button when I want to keep an image.
>>
>>107828849
At least I finally understand what omni-base means and why it's going to be out together with the regular base.
>>
>>107828876
i have no fucking idea what youre trying to do
>>
>>107828895
I want to click ONE button to upscale the image I just generated. What is so hard to understand with that?
>>
File: jr4.png (3.53 MB, 1920x2016)
>Furry is what I use it for. No apology, not sorry, deal with it.
>>
>>107828561
What's wrong with euler a, can't implement it in your shitty wrapper?
>>
>>107828768
>>107828760
You're missing my point. Nobody can pull out and point to specifically what's going on in the script that makes it good.
>>
>>107828849
Human Feeback = Commissars
>>
>>107828925
especially the comfy devs desu
>>
>>107828914
Then just add the upscale nodes to the comfy workflow, you can click one button and you will get the upscaled image at the end.
>>
File: akRgGXl9_700w_0.jpg (52 KB, 700x811)
>>107828967
That's not what I wanted FFS. I don't want to upscale every image.
>>
>>107828915
prompt and model?
>>
why does comfy not used shared memory? is his smart memory too smart?
>>
File: output.mp4 (1.69 MB, 512x512)
>>107828772
>>107828789
here's the pan up. Let's see if it can remember
>>
>>107828995
Jesus fucking Christ, make your fucking mind up?!

Are you a woman? Holy SHIT!
>>
File: output2.mp4 (3.46 MB, 512x512)
>>107828772
>>107828789
>>107829002
looks like you were right anon. It's so over.
>>
So several guides across 4chin suggest forge/reforge as an easy starting point, but then it's called a "WebUI".
What does that mean in the sense of 'local' generation if it's a 'web' UI?
I want to have a setup that could work even if my internet was not connected; I don't want the program to send anything in or out via the internet at any point.
>>
>>107829012
You go and upscale every image even when you don't want to keep it.
Fine.
I don't.
My first pass runs 30s and the upscale 40-60s. I don't want to wait 1:10min and then throw it away. I want to see after 30s if it's worth the upscaling.
That's retarded.
How is the issue so hard to grasp for you?
>>
File: jr5.png (1.87 MB, 1920x2016)
>newfaggotry levels off the charts ITT
>>
>>107829038
just means the front end is a webapp, the server is your own computer.
>>
>>107829002
A man with a pan.
>>
>>107828945
touché
>>
File: 2 step 2 image.png (9 KB, 832x97)
>>107829040
Then just hit cancel numbnuts! The process will cancel if you hit cancel.

I use a similar process to decide if it's worth letting the larger upscale run or not. Just hit cancel if the first ksampler result is shit. PicRel, 30 steps on first run, then if it's ok i let it run the other 20 steps at the higher res.

>>107829038
the UI runs a web server on your PC that is local to you and isolated to you.

>>107829041
Unhelpful CUNT, I hope you stand on a british standard electrical plug
>>
>>107828561
I love you please come to my house and fuck my wife please please please
>>
File: fw6.png (1013 KB, 1536x1784)
>Unhelpful CUNT, I hope you stand on a british standard electrical plug
>>
>>107829102
>Then just hit cancel numbnuts! The process will cancel if you hit cancel.
That shows me the blurry preview, not my low res.

When I see the finished low res image I can decide upscale, inpainting or throw away. The tiny preview is NOT the fucking same.

Dude I ran that shit for a month.

Holy fucking shit are you shilling ComfyUI or wtf is your problem?
>>
File: Untitled.png (20 KB, 865x374)
>>107826985
What's the difference between the desktop application and the manual install for comfy?
Idk shit about coding, so the manual option looks intimidating.
Will I miss out on any features by installing the desktop app instead?
>>
File: output3.mp4 (3.2 MB, 512x512)
>>107828772
>>107828789
>>107829002
>>107829017
I take that back. I might just need to play with the sliding window a little more.
>>
File: img.png (449 KB, 2953x1331)
>>107829138
Or use the preview image node to see the full preview at that point.

Wait a fucking minute, are you expecting a full preview of the final image after only running the first KSampler?

>>107829130
is pajeet teeny cock upsetty wetty?
>>
Fresh when ready

>>107829212
>>107829212
>>107829212
>>107829212
>>
>>107829207
>Wait a fucking minute, are you expecting a full preview of the final image after only running the first KSampler?
Yes. That's what I get. Just with a lower resolution.
>>
>>107829050
>>107829102
>webapp, the server is your own computer.
Oh that makes sense. Sorta like localhost when you are doing HTML/CSS coding?
Thanks lads. Gonna give setup a shot now then.
>>
>>107829220
You don't get a full preview until the full preview has been generated. You can preview at multiple stages in the process, but you will not see the final result until you generate it. OBVIOUSLY.
>>
>>107829253
I am genning at 896p for the first pass. Then on upscale I run 1.5x of that. That 896 is my full low res preview.

With your workflow I have to disable the second sampler, then set the seeds to fixed, enable the second sampler and run the whole process again.

Your solution to that was to cancel the process. There I only see the blurry preview in the sampler node.
>>
>>107829253
I really don't understand how this concept doesn't get into your brain.
>>
>>107829292
You don't need to gen the second pass at the same seed as the first.
>>
>>107829357
no shit sherlock
>>
>>107829361
Oh well then, over the past 6 or so posts I've tried to help but your IQ is clearly so low you don't know how to formulate a question. Therefore, fuck yourself and work it out on your own! Good day!
>>
>>107829394
The fact that you don't understand the question for a single button that runs the same image through a second pass without generating it from the beginning again, is not an issue of my IQ but yours.
>>
>>107829419
The fact you need my help to generate a workflow to create whatever the fuck you want your orange button to do, is an issue of your IQ not mine.
>>
>>107829506
I didn't need your help creating a second pass workflow, retard.
>>
>>107829536
What did you need help with then? I'm still wanting to help you out?
>>
>>107829551
I didn't want help I asked why comfy is so retarded that you can't do >>107829419
I repeated that a few times. In quite a simple language.
>>
>>107829582
But if your flow has 2 kSamplers, you're running the same image twice?
>>
>>107829593
That's a two stage pass not a double generation. What do you want now
>>
>>107829606
Nothing, you do you.

LOL
>>
>>107829623 <- still doesn't get it, right?
>>
>>107829646
Hey man, my final outputs are approx 15360x15360 pixel perfect. If you're struggling I'm still happy to try and help?
>>
>>107829659
Nice for you.
But why should I care?

I already told you how long a high res gen takes here.
>>
>>107829681
Dude, I'm not mad at you, I'm trying to give you the same tools I have.
>>
>>107829693
I have the same tools. What part of "I didn't need your help" did you not understand? I mean really, you should work on your reading comprehension.
>>
>>107829711
ok, bye
>>
>>107829719
(You)


