/g/ - Technology


Thread archived.
You cannot reply anymore.




Even Comfy Himself Edition

Discussion of Free and Open Source Text-to-Image/Video Models

Prev: >>106988458

https://rentry.org/ldg-lazy-getting-started-guide

>UI
ComfyUI: https://github.com/comfyanonymous/ComfyUI
SwarmUI: https://github.com/mcmonkeyprojects/SwarmUI
re/Forge/Classic/Neo: https://rentry.org/ldg-lazy-getting-started-guide#reforgeclassicneo
SD.Next: https://github.com/vladmandic/sdnext
Wan2GP: https://github.com/deepbeepmeep/Wan2GP

>Checkpoints, LoRAs, Upscalers, & Workflows
https://civitai.com
https://civitaiarchive.com/
https://openmodeldb.info
https://openart.ai/workflows

>Tuning
https://github.com/spacepxl/demystifying-sd-finetuning
https://github.com/Nerogar/OneTrainer
https://github.com/kohya-ss/sd-scripts/tree/sd3
https://github.com/derrian-distro/LoRA_Easy_Training_Scripts
https://github.com/tdrussell/diffusion-pipe

>WanX
https://comfyanonymous.github.io/ComfyUI_examples/wan22/
https://github.com/Wan-Video

>Chroma
https://huggingface.co/lodestones/Chroma1-Base
Training: https://rentry.org/mvu52t46

>Neta Lumina
https://civitai.com/models/1790792?modelVersionId=2298660
https://gumgum10.github.io/gumgum.github.io/
https://huggingface.co/neta-art/Neta-Lumina

>Illustrious
1girl and Beyond: https://rentry.org/comfyui_guide_1girl
Tag Explorer: https://tagexplorer.github.io/

>Misc
Local Model Meta: https://rentry.org/localmodelsmeta
Share Metadata: https://catbox.moe | https://litterbox.catbox.moe/
GPU Benchmarks: https://chimolog.co/bto-gpu-stable-diffusion-specs/
Img2Prompt: https://huggingface.co/spaces/fancyfeast/joy-caption-beta-one
Txt2Img Plugin: https://github.com/Acly/krita-ai-diffusion
Archive: https://rentry.org/sdg-link
Bakery: https://rentry.org/ldgcollage

>Neighbors
>>>/aco/csdg
>>>/b/degen
>>>/b/realistic+parody
>>>/gif/vdg
>>>/d/ddg
>>>/e/edg
>>>/h/hdg
>>>/trash/slop
>>>/vt/vtai
>>>/u/udg

>Local Text
>>>/g/lmg

>Maintain Thread Quality
https://rentry.org/debo
>>
THREE MORE YEARS OF SDXL
>>
>>106991205
>https://gumgum10.github.io/gumgum.github.io/https://huggingface.co/neta-art/Neta-Lumina
fix this link and separate them
>>
File: ComfyUI_23220_.png (749 KB, 1280x720)
>>
>>106991228
How did you achieve that crappy camcorder look? Is it a lora or just some prompting?
>>
>>106991232
it's this
https://civitai.com/models/1134895/2000s-analog-core
>>
how do I create hyperslop
>>
>>106991238
use a mix or merge created in the last six to nine or so months
>>
File: ComfyUI_00316_.png (1.07 MB, 912x1144)
>>
>>106991233
oh yes I fucking love grainy analog y2k look, slop me more bra
>>
>>106990996
that's just for part of it in the most recent version though, it's probably not a big deal. Mixed NLP / tag captions are generally what you want for this kind of model anyways.

>>106991062
that's not gonna happen lmao, it would take an enormous amount of degradation given the text encoder itself is far superior to CLIP
>>
File: ComfyUI_00301_.png (1.06 MB, 984x1064)
>>
>>106991252
>that's not gonna happen lmao, it would take an enormously huge amount of degradation given the text encoder itself is far superior to CLIP
tbqh i think it's saying something considering the model still retains a lot of its original knowledge even after extensive training on anime
>>
File: ComfyUI_00324_.png (979 KB, 1024x1016)
>>
I wish Qwen was nearly 1/5th as good as Chroma
>>
>>106991261
yeah, the realism that must come from base Lumina isn't degraded much at all, you can bring it back pretty easily with boomer prompts
>>
I wish Chroma wasn’t 1/5 the resolution of Qwen
>>
File: ComfyUI_00311_.png (1.08 MB, 1248x832)
>>
>>106991268
KEEEEEEK chroma really was trained at 512x512 in 2025. embarrassing!
>>
Does anyone else notice Yume suffers from duplications at resolutions higher than ~1400px? Need controlnets ASAP.
>>
>>106991282
Or just a non shit model
>>
>>106991282
Well yeah, clearly it’s not trained above that resolution. Happens with SD1.5 above 768 and SDXL above 1200
>>
>>106991282
not really, I gen at 1536x1536 with it all the time. Even higher every now and then. Could depend on your artist tags though possibly
>>
File: 1750753560885489.mp4 (948 KB, 1280x720)
>>106991228
>>
File: ComfyUI_23225_.png (771 KB, 1280x720)
>>
File: 1756984080829792.png (637 KB, 1758x693)
i'm using a bunch of different face detailers in my workflow, but i think i would be getting way better results if i took the detected area, resized it, inpainted it at a higher resolution, and then downscaled it. is there a clean way to do this? would simply resizing the whole image before and after work well?
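A minimal sketch of that crop-upscale-inpaint-downscale loop in plain Python, assuming a PIL image, an integer bbox from your detector, and an `inpaint_fn` standing in for whatever detailer / img2img pass you actually run:

[code]
from PIL import Image

def detail_region(image, bbox, inpaint_fn, target=1024, padding=64):
    """Crop the detected area with context, inpaint it at a higher
    resolution, then scale it back down and paste it in place."""
    x0, y0, x1, y1 = bbox
    # Expand the box so the sampler sees some surrounding context.
    x0, y0 = max(0, x0 - padding), max(0, y0 - padding)
    x1, y1 = min(image.width, x1 + padding), min(image.height, y1 + padding)
    crop = image.crop((x0, y0, x1, y1))
    # Upscale only the crop, not the whole image, to the model's native res.
    scale = target / max(crop.size)
    big = crop.resize((round(crop.width * scale), round(crop.height * scale)),
                      Image.LANCZOS)
    fixed = inpaint_fn(big)  # placeholder: your img2img / detailer pass
    out = image.copy()
    # Downscale back to the crop's original size and paste over the source.
    out.paste(fixed.resize(crop.size, Image.LANCZOS), (x0, y0))
    return out
[/code]

Resizing only the crop rather than the whole image is cheaper and leaves the original untouched for detection, so it shouldn't disturb your bbox step.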
>>
>>106991297
IDK about Neta Lumina 1.0 but Yume is supposed to have been multi-res trained at between 768 and 1536.
>>
File: AnimateDiff_00001.mp4 (1.49 MB, 480x848)
>try out the double loras for an old gen

Jesus christ, why is it so bad?
>>
>>106991302
>Could depend on your artist tags though possibly
Without a doubt, now that I think about it. Still, I would love cnets so I can at least gen at a lower res and choose which gens I throw through a second pass. Like other models with higher-than-XL native res, it's a cool feature, but I much prefer "highres fix"-ing instead.
>>
How much longer until local reaches midjourney levels?
>>
File: ComfyUI_07801_.png (2.25 MB, 1152x1152)
>>106991191
Chroma really being carried by prompt engineering here. Can't do this at all if I describe
>Lifeless body of a man
And descriptions like that just give me body horror.

But then changing that to
>sleeping man, who lays with his arms spread, eyes closed

Is much closer to what I want even if not perfect (I wanted her to hold the axe, but then it can't depict it unless I have her standing there alone).

Chroma truly is all about prompt engineering and that's why Plebbitors are sleeping on it.

>>106991193
It's down, but prompt was
>Amateur flash photograph capturing a striking and adventurous beautiful young Japanese idol woman, embodying a mix of fierce determination and ethereal beauty, squatting low in a shadowy woodland clearing at night beside the sleeping man, who lays with his arms spread, eyes closed, extended across the leaf-strewn ground with a faint glimmer of crimson catching the camera's harsh light. She grips a katana, its sharp blade prominently displayed, with dried blood on it, and held in a manner both triumphant and solemn, as if to mark a rite of passage in this rugged outdoor expedition. She has long, dark hair with heavy bangs covering her forehead, and seems to be wearing makeup that creates a tired or distressed look, with smudged eyes and possibly pale skin. She gazes directly into the lens with wide, intense eyes enhanced by subtle makeup—her expression a magnetic blend of quiet pride, melancholy, and idol-like poise—while her chilled cheeks flush with the effort. Her attire is as dark as her: A maid dress, with ripped stockings. The backdrop fades into a veil of dense trees and tangled undergrowth barely touched by the abrupt, brilliant flare of the flash, suggesting the vast obscurity of the woods beyond. The overall scene radiates themes of survival instinct, primal empowerment, and the uncanny allure of an idol transformed into a huntress under the stark, unflinching glow of a nighttime capture.
>>
>>106991315
have you tried just using two KSamplers where everything is exactly the same except for the denoise strength, with an upscale model in the middle? Should work fine at like 0.3 - 0.4 strength, that's how I upscale with Neta sometimes. Unless your reason for using controlnet tile was solely to save memory
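The same two-pass idea in diffusers terms, as a rough sketch; the model id, step counts, and the plain resize (standing in for an upscale-model node) are all placeholders:

[code]
import torch
from diffusers import StableDiffusionPipeline, StableDiffusionImg2ImgPipeline

prompt = "whatever you genned the first pass with"

# First pass: ordinary txt2img.
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16).to("cuda")
base = pipe(prompt, num_inference_steps=28).images[0]

# Middle step: upscale in pixel space (a plain resize here; an
# ESRGAN / DAT upscale model in practice).
big = base.resize((base.width * 2, base.height * 2))

# Second pass: identical settings, low denoise so it only re-details.
img2img = StableDiffusionImg2ImgPipeline(**pipe.components)
refined = img2img(prompt, image=big, strength=0.35,
                  num_inference_steps=28).images[0]
refined.save("upscaled.png")
[/code]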
>>
>>106991342
>>106991224
>>
>>106991345
You can thank T5 for that. The thing wants extremely literal prompts or else it'll misinterpret them.
>>
>>106991342
It has, MidJourney isn't even close to the top of any benchmark chart that exists anywhere
>>
>>106991355
i think that's the point he's making. local has been shit for the past few years and will be for the next several, while API is constantly raising the bar every week
>>
>>106991355
There are no valid charts for image models
>>
File: ComfyUI_23230_.png (747 KB, 1280x720)
>>
>>106991360
SDXL came out in late June 2023, Flux came out in August 2024.
>>
>>106991354
Just requires intuition about what works and what doesn't. It can depict two people together perfectly, and I have confirmed that thanks to the POV experiments, so after that you just guess what words it wants and where it wants them.
>>
>>106991346
I'm sure it would, but I prefer using latent upscale, which requires a high denoise and thus cnets. Pixelspace upscaling is often not terrible, but latent is superior hands down.
>Unless your reason for using controlnet tile was solely to save memory
No, just because I think latent is much better.
>>
>>106991355
> top of any benchmark chart
like hynuan 3.0?
>>
all midjourney gens rook same same though. sometimes its okay but often it ruins it.
>>
>>106991370
latent upscale is just using traditional dumb algos to increase the size before you move into the next KSampler, it's not superior in any way to using a purpose trained ESRGAN / DAT / etc model to do the exact same thing, really it's worse by all accounts. I frankly don't understand what you mean.
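Concretely, the two paths being argued about, sketched against a diffusers-style AutoencoderKL (`latent` and `vae` are assumed to be in scope; latent scaling-factor bookkeeping is omitted):

[code]
import torch
import torch.nn.functional as F

# `latent` is the (B, 4, H, W) output of the first KSampler.

# (a) "Latent upscale": interpolate the latent itself with a dumb algo.
latent_a = F.interpolate(latent, scale_factor=2, mode="bicubic")

# (b) "Pixel upscale": decode, upscale the image (plain bicubic here,
# an ESRGAN / DAT model in practice), then re-encode through the VAE.
with torch.no_grad():
    image = vae.decode(latent).sample
    image_up = F.interpolate(image, scale_factor=2, mode="bicubic")
    latent_b = vae.encode(image_up).latent_dist.mean
[/code]

Path (a) skips the VAE round trip but interpolates in a space at roughly 1/8 the pixel resolution, which is part of why it needs a higher denoise on the second pass to clean up.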
>>
>>106991388
For one, needing to translate to and from pixel space isn't lossy... so just based on that it's better. Also the use of cnets allows the user more control over how the second pass holds to or departs from the original image. Subjectively, any denoise lower than 0.4 is pointless anyway.
>DAT
Desu the best out of the bunch but still not as good as latent when doing comparisons.
>>
>>106991397
>isn't lossy.
*isn't lossless
>>
Give me one good reason why training with synthetic content is a bad thing
>>
>>106991307
Are you using comfy or some variation of forge/webui/etc? What you described is the inpaint behavior in webui if you have "masked area only" set. It scales the area to whatever resolution you have specified, and you can set a padding to bring in more context around the inpainted region
>>
>>106991397
>needing to translate to and from pixel space isn't lossy
nta but the absolute best upscaling workflow imo would be training DAT but exclusively on VAE degradation. traditional latent upscaling methods aren't great because they're, like the other anon said, using dumb algos like bicubic, nearest, etc. with the very low resolution of the actual latents, this often hurts details more than it helps. the true endgame would be using a model similar to DAT but in latent space, but this would require a much much more powerful arch due to the very low resolution of the latents.
>>
>>106991370
>>106991397
You're relying on intuition. Empirically it's better to upscale the raw image, not the latents. Yes it requires one extra pass through the vae, but that isn't as lossy as you think
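If anyone wants to put a number on that loss instead of eyeballing gens, a quick sketch (again assuming a diffusers-style AutoencoderKL and ignoring the latent scaling factor):

[code]
import torch
import torch.nn.functional as F

@torch.no_grad()
def vae_roundtrip_mse(vae, latent):
    """Cost of the one extra decode/encode cycle a pixel-space
    upscale adds, measured as pixel-space MSE."""
    image = vae.decode(latent).sample           # what you'd upscale
    latent2 = vae.encode(image).latent_dist.mean
    image2 = vae.decode(latent2).sample         # after the round trip
    return F.mse_loss(image2, image).item()
[/code]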
>>
>>106991423
>>106991428
Perhaps. I have done direct 1:1 tests (on XL to be clear) and pixel space has always fucked the outputs. Again, sure, it's often not terrible, but latent's superiority virtually jumps out of the screen at me.
>dumb algos like bicubic, nearest, etc.
I wish Comfy had that aliased latent upscale that whatever Forge fork has.
>but that isn't as lossy as you think
This is especially true with *Lumina models but, again, I have done tests, and the benefits of latent far surpass those of pixelspace.
With Chroma I was surprised at how well it holds an image when doing pixelspace upscale second pass, but even that still falls apart when you push the denoise up to anything close to .7.

>>106991428
What is the downside to cnet support, regardless of mine and your points? I don't see a reason to NOT have them desu.
>>
>>106991345
Hunyuan 3. Yeah, about those benchmark rankings...
>>
File: dbhgzdhbgzdfhbzd.png (11 KB, 336x254)
>>106991469
>I wish Comfy had that aliased latent upscale that whatever Forge fork has.
this?
>>
File: 1752290634499048.png (19 KB, 716x252)
>>106991412
im using comfy right now. resizing the whole image beforehand works, however it seems to mess with bbox detection.
>>
>>106991477
Yes, and that bicubic antialiased.

>>106991388
>>106991423
>traditional dumb algos
Is there a problem with Bislerp? I only use that.
>>
>>106991469
>pixel space has always fucked the outputs
hasn't been the case in my experience. if anything latent upscale often introduced more artifacts for me.
>>
Name one thing chroma does better than other models
>>
>>106991489
I think often the problem lies in one's cnet settings and prompt. It's a bitch to dial in (especially with some models) but once one does, it's like magic.
>>
>>106991484
Use case for latent upscaling?
(I jumped into this conversation just to help jog your memory i have no idea whats going on im running on 2% brainpower but i wanna see where this goes)

i ran some upscales with latent bicubic antialiased and it looks really good
>>
>>106991495
Its the only model that has a built-in noise filter
>>
>>106991508
>>106991397
Even if the loss is minimal, the logical approach is to minimize it as much as possible, as in not translating at all. It is admittedly less now with models like Flux, Lumina, and other modern archs compared to the shit that is XL's. But still, it's there.
For past models, it was most apparent in the colors and high-noise details. Even with a souped-up external VAE.
>>
sell me on using Qwen
>>
File: 00041-962605011.png (2.17 MB, 1792x1024)
>>
>>106991556
You can use the analog lora and pretend its chroma to trick anons into thinking chroma is actually good
>>
>>106991556
it's like chroma but worse in every way
>>
>>106991556
highest param open image model ever released
>>
>>106991572
Post a chroma guitar with 6 strings and 6 pegs
>>
>>106991556
it's like chroma but better in every way*
*very bad seed variety and no nsfw
>>
>>106991579
best i can do is a 1girl
>>
>>106991577
broski, your hunyuan 3 80B?
>>
File: 1733973356023155.jpg (802 KB, 1336x2008)
>>
File: 521545215448.jpg (1.11 MB, 873x873)
>>106991355
>>106991342
Local caught up around Flux. That's when its LoRAs were really up there.

For realism, MJ is currently not that good. Pic rel are four MJ gens made not too long ago. SDXL tier crap (though you could argue it's better than SDXL all you want, it's still not Flux tier).
>>
File: file.png (842 KB, 764x823)
>>106991264
finally... untooned if it was good
>>
>>106991600
do peter griffin
>>
>>106991601
>>>/r/
>>
>>106991607
it wasn't a request
>>
File: 1756524604760972.png (2.69 MB, 3000x1680)
>>106991224
SDXL be like
>>
Trying the latent upscale from an anon from before. It won't work for using the same image as last frame, I guess this is because the low noise now has the upscaled resolution, but am I not feeding it the upscaled resolution?
>>
File: 1732399560434934.png (3.77 MB, 3727x965)
https://noamissachar.github.io/DyPE/
slop in 4k let's goo!
>>
>>106991704
chromaxysters... we won...
>>
>>106991694
Oh, I was talking only about images. No idea for videos.
>>
File: 1733362971554566.png (129 KB, 1625x507)
https://xcancel.com/bdsqlsz/status/1981610051422040067#m
new cope soon(TM)
>>
>Keeps models loaded onto vram even after closing
>Logs prompts and send them for """telemetry"'" purposes
>Will soon be closed source
Tell me again why Comfy is good?
>>
>>106991738
>Logs prompts and send them
me when I lie
>>
>>106991712
Shit. Well I got it to not error by doing pic related. But the genned result is just a static image.
>>
ComfyUI Hijacks your phone and sends your dick pic to Comfyanon himself
>>
>>106991746
but i already do that myself
>>
File: 00062-3075492450.png (1.81 MB, 1792x1024)
>>
>>106991753
>>106991566
interested in recipe
>>
>>106991738
Why is Comfy such a promptlet that he needs to steal other people's prompts?
>>
File: 00071-4070013988.png (2.21 MB, 1792x1024)
>>106991813
i'll try, but share places are getting retarded...
>>
File: 00068-4070013985.png (2.1 MB, 1792x1024)
>>106991813
gettem while they're hot
https://litter.catbox.moe/9alwt7ad0aziq1r9.png
https://litter.catbox.moe/fhji583fokthr8h8.png
>>
>>106991704
Looks insane
>>
Wansisters, long vid 2.2 is here

>State-of-the-art text-to-video models excel at generating isolated clips but fall short of creating coherent, multi-shot narratives—the essence of storytelling. We bridge this "narrative gap" with HoloCine, a framework that generates entire scenes holistically to ensure global consistency from the first shot to the last. Our architecture achieves precise directorial control through a Window Cross-Attention mechanism that localizes text prompts to specific shots, while a Sparse Inter-Shot Self-Attention pattern—dense within shots but sparse between them—ensures the efficiency required for minute-scale generation. Beyond setting a new state-of-the-art in narrative coherence, HoloCine develops remarkable emergent abilities: a persistent memory for characters and scenes, and an intuitive grasp of cinematic techniques. Our work marks a pivotal shift from clip synthesis towards automated cinematic storytelling.

https://github.com/yihao-meng/HoloCine
https://huggingface.co/hlwang06/HoloCine/tree/main/HoloCine_dit/full
https://holo-cine.github.io/
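The "dense within shots, sparse between them" attention the abstract describes is at heart a block mask over the token sequence. A toy illustration of that pattern, not their code (a real implementation would also keep some cross-shot links for the persistent character/scene memory they mention):

[code]
import torch

def intra_shot_mask(shot_ids: torch.Tensor) -> torch.Tensor:
    """Attention mask that is dense within a shot and blocked between
    shots. shot_ids maps each token to its shot index, shape (seq,)."""
    return shot_ids.unsqueeze(0) == shot_ids.unsqueeze(1)  # (seq, seq) bool

# e.g. three shots of four tokens each:
mask = intra_shot_mask(torch.tensor([0] * 4 + [1] * 4 + [2] * 4))
[/code]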
>>
File: 1746197977693273.jpg (727 KB, 1336x2008)
>>
Reminder the next release of Wan is already showing to be better than Sora2
>>
>>106991556
qwen image is boring and has terrible seed rng variety.
>>
>order 96gb of ram because it's the only thing in stock and other stores have no date for restock
>meant to be delivered yesterday
>got delayed till monday
>get an email now that it's out of stock

But searching again led me to a 192gb pack that's completely in stock and arrives monday.

What a blessing.
>>
What is the Jeets recommendation for a good image model?
>>
>>106991871
It needs the new code to work properly, kijai is supposedly working on it at the moment.
Finally 5 sec slop will stop, been waiting for this for like a year.
>>
>>106991871
>no audio+video combined generation.
fuck off with this bullshit.
>57.2gb
dead in the water, even ovi would have better potential community support if properly integrated with comfy, wan2gp and neoforge.
>>
>>106991930
>gen 10 minute video
>prompt completely fails 6 minutes in
5 seconds will always be superior
>>
File: latentupscale_00008.mp4 (3.68 MB, 720x1200)
>>106991742
I can't get around this it seems.
I guess using a last frame doesn't work when low noise is starting from step 0 with less than 1 on denoise?
>>
>>106991945
Can't tell if trolling or legitimately braindead.
>>
File: 1748853342899229.jpg (784 KB, 2000x1336)
>>
File: 00078-97005244.png (1.59 MB, 1792x1024)
https://youtu.be/9HwCNiUtYv4
gn
>>
File: latentupscale_00004.mp4 (1.99 MB, 720x1200)
>>106991953
Compared to just using first frame. Stuff actually happens and the latent upscale is working.
>>
>>106991845
got, tyvm
>>
>>106991976
I went back to the original workflow and hooked up one single thing and it just works..
I shouldn't be doing these things after waking up and desperately needing to take a shit.
>>
File: 00023-4065159245.png (1.89 MB, 1248x1824)
>>
>>106991921
Are you asking in order to avoid it?
>>
Why is there so many gay loras for Chroma?
>>
File: 1755744657747127.jpg (729 KB, 2000x1336)
>>
>>106992093
because the chroma creator is a gay furry (I'm not joking)
>>
File: image_00296_.jpg (353 KB, 1240x1672)
>>
File: 1744848557854756.png (490 KB, 1080x720)
can you do this shit with 16gb vram? i stopped paying attention to new models after flux because i was already pushing the limits of my card
>>
File: 00041-1838771304.png (2.43 MB, 1536x1536)
>>
>>106992190
You can do that on 3GB of vram
>>
>>106992190
That example looks untrustworthy. Qwen-E is good at preserving text style and combining images but a restoration like that seems out of its reach.
>>
>>106992190
>>106992214
Can I use it to make nudes (of adults)?
>>
>>106992235
Ask >>106992209
>>
File: 00046-563496202.png (2.41 MB, 1824x1248)
>>
>>106991883
Of course it is, they upgraded to SaaS for Wan2.5 which is why they were able to compete
>>
>>106992294
retard
>>
>>106992209
i love me some plastic
>>
wan2.5 will be local just like mogao, only two more weeks of waiting!
>>
File: image_00300_.jpg (424 KB, 1240x1672)
>>
>saastech so powerful it let Wan skip over 2.3 and 2.4
It’s no surprise local is so far behind, SaaS must be literal magic
>>
>>106991883
>the next release of Wan is already showing to be better than Sora2
you mean wan 3.0?
>>
>>106992271
I tried qwen image edit but the workflow says it needs more than 16gb vram and indeed it did not work
>>
>>106991736
let's hope it won't be another slopped shit this time, wake the fuck up chinks and stop training your models with synthetic data
>>
File: 00065-1699447175.png (2.51 MB, 1248x1824)
>>
File: file.png (319 KB, 450x532)
>>106991601
the homer one isnt ai its just an old thing someone by the name of pixeloo made, they called it untoons
>>
>>106992386
ltx 2 seems way better anyway and that's confirmed to be open source in november and running on consumer gpus, alibaba can suck it.
>>
>>106992615
the ltx guys always give the distilled shit model though no?
>>
>>106992235
Yes, with the clothes remover lora.
Go back a few threads for a link.
Hopefully it still works.
>>
>>106992615
It’s also a western model, and western models are better quality than chinese slop. It’s just we rarely get weights without bullshit attached
>>
>>106992492
currently running a qwen image edit on my 16gb card, amd at that
so you're on several layers of skill issues here
>>
>>106992638
>It’s just we rarely get weights without bullshit attached
when was the last time we got a non distilled western model? lool
>>
Why is shitjai still paying attention to those absolute dogshit svi loras?
The new holo finetune seems orders of magnitude better, guess he's too dumb to work with different and complex code when he vibecodes with claude.
>>
>>106992649
was sd3 distilled? sd3 also had bullshit attached with the license though. maybe sd cascade or sdxl
>>
>>106992653
You should do it since you have everything figured out
>>
>>106992660
Don't need to till so much autistic finngolian
>>
comfyui and forge should switch names
>>
File: QwenEdit_00222_.png (1.24 MB, 912x1144)
>>106992190
>>106992229
I just ran it through qwen edit no loras because you had me curious.
Prompt:
adjust the color of the image to a realistic photo
>>
File: F73bqw3.png (280 KB, 657x829)
>>106992725
input
>>
>>106991736
he didn't say when it will be released?
>>
File: 00074-2234657624.png (2.81 MB, 1248x1824)
>>
man I love the pornmix plastic sloppa
>>
File: hindenburg.png (2.2 MB, 2144x1000)
not too bad honestly
a bit slopped but eh
>>
>>106991845
>>106992002
Fuck, I missed it at work
Reup please?
>>
when will local reach this level of kino? >>>/wsg/6008898
>>
File: 00092-1659676425.png (2.1 MB, 1248x1824)
>>
File: 1740643033570938.mp4 (616 KB, 832x480)
>>106991871
>HoloCine
16 seconds is hype. I'm staying optimistic until I run this myself
I'm already bored of video without audio now though
>>
File: babby_loicense.png (1.06 MB, 1400x620)
>>106992855
>>106992725
>>
File: 00101-166946628.png (2.98 MB, 1824x1248)
>>
File: old_photo.jpg (34 KB, 436x550)
>>106992190

manual edits with 2GB ram
>>
File: 00103-3793658223.png (2.86 MB, 1248x1824)
>>
>>106991871
looks kino desu
cumfart when????
>>
>>106992977
Kijai seems to be struggling with the implementation at the moment
>>
File: 233595738.jpg (600 KB, 1664x2432)
>>
File: ComfyUI_00094_.png (957 KB, 1168x888)
>>
>>106991694
dude don't bother. the output is slopped and the background is grainy. anon must've been trolling
>>
>>106991738
none of those are true, julien
>>
>>106993062
No it's working for me. The quality is equivalent to going 720p on low noise, but the motion is enhanced.
>>
File: 00110-2218708856.png (2.62 MB, 1248x1824)
>>
>>106993079
I'm looking at your workflow and both samplers are genning at 704p, no?
>>
>>106993118
Oh ignore that one, I wrote in a later post that I went back to the original one and it's working, but it's not quite as good for last frame, weird things happening.
>>
File: 00118-2949417064.png (2.38 MB, 1824x1248)
>>
>>106991736
Oh god, if it's good enough to kill chroma I'm all for it.
>>
>>106991704
LET'S GET DYPED UP DYPER BROTHERS
>>
>>106991736
>trusting this faggot when he said the same about the new wan model
LOL
>>
File: file.png (5 KB, 429x50)
>>106991704
uhmmm sisters??? this doesnt look right
>>
>>106993259
>he got it wrong one time out of 1000 therefore we shouldn't trust him anymore
meh, he still has a great ratio though
>>
>>106993282
the wan rugpull left a deep scar man
>>
>>106993290
When ltx 2 gets out wan is basically dead anyway.
>>
So I take it neta lumina isn't good with text
>>
>>106993299
meh, I saw some ltx 2 videos, the sound is atrocious
>>
>>106993299
doesnt really look better visually, we'll see how it trains and how it holds coherence, but wan 2.2 is pretty good for physics actually and for cartoony art styles already

the 4k 50fps long generations are good on paper but mean little if the videos genned ultimately look like 720p "upscales"

although its obvious ltx was trained very heavily on veo 3, given it copies its voice styles very closely, so at least we will have veo 3 mini at home for ok audio and video gen

and we will also see about speed compared to wan
>>
>>106993299
from the few clips I've seen it looks really slopped, maybe for I2V it will be good though, I want my own I2V grok meme generator at home >>>/wsg/6009078
>>
The based chinks are waiting for someone to btfo them hard before they finally have to pull out the trump card of just saying fuck IP "rights" and training on the entirety of youtube they must have been scraping all this time, like sora 2 did, plus all movies and cartoons ever made, to finally get a huge boost in model quality and knowledge.

They don't want to do it too soon because if they put out a great model that knows all popular media:
1. IP "rights" holder companies will put large pressure on China to shut it down.
2. They will have no more trump cards until they can make their own gpus which wont be for a couple more years and everyone else will be able to train on their models while adding their own advancements, leaving China to follow behind

So by always having this extra aspect of being able to train on copyrighted media, they have a reasonably big leeway to do whatever and always be able to add the extra high quality copyrighted dataset spice to get juust near the top of the list of good gen ai models
>>
When will the based chinkoids finally release a vram monster
I'm rooting for the insects
>>
>>106993371
>The based chinks
I'll call them based the day they really do train their model on the entirety of youtube like OpenAI did
>>
>>106993379
What matters is releasing good models, and wan 2.1 was a huge jump they didn't need to release and that won't be matched any time soon; there was no pressure in that space from anyone else. hunyuan was okish but still very much a toy
>>
So how does bucketing and batch size work together?
I am training with a batch size of 2. I have some buckets with odd number of images.
I have 65 images, no repeats and 10 epochs. This should give me 325 steps, based on batch size 2.
Judging by the fact that I have 360 total steps, I am guessing the training script is doing some steps with batch size 1 to compensate for the odd numbered buckets.
The question is, does this have an adverse effect on the training quality? Should I manually resize or use higher bucket steps like 128?
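The 360 vs 325 gap is exactly that: each bucket is batched separately, so every odd-sized bucket contributes one trailing batch of 1 per epoch. A sketch with hypothetical bucket sizes (your real split will differ):

[code]
import math

batch_size, epochs = 2, 10
buckets = [9] * 7 + [2]  # hypothetical split of the 65 images

steps_per_epoch = sum(math.ceil(n / batch_size) for n in buckets)
print(steps_per_epoch * epochs)  # 360

# Your 325 assumed 65/2 divides evenly; one big bucket would actually
# give ceil(65/2) * 10 = 330 steps. 360 total implies
# (360/10) * 2 - 65 = 7 odd-sized buckets, i.e. 7 batch-1 steps per
# epoch. Those batch-1 steps just weigh their samples slightly more
# per optimizer step; it's usually noise-level, not harmful.
[/code]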
>>
>>106993389
>hunyuan was okish but still very much a toy
Tencent has the balls to put nudity on their models, but Wan has more competent engineers, unfortunately :(
>>
>>106993397
Wasn't the Wan 2.1 chinese project page memed for having coombait women in their literal cherry picked examples at the beginning?

Also its very good at everything NSFW with any NSFW lora.
>>
>>106993412
>Wasn't the Wan 2.1 chinese project page memed for having coombait women in their literal cherry picked examples at the beginning?
I remember that, it was a fake website unfortunately kek
>>
>>106993415
I was here the entire time and don't remember it being exposed as a fake website, i dont feel like going through the archives and my second point still stands to disprove the censorship part
>>
>>106993422
>my second point still stands to disprove the censorship part
your second point destroys your initial argument though >>106993371

you said "you want NSFW, just do a lora bro" but at the same time you want IP shit on the base model, why can't we respond to that "you want IP characters on the model? just do a lora bro"
>>
https://xcancel.com/maxescu/status/1981416100303950309#m
it's so fucking plastic, why do they all train their model on synthetic shit, I'm going craaazzyyy, only OpenAI doesn't do that
>>
>>106993371
>based chinks
first, they're not based at all. second, they can't beat sora 2. or maybe in 5 years kek
>>
File: 1737452916818240.png (224 KB, 487x467)
>>106993467
>https://xcancel.com/maxescu/status/1981416100303950309#m
>flux chin
DOA
>>
>>106993475
>they can't beat sora 2. or maybe in 5 years kek
they'll never beat sora 2 if they keep training their model on synthetic data, one day they must learn that they can't cheap out on the dataset, it'll always be the most important thing on deep learning, period
>>
>>106993443
When mentioning the censorship i meant its not censored against nsfw, given that it so easily learns any nsfw concept in lora training.

With IP characters, it doesn't learn them as fast and they are not the same type of data to expect the model to be able to generate compared to nsfw because there is a difference for a company to train heavily on youtube and when asked about IP rights say "oh well we trained on everything like sora 2 did" versus them training on a huge dataset of literal porn.

And im not saying a model should be limited in the data its trained on even when it comes to porn given the anatomy benefits, its just that training on porn for a model company is not something that we can almost ever really expect to happen, meaning when it comes to the discussion of censorship, what that really means is that as long as the model is not specifically trained against genning nsfw or it gets lobotomized to the tier of sd3 so it cant even generate women, then that is good enough of a sign that the model's core wasnt "censored"/lobotomized.
>>
>>106993498
>there is a difference for a company to train heavily on youtube and when asked about IP rights say "oh well we trained on everything like sora 2 did" versus them training on a huge dataset of literal porn.
it's way more dangerous to train on IP, you can piss off anime artists, celebrities... OpenAI is getting some heat recently because of that, copyright is something serious, really serious
>>
File: UHZht5g.png (486 KB, 1024x576)
>>106991306
Not bad. Can you do NTSC artifacts like cross-color, dot-crawl, dot-hang?
>>
Divine axioms of diffusion:
1: SaaS is years ahead of local
2: China mogs the west
Therefore it’s easy to understand why Wan stopped releasing local models.
>>
>>106993396
>So how does bucketing and batch size work together?
Some say it has an effect, but using gradient checkpointing should negate it, and that's always on for me so I haven't even thought about it. Might be worth testing out.
>>
File: why.jpg (247 KB, 2418x513)
why is comfy ignoring the openpose controlnet?
>>
>>106993529
OpenAI did push the overton window, but I don't believe this was their intention, they just wanted hype by showing their model could do Will Smith playing ping pong against 2pac, ff7 style and shit, they know what people like, so they bait by letting them do copyright shit for like one week and then switch to stay safe, they did this on 4o and dalle3 as well, I'm NOOOTICING the pattern at this point

but hey, everything that pushes the overton window in the right direction is welcome, even if it's not being done intentionally
>>
File: 00144-2446407782.png (2.64 MB, 1248x1824)
>>
>>106993549
>trannymai
good
>>
>>106993549
you need to go back >>106970615
>>
>>106993529
>Although it's all inevitable, thankfully.
there will be a long fight before it's normalized though, I don't believe copyright companies will give up that easily
>>
>>106993412
>Also its very good at everything NSFW with any NSFW lora
lol no. it's ok at best if you stack half a dozen loras and fuck around with strengths
>>
>>106993549
cute
>>
File: 1730797376385699.png (473 KB, 750x1000)
https://github.com/bytedance-fanqie-ai/MoGA
Make OpenSource Great Again!
>>
>>106993798
Either jump on the API train or get run over by it, API is the future.
>>
Do Chroma and Qwen share workflows? Or would I need to set up different nodes for each one? Do they work similarly to Flux Kontext?
>>
>>106993825
just check the default templates, retard.
do you know how to breathe?
>>
>>106993825
>Does Chroma and Qwen share workflows
no, i mean you wouldn't use different nodes but you would use different settings
>>106993816
this is the local diffusion general. fuck off
>>
>>106993798
>more buttdance scraps
There is not a single thing they released that is actually useful. Bytedance literally only releases garbage
>>
>>106993853
>Bytedance literally only releases garbage
to be fair, they seem to only have made failures, Seedream 4.0 is the only successful model they have lol
>>
>>106992993
Where can i read "Mien Comfyui" ?
>>
>>106993837
you can load apis on applications like comfy retard.
>>
File: 00160-2305585121.png (2.58 MB, 1824x1248)
>>
>>106993798
>Make OpenSource Great Again!
not thanks to free poop models of jewdance
>>
>>106994005

I can't generate that video. Try describing another idea. You can also get tips for how to write prompts and review our video policy guidelines.
>>
>>106992993
>>106993965
Nevermind I forgot the mongol is only interested in i2v and control shit, he's still wasting time with the useless svi loras.
Probably won't even work on the holocine implementation, which it's insane considering long gens are what everyone has been waiting forever.
>>
>>106994023
Man I am not even trying to be a "hater" but can you look at your gens for longer than 2 seconds before posting them here?
She has like 8 fingers in her right hand.
>>
>>106993301
Use NetaYume, not the original Neta Lumina, if you aren't already. They can do it decently enough, DPM++ 2S Ancestral Linear Quadratic seems to give the most consistently good results for it. Particularly long text support definitely isn't as strong as in e.g. Flux or Qwen though.
>>
>>106994089
i2v does a lot of the heavy lifting for getting a satisfactory gen, so it's understandable, though not desirable.
>>
File: ComfyUI_00605_.mp4 (1.56 MB, 880x1176)
>>106994090
>he doesn't have 8 (6) fingers on his right hand.
>>
anyone know how i can set up image-to-3d?
>>
The new lightx2 loras from a couple days ago (yesterday?) seem quite good. Just running them at 1 strength. I guess there's still some slowmo.

>>106994264
What do you mean by 3d? Do you want to make a 3d model or do you want to make a 3d video that rotates around the subject?
>>
>>106994275
yes i want a 3d model. i use sparc 3d now, but it takes forever to get a turn
https://huggingface.co/spaces/ilcve21/Sparc3D
>>
>>106994284
Idk about that model specifically but if you have a decent GPU you can just try cloning their repo and running it locally. Lots of those example apps on huggingface can just be cloned and run locally.
>>
File: FLX_0044.png (3.51 MB, 1080x1920)
>>
>>106994295
i have no idea how to set this up. i just set up a text-to-image generator once using A1111
>>
File: FLX_0047.png (3.34 MB, 1080x1920)
>>106994311
Flux sometimes just kills ittttttt
>>
any work on low step Lumina models? 30-50 steps is too much
>>
>>106994131
Using Yume with comfy's default workflow, consistently fucks up on a short phrase
>>
File: 00201-1225490963.png (2.9 MB, 1248x1824)
>>106994090
on sdxl, hands are very difficult to get right, especially when prompting for complex poses with foreshortening and combat involved. adetailer can't fix all the aspects of bad hands and fingers.
>>
File: 1576977637700.jpg (218 KB, 1280x960)
i regret testing sora 2. it's hard to go back to mute videos now. and it isn't like we have the best mute video models anyway
>>
Any tips for prompting WAN2.2 so it doesn't zoom in randomly? I feel I hit this more than slowmo nowadays
>>
>>106994394
prompt in chinese. works 110%
>>
File: FLX_0054.png (3.36 MB, 1080x1920)
>>106994339
>>
File: 00210-1157773276.png (2.12 MB, 1248x1824)
>>
>>106994410
Don't know if you're trolling me or not but will try it out lol
>>
>>106994445
not even kidding. give it a whirl
>>
>>106994443
realism >>>>>>>
>>
>>106994229
is this a good plan for an application?
>>
>>106994467
thats called uncanny valley desu
>>
File: FLX_0064.png (3.47 MB, 1080x1920)
>>106994552
not saying its bad. jus pref.
>>
>>106993798
Hopefully its not another dead project that'll never release their model. Speaking of released models, wonder if Kijai or anyone knows that Rolling Forcing is already out https://huggingface.co/TencentARC/RollingForcing/tree/main/checkpoints
>>
>>106994452
Did not work, trying another gen without any loras to see if there's any weird interaction fucking up.
Or maybe I suck at prompting
>>
File: bgkorit91xwf1.png (13 KB, 846x213)
>>
File: ComfyUI_00139_.mp4 (479 KB, 640x640)
>>106994655
>>
>>106994512
better than cumfart at least
>>
File: i3281w.jpg (1.44 MB, 1600x1600)
>>
File: file.png (17 KB, 916x152)
>>106994655
based ani
>>
>>106994773
wtf i love julien now
>>
>>106994655
What is this platform? Just a slop character interaction ui? Can the avatars change?
>>
>>106994734
>>106985727
>>
>>106994826
there it is ;)
>>
>>106994829
glad i could help o/
>>
>>106994823
civitai pony v7 comment section replies
>>
File: AnimateDiff_00001.mp4 (753 KB, 352x512)
Trying the 2.2 distilled loras, not too shabby.
>>
>>106994253
yes I love 1-2-3girls laughing at me. Wheres the laughing at me gens?????????
>>
I prefer "girls zapping me with magic lightning bolts" personally although that's more of a video prompt.
>>
>>106995080
thanks for the idea kind anon, ill make some kino zapping 1girls!
>>
>>106995092
Looking forward to it
>>
why is chroma full of shitty gay loras, it's sad
>>
What the fuck, I can't open workflows in comfyui anymore, but opened the last one 5 minutes ago. Did not update, just restarted.
Did this shit update by itself silently or what?
> [DEPRECATION WARNING] Detected import of deprecated legacy API
>>
>>106994378
Try the specific sampler / scheduler combo I mentioned
>>
>>106995123
create good straight loras. you can train, right?
>>
>>106995133
(samefag) woops I should have mentioned also, around CFG 4.5 to 5.5 is best.
>>
File: wrwrwrwrrwr.jpg (106 KB, 1024x1024)
>>
File: arf2.jpg (105 KB, 1024x1024)
>>
>>106994951
>>
>>106995173
can you make it of anime girls
>>
>>106995125
>deprecated legacy API
>comfy deprecating all API nodes
based
>>
is 5090 worth it
>>
Giving video gen a shot, I downloaded wan2GP, 32gb system ram + 16gb 5070.
Are 5s/step on the 1.3B t2v model expected or am I doing something wrong?
>>
>>106995229
if you are thinking of a 5090 in terms of "value" then no. Like all high-end hardware (speakers/cameras/headphones/whatever) it isn't about value, it is about how much you enjoy owning high-end shit and seeing the marginal advantages.
>>
Kijai-Sama, please, I can only test so many models

>holocine

https://huggingface.co/Kijai/WanVideo_comfy_fp8_scaled/tree/main/T2V/HoloCine
>>
>>106995229
not really. the VRAM helps fit models but the speed isn't much better than a 4090
>>
>>106995276
>t2v
but I want i2v
>>
File: 00009-1000121469.png (2.98 MB, 1152x1440)
>>
>>106995223
it's comfyui manager and rgthree that give these warnings
>>
what is the prompt to consistently remove all people in the scene while keeping the viewpoint unchanged in wan i2v?
>>
Can an anon catbox a decent NetaYume workflow?
>>
File: 1744256748672062.png (1.68 MB, 1280x960)
qwen image, analogcore 2000s lora
>>
>>106995386
I could
>>
>>106995386
he did in previous
>>
>>106995125
Try --disable-api-nodes
>>
>>106995386
here, I dedicate to you my first gen of the day
>>
>>106995386
There is nothing special with Yume workflows desu.
>>
File: QwenImage_Output_6266433.png (2.04 MB, 1584x1056)
>>
>>106995438
there is nothing special with yume
>>
>see a cool lora on civitai
>early access and you need to pay for it to download it
>>
>>106995407
>long dick general
>>
>>106993412
> Also its very good at everything NSFW with any NSFW lora.
as long as nsfw is not genitals or sexual acts
>>
>>106995479
?
https://civitai.com/user/LocalOptima/models
>>
File: AnimateDiff_00001.mp4 (489 KB, 416x480)
>genning funny reaction images
>results are complete shit with cartoon stuff
>find funny baby
>have it act like a footballer witnessing a goal
>turns out great
>continue with other images
>start contemplating in the back of my mind
>realize what people can do with photos of kids
>truly realize
>am aware of the realization

Local needs to be banned.
>>
>>106995407
oh yeah broi gimme the grain and analog oh yeah I love shitty photos that remind me of crappy cameras ohb yeah bro i can feel the soul bro ycamcorder bro yeah bro give it to me bro
>>
>>106994452
apparently it's either the light or the fusion loras that add the movement, without them camera stays static, but quality becomes ASS
>>
>>106993467
The model is probably cucked too, but at least we got a hypothetical Flux video.
>>
>>106995517
the problem bro is that you're non-white bro
>>
tfw my wife will never launch lightning bolts at me
>>
>>106993549
rgb -> bgr
>>
>>106995507
And that's why I dont put personal stuff online anymore.
Bringing back printed family albums
>>
>>106995530
and that's a good thing, i'd hate to be a minority like a nigger
>>
I am not even gonna give a (You) to that fucking redditor
>>
>>106993371
They don't have leeway to do anything. US/ClosedAI could do it, but not them, plus copyright holders would attempt to charge them double the tax.
>>
File: 00013-795533095.jpg (904 KB, 1536x1920)
>>
>>106991495
Porn
and easier lora training
>>
>>106995611
He can't train unfortunately
>>
File: file.png (2.03 MB, 1328x1328)
>>106995530
whiter than you, post hand
>>
>>106995660
kek
>>
>>106995676
>>106995676
>>106995676
>>106995676
>>
>>106993371
Anon, stop being delusional, we don't even have open-source T2I models with dalle3's level of pop culture knowledge, so for video it's a given that it's "never ever" in that regard.
Chinks don't care about having a video model with trillions of parameters that "knows everything", they just want a model that performs well enough on benchmarks while being small enough to run on their gpu-embargoed datacenters
>>
>>106993523
Unsure if specifics like that were trained into the lora, I did attempt to prompt for extra haloing / rainbowing but it didn't do anything.



All trademarks and copyrights on this page are owned by their respective parties. Images uploaded are the responsibility of the Poster. Comments are owned by the Poster.