/g/ - Technology

Discussion and Development of Local Image and Video Models

Previous: >>108597963

https://rentry.org/ldg-lazy-getting-started-guide

>UI
ComfyUI: https://github.com/comfyanonymous/ComfyUI
SwarmUI: https://github.com/mcmonkeyprojects/SwarmUI
re/Forge/Classic/Neo: https://rentry.org/ldg-lazy-getting-started-guide#reforgeclassicneo
SD.Next: https://github.com/vladmandic/sdnext
Wan2GP: https://github.com/deepbeepmeep/Wan2GP

>Checkpoints, LoRAs, Upscalers, & Workflows
https://civitai.com
https://civitaiarchive.com/
https://openmodeldb.info
https://openart.ai/workflows

>Tuning
https://github.com/spacepxl/demystifying-sd-finetuning
https://github.com/ostris/ai-toolkit
https://github.com/Nerogar/OneTrainer
https://github.com/kohya-ss/musubi-tuner
https://github.com/tdrussell/diffusion-pipe

>Z
https://huggingface.co/Tongyi-MAI/Z-Image
https://huggingface.co/Tongyi-MAI/Z-Image-Turbo

>Anima
https://huggingface.co/circlestone-labs/Anima
https://tagexplorer.github.io/

>Qwen
https://huggingface.co/collections/Qwen/qwen-image

>Klein
https://huggingface.co/collections/black-forest-labs/flux2

>LTX-2
https://huggingface.co/Lightricks/LTX-2

>Wan
https://github.com/Wan-Video/Wan2.2

>Chroma
https://huggingface.co/lodestones/Chroma1-Base
https://rentry.org/mvu52t46

>Illustrious
https://rentry.org/comfyui_guide_1girl

>Misc
Local Model Meta: https://rentry.org/localmodelsmeta
Share Metadata: https://catbox.moe | https://litterbox.catbox.moe/
Img2Prompt: https://huggingface.co/spaces/fancyfeast/joy-caption-beta-one
Txt2Img Plugin: https://github.com/Acly/krita-ai-diffusion
Archive: https://rentry.org/sdg-link
Collage: https://rentry.org/ldgcollage

>Neighbors
>>>/aco/csdg
>>>/b/degen
>>>/r/realistic+parody
>>>/gif/vdg
>>>/d/ddg
>>>/e/edg
>>>/h/hdg
>>>/trash/slop
>>>/vt/vtai
>>>/u/udg

>Local Text
>>>/g/lmg

>Maintain Thread Quality
https://rentry.org/debo
https://rentry.org/animanon
>>
>mfw Resource news

04/14/2026

>ERNIE-Image: Text-to-image generation model built on a single-stream Diffusion Transformer
https://huggingface.co/baidu/ERNIE-Image

>Danbooru Dataset Filter: High-Speed Metadata Explorer for AI Training
https://github.com/ThetaCursed/Danbooru-Dataset-Filter

>ChatGPT will praise the mood and 'bedroom/DIY texture' of fart sounds pulled from YouTube
https://www.pcgamer.com/software/ai/chatgpt-will-praise-the-mood-and-bedroom-diy-texture-of-fart-sounds-pulled-from-youtube

>RefineAnything: Multimodal Region-Specific Refinement for Perfect Local Details
https://limuloo.github.io/RefineAnything

>Long-Horizon Streaming Video Generation via Hybrid Attention with Decoupled Distillation
https://github.com/leeruibin/hybrid-forcing

>Energy-oriented Diffusion Bridge for Image Restoration with Foundational Diffusion Models
https://jinnh.github.io/E-Bridge

>FashionMV: Product-Level Composed Image Retrieval with Multi-View Fashion Data
https://github.com/yuandaxia2001/FashionMV

>Degradation-Aware and Structure-Preserving Diffusion for Real-World Image Super-Resolution
https://github.com/jiyang0315/DASP-SR.git

04/13/2026

>LTX 2.3 Distilled v1.1
https://huggingface.co/Lightricks/LTX-2.3/blob/main/ltx-2.3-22b-distilled-1.1.safetensors

>UniCom: Unified Multimodal Modeling via Compressed Continuous Semantic Representations
https://huggingface.co/tencent/Unicom-Unified-Multimodal-Modeling-via-Compressed-Continuous-Semantic-Representations

>CatalogStitch: Dimension-Aware and Occlusion-Preserving Object Compositing for Catalog Image Generation
https://catalogstitch.github.io

>Realizing Immersive Volumetric Video: A Multimodal Framework for 6-DoF VR Engagement
https://github.com/Metaverse-AI-Lab-THU/ImViD

>Seeing is Believing: Robust Vision-Guided Cross-Modal Prompt Learning under Label Noise
https://github.com/gezbww/Vis_Prompt

>MixFlow: Mixed Source Distributions Improve Rectified Flows
https://github.com/NazirNayal8/MixFlow
>>
>mfw Research news

04/14/2026

>EditCrafter: Tuning-free High-Resolution Image Editing via Pretrained Diffusion Model
https://editcrafter.github.io

>VGA-Bench: A Unified Benchmark and Multi-Model Framework for Video Aesthetics and Generation Quality Evaluation
https://arxiv.org/abs/2604.10127

>FineEdit: Fine-Grained Image Edit with Bounding Box Guidance
https://arxiv.org/abs/2604.10954

>AIM-Bench: Benchmarking and Improving Affective Image Manipulation via Fine-Grained Hierarchical Control
https://arxiv.org/abs/2604.10454

>Continuous Adversarial Flow Models
https://arxiv.org/abs/2604.11521

>OmniScript: Towards Audio-Visual Script Generation for Long-Form Cinematic Video
https://arcomniscript.github.io

>Immune2V: Image Immunization Against Dual-Stream Image-to-Video Generation
https://arxiv.org/abs/2604.10837

>Differentiable Vector Quantization for Rate-Distortion Optimization of Generative Image Compression
https://arxiv.org/abs/2604.10546

>Rethinking the Diffusion Model from a Langevin Perspective
https://arxiv.org/abs/2604.10465

>Do Thought Streams Matter? Evaluating Reasoning in Gemini Vision-Language Models for Video Scene Understanding
https://arxiv.org/abs/2604.11177

>SVD-Prune: Training-Free Token Pruning For Efficient Vision-Language Models
https://arxiv.org/abs/2604.11530

>Revisiting Compositionality in Dual-Encoder Vision-Language Models: The Role of Inference
https://arxiv.org/abs/2604.11496

>LDEPrompt: Layer-importance guided Dual Expandable Prompt Pool for Pre-trained Model-based Class-Incremental Learning
https://arxiv.org/abs/2604.11091

>Agentic Video Generation: From Text to Executable Event Graphs via Tool-Constrained LLM Planning
https://arxiv.org/abs/2604.10383

>Omnimodal Dataset Distillation via High-order Proxy Alignment
https://arxiv.org/abs/2604.10666

>What Users Leave Unsaid: Under-Specified Queries Limit Vision-Language Models
https://arxiv.org/abs/2601.06165
>>
File: 1768864323566751.png (1.68 MB, 832x1024)
ok ernie turbo is fucking garbage at prompt following
>>
>>108604751

forgot prompt
>A photorealistic candid photo of a woman with long, flowing hair that transitions from icy white at the roots to vibrant cyan-blue at the tips, cascading over her shoulders and partially obscuring her face as she looks downward. She wears a form-fitting, sleeveless top with a high neckline, primarily white with bold geometric yellow trim and a large, faceted blue diamond-shaped emblem centered on the chest. The garment has a structured, armored appearance with gold-brown segmented panels along the waist and hips, suggesting a fantasy or sci-fi outfit. Her right hand rests on a smooth, light-colored surface in the foreground, fingers slightly curled. The background is an out-of-focus twilight landscape under a deep indigo sky, with a soft gradient of magenta and purple along the horizon. A faint, glowing horizontal line runs across the lower portion of the frame, possibly a railing or edge of a platform. The lighting is directional, casting soft shadows and highlights on her hair and clothing, emphasizing texture and form with natural depth and contrast. No text, speech bubbles, or tears are visible.
>>
File: 1746834295472832.png (3.57 MB, 1484x1676)
https://huggingface.co/baidu/ERNIE-Image
https://huggingface.co/baidu/ERNIE-Image-Turbo
https://yiyan.baidu.com/blog/posts/ernie-image
https://ernieimageprompt.com/

LOCAL IS SAVED!!
>>
File: 1768040078201688.png (1.64 MB, 832x1024)
>>108604754
wait nvm im gay, fucked up a setting
>>
can someone litterbox or gofile some nsfw gens of ernie image? the huggingface demo is too censored.
>>
>>108604759
But can it do anime loli porn?
>>
File: Ernie.png (2.14 MB, 1200x896)
>>108604759
>no edit
that's a shame, imagine doing edit with such a monster of a model, the prompt following is on another level, can't believe it's using a simple 3b text encoder to get that shit, and fucking ministral of all things
>>
>>108604786
ZAMN!
>>
>>108604759
https://github.com/Comfy-Org/workflow_templates/blob/main/templates/image_ernie_image_turbo.json
https://huggingface.co/Comfy-Org/ERNIE-Image
>AttributeError: 'Ministral3_3B' object has no attribute 'generate'
thanks Comfy
>>
>>108604759
Can it do nude?
>>
>>108604759
Can it do shrek?
>>
File: 1750850705420377.png (1.86 MB, 1024x1024)
>>108604759
bruh, turbo has garbage anatomy, downloading the base model
>>
>>108604759
buy an ad
>>108604810
have you pulled?
>>
>>108604842
>implying the monk didn't cultivate enough to master the four immeasurables and grow two extra arms
lol?
>>
File: Ernie-Image_00001_.png (1.31 MB, 896x1152)
The gen time for non-turbo on my 3060 is a bit slow, two and a half minutes for 20 steps, and it probably needs more steps, but it's not unusually slow for a model of this size.
Let's see how it holds up under further testing.
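
For anyone budgeting higher step counts, here is a rough sketch of the timing math above. It assumes gen time scales linearly with step count, which ignores fixed overhead like model loading, text encoding and VAE decode; the function names are made up for illustration.

```python
def seconds_per_step(total_seconds: float, steps: int) -> float:
    """Average wall-clock seconds spent per sampling step."""
    return total_seconds / steps


def estimate_seconds(steps: int, sec_per_step: float) -> float:
    """Linear estimate of total gen time (assumption: no fixed overhead)."""
    return steps * sec_per_step


# 2.5 minutes for 20 steps, as reported on the 3060.
per_step = seconds_per_step(150, 20)
print(f"{per_step:.1f} s/step")
print(f"50 steps: ~{estimate_seconds(50, per_step) / 60:.2f} min")
```

Treat the 50-step number as a ballpark only, since it is an extrapolation and not a measured time.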
>>
>>108604817
>Can it do nude?
https://litter.catbox.moe/9z9qwbnxpflyqt27.jpg
>>
>>108604751
>>108604763
What did you fuck up so I can avoid it
>>
>>108604861
I see you tasted the base model, I hope it's the good one, I don't really like my tests on turbo so far
>>108604843
yes I'm on the latest version, seems like comfy hasn't implemented the prompt rewriting yet
https://github.com/Comfy-Org/ComfyUI/pull/13395
>Needs template before it works properly.
>>
https://huggingface.co/lightx2v/Wan2.2-Distill-Models/blob/main/wan2.2_i2v_A14b_high_noise_lightx2v_4step_720p_260412.safetensors

Why are the high and low noise files close to 60gb?
>>
>>108604759
What VAE does it use?
>>
>>108604817
>>108604862
https://litter.catbox.moe/tz2g5anklf3bmmmt.jpg
as expected, garbage genitals lol
>>108604879
the best one, flux 2's vae
>>
>>108604817
>>108604772
It hasn't been trained on boobs, it generates mediocre breasts. Though from my very limited testing it doesn't seem to be deliberately poisoned like Flux models are.
>>108604871
I just had a feeling that the distill will be problematic and went for the base immediately.
>>
>>108604888
Is this turbo or base
>>
>>108604889
>I just had a feeling that the distill will be problematic and went for the base immediately.
good, was about time that we got a fully finetuned model that isn't distilled, no need for some NAG cope, we can directly use CFG, and we'll be able to train and make loras on it
>>108604893
turbo
>>
>>108604872
FP32 precision.
4 bytes for every weight:
14B params x 4 bytes = 56 GB
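
Rough sketch of that arithmetic for other precisions. It assumes the file is basically just weights (negligible metadata) and uses decimal GB; `checkpoint_size_gb` and the dtype table are hypothetical helpers for illustration, not part of any library.

```python
# Bytes stored per weight for common checkpoint dtypes.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "bf16": 2, "fp8": 1}


def checkpoint_size_gb(n_params: float, dtype: str) -> float:
    """Approximate weights-only checkpoint size in decimal GB."""
    return n_params * BYTES_PER_PARAM[dtype] / 1e9


for dtype in ("fp32", "bf16", "fp8"):
    print(f"14B @ {dtype}: ~{checkpoint_size_gb(14e9, dtype):.0f} GB")
```

Same parameter count, so a 16-bit copy should land around half the FP32 size and an 8-bit copy around a quarter.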
>>
>>108604906
Huh, I haven't seen the fp32 version before.
>>
File: Ernie base.png (1.65 MB, 1024x1024)
>>108604842
>downloading the base model
I really don't like the anatomy, like this is base at 50 steps, come on
>>
>migu
:)
>>
File: 1771543722896827.jpg (647 KB, 2048x1141)
>>108604940
smells more and more like a nothingburger, the realism quality is Klein tier, but ernie can't even do edits to compensate, sad
>>
File: Ernie-Image_00006_.png (1.11 MB, 1152x896)
>>108604940
I am wondering if Comfy fucked something up, or did they do Chroma-tier cherry picking for the images?
>>108604922
FP32 is usually only used for training because the benefits to inference are almost non-existent.
>>
File: Ernie-Image_00007_.png (1.1 MB, 1152x896)
>>108604974
50 steps turned out better.
Seems also a bit wild when it comes to adding shit to the image. First time I have seen AI add a knife to 1girl, standing prompt unsolicited.
>>
File: 1769295316640072.jpg (924 KB, 2560x1039)
>>108604959
>>
>>108604983
Oh I think image is so different due to the fact that Control after generate is bugged with the retarded subgraph Cumfy has shipped with the template. So it ran a whole new seed.
The point about knife stands though, same prompt.
>>
File: 1767451626519673.jpg (890 KB, 2560x1214)
>>108604991
>>
File: o_00245_.png (391 KB, 896x1152)
>>
File: 1765165312128737.jpg (1.13 MB, 2560x1179)
>>108605000
Ernie knows only one anime style: "Nano Banana Pro"

:]
>>
File: 1757245838942284.jpg (1.54 MB, 2560x1354)
>>108605023
kek, I think I've seen enough
>>
File: 1771533625427241.jpg (1.66 MB, 2560x1354)
>>108605045
maybe turbo at 16 steps is the best it can get
>>
File: file.png (1.52 MB, 1024x1024)
ernie base with the default settings and default prompt in comfyui gave me a guy with 3 legs.. not a great start
>>
File: 1757491738029006.jpg (973 KB, 3072x971)
>>108605060
Z-image turbo be like:
https://youtu.be/WO23WBji_Z0?t=10
>>
File: Ernie-Image_00009_.png (1.16 MB, 896x1152)
One of the better gens I got.
Still has this Kleiny look to it.
>>
File: o_00247_.png (1.73 MB, 896x1152)
>>
>>108605080
something is wrong with the proportion of their body, looks like they're midgets, Flux Kontext style lool
>>
>>108605064
>3 feet
>>108605080
>3 hands
lol I think I won't be downloading this
>>
it's all right, the jews will save us
https://xcancel.com/ltx_model/status/2044108750592643279#m
>>
File: Ernie Comparison.png (2.57 MB, 1724x1356)
This model has been trained on 3 billion images of Nano Banana Pro kek.
>>
File: Ernie-Image_00010_.png (1.62 MB, 896x1152)
>>
>>108605126
>This model has been trained on 3 billion images of Nano Banana Pro kek.
Z-Image supremacy, yeaaah! We had Qwen Edit and then the Tongyi model/s, but all other Chinese t2i are all equally sloppy, GLM, this, whatever.
>>
File: Ernie-Image_00011_.png (1.46 MB, 896x1152)
I am kinda liking things about it despite its faults.
But they probably either overcooked this thing or it needed a little bit of post training aesthetic alignment to temper schizo anatomy.
>>
File: o_00252_.png (771 KB, 896x1152)
>>
>>108604974
>I am wondering if Comfy fucked something up
I think the model is just not that good, in my tests it's inferior to Z-image turbo almost everywhere
It can be a great base model to train on though, but yeah, 8b is big, people prefer something smaller like 2b so that they can do Anima type of models or some shit
>>
File: 1485680357151.png (299 KB, 500x375)
>>108605183
>8b is big
>>
>>108605183
yup, same experience, back to zit for me
>>
File: 1756553466182638.jpg (459 KB, 1250x1566)
>>108605183
>it's inferior to Z-image turbo almost everywhere
the niggas thought that training a model only on Nano Banana Pro's images would do the trick, all we got is that Synth-ID watermark pattern everywhere lmao, once again, synthetic data BTFO
>>
>>108605115
oops, forgot to attach their paper
https://arxiv.org/abs/2604.11788
>>
>>108605183
I think there are issues with finetuning klein and ZIB for some reason.
If it responds well to training this looks salvageable. Decent text encoder + best vae + good size balance between quality and being able to run on most hardware + OK quality bar anatomy issues + mid instruction following that can possibly be ironed out.
I hope someone besides Kekstone takes a crack at it.
>>108605208
Can't we improve realism with finetuning/lora? I know training on slop sucks but banana pro is really high quality baseline.
>>
File: weird.png (98 KB, 1416x497)
>>108604759
>https://ernieimageprompt.com/
either something is wrong with ComfyUI, or those baidu fucks are straight up lying to us, I'm not getting anything even close to the images on that site
>>
File: 1746478579501469.jpg (38 KB, 460x490)
>>108605236
Chinks lying? How can it be...
>>
File: jpeg artifacts.png (1.72 MB, 1419x1326)
I love to complain about the jpeg artifacts on Z-image turbo, but with Ernie we've arrived at a whole other level, jesus this is ugly af
>>
>>108605262
I don't think those are jpg artifacts, probably the watermark patterns of NBP >>108605126
>>
>>108605262
Is this Turbo? I am not really getting these on the Base.
>>
File: 1764213941152989.jpg (1.04 MB, 2560x1109)
turbo seems more slopped overall, and if there's one thing I can say base does better than Z-image turbo, it's that it seems to know more stuff, but knowing more stuff is useless if the anatomy is ass and the realism is not even close either
>>
>comparing z turbo to ernie base
Why not compare base to base tho
>>
File: 1755056615501464.jpg (874 KB, 2560x1039)
>>108605278
I think you are right anon, base doesn't seem to have that much noise
>>
>>108605321
as a ""base"" model it looks like it's destroying Z-image base, let's hope that we can train it well then, both ZIB and Klein had their issues
>>
File: 1764603567551374.jpg (869 KB, 2048x1299)
I don't see anything that Ernie is the best at, Chroma has the best kino, Z-image has the best realism and anatomy, this shit is just slop after slop
>>
>>108605317
it's been compared here >>108605080
>>
File: 1761780172470859.jpg (623 KB, 2560x934)
>>
File: kek.jpg (830 KB, 2048x1247)
>>
File: now what?.png (114 KB, 640x640)
the ledditors are loving it though
https://www.reddit.com/r/StableDiffusion/comments/1slg4wh/we_may_have_a_new_sota_opensource_model/
>>
>piggies love slop
STOP THE PRESSES A FROGFAG IS SPEAKING !!!
>>
File: Nano Banana Amateur.jpg (1.11 MB, 2784x988)
Can't the chinks do anything else than just make cheap copies of murica's products?
>>
File: 635872472572.jpg (2.07 MB, 1664x2432)
>>
File: _AnimaPreview3_00291_.jpg (465 KB, 1792x1072)
>>
File: Ernie-Image_00022_.png (1.29 MB, 832x1216)
>>108605408
>>
File: 1773347391030829.png (707 KB, 1024x1024)
>Tezuka Rin \(katawa shoujo\) sitting on a bench
is that how you're supposed to prompt on Anima? I can't manage to get her
>>
>>108605115
distilled seedance 2.0 (ltx 4) and kazar milkers honeypot spy gf was promised to me 6 gorillion years ago but unironically.
>>
>>108605262
>>108605278
>>108605276
i never had the artifacts problem with zit, just dont use the suggested retard samplers and instead use:
euler (/euler_a) + simple (/normal)
>>
>>108605468
Yes for tag based prompts but I don't think there is full consensus on how to prompt characters when prompting with natural language. Try Tezuka Rin from Katawa Shoujo.
If all options are exhausted try it on preview 2.
>>
File: _AnimaPreview3_00310_.jpg (459 KB, 1160x1696)
>>
File: 1712175743062.jpg (1.63 MB, 2048x2048)
>>
File: blaze it.png (1.52 MB, 1024x1024)
>>108605468
>Tezuka Rin from Katawa Shoujo, a girl with short messy red hair and green eyes and no arms, sitting on a wooden bench, wearing her school uniform, calm distant expression, soft afternoon light, On the left knee there's a plush of Hatsune Miku, on the right there's a plush of Kazane Teto
skill issue
>>
File: 1770648835789723.mp4 (2.08 MB, 948x476)
https://xcancel.com/DylanTFWang/status/2043952886166761519
>Open-source tomorrow
damn, if it's not too big to run locally maybe Tencent finally cooked
>>
big jump in real time interactable video gen

Waypoint-1.5: Apache 2.0, first-person-shooter focused, 1.2b params, 720p, 512 frames of context, 56fps on a 5090, needs at least a 30xx card

online demo https://www.overworld.stream/
https://github.com/Overworldai/world_engine
>>
>>108605539
Anons what's the actual use case for this world model thing?
Every single world model I see looks like "cool tech demo you play for five minutes and then never touch again".
>>
>>108605539
forgot that link too
https://3d-models.hunyuan.tencent.com/world/
>>
File: Flux2-Klein_00092_.png (82 KB, 640x80)
>>
>>108605552
newfag. luddite. brown, even.

the point is to enjoy the cool new tech and tinker with it while thinking about how you can maybe use it and change it yourself now while also thinking about how cool it will be in a year from now on.

for example chaining multiple generated rooms you can traverse infinitely is a software problem and thus solveable relatively easily while allowing you to get much more out of that tech there.
>>
File: 1758430737520461.png (2.44 MB, 1920x1080)
>>108605550
>512 frames of context 56fps on 5090
So? less than 10 seconds? lol
>>108605552
desu I'd enjoy lurking on a world made out of a cool drawing image, like this shit
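
Quick check on the "less than 10 seconds" figure, as a minimal sketch. It assumes the context window is consumed at the playback frame rate (512 frames at 56fps, both numbers from the quoted post); `context_seconds` is a made-up helper name.

```python
def context_seconds(frames: int, fps: float) -> float:
    """Seconds of video covered by a context window of `frames` frames."""
    return frames / fps


print(f"{context_seconds(512, 56):.1f} s of context")
```

So the window works out to a bit over 9 seconds under that assumption.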
>>
File: Ernie-Image_00023_.png (1.14 MB, 832x1216)
A very sloppy double exposure sloppa.
>>
>>108605586
https://huggingface.co/Kijai/WanVideo_comfy/tree/main/LoRAs/Wan22_Lightx2v
kijai made the loras out of the new lightning version of Wan 2.2
>>
File: Flux2-Klein_00182_.png (1.71 MB, 1024x1024)
There are some who call me...Tim
>>
File: 1747739466716694.jpg (777 KB, 1328x1640)
>>
audio in ltx 2.3 1.1 seems nicer. we wuz hogwarts:

https://litter.catbox.moe/hnjzczuml64krkjr.mp4
>>
File: 1770342916813218.webm (3.73 MB, 960x528)
>>108605592
not bad, Wan 2.2 may be an ancient model but it's still the best thing we have :')
>>
>>108605651
that's cool, I was tired of the ultra metalic sound of ltx, if those jews keep improving on that shit it might end up being a genuinely good model, still a long way to go to seedance 2.0 though lol >>>/wsg/6128285
>>
File: 1759932705998330.webm (3.82 MB, 816x608)
>>108605654
>first frame + last frame
kek, I forgot how much vram wan 2.2 is asking, I think I might return to LTX just for that
>>
>>108605627
lul, did you combine monty python screenshot with the cat meme?
>>
>>108605686
What do you mean by that, isn't LTX heavier on resources?
>>
File: 720p Wan 2.2.webm (3.82 MB, 1104x832)
>>108605701
it uses a less heavy VAE so the kv cache usage is less punitive, good luck going for 720p on wan 2.2
>>
>>108605723
this. i can make 720p resolution gens on ltx. literally impossible on wan-hunyuan
>>
>>108604726
does any of this shit run simply and reasonably well on AMD cards yet?

I have tried multiple times over the last couple of years to get a functional pipeline up and running on my 6800xt 16gb and it has never once worked

I'm no genius but I'm also not retarded
>>
File: Flux2-Klein_00152_.png (1.77 MB, 1024x1024)
>>108605689
yeah
>>
MLEM MLEM MLEM HECKIM CHNGUS
>>
>>108605813
if linux rocm + forge neo work fine
if windows i pray for you
>>
File: 00109-58636226.png (886 KB, 1168x816)
>>
File: BULLSHIT.png (633 KB, 1424x1211)
https://youtu.be/XUxKm40X__g?t=907
benchmarks was a mistake...


