Discussion and Development of Local Image and Video Models

Previous: >>108597963

https://rentry.org/ldg-lazy-getting-started-guide

>UI
ComfyUI: https://github.com/comfyanonymous/ComfyUI
SwarmUI: https://github.com/mcmonkeyprojects/SwarmUI
re/Forge/Classic/Neo: https://rentry.org/ldg-lazy-getting-started-guide#reforgeclassicneo
SD.Next: https://github.com/vladmandic/sdnext
Wan2GP: https://github.com/deepbeepmeep/Wan2GP

>Checkpoints, LoRAs, Upscalers, & Workflows
https://civitai.com
https://civitaiarchive.com/
https://openmodeldb.info
https://openart.ai/workflows

>Tuning
https://github.com/spacepxl/demystifying-sd-finetuning
https://github.com/ostris/ai-toolkit
https://github.com/Nerogar/OneTrainer
https://github.com/kohya-ss/musubi-tuner
https://github.com/tdrussell/diffusion-pipe

>Z
https://huggingface.co/Tongyi-MAI/Z-Image
https://huggingface.co/Tongyi-MAI/Z-Image-Turbo

>Anima
https://huggingface.co/circlestone-labs/Anima
https://tagexplorer.github.io/

>Qwen
https://huggingface.co/collections/Qwen/qwen-image

>Klein
https://huggingface.co/collections/black-forest-labs/flux2

>LTX-2
https://huggingface.co/Lightricks/LTX-2

>Wan
https://github.com/Wan-Video/Wan2.2

>Chroma
https://huggingface.co/lodestones/Chroma1-Base
https://rentry.org/mvu52t46

>Illustrious
https://rentry.org/comfyui_guide_1girl

>Misc
Local Model Meta: https://rentry.org/localmodelsmeta
Share Metadata: https://catbox.moe | https://litterbox.catbox.moe/
Img2Prompt: https://huggingface.co/spaces/fancyfeast/joy-caption-beta-one
Txt2Img Plugin: https://github.com/Acly/krita-ai-diffusion
Archive: https://rentry.org/sdg-link
Collage: https://rentry.org/ldgcollage

>Neighbors
>>>/aco/csdg
>>>/b/degen
>>>/r/realistic+parody
>>>/gif/vdg
>>>/d/ddg
>>>/e/edg
>>>/h/hdg
>>>/trash/slop
>>>/vt/vtai
>>>/u/udg

>Local Text
>>>/g/lmg

>Maintain Thread Quality
https://rentry.org/debo
https://rentry.org/animanon
>mfw Resource news

04/14/2026

>ERNIE-Image: Text-to-image generation model built on a single-stream Diffusion Transformer
https://huggingface.co/baidu/ERNIE-Image
>Danbooru Dataset Filter: High-Speed Metadata Explorer for AI Training
https://github.com/ThetaCursed/Danbooru-Dataset-Filter
>ChatGPT will praise the mood and 'bedroom/DIY texture' of fart sounds pulled from YouTube
https://www.pcgamer.com/software/ai/chatgpt-will-praise-the-mood-and-bedroom-diy-texture-of-fart-sounds-pulled-from-youtube
>RefineAnything: Multimodal Region-Specific Refinement for Perfect Local Details
https://limuloo.github.io/RefineAnything
>Long-Horizon Streaming Video Generation via Hybrid Attention with Decoupled Distillation
https://github.com/leeruibin/hybrid-forcing
>Energy-oriented Diffusion Bridge for Image Restoration with Foundational Diffusion Models
https://jinnh.github.io/E-Bridge
>FashionMV: Product-Level Composed Image Retrieval with Multi-View Fashion Data
https://github.com/yuandaxia2001/FashionMV
>Degradation-Aware and Structure-Preserving Diffusion for Real-World Image Super-Resolution
https://github.com/jiyang0315/DASP-SR.git

04/13/2026

>LTX 2.3 Distilled v1.1
https://huggingface.co/Lightricks/LTX-2.3/blob/main/ltx-2.3-22b-distilled-1.1.safetensors
>UniCom: Unified Multimodal Modeling via Compressed Continuous Semantic Representations
https://huggingface.co/tencent/Unicom-Unified-Multimodal-Modeling-via-Compressed-Continuous-Semantic-Representations
>CatalogStitch: Dimension-Aware and Occlusion-Preserving Object Compositing for Catalog Image Generation
https://catalogstitch.github.io
>Realizing Immersive Volumetric Video: A Multimodal Framework for 6-DoF VR Engagement
https://github.com/Metaverse-AI-Lab-THU/ImViD
>Seeing is Believing: Robust Vision-Guided Cross-Modal Prompt Learning under Label Noise
https://github.com/gezbww/Vis_Prompt
>MixFlow: Mixed Source Distributions Improve Rectified Flows
https://github.com/NazirNayal8/MixFlow
>mfw Research news

04/14/2026

>EditCrafter: Tuning-free High-Resolution Image Editing via Pretrained Diffusion Model
https://editcrafter.github.io
>VGA-Bench: A Unified Benchmark and Multi-Model Framework for Video Aesthetics and Generation Quality Evaluation
https://arxiv.org/abs/2604.10127
>FineEdit: Fine-Grained Image Edit with Bounding Box Guidance
https://arxiv.org/abs/2604.10954
>AIM-Bench: Benchmarking and Improving Affective Image Manipulation via Fine-Grained Hierarchical Control
https://arxiv.org/abs/2604.10454
>Continuous Adversarial Flow Models
https://arxiv.org/abs/2604.11521
>OmniScript: Towards Audio-Visual Script Generation for Long-Form Cinematic Video
https://arcomniscript.github.io
>Immune2V: Image Immunization Against Dual-Stream Image-to-Video Generation
https://arxiv.org/abs/2604.10837
>Differentiable Vector Quantization for Rate-Distortion Optimization of Generative Image Compression
https://arxiv.org/abs/2604.10546
>Rethinking the Diffusion Model from a Langevin Perspective
https://arxiv.org/abs/2604.10465
>Do Thought Streams Matter? Evaluating Reasoning in Gemini Vision-Language Models for Video Scene Understanding
https://arxiv.org/abs/2604.11177
>SVD-Prune: Training-Free Token Pruning For Efficient Vision-Language Models
https://arxiv.org/abs/2604.11530
>Revisiting Compositionality in Dual-Encoder Vision-Language Models: The Role of Inference
https://arxiv.org/abs/2604.11496
>LDEPrompt: Layer-importance guided Dual Expandable Prompt Pool for Pre-trained Model-based Class-Incremental Learning
https://arxiv.org/abs/2604.11091
>Agentic Video Generation: From Text to Executable Event Graphs via Tool-Constrained LLM Planning
https://arxiv.org/abs/2604.10383
>Omnimodal Dataset Distillation via High-order Proxy Alignment
https://arxiv.org/abs/2604.10666
>What Users Leave Unsaid: Under-Specified Queries Limit Vision-Language Models
https://arxiv.org/abs/2601.06165
ok ernie turbo is fucking garbage at prompt following
>>108604751
forgot prompt
>A photorealistic candid photo of a woman with long, flowing hair that transitions from icy white at the roots to vibrant cyan-blue at the tips, cascading over her shoulders and partially obscuring her face as she looks downward. She wears a form-fitting, sleeveless top with a high neckline, primarily white with bold geometric yellow trim and a large, faceted blue diamond-shaped emblem centered on the chest. The garment has a structured, armored appearance with gold-brown segmented panels along the waist and hips, suggesting a fantasy or sci-fi outfit. Her right hand rests on a smooth, light-colored surface in the foreground, fingers slightly curled. The background is an out-of-focus twilight landscape under a deep indigo sky, with a soft gradient of magenta and purple along the horizon. A faint, glowing horizontal line runs across the lower portion of the frame, possibly a railing or edge of a platform. The lighting is directional, casting soft shadows and highlights on her hair and clothing, emphasizing texture and form with natural depth and contrast. No text, speech bubbles, or tears are visible.
https://huggingface.co/baidu/ERNIE-Image
https://huggingface.co/baidu/ERNIE-Image-Turbo
https://yiyan.baidu.com/blog/posts/ernie-image
https://ernieimageprompt.com/

LOCAL IS SAVED!!
>>108604754
wait nvm im gay, fucked up a setting
can someone litterbox or gofile some nsfw gens of ernie image? the huggingface demo is too censored.
>>108604759But can it do anime loli porn?
>>108604759
>no edit
that's a shame, imagine doing edit with such a monster of a model, the prompt following is on another level, can't believe it's using a simple 3b text encoder to get that shit, and fucking ministral of all things
>>108604786
ZAMN!
>>108604759
https://github.com/Comfy-Org/workflow_templates/blob/main/templates/image_ernie_image_turbo.json
https://huggingface.co/Comfy-Org/ERNIE-Image
>AttributeError: 'Ministral3_3B' object has no attribute 'generate'
thanks Comfy
>>108604759
Can it do nude?
>>108604759
Can it do shrek?
>>108604759
bruh, turbo has garbage anatomy, downloading the base model
>>108604759
buy an ad
>>108604810
have you pulled?
>>108604842
>implying the monk didn't cultivate enough to master the four immeasurables and grow two extra arms
lol?
The gen times for non-turbo on my 3060 are a bit slow, two and a half minutes for 20 steps, and it probably needs more steps, but that's not unusually slow for a model of this size.
Let's see how it holds up under further testing.
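A quick back-of-the-envelope check of those numbers (a sketch; it assumes step time scales roughly linearly with step count at fixed resolution):

```python
# Per-step cost implied by "2.5 minutes for 20 steps" on the 3060.
total_seconds = 2.5 * 60
steps = 20
per_step = total_seconds / steps
print(per_step)            # seconds per diffusion step -> 7.5

# Estimated wall time if the step count is bumped to 50:
print(per_step * 50 / 60)  # minutes -> 6.25
```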
>>108604817
>Can it do nude?
https://litter.catbox.moe/9z9qwbnxpflyqt27.jpg
>>108604751
>>108604763
What did you fuck up so I can avoid it?
>>108604861
I see you tested the base model, I hope it's the good one, I don't really like my tests on turbo so far
>>108604843
yes I'm on the latest version, seems like comfy hasn't implemented the prompt rewriting yet
https://github.com/Comfy-Org/ComfyUI/pull/13395
>Needs template before it works properly.
https://huggingface.co/lightx2v/Wan2.2-Distill-Models/blob/main/wan2.2_i2v_A14b_high_noise_lightx2v_4step_720p_260412.safetensors
Why are the high and low noise models close to 60gb?
>>108604759
What VAE does it use?
>>108604817
>>108604862
https://litter.catbox.moe/tz2g5anklf3bmmmt.jpg
as expected, garbage genitals lol
>>108604879
the best one, flux 2's vae
>>108604817
>>108604772
It hasn't been trained on boobs, it generates mediocre breasts. Though from my very limited testing it doesn't seem to be deliberately poisoned like Flux models are.
>>108604871
I just had a feeling that the distill will be problematic and went for the base immediately.
>>108604888
Is this turbo or base?
>>108604889
>I just had a feeling that the distill will be problematic and went for the base immediately.
good, was about time that we got a fully finetuned model that isn't distilled, no need for some NAG cope, we can directly use CFG, and we'll be able to train and make loras on it
>>108604893
turbo
>>108604872
FP32 precision, 4 bytes for every weight:
14b x 4 = 56gb
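That sizing rule generalizes to any checkpoint; a minimal sketch (the bytes-per-weight table is the standard one, and it assumes a dense, unquantized model):

```python
# Approximate on-disk size of a checkpoint: parameter count x bytes per weight.
BYTES_PER_WEIGHT = {"fp32": 4, "fp16": 2, "bf16": 2, "fp8": 1}

def checkpoint_gb(params_billions: float, dtype: str) -> float:
    """Size in GB (1 GB = 1e9 bytes) for a dense model with no quantization."""
    return params_billions * 1e9 * BYTES_PER_WEIGHT[dtype] / 1e9

print(checkpoint_gb(14, "fp32"))  # 14B weights at fp32 -> 56.0
print(checkpoint_gb(14, "fp16"))  # same weights at fp16 -> 28.0
```

This is also why an fp16/bf16 release of the same distill would land around 28gb per file instead of 56gb.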
>>108604906
Huh, I haven't seen the fp32 version before.
>>108604842
>downloading the base model
I really don't like the anatomy, like this is base at 50 steps, come on
>migu
:)
>>108604940
smells more and more like a nothingburger, the realism quality is Klein tier, but ernie can't even do edits to compensate, sad
>>108604940
I am wondering if Comfy fucked something up, or did they do Chroma-tier cherry picking for the images?
>>108604922
FP32 is usually only used for training because the benefits to inference are almost non-existent.
>>108604974
50 steps turned out better.
It also seems a bit wild when it comes to adding shit to the image. First time I have seen AI add a knife to a "1girl, standing" prompt unsolicited.
>>108604959
>>108604983
Oh I think the image is so different because Control after generate is bugged with the retarded subgraph Cumfy has shipped with the template. So it ran a whole new seed.
The point about the knife stands though, same prompt.
>>108604991
>>108605000
Ernie knows only one anime style: "Nano Banana Pro"
:]
>>108605023
kek, I think I've seen enough
>>108605045
maybe turbo at 16 steps is the best it can get
ernie base with the default settings and default prompt in comfyui gave me a guy with 3 legs.. not a great start
>>108605060
Z-image turbo be like:
https://youtu.be/WO23WBji_Z0?t=10
One of the better gens I got.
Still has this Kleiny look to it.
>>108605080
something is wrong with the proportions of their bodies, looks like they're midgets, Flux Kontext style lool
>>108605064
>3 feet
>>108605080
>3 hands
lol I think I won't be downloading this
it's all right, the jews will save us
https://xcancel.com/ltx_model/status/2044108750592643279#m
This model has been trained on 3 billion images of Nano Banana Pro kek.
>>108605126
>This model has been trained on 3 billion images of Nano Banana Pro kek.
Z-Image supremacy, yeaaah! We had Qwen Edit and then the Tongyi models, but all other Chinese t2i are equally sloppy, GLM, this, whatever.
I am kinda liking things about it despite its faults.
But they probably either overcooked this thing or it needed a little bit of post-training aesthetic alignment to temper the schizo anatomy.
>>108604974
>I am wondering if Comfy fucked something up
I think the model is just not that good, in my tests it's inferior to Z-image turbo almost everywhere
It can be a great base model to train on though, but yeah, 8b is big, people prefer something smaller like 2b so that they can do Anima type of models or some shit
>>108605183
>8b is big
>>108605183
yup, same experience, back to zit for me
>>108605183
>it's inferior to Z-image turbo almost everywhere
they thought that training a model only on Nano Banana Pro's images would do the trick, all we got is that Synth-ID watermark pattern everywhere lmao, once again, synthetic data BTFO
>>108605115
oops, forgot to attach their paper
https://arxiv.org/abs/2604.11788
>>108605183
I think there are issues with finetuning Klein and ZIB for some reason.
If it responds well to training this looks salvageable. Decent text encoder + best vae + good size balance between quality and being able to run on most hardware + OK quality bar the anatomy issues + mid instruction following that can possibly be ironed out.
I hope someone besides Kekstone takes a crack at it.
>>108605208
Can't we improve realism with finetuning/lora? I know training on slop sucks but banana pro is a really high quality baseline.
>>108604759
>https://ernieimageprompt.com/
either something is wrong with ComfyUI, or those baidu fucks are straight up lying to us, I'm not getting anything even close to the images on that site
>>108605236
Chinks lying? How can it be...
I love to complain about the jpeg artifacts on Z-image turbo, but with Ernie we've arrived at a whole other level, jesus this is ugly af
>>108605262
I don't think those are jpg artifacts, probably the watermark patterns of NBP >>108605126
>>108605262
Is this Turbo? I am not really getting these on the Base.
turbo seems more slopped overall, and if there's one thing I can say base does better than Z-image turbo, it's that it seems to know more stuff, but knowing more stuff is useless if the anatomy is ass and the realism is not even close either
>comparing z turbo to ernie base
Why not compare base to base tho
>>108605278
I think you are right anon, base doesn't seem to have that much noise
>>108605321
as a ""base"" model it looks like it's destroying Z-image base, let's hope we can train it well then, both ZIB and Klein had their issues
I don't see anything Ernie is the best at: Chroma has the best kino, Z-image has the best realism and anatomy, this shit is just slop after slop
>>108605317
it's been compared here >>108605080
the ledditors are loving it though
https://www.reddit.com/r/StableDiffusion/comments/1slg4wh/we_may_have_a_new_sota_opensource_model/
>piggies love slop
STOP THE PRESSES A FROGFAG IS SPEAKING !!!
Can't the chinks do anything else than just make cheap copies of murica's products?
>>108605408
>Tezuka Rin \(katawa shoujo\) sitting on a bench
is that how you're supposed to prompt on Anima? I can't manage to get her
>>108605115distilled seedance 2.0 (ltx 4) and kazar milkers honeypot spy gf was promised to me 6 gorillion years ago but unironically.
>>108605262
>>108605278
>>108605276
i never had the artifacts problem with zit, just dont use the suggested retard samplers and instead use:
euler (/euler_a) + simple (/normal)
>>108605468
Yes for tag based prompts, but I don't think there is full consensus on how to prompt characters in natural language. Try "Tezuka Rin from Katawa Shoujo".
If all options are exhausted, try it on preview 2.
>>108605468
>Tezuka Rin from Katawa Shoujo, a girl with short messy red hair and green eyes and no arms, sitting on a wooden bench, wearing her school uniform, calm distant expression, soft afternoon light, On the left knee there's a plush of Hatsune Miku, on the right there's a plush of Kazane Teto
skill issue
https://xcancel.com/DylanTFWang/status/2043952886166761519
>Open-source tomorrow
damn, if it's not too big to run locally maybe Tencent finally cooked
big jump in real time interactable video gen
Waypoint-1.5: apache2, first person shooter focused, 1.2b, 720p, 512 frames of context, 56fps on a 5090, needs at least a 30xx
online demo: https://www.overworld.stream/
https://github.com/Overworldai/world_engine
>>108605539
Anons what's the actual use case for this world model thing?
Every single world model I see looks like "cool tech demo you play for five minutes and then never touch again".
>>108605539
forgot that link too
https://3d-models.hunyuan.tencent.com/world/
>>108605552
newfag. luddite. brown, even.
the point is to enjoy the cool new tech and tinker with it while thinking about how you can maybe use it and change it yourself now, while also thinking about how cool it will be a year from now.
for example, chaining multiple generated rooms you can traverse infinitely is a software problem and thus solvable relatively easily, while letting you get much more out of the tech.
>>108605550
>512 frames of context 56fps on 5090
So? less than 10 seconds? lol
>>108605552
desu I'd enjoy lurking in a world made out of a cool drawing image, like this shit
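Checking that math (a sketch; it assumes the 512-frame context rolls at the quoted 56 fps):

```python
# Context-window length implied by the quoted Waypoint-1.5 specs.
frames, fps = 512, 56
context_seconds = frames / fps
print(context_seconds)  # ~9.14 seconds of rolling context
```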
A very sloppy double exposure sloppa.
>>108605586
https://huggingface.co/Kijai/WanVideo_comfy/tree/main/LoRAs/Wan22_Lightx2v
kijai made the loras out of the new lightning version of Wan 2.2
There are some who call me... Tim
audio in ltx 2.3 1.1 seems nicer. we wuz hogwarts:
https://litter.catbox.moe/hnjzczuml64krkjr.mp4
>>108605592
not bad, Wan 2.2 may be an ancient model, but it's still the best thing we have :')
>>108605654
that's cool, I was tired of the ultra metallic sound of ltx, if they keep improving on that shit it might end up being a genuinely good model, still a long way to go to seedance 2.0 though lol
>>>/wsg/6128285
>>108605651
>first frame + last frame
kek, I forgot how much vram wan 2.2 is asking, I think I might return to LTX just for that
>>108605627
lul, did you combine a monty python screenshot with the cat meme?
>>108605686
What do you mean by that, isn't LTX heavier on resources?
>>108605701
it uses a less heavy VAE so the kv cache usage is less punitive, good luck going for 720p on wan 2.2
>>108605723
this. i can make 720p resolution gens on ltx. literally impossible on wan-hunyuan
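The VAE point can be made concrete with a rough token-count comparison. The compression strides below are illustrative assumptions (roughly 8x spatial / 4x temporal for a Wan-style VAE versus 32x spatial / 8x temporal for a more aggressive LTX-style one), and attention cost is treated as simply proportional to the number of latent positions:

```python
# Rough latent sizes for a 720p clip under two assumed VAE compression rates.
# The strides here are illustrative assumptions, not official specs.
def latent_elements(width, height, frames, spatial_stride, temporal_stride):
    """Number of latent positions the diffusion model must attend over."""
    return (width // spatial_stride) * (height // spatial_stride) * max(1, frames // temporal_stride)

W, H, F = 1280, 720, 81  # ~5 s clip

wan_like = latent_elements(W, H, F, spatial_stride=8, temporal_stride=4)
ltx_like = latent_elements(W, H, F, spatial_stride=32, temporal_stride=8)

print(wan_like, ltx_like, wan_like / ltx_like)
# the heavier-compressing VAE leaves dozens of times fewer latent positions,
# which is why higher resolutions stay feasible on the same VRAM
```

The trade-off, of course, is that heavier compression throws away more detail per latent, so the cheaper model has to reconstruct more from less.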
>>108604726
does any of this shit run simply and reasonably well on AMD cards yet?
I have tried multiple times over the last couple of years to get a functional pipeline up and running on my 6800xt 16gb and it has never once worked
I'm no genius but I'm also not retarded
>>108605689
yeah
MLEM MLEM MLEM HECKIM CHNGUS
>>108605813
if linux, rocm + forge neo work fine
if windows, i pray for you
https://youtu.be/XUxKm40X__g?t=907
benchmarks was a mistake...