/g/ - Technology


Thread archived.
You cannot reply anymore.




File: 00066-347302328.jpg (1.12 MB, 1728x1344)
1.12 MB JPG
Previous /sdg/ thread : >>108606788

>Beginner UI
EasyDiffusion: https://easydiffusion.github.io
SwarmUI: https://github.com/mcmonkeyprojects/SwarmUI

>Advanced UI
ComfyUI: https://github.com/comfyanonymous/ComfyUI
Forge Classic: https://github.com/Haoming02/sd-webui-forge-classic
Stability Matrix: https://github.com/LykosAI/StabilityMatrix

>Z-Image
https://comfyanonymous.github.io/ComfyUI_examples/z_image
https://huggingface.co/Tongyi-MAI/Z-Image
https://huggingface.co/Tongyi-MAI/Z-Image-Turbo

>Flux.2 Dev/Klein
https://comfyanonymous.github.io/ComfyUI_examples/flux2
https://huggingface.co/black-forest-labs/FLUX.2-dev
https://huggingface.co/black-forest-labs/FLUX.2-klein-4B
https://huggingface.co/black-forest-labs/FLUX.2-klein-9B

>Chroma
https://comfyanonymous.github.io/ComfyUI_examples/chroma
https://huggingface.co/lodestones/Chroma1-HD
https://huggingface.co/silveroxides/Chroma-GGUF

>Anima
https://huggingface.co/circlestone-labs/Anima

>Qwen Image & Edit
https://docs.comfy.org/tutorials/image/qwen/qwen-image
https://huggingface.co/Qwen/Qwen-Image

>Text & image to video - Wan 2.2
https://docs.comfy.org/tutorials/video/wan/wan2_2

>Models, LoRAs & upscaling
https://civitai.com
https://huggingface.co
https://tungsten.run
https://yodayo.com/models
https://www.diffusionarc.com
https://miyukiai.com
https://civitaiarchive.com
https://civitasbay.org
https://www.stablebay.org
https://openmodeldb.info

>Index of guides and other tools
https://rentry.org/sdg-link

>Related boards
>>>/aco/sdg
>>>/b/degen
>>>/d/ddg
>>>/e/edg
>>>/gif/vdg
>>>/h/hdg
>>>/r/realistic+parody
>>>/tg/slop
>>>/trash/sdg
>>>/u/udg
>>>/vp/napt
>>>/vt/vtai

OP https://rentry.co/twkuk8tz
>>
First for shithole general
>>
"adorable Quokka" according to ERNIE image turbo
Lmao
>>
>>108624720
it's an adorable stuffed quokka
>>
File: SDG_News_z_00004_.png (3.16 MB, 2000x1168)
3.16 MB PNG
>mfw Resource news

04/17/2026

>ControlFoley: Unified and Controllable Video-to-Audio Generation with Cross-Modal Conflict Handling
https://yjx-research.github.io/ControlFoley

>TokenGS: Decoupling 3D Gaussian Prediction from Pixels with Learnable Tokens
https://research.nvidia.com/labs/toronto-ai/tokengs

>MM-WebAgent: A Hierarchical Multimodal Web Agent for Webpage Generation
https://aka.ms/mm-webagent

>Qwen2D-VAE
https://huggingface.co/Anzhc/Qwen2D-VAE

>ComfyUI HY-World 2.0 — WorldMirror 3D
https://github.com/AHEKOT/ComfyUI_HYWorld2

>Anima Style Explorer: A free web tool for ComfyUI styles
https://anima.mooshieblob.com

>Stanford AI Index Report 2026
https://hai.stanford.edu/assets/files/ai_index_report_2026.pdf

04/16/2026

>Motif-Video 2B: A micro-budget text-to-video diffusion transformer from Motif Technologies
https://motiftech.io/videoshowcase

>HY-World 2.0: A Multi-Modal World Model for Reconstructing, Generating, and Simulating 3D Worlds
https://huggingface.co/tencent/HY-World-2.0

>ErnieTurbo_extracted_lora
https://huggingface.co/GuangyuanSD/ErnieTurbo_extracted_lora/tree/main

04/15/2026

>DisCa: Accelerating Video Diffusion Transformers with Distillation-Compatible Learnable Feature Caching
https://huggingface.co/tencent/DisCa

>Lyra 2.0: Explorable Generative 3D Worlds
https://research.nvidia.com/labs/sil/projects/lyra2

>AniGen: Unified S3 Fields for Animatable 3D Asset Generation
https://github.com/VAST-AI-Research/AniGen

>T2I-BiasBench: A Multi-Metric Framework for Auditing Demographic and Cultural Bias in Text-to-Image Models
https://gyanendrachaubey.github.io/T2I-BiasBench

>Generative Refinement Networks for Visual Synthesis
https://github.com/MGenAI/GRN

>VideoFlexTok: Flexible-Length Coarse-to-Fine Video Tokenization
https://videoflextok.epfl.ch

>DiffusionPrint: Learning Generative Fingerprints for Diffusion-Based Inpainting Localization
https://github.com/mever-team/diffusionprint
>>
>mfw Research news

04/17/2026

>Seen-to-Scene: Keep the Seen, Generate the Unseen for Video Outpainting
https://arxiv.org/abs/2604.14648

>Prompt-to-Gesture: Measuring the Capabilities of I2V Deictic Gesture Generation
https://arxiv.org/abs/2604.14953

>Beyond Prompts: Unconditional 3D Inversion for Out-of-Distribution Shapes
https://daidedou.sorpi.fr/publication/beyondprompts

>Flow of Truth: Proactive Temporal Forensics for I2V Generation
https://arxiv.org/abs/2604.15003

>AnimationBench: Are Video Models Good at Character-Centric Animation?
https://animationbench.github.io

>DVFace: Spatio-Temporal Dual-Prior Diffusion for Video Face Restoration
https://arxiv.org/abs/2604.14560

>Geometrically Consistent Multi-View Scene Generation from Freehand Sketches
https://arxiv.org/abs/2604.14302

>Analysis of Regularization and Fokker-Planck Residuals in Diffusion Models for Img Gen
https://arxiv.org/abs/2604.15171

>Step-level Denoising-time Diffusion Alignment with Multiple Objectives
https://arxiv.org/abs/2604.14379

>Prompt-Guided Image Editing with Masked Logit Nudging in Visual Autoregressive Models
https://arxiv.org/abs/2604.14591

>Towards Design Compositing
https://arxiv.org/abs/2604.14605

>LeapAlign: Post-Training Flow Matching Models at Any Generation Step by Building Two-Step Trajectories
https://rockeycoss.github.io/leapalign

>The Courtroom Trial of Pixels: Robust Image Manipulation Localization via Adversarial Evidence and Reinforcement Learning Judgment
https://arxiv.org/abs/2604.14703

>Reward-Aware Trajectory Shaping for Few-step Visual Generation
https://arxiv.org/abs/2604.14910

>Deepfake Detection Generalization with Diffusion Noise
https://arxiv.org/abs/2604.14570

>Switch-KD: Visual-Switch Knowledge Distillation for VLMs
https://arxiv.org/abs/2604.14629

>Bird-SR: Bidirectional Reward-Guided Diffusion for Real-World Image Super-Resolution
https://arxiv.org/abs/2602.07069
>>
File: 00067-923028541.jpg (1.08 MB, 1344x1728)
1.08 MB JPG
darn teeth
>>
>>
File: 00068-1295015300.jpg (801 KB, 1344x1728)
801 KB JPG
>>
>>108624914
nice
>>
>>
>>
>>
>>
File: deEF_zi_00044_.png (2.38 MB, 1792x977)
2.38 MB PNG
>>
>>
>>
>>
File: 00069-3669227972.jpg (1.01 MB, 1344x1728)
1.01 MB JPG
>>
lel
>>
File: 00070-3855282227.png (3.86 MB, 1344x1728)
3.86 MB PNG
>>
File: deEF_zi_00045_.png (2.34 MB, 1792x977)
2.34 MB PNG
>>108625483
hmm, which is the vile one
>>
File: 00072-827988493.jpg (1.49 MB, 2592x2016)
1.49 MB JPG
>>
>>108625573
the one in the back
>>
>>
>>
>>
File: 000000_66393_.png (2.01 MB, 1024x1024)
2.01 MB PNG
>>108624720
Nice. lol
Here's my ERNIE.
>>
>>
>>
>>108626018
>>108624720
nice. i plan on getting around to trying it, how hard is it to prompt compared to like klein/chroma/zit?
>>
>>108626170
So far it feels easier, but I couldn't change the aspect ratio without distortion or multiple subjects appearing. It follows styles and knows characters.
>>
File: deEF_zi_00046_.png (2.48 MB, 1792x977)
2.48 MB PNG
>>108626170
>>108626199
do we know what text encoder ernie uses? I don't see it mentioned on the model card
>>
>>108626351
ministral 3 3b afaict
>>
>>108624734
Now it turned into a koala
>>108626018
Kek, heebs will not divide us
>>
File: deEF_zi_00047_.png (2.2 MB, 1792x977)
2.2 MB PNG
>>108626369
I don't think I've ever done anything with ministral. is it the same as qwen basically?
>>
>>108626392
yeah but french instead of chinese. idk if they just use the text encoder built into the model or what. the comfy template also uses a prompt enhancer which i'd expect is kind of like that one thing from the news we were talking about the other day. i'm still downloading so i'm just talking out my ass atm
>>
File: deEF_zi_00049_.png (2.09 MB, 1792x977)
2.09 MB PNG
>>108626420
>yeah but french instead of chinese
god damn it, I just finished learning mandarin and now I have to learn french too? wǒ zhè cāodàn de féi zhái rénshēng..
>>
>>108626369
oof and yikes
even t5 would be better
>>
>>
>>
>>
File: deEF_zi_00050_.png (2.12 MB, 1792x977)
2.12 MB PNG
>>108626825
omg lewd
>>
>>108626935
it's a skin colored bodysuit
>>
ERNIE Quokkas are so pink
>>
>>108626966
it's either a disease or they've turned carnivorous
>>
This one came out better
>>108626991
New species!
>>
File: deEF_zi_00052_.png (2.27 MB, 1792x977)
2.27 MB PNG
>>108626966
i remember some sdxl mix long ago kept drawing quokkas as green. if I had a quarter for every time quokkas had a spurious correlation with a random color.....
>>
zit version of chromagirl (zgirl?) has a real death-cult vibe to her, i dont get it
>>
File: deEF_zi_00054_.png (2.43 MB, 1792x977)
2.43 MB PNG
>>108627088
should try out ernie and see what kind of stuff erniegirl gets up to
>>
>>108627101
>would have to update comfyui
i dont know about that
>>
>>
>>
File: deEF_zi_00003_.png (2.69 MB, 1920x1047)
2.69 MB PNG
>>
File: bbs-zit-2026-04-14_00179_.png (3.61 MB, 1920x1080)
3.61 MB PNG
>>108627141
don't be a pussy it'll be fine, unless it's been months and then idk maybe ur fucked lol. there haven't been any issues i've seen on desktop for a while now and i update every time it nags me like a good comfyslave
>>
>>108624673
i was going hard on SD back in 2023 or 2024 and it worked on my gtx1080 8gb but now that is simply not possible. What are the boys using nowadays and how much do i need to pay to get up to speed?
>>
>>108627450
z-image-turbo might work on that card, it's enough vram anyway. might be a tad slower than on a 30xx+
>>
>>108624673
Kik Epp23g
Tele Bgftg33

Make a Lora of my gf?
>>
File: deEF_zi_00005_.png (2.66 MB, 1920x1047)
2.66 MB PNG
>>108627450
>how much do i need to pay to get up to speed?
a better question is how much you're willing to pay
>>
File: Ernie-Image_00003_.png (1017 KB, 1792x1024)
1017 KB PNG
ernie is pretty literal, but did ascii anyway. who needs color anyway?
>>
File: deEF_zi_00006_.png (2.76 MB, 1920x1047)
2.76 MB PNG
>>108627605
thats the most ascii yet
>>
>>108627621
workin on integrating it into an existing workflow. getting distracted by some mouseman baka
>>
File: deEF_zi_00007_.png (2.4 MB, 1920x1047)
2.4 MB PNG
>>108627651
the thread yearns for mouseman baka
>>
>>108627659
he is very stubborn. also this thing is very good at text
>>
will need a lot of tweaking that i'm not in the mood to tweak. gonna try models i guess. also "adult female" = "chinese" so there's that
>>
File: deEF_zi_00010_.png (2.61 MB, 1920x1047)
2.61 MB PNG
>>108627837
>i'm not in the mood to tweak.
just do what I do and jam all your old prompts into it with reckless abandon
>>
>>108627880
working on it, it's a little fiddly bc i gotta massage the noodles... THEY'RE ALWAYS CHINESE?? I TOOK MY MEDS I SWEAR TO GOD
>>
this is aids. i'm kicking the ernie can down the road. why are they chinese?!?
>>
File: deEF_zi_00015_.png (2.61 MB, 1920x1047)
2.61 MB PNG
>>108627926
the noodles are chinese?
I prefer my noodles to be japanese, personally

>>108627952
>why are they chinese?!?
doesn't ernie have negs?
>>
>>108627958
probably, idk the template is fucking retarded. they shove the whole thing in a subgraph bc they know ppl see your average workflow graph and run screaming in terror. but that subgraph doesn't expose negatives. besides, what kind of model makes you put "chinese" in negatives? i'm moving on, it's caturday
>>
>>
File: deEF_zi_00016_.png (2.38 MB, 1920x1047)
2.38 MB PNG
>>108627966
people only use subgraphs ironically
>what kind of model makes you put "chinese" in negatives?
maybe the chinese felt this way about SD. "why do I have to put 'white people' in the negatives?!?"
>>
>>108628004
i was thinking about that as i wrote my previous reply. i didn't have a good answer except "we won, and in victory: the end"
>>
>>
File: deEF_zi_00024_.png (2.74 MB, 1920x1047)
2.74 MB PNG
>>
>>
>>
still chinese
>>
File: deEF_zi_00025_.png (2.67 MB, 1920x1047)
2.67 MB PNG
>>108628278
its your subconscious reaching out. deep down you is mao's great proletarian cultural revolution and you have an unquenchable thirst for the blood of landlords
>>
>>108628317
>i want a king
no not like that!
>>
File: deEF_zi_00026_.png (2.75 MB, 1920x1047)
2.75 MB PNG
>>108628323
what is a king
>>
File: 1762810152045767.jpg (144 KB, 784x1168)
144 KB JPG
>>
>>108628337
hopefully not mao! but if it is, then it is what it is.
>>
>>
>>
>>
File: 04488-4129453782.png (2.5 MB, 1213x2159)
2.5 MB PNG
Everything looks like AI to me now. Even supposedly real images.

>>108627621
maybe i should explore the galaxy
>>
>>108628573
>Everything looks like AI to me now. Even supposedly real images.
these things happen... welcome!
>>
File: 04484-4093498429.png (1.21 MB, 1152x896)
1.21 MB PNG
>>108628580
Thanks. I guess the future finally caught up with me.
>>
File: 1750204969951731.jpg (136 KB, 784x1168)
136 KB JPG
>>
>>108628614
i took this picture! where did you get it?!
>>
File: 02063-2081661209.png (1.05 MB, 896x1152)
1.05 MB PNG
>>108628666
Your mom said I could have it. I laughed so much she insisted that I should take it.
nice devil's trips btw
>>
File: rabid_penguin.png (3.16 MB, 1920x1080)
3.16 MB PNG
rabid penguin
https://suno.com/s/8SAYG3928UkqCFON
https://youtu.be/Gzg7i4iKw-A
>>
>>
gn then.
>>
>>108628990
gn

gm
>>
>>
>>
File: 1755331967427001.jpg (184 KB, 784x1168)
184 KB JPG
>>
>>
File: 000000_66599_.png (2.76 MB, 1051x1557)
2.76 MB PNG
G'mornin Anons,
>>
File: 00001-3070779711.jpg (1.98 MB, 2016x2592)
1.98 MB JPG
>>108630347
morning
>>
>>108630350
Gm, nice details/gradient in gen!
>>
File: 00003-3525810399.jpg (1.6 MB, 2016x2592)
1.6 MB JPG
>>108630380
ty
>>
File: Ernie-Image_00001_.png (2.26 MB, 1024x1024)
2.26 MB PNG
>ernie
yeah...
>>
File: 1765642505924136.jpg (184 KB, 784x1168)
184 KB JPG
>>
i miss schizo anon
>>
hmmm
>>
File: deEF_zi_00030_.png (2.54 MB, 1920x1047)
2.54 MB PNG
>>
>>
i cant say i'm impressed by ernie
>mistral 3
eh
>prompt enhancer (actually gemma3-2b or 4b)
eh
>asian(chinese) women no matter the prompt
eh
>>
File: deVC_zi_00006_.png (2.29 MB, 1792x977)
2.29 MB PNG
>>108631069
>>asian(chinese) women no matter the prompt
century of chinese prosperity
>>
>>108631162
might be ok for more simple prompts but not the stuff i throw at it
>>
>>108631069
>asian(chinese) women no matter the prompt
That's interesting. Mistral isn't even chinese, neither is gemma.
>>
>>108631215
yah but ernie is, and if it's trained on say 80% chinese women (for the "woman" token) then there you go
>>
File: deVC_zi_00010_.png (2.49 MB, 1792x977)
2.49 MB PNG
>>
>>108631244
Yeah, makes sense.

I just vide coded a GIMP plug-in that runs an image edit workflow.
>>
>>108631293
*vibe
>>
the TE (mistral) and "prompt enhancer" (gemma3 if you even use it) are fine, the issue is the image model (ernie) doesn't have enough knowledge of, say, styles/things (probably because they spent so much of the training on the text/multilanguage aspect). so while it may work well enough for certain things (especially text), if you throw a lot of things at it that it has no idea about, even if the TE/PE do, it's pointless
i'd say chroma (and more recent XL finetunes) > zit/f2.klein > flux in terms of local "knowledge" for image models. ernie would sit between zit/flux on that
like this image, doesnt know what a space marine looks like lel

>>108631293
>I just vide coded a GIMP plug-in that runs an image edit workflow.
nice, share it pls
>>
File: deVC_zi_00012_.png (2.4 MB, 1792x977)
2.4 MB PNG
>>
>>108631304
>nice, share it pls
https://litter.catbox.moe/ee68zgsfs9evey64.zip
expand directory in ~/.config/GIMP/3.2/plug-ins/
Should work with the out-of-the-box edit workflows from the templates, but I didn't test them all. Just qwen 2512.
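
For anyone curious how a plug-in like this can talk to ComfyUI: the snippet below is not the code from the zip above, just a minimal sketch of queueing a workflow over ComfyUI's HTTP API. It assumes a local server on the default port 8188 and a workflow exported in API format; the helper names `build_prompt_request` and `queue_workflow` are made up for illustration.

```python
import json
import urllib.request
import uuid


def build_prompt_request(workflow, server="http://127.0.0.1:8188"):
    """Wrap an API-format workflow dict in a POST request for ComfyUI's /prompt endpoint."""
    # client_id is an arbitrary identifier; a fresh UUID is used here as an assumption.
    payload = {"prompt": workflow, "client_id": str(uuid.uuid4())}
    data = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        f"{server}/prompt",
        data=data,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def queue_workflow(workflow, server="http://127.0.0.1:8188"):
    """Send the workflow to a running ComfyUI server and return its JSON reply."""
    with urllib.request.urlopen(build_prompt_request(workflow, server)) as resp:
        return json.loads(resp.read())
```

A plug-in would typically save the current layer to a file, patch its filename into the image-loading node of the workflow dict, call something like `queue_workflow`, and then load the result back into GIMP.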
>>
>>108631342
thx
>>
File: deVC_zi_00014_.png (2.39 MB, 1792x977)
2.39 MB PNG
>>
File: 00005-1719979831.jpg (1.72 MB, 2016x2592)
1.72 MB JPG
>>
>>
>>
>>
File: deVC_zi_00015_.png (2.22 MB, 1792x977)
2.22 MB PNG
>>
File: pixel-0001-1055173564.png (2.66 MB, 2016x2592)
2.66 MB PNG
>>
>>
File: deVC_zi_00016_.png (2.35 MB, 1792x977)
2.35 MB PNG
>>
File: 15151123156.jpg (92 KB, 600x800)
92 KB JPG
Last time I fiddled with stuff like SD was when automatic1111 webui was pretty new.

Lookin to get up to date with this, specifically to use img2img to change styles (eg photo to a drawing)

Can someone give me a general pointer in the right direction? Where do I have to look? There's so much shit now. No need for an explanation
>>
File: pixel-0003-1565456904.png (1.48 MB, 2592x2016)
1.48 MB PNG
>>
File: deVC_zi_00018_.png (2.35 MB, 1792x977)
2.35 MB PNG
>>108631874
step one is always install a UI from the OP and gen a picture
forge is the spiritual successor to a1111. comfyui is more premiere
>>
Morning anons
Pinkuokka lmao
>>108631874
Flux Kontext should be able to run on reforge, still unsure how to use comfy myself.
>>
>>108631874
Assuming you have a GPU with 12+gb vram
Start by installing comfyui from https://github.com/comfyanonymous/ComfyUI
Install all the dependencies.
Go to the templates and click on Image; there are image edit templates there. Use one that works for you.
Skipping lots of steps but that's what the thread news is about.
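
If you're not sure whether your card clears the 12 GB bar, here is a rough sketch that asks `nvidia-smi` for total VRAM. It assumes an NVIDIA card with the driver installed; the function names and the 5% tolerance are arbitrary choices for illustration, not anything ComfyUI requires.

```python
import subprocess


def parse_vram_mib(nvidia_smi_output):
    """Parse one memory.total value per line (MiB, no units) into a list of ints."""
    return [int(line.strip()) for line in nvidia_smi_output.splitlines() if line.strip()]


def has_enough_vram(vram_mib, needed_gib=12):
    """Return True if the card has roughly needed_gib of VRAM."""
    # The reported total can be a bit under the nominal size, so allow 5% slack
    # (an assumed tolerance, adjust to taste).
    return vram_mib >= needed_gib * 1024 * 0.95


def query_vram_mib():
    """Ask nvidia-smi for the total memory of each GPU, in MiB."""
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=memory.total", "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_vram_mib(out)
```

Calling `[has_enough_vram(v) for v in query_vram_mib()]` gives one answer per GPU. Cards below the bar can still be worth a try with smaller or quantized models, as mentioned earlier in the thread for z-image-turbo.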
>>
>>108631907
>>108631909
>>108631920

alright thanks anon i think i can work with that
>>
>>
File: deVC_zi_00019_.png (2.28 MB, 1792x977)
2.28 MB PNG
>>
>we get a brand new theme poster
>nobody engages them
>just keep spamming slop
This is why everyone left
>>
File: pixel-0007-2017423765.png (434 KB, 2592x2016)
434 KB PNG
>>
>>108632226
you're still here
you didnt do shit either
be the change you want to see
>>
>>108632284
OK debo
>>
>>108632288
you're dumb
>>
File: deVC_zi_00022_.png (2.24 MB, 1792x977)
2.24 MB PNG
>>108632226
>brand new theme poster
wat
>>
hi
>>
tfw nogens
>>
>>
>>
File: deVC_zi_00023_.png (2.35 MB, 1792x977)
2.35 MB PNG
>>108632379
hello
>>
>>
>>108632379
good afternoon
>>
>>
>>
File: deVC_zi_00027_.png (2.37 MB, 1792x977)
2.37 MB PNG
>>
File: pixel-0010-3793713580.png (721 KB, 1344x1728)
721 KB PNG
>>
>>
>>
>>
>>
>>
File: deVC_zi_00028_.png (2.29 MB, 1792x977)
2.29 MB PNG
>>
File: deas_00015_.png (3.21 MB, 1728x1344)
3.21 MB PNG
>>
File: deSS_cHD_00008_.png (3.13 MB, 2016x1165)
3.13 MB PNG
>>
File: 1763501650049437.jpg (270 KB, 832x1216)
270 KB JPG
>>
>>
new
>>108633149
>>108633149
>>108633149


