/g/ - Technology
Discussion of Free and Open Source Diffusion Models

Prev: >>107861070

https://rentry.org/ldg-lazy-getting-started-guide

>UI
ComfyUI: https://github.com/comfyanonymous/ComfyUI
SwarmUI: https://github.com/mcmonkeyprojects/SwarmUI
re/Forge/Classic/Neo: https://rentry.org/ldg-lazy-getting-started-guide#reforgeclassicneo
SD.Next: https://github.com/vladmandic/sdnext
Wan2GP: https://github.com/deepbeepmeep/Wan2GP

>Checkpoints, LoRAs, Upscalers, & Workflows
https://civitai.com
https://civitaiarchive.com/
https://openmodeldb.info
https://openart.ai/workflows

>Tuning
https://github.com/spacepxl/demystifying-sd-finetuning
https://github.com/ostris/ai-toolkit
https://github.com/Nerogar/OneTrainer
https://github.com/kohya-ss/musubi-tuner
https://github.com/kohya-ss/sd-scripts
https://github.com/tdrussell/diffusion-pipe

>Z Image Turbo
https://huggingface.co/Tongyi-MAI/Z-Image-Turbo

>WanX
https://github.com/Wan-Video/Wan2.2

>LTX-2
https://huggingface.co/Lightricks/LTX-2

>Chroma
https://huggingface.co/lodestones/Chroma1-Base
https://rentry.org/mvu52t46

>NetaYume
https://huggingface.co/duongve/NetaYume-Lumina-Image-2.0
https://nieta-art.feishu.cn/wiki/RZAawlH2ci74qckRLRPc9tOynrb

>Illustrious
https://rentry.org/comfyui_guide_1girl
https://tagexplorer.github.io/

>Misc
Local Model Meta: https://rentry.org/localmodelsmeta
Share Metadata: https://catbox.moe | https://litterbox.catbox.moe/
GPU Benchmarks: https://chimolog.co/bto-gpu-stable-diffusion-specs/
Img2Prompt: https://huggingface.co/spaces/fancyfeast/joy-caption-beta-one
Txt2Img Plugin: https://github.com/Acly/krita-ai-diffusion
Archive: https://rentry.org/sdg-link
Bakery: https://rentry.org/ldgcollage

>Neighbors
>>>/aco/csdg
>>>/b/degen
>>>/r/realistic+parody
>>>/gif/vdg
>>>/d/ddg
>>>/e/edg
>>>/h/hdg
>>>/trash/slop
>>>/vt/vtai
>>>/u/udg

>Local Text
>>>/g/lmg

>Maintain Thread Quality
https://rentry.org/debo
https://rentry.org/animanon
>>
>>107864569
if only the audio quality was a bit higher this could be really useful, didn't they say they were making an LTX 2.1 with better audio at some point?
>>
File: dehr_00004__.png (3.13 MB, 1448x1728)
mfw
>>
File: LTX-2_00007_.webm (3.91 MB, 704x960)
>>>/wsg/6072817
>>
File: Nekoyomi_110433_0553_.png (1.6 MB, 1071x1428)
>>
>>107864623
it can be very clear but singing is harder to clone than regular audio I guess, still works well in most cases:

https://files.catbox.moe/cozidd.mp4
>>
Blessed thread of frenship
>>
does ltx2 lose more detail over time than wan in i2v?
>>
>>107864660
reminds me of this
>>
>>107864658
if you have enough vram you can make a 30-second i2v video without loss
>>
File: NetaYumeV40_Output_12515.png (2.59 MB, 1536x1280)
>>107864184
>>
Z Image Illustrious when
>>
File: 1743229977650182.mp4 (3.57 MB, 1440x1080)
looks like you get less slopped results if you go for normal ltx2 + distill lora at strength < 1
>>
File: _zimg_00003.png (1.03 MB, 864x1152)
i am actively prompting qwen to hallucinate prompts and automatically gen them, what are the odds that i get v& by something it comes up with?
>>
>>107864688
you mean up the resolution?
>>
>>107864763
If only I could get my hands on a workflow that works and is not a complete mess...
Is there a workflow available for this video?
>>
It's up

https://civitai.com/models/933294
>>
https://github.com/Comfy-Org/ComfyUI/pull/11845#issuecomment-3752055641
>For gguf we will try not to break it but we are focusing on improving our own native quant system to make it better/faster than gguf.
delusional lol
>>
>>107864797
2 years late unc
>>
>>107864623
low res will have shit audio. You have to generate at around 1080p for good audio. The image and audio are connected.
>>
>>107864814
>The image and audio are connected.
really? that's dumb, the audio's quality should stand on its own and have nothing to do with the video
>>
>>107864823
Problem? Make your own model.
>>
File: 1763149559244816.png (187 KB, 274x356)
>>107864828
>Problem? Make your own model.
>>
>>107864808
Well one more reason to stop updating until absolutely necessary
>>
>>107864823
You can do an upscale pass, it helps the audio too. Not as good as natively, but it's something
>>
>>107864814
so what you're saying is i can hear in 4k
>>
File: silent.mp4 (3.83 MB, 1500x2048)
Tried to make her open the bottle with teeth :(
>>
>just upscale the audio
>>
File: 1761450192918469.png (31 KB, 1031x169)
https://github.com/Comfy-Org/ComfyUI/pull/11837

>2 days ago
>nobody told me
baka baka
>>
>>107864878
what models are nv anyways, only one i saw is gemma
>>
>>107864872
With the ltx latent upscaler lol, it does both
>>
>>107864881
wan 2.2, zit, all the good shit. just search huggingface
>>
>>107864889
thanks
>>
>>107864823
It kind of does make sense. Video quality and audio quality are tied together in videos. When you see some 240p video from 2006 you're not expecting to hear crisp FLAC audio.
>>
File: 1763290501869535.png (34 KB, 200x252)
>>107864898
you know what, that's a fair point
>>
>>107864889
>zit nvfp4 is 1/4 the size of bf16
big if quality is similar
>>
>>107864918
>quality is similar
not even close lol
>>
>>107864929
then how is this better than quants
>>
File: _zimg_00209.png (996 KB, 768x1024)
>>
Any link to an uncensored gemma3-12b fp8?
The default one is making my lewd gens into abominations.
>>
>>107864952
yeah
>>
>>107864878
oh shit thanks for the heads up
wan2.2 seems to be about as fast with fp8 vs nvfp4, but at a glance nvfp4 quality is better while being slightly slower.
>>
>>107864999
>quality is better
no it isnt
>>
>>107864985
what did he mean by this
>>
>>107864999
>at a glance nvfp4 quality is better
better quality than fp8? I really doubt that, hope I'm wrong though
>>
>>107864999
from here? https://huggingface.co/GitMylo/Wan_2.2_nvfp4/tree/main
>>
File: 1759347593798252.png (1.51 MB, 1024x1024)
will lodestone use the same training data he used for chroma? i hope not. you can tell it was filled shit slop like pic related
>>
>>107865016
ye
>>
i guess we going back to pony realism
>>
>>107864952
>le uncensored text encoder meme
When will this nonsense die? An "uncensored" text encoder doesn't help the diffusion model make lewd outputs. All it does is make it so the LLM doesn't refuse requests. I.e. if you actually GENERATE text autoregressively using the LLM it will do what you ask instead of saying "Sorry I can't help with that." The actual text embeddings of the words in your prompt, which is the thing the diffusion model conditions on, barely change with an uncensored text encoder. It's literal fucking snake oil that does nothing. If anything the slight shift in text embeddings would reduce quality.
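
if anyone wants to verify instead of arguing, here's a minimal sketch: embed the same prompt with the stock TE and an "uncensored" one and compare. the model ids are placeholders, point them at whatever pair you actually downloaded (e.g. stock vs abliterated qwen):

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

STOCK = "Qwen/Qwen2.5-3B-Instruct"        # placeholder: stock text encoder
ABLIT = "someone/qwen2.5-3b-abliterated"  # placeholder: "uncensored" variant

PROMPT = "Hatsune Miku reclining naked on a beach lounge."

def embed(model_id):
    tok = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16)
    with torch.no_grad():
        out = model(**tok(PROMPT, return_tensors="pt"), output_hidden_states=True)
    # last hidden state is roughly what the diffusion model conditions on
    return out.hidden_states[-1][0].float()

a, b = embed(STOCK), embed(ABLIT)
# per-token cosine similarity; values near 1.0 mean the conditioning barely moved
cos = torch.nn.functional.cosine_similarity(a, b, dim=-1)
print(f"mean: {cos.mean():.4f} min: {cos.min():.4f}")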
>>
So I take it GLM-Image was a flop?
>>
>>107865023
I don't know, but what is the problem if it's tagged correctly? are you having trouble prompting for realism?
>>
https://github.com/Rolandjg/LTX-2-video-extend-ComfyUI

this is amazing. extend any clip, clone voices + video. see example.

https://files.catbox.moe/cale33.mp4
>>
>>107865139
Buy an ad
>>
>>107865142
I didnt make it, i'm linking so anons can enjoy it cause it's fun.

the thread is a resource for qwen edit/wan/ltx/etc, so why not. would the dev be making fun of troons? they would lose their github.

https://files.catbox.moe/1bsuwa.mp4
>>
>>107864934
If you want a serious response: it's hardware accelerated, so it should run a lot faster than q4 (which needs to be dequantized before being run), provided you are on a 5000 series.
Nvfp4 is also not the same as standard fp4. Groups of 16 4-bit float values are scaled by an fp8 factor, and the whole tensor is globally scaled by an fp32 factor, to lower the deviation from the baseline. It's still 4 fucking bits though, so don't expect magic in terms of quality.
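
a rough numpy sketch of the idea if it helps intuition (simulation only, nothing to do with nvidia's actual kernels, and the per-group scales here stay fp32 instead of being cast to e4m3):

import numpy as np

E2M1 = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])  # positive fp4 (e2m1) magnitudes

def fake_nvfp4(x, group=16):
    shape = x.shape
    g = x.reshape(-1, group)
    gscale = max(np.abs(g).max() / 6.0, 1e-12)                 # global fp32 scale
    s = np.abs(g).max(axis=1, keepdims=True) / (6.0 * gscale)  # per-group scale (fp8 in the real format)
    s = np.maximum(s, 1e-12)
    y = g / (s * gscale)                                       # every group now fits in [-6, 6]
    q = np.sign(y) * E2M1[np.abs(np.abs(y)[..., None] - E2M1).argmin(-1)]  # snap to nearest fp4 value
    return (q * s * gscale).reshape(shape)

w = np.random.randn(4096).astype(np.float32)
print("mean abs round-trip error:", np.abs(w - fake_nvfp4(w)).mean())

set s to 1 and rerun to see how much the per-group scaling buys you over plain fp4.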
>>
>>107865063
Yes.
I am still on copium that one day we will get a kino local AR model.
>>
>>107865158
Thanks for explanation, is there a nvfp8?
>>
>>107865165
>AR model
qrd
>>
>>107865155
another gen with the detailer lora enabled:

https://files.catbox.moe/yjxn76.mp4
>>
>>107865168
Nope.
>>
File: 0.png (1.23 MB, 1408x640)
>>
File: ComfyUI_00546_.png (1.36 MB, 832x1216)
>>107865173
Qrd: The model iterates on your prompt before diffusing.
>>
also, try enabling the detailer lora in the extend workflow:

https://files.catbox.moe/ipa17z.mp4
>>
>>107865209
When are you planning on making it
>>
what's the difference between wan 2.1 and 2.2?
I had a good 2.1 setup and i'm trying to get it to run 2.2 but i'm encountering errors... looking for a good workflow
>>
>>107865373
jerk off and go to bed bro
>>
>>107865353
Right after Z-Image Base drops.
Should take around two more weeks.
>>
>>107865378
I just want to compare i2v with 2.1 and 2.2
>>
kek

https://files.catbox.moe/1h3ja0.mp4
>>
>>107864886
It also upscales the sound with the spatial upscaler??
>>
>>107865373
Wan 2.2 has higher quality.
Wan 2.2 is a moe that uses separate models for high timestep and low timestep denoising. It eats more system resources, particularly ram as a result.
>but i'm encountering errors
Can't help without seeing the WF or errors
>looking for a good workflow
Have you tried the default Cumfart template?
>>
any of yall know a method to turn anime images into realistic / pseudo-realistic ones? like a qwen lora or something
>>
>>107865425
Technically speaking no, but you are rerunning both latents through the sampler at a bigger res so it kinda ends up having the same effect.
>>
>>107865441
OK thanks I will try it then.
>>
in theory, you can make an entire anime episode with linked + extended LTX2 gens.

https://files.catbox.moe/brgchb.mp4
>>
File: file.png (26 KB, 336x421)
>>107865427
>Have you tried the default Cumfart template?
I tried the default from one of the headers with the anime girl picking up a gun.

Getting this:
KSamplerAdvanced
Given groups=1, weight of size [5120, 36, 1, 2, 2], expected input[1, 32, 21, 96, 96] to have 36 channels, but got 32 channels instead

see picrel.

All i did was load the json and realign the diffusion models and vae + Clip.
>>
File: 1930.jpg (40 KB, 640x576)
gemma 3 is truly stupid. many failed gens, and with periods that weren't even requested... even hunyuan wasn't this stupid
>>
>>107865448
Are you using the correct vae? 14B 2.2 needs the 2.1 vae. Only 5B 2.2 uses the 2.2 vae (you shouldn't use the 5B one anyway)
>>
>>107865457
What were your prompts?
>>
>>107864952
no it's not, that's not how any of this shit works at all
>>
>>107865063
it's not really a 1girl T2I sort of model
>>
>>107865491
What kind of model is it?
>>
>>107865479
something like
>A medieval knight in full plate armor stands in a castle entrance. The armor is intricately crafted with detailed engravings and a polished silver finish. He holds a longsword in one hand and a kite shield in the other.
>>
>>107865474
ugh, gimme a min, apparently I haven't upgraded comfyui in a while and now everything is broken... gotta learn to disable that upgrade shit...
>>
I fucking kneel LTX, extended a frieren clip

subbed is best, but this is still better than funimation:

https://files.catbox.moe/4u9bah.mp4
>>
>he pulled
>>
the fennec tranny strikes again
>>
>>107865513
i sure hope you are using at least Q6 of 27b gemma
>>
>>107865535
>invents new eyebrows
>face becomes fucked up
this model had like no cartoons in the dataset lol
>>
File: file.png (1.15 MB, 2000x1000)
>>107865543
>his model had like no cartoons in the dataset lol
>>
>>107865474
nah i used the right ones.
Can you point me towards a basic workflow for wan 2.2? I'll get the resources/nodes as needed. I see tons of t2v but almost no i2v :(
you mentioned cumfart, where is that?
>>
>>107865554
benchod
>>
>>107865550
fill me in, xir
>>
>>107865543
worked fine on this one

https://files.catbox.moe/d3gnpk.mp4
>>
>>107865573
https://old.reddit.com/r/StableDiffusion/comments/1q9ao8t/ltx2_weird_result/
>>
>>107865542
try it and show me the result
>>
>>107865579
sl

>>107865586
op
>>
>>107865578
Dataset full of garbage and bad captions, evidently
>>
frieren but with guns:

https://files.catbox.moe/x0l3eu.mp4
>>
>>107865601
Ok this one was funny lol
>>
>>107865579
what lora?
>>
Any examples of the upscaler workflow for ltxv2?
I'm using the i2v workflow from kijai and it doesn't seem to use any upscaler so I have to gen with the full hd pics if I want hd stuff so I can't do more than 121 frames with a 5090 because the vae just shits itself and never goes to the video compilation.
>>
>>107865623
Use the comfy workflow for inspiration; once I pulled everything out of the subgraph it made sense after checking.
>>
yakuza, ltx extend version

https://files.catbox.moe/6nw4t4.mp4
>>
File: gun no jutsu.gif (3.84 MB, 600x338)
>>107865601
>>
>getting memory spikes from just image batching to convert to video
>fuck it just use ffmpeg
>works like a charm and barely even touches my RAM
Reminder to stop relying on ComfyUI for everything. It's a bloated mess that does things sloppier than more efficient programs already do.
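
for anyone wanting to do the same, a minimal sketch of the ffmpeg route (frame pattern, fps and codec settings are assumptions, match them to whatever your save node writes):

import subprocess

subprocess.run([
    "ffmpeg", "-y",
    "-framerate", "24",             # match the fps you genned at
    "-i", "frames/frame_%05d.png",  # assumed filename pattern of the dumped frames
    "-c:v", "libvpx-vp9", "-crf", "32", "-b:v", "0",  # constant-quality vp9 webm
    "out.webm",
], check=True)

ffmpeg streams the frames one at a time, so ram stays flat no matter how long the clip is.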
>>
>>107865689
>zoomers in action
>>
holy shit, AI can hallucinate some interesting stuff with a basic prompt.

the laser beams cause the red tentacles to explode into fire and black smoke.

allah fern-bar!

https://files.catbox.moe/esajck.mp4
>>
File: file.png (1.61 MB, 1296x800)
I made this btw
>>
>>107865612
Egon Schiele
ill release it soon
>>
>>107865724
like ugly on shitele
>>
>>107865724
nice gen
>>
>>107865741
thanks
>>
AI Toolkit now supports LTX2 training at 512 res, 5 secs and audio on 24GB cards
>>
Is there a vid2vid workflow for ltx2? Would it work to just use a typical i2v workflow but use an input video and specify the number of output frames to be the same as the number of frames in the input video?
>>
frieren s2 early leak:

https://files.catbox.moe/hzng9z.mp4
>>
>>107865778
use this one, ive been making the extend edits with it, if you want a full duration edit just lower the frame load cap on the video to 9 or whatever.

https://github.com/Rolandjg/LTX-2-video-extend-ComfyUI
>>
>>107865724
you got links to other loras?
>>
>>107865780
bro where do you get your short anime videos?
>>
File: file.png (230 KB, 951x1143)
>>
>>107865791
i only got illustrious loras on display atm
im not ganna shill my profile, you're just going to have to find me in the wild
>>
File: 00266-1397496870.png (698 KB, 768x1024)
just installed forge neo so I can finally try ZIT

I thought 16gb vram was enough for q8 but it will only run with neveroom extension active
>>
>>107865780
also, in a very interesting output, I got Japanese without asking for Japanese.

https://files.catbox.moe/xeq8iu.mp4
>>
File: 1_.webm (1.07 MB, 640x960)
>>107865271
>>
>>107865797
I just recorded a clip with nvidia shadowplay (alt z) from my frieren s1 folder.
>>
>>107865817
It's gibberish, sounds like how a japanese va would though
>>
>>107865824
>recorded a clip with nvidia shadowplay
now this is some lobotomite tech lmao
>>
>>107865829
I just wanted a fast clip, otherwise i'd use adobe premiere (torrented) and cut out a clip.
>>
>>107865829
I just use snip....
>>
>>107865842
ima snip your balls
>>
>>107865824
nigga please https://github.com/mifi/lossless-cut
>>
has anyone remade one punch man s3 yet
>>
holy kino

the purple hair anime girl takes out a black pistol and points it at the camera and says "demon faggot you're going to die.". she fires the gun several times.

https://files.catbox.moe/0uekvw.mp4
>>
File: 9_.webm (1.07 MB, 512x704)
>>107865807
>>
>>107865850
unironic improvement
>>
>>107865849
it takes like 2 seconds to export a clip from a movie btw. it doesnt re-encode like normal bloatware usually would
>>
>>107865842
oh right snipping tool can do video too, thx anon

loading premiere for a quick meme isn't optimal desu
>>
>only reason we don't have actual porn video models is because of muttism poisoning the world
grim
>>
>>107865803
Can you share training settings?
>>
>>107865862
sora 2 can do porn
>>
>>107865850
imagine being able to train a lora on season 1 and fixing the rest...
>>
>>107865869
local model anon, come on
>>
>>107865851
two guns:

https://files.catbox.moe/frd2i1.mp4

this is fun, also with enough frames you can clone voices. it's pretty cool. need to try alex jones next.
>>
>>107865875
all you need is wan 2.2 and a lora. not enough for you?
>>
>>107865554
Click the templates button. Pick wan 2.2 on models list. Then click on the i2v workflow.
And cumfart is comfyui, saar.
>>
>>107865869
I know grok can/could but I doubt this. Any examples?
>>
>>107865880
you know that's not the same
>>
>>107865885
>Any examples?
https://files.catbox.moe/3jdl4m.mp4
>>
sometimes leddit brings the bants

https://www.reddit.com/r/StableDiffusion/comments/1qchwcg/ltx2_easy_all_in_one_workflow/
>>
>>107865893
lmaoooooooooooooooo
>>
>>107865893
This is impossible to make with any video model btw
>>
>>107865896
Stupid tourist, that's literally one of us
>>
>>107864863
wan is really bad at doing things you want it to do.. if it isn't the most generic thing imaginable, it just shits the bed
>>
>>107865905
Wan 2.6 handles this perfectly though?
>>
>>107865908
don't recall asking
>>
>take 1-2 seconds of video clip
>use it to clone voice and character
>make 10s+ gen
>clip the original 1-2 seconds off
can make literally anything with audio cloning this way btw.
>>
>>107865912
proof? and no the slop you posted so far isn't it
>>
>>107865912
It's kinda funny the best voice cloner model is ltx. Man we need better tts models
>>
>>107865911
didn't*
>>
what is alibaba studio waiting for to release something? we already have ltx 2 and z loras
>>
>>107865920
retard im not trying to clone the frieren voices, go try it on any news broadcast and you'll see
>>
>>107865922
VibeVoice could've spawned some incredible finetunes had microsoft not pussied out and rugged the training code
>>
>>107865924
you deleted before it even loaded, fuck you
>>
>>107865925
esl
>>
File: x_3cqnyf.png (1.62 MB, 1536x1024)
>>
>>107865938
edible?
>>
>>107865920
nta here is a good example
https://github.com/Rolandjg/LTX-2-video-extend-ComfyUI
>>
>>107865942
yta though
>>
>>107865920
here, kneel to LTX, BBC news report. All I have to do to make it seamless is boost the low audio on their shitty youtube video. or do it in post. accent and everything is the same.

https://files.catbox.moe/fwu81t.mp4
>>
>>107865924
>but for z-image i resize it to by a factor of 3/4.
So you are training at 768p.
I assume you tried and failed with 1024p before? I didn't have the best time with it.
>>107865933
NTA but https://desu-usergeneratedcontent.xyz/g/image/1768/45/1768451789988.png
>>
>>107865948
Shucks, thanks love you too
>>
>LTX is ba-ACK
https://i.4cdn.org/wsg/1768452199582496.webm
>>
>>107865951
part 2, hahaha

I cant trust anything I see now, if local is this good. to fix the transition I just have to adjust the frame number (so it is right after a word)

https://files.catbox.moe/kfixll.mp4
>>
>>107865898
Nah, it's Sora. You've seen the YTPs right?
>>
File: miku3.jpg (838 KB, 1751x1151)
using that lipsync workflow from reddit
>>>/wsg/6072959
>>
>>107865977
Nope
>>
Spooknik is MIA for more than a month now.
I think Chromachaku might be dead:(
>>
>>107865981
link pretty please?
>>
>>107865993
https://old.reddit.com/r/StableDiffusion/comments/1qcc81m/ltx2_audio_synced_to_added_mp3_i2v_6_examples_3/
>>
what can 96gb vram do but not 24gb vram?
>>
>>107865992
Oh fuck didn't see this recent discussion:
https://huggingface.co/spooknik/Chroma-HD-SVDQ/discussions/6
Yeah, it's over.
>>
>>107866002
thanks
>>
imagine how many people you can trick with these workflows, whether it's t2v, i2v, or v2v extension. also, the creative applications of it.

seamless.

https://files.catbox.moe/23tjt6.mp4
>>
>>107866003
Hunyuan 80B
Flux 2 (faster and less quantized)
Full finetune of larger models
FP32 of mid sized model
Keep a lot of crap in the VRAM instead of unloading and reloading. (Wan 2.2 for example)
Low denoising upscales of very large images
Not saying all of these are worth it.
>>
>>107866021
What can 32gb vram do that 24gb vram can't?
>>
>>107865952
no i just don't know why z-image prefers smaller images, considering it can go up to 2048
>>
>>107866009
the great thing about the model is it can clone voices too, so you dont even need an external app to do it, and it works with the video.

https://files.catbox.moe/nf8msj.mp4
>>
>>107865981
did you gen that miku? if so which model?
>>
I can't be the only one noticing what is going on.
>>
has anyone checked the fights or something for ltx?
>>
>>107866039
Tell us
>>
>>107866035
zit
>>
>>107866028
I think you are supposed to min-max timestep distribution to be able to train at higher resolutions but:
a) The precise knowledge seems to be gatekept at a few discord channels now
b) I don't care enough to run multiple tests to figure out myself in the current distilled version. If it's still an issue with the base, I will take a look again.
>>
>>107866021
Almost none of those are worth it lol, maybe faster wan load but ehh
>>107866043
>fights
as in can it do fight scenes? It's meh at it unless you run it on 50fps
>>
last one. only 33 input frames from the video:

https://files.catbox.moe/88crej.mp4
>>
>>107865992
>>107866005
chroma or wan chaku was never meant to be, ive accepted this...
>>
finally, via AI we can make Jensen honest:

https://files.catbox.moe/hml4zw.mp4
>>
>>107866039
shilling?
>>
File: its completely over.png (916 KB, 1024x1024)
>>107866060
I can live without chroma but no wanchaku hurts.
>>
>>107866101
im not shilling, im having fun cause those wan cocksuckers made 2.5 API only and now I have a free model with sound more capable than their model.

hope they choke to death on their shekels. enjoy failing like stability AI, niggers.
>>
CES 2026 continued:

https://files.catbox.moe/hae03w.mp4
>>
>>107865050
>>107865481
You know that you can just test it with the same seed and same prompt?
Z-image normal and abliterated qwen:
>Hatsune Miku reclining naked on a beach lounge.
https://files.catbox.moe/fcfoqa.jpg
Obviously, it doesn't improve generation of nipples or vagene because the model had seen no such images in training, but abliterated TE makes the model follow nsfw prompts more easily. It won't add skin-colored clothes when asked to do nudity.
>>
>>107866125
idk but he sounded ominous
>>
>>107866137
>michael jordan feet
anyone got clorox for my eyes?
>>
File: 1758610020136112.png (3.47 MB, 1880x1248)
>>
>>107866232
Needs to be more grungy and blurry, too many pixels
>>
>ltx2
>wan + freelong
which one for 20 sec video?
>>
>>107866264
test this out for me pls https://www.reddit.com/r/comfyui/comments/1q61gfd/update_wan_svi_infinite_legth_video_now_with/
>>
>>107866269
does it support keyframes? if not then freelong is better
>>
is there any way that ltx can do women in panties walking around without creating body horror skin mutations?
>>
>been watching a dude on youtube making workflows with his autism
>he now uploads blurred porn and spouts rumors to get views

Sad.
>>
>>107866336
>>been watching a dude on youtube making workflows
There's your problem.
>>
>>107866336
>>107866342
>watching ... youtube
No that's the problem.
>>
tranny ass topaz software queue disappears just like in comfyui after a crash and its unstable on its own already
>>
>>>/wsg/6072537

I can't be the only one who thinks it's fucking insane that you can now 1 shot a 40 second video on a single 3090 in around 5 minutes and there's barely any fuckery.
>>
>>107866556
you're not alone, ltx has brought a lot of good shit to the table, that and the fact you can extend a video with great accuracy is a huge deal too >>>/wsg/6072806
>>
https://www.scmp.com/tech/tech-war/article/3339869/zhipu-ai-breaks-us-chip-reliance-first-major-model-trained-huawei-stack
>omg guyz we made GLM-image without having to use Nvdia cards!!!
who cares? that model sucks anyway lool
>>
>>107866556
>say you purposefully brought a lewd movie for the family to watch instead of saying it was an accident
chinese culture is so interesting
>>
>>107866577
>we made GLM-image without having to use Nvdia cards
This isn't the win they think it is with those results. I'll happily buy a chinese card once they're proven to be good.
>>
It's been like 4 days of civitai doubleposting uploads. Their jeetcoding is breaking apart and they can't fix it, lol.
>>
My life became better when I stopped giving a fuck about jeetit and whatever BS is going on there.
>>
facial physiognomy diversity of qwen image 2512 is a great improvement over the previous model and way better than ZIT.
>>
File: yrwj.gif (63 KB, 595x696)
This is an extreme and urgent request. I am in desperate need of an extremely meticulous and accurate AI image editor model which won't moralfag me. No this is not for NSFW purposes. will be using it to edit a document text. All the best ones are not allowing me to edit it. It can be local as well. Please help me
>>
File: 1750834009107281.png (179 KB, 517x266)
>>107866623
>>
>>107866632
wrong thread senpai
>>
>>107866556
for me the most insane thing is that we finally have a local video model that doesn't pretend it's in space (looking at you Wan 2.2)
>>
>>107866637
Actually it's the right thread, but random people who show up desperately looking for ways to edit like this give me the ick.
>>
>>107866637
an anon on lmg redirected me here. No idea where else to go. Reddit is a no go for obv reasons. Would still like any recommendations though
>>
>>107866632
They can moralfag less on the API sometimes.
If you want to do local not too many options besides
Flux Kontext
Qwen Image Edit
>>
>>107866632
>moralfag
>edit a document text
it's not just about morality, it's illegal lol
>>
>>107866632
>he actually asked ChatGPT or something to edit documents
>it's now recorded on their servers if he gets caught
Good luck newfag
>>
File: ComfyUI_00019_.png (1.76 MB, 1400x800)
>>
File: 1766481611633501.png (2.44 MB, 1216x1248)
>>
File: 1743945021768171.png (1.91 MB, 1248x1216)
>>
File: zitlora.jpg (690 KB, 1344x1728)
first ever zit image on new ai pc, lets go

Gonna train zit sometime but my brain is fried just setting up
>>
>>107866705
Clannad was such a depressing anime, don't watch that when you're in a sad mood or you'll ACK- yourself
>>
how do you make a single .safetensors for a text encoder? I wanted to use some ablit/heretic/mpoa encoders to test stuff out and wanted to cook my own.
>>
>>107866717
>Gonna train zit sometime
don't, I spent a lot of time downloading loras for ZiT and they all suck, this model is just too distilled to be trained with
>>
https://huggingface.co/lodestones/Zeta-Chroma/blob/main/zeta-chroma-x0-pixel-proto.safetensors
Has anyone tried that one?
>>
>>107866733
Couldn't hurt desu, gonna do it anyway; at worst I just made a dataset
>>
>>107866730
I dunno but you can try ggufs. Search gguf quants of the abliterated TEs. Major quantizers like bartowski and Unsloth also publish bf16 ggufs.
Note, since this is an unusual use case and gguf implementation for diffusion is overall not in a good state, there might be some bugs or regressions with this.
>>
>>107866751
It has been training for only two weeks.
Don't see the point in bothering with it.
If you must, go to lodestone's shitcord and ask for inference code; it's not supported by anything yet.
>>
>>107866717
>>107866754
Train at 768 or 512. 1024 is more difficult for reasons.
Even the best ZiT loras still break the anatomy and text a bit.
>>
>>107866717
>>107866754
Loras work great with Z. Just use the distilled as base when you train.
>>
File: ComfyUI_00020_.png (1.94 MB, 1400x800)
>>107866718
Just heal your soul with Tomoyo After's rimjob scenes. https://arch.b4k.dev/vg/thread/545288714/#545353981
>>
>>107866791
>>107866792
thanks thanks thanks
>>
>>107866804
wtf?
>>
>prompt character on chroma
>despite using a strong lora, the model completely shifts to the ugliest cartoon artstyle known to man
is there a way to mitigate this shit
>>
>>107866855
By not using Chroma. Anything remotely usable you see from Chroma is a 1 in 100 cherry picked literal unicorn image that in no way represents the absolute shit that model usually spits out.

You are better off just using SDXL.
>>
>>107866860
truth nuke
>>
>>107866767
I know, but I wanted to use comfy's native model loading instead of the gguf custom nodes.
>>
>>107866855
On top of the many faults of Chroma commonly discussed here, lack of style control is another one. I have seen detailed photographic prompts randomly switch to illustration styles across seeds.
This anon's right >>107866860
Z-Base will save us soon.
>>
>>107866904
>Z-Base will save us soon.
Someone's been skipping their Chinese culture lessons.
>>
>>107866882
Try your luck with this random crap I found on github:
https://github.com/soursilver/safetensors-merger
Also you may want to convert it to fp16/bf16 later on, the models are likely in fp32.
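if it does turn out to be fp32, the downcast itself is roughly a one-liner; a sketch (paths are placeholders):

import torch
from safetensors.torch import load_file, save_file

sd = load_file("text_encoder_fp32.safetensors")  # placeholder path
# downcast only float tensors, leave ints/bools alone
sd = {k: v.to(torch.bfloat16) if v.is_floating_point() else v for k, v in sd.items()}
save_file(sd, "text_encoder_bf16.safetensors")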
>>
File: 1747874949990820.png (140 KB, 498x281)
>>107866904
>Z-Base will save us soon.
lol
>>
We're getting wan 2.5, I haven't seen anything from it at all. It was dead as fuck, wasn't it?
>>
>>107866945
They said it was too big.
>>
>>107866945
anime chinese man said it was too big to run; it's probably a 40b model no one will ever run (like step video, which was a 30+b model)
>>
>>107866945
>We're getting wan 2.5
Sorry. You didn't say please enough.
>>
>>107866929
I was thinking of something like this:
import torch
from transformers import AutoModelForCausalLM
from safetensors.torch import save_file

MODEL_ID = "YanLabs/gemma-3-4b-it-abliterated-normpreserve"
OUT_FILE = "gemma-3-4b-it-abliterated-text-encoder.safetensors"

# Load model
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    torch_dtype=torch.float16,
    device_map="cpu",  # safest option
)

state_dict = model.state_dict()

# Save as single safetensors file
save_file(state_dict, OUT_FILE)

print(f"Saved {OUT_FILE}")

actually ill just try this
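
one caveat worth noting: save_file refuses tied/shared tensors, and gemma ties the lm head to the token embeddings, so if it throws, the save_model helper dedupes them for you; a sketch, same assumptions as above:

from safetensors.torch import save_model
save_model(model, OUT_FILE)  # handles tied/shared weights that save_file rejects

you may still need to rename keys to whatever comfy's loader expects, so diff against the stock TE file first.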
>>
What's that one chinese twitter account people swear by for leaks in here?
>>
>>107867003
bdsqlsz
>>
File: ng2.png (1.64 MB, 856x1216)
>>LTX is ba-ACK
>>
>>107867003
this dude
https://xcancel.com/bdsqlsz/status/2009520301156258171#m
he's even allowed into Alibaba's conferences and shit
>>
>>107866976
>>107866979
How much is 40b? Like 80-90gb? Offloading is still possible.
>>
>>107867052
>Offloading is still possible.
The bargaining phase is over, anon. It's not coming.
>>
>>107867042
he's the omar of diffusion
fuck omar
>>
>>107867052
basically on fp8, the number of parameters and the size in gigabytes are the same, so 40b = 40gb on fp8
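back-of-envelope: fp8 is 1 byte per weight, so 40e9 params = 40e9 bytes ≈ 40 GB for weights alone; bf16 is 2 bytes per weight, so ~80 GB, and that's before activations, the text encoder and the VAE.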
>>
>>107866937
>does
>doe-Z
Z-base confirmed
>>
why are people still using chroma?
use case?
>>
File: 1737350488259046.png (1.58 MB, 1024x1472)
>>
>>107867184
>use case?
Satisfying emotional debt caused by months of sunk cost in believing the next epoch would finally fix its deep and fundamental flaws that make the model literally unusable.
>>
>>107867184
>why are people still using chroma?
it's the only model that can make realistic images and NSFW at the same time, and people are willing to get through 99 straight images filled with anatomy atrocities if they can get 1 good goon image out of it, many such cases
>>
>>107867203
man it was 1 cope after the other
>actually chroma was very soul at Revision #
>no wait #48 is the actual soulful one
>#50 (HD) is bad but wait!
>there's also the flash heun model its good (its not)
>and there's the HD flash merge too!!!! lol!! I swear this time it converges good!!!!
>but you know whats really bad? its not the unfinished training... its just the.. UGH VAE!!!
>yeah lets train a new vaeless chroma LMAO, RADIANCE!
>*retard spams the general for weeks with his absolutely melty/cooked gens*
>uhmmm no radiance is good !!!!
>but WAIT, ackshually radiance can be fixed with this x0 version
>ehh but you know what? we're moving onto z-image... what? waiting for base? lmao!!!! we're training on a distill just like we did for normal chroma!!!
what a shitshow
>>
>>107867228
>>but you know whats really bad? its not the unfinished training... its just the.. UGH VAE!!!
I have to admit that's me :( I really thought Flux's VAE wasn't that good, and then Z-image turbo showed me that it's actually an incredible VAE
>>
>>107867228
Well at least it proved de-distillation is a waste of time, so something at least, costly lesson though...
>>
i'm not sure what's going on here. i haven't used pinokio in 6 months because i had issues installing wan2gp over there, so i did a normal standalone install. i recently opened pinokio to use the Joy_Caption_Alpha-Two_GUI app, accepted its install requirements, finished captioning some images and then closed the application. now i'm trying to use wan2gp and i'm running into some roadblock again. how do i resolve this?
>>
>>107867209
>realistic image
but it's not
>>
File: 1743998107645331.png (157 KB, 498x430)
>>107867228
you have to admit it was pure entertainment though
>>
>>107867238
That furry fag didn't learn that lesson since he wants to save Z-image turbo and that model is way more distilled than Flux Schnell
>>
File: mlady.jpg (323 KB, 948x1264)
>>107867184
>use case?
Fun to use, gives me the results I want
>>
>>107867228
>waiting for base? lmao!!!!
to be fair, we've been waiting for base for too long I can understand he wants to move on
>>
>>107867184
qwen image already mogged chroma so idk
>>
>>107867264
It's only been like a month and a half lol. Not enough to open the wallet on a stupid mission imo
>>
>>107867269
good joke, qwen image is still plastic and can't do NSFW out of the box
>>
File: aaaaaaaaaa.png (85 KB, 225x225)
>>107867275
>It's only been like a month and a half lol.
THEY PROMISED TO RELEASE IT "NEXT WEEK", NOT NEXT DECADE
>>
File: end my suffering.png (171 KB, 736x736)
>>107867275
>It's only been like a month and a half
and counting...
>>
>>107867275
People think I'm joking when I talk about Chinese culture. If you even tried to read between the lines, you'd understand like I do that we're not getting it.
>>
File: 1755329637196283.png (67 KB, 1613x259)
>>107867299
B-but (Corporate Hegemony™) said "Patience will be rewarded"!!1!1!!1
>>
>>107867306
>Patience will be rewarded
They never said with what.
>>
>>107867184
I use chroma for base image, and then zit for detailer
>>
>>107867239
can someone help me please?
>>
>>107867306
Patience, Colorado.
>>
New thread

>>107867304
>>107867304
>>107867304
>>
>>107867325
Interesting. You got any examples?
>>
>>107867235
>>107867228
I don't get this retard, why can't he just take the dedistilled ZiT and train it? What's his problem?
>>
>>107867340
not sharing, rajesh


