/g/ - Technology



Discussion of Free and Open Source Diffusion Models

Prev: >>107757207

https://rentry.org/ldg-lazy-getting-started-guide

>UI
ComfyUI: https://github.com/comfyanonymous/ComfyUI
SwarmUI: https://github.com/mcmonkeyprojects/SwarmUI
re/Forge/Classic/Neo: https://rentry.org/ldg-lazy-getting-started-guide#reforgeclassicneo
SD.Next: https://github.com/vladmandic/sdnext
Wan2GP: https://github.com/deepbeepmeep/Wan2GP

>Checkpoints, LoRAs, Upscalers, & Workflows
https://civitai.com
https://civitaiarchive.com/
https://openmodeldb.info
https://openart.ai/workflows

>Tuning
https://github.com/spacepxl/demystifying-sd-finetuning
https://github.com/ostris/ai-toolkit
https://github.com/Nerogar/OneTrainer
https://github.com/kohya-ss/musubi-tuner
https://github.com/kohya-ss/sd-scripts
https://github.com/tdrussell/diffusion-pipe

>Z Image Turbo
https://huggingface.co/Tongyi-MAI/Z-Image-Turbo

>WanX
https://github.com/Wan-Video/Wan2.2
https://comfyanonymous.github.io/ComfyUI_examples/wan22/

>NetaYume
https://civitai.com/models/1790792?modelVersionId=2485296
https://nieta-art.feishu.cn/wiki/RY3GwpT59icIQlkWXEfcCqIMnQd

>Chroma
https://huggingface.co/lodestones/Chroma1-Base
https://rentry.org/mvu52t46

>Illustrious
https://rentry.org/comfyui_guide_1girl
https://tagexplorer.github.io/

>Misc
Local Model Meta: https://rentry.org/localmodelsmeta
Share Metadata: https://catbox.moe|https://litterbox.catbox.moe/
GPU Benchmarks: https://chimolog.co/bto-gpu-stable-diffusion-specs/
Img2Prompt: https://huggingface.co/spaces/fancyfeast/joy-caption-beta-one
Txt2Img Plugin: https://github.com/Acly/krita-ai-diffusion
Archive: https://rentry.org/sdg-link
Bakery: https://rentry.org/ldgcollage

>Neighbors
>>>/aco/csdg
>>>/b/degen
>>>/r/realistic+parody
>>>/gif/vdg
>>>/d/ddg
>>>/e/edg
>>>/h/hdg
>>>/trash/slop
>>>/vt/vtai
>>>/u/udg

>Local Text
>>>/g/lmg

>Maintain Thread Quality
https://rentry.org/debo
https://rentry.org/animanon
>>
8 minutes 47 seconds
>>
>>107762587
https://www.reddit.com/r/comfyui/comments/1pgvdgo/impact_pack_is_trying_to_connect_to_youtube/
>>
File: zimg_00028.png (1.31 MB, 864x1280)
>>107762645
thx anon, good reminder to use a standalone computer for this shit
>>
>>107762645
that's ultralytics (YOLO) telemetry though, nothing to do with image gen or what you're prompting
also it can be disabled
>>
File: naoZ.png (1.89 MB, 1440x1563)
base when
>>
File: 1f7-3005716917.jpg (53 KB, 685x567)
idk maybe it's just my dataset but it feels like ZiT has some problems with picking up bodytypes that are noticeably different from the default one. eg shortstack girl comes out as almost normal sized, you can kinda wrangle ZiT with prompts but it feels like it's struggling to pick up on physique, while the likeness comes out perfectly. At least that's my impression, I'm using old datasets, and with XL models the bodytype came out way closer.
>>
A little tip: use something like https://setpose.com/ to get the perfect pose and angle for i2i and play with denoise values. I mostly use it with zit. Play around with all the different settings to see what the free version can do. Also, change the color of the model to the skin color you want. Zit is pretty good at following the poses.
>>
File: zimg_00002.png (1.3 MB, 864x1280)
>>107762645
>>107762672
yeah this isn't from comfy it's from ultralytics (which is still kinda shitty), but i had this happening from when i was training vision models for another task already. disabled it now, still gonna use the nodes.

>>107762699
my body type loras are almost too strong
>>
What about three steps Ksampling?
Or more.
>>
File: file.png (206 KB, 1285x1353)
>>107762705
seems straightforward enough
>>
>>107762592
working on doing a LoRA of an OC, my workflow at the moment is to use a base SDXL model to make 20 images, various poses, expressions etc, with around 15 of them containing her regular outfit and 5 in normal clothes. I'm color matching the skin, hair, eyes etc to try and get them as close to each other as possible, and then plan to use them to make a LoRA using SDXL v6. thing is I'm not really happy with the results, things you'd never consider like eye size being slightly off, head shape, things like that, they're so hard to control and get consistent that I'm thinking of starting again from scratch. has anyone had any luck doing this? if so what model did you use? was considering Illustrious based ones if this doesn't work out
>>
File: 1763544955323444.png (149 KB, 823x643)
>>107762751
>>
are lustify/chroma still the best nsfw image models? anything new worth checking out?
>>
>>107762724
>my body type loras are almost too strong
odd idk, I mean i can see it kinda picking up the bodytype but not quite, like that fitness influencer dataset I have, she is kinda muscular but has kinda short legs, I have enough clear full body pics but it always ends up giving her normal length legs and so on
>>
File: z-image-turbo_00005_.png (1.9 MB, 2577x1353)
>>107762751
wups

>>107762805
if you have your dataset captioned try uncaptioning it and see what results you get
>>
File: 1748840570026472.png (497 KB, 816x640)
>>107762762
>>
>>107762882
whats the denoise you use
>>
File: 81359230-348940020.jpg (52 KB, 600x469)
>>107762850
>if you have your dataset captioned try uncaptioning it and see what results you get
Yeah its captioned, worth a try I guess, I always used captioned sets so far, might be interesting what it spits out this way
>>
>>107762850
question, how is Z turbo with anime style and can you make LoRAs for it yet?
>>
>>107762675
Nice style
>>
>>107762901
0.78
You can get away with high values when the pose is very readable and easy for it to interpret, just as long as your prompt aligns nicely with the pose you're feeding in.
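For a rough feel of what a denoise like 0.78 means in step terms, here's a minimal sketch. It assumes an img2img sampler that skips the first (1 - denoise) share of the schedule and rounds the executed count down, the way diffusers-style pipelines treat `strength`; ComfyUI's KSampler may handle it differently internally. The function name and the 9-step count are made up for illustration.

```python
def img2img_steps(total_steps: int, denoise: float) -> tuple[int, int]:
    """Return (skipped_steps, executed_steps) for an img2img run.

    Assumption: only the last `denoise` fraction of the schedule is run,
    with the executed step count rounded down.
    """
    if not 0.0 <= denoise <= 1.0:
        raise ValueError("denoise must be between 0 and 1")
    executed = min(int(total_steps * denoise), total_steps)
    skipped = total_steps - executed
    return skipped, executed


if __name__ == "__main__":
    # 9 steps is a hypothetical step count, not a recommendation.
    for d in (0.5, 0.78, 1.0):
        skipped, executed = img2img_steps(9, d)
        print(f"denoise={d}: skip {skipped}, run {executed} of 9 steps")
```

Under that assumption, a higher denoise means fewer skipped steps, so less of the input pose image survives, which is why it only works when the pose is easy to read.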
>>
>>107762592
>4/10 of the gens are mine
holy kino thread of ultra friendship or whatever!!!
>>
>>107762945
tran is too much of a spiteful bitch for the collage to be taken seriously. the thread blessings are tarnished by a half nigger tranny with bad taste. you should feel bad for supporting a drama faggot and avifag worse than debo
>>
File: imageData.png (1.71 MB, 1024x1536)
As a techlet retard, why do some models seem to pick up the concept being trained gradually, with the sample pics looking coherent but slowly changing, while with other models the whole thing suddenly collapses into a garbled, barely recognizable mess when it starts cooking, with the sample pics only slowly becoming more coherent again as time progresses?
>>
File: 1686934998056.jpg (1.17 MB, 1664x2432)
>>107762919
It can't do any styles but you can make loras. Could from week 1 even.
>>
File: Begone vramlets.png (97 KB, 1139x484)
>>107757565
nmp
>>
File: zimg_00062.png (1.53 MB, 864x1280)
>>107762919
i don't do a lot of anime but it's pretty great out of the box if you're just referencing stuff
>>
>>107763003
>>107762983
Nice
>>
File: 1686572851767.jpg (1.71 MB, 1664x2432)
>>
>>107763131
you have zero reasons for living. all you do is think about ani. cry about it
>>
File: 1759703180094537.png (2.37 MB, 1472x1024)
>me when I read someone sneeding about the fagollage
>>
File: 1633909218999.jpg (1.11 MB, 2304x1792)
>>
File: 1742642899796492.png (2.19 MB, 1024x1440)
posting 1girl standing looking at viewer? couldnt be me!
>>
File: 1764913951853980.png (2.14 MB, 1440x1024)
we finna be gay af underwater!
>>
File: zimg_00075.png (1.45 MB, 864x1280)
>>107763267
>>
>>107763305
>brown frosting
coincidence? I think not
>>
>>107763226
why would you gen small breasts
>>
>>107763305
Fitting brown frosting
>>
>>107763319
you realize not everyone likes cow tits right? I prefer normal sized tits, if you think those are small, man, you might be a coomerbrained porn addict fag
>>
Blessed thread of frenship
>>
>>107763339
these threads have been cursed for a long time
>>
File: zimg_lora__00075_.jpg (1.61 MB, 8620x1212)
>>107763321
>>107763315
tfw they get the frosting joke but not the pink roses

>an anime still, a man in a cool outfit leans against a car, in the style of..
No one
Hayao Miyazaki
Mamoru Hosoda
Makoto Shinkai
Satoshi Kon
Shinichirō Watanabe
Sunao Katabuchi
Yoko Kanno
Naoko Yamada
Mamoru Nagai
Hiroyuki Imaishi
>>
File: zimg_lora__00077_.jpg (1.61 MB, 9476x1212)
>>107763377
wrong image, it's this one
>>
File: file.png (1.34 MB, 1120x1440)
>post in /ldg/
>get mass deleted due to some kind of mixup with the actual spammer
sad
>>
File: 1761885953567176.png (2.51 MB, 1440x1024)
>>
>>107763401
do you masturbate to controllers retard?
>>
>>107763412
This might be shocking to hear but some people gen things they don't masturbate to
>>
how can I post on 4chan after my pass expires, they block all vpns and random range bans blocking images
>>
File: file.png (1.7 MB, 1120x1440)
>>107763412
n- no...
>>
File: 1742311059036551.png (1.62 MB, 864x1280)
>>107763305
>>
>>107763439
what the fuck?
>>
>>107763401
stop spamming your shitty controller you dumb fucking nigger retard.
what are you? retarded?
gen some new material fuckin moron
>>
File: 1753067394244910.png (1.01 MB, 855x1508)
Proud of yourselves?
>>
>>107763591
alright pack it up everyone, a women has got the ick
>>
>>107763605
billions in compensation awarded, all schools to teach on the dangerous of virtual rape
>>
>>107763591
kek
>I know it wasn't me, but I FEEL it was me
nice bookend
>>
>>107762850
I will be so happy when girls go ultra thin route mental illness instead of mega fat route
they'll go mental illness either way for the concerned demographic
>>
>>107763591
>I got robbed! Buy my nudes for 15$ on onlyfan!
>>
>>107763591
>posts her age, real name and countless attentionwhore photos of herself on the internet
>she seethes at this
Is this the norm nowadays? Are women unironically this fucking retarded in 2026?
>>
>>107763685
incentivised retardation
what other mind could concieve of 'sexist air conditioning' or 'pink tax'
>>
File: 1758029901645676.png (117 KB, 1830x660)
https://github.com/huggingface/transformers/pull/43100
>GLM-Image AR Model Support
maybe we'll be saved from (((Alibaba))), I always loved the LLM GLM series, they are more soveful than the rest of the LLMs, let's hope that's also the case for their upcoming image model
>>
>>107763439
Dude that's impossible.
>>
>>107763808
i have genned past 10k images at this point
masturbated to around a dozen or so
>>
>>107763584
holy meltie over controller anons kino
>>
>>107763584
>a billion flowd fent obsessive images, more floyd than even media at the peak of their obsession with him
>10 billions hitlers
>but clearly, the dozen controllers posted are way too much
thanks for taking a stance
>>
File: 1736407703001367.png (176 KB, 460x310)
>>107763806
>maybe we'll be saved from (((Alibaba)))
and then it's another incredible turbo model and we'll also have to wait for base for that one
>>
>>107763806
>AR
so it's an autoregressive model right? we never had a good AR local model, so let's hope that one will break that cycle
>>
File: IMG_2883.png (2.33 MB, 1280x1408)
>may 24 2023 /sdg/
nah. what gives.
>>
File: IMG_2884.jpg (3.05 MB, 2592x1728)
why have we regressed
>>
File: IMG_2885.png (2.64 MB, 1280x1408)
what happened to this expressiveness
>>
>snubbed
Fuck you all. Tasteless retards.
>>
File: Z-image turbo.png (2.27 MB, 1280x720)
>>107763975
>why have we regressed
we haven't, Z-image turbo is a way more powerful model than anything we got before
>>
File: IMG_2886.jpg (906 KB, 1408x1792)
this was peak soul
>>
>>107763878
>but clearly, the dozen controllers posted are way too much
the other niggers are annoying too but you are not different from them.
you are the same.
stop doing that nigger retard faggot assfeetus
>>
File: IMG_2904.jpg (2 MB, 2048x4096)
take me back

>>107763996
that's the best you can do with cutting edge models?
>>
slop hours
>>
File: unnamed (1).jpg (835 KB, 2048x2048)
we have been rug pulled and didn't even realize
>>
>>107764009
that image looks like ass, the architecture is completely broken, why are you pretending this is the peak of local lol
>>
>>107763806
Why's some Russian faggot re-linking this everywhere? His Babushka die or something?
>>
File: you can do it.png (153 KB, 498x410)
>>107763806
I think it'll be good, they have no reason to release an image model if they know everyone will ignore it if it's worse than ZiT
>>
>>107764025
where did all of that potential go
>>
File: Capture.png (242 KB, 500x244)
>>107763806
https://github.com/huggingface/transformers/blob/cd8d78fcb4067979e921b20163d62035c51b4e7f/src/transformers/models/glm_image/modular_glm_image.py#L794
>=== Case 1: Image-to-Image Generation (single or multiple source images + 1 target image_grid) ===
>=== Case 2: Text-to-Image Generation (no source images + 2 image_grids for multi-resolution) ===
it's an edit model
>>
>>107763993
>>107764014
same person
>>
File: 1752911935160455.png (10 KB, 383x119)
>>107764076
wrong
>>
>>107764041
>I think it'll be good
I hope so, that'll force Tongyi to release the base model if it turns out we'll move on without them lmao
>>
did anyone ever figure out how to make wan video lora locally on 16gb ramlet card?
>>
File: img_00026_.jpg (376 KB, 1520x1728)
>>
>>107764009
Even at the time I thought this was ugly and an abomination. I suspect only subhuman retards find it appealing.
>>
>>107764009
I like the idea, I wonder if it can be recreated with less impossible stuff.
>>
File: 29a.png (16 KB, 645x770)
>Even at the time I thought this was ugly and an abomination. I suspect only subhuman retards find it appealing.
>>
Base is a collective hallucination.
>>
whats anonies issue?
>>
>>107764194
>a collective hallucination
there's a french meme about that sentence lol
https://youtu.be/MhIbTEue2ew?t=90
>>
>>107762705
i use magicposer for poses. very simple and intuitive.
>>
>>107764219
holy shit, the memories, I vaguely remember people driving this guy nuts, kind of sad, they could have left him alone
>>
what's the best penis substitute for z-image? i've ben using a carrot
>>
>>107764248
Do it trib style. Take a pic of your own cock over the gen.
>>
File: 1763912753777134.jpg (52 KB, 499x499)
>>107763591
she feels humiliated for that weak grok. i can't even imagine her reaction, if she saw my bbc punishments, lol
>>
>>107764268
qrd or keyword for me to do my reps?
>>
File: 1764736820653390.png (267 KB, 604x1059)
>>107763591
>Proud of yourselves?
should've read the ToS, by uploading her pictures on Grok she agreed on having people making parodies of her
>>
>>107762592
Hi /g/,
I tried AI generation in the past but my GPU sucked.
I got a better one now and I'm trying to get a hang of the basics (light/colors/camera/styles) on SDXL before I try to do anything with flux or illustrious
I suck at prompting though, and I can't get Comfy to do what I want it to do
Can anybody give me suggestions on how to improve my prompting?
I'm assuming there is an AI tool that can help with that?
>>
>>107764219
that's not funny at all
base or I launch a nuclear bomb at China
>>
>>107764329
sorry mister anonymous, but Chinese culture is stronger than everything
>>
>>107764308
prompting depends on the model: get z-image turbo, and just ask any llm to improve the prompt for you. you shouldnt use tags like everyone did before.
>>
>>107764308
>Can anybody give me suggestions on how to improve my prompting?
proficiency is gained only through trial and error. spend four years prompting and itll start to click.
>>
>>107763806
If they made an AR image model based on GLM-4.6V-Flash 9B it might be interesting, or else it's DoA
>>
File: 1755544348083363.jpg (35 KB, 682x548)
>>107763591
>le Musk did this!
hate the journo faggots so much. Musk or Grok didn't do anything you dumb cunt. It was the person who used the tool. Other women are more than happy to have a tool that can put them in a bikini. What about their rights to enjoy the bikini tool? What about their rights to enjoy others putting them in a bikini? Fuck off if you don't like it.
>>
File: img_00032_.jpg (424 KB, 1520x1728)
>>
>>107764470
Musk should have changed the name to Le Musk, that would have been on brand for his sense of le humor.
>>
>>107764470
>Musk or Grok didn't do anything you dumb cunt. It was the person who used the tool.
it's even worse when you know it's those bitches that are doing this to themselves, if they don't want to use a website that allows image editing, they can just leave the site
>>
>>107764248
use a banana for scale
>>
>>107764308
When you have an image in your head you want to generate, really think about not only what you're 'seeing', but also what you're not actually seeing; don't EVER mention the things you're not seeing in detail, or it will try and generate it. Say, for example, you want someone with their hands handcuffed behind their back, don't prompt 'with their hands handcuffed behind their back', for it will try to generate their hands in handcuffs in the image and will likely go schizo about the handcuffs specifically. Instead, don't say anything about the handcuffs, just say 'hands hidden behind their backs', for what the hands are doing behind the back (whether they're handcuffed/tied or not) really doesn't matter if you can't see them anyway. Apply this thought process to everything when prompting.

Be detailed about the things you want to see, not the things you don't. It doesn't need to know everything in your head or the intent/context of the image, it only needs to know exactly what you can see in your head and no more.
>>
>>107764484
Isn't that the spider bitch from Wicked City?
>>
File: 1765848857521218.mp4 (3.84 MB, 720x1280)
SVI 2.0 is pretty impressive desu, now if only making one minute video wouldn't take ages it would be fun to do as well
>>
>>107764599
I guess you used the same prompt for the whole minute?
Can you make her turn/spin, I wonder if it's able to maintain her face for a whole minute.
>>
>>107764351
>>107764381
>>107764514
thanks!
>>
>>107764630
>I wonder if it's able to maintain her face for a whole minute.
I won't do that again it's too long, but you can keep her face consistent yes, look at that example
https://www.youtube.com/watch?t=603&v=PJnTcVOqJCM&feature=youtu.be
>>
>>107764599
>SVI 2.0
what is it? never heard of it before.
>making one minute video wouldn't take ages
how long did it take you? hours? also what gpu?
>>
>>107764655
>>SVI 2.0
>what is it? never heard of it before.
they finally found a way to make Wan do longer videos without having it look like garbage
https://github.com/vita-epfl/Stable-Video-Infinity
>>
File: img_00060_.jpg (399 KB, 1520x1728)
>>107764551
the one and only
>>
>>107764678
woah thats neat.
whats the catch?
takes 10 hours to gen a 1 minute video?
>>
>>107764719
well, imagine the time it takes to do a 5 seconds Wan 2.2 video, now multiply that time by 12 to get a 60 seconds video
>>
>>107764646
how the hell is it able to keep face consistent??
>>
File: 1671095569362.jpg (993 KB, 1664x2432)
Turbo can't deliver the same natively, I cry.
>>
>>107764719
it takes me 13min to make a 6s video in 720p that looks ok, so 2h for 1min
that's something I can run at night
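That estimate is just chunk arithmetic. A minimal sketch, assuming every chunk costs the same wall time and ignoring any overlap or stitching overhead between chunks (the function name is hypothetical):

```python
import math


def estimate_long_video_minutes(target_seconds: float,
                                chunk_seconds: float,
                                minutes_per_chunk: float) -> float:
    """Estimate total generation time for a video built from equal chunks.

    Assumption: each chunk takes the same time and a partial last chunk
    costs as much as a full one.
    """
    chunks = math.ceil(target_seconds / chunk_seconds)
    return chunks * minutes_per_chunk


if __name__ == "__main__":
    # Numbers from the post: 13 min per 6 s chunk, 60 s target.
    total = estimate_long_video_minutes(60, 6, 13)
    print(f"{total:.0f} min (~{total / 60:.1f} h)")
```

With those numbers it comes to 10 chunks and 130 minutes, so a bit over the quoted 2h.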
>>
>>107764678
Can you edit parts of it, aka not forced to generate everything at once?
>>
>>107764599
If you sped that up to look real-time, it would be 5sec long...
>>
File: 1746748116191213.png (523 KB, 720x540)
>>107764738
>how the hell is it able to keep face consistent??
black magic
>>
Does training on de-distilled ZiT make a noticeable difference?
>>
>>107764599
is there a 10-15 sec svi, and for comfyui nodes?
>>
>>107764828
you can go for any length you want, look at this youtube tutorial he explains it and provides a workflow on the video description
https://www.youtube.com/watch?v=XGB4qBkCFSM
>>
>>107764828
The workflow for making 10 seconds and 1 min is nearly identical. Just use less samplers
>>
>>107764755
you can yeah, since you have to provide a new prompt for every 5 seconds
>>
>update a few nodes but not comfy
>everything works but wanwrapper workflows are now broken
>had a previous comfy backed up with a working wanwrapper
>revert to older wanwrapper
>still fucking broken

WHY
>>
why do you guys even update comfy?
>>
>>107764870
i don't know
but i bricked my install.
im using the goyapp now
>>
>>107764870
>why do you guys even update comfy?
it's not like we have a choice when there's a new good model that comes out and that we have to update it to run it
>>
>>107764870
a lot of us dont. not sure what the ones who do are doing to get the latest comfy to work but every time i try to update it, since december, everything breaks. good thing i have working backups
>>
>>107764856
It means you can export the latent video and reinject it later?
>>
why did comfy remove the old job queue and replace it with a system where you now need two windows, one for the queue and one for the finished results, and also make it so that you have to double click the image to see it, and also make it so that if you hover over a job you get a third job details window that will often get stuck if you dont hover over it?

anyone know why he did this?
>>
>>107764870
Never had any problem doing so.
>>
>>107764897
>when there's a new good model
like...?
>>
>>107764870
to complain about the UI more

>>107764909
comfy never was able to do frontend so he hired jeets to do it
>>
>>107764918
Z-image turbo, and I guess that upcoming model >>107763806
>>
>>107764897
no model is good enough on release. it needs to be fine tuned and redistilled and have the first wave of loras and controlnets done before it can be used seriously
>>
>>107764870
New model support and also I hope the new release will unfuck things the old release fucked up.
>>
>>107763806
>>107764938
what the fuck is glm image and why would it be good?
>>
>>107764926
>comfy never was able to do frontend
I feel him, I'm way better on the backend side
>he hired jeets to do it
that was a big mistake, he's not that good on the frontend but he's way better than your random jeet
>>
File: 1766928181157652.png (419 KB, 2048x1447)
>>107764945
>why would it be good?
the glm guys are dominating the local ecosystem on LLMs, so maybe they'll do the same on image models, they're not randoms, far from it
>>
when will bfl create a safer flux? I could generate non ugly women with flux 2 wtf
>>
>>107764956
GLM is good but I'd say the most impressive team is the deepseek one, I wish they had any interest in image/video, these guys also publish every trick they use.
>>
>>107764947
he just used the light graph framework and didn't touch anything but I agree that nothing should have been touched if this nu-ui is the result
>>
>>107764956
All the Chinese LLMs are within spitting distance of each other, GLM 4.7 was just the latest release. They all borrowed a lot from DeepSeek.
>>
File: 648432864.jpg (615 KB, 2048x2048)
>>
>>107764998
>feeling cute today, might never release base
CHAAANGGGGGGGG
>>
File: Turbo.jpg (2.98 MB, 6656x2532)
SeedVR2 doesn't bring me joy consistently.
>>
>>107764870
>>107764725
nooooticing
>>
Am I missing something with Chroma lora training? I'm trying to train a character lora and it's barely picking up the likeness of the character. I'm using
>40 images
>adamw
>batch size 2
>lr 2e-4
>alpha/rank both 16
>100 epochs
I don't really know what's happening or if I should just train further.
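For reference, a quick sketch of what those settings add up to. It assumes no dataset repeats and no gradient accumulation, and that the trainer scales the LoRA update by alpha / rank (a common convention, but check your trainer); the helper names are made up.

```python
import math


def optimizer_steps(num_images: int, batch_size: int,
                    epochs: int, repeats: int = 1) -> int:
    """Total optimizer steps, assuming no gradient accumulation."""
    steps_per_epoch = math.ceil(num_images * repeats / batch_size)
    return steps_per_epoch * epochs


def lora_scale(alpha: float, rank: int) -> float:
    """Multiplier applied to the LoRA update under the alpha / rank convention."""
    return alpha / rank


if __name__ == "__main__":
    # Settings from the post: 40 images, batch size 2, 100 epochs, alpha = rank = 16.
    print("steps:", optimizer_steps(40, 2, 100))
    print("scale:", lora_scale(16, 16))
```

Under those assumptions the run is 2000 optimizer steps at a LoRA scale of 1.0.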
>>
>>107765006
you mirin that culture, huh?
>>
>she is spreading her legs revealing her panties
>a third leg pops out of her crotch
everytime.
pozzed model
>>
Enjoying your base model you gullible retards?
>>
>>107764956
>dominating
dominating is a stretch but they have by far the most sovlful models, let's hope it's also the case for their image model
>>
>>107758571
>Enjoying your base model you fucking retards?
>>107765046
>Enjoying your base model you gullible retards?
I wonder what slight variation he's gonna bring to the table on the next bread kek
>>
File: img_00080_.jpg (479 KB, 1520x1728)
>>107765033
what resolution? Chroma sometimes struggles to pick up details if you use 512+ resolution.
>>
>>107765057
so where base
>>
>>107765067
how should I know, I'm not a Tongyi employee :(
>>
File: 1753838431444871.png (2.39 MB, 1344x1344)
>>
im beginning to hate the chinese
>>
>>107765064
I'm using 512 right now, OneTrainer's default settings for Chroma downscale them to that. Should I bump up the resolution to 1024?
>>
>>107765067
>where base
In China
>>
>>107765047
>they have by far the most sovlful models
What exactly do you mean by that? How can an LLM have sovl?
>>
>>107765067
because you touch yourself at night
>>
>>107765074
I hate all researchers equally. all of them are sellouts
>>
>>107765033
The images are too different or your captions are bad or too short. Sometimes, it just doesn't work for a particular model. Try training it on only one image and see if the likeness is picked up in the sample images.
>>
File: 1751495689716772.png (398 KB, 828x828)
>>107765072
she looks like Simona Halep
>>
>>107765122
only oldfags know who that is
>>
I neither hate nor love, I just enjoy the free stuff we get every other week.
>>
>>107765103
>What exactly do you mean by that?
all the LLMs sound the same, they are boring and write corporate bullshit, except the GLMs models, they write like a human would do, and they have way more imagination, maybe they're not using as much synthetic AI on their dataset training as the others
>>
>>107765127
desu her "doping" scandal is pretty recent
>>
File: onetrainer1.jpg (113 KB, 1054x849)
>>107765082
>Should I bump up the resolution to 1024?
No, it will learn even slower. Could it be bad captions? Here's some settings I used that worked just fine. I had rank 64 and alpha 32
>>
>>107765132
>GLMs models, they write like a human would do
do you mean less sappy shit like "mischievous glint in the eye", "ball in your court", "half lidded eyes", etc
>>
>>107765145
yeah
>>
>>107765132
>except the GLMs models, they write like a human would do
You sound like a shill.
>>
>>107765074
I gradually came to hate them
>>
>>107765151
oh, I'll take a look then
>>
>>107762592
>none of the pixelart kino made it into OP
Absolute disgrace. Shame on you.
>>
why isn't the new qwen popular here?
>>
>>107765142
I'm guessing it's probably the captions. I'll redo those and do another run with those settings you posted. Any recommendations for how to caption? I've been using joycaption, not sure if I should use something else.
>>
Chinese culture
>>
Remember when you were all jerking each other off over a misinterpreted discord message that the base model would be out ON the weekend the turbo model came out?

You are all so clueless about Chinese culture it's obscene.
>But it got merged into diffusers
Merge it into my ass because that doesn't mean anything.
>But the chinese guy on twitter said it was coming
He said wan 2.5 would be open source too
>But they said wait a little more on the discord
Wait for what exactly? You and I both know the people there have no say over what gets released.

At best you're getting an API model.

Chinese. Culture.
>>
>>107765187
Joycaption works just fine for Chroma. 2-3 concise sentences is enough. You could also try without captions. I used ChromaHD as base.
>>
>>107765131
This. Imagine complaining about free gimmies. Must be a burger thing.
>>
Still on a ddr4 system right now and wondering if I should stick with it and get 64gb ram or just do a full upgrade.
>>
>>107765211
>that doesn't mean anything
why? because you said so? why would they even put the effort on bringing the inference code on diffusers if they were sure they were gonna go API?
>>
>>107765219
we complain about lies and broken promises, if they said from the beginning they were not gonna release base no one would bat a fucking eye
>>
File: image(40).png (2.05 MB, 1024x1536)
>>
>>107765231
Chinese culture.
>>
File: 5825884544.png (3.89 MB, 1296x2232)
>>107765006
This reminded me of the unfathomable number of githubs projects last updated 4+ years ago with "model to be released soon" on the readme I've seen, ogre
>>
>>107764998
>>107765257
is this a reference to deep dark fantasy? lool
https://www.youtube.com/watch?v=Tg82nutmTwI
>>
>>107765257
>unfathomable number of githubs projects last updated 4+ years ago with "model to be released soon" on the readme
Surprisingly good understanding of Chinese culture in /ldg/? Wow
>>
Base model will be heavily censored.
>>
File: get hitler culture'ed.png (1.65 MB, 1280x853)
>>107765280
Black Forest Labs promised a video model all the way up to 2024, I guess that was the G E R M A N C U L T U R E kicking in this time
>>
File: img_00157_.jpg (365 KB, 1112x1264)
>>
https://xinyu-andy.github.io/SelfE-project/
Trust me bro, this one will definitely replace Self Forcing!
>>
>>107765289
That's just cultural appropriation.
>>
File: ComfyUI_15637.png (2.36 MB, 1080x1440)
>>107765033
With Flux (same difference), Cosine with Restarts and Prodigy (I think? that auto one) always worked best for me. Rarely had to go over 3k steps with 45-50 images before it nicely converged.

Could just be Chroma though, I've never had a LoRA outright fail even without captions. It always picked up something.
>>
File: 077164.png (1.39 MB, 832x1216)
>>107765269
nah, just my kink
>>
File: 1747699015989390.png (2.28 MB, 1152x1472)
>>
File: image(41).png (1.92 MB, 1536x1024)
>>
File: 1742655885900459.jpg (680 KB, 1336x2008)
>>
>>107765331
>Hitler if he was accepted to that painting school
:(
>>
>2026
>we're still on Wan 2.2 for video
API chads are laughing at us.
>>
>>107765346
Z-video base will save us
>>
File: 1739076188403750.png (2.45 MB, 1152x1472)
>>
File: 1740365766138343.mp4 (3.29 MB, 720x1072)
>>
File: 1747686107141271.png (2.63 MB, 2048x1024)
Floowandereeze & Robina
>>
File: 1752792066413186.png (3.29 MB, 1336x2008)
>>
>>107765346
>are laughing at us
they're censored and limited. even api professionals use local models, for serious projects
>>
File: img_00164_.jpg (794 KB, 1368x1784)
>>107765419
very nice
>>
>>107765430
>even api professionals use local models, for serious projects
and they end up making bad shit like that coca cola ad lmao
https://www.youtube.com/watch?v=Yy6fByUmPuE
>>
xixxix anon pls upload your 'jak lora
>>
File: 1745050890534006.png (3.35 MB, 2048x1024)
>>
File: Qwen_00378_.png (2.19 MB, 1472x1136)
>>
>>107765456
>vaporwave
>90s
erm
>>
>>107765464
i guarantee there's hundreds of videos like that on youtube, regardless of the accuracy of the premise
>>
File: 1752547615443205.png (2.94 MB, 1336x2008)
>>107765435
ty same to you
>>
maybe autoregressive open weights zai-org/GLM-Image model soon

https://www.reddit.com/r/StableDiffusion/comments/1q42gv8/glmimage_ar_model_support_by_zrzrzrzrzrzrzr_pull/

https://github.com/huggingface/transformers/pull/43100/files
>>
File: 1753011746589362.png (2.63 MB, 1152x1472)
>>
File: IMG_20260105_015431_530.jpg (166 KB, 909x1280)
Haruhi is waiting
>>
>>107765509
yeah we know >>107763806
>>
Is there NAG for forge neo? My zimage gens on neo are coming out ass compared to cumfart.
>>
>>107765509
Watch this get released very soon with no Z image base. And people will still make excuses as to why it hasn't been released yet.

A key misunderstanding of Chinese culture.
>>
>>107765509
>that'll be 72gb of vram
>>
>>107765523
if that glm image is as good or better than z-image turbo while being a base model, I won't need to wait for Z-image base anymore, Alibaba can fuck themselves for all I care
>>
>>107765510
loooooooooool
>>
>>107765509
>>107765528
if it's based on glm 4.6 9b flash we might be eating good
>>
>>107765443
Retraining atm. Trying regularization dataset because it overfits so fast
>>
File: 1764382198216845.jpg (696 KB, 1336x2008)
>>
File: 1759910875194905.png (3.55 MB, 1336x2008)
>>
>>107765509
>>107765543
>if it's based on glm 4.6 9b flash
When you look at the configuration_glm_image.py script you see this
>hidden_size: 4096
>num_hidden_layers: 40
>num_attention_heads: 32
>intermediate_size: 13696
which is the exact same as 9b flash, so you're right anon, it's a 9b model
https://huggingface.co/zai-org/GLM-4.6V-Flash/blob/main/config.json
>>
File: 1740099707630937.png (11 KB, 485x178)
>>107765532
>>107763806
>I always loved the LLM GLM series, they are more soveful than the rest of the LLMs
I think they are the best of all local models for creative writing right now, including having the best soul out of the big models. Although I think all LLMs since big Deepseek/Qwens and especially since old big Mistral releases lost a nice amount of actual soul overall, it's just that new models make up for it with IQ.

I don't think we'll get any big architectural breakthroughs with GLM, but it will probably be a good incremental leap.
>>
>>107765607
flash was released 29 days ago, and it took them less than a month to transform it into an image model? that's fast wtf
>>
>>107765628
q2 sounds like a meme quantization
>>
>>107764998
If he didn't have granny skin he would be pretty cute
>>
>>107765664
1 hour in AI is 3 days in real life
>>
>>107765690
if you die in ai you die in real life
>>
File: 1752036752238097.mp4 (1.36 MB, 1920x1080)
>>107764998
>>107765687
>If he didn't have granny skin he would be pretty cute
the interesting thing with Z-image turbo is that you can stop the generation a few steps short of the expected number of steps with KSampler (Advanced) (let's say we stop at step 18/20), and that can get rid of that overdetailed skin
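
A sketch of what that early stop could look like as a ComfyUI API-format node, built as a Python dict. The node link IDs ("1", "2", "3", "4"), the seed, the sampler/scheduler pair, and cfg 1.0 are placeholders assumed for illustration, not the poster's workflow; only the 18-of-20 stop comes from the post. The helper name `early_stop_ksampler` is hypothetical.

```python
import json


def early_stop_ksampler(total_steps: int = 20, stop_at: int = 18, seed: int = 0) -> dict:
    """Build a KSamplerAdvanced node that ends sampling before the final step."""
    if not 0 < stop_at <= total_steps:
        raise ValueError("stop_at must be between 1 and total_steps")
    return {
        "class_type": "KSamplerAdvanced",
        "inputs": {
            # Placeholder links to other nodes in the graph: [node_id, output_index].
            "model": ["1", 0],
            "positive": ["2", 0],
            "negative": ["3", 0],
            "latent_image": ["4", 0],
            "add_noise": "enable",
            "noise_seed": seed,
            "steps": total_steps,   # the schedule is still planned for all 20 steps
            "cfg": 1.0,             # assumption: distilled models usually run at cfg 1
            "sampler_name": "euler",
            "scheduler": "simple",
            "start_at_step": 0,
            "end_at_step": stop_at,  # sampling stops here instead of at total_steps
            # Assumption: "disable" asks the sampler to finish denoising on the last
            # executed step; try "enable" to compare keeping the leftover noise.
            "return_with_leftover_noise": "disable",
        },
    }


if __name__ == "__main__":
    print(json.dumps(early_stop_ksampler(), indent=2))
```

Both settings of `return_with_leftover_noise` are worth comparing, since the post does not say which one was used.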
>>
File: 00160-4141389876.png (1.04 MB, 1224x768)
>>
>>107765676
Q2 on really big models isn't a big deal
>>107761981
>>
>>107765761
That seems reasonable
>>
>>107765761
Even Q1 of 671b R1 was SOTA for months. When the models are gigantic it's not the same, especially for LLMs.
>>
>>107765710
>20 steps
I also noticed that the more steps I added the worse it got, did you notice anything similar?
>>
>>107765800
for text I feel that 8 steps isn't enough, that's why I went for more, but you're right, 20 might be overkill, this model wasn't trained for that many steps
>>
File: 1753463908075051.png (220 KB, 640x416)
>>107765509
oh god please make z-image base obsolete before it ever gets released, that would be so funny
>>
File: kek.png (110 KB, 320x180)
>>107765509
>Chongdanov, we finally got a worthy rival to our Z-image base model
>drop it
>>
>still no base or edit
is qwen the best edit cope for the time being?
>>
>>107765828
Z Image Turbo won because it's small enough AND good enough. Unless GLM is as small, size/quality won't balance.
>>
>>107765872
>Z Image Turbo won
you cant even combine two loras
>>
>>107765872
>Unless GLM is as small
it's based on glm v4.6 9b flash, so it's the same size as Chroma >>107765607
>>
>>107765878
to be fair, the fact it's working with loras at all is a miracle in itself, it's a double (guidance + steps) distilled model after all
>>
>>107765878
play with the layers of each
>>
File: 1756458638894258.png (2.32 MB, 2328x1295)
thoughts on the dataset i'm building?
>>
>>107765927
>thoughts
Looks like a god's chosen dataset to me!
>>
>>107765346
they're too busy spending hours trying to jailbreak the model to show a nipple for half a second
>>
File: 1741363865527079.png (754 KB, 1248x832)
>>107765927
delete this
>>
>z-image vs flux 2
Flux 2 is truly a western model
>>
>>107765927
>most are from the shoulders up, portraits
lol
>>
File: 1752795840261362.png (48 KB, 225x225)
>>107765966
>I love shek
me too
>>
>>107765969
>western
I'd say european, burgers have no issue with guns
https://www.youtube.com/watch?v=Y-3IV11_ZgA
>>
File: 000.jpg (3.22 MB, 1456x1920)
>>107765977
I tested again and I think I did something wrong, the girl is still ugly though.
The dirt on the mirror looks good, so maybe it's not that bad of a model.
>>
>>107766050
I better hope a 32b model is "not that bad" kek
>>
>>107766050
She wears that and has a dirty mirror, of course the model will default to ugly.
>>
File: jew.jpg (1.22 MB, 5315x3543)
>>107765927
add this one too
>>
>>107766101
thank you, i do need more. i just googled ugly jew and took the best ones
>>
>>107765927
waste of time, need hot girls not this crap
>>
>>107765969
flux 2 made a safer less feminine pose lmao
>>
>>107766050
show booty
>>
>>107766158
she has an uglier face too
>>
File: 11111.jpg (3.75 MB, 1456x1920)
>>
>>
File: z-turbo_00013_.png (3.88 MB, 1536x1536)
>>
>>107766379
>slight pantyshot
very important details, thanks anon
>>
>1girl
>>
1girl Is All You Need
>>
>>
1girl is great, it links us to our ancestors doing the same thing since forever
>>
>>107763403
This is literally me
>>
Inside the car there is 1girl.
>>
>>107766419
https://github.com/comfyanonymous/ComfyUI/pull/11632
We'll get LTX2 before z-image base looooool
>>
>>107766434
im sorry but that doesnt count as 1girl
>>
File: file.png (1.01 MB, 948x1171)
>>107766434
quite the beauty
>>
>>107766439
>its audio + video
HUGE, I mean the audio sucks but MAN WE WONNED!!!!!!!!
>>
File: z-turbo_00022_.png (3.71 MB, 1536x1536)
>>
ready when fresh
>>107766478
>>107766478
>>107766478
>>107766478
>>
File: car going fast.jpg (3.29 MB, 1920x1456)
>>107766469
rude

>>107766465
the car identifies as a girl
>>
>using wan loras for i2v
>works really well for a number of images
>suddenly just doesn't work at all and just makes the people awkwardly shift around for every gen
Why does this black magic get so temperamental for no reason
>>
File: img_00200_.jpg (289 KB, 1216x1376)
>>
>>107766472
>>107766608
put those on the next bread dude, they are good lol
>>
>>107764599
how long did it take you? I'm gonna try this too once I finish the setup
>>
>>107766651
3 min per 5 second split, so 3x12 = 36 minutes
>>
>>107766608
based
>>
>>107766608
Link?
>>
>>107762882
that from zimage or qwen edit?

also holy fucking shit these captchas are garbage
>>
>>107764017
how do you do this? is this that QR code generator thing?
>>
>>107766782
literally 3 year old stable diffusion could do this
>>
Are there any uses for SD1.5?
>>
>>107767841
for the sovl
>>
>>107766480
thanks for the fresh bread
>>
What would a genuine AI enthusiast buy now for 15000$ ? rtx 6000? A100? multi 5090 setup?
>>
>>107769647
I'd get three 5090s, a threadripper or whatever and as much ram as possible



