Brings You Back Edition
Discussion of Free and Open Source Text-to-Image/Video Models and UI
Prev: >>106525822
https://rentry.org/ldg-lazy-getting-started-guide

>UI
ComfyUI: https://github.com/comfyanonymous/ComfyUI
SwarmUI: https://github.com/mcmonkeyprojects/SwarmUI
re/Forge/Classic/Neo: https://rentry.org/ldg-lazy-getting-started-guide#reforgeclassicneo
SD.Next: https://github.com/vladmandic/sdnext
Wan2GP: https://github.com/deepbeepmeep/Wan2GP

>Checkpoints, LoRAs, Upscalers, & Workflows
https://civitai.com
https://civitaiarchive.com/
https://openmodeldb.info
https://openart.ai/workflows

>Tuning
https://github.com/spacepxl/demystifying-sd-finetuning
https://github.com/Nerogar/OneTrainer
https://github.com/kohya-ss/sd-scripts/tree/sd3
https://github.com/derrian-distro/LoRA_Easy_Training_Scripts
https://github.com/tdrussell/diffusion-pipe

>WanX
https://rentry.org/wan22ldgguide
https://github.com/Wan-Video
https://alidocs.dingtalk.com/i/nodes/EpGBa2Lm8aZxe5myC99MelA2WgN7R35y

>Chroma
https://huggingface.co/lodestones/Chroma1-Base
Training: https://rentry.org/mvu52t46

>Neta Lumina
https://huggingface.co/neta-art/Neta-Lumina
https://civitai.com/models/1790792?modelVersionId=2122326
https://neta-lumina-style.tz03.xyz/

>Illustrious
1girl and Beyond: https://rentry.org/comfyui_guide_1girl
Tag Explorer: https://tagexplorer.github.io/

>Misc
Local Model Meta: https://rentry.org/localmodelsmeta
Share Metadata: https://catbox.moe | https://litterbox.catbox.moe/
GPU Benchmarks: https://chimolog.co/bto-gpu-stable-diffusion-specs/
Img2Prompt: https://huggingface.co/spaces/fancyfeast/joy-caption-beta-one
Txt2Img Plugin: https://github.com/Acly/krita-ai-diffusion
Archive: https://rentry.org/sdg-link
Bakery: https://rentry.org/ldgcollage

>Neighbours
>>>/aco/csdg
>>>/b/degen
>>>/b/realistic+parody
>>>/gif/vdg
>>>/d/ddg
>>>/e/edg
>>>/h/hdg
>>>/trash/slop
>>>/vt/vtai
>>>/u/udg

>Local Text
>>>/g/lmg

>Maintain Thread Quality
https://rentry.org/debo
Is /a/ right? Has image gen literally gotten worse?
>>>/a/282129552
>It's gotten worse since 2020 lol, that's why everyone just uses finetunes of SD1.5 before the model had the chance to eat so much of its own shit
Blessed thread of frenship
>reposting b8
>>106529614Groovy
Will installing Wan2GP and its dependencies interfere with SD Forge?
>>106529582
>everyone just uses finetunes of SD1.5
Your hourly vramlet retard FUDing X model out of sour grapes because he can't run anything else.
>>106529560So what is the easiest install for AMD on Linux right now?
>>106529549it's bait; the technology effectively didn't even exist in 2020
>>106529719He had an uncle working for NVidia
Do I have to get a 128GB (64GB x 2) kit? Anyone here with a 32GB x 4 RAM setup?
>>106529754Well 64x2 ddr5 is more likely to run stable above JEDEC, so it should be more preferable, bar pricing maybe.
>>106529754
>>106529790
Btw what are you getting this much memory for? I would go 64 or 96 if I could, but what would 128 GB of system memory do for AI?
>>106529800Maybe multitasking? Local LLM? 64gb is more than enough for most anything in this thread though.
>>106529667
Your SD Forge dependencies should be kept separate in their own uv/pip or conda venv anyhow. Installing everything user-wide is a pain with this Python stuff.
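For anyone unsure what that looks like in practice, a minimal sketch using Python's stdlib venv module; the env name and requirements path are placeholders, and uv or conda work just as well:

```python
# Minimal sketch: give Wan2GP its own environment so it never touches
# Forge's packages. "wan2gp-env" and "requirements.txt" are placeholders.
import subprocess
import sys
import venv

venv.create("wan2gp-env", with_pip=True)  # create the isolated environment
pip = r"wan2gp-env\Scripts\pip.exe" if sys.platform == "win32" else "wan2gp-env/bin/pip"
subprocess.run([pip, "install", "-r", "requirements.txt"], check=True)
```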
>>106528775How do you make time lapses like this? Just ask wan for it?
>>106529837
>>106529800
128gb is already beneficial for video gen, vramlet retard
>>106529837
Running a large LLM on system memory would be slow as fucking balls though.
>>106529915
Prove it. I am on 32 gigs and the most I have seen is 30 gigs of swap during video gen. So that gives circa 60 gigs of use. Some overhead for multitasking and you can justify 96. But show how you get to 128.
Local is saved yet again by the strong-blooded Chinese
https://x.com/bdsqlsz/status/1965293660058386484
>>106529584GOOOD MORNIN! :3
>>106529941
>Chinese image model
It's not a question of whether it's slopped or not. It's a question of how slopped it will be.
>>106529915
>vram
The benchmarks showed really minimal benefits past 64gb for wan. 96 is probably the highest I'd go, just to never think about ram use, if anon is concerned.
>>106529955kekd
any way to mitigate cartoony characters talking in wan 2.2? even with flf2v they start yapping.
>>106529978Use the negative prompt. It's not 100% but it helps.
>>106529941>another image modelVery good! Another delay for wan nunchaku
>>106529978you can reduce it via prompt but i haven't found any setup that stops it very reliably
>>106529941
I wonder how it does with more complex prompts? Not a fan of it coming up with too many of its own details if I don't want them.
>base+refine model
Ew...
>>106529941
>17B
Might check it out when someone makes GGUF quants and it is agreed upon that it isn't ass. I think this is the first major model to use Glyph-SDXL-v2? For text clarity apparently??? Interested to see if it isn't ass.
>>106530013I will withhold judgment but even the demo images look gigaslopped.
HOW 2 DEMOE
It looks like the qwen edit remove clothes lora got nuked from everywhere. Fucking hate moral normie fags.
https://huggingface.co/starsfriday/Qwen-Image-Edit-Remove-Clothes
https://huggingface.co/drbaph/Qwen-Image-Edit-Remove-Clothing-LoRA
https://civitai.com/models/1916583/qwen-image-edit-remove-clothing
Do wan 720p loras work at all on 480p?
>>106529754you should check the cpu/memory support page from your motherboard vendor, also while you're there you may want to update your bios as it sometimes improves memory compatibility.
>>106530043
This weird ass Chinese website has it apparently but you need to sign in.
https://www.liblib.art/modelinfo/99d2d7a0bf0e41bd9275bdbc9a84995d?from=feed&versionUuid=5a5b4e055ed4485db884d26a440eb018&rankExpId=RVIyX0wyI0VHMTEjRTM3X0wzI0VHMjUjRTM4
>>106529941It's distilled? eww if so...
>>106530143There is a distilled and non-distilled version in the repo.
>>106530148That's good then.
>>106529941I'm downloading it now and will test shortly
It's interesting to me that no one seems to have figured this out: you get way higher quality outputs with two loras at 0.65 instead of one lora at 1.0. For example, you can get extremely close "likeness" if a character has 4 loras on civitai and you use them all, putting them all at 0.35 or something (you have to include the trigger words too, of course). Like why hasn't anybody written a scientific paper about this and then used that as a basis to improve lora training tech?
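For reference, stacking adapters at reduced weight like that is easy to try in diffusers; a minimal sketch, assuming a FLUX-style base and two local LoRA files (the folder, file names, adapter names, and trigger words below are placeholders, not anyone's actual setup):

```python
import torch
from diffusers import DiffusionPipeline

# Placeholder base model; swap in whatever checkpoint you actually use.
pipe = DiffusionPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16
).to("cuda")

# Two LoRAs of the same character, loaded from a local "loras" folder.
pipe.load_lora_weights("loras", weight_name="character_a.safetensors", adapter_name="char_a")
pipe.load_lora_weights("loras", weight_name="character_b.safetensors", adapter_name="char_b")

# Both adapters at reduced strength instead of one at 1.0.
pipe.set_adapters(["char_a", "char_b"], adapter_weights=[0.65, 0.65])

image = pipe("trigger_a, trigger_b, 1girl, portrait", num_inference_steps=28).images[0]
image.save("stacked_loras.png")
```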
>>106530012>Jenny NicholsAnon, I...
>>106529941
https://x.com/bdsqlsz/status/1965293660058386484
>Tencent
redemption arc?
>>106530328So it's not an LLM with image out but rather your average image slop?
>>106529941>base+refine model>not edit modelI sleep
>>106530148
>There is a distilled and non-distilled version in the repo.
That's surprising, Tencent always released only the distilled version. I guess they have no choice but to try harder since Alibaba is spoiling us with Wan and Qwen Image.
>>106530328
I love how they went for a slightly older woman to show that their model can produce decent skin. I appreciate the effort, we'll see about that!
>>106529941>>106530161Well, I've downloaded the models from HF (163GB) but the link to the github project with part of the inference code ( https://github.com/Tencent-Hunyuan/HunyuanImage-2.1 ) is dead. Trying to figure out alternatives.
>>106530383>163GBwait what?
>>106530328Are they for real with their prompt enhancer?
>>106530395
>yes we use Google Gemini to caption our slop
kek, the mask is so off now, those Chinks don't give a fuck and aren't pretending anymore lmao
>>106530394
Looks like there's a bunch of stuff in the repo besides the actual models. Both actual models are 34gb each, and the vae is 1.5gb. I'm not sure why the distilled one is the same size.
I've dove head first into learning comfyui with wan genning, and all the workflows I've gone through have had bad results. Then I found a video going through the default workflow provided by comfyui/alibaba, whatever, and it blew all of them out of the water. It's like the other ones weren't working properly. Probably user error, but still.
Is the Shift parameter basically how much it shifts the image with i2v? A high value lets it go crazy and do whatever it wants, while a low value maintains a majority of the initial image? Meaning a start and end frame workflow with high shift would yield some whacky but stable results?
And the Lightning lora: it changed my speeds from like a 20min gen to a minute, how does that work?
Compare to the first ones I tried >>106511054
It reads the prompt properly and doesn't fuck up the quality and style. How can the workflows be this different? I feel like a boomer being this baffled. It even retains the fucking grain I added.
>>106530395
desu, google gemini is an excellent model to caption images, the best right now, it even knows anime characters like yui from k-on
>>106530394>>106530416There's a 15gb LLM in the repo as well
>>106530421>a 15gb LLMprobably just the text encoder
>>106529941
>"two images model comparable to Nano Banana"
>it's just a regular image model and not an edit one like Nano Banana
YOU LIED TO ME JACKIE CHAN
comfy should be dragged out on the street and shot
>>106530401
>>106530416
That's their user prompt enhancement, though. But they probably do use Gemini for captioning as well; who can blame them.
Normal prompt:
>A cute labubu wearing a spacesuit is floating and roaming in outer space. Oil painting style, heavy brushstrokes, strong texture, and obvious paint stacking.
Enhanced prompt:
>Labubu, a monster character with long, rabbit-like ears and a mischievous smile full of jagged teeth, is wearing a white spacesuit, floating and roaming in the vastness of outer space. Around it is a deep space background, made of large areas of mixed blue, green, and yellow paint, forming irregular and dynamic blocks of color. The paint stacking is obvious, creating a raised texture. The background is also dotted with some stars composed of bright yellow and white oil paint pointillist brushstrokes. Expressionist oil painting style, with heavy brushstrokes, obvious paint stacking, palette knife textures, and a strong sense of canvas texture.
I mean, piping prompts through LLMs really isn't anything new. All they did was write a few prompts and think of some CoT. What they do show, though, is that their model has absolutely no idea what a Labubu is.
>>106530446
>A cute Tom cat wearing a spacesuit is floating and roaming in outer space. Tom's body is mainly composed of large blocks of white and gray oil paint, showing a rounded and lively contour. The background is a mixed tone of dark blue and black, exhibiting an impasto technique, and is dotted with multiple celestial bodies made from white and yellow paint in a pointillist style. Oil painting style, heavy brushstrokes, strong texture, and obvious paint stacking.
vs
>Tom Cat from "Tom and Jerry," wearing a spacesuit, is floating and roaming in outer space. Tom's body is mainly composed of large blocks of white and gray oil paint, showing a rounded and lively contour, with a cute facial expression. He is wearing a multi-layered spacesuit; the suit is made of stacked off-white and light gray paint, presenting a strong texture. On his head, he wears an opaque glass helmet with yellow highlights. The background is a mix of dark blue and black, also using an impasto technique, and is dotted with multiple celestial bodies made from white and yellow paint in a pointillist style; these celestial bodies appear as round color dots of varying sizes. Oil painting style, heavy brushstrokes, strong texture, and obvious paint stacking.
And neither does it know Tom from fucking Tom and Jerry. Well, curious to try that shit out once they finally release the code.
>>106530446>>106530450how did you get to use the model? is there a demo page somewhere?
>>106530477
That's only from the demo of the prompt enhancer they released alongside the model. Apparently, besides the Gemini API, they're releasing a 7B parameter model for this shit and teased a video prompt enhancement model as well.
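If you want that kind of prompt rewriting without waiting for their enhancer, a minimal sketch against any local OpenAI-compatible server (llama.cpp, vLLM, etc.); the endpoint, model name, and system prompt here are placeholders, not Tencent's actual setup:

```python
# Hedged sketch: expand a short image prompt into a detailed one via a
# local OpenAI-compatible chat endpoint. URL and model name are placeholders.
import requests

def enhance_prompt(prompt: str) -> str:
    system = ("Rewrite the user's image prompt into a single detailed description. "
              "Keep the subject and stated style; add composition, lighting, and texture.")
    resp = requests.post(
        "http://localhost:8080/v1/chat/completions",
        json={
            "model": "local-llm",
            "messages": [
                {"role": "system", "content": system},
                {"role": "user", "content": prompt},
            ],
            "temperature": 0.7,
        },
        timeout=120,
    )
    return resp.json()["choices"][0]["message"]["content"]

print(enhance_prompt("A cute labubu wearing a spacesuit, oil painting style"))
```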
>>106530482It all looks like slop, maybe even beyond that of qwen so I don't really give a shit until I see something truly interesting.
>>106530492>It all looks like slopand it's not an edit model, booooooring
>>106529754
Get whatever has good speed between 2x48 and 2x64. 4x32 is a bad idea in ddr5.
>>106529998lol
>>106530496
https://xcancel.com/bdsqlsz/status/1965302946280923479#m
I think an edit model will be released just after this one and it'll be bigger than 17b, c'mon man...
How come these retards will release gigaslop model after gigaslop model, but refuse to release 3D 2.5?
>>106530492
The base resolution is 2048, so if nothing else it may be an excellent hires second pass. Still waiting for them to unhide the github so I can test it...
Huh. Right now it uses the Qwen MLLM and they're releasing their own. Might be cool to play around with using theirs on Qwen Image and Qwen Image Edit.
>>106530554>they're releasing their own.Hunyuan video text encoder all over again.
>>106530554
>at this stage, we have not yet released the latest HunyuanMLLM
Bro, they already said this for HunyuanVideo last year, we're never getting this shit, are we? top kek
>>106530252
LoRA merging is in the ancient tomes, Anon. Doing it the way you described introduces more chances for errors caused by the LoRAs and can limit flexibility. It's best to do two training runs separately with identical settings, merge them together into one, and then use that new/combined LoRA at a lower setting. You can also get a bit more flexibility by splitting your dataset in two as well... it's just insanely time consuming with all the testing necessary. Kinda not worth it.
Here's my highest quality 512px LoRA (0.75) and the most recent 1024px EQ VAE trained LoRA (1.28 because it's not natively trained on Krea - I tested all the way up to 1.50) used together. Looks a lot more like Jenny, but there are also a lot more little issues that crop up on each pull. Note: Krea is also doing a lot of heavy lifting chilling these out, otherwise I'd have to drop them both a lot lower.
>>106530446
>>106530450
Since it's designed to be used in their code, I wonder how much of those words from the LLM pass the model actually understands? Or is CLIP still lurking somewhere in the shadows with its ol' timey gibberish?
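For anyone who wants to try the merge route, a naive sketch that just averages two LoRA files key-by-key; the file names are placeholders, and kohya's sd-scripts ships proper merge tooling if you want per-LoRA ratios instead of a plain average:

```python
# Naive LoRA merge sketch: average matching tensors from two runs trained
# with identical settings. Paths are placeholders.
import torch
from safetensors.torch import load_file, save_file

a = load_file("lora_run_a.safetensors")
b = load_file("lora_run_b.safetensors")

merged = {}
for key, tensor in a.items():
    if key in b and b[key].shape == tensor.shape:
        merged[key] = ((tensor.float() + b[key].float()) / 2).to(torch.float16)
    else:
        merged[key] = tensor  # keep keys the other run doesn't have

save_file(merged, "lora_merged.safetensors")
```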
https://files.catbox.moe/pqbzg5.flac
>>106530043holy shit man, I hate these people
>>106530595I don't really get these guys, with inpainting you can basically make deepfakes in like a couple seconds, this just eliminates the masking part lol. That being said Wan does better nudes than editing with Qwen, just saying.
https://voca.ro/1nPQWpvXnbdg
Have you ever felt so completely and utterly creatively drained, but also compelled to gen literally anything because your GPU has been idle for too long?
How do these people stomach the OpenAI look? This noisy, dark, pissfiltered piece of shit. What's up with their colors, anyway? Did they butcher their VAE?
>>106530622
It makes no sense, it's pure pettiness. I was waiting for nsfw-friendly loras for qwen edit, even just stuff like understanding various underwear and sexy clothes types, but this "it can be used without consent" reasoning is just super retarded and broad. I can draw a doodle of a random person without their consent, big deal.
>>106530633They are nostalgic of the PS3 era.
>>106530633
Am I the only one who remembers it being really good at one point?
>>106530623LOL
>>106530636people who can't create destroy
>>106530633>What's up with their colorsprobably a watermarking of some sort
>>106530636The consent stuff is just the latest in a never ending moral panic around ai.It's the strangest thing to see unfold.
>>106530633gpt5 has its own image model now, and it doesn't have the piss filter of gpt4o anymore (it's still ass, it changes the image too much)
>>106530666It's kind of wild. It's like people forgot convincing alternatives to these things have existed for a very long time. Some people have gone absolutely ballistic over AI.
>>106530675Most of them are worried about the ease of use and accessibility when you press them on it.
>>106530595
>poses serious risks of harm
It is this smarmy redditor's moralizing language that gets me. If they just said "this is illegal and we don't want to host it" I would respect it. They have to do this BS performative preaching though. And also:
>prohibits models intended for sexual exploitation, especially when it involves non-consensual use
Pray tell, how can you have "sexual exploitation" with consent? This is either redundant tautology or buzzword salad, pure faggotry either way.
>>106530595these guys think they're farming some kind of social credit
>>106530675
>Some people have gone absolutely ballistic over AI.
Luddites have always existed. When photography was invented, senators wanted it gone because we could photoshop this shit and spread misinformation, and the realistic painting fags were afraid they were losing their jobs.
>a new model each week
>each with its controlnets, loras, nodes, and settings
>no time to grow an ecosystem
>>106530675
I like how people seem to suddenly think that reusing a voice or face they freely shared online is somehow "stealing" from them. It's the exact same idea as Indians in the early 19th century thinking photos were stealing their souls. While I agree using that to attack or scam someone should be illegal, it's the act of scamming or attacking that is illegal, not the imitation.
>>106530693
>While I agree doing that to attack someone else or scam them should be illegal, it's the act of scamming or attacking that is illegal, not the imitation.
They want to have their cake and eat it too: they want to post their shit on the internet and make money out of it, but they don't want us to make memes out of their work. Transformative work has always been fair use, they're just coping at this point.
>>106530328
If it's less slopped than qwen image I'll take it, but I'm wary of the licence, it's not MIT like qwen.
>>106530717>PersonaSlopof course
>>106530714
Is it as bad as flux dev? Then it's probably dead to future bakers.
>>106530726it can't be worse than flux dev, at least it's not distilled
>>106530714It literally has the exact same one anime style as Qwen. It may as well be bloated Qwen.
>>106530730>It literally has the exact same one anime style as Qwen.it's funny because it's true, and then the chinks wonder why we say they all look the same
>>106530714
Why do you care about the license? Are you planning to make money off of their work like some kind of little parasite? Anyway, it's irrelevant, these licenses are unenforceable.
>>106530736>>>/g/adt
Still no github project. Are we being trolled like 3D 2.5?
>>106530736
I am just glad these models can at least do anime. Fucking BFL had some hate boner against it for some reason.
>>106530741
>Why do you care about the license?
You don't want someone to make a serious finetune of it like lodestone did with flux schnell, you fucking low IQ retard?
>>106530741
>it's irrelevant, these licenses are unenforceable.
They can enforce it, that's why there are no NSFW loras for Kontext on civitai, they enforced their licence there.
>>106530759That's civit covering its own ass.
>>106530717>[cartoon/video game character] says [generic shitlib NPC opinion #493852], presented with no humor, setup, or punchline
It's out.
>>106530769Holy crap lois. I don't care.
>>106530769who cares? it's just qwen image all over again, I'll wait for the edit one >>106530507
>>106530755
>>106530759
At no point have any measures been legally enforced. That's all "voluntary respect" for the license. In reality the models can't be owned, and no one will ever take legal action because it would fail, vaporizing the pretense.
>>106530781i hope it's 32b parameters for a 12% improvement!!
>>106530791
Didn't StabilityAI enforce their new licence (which disallows NSFW on SD3) on civitai recently?
I get this when trying to load the Q8 gguf of Chroma on SwarmUI:
>No backends match the settings of the request given! Backends refused for the following reason(s):
>- Request requires model 'chroma-unlocked-v11-Q8_M.gguf' but the backend does not have that model
I have the gguf extension installed and nothing on the SwarmUI github page helps either, any idea what to do here?
>>106530794and it'll zoom in the image 12% more!!
>>106530769As long as I don't have nano banana at home I sleep
>>106530819WAKE UP there's a nano banana in uranus
>>106530769Here we gooo
>>106530769
https://xcancel.com/TencentHunyuan
You know this model is mid when they didn't announce it on twitter lol
>>106530804they politely asked and civitai said "ok"
>>106530834
What would've happened if civitai said no?
>>106530834i wonder why
>>106530846
Nothing, civitai are just a bunch of weak faggots that don't know how to run their platform. It's like a new shitstorm every week over there in terms of their TOS. They had threatened to ban all NSFW for a while and decided to slowly increment things that way; for most of the year they were blaming payment processors for why.
>>106530866
>for most of the year they were blaming payment processors on why.
And they're right, VISA is bullying everyone recently, Steam got some heat from them as well.
>>106530808
I can't think of too many reasons to use an ancient version of chroma. Go with v48. Regardless, it works fine on comfy. The error message makes me think it is not actually seeing the model. Reload / restart.
>Minimum: 59 GB GPU memory for 2048x2048 image generation (batch size = 1).
>>106530808Is it in your unet folder? Weird its mentioning such an old epoch too.
>>106530823I didn't feel it because of the nano size though
>>106530750At this point it should be obvious why, it goes well with their anti nsfw crusade.
>>106530890>he didn't boughtedted RTX PRO 6000Vramlet cuck
>>106530880
You miss my point. Yes, that was an issue for everyone at the time, but they weren't directly threatened by it because there were alternative payment processor options, which they refused to exercise until after they scared off a sizeable chunk of their userbase. NOW they have those payment processors, proudly gloating about them on their front fucking page like it's some new innovation and not something they could've had from the start, kek.
>>106530890it's using less memory when you're on sageattention though? and we won't be using bf16 but Q8, and they probably included the text encoder on the equation I guess
>>106530989
>there were alternative payment processor options
The alternatives wanted NSFW gone from civitai; they didn't do it because they know NSFW is like 90% of their revenue lol
>>106531001
Is that you, gaylord that runs civitai? Everyone already knows the not-so-secret conversations about how much you wanted nsfw gone to begin with, you colossal anus demons. Nobody believes crypto processors were at your door like VISA.
>>106530990NTA but I'm currently implementing some better offloading to their code, the model itself is requesting 39.01gb VRAM, so 59gb will be the calculation for everything including qwen 7b.
>>106531008>everyone already knows the not so secret conversations about how much you wanted nsfw gone to begin with.if civitai hates nsfw to begin with they wouldn't have allowed it in the first place, what are you talking aboug?
>>106531017i've just lost my entire breakfast onto the floor in front of you, this retarded debate is over.
>>106531022Did it come out of the front or the back?
>>106531022>i've just lost my entire breakfast onto the floornice
>>106530507
>super large size
Place your bets, gentlemen: how big will it be? If it's a 30b one it'll be DOA like that Step-Video model lol
>>106530989
>NOW they have those payment processors, proudly gloating them on their front fucking page
Because switching from fucking visa to crypto-only alternatives would have massively cut into their profits, what the fuck are you on about, retard. What do you think the average person uses?
cheers
120B LLM that is able to generate img tokens simple as that nigger sauce
https://xcancel.com/ArxivToday/status/1931031321435857218#m
>train 9x faster
lodestone, if you're reading this, THIS IS FOR YOU
https://youtu.be/dXHYp_T4yTU?t=46
>>106531058lodestone here, thanks! gonna figure out a way to frankenstein this into my current training run! furryderpemoji
>>106531058
Probably snake oil, as is everything in AI.
>>106531090Idk man, the loss curve decreases a lot, as if we're training a bigger model, this shit looks interesting
>>106531120
>you actually need 250 steps on regular flow models to get the full quality image
Oof, that's tough...
>>106530769>refiner VAE is 6GBhuh
>>106530769>>106531183>not using the pixel space (PixNerd)NGMI
>>106531183There's a 30gb refiner model as well
>>106531058
Lodestone has been poached by a Chinese firm and is producing SaaS models for them, starting with Seedream 4.
>>106530633
Definitely a subtle watermarking tactic. The fact that their model is amongst the more performant models, and yet we can all instantly pick up on when an image was made using OpenAI's services, is clear proof that they've baked some biases into their image gen itself.
>>106530595
I like to think that such safety nerds are really just 4cunts having a larp. I've done it before too, it's quite fun.
>>106530717reminds me of the goku age of consent shit, it's as retarded
soo whats nunchaku status?QIE?WAN?LORAS?CHROMOSOME?HELLO!?!??!
>>106531207>Lodestone has been poached by a chinese firm and is producing SaaS models for them. Starting with Seedream 4.trust the plan, he's infiltrating the chink company and will leak the model
>got mail on pixiv
>some nigga asks for AI request
>check his profile
>broken english venezuelan hyperfixated on some unknown calarts style cartoon
>>106529941If you told me this was a Qwen image render I would've believed you, it has the same exact anime style wtf.
>>106529948Just in case you're the avatarfagging tripnigger:Kill yourself
>>106531124did you ever do that? I did, zero difference above 100 steps.
>>106531273based
>>106531271so another benchmaxed slopmaxed censored model?
>>106531226they're very hard at work
>>106531280I'm reading the paper and it seems like they're not using CFG for comparisons, the fact it can render coherent images at cfg 1 is really interesting
>>106531247
>hyperfixated on some unknown calarts style cartoon
I didn't believe it, but 10 years of this stuff seems to have made a whole generation extremely into it.
>>106531289
>>106531226
lmao, but seriously though, I don't get the hype for nunchaku. It has Q4 image quality, so why not simply use the lightx loras and go for 8 steps instead?
I've ...almost gotten hunyuan to generate an image. Keep running into issues, but maybe getting there
>>106530328looks giga slopped
>>106531296
>it has a Q4 quality image
What? They have Q8+ quality while having the size/requirements of Q4.
>>106531305
>they have Q8+ quality
https://www.youtube.com/watch?v=oHC1230OpOg
>>106530328
>>106531303
>Flux chin
It's over...
>>106531283>so another benchmaxed slopmaxed censored model?what did you expect, it's a chink model after all
>>106531247I used to do deepfakes like idk 5 years ago whenever. I gave up after literally hundreds of DM requests from Indians begging for some random tv star on their local village cable tv riding a cow or whatever. The third world is a real thing, physically and intellectually.
I got as far as the script loading all of the models into memory, and attempting to begin generation! Then it tries to compile something with torch compile first, and this fails for some obscure reason to be discovered soon. Many surprises!
>>106531358you can make so much money out of those retards though, lodestones knows a bit on how to charge extra money to the patreon furryfags lol
>>106529956
128 wouldn't be enough if video gen started to be like LLMs, going into the ridiculous range, and we got a truly large MoE or something that required an actual server to run even with all the tricks in the book. But yes, it would be enough for the current moment.
>>106531380>74gbholy shit...
>>106531414his lower teeth are scary...
>>106531380You can call anyone a vramlet.
lmao when did civitai get based
This any good?https://bananaai.live/
>>106531463
1) not a local model
2) if you want to use nano banana, go for google ai studio instead
https://aistudio.google.com/prompts/new_chat?model=gemini-2.5-flash-image-preview
>>106529582
What objective criteria do you use to determine "better" vs "worse"? Non-fucked-up hands is probably a decent metric, but part that a lot of it is just stylistic fashion. A lot of the older gen pics that looked "great" back then are objectionable now because the style looks so generic and overplayed. People are ever searching for new highs of uniqueness, and the ephemerality of AI will just make it all the more pathological. Nothing really means anything, and it explicitly exploits that existential vapor part of the psyche to a merciless degree; some percentage of people are going to crash out bad if they don't realize this early on.
>>106531380>94.5 gb of vramVRAMgod sama!
>>106531487Subjective creatures never have 100% objectivity thus any human opinion on a model should be discarded.
>>106531487
>part
How did that word get in there? That was supposed to be "beyond".
Qwen is ridiculously good with text. Chroma absolutely btfo on the text front. Hands too. Chroma still has a place but damn
>>106531600to be fair, I kinda expect a 20b model to be better than a 9b model
>>106529941
>Omg our model can do 2k resolutions
So does Qwen image lol
>>106531600
I really need to get some kind of good text workflow going for my MAGA hats.
>>106531600fso young
>>106531487
>non-fucked up hands
That hasn't been a good test since before flux. Like with eyes and teeth, popular models have all been trained to detect and replace or hide bad hands with perfect hands without doing much to improve foundational understanding or prompt following. A better test is not to measure how often something does or doesn't randomly produce slopped hands, but whether the model can do what you ask. Try prompting for hands in different positions, holding up certain fingers, with more or fewer digits. Most models can't or won't without a LoRA, but a few can.
>>106531380
Nice man, keep us updated. I'm hitting a lot of snags implementing blockwise offloading to get this shit to run on 16gb VRAM. I imagine by the time I'm done quants will be out and I'll have wasted a lot of time again.
>>106531289I'm also very hard at work
It's working! Default prompt output / distilled. Sort of working, anyway. I had to disable the 5gb "refiner vae" because there seems to be something broken with the loader. The vae file was given the .pt extension but the loader assumed ckpt. Renamed it, but then it was unhappy because it didn't match what it was expecting (state dict). No time to figure out the problem now.
>>106531677
Not local, but out of all my meager attempts at genning I've still never been happier than I was with Bing DALL-E 3. Exactly zero of the images are perfect, they are all flawed in one or more obvious ways, and probably simplistic to others' tastes (not much to be done when you're drastically limited in prompt length, what was it, 185 characters?), but there's just so much charm and richness and warmth to them, which are entirely meaningless subjective qualities that are impossible to justify except that it made me happy. Everything I've done with locals (which isn't as much as most here, admittedly) has been chasing down loras to try and replicate just one style that never quite has the same qualities Bing can just randomly spew out on a whim. Of course the flipside of this is the infinite ephemerality that no two iterations of a character will ever be the same. Maybe I could train my own loras on them, but that still just feels like chasing a dragon. Will we ever, ever have a "one dataset to rule them all" like this in the world of locals?
>>106531724that's so slopped, when will we be free of this shit? :(
>>106531677
Well, I'm getting there. Kinda.
>>106531724
Nice.
>>106531724the meme lisa looks very sloppy. can you do comparison with non-distilled?
>>106531738
Yeah, I'll do that next. Meanwhile, here's the first 1girl from the undistilled model.
>train chroma lora
>4400 steps
>barely learnt style, any random word can pull it back into the overslopped cartoon style
Wow, so trainable, despite having used gemini captions and making sure they make sense. Chroma is simply overtrained.
https://wccftech.com/nvidia-geforce-rtx-5090-128-gb-memory-gpu-for-ai-price-13200-usd/
Local is saved.
>>106531737
Alright, I got... some output using blockwise offloading and 1024x1024 on 16gb VRAM. Progress, at least.
>>106531724
Their test code mentions that the refiner is not ready yet.
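For anyone curious what "blockwise offloading" boils down to, a naive PyTorch sketch of the idea (not the anon's actual patch): keep the transformer blocks on CPU and stream each one through the GPU only for its own forward pass. diffusers' enable_sequential_cpu_offload() does roughly this for supported pipelines.

```python
import torch

def forward_with_block_offload(blocks, hidden_states, device="cuda", **block_kwargs):
    # Assumes hidden_states already sits on the GPU and each block is a
    # self-contained nn.Module that returns the updated hidden states.
    for block in blocks:
        block.to(device)                 # bring only this block into VRAM
        with torch.no_grad():
            hidden_states = block(hidden_states, **block_kwargs)
        block.to("cpu")                  # free VRAM for the next block
        torch.cuda.empty_cache()         # trades speed for headroom
    return hidden_states
```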
>>106531607
>expect a 20b model to be better than a 9b model
Same. It's a dog to run without the 8-step lora, but skin quality suffers too much.
>>106531651
>good text workflow going for my MAGA hats
Don't over-think it. Let an LLM turn caveman into prose.
>>106531653
F
>>106531724
>>106531775
Nice! Interested in further results.
To this day I still have a bunch of Bing stuff that, if you know what to look for, you can probably tell, but if I slapped it on the cover of a 90s pulp fantasy novel and tossed it on a shelf, probably no one would ever know. Where's the lora for this?
>>106531794
>Their test code mentions that the refiner is not ready yet.
K, that explains that. Here's the undistilled penguin. =/
>>106531803
>>106531724one day I'll get a 96gb vram card aswell...
>>106531806Kek that mona lisa
>>106531803Are you running 8 steps on the distill? Seems to mangle text for me.
>>106531724
https://xcancel.com/kohya_tech/status/1965390189435769273#m
Kohya is lurking on /ldg/ confirmed lol
>>106531829Yeah, 8 on the distill and 50 on the undistill as recommended
What's your acceptable gen time? Any longer and you won't bother.
>>106531806The text is sharp, but it doesn't curve or blend well. You already know she looks like a man there. And those fingers.. hopefully it's just euler? Thanks for showing us the reality of it tho. Much appreciated
>>106531853>50Bruh. How is the it/s compared to qwen?
>>106531864
50/50 [00:54<00:00, 1.10s/it]
Another hunyuan test
Here's the actual prompt "1girl"
>>106530504Knew this shit would happen and called it a few threads back, picrel is from plebbit about the new hunyuan model. They're going to distract the Chinaman once again
>>106531897>"1girl"it looks like a troon, doa
>>106531898and it's not just wan model, it's also wan loras support
>>106531856
>What's your acceptable gen time? Any longer you won't bother.
It depends on a lot of stuff, mainly if I can use my computer for something else at the same time. I do find that once you cross the 2.5 to 3 minutes per gen threshold for images or videos, it's much harder to get dopamine hooked on the gacha. If 5 second HD videos could come out every 60 seconds that would probably be fast enough to keep the comfyui tab open while gooning.
>>106531908lol, kinda true sadly. nano banana my dick. waiting for the fp8 scaled and quants to add to my collection.
>>106531911>wan loras supportthe chink better do QIE and qwen loras first
>>106529998
>>106531898
cuda only
don't care
>>106531914
>If 5 second HD videos could come out every 60 seconds
You can do this with the light loras desu
>inb4 vramlet
>>106531924Only at like 640x480 on my card which is not high enough resolution for my standards, but now that you mention it I absolutely should be genning at lower resolutions when testing prompts I'm not sure are good or not yet
The (same)face of hunyuan-image
OK, here's a weird one. This was the prompt fed to the model (after LLM "reprompt"):
>A person is captured in a relaxed moment, sitting on the floor of a room while focusing on a camera they are holding. The individual is seated on the floor, dressed in dark, long-sleeved clothing and dark pants, creating a casual appearance. Their attention is directed downwards towards a Sony camera that they hold with both hands, as if reviewing an image or adjusting its settings. In the background, a bed is visible, covered by a bedspread featuring a distinct pattern. The ambient lighting throughout the room is soft and natural, suggestive of daytime light coming from an unseen window, which contributes to the overall relaxed and candid atmosphere. This image presents a photography style.
>text
Sloppy slop
>>106532016the eyes and fine details like the fine dot grid on the radio are much worse than qwen
>>106532016
Yeah, this is definitely distilled from qwen. Why are you like this, chinamen?
Last hunyuan before I go to sleep. 90gb peak vram used for this.
>>106532016looks like some text you put on paint, not natural at all
>want to try hunyuan but some required models from modelscope are downloading at 0.5MB/s
>>106532053kek, this is shit, let's hope that the edit model isn't from tencent, those guys don't know how to make decent models at all
>>106532053
>can't even model a proper gun holding pose
>worse text than qwen
Actual DoA model. Is this the distill or the full one?
>>106532053they managed to make it worse than Qwen Image while having a worse licence, gg wp, it's hunyuanVideo vs Wan all over again
Making loras for chroma is surprisingly easy, the hardest part is figuring out captions and gathering images
>meme model with prompt enhancer and refiner can't beat Q4 qwen
>>106532076Full, 50 steps. The "distill" is the same size so I'm not sure what it actually is.
>>106532096
The day the chinks learn that putting garbage data (synthetic data) into their model will produce garbage out, they'll improve in the AI space.
https://en.wikipedia.org/wiki/Garbage_in,_garbage_out
>>106532108
>https://en.wikipedia.org/wiki/Garbage_in,_garbage_out
>The first known use is in a 1957 syndicated newspaper article about US Army mathematicians and their work with early computers,[3] in which an Army Specialist named William D. Mellin explained that computers cannot think for themselves, and that "sloppily programmed" inputs inevitably lead to incorrect outputs.
>sloppily
kek, I thought "slop" was a recent meme, they were complaining about that shit 70 years ago already
>>106532016The ヶ's a bit mangled but the japanese is surprisingly spot on.
>>106532120> he knows japanese
>>106532127
yes
t. japanfag
I mean, those are all pretty simple kanji though. It's kind of interesting to see a model get two languages right at the same time.
>>106532127>he can't read moonrunes
Anyone had success with these with gguf? The 1st node works, genning 10 secs, but it takes 10 minutes (I can gen 5-6 sec under 3 minutes). I don't quite understand these.