Discussion of Free and Open Source Text-to-Image/Video Models and UI

Prev: >>106703056

https://rentry.org/ldg-lazy-getting-started-guide

>UI
ComfyUI: https://github.com/comfyanonymous/ComfyUI
SwarmUI: https://github.com/mcmonkeyprojects/SwarmUI
re/Forge/Classic/Neo: https://rentry.org/ldg-lazy-getting-started-guide#reforgeclassicneo
SD.Next: https://github.com/vladmandic/sdnext
Wan2GP: https://github.com/deepbeepmeep/Wan2GP

>Checkpoints, LoRAs, Upscalers, & Workflows
https://civitai.com
https://civitaiarchive.com/
https://openmodeldb.info
https://openart.ai/workflows

>Tuning
https://github.com/spacepxl/demystifying-sd-finetuning
https://github.com/Nerogar/OneTrainer
https://github.com/kohya-ss/sd-scripts/tree/sd3
https://github.com/derrian-distro/LoRA_Easy_Training_Scripts
https://github.com/tdrussell/diffusion-pipe

>WanX
https://comfyanonymous.github.io/ComfyUI_examples/wan22/
https://github.com/Wan-Video

>Chroma
https://huggingface.co/lodestones/Chroma1-Base
Training: https://rentry.org/mvu52t46

>Neta Lumina
https://huggingface.co/neta-art/Neta-Lumina
https://civitai.com/models/1790792?modelVersionId=2203741
https://neta-lumina-style.tz03.xyz/

>Illustrious
1girl and Beyond: https://rentry.org/comfyui_guide_1girl
Tag Explorer: https://tagexplorer.github.io/

>Misc
Local Model Meta: https://rentry.org/localmodelsmeta
Share Metadata: https://catbox.moe | https://litterbox.catbox.moe/
GPU Benchmarks: https://chimolog.co/bto-gpu-stable-diffusion-specs/
Img2Prompt: https://huggingface.co/spaces/fancyfeast/joy-caption-beta-one
Txt2Img Plugin: https://github.com/Acly/krita-ai-diffusion
Archive: https://rentry.org/sdg-link
Bakery: https://rentry.org/ldgcollage

>Neighbours
>>>/aco/csdg
>>>/b/degen
>>>/b/realistic+parody
>>>/gif/vdg
>>>/d/ddg
>>>/e/edg
>>>/h/hdg
>>>/trash/slop
>>>/vt/vtai
>>>/u/udg

>Local Text
>>>/g/lmg

>Maintain Thread Quality
https://rentry.org/debo
>>106706459
>style change is harder in new qie
+
>that's what happens when you want to save a model with finetunes, at some point you're trying too hard and the model starts to lose some of its concept, that's why the pretraining is always the most important part, if the base model is too weak, it's already over
chat is it true?
>>106706502
2nd for MORE MOTOKO!
>>106706512
you get a free lain instead
>>106706509
yes, I'm back to shitposting with old QIE desu
>>106706509
yes. go a few threads back in the archive and you'll see anons confirming it with gens.
tl;dr: it understands some concepts/objects now but lost art understanding.
>>106706509
it's all right, Tencent will save us with their own edit model that'll be released in a month
https://youtu.be/DJiMZM5kXFc?t=18
Reminder to use the v2 of the lightning lora for qwen edit. Retains original quality and style better for the overall image.
https://xcancel.com/Alibaba_Wan/status/1971485743194484880#m
lmao, Wan 2.5 can edit images, and I'm sure that one is way better than the gigaslopped QIE shit
>>106706583
it fucks up the edit capabilities thoughever
bait
>>106706608
100% api only, sad
again
anon, do you know these models and can you share a comfy workflow?
https://huggingface.co/ShinoharaHare/Waifu-Inpaint-XL
https://huggingface.co/ShinoharaHare/Waifu-Colorize-XL
Do I have to run sage attention nodes for wan 2.2 workflows or can I just use a command line?
>>106706625
it's illustrious based so just check the 1girl guide in the OP. you're welcome, retard
>thoughever
put your trip back on
>>106706608
why call it a video model at that point.
>>106706633
for wan 2.2 just a command line is enough
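For reference, sage attention can be enabled globally at launch instead of through a node. A minimal sketch, assuming a recent ComfyUI build that exposes the flag and that the sageattention package is installed:

```shell
# install the kernel package first (exact version depends on your torch/CUDA build)
pip install sageattention

# launch ComfyUI with sage attention applied globally;
# no patch nodes needed in the workflow itself
python main.py --use-sage-attention
```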
>>106706633
I find nodes to be better, so I can switch on the fly
>>106706633
keep sage off. it's not worth the 5% speed increase.
>>106706635
they aren't simple models like that, retard.
>>106706623
>100% api only
only for preview
>>106706644
Great.
>>106706646
>>106706654
Why switch? Isn't it purely performance?
>>106706608
look at 39 sec, there's 0 zoom in issue, the image stays the same and just changes the cat, kek they're really fucking with us and they're only releasing their failed scraps locally
>>106706658
no, flash attention is lossless, sage attention is NOT lossless
>>106706656
>being that high on copium
>>106706658
anecdotal and I don't care if you believe me, but turning it off had a positive impact on video gens.
barely matters for sdxl but qwen also requires it to be off so i just keep it off all the time now.
>he fell for the "it's going to be cloud only" bot brigade
https://xcancel.com/bdsqlsz/status/1971448657011728480#m
>4x qwen image
it means an 80b model, and it's consistent with what that chink is saying in that video (80b)
https://youtu.be/DJiMZM5kXFc?t=204
it's over...
>>106706693
so you're telling me they need an 80b to output this slop? lmao, China has lost the plot, instead of going for a higher quality dataset untainted by synthetic slop, they went the API route, MOAR LAYERS
>>106706635
ty, it was helpful
but call it maximized laziness :>
>gm
>>106706662
>>106706676
Now I have to try it.
I just tried running q8 in high noise and fp16 in low noise. Didn't see much of a difference. Strange.
>>106706668
https://youtu.be/IhH7gDDPC4w?t=50m58s
>>106706763
at no point did he say that they'll release the complete model
>then we will complete next version, wan 2.5 without the preview
he said "complete" not "release"
>>106706693
>80b for this
https://www.reddit.com/r/StableDiffusion/comments/1nqm5l0/images_from_the_huge_apple_model_allegedly/
so anon, what kind of stuff have you been making with the edit models?
surely only wholesome memes that are family friendly, right?
>makes everyone pregnant
ANON STOP WHAT ARE YOU DOING
>>106706608
>Alibaba have their own good edit model and won't release it
>>106706693
>Tencent went for LayersMaxxing and their shit still looks like pure slop
lol it's so over dude
>>106706801
Hello, /adt/ repost
Impressed by the new NetaYume v3, in my opinion it's on par with current SaaS models in the anime field. Same prompt, but for copyright reasons I used character traits instead of names as shown in pic related.
I would like to do this with the Chroma anime checkpoint. Does anyone have a ready workflow to import and test?
My Workflow: https://files.catbox.moe/84cdwx.png
>>106706853
Making aliexpress-tier pics for online stores
>>106706484
>CumshillUI still in the OP
>>106706891
that sounds tedious. i hope you're getting that bag anon.
>>106706693
So you need an RTX PRO 6000 (96 gb) to run this shit at Q8? kek, what's the point of releasing it at all?
>>106706880
sooooooo not local?
>>106706693
Will this be the biggest local image/video model ever? If I remember correctly, the biggest one before was Step-Video (30b).
Nvidia is capable of turning their gpus vram plug'n'play with upgradeable and affordable vram. But they won't do it because they have a monopoly and won't profit as much from it.
>>106706788
Well, if this can be effectively quantized without losing too much quality and it trains well, it could still see good adoption, but those are really big IFs.
Even with good quality quantization now available, Qwen adoption has clearly been hampered by its size and slow generation
>120s to generate an image
how do fluxlets do it? it seems a terribly inefficient way to iterate.
>>106706919
calling this a gaming gpu is insane work.
>>106706943
>Well, if this can be effectively quantized without losing too much quality
if you want to run this on a 24gb vram card, you'll have to do a Q2 quant, and that's unusable
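The VRAM math behind that claim, as a sketch; the bits-per-weight values are rough approximations of typical GGUF quant levels including their overhead:

```python
# Rough size of the weights alone for an 80B-parameter model at common
# quantization levels. Actual VRAM use is higher: activations, text
# encoder and VAE come on top, so treat these as lower bounds.

def weight_gb(params_b: float, bits_per_weight: float) -> float:
    """Weight size in gigabytes (1 GB = 1e9 bytes)."""
    return params_b * 1e9 * bits_per_weight / 8 / 1e9

for name, bpw in [("bf16", 16), ("Q8", 8), ("Q4_K", 4.5), ("Q2_K", 2.6)]:
    print(f"{name:5s} ~{weight_gb(80, bpw):.0f} GB")
```

At roughly 26 GB, even Q2_K overflows a 24 GB card before activations are counted, which is the point being made.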
>>106706920
Neta Lumina is local, it's good news for local anime models.
>>106706944
this qwen?
>>106706938
>Nvidia is capable of turning their gpus vram plug'n'play with upgradeable and affordable vram.
speed matters too, even if your 3090 had 100gb of vram and could run this, it would still be slow as fuck since it has to crunch through those 80b of weights
>>106706938
Well, despite being three years into the AI boom, their competitors are still sitting with their thumbs up their asses = no competition...
>>106706944
it's so noisy, like the denoising process hasn't finished, what model is this?
>>106706880
those SaaS models are slopped at anime. please do a comparison between neta, noob, and novelAI, which is the only good API anime model
Bros.. the 4090 won't fit at all. Even with the different mounting types.
Would running it via one of those external boxes be worth it?
>>106706978
>their competitors are still sitting with their thumbs up their asses = no competition
you don't betray your family anon
>>106706989
would you. same thing
>>106707003
I wouldn't yeah
>>106706987
Did you buy the card without checking if your mobo/case has enough clearance?
You could use a riser and place the card on top of your PC, other than that I don't know.
can someone make a Erika Kirk lora?
>>106706987
just get a cheap case which fits
Post gens fags.
And I can't right now, I'm at work.
>>106707047
Make me bitch!
>>106707047
/sdg/ is that way nogen
>>106706526
Cyber-lain
Fishnets go with everything
>>106706693
I would legit buy an expensive card if this giant model was at the level of Seedream, but that's not the case at all, it's still the same slopped shit you see on your regular models, the fuck are they doing?
>>106707018
No, I upgraded to a 5090, didn't plan on using the 4090 as well. But then the topic was brought up and I got interested.
>>106707024
I have one of the largest cases, evo xl. It might fit if I stop using push/pull on the aio. But that'd be fully diy.
>>106707096
>blurrydream
Just have a grain filter on top of all your images, boom, you got yourself blurrydream at home.
>>106707096
those guys have insane compute and they're wasting it on moar layers and moar synthetic slop, it's so sad when you think about it
>>106706982
What you say is valid, but the thing is, Noob and Illustrious are both based on tags. How can I fairly compare a prose prompt between Noob and Illustrious?
>>106706908
>>106706891
This, why not just take pictures of the actual product?
Moar layers has yet to be debunked tho
>>106707128
it's 4x the size of Qwen Image, and do you seriously believe the images look 4x better? >>106706788
>>106707143
You do understand image models are judged on things other than aesthetics, right?
>>106707143
>it's 4x the size of Qwen Image
and 6.66 times the size of Flux dev, the devil is with us dude
>>106706987
just get a riser retard
>>106707161
go on anon, show us how those images are objectively better than what we can do on Qwen Image >>106706788
>>106707161
who fucking cares about anything besides aesthetics? its whole purpose is to make pictures, if the pictures it makes look shit what's the point?
>>106706772
???
>get asked "hey why is this model closed when everything's been open from you guys"
>response "we've done big changes to the model so in the meantime we'll give you guys a preview model for input/feedback that we can use to iron shit out"
if it wasn't going to be open he would've just said some bullshit like "model too big" instead of explaining in engrish the purpose of the preview
>>106707179
You can fix aesthetics, you can't fix dogshit prompt following or anatomy. Seriously, do we have some sort of influx of retards in the image gen sphere recently?
>>106706788
>81.3 times bigger than SD1.5
>22.8 times bigger than SDXL
>6.66 times bigger than Flux
>4.7 times bigger than HunyuanImage 2.1
>4 times bigger than Qwen Image
>>106707175
I would if it was released. All I'm saying is judging STRICTLY on aesthetics is idiotic. And I'm an aesthetics fag, trust.
>>106707121
noob does have some NL capability. Or just do a comparison that's mostly tags, or leave noob out
>>106707179
B-but the green ball sits next to the blue box on top of the yellow rectangle, also the text is correct!
>>106707195
>I would if it was released.
there's plenty of images already publicly available, just look at them and explain to the class how much superior they seem to be compared to the smaller models
https://www.reddit.com/r/StableDiffusion/comments/1nqm5l0/images_from_the_huge_apple_model_allegedly/
>>106706969
>>106706979
Flux,
https://civitai.com/models/1961797/srpo-refine-quantized-fp16-forge-compatible?modelVersionId=2220553
and a stack of loras, but these being the most prominent:
https://huggingface.co/Alissonerdx/flux.1-dev-SRPO-LoRas/blob/main/srpo_128_base_R%26Q_model_fp16.safetensors
https://civitai.com/models/1253380/phone-quality-style?modelVersionId=1413027
I was soliciting hints to better workflows available out there, I'm still in the honeymoon phase of trying things out
>>106707193
>you can't fix dogshit prompt following or anatomy
yes you can, it's called inpainting and manual work. are you incompetent? how do you fix a dogshit looking gen?
>>106707210
>it's called inpainting
if you want to inpaint, just use a SD1.5 model bro, you don't need an 80b model to get anatomical errors
>>106707206
It doesn't matter until anon can run his usual autistic tests. Every base model newer than XL is slopped but you don't see anon posting SD1.5 do you?
>>106707210
That's literally what a LoRA is for, like dude wtf?
>>106706025
>>106706046
Sadly I don't think it's possible with the nodes we have, but I don't see why it wouldn't be feasible.
>>106707223
>you don't see anon posting SD1.5 do you?
/sdg/ exists for that no?
>>106707223
*newer than SD1.5
>>106707223
again, what's the point of an 80b model if it doesn't offer something better than Qwen Image? those images look exactly the same as regular Qwen Image output, what's the point?
>>106707193
>Seriously do we have some sort influx of retard in the image gen sphere recently?
They are either new, retarded, or being purposefully obtuse. I can't tell which desu.
>>106707227
making a lora for an 80b model is going to be quite pricey
What's up, /ldg/!
Last week was an absolute whirlwind. Thanks to a happy little accident, we did our first-ever YouTube livestream!
That means the raw, unfiltered VOD was up instantly. But for those who want the polished version, we just dropped a brand-new edited cut today.
Get ready to level up, because today we're diving deep into the art of compositing. We'll be breaking down killer techniques and workflows for SDXL, Flux, and even bleeding-edge models like Nano B.
>Now, I need your help deciding the future:
Want more raw, unfiltered livestreams? Reply with <3
Prefer the tight, info-packed edited videos? Reply with :^)
Can't wait to see you there!
https://youtu.be/jmIbIIA9Qmc
>>106707193
>You can fix aesthetics
did anyone fix Flux aesthetics? it's been more than a year and we're still waiting lol
>>106707174
I don't think you understand the sizes at play.
But yes, the riser method is needed if they are to fit at all. I'd have to do it for both, loop it all around, and diy a mount for both of them.
>>106707248
Is this your first time seeing something new pop up on the jeeterboard? You're sperging out like the GAE is taking away your GPU. Just chill until it's out.
>>106707227
no one is gonna run an 80b model, if you want to fit that on a 3090, the best you'll be able to do is Q2, do you know what Q2 looks like?
>>106707263
>>106707265
you're the one sperging out about how "stacking more layers makes shit automatically better bro, still waiting for the debunk bro", you are so fucking retarded >>106707128
>>106707273
moebros, ramtorchbros, what did this ramlet mean by this?
>>106707287
>he wants to calculate 80b parameters with ram offloading
it's gonna take an hour to make a single image with our current gpus lmao
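A back-of-envelope check on that, as a sketch; the numbers are assumptions (Q8 weights, PCIe 4.0 x16 at ~25 GB/s effective, 50 denoising steps), and this is only the transfer-bound floor, the actual matmuls come on top:

```python
# If the weights don't fit in VRAM, every denoising step has to stream
# them from system RAM over PCIe. Lower bound from transfer alone:

weights_gb = 80    # 80B params at Q8 (~1 byte per weight)
pcie_gb_s = 25     # assumed effective PCIe 4.0 x16 throughput
steps = 50         # assumed denoising steps

sec_per_step = weights_gb / pcie_gb_s
total_min = sec_per_step * steps / 60
print(f"~{sec_per_step:.1f} s/step, ~{total_min:.1f} min/image minimum")
```

Not quite an hour, but that floor assumes perfect overlap and ignores compute entirely; on slower links, higher precision, or more steps it degrades fast.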
>>106707284
Sure, I'm the one sperging kek. Keep telling yourself that
>>106707127
A product in a cool setting or looking shinier sells better than an actual picture of the product.
Sad but true.
>>106707293
>blud doesn't know what MoE models are
how new are you?
>>106707300
it's not a MoE model though
>>106707300
where's the moe model?
>>106707276
Yes nigger, I know what a riser is. We're talking 2x 4-slot gpus, not one.
>>106707300
who said it's a MoE model anon?
I got a natural riser with all the bouncing boobs gens I made last days
I wonder how long 1 step with a 80b image model would take
>>106706788
I don't understand them, they created SRPO and they're not using it on that giant model? fucking why?
yall mind if i up and wildly speculate thoughever
>>106707209
this is 60s on a 3090 with the fp8 model. drop your prompt and i'll make you a workflow.
your setup sucks man, mine is pure spaghetti right now (controlnets), but even the default workflow can do some good stuff. Why are you using SRPO?
Someone still believes that bigger params = better model? Damn.
>>106707317
>>106707307
>>106707306
my mistake, as I didn't follow the conversation 7 replies back being about a specific model and replied to the general statement I saw of
>no one is gonna run a 80b model
>>106706693
you know what? now is the time to pray that what this furry fuck said about "seamless" offloading is true lol
https://xcancel.com/LodestoneRock/status/1968976389807161515#m
Thread of poorfags with miniscule compute
when will temu release a model
>>106707256
buy an ad faggot
>>106707273
>do you know how Q2 looks like?
Bigger models quantize better, even Q1 might be fine. We'll just have to wait and see (though the full precision results don't exactly inspire interest).
>>106707256
fuck OFF
>>106707407
>We'll just have to wait and see
No... That's too rational. We MUST sperg out right here right now.
SDXL = clay
Flux and higher = metals
the average consumer = stuck before the copper age :(
pls Nvidia
>>106707423
>No... That's too rational.
Oh yes, the rationality that consists of looking at images from an 80b model, noticing that they are not much better than those from a 20b model (and even seem more slopped), and continuing to be enthusiastic about it, Chang, please.
>>106707256
This is why you need to aggressively tell faggots like ani to fuck off.
oh yeah it's qwentime
>>106707447
anthro her
>>106707443
Oh shit, you have an advanced copy of the model? Leak it anon!
>>106707464
>Don't look at the images bro, they don't mean anything bro, they're just the outputs of the image model after all, and you should never draw conclusions about the quality of an image model by looking at images.
(You)
So after trying native wan context I think it's just busted for I2V. Their example workflow for sliding context shows the frame count on the context node being 81 and the total frame count at some number in the hundreds, but trying to do that with the wan image to video node gives tensor size errors. Has anyone gotten wan sliding context to work with i2v WITHOUT setting the frame count on the context nodes equal to the total number of frames?
>continues sperging
>>106706693
You can press generate now, and by the time your first image is finished, Wan 2.5 will have become open-source.
>>106707351
appreciate the honesty, I'm honestly just throwing stuff at it to see if something works.
here's your slag prompt, probably using the wrong syntax
https://pastebin.com/raw/chw9aZPv
>>106707525
>https://pastebin.com/raw/chw9aZPv
i gotchu senpai, gimme ten minutes
>>106706693
can you feel the power of an 80b model anon? those are next gen images that's for sure 1!!1!1!
>>106707455
didn't expect it to work
I still have no idea how to use Qwen Edit
>>106707564
they turned flux into an 80b model
that anon is totally not sperging out guys, can't you tell?
>>106707584
you don't have to, it's so slopped and nano banana destroys everything in the edit space
>>106707589
FUCK YOU!!
If nobody is aware of it: the faggot is often arguing with himself, typically when anon posting, so ignore all of it and don't take the bait
size is all that matters.
>>106707265
>sperging
>>106707297
>sperging
>>106707499
>sperging
>>106707589
>sperging
that's a bot right?
>>106707584
so sorry for your loss.
>>106707564
oh my god, it even has flux chin.
flux is a curse that keeps on giving.
>>106707605
I'm forced to distill my shit because it won't fit in hers. Sadge :(
>>106707564
still looks weird.
there arent enough newfags here to fall for your antics KEK
>>106707525
an output is attached. i'll check back in an hour or two if you have questions about anything in there. you'll have to attach your own lora nodes since this is just straight flux
workflow: https://pastebin.com/JgZEs7QQ
>>106707623
I am not that guy but I take all bait, it's more fun that way
>>106707128
It's an objective fact that most of the Flux layers do nothing, which means the model is not fully saturated. What happens when you start stacking layers is the model can learn to skip them.
>muh aesthetics
why is this an argument lmao, if a SD 1.5 model popped up and had prompt and concept understanding on par with gpt/nano we'd all be on it and just lora and upscale
>>106707638
Thanks for making a good point instead of malding like the other guy
>>106707645
>I love slop
Says no one, you're a Tencent employee doing damage control.
>>106707575
qwen is horny
>>106707645
good piggy
>>106707648
>malding
you're not saying "sperging" anymore? I wonder why lmao >>106707607
>>106707631
Man, that chin.. just whyy
>>106707645
Morons with "the grass is always greener on the other side" syndrome
Remember, his thread is dead, he can only necrobump it, and he craves interaction with this thread. He can't outwardly show himself because he knows he will be kicked out, so now he needs to anon post all day doing the same antics he did for years.
There is no need for brand or model wars, he's just mad that he's priced out and can't afford a new card because he never worked a day in his life. Just stop replying and you'll notice it will be just him replying to himself.
>>106707660
it's so slopped it looks like a low res painting
>>106707564
that bottom left image looks like qwen anime style
hilarious if they spent so much money on training an 80b model just to train on slop
>>106707645
>>muh aesthetics
>why is this an argument lmao
can't tell if this is bait or retardation
>>106707679
because the couch was a shit lowres image. i can try n tidy it up a bit. and slopped is the wrong term retard. but i get what you're saying.
>>106707699
>slopped is the wrong term
it is the right term you low IQ degenerate, look at the face of the girl, completely plastic and smooth
Why are we arguing over this stupid shit, hone your fucking craft
>anons insulting InvokeAI
Sincere question: why are you doing this to Invoke and not to Comfy if they share the same business model?
I'm tired of the pointless speculation over models no one ITT has access to is all. It happens every time.
>>106707645
You can train a SD 1.5 tier model (~600m transformer model) with a 5090. Probably could do it with less than $1000 renting an H100.
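The arithmetic behind that budget, as a sketch; the hourly rate is an assumption (cloud H100 spot pricing varies a lot by provider):

```python
# How many H100-hours $1000 buys at an assumed rental rate.
budget_usd = 1000
usd_per_hour = 2.5   # assumed; spot prices commonly land around $2-3/hr

gpu_hours = budget_usd / usd_per_hour
gpu_days = gpu_hours / 24
print(f"{gpu_hours:.0f} H100-hours (~{gpu_days:.0f} days on one GPU)")
```

A few hundred GPU-hours is the same order of magnitude as the ~$600 small-model proof-of-concept runs discussed elsewhere in the thread.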
>>106707712
wow, it's nearly like it's a blurry lowres mess.
post your gens anon, we're waiting to see how it's done properly.
you must kill yourself right now to death.
>>106707723
Oh yeah, Tencent has always delivered good sovl models after all, why should we be wary of them now??
>>106707723
and EVERY time the doom posters get proven right
One retard rustles the cage and the rest of them jump in. There is zero reason to argue over shit you can't touch, it's like arguing over how good the pussy would feel over a Nun
>>106707631
cheers, I'll give it a spin. Already looks quite a bit more elaborate than my adapted chicken scratch.
>>106707731
Like when they said Qwen would end up like hidream? lmao
>>106707741
>kill yourself right now to death
pleonasm
>>106707731
And despite all this free time they have yet to create a diverse, well-captioned dataset.
>>106707735
>like arguing over how good the pussy would feel over a Nun
fucking kekd
>>106707729
>post your gens anon, we're waiting to see how it's done properly.
I won't, unlike you I recognize the models we are currently using are slop machines, I'm not releasing anything until we get something good enough
>>106707741
>Like when they said Qwen would end up like hidream?
who said that? Alibaba is a highly trusted company since they released the Wan series
>>106707741
qwen may be usable for what you want to use it for, but that does not mean it's not gigaslopped
>>106707750
ah, skill issue. got it.
stop posting here lil bro, you're wasting space.
>>106707731
>and EVERY time the doom posters get proven right
he's out of line but he's right
>>106707764
>skill issue
that's an actual skill issue -> >>106707660
REMINDER: most anons in this thread run sub-10gb cards
the shittier the gen the angrier they get
This thread reminds me of the time i morphed kate bush into a tiger on my pc in the mid 90's, it had mfm drives that needed a kick in the morning to spin up.
>>106707750
Why are you attacking him when it's clear he's testing stuff?
I post shit I'm testing all the time, this is part of the journey. if there's context I'm missing please show me
>>106707774
once HunyuanImage 3.0 is released, every guy that doesn't have a 96gb vram card will be officially called a vramlet
>>106707774
i'm going to jerk off knowing i have a medium 32gb dick.
goodbye you fucking losers. stay mad.
>>106707759
Here and rweddit during release. They called it flavor of the month and said we would return to the mighty flix/krea lmao. I did return those models to my recycle bin, that's for sure.
>>106707789
>They called it flavor of the month and said we would return to the mighty flix/krea lmao.
desu, only Qwen Image Edit is worth a damn, and I stopped using it after the novelty wore off, it's just too slopped
>>106701867
>>106707784
>96gb vram card
Lower your voice when you speak to me, you're brand new to the local meta
>>106707741
trvth nvke
>>106707735
You do know the images we're currently seeing are supposed to be high quality cherry-picked images? This is Tencent telling you "look at what our model can do best!", they probably made 20 tries and chose the best one for each of them, does that scream "it's gonna be good" to you?
>>106707741
>doom posters 18484141 - cope posters 1
doom poster sissies, how are we gonna cope with our only loss?
>>106707784
Their example images don't justify the requirements. At that size I'd expect a model that produces extremely complex, perfect scenes. Like a full Peanuts comic strip page.
dis nigga never seen researcher gens before
>>106707833
In all fairness, 90% of model makers make shit tier gens when showing model ability; still, people have a right to explore and see if they can get anything useful. You don't have to use the model anon
>>106707461
nice
>>106707837
I would have liked to hear that music.
Qwen Image is good enough. I'm done getting hyped for new model releases, we should just focus on Qwen finetunes / controlnets / loras etc
>>106707774
no, i run an exactly 10gb card
>>106707887
Small models are woefully underexplored. What the community should spend time on is a proper pretrained small model ready for finetuning on any mid-sized dataset.
>>106707887
nah, I want something smaller, and without the VAE shit, so that edits don't introduce pixel compression
>he doesn't seedmaxx
>>106707887
false, qwen image is bloated and slopped.we should focus on building our own non-bloated, non-slopped model at 1/4th the size or less.
slopped.we
>>106707887
bruh, Qwen Image is barely better than Flux, and flux is almost half the size
>>106707918
Lol this is the biggest load of BS ever, Flux doesn't even compare
>>106707887
im eagerly awaiting your finetunes
>>106707900
Like lumina?
>>106707929
we will never make our own model
Based Koff.
>>106706693
There are probably hundreds of ways to improve the model by adding novel training techniques or new architectures (or going for a serious unslopped dataset), but nahh, those mfs went for the "just stack more layers bro" meme, seriously...
>>106707935
anthro her
>>106707929
lumina could possibly work but we need to incorporate more optimizations like EQ-VAE and TREAD going forward. and lumina has pretty lame base styles
>>106707934
yes we will, look at this
https://huggingface.co/KBlueLeaf/HDM-xut-340M-anime
>>106707955
>340M
no thanks, we already have SD1.5
>>106707929
Lumina is shit because it uses an opinionated, likely censored text encoder. But yes, a ~2B model with maybe something like the Qwen 0.6 text encoder, though T5 XXL is still king for being verified uncensored.
anyone created voices? I am trying alltalk, it uses short voice samples to clone voices. i tried one sample with a latina accent, but alltalk makes her speak british. do different accents need different models?
I'm starting to think Tencent is just incompetent, and HunyuanVideo was accidentally a kinda good model.
HunyuanVideo i2v was terrible, and dramatically changed the first frame. They hastily changed the implementation and released an updated model and just said "lol jk change your implementations and use this one instead" but it also had problems.
HunyuanImage 2.1 uses a VAE with too high of a compression ratio, which also caused problems with LTX and Wan 2.2 5b. They slapped on a refiner after the fact to cope, and also say you need to use their special snowflake guidance method. The refiner failed with SDXL, nobody will run it, it will fail here too. The model also has a bad license, is slopped as hell, and is just worse across the board compared to Qwen.
Now HunyuanImage 3 is fucking 80b parameters, literally DoA, nobody can run it, not even on an RTX 6000 Pro, and it's even more slopped and just looks like ass for how large it purportedly is.
If you've ever tried to read their training or inference code, it's a fucking mess. They never released the text encoder HunyuanVideo was actually trained with, same with HunyuanImage 2.1. Technically, we've all been using wrong text embeddings the whole time.
They have no idea what they're doing.
>>106707971
vibevoice has been all the rage recently as far as open models go
>>106707955
>cheapest
nigga
>>106707980
yeah, for the stunt MS pulled
>>106707969
Neta Lumina can do porn fine
>>106707976
>They have no idea what they're doing.
yep, Alibaba is the one chinese company that could save us (if they learn one day that synthetic data is poison)
>used to have the problem of not getting enough motion in my i2v
>now have the problem of too much motion
AAAAAA
>>106707900
>>106707907
>>106707913
>>106707918
>>106707925
all vramlets btw
what do we need saving from doe?
>>106707961
read the paper, it's a proof of concept. if he can do that with ~$600 of compute, the community can EASILY train their own real base models.
Instead of the furry blowing >$150,000 on finetuning fucking FLUX SCHNELL, we could have had a SOTA community-funded fast model by now if we went the route of HDM. ultimately it's inevitable though.
>>106708001
It really just seems vramlets are getting mindbroken day after day. Why don't the LLM field have this kinda bitching?
>>106708009
>read the paper, it's a proof of concept. if he can do that with ~$600 of compute, the community can EASILY train their own real base models.
I wish Tencent read that paper instead of going for gozillions of parameters lol
>>106707969
>likely censored text encoder
never had an issue with this, it does what i tell it to just fine
>>106708014
>Why don't the LLM field have this kinda bitching?
are you joking or something? when deepseek got released, the shitstorm was so intense they had to create a new general just for this specific model and appease the "giant models can't be considered local" group
>>106708001
Yes anon, I like models that can be full finetuned on consumer hardware.
>>106708001
I use Qwen image all the time (fp8 scaled), it's still fucking bloated and slopped. LoRAs help deslop it though
Bigma status?
>>106708014>Why don't the LLM field have this kinda bitching?lol
>>106708025Do we have a model like that? Most XL models were tuned on a small to big cluster
>>106707677Ranfaggot is getting desperate for attention
>>106707998I'd rather have the second problem actually, I can slow down the video, while speeding it up reduces the total time..
>>106708043No because the people with compute are retards. For example, Chroma should've been trained from scratch as a 4B model.
>>106708043nta but you can do a full finetune of sdxl with a 24gb card iirc
>>106707976>I'm starting to think Tencent is just incompetent, and HunyuanVideo was accidentally a kinda good model.yeah, they seem to have learn nothing, they went on the right path and instead of keeping those solid fundations they went for something completly new and broken, that's not how you improve on this field at all
>106707887
Autistic compulsion forces him to say the line again
>reiterates insult thrown at him
More wheelchairs it is then
>>106708014>Why doesn't the LLM field have this kind of bitching?
https://www.youtube.com/watch?v=H47ow4_Cmk0
>>106708059>>106708061You can tune, sure, but will it be worthwhile? Is there any evidence that doing something like this has shown results? Only the big tunes are usable as far as I can see.
>>106707976>HunyuanImage 2.1 uses a VAE with too high of a compression ratio, which also caused problems with LTX and Wan 2.2 5b.
wan 2.2 5b VAE is worse than the 14b one right?
I only want models I can finetune with a TNT2
>>106708024I mean to be fair that one is 670 billion parameters lol
>>106708076Are you stupid or something? I said he should've made a 4B model from scratch which would've had sufficient expressive capacity as a base model while also being easy to train for other community members. And yes, finetuning is easier than making a base model as you can have a much more constrained dataset.
>>106708076>Has there any evidence that doing something like this has shown resultsonly for smallscale stuff like lora extracts, people use clusters because it's waaay faster. training on a few million images with consumer hardware is too slow
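Some quick arithmetic shows why the cluster wins here. Every number below is an assumption for illustration (dataset size, epoch count, single-card throughput), not a benchmark:

```python
# Rough wall-clock estimate for pretraining on a single consumer GPU.
# All inputs are illustrative assumptions, not measured figures.
images = 5_000_000   # "a few million images"
epochs = 10          # passes a small base model might need
img_per_sec = 2.0    # optimistic single-card throughput at low resolution

seconds = images * epochs / img_per_sec
days = seconds / 86_400
print(f"~{days:.0f} days on one card")
```

Even with generous throughput the single-card run stretches to the better part of a year, while a modest cluster divides that time by its card count.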
>>106708009what would a $150,000 base model get us
like comparable to what model?
>>106708095>Will it be worthwhile?>Evidence, shown results
Here just added some keywords to help you understand, I get hard to english in India.
>>106708106>what would a $150,000 base model get us
a small useless shit, unfortunately we'll always be dependent on giant companies like Alibaba and Tencent
>>106708107you get hard to english in india?
>>106708109grim.
>>106708097>>106707945>>106707935>>106707837>>106707709go back to sdg with your slop.
>>106708107Okay, you are retarded. The premise is small models are woefully underexplored. We, however, can use our brains when considering say the 340M HDM model, the 600M Pixart Sigma model, and extrapolate to the overly bloated Flux model. And we can ask a simple question: is Flux being 12B parameters 20 times better than Sigma? The answer is obviously no. Which means we can make the hypothesis that a model bigger than 600M and smaller than 12B could be quite good, especially considering SDXL which is decent despite using a shitty text encoder, shitty VAE, and shitty architecture. So we can make an educated guess that a properly trained 2B model would be better than SDXL. We can also make an educated guess that a 4B model would be much better than the 2B model.
>>106707988stunt aside, i have been using it with great success. i like making asmr voices (slower, more whispery) and while alltalk is very good at what it does vibevoice does it better.
>>106708149>Hey everyone spend your money on this shit I "think" will work over stuff we "know" works
Lol good luck with the tune bro
>>106707564>wait anon you don't have a 96gb VRAM card to render us? AHAHAHAHAH
>>106708174imagine standing outside that photobooth trying to get a passport picture quickly taken on the way to an important meeting and all you hear is autistic and retarded onions voices doing onions laughs
>>106708165>"know what works">spent $150k on a distilled model that underwent cope brain surgery
It's actually funny how ignorant you are about everything. Chroma wasn't "what works". And we do know what does work because people have done it multiple times. Any DiT model with text conditioning trained starting at 256px and progressing to 1K and 2K. HDM is literally a dick-around project and proved without a doubt that the process is fucking simple.
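The 256px-to-2K recipe described in that post can be sketched as a simple stage schedule; the step counts and the 16px patch size below are illustrative assumptions, not figures from any real run:

```python
# Sketch of a progressive-resolution training schedule:
# start cheap at low resolution, finish with short high-res stages.
# Step counts and the 16px patch size are illustrative assumptions.
schedule = [
    (256, 200_000),   # low-res stage: composition and text alignment
    (512, 100_000),
    (1024, 50_000),
    (2048, 10_000),   # short high-res polish
]

for res, steps in schedule:
    tokens = (res // 16) ** 2  # per-step cost grows with the token count
    print(f"{res}px: {steps} steps, {tokens} tokens per image")
```

The point of the schedule is that most steps happen where each step is cheapest: a 256px image is 256 tokens versus 16384 at 2048px, so the bulk of training costs a small fraction of what an all-high-res run would.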
>>106708188why is s. oy censored. what.
>>106708190As I said good luck on that amazing small model that blows everyone else's out of the water. Looking forward to it champ
>>106708144
>>106708032>Bigma
I went on a 500m mlp test phase. It'll keep training forever though.
>>106708188what's a onions voice?
>>106708202he wanted to say "s.oy" but 4chan censors that word and replaces it with "onions"
>>106708199What is it with zoomers just lying about what other people said? What is wrong with you, seriously.
quick question, is the social media hate towards AI currently big enough to hamper this field?
>>106708214no
Ugh, anons, can anyone help me?
>>106708214people will take whatever they can, especially now when legislation is hazy around copyright for learning material.
If anything was gonna happen I'd expect it to be around that
>generates qwen images and edits, makes them into videos
heh nice.
>goes back to generating with sdxl
anyone else do this? there's just so many more tools and shit available for sdxl. ip-adapter is just pure bliss.
>>106708218ok good
>>106708228it says it failed to extract an archive, delete it and see if redownloading fixes the error. your disk isn't full and the folder isn't write-protected or missing permissions for the user this process runs as, right?
>>106708214Why? AI is about efficiency (saving money), which means the people with money will invest in it because it has obvious utility. All social media does is make people better at hiding AI use. I already personally use AI for my everyday work: LLMs as my code slave and dev duck, and image models for things like product hero images.
>>106708232>goes back to generating with sdxl
This whole gen is compromised. It's just NVIDIA and Comfy glowies telling everyone to buy more hardware and I bet none of the real anons here have more than 12GB of VRAM themselves.
>>106708214No. If the powers that be wanted to hurt AI you would be hearing "think of the children" type arguments; instead it's just kvetching artists
>>106708241i feel like if it was 100% accepted by everyone we'd have more tools idk
>cumfart ooms after every gen again
>can't line up 4 wan gens anymore. again.
why is this software so cursed?
>>106708232I deleted all my XL models except one for when I just want a quick inpaint.
>>106708250speak for yourself
>>106708250i'm that anon and i have 32gb vram but i still just like going back to sdxl for ease of use.
making wildcards for illust/noob/sdxl is just so much easier.
i'm tired of writing entire chapters just to get a decent gen with these new models.
>>106708228looks to me like you have unstable internet
>>106708259I don't get your logic. What tools? People don't work for free. Who are these people who should be making tools for you to use for free?
>>106708273>i'm tired of writing entire chapters just to get a decent gen with these new models.
and then there's doing upscaling/2pass with flux, which doesn't seem to exist or work right, and whatever does exist is a huge clusterfuck of spaghetti nodes. vs i get better gens just fiddling with sdxl a little bit.
someone needs to make a proper realism model for illustrious, seeing as we can't rely on the chinese because they only like sameface gray alien girls and the americans like BOGGED physiognomy. who's left? the french?
>>106708267>barely enough to run HunyuanImage 3.0 on Q4
vramlet
>>106708291i tried merging lustify and biglust with some illust models which works somewhat decently but in general yeah, realism models are all biased towards shit unless you use loras.
>>106708297You can get the Hunyuan Image 3.0 experience by quadrupling the layers on Flux with Identity pass through.
>>106708303wow he's literally me
>>106708305even a few pony loras somehow manage to add realistic lighting, the future may be in loras. who knows, might try it myself since i have the hardware.
>>106708267lmao the 4090 user got called a vramlet, get owned >>106708297
When ready >>106708328>>106708328>>106708328
>>106707161>things other than aesthetics
It's literally all synthetic checkboxes in benchmarks. Qwen also had funny charts shown at release with BIG NUMBAHS but in reality it's a plastic model.
>>106706484Sauce on CWC vid? Looks hilarious