Discussion of Free and Open Source Text-to-Image/Video Models

Prev: >>107145378

https://rentry.org/ldg-lazy-getting-started-guide

>UI
ComfyUI: https://github.com/comfyanonymous/ComfyUI
SwarmUI: https://github.com/mcmonkeyprojects/SwarmUI
re/Forge/Classic/Neo: https://rentry.org/ldg-lazy-getting-started-guide#reforgeclassicneo
SD.Next: https://github.com/vladmandic/sdnext
Wan2GP: https://github.com/deepbeepmeep/Wan2GP

>Checkpoints, LoRAs, Upscalers, & Workflows
https://civitai.com
https://civitaiarchive.com/
https://openmodeldb.info
https://openart.ai/workflows

>Tuning
https://github.com/spacepxl/demystifying-sd-finetuning
https://github.com/Nerogar/OneTrainer
https://github.com/kohya-ss/sd-scripts
https://github.com/tdrussell/diffusion-pipe

>WanX
https://comfyanonymous.github.io/ComfyUI_examples/wan22/
https://github.com/Wan-Video

>Neta Yume (Lumina 2)
https://civitai.com/models/1790792?modelVersionId=2298660
https://nieta-art.feishu.cn/wiki/RY3GwpT59icIQlkWXEfcCqIMnQd
https://gumgum10.github.io/gumgum.github.io/
https://neta-lumina-style.tz03.xyz/
https://huggingface.co/neta-art/Neta-Lumina

>Chroma
https://huggingface.co/lodestones/Chroma1-Base
Training: https://rentry.org/mvu52t46

>Illustrious
1girl and Beyond: https://rentry.org/comfyui_guide_1girl
Tag Explorer: https://tagexplorer.github.io/

>Misc
Local Model Meta: https://rentry.org/localmodelsmeta
Share Metadata: https://catbox.moe | https://litterbox.catbox.moe/
GPU Benchmarks: https://chimolog.co/bto-gpu-stable-diffusion-specs/
Img2Prompt: https://huggingface.co/spaces/fancyfeast/joy-caption-beta-one
Txt2Img Plugin: https://github.com/Acly/krita-ai-diffusion
Archive: https://rentry.org/sdg-link
Bakery: https://rentry.org/ldgcollage

>Neighbors
>>>/aco/csdg
>>>/b/degen
>>>/b/realistic+parody
>>>/gif/vdg
>>>/d/ddg
>>>/e/edg
>>>/h/hdg
>>>/trash/slop
>>>/vt/vtai
>>>/u/udg

>Local Text
>>>/g/lmg

>Maintain Thread Quality
https://rentry.org/debo
From "Localsong" + a lora:https://voca.ro/1cbIetpoY6GvI am telling ya, this shit has potential
>>107154861based
blessed bred
>>107154883
that's no language I've ever heard, sounds like gibberish
>>107154885
thanks. any OCR or VLM anons want to see if their model can read these?
https://files.catbox.moe/9egs1f.png
What program/model do I use to gen cool landscape images?
>>107154891
Who cares when the melody sounds cool
Modern music is garbage precisely because artists put way too much emphasis on the lyrics
https://files.catbox.moe/cqg2n9.png
For those who missed it:
https://github.com/Lakonik/ComfyUI-piFlow
https://huggingface.co/spaces/Lakonik/pi-Qwen
https://huggingface.co/Lakonik/pi-Qwen-Image
https://huggingface.co/Lakonik/pi-FLUX.1
>>107154174
>Ok this thing is kind of insane. I made a workflow to compare it with normal Qwen, and it's basically the same level of quality while taking less than 10% of the time. Works out of the box with loras also. In fact, with a custom lora on a mediocre quality dataset, the results are arguably better with this thing at 4 steps. It is partially counteracting the shitty quality of my dataset. Absolutely the new meta for using Qwen, it will be impossible to go back with how fast it is.
>>107154886
>>107154908
You can try regional prompting, so that in one region of the image it'll follow this prompt, then in another this other prompt. You can also try inpainting.
https://files.catbox.moe/2zhb62.png
>>107154918
>20s qwen gen
not bad, i would still give it a little denoise with something to tidy it up. if you gen with qwen then do a wan denoise, where do you even post that on civit?
>>107154937
>>107154904
how long do your WAN gens take anon?
>>107154937
I just wanna go:
>landscape, big castle, atmospheric, dark clouds, lightning, mountains
What can do that?
https://files.catbox.moe/e3dk4s.png
>>107154920
>>107154918
>6s flux gen with 4 steps
>>107154943
Takes about 3 mins per generation. The workflow I use has an upscaler that basically generates the image twice
>>107154944
Hmm, I see. Any image generator can do that. I thought you were going for a specific composition, etc
https://files.catbox.moe/tcgxrp.png
>>107154883
alright, i'm gonna give this a try with some instrumental tracks and see what happens. this was convincing, lyrics aside (which i know the page said it wasn't trained on lyrics)
>>107154972
you got the patience for that? asking coz I don't. I can get a 540p WAN video at least twice. I know your gens are super good, it's just too long I feel.
>ram prices skyrocketing
>rumors of 5000 series supers being delayed
bros... I'm about to give in. I'm tired of waiting. Should I buy a used 3090 or 5070ti? they are about the same price
https://files.catbox.moe/sp4jkj.png
>>107154981
Yeah, I actually set up a bunch of them in a row then go eat a snack or something, lol. Thanks for the compliment btw. Also: you can cut generation time in half by skipping the upscaler/upres part of the workflow
>>107154997
I'll give it a shot. I haven't seen your level of realism till now.
>>107155004
WAN is perfect to recreate the "modern digital" photography style that you see with most photojournalism and some photographers
Also, it has pretty much perfect anatomical precision, but adding loras (i.e. porn loras) decreases this precision
https://files.catbox.moe/wbkfmb.png
>>107155038
oh yeah the military ones look damn good.
https://files.catbox.moe/rc3h45.png
>>107155038
can you do images like this but with bikini thighhighs girls?
>>107155050
I can, but I don't wanna get the banhammer. Also, I don't have access to the 5090 I use to generate the imgs rn.
I'll post some NSFW next post. I'll just post the catbox link, i won't up the img on the thread
https://files.catbox.moe/3jpm5w.png
>>107155050
+1
>>107154920
nta making the other wan gens
>>107155050
>>107155072
I don't have access to the 5090 I use to generate images rn, sorry. The porn images I have are mostly artsy-fartsy ones
>>107155050
>>107155069
This gen is a rare one made in the "digital photojournalistic" style that I have on hand rn
https://files.catbox.moe/lei0s5.png
>>107155050
>>107155069
An example of my typical "artsy fartsy" gens. lmk if you guys want more
https://files.catbox.moe/y93k43.png
>>107155166
can you generate feminists protesting for free nipples or something feminist, but they're actually hot babes with big tiddies in underwear and wearing thighhighs?
>>107155178
Yess ofc definitely!
man, all these dit models kinda suck. was raping ram really worth having nlp? everything was just fine if not better when we used controlnets and ipadapter. edit models were a mistake
>>107155069
Nice Redditor Gold there, kind stranger!
>>107155178
i too would like more
>>107155166
>>107155178
These are great
>>107155048
>>107155234
What track is this?
>>107155240
le circuit de wan
>>107155240
this is going to be the first playable "world simulator" game. just an infinite race track. probably releasable by someone like deepmind right now
>>107155217
>>107155190
https://files.catbox.moe/32hb6v.png
>>107155188
can't, sorry. this machine can't gen imgs
>>107155225
Thanks a lot, fren!
>>107155234
>>107155222
Awesome gens, fren! Loved how the lead car went to the F-Zero shield recharge strip at the end there, lmao
>>107155240
Reminds me of the start/finish line from Imola, but it's not any particular track
Fencing duel gens, complete pic(s) in the catbox
https://files.catbox.moe/10dpcm.png
https://files.catbox.moe/9g7xb8.png
TW: suifuel (contains happy couple)
https://files.catbox.moe/ngt115.png
last one for now, gtg work. another duel, this time to the death
https://files.catbox.moe/y7jlxy.png
Blessed thread of frenship
>>107155204
recipe for this bread?
>>107154958
Does it work with Chroma since it supports Flux?
>>107155437
try it and find out
>>107154896
Sega Genesis Sonic-style track on "LocalSong":
https://voca.ro/13U9LKll5na4
Things got a bit bad in the end, but overall pretty good
>>107155437
>60s with (30s -> face detailer), 12 steps using 8-step lora. no dice on chroma, it has hardcoded qwen and flux in the loader
Need a wan lora from the Tylers poop festival video
>happily gen some cute anime 1girls at the start of the year
>look away from the screen for a moment
>huge fucking pile of optimizations happen
I feel like unless you're keeping up with this daily, you're just hopelessly left behind, because it's impossible to find information on whatever sage attention or these other -attention fixes are, how to use them, or what they're for; it gets buried under a sea of new or conflicting information.
>>107155866
that would be the case if anyone used said optimizations. unless it's merged into mainline comfyui, most of the good optimizations (both for speed and quality) just get ignored/forgotten.
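For the anons lost in the -attention soup: SageAttention is just a pip package plus a launch flag in recent ComfyUI builds, as far as I know. A minimal sketch to check what your environment actually has (the flag name is my assumption from recent builds, verify with `python main.py --help`):

```python
# Minimal environment check before chasing attention optimizations.
# Assumes only a working Python install; which packages matter is my guess.
import importlib.util

for pkg in ("sageattention", "flash_attn", "xformers", "triton"):
    found = importlib.util.find_spec(pkg) is not None
    print(f"{pkg:15s} installed: {found}")

# If sageattention is present, recent ComfyUI builds enable it with:
#   python main.py --use-sage-attention
```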
>>107154100
>>107154342
Nope, doesn't build with the downgraded toolkit :(
Yaps about nvcc not existing after idling for half an hour. I guess the other anon who warned about incompatibility was right.
Gonna wait(TM) for official support or make a separate docker for it later.
What do you want the most for a local model?
https://poal.me/7udx6s
https://poal.me/7udx6s
https://poal.me/7udx6s
https://poal.me/7udx6s
>>107156022
anyone voting anything other than video is retarded, images are already mostly there, the biggest thing we need is an edit model without vae, video has a long way to go in comparison
>>107156045
>anyone voting anything other than video is retarded
*or a vramlet
>>107156045
yep this was my take too
Retards rise up
>>107156045
Video models are less suitable for prompt alignment for a single frame
>>107156045
I'm excited for video because I know video brings audio in with it immediately as well. Immediately ASMR and braps and sound effects and short dialogue sentences and memes and swears and so much more are solved before we even get a text-to-audio model that's good
You know deep in your hearts that you will not be able to run Sora 2 grade stuff without 48gb vram and waiting 10+ minutes per video even with distillation and quants
>>107154918
>ctrl f "edit"
>zero results
does it work for qwen-e
>>107156130
correct, we will have something much better than dogshit sora lol
>>107155799
Lame, ty. Glanced at the code and it seems like there's a few places that would need adapting
>>107156157
I am an openai hater as well, but come on anon, let's not cope that way
>>107156170
toy model for memes whose only great thing is the fact that they trained on the entire youtube dataset, without that it's literally worse than wan 2.2
>>107155614
well it got the genesis instruments right for sure
>>107154918
Loaded this up and I'm getting 20 second Qwen gens even with my shitty setup, what sorcery is this
>>107156269
vram?
What is the current meta lora for speeding up wan 2.2 14b i2v?
>>107156269
16GB, RX 9070 XT.
>>107156194
It's still superior to any open video model in existence by a country mile, and that will remain true for a long time. To this day, there isn't a single local model that can pull off some of the stuff that dalle3 could in 2023
If you cherrypick things, Wan does mangled outputs just as often
>>107154918
does it work with gguf?
>>107156310
>If you cherrypick things, Wan does mangled outputs just as often
not by a mile
sadly for you, the apicuck model can't be tested 1:1 with local because it's locked into a chastity cage, like all who shill for it
>>107156335
>sadly for you, the apicuck model can't be tested 1:1 with local because it's locked into a chastity cage, like all who shill for it
You do realize there are other possible prompts other than porn and politically incorrect stuff, right? So yes, they can be compared
>>107156282
Let me be more clear. Apparently I am still using this from 3 months ago:
https://huggingface.co/lightx2v/Wan2.2-Lightning/blob/main/Wan2.2-I2V-A14B-4steps-lora-rank64-Seko-V1/high_noise_model.safetensors
Is this:
https://huggingface.co/lightx2v/Wan2.2-Distill-Loras/tree/main
or anything else better than it?
>>107154918
>uses own ksampler
>uses own model loader
INTO THE TRASH IT GOES
>>107156393
NTA, compared =/= 1:1
>>107156393
Wow. I didn't know that. You're telling me now for the first time
>>107156485
You're welcome anon. It's enlightening indeed to know there are more prompts other than "1girl big bobs and vagene", who would have guessed!
>>107156458
There also seems to be a moe distill lora...
>>107156509
damn, gotta step my game up, i mean imagine a 1girl with smal bobs... it got my creative juices flowing
(and unretarding for a minute: curiosity in how to set up those matrix comparison graphs people post every now and then, since those can be programmed, i think?)
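On the matrix comparison graphs: yes, they can be programmed. Some anons use grid node packs, but pasting gens together with PIL is enough. A minimal sketch (the filenames and cell sizes are placeholders) that tiles images into a labeled grid, rows for one variable and columns for another:

```python
# Minimal XY-grid builder with PIL: each row is one lora, each column one strength.
# File layout is a placeholder assumption; any list-of-lists of image paths works.
from PIL import Image, ImageDraw

rows = [
    ["seko_v1_s0.8.png", "seko_v1_s1.0.png"],
    ["seko_v2_s0.8.png", "seko_v2_s1.0.png"],
]
cell_w, cell_h, label_h = 512, 512, 24

grid = Image.new("RGB", (cell_w * len(rows[0]), (cell_h + label_h) * len(rows)), "white")
draw = ImageDraw.Draw(grid)
for y, row in enumerate(rows):
    for x, path in enumerate(row):
        img = Image.open(path).resize((cell_w, cell_h))
        grid.paste(img, (x * cell_w, y * (cell_h + label_h) + label_h))
        draw.text((x * cell_w + 4, y * (cell_h + label_h) + 4), path, fill="black")
grid.save("xy_grid.png")
```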
>>107156523
There also seems to be v1030 that got deleted
https://huggingface.co/Kijai/WanVideo_comfy/blob/main/LoRAs/Wan22_Lightx2v/Wan_2_2_I2V_A14B_HIGH_lightx2v_4step_lora_v1030_rank_64_bf16.safetensors
I don't expect a wall of text spoonfeeding me the strengths and weaknesses of all of them, but just what are anons here using in their daily gens?
>>107156157
we still don't have DALL-E 3 at home, stop coping
what's a good free software for managing gens? preferably one that shows the metadata like prompts. I'm starting to have too many. bonus points if it does wan too, though idk if wan actually has metadata yet. I only just started with that
>>107156736
correct yet again, we have something much better than dalle 3: the possibility to train a lora on anything you want and generate with any parameters you want with no limits, including training a dalle 3 style lora itself like picrel
>>107155204
It upsets me that I can't reproduce this solid vectorized style.
nano banana 2 is too good
its over for local
>>107156761
lora https://civitai.com/models/2093591
>>107156772
The better the proprietarycuck edit models are, the better the outputs the new qwen image edit model can be easily trained on. thanks for spending millions for local to snatch it all up for free before training a clothes remover lora within a couple hours lol
>>107156804
based
>>107156761
it's not about the style, or any specific object/concept, retard
that you thought it was tells me all I have to know about your intellectual level, you don't understand what dall-e 3 has that local still has not, and you never will understand because you're a moron
>>107156845
>no argument
oof, thanks for conceding
>>107156462
This. I can't fucking use this in my workflow. I needs my snake oil!
>>107154826
>not collaging the real braphog
>>107156772
It still can't do maps. (Courtesy of some plebbitor.)
But yes the whiteboard math equation stuff is impressive.
>>107156335
>not by a mile
No local model can gen multiscene videos WITH audio at the same time, so yes, nothing local comes close to it currently
The closest thing to it is this Wan fine-tune for multiscene, which has no audio:
https://holo-cine.github.io/
(and I haven't seen any anon use this)
Apparently they will release the weights for an audio component later though, so we'll see (there is a HoloCine-audio in the roadmap as well as an I2V version)
>>107156920
no proprietary model is gonna allow you lora creation for whatever you want nor to tweak every gen parameter, and that is the thing that actually matters. everything else can already either be done locally or can be done locally with more manual work worst case scenario, but proprietarycucks literally CANT do these things and wont ever be able to in any way.
>a- aunt jemima... is that OK to wear in public?
>>107156854
keep on coping, copeboy
>>107157028
>no argument
already accepted your concession lil bro, keep crashing out
>>107156982
Very nice anon
>>107157052
you do whatever it takes to keep the cope alive
is this you?
>>107156940
>everything else can already either be done locally or can be done locally but with more manual work worst case scenario
lol, lmao even
>>107157092
>no argument
this has to be a bot, right? lol
Most important things for a new pc if I wanna do decent video gens in a non-absurd timeframe? I don't wanna reply to every webm in here asking for pc specs, but if someone wants to post some with their specs/how long it took I'd greatly appreciate it
Budget is about 2.5k for a new pc
>>107157098
16gb vram is the single most important thing. more than that is better. less than that you're fucked.
>>107157092
of course, anyone who laughs at your lack of intelligence is a bot
the argument is that you're a retard, you give more weight to what can be done locally just to poop on the things local can't do yet, that's moron behavior
>can be done locally but with more manual work worst case scenario
ANYTHING can be done locally but with more manual work, just grab a camera, hire actors, make a set, film it, pay jeets to VFX it and there you have it, no Sora 2 needed
it's a useless statement, you absolute shit for brains baboon
the whole point of AI is to have less manual work, if Sora 2 can do it without the manual work then it is (even if just for now) better
>>107157098
nvidia gpu is the only thing that really matters. 16gb vram+. 24gb vram is practically required if you want top quality video gens. minimum 64gb ddr5 ram for offloading model cache if needed. cpu isn't important but you'll want something made within the past 10 years at least.
Question to the anons using Wan2.2 text-to-video (not I2V), which lora are you using?
>>107155166
crazy workflow, nice
>>107155364
im so lonely bwos
>>107157280
There was this, released two days ago, if you're talking about lightx2v:
https://huggingface.co/lightx2v/Wan2.2-Lightning/tree/main/Wan2.2-T2V-A14B-4steps-lora-rank64-Seko-V2.0
>>107157141
>be proprietarycuck
>you cant train a lora to add a style to the model
>you cant train a lora to add a character or a person to the model
>you cant train a lora to add a concept to the model
>you cant train a lora for anything at all
>you cant finetune the model
>no big company can finetune the model like many companies are doing right now with wan
>you cant have anyone research around the model at all to improve its architecture, find optimization avenues, fix issues, change specific layers, text encoders, vaes, learn how to make better models in the future and advancing the entire ai industry itself etc
>you cant generate gore
>you cant generate pornographic material
>you cant generate anything else someone else would deem "problematic", no matter how mundane it might be
>you cant generate anything they at any point in time say you cant generate in the future when they change their mind overnight
>you cant generate anything at all if their servers are overloaded, not online, or broken
>you cant generate anything without it being logged and all your data harvested and sold
>you cant control dozens of generation parameters that would allow you to have precise control over what you generate, no matter how specific
>you cant write nor test out new generation parameters like new specialized samplers and schedulers
>you cant do anything about it if they decide to lobotomize the model you are using or remove it completely overnight, never being able to truly recreate what you once did and liked
>you cant test out new papers coming out with new technologies like completely changing how an entire portion of inference works, like completely changing how cfg works, completely changing how negative prompting works (https://github.com/hako-mikan/sd-webui-negpip) etc etc
As a proprietarycuck you are paying to be in a limited and spied-on cuck cage, and you lash out when someone calls out your evil corpo master and your pathetic cuck predicament.
>>107156982
how the fuck are you guys, like pancakechad for example, genning animate inanimate objects like this? fuck this is so good.
man i know my brain is rotted when i find pancake and syrup women hotter than any e-girl kek
>>107157453
very carefully
>>107157470
i asked how you gen them, not how you fuck them!
but true.
>>107157311
No lora I found works well with Holocine (the multiscene fine-tune)
WAN 2.2 anons: just bought a 5070ti and I've been playing around all weekend to get a good workflow for keyframing a longer animation
>Generate ~12 separate 'keyframes' in SD for character LORAs
>Inpaint poses/details - create depth masks to quickly delete the background in photoshop to keep the character in a white void for WAN
>send color 'keyframes' 1 + 2, 2 + 3, to FFLF2V to get a crude timeline of 2-3 second clips (turning, raising, pointing, draining a pint glass, etc.)
>i2v Q_8 gguf in the comfy 'workaround' gets jarring "flashes" on reaching the last frame as it quickly tries to compensate for color degradation, but the LORAs are made for i2v.
>Inpaint Q_8 gguf seems to go faster and solves the flashes, seems to take the LORAs but i'm still unsure how well it will work long term.
curious how to proceed here:
>finish all the 2-3 second clips in i2v and try to assemble them in premiere
>keep playing with the inpaint model to get it to follow styles so I only need to fix the front half in post or re-gen
>learn how to use VACE and how to use the last and first 8 frames of each clip to preserve the motion
>take the entire 24 second video with jank coloring and learn VACE v2v to depth mask the entire thing and regen.
>>107157199
>minimum 64gb ddr5 ram for offloading model cache if needed
I have 32 and have been holding off because prices are gay. is it actually super necessary?
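If Premiere ends up being overkill for the stitching step: same-codec clips can be concatenated losslessly with ffmpeg's concat demuxer. A sketch assuming all clips share codec/resolution/fps (the filenames are placeholders):

```python
# Concatenate same-codec clips without re-encoding via ffmpeg's concat demuxer.
# Assumes identical codec/resolution/fps across clips; filenames are placeholders.
import subprocess

clips = ["clip_01.mp4", "clip_02.mp4", "clip_03.mp4"]
with open("list.txt", "w") as f:
    for c in clips:
        f.write(f"file '{c}'\n")

subprocess.run(
    ["ffmpeg", "-f", "concat", "-safe", "0", "-i", "list.txt", "-c", "copy", "out.mp4"],
    check=True,
)
```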
>>107157556
vace
>>107157556
>is it actually super necessary?
No, but the excessive swap use you get with 32 gigs slows generation down considerably.
Why are the vue nodes so fucking huge? I want to use them, but this is ridiculous.
>>107157592
It's much faster with 64gb+ ram
>>107157453
prompt for the original one:
>professional 4k high resolution hyperrealistic 3d render by Disney Pixar of a beautiful nude curvy woman slime girl who is made entirely out of maple syrup. Her whole body and face are translucent and seethrough syrup. Her hair is made out of melting butter. She sits cross-legged on top of a huge stack of pancakes. Her body melts onto the pancakes. The pancakes are on a modest porcelain plate in a 50s American diner restaurant.
>raytracing, beautiful lighting.
standard chroma WF
>>107157199
What does offloading model cache mean, and what do you mean by "16gb vram+" and "24gb vram"?
Easy Cache, Lazy Cache, Apply First Block Cache, Wan Video Tea Cache, Wan Video Mag Cache, Wan Video Tea Cache Native, Wan Video Easy CacheWhich cope cache node do you use and at what settings?
>>107157592
would 96 make any difference or is that just pointless? the price ladder from 64 is a lot narrower than it used to be due to being a weirder size + slower clocks for XMP
>>107157565
>Vace
what's the point of the 3gb "Module" Vace FUNs at https://huggingface.co/Kijai/WanVideo_comfy_GGUF/tree/main/VACE versus the large models at https://huggingface.co/QuantStack/Wan2.2-VACE-Fun-A14B-GGUF/tree/main/HighNoise?
Do you load the modules in the same chain as the regular i2v (or inp) model to save on disk space while achieving the same result?
>>107157642
For example, Wan2.2-I2V-A14B-LowNoise-Q8_0.gguf is 15.4gb. If you only have 16gb of vram on your gpu, that leaves you with 0.6gb of vram. Keep in mind, the text encoder + loras + vae are also stored in vram. Since all of that can't fit on a tiny 16gb card, you can set a specific amount of the model to be swapped to your system ram, i.e. 10gb of the wan model offloaded to system ram. This will allow you to gen without running out of memory. Offloading to ram is much, much slower, but it works. Optionally, you can use a lower-quant version of the model, like Wan2.2-I2V-A14B-LowNoise-Q6_K.gguf which is 12gb, but lower quants = lower quality.
>>107157642
He's saying you should aim for 16gb vram minimum but 24 is preferable. Offloading is when you can't fit the entire model into vram so you use your system ram. wan 2.2 q8 is like 15 gigs(?) for one of the models
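To make the offloading arithmetic concrete, a back-of-the-envelope budget check; the component sizes below are illustrative ballparks, not measurements:

```python
# Rough VRAM budget: anything that doesn't fit must be offloaded to system RAM.
# All numbers are illustrative assumptions, not measured values.
vram_gb = 16.0
usage_gb = {
    "wan2.2 low-noise Q8_0": 15.4,
    "umt5 text encoder (quantized)": 6.5,
    "vae": 0.3,
    "loras": 0.5,
    "activations/overhead": 2.0,
}
total = sum(usage_gb.values())
print(f"total needed: {total:.1f} GB, vram available: {vram_gb:.1f} GB")
print(f"offload to system ram: {max(0.0, total - vram_gb):.1f} GB")
```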
>>107157685
For Wan2.2, you don't use any of them.
>>107156022
imgchads reign eternal
>>107157370
>>107157713
Is there a reason why?
>>107157705
>would 96 make any difference or is that just pointless?
Hard to say really. Depends on the motherboard combo I guess.
>>107157740
Video generation is iterative.
https://civitai.com/models/2114848/2000s-amateur-photography
As requested. Not perfect, but reduces vaginahorror and manfaces.
I tried Holocine and I could not get the same results as their demo even with 15 seconds lol
I used the same prompt
I obviously had to make some sacrifices like using distillation models with 5bit quants
"b-but local is better than saas, trust me bro!"
"results are shit? It's your fault you are poor and don't own an H100, the pinnacle of LOCAL gpus :^)"
>>107157641
thanks <3
>>107154956
this is AI?
>>107157987
That looks like a zoomer's idea of what 2000s photography looks like, and some of the photos in the showcase don't look "amateur" at all. At least search for photos that used popular cameras from that time like Sony Cybershot, Olympus, Canon PowerShot etc, or search for old myspace photos or older photos from Flickr.
t. Millennial
>>107157987
bruh moment, as the kids say. https://civitai.com/models/978314/ultrareal-fine-tune?modelVersionId=1413133
>>107157987
wait regular chroma cant do vageen? wtaf
>>107158042
Dataset is mostly from the 2000-2010 era.
>>107158090
It can, but it gets confused.
>cold weather
>gpu 100% to warm room
Ohh shit it is GOON season
But for what shall i goon to?
correct me if im wrong, but is there any reason to train the high-noise half of a character lora for wan? there's no motion, so what would be the point?
>>107158114
>Dataset is mostly from the 2000-2010 era.
I am a Millennial boomer who lived through that era and at least the showcase images don't resemble the amateur pics from that era at all
>>107158147
It's less about "motion" strictly than about denoising strength.
You might be able to make do with just the low-noise lora if your character looks like a normal human. But for something like, say, Kirby or Sonic, you probably want both.
>>107155187
>>107158193
I see, thanks. I've been experimenting with my character lora while using other NSFW loras, and I noticed that the low-noise half of some loras forces my character (person) to look like whatever person that lora was trained on. How can I avoid that? Increase the strength of my character's LOW lora? remove the NSFW lora's low model? I've tried both but haven't found anything solid that works. I can't get rid of the low lora for some NSFW loras because wan needs that data to create, for example, a penis or cumshot. The twerk lora for example always makes the ass bigger and i don't want that. its so annoying. lowering the strength of the nsfw lora helps but also reduces the motion
n00n0
>>107158244
nani kore wa yameto my ramenu betta stoppa acting up i'm gonna nækædæshi my ramanu
>>107158224
>How can I avoid that?
I should note that I never trained a WAN lora, but this seems like a generic lora compatibility issue to me. Try lowering the strength of the other lora?
>Increase the strength of my character's LOW lora?
Maybe just a bit, if you are desperate.
>remove the NSFW's low model?
Probably not.
>The twerk lora for example, always makes the ass bigger and i don't want that.
This just means the person who trained it trained on big asses. Train your own with a diverse dataset of asses of all sizes?
flux/chromosome users, how do you handle your text encoders? do you use specific quants? i'm starting to wonder if my shit gens are a product of what i'm using, but i'm not sure. cumfartui is very confusing as well so that's a variable. the default flux krea workflow is 3 whole seconds slower than an old workflow i was using earlier this year..
>>107158330
keep t5 at fp16 imo.
>>107158330
q8 chroma, fp16 clip, 26-35 steps, euler simple/beta
try "aesthetic 1" in negative
it doesn't understand left/right but far/near seem to work
>>107158350
>>107158378
thanks. i guess i was trying too hard to save on vram by lobotomizing the text models.
>>107158385
Why do text encoders struggle with directions? That's not an isolated incident.
Quick theory: is this because right/left can mean both the viewer's right/left and the character's right/left, which ends up confusing the UNET during training?
>>107157987
thank you for your hard work
So sounds like it’s worth going down a generation to the 4x cards if I want 24gb vram at a more reasonable cost
>>107158475
>>107158542
neat
i literally gooned for 12 hours today
>>107158554
so a 4090 then? aren't they like 1500 dollars
>>107158162
I guess I could rename it, fair point
>>107158435
npnp
>>107158607
Can you catbox your picrel or a similar gen?
>>107158665
https://files.catbox.moe/0andv6.png
>>107158723
Thanks!
>>107158781
>>107158726
>no large breasts, wide hips
>he doesn't (large breasts, wide hips, thick thighs:1.5)
>we may be getting a $2k stimulus check
and i'm 100% going to use that money to buy a 4090, kek
>>107158919
You should buy stocks, dummy. Preferably OpenAI stocks of course lol
>>107159071
but i want faster gens right NOW
Hello, I am from the TouHou AI general on >>>/jp/2huAI/. It is a very nice and good quality general, but it is slow to answer simple questions. I have a question about making a lora. Is this the right place to ask?
>>107159091
you could've just asked the question instead of wasting a post asking for permission to ask a question
ooooh baby SongBloom is cookin up some songs real nice
hear that sizzle & smell dem onions
>>107159161
https://vocaroo.com/1myH3aeJX4hT
Trying again, the tricky part is trying to get lyrics adherence but the right amount of song stealing.
>>107159282
nice, can she SING?
>>107159091
Going to bed now, might answer your question hours later if it makes sense, if no one else answered it and if I don't feel too lazy.
>>107159296
nobody cares bro. bro go to the poop festival, in your dreams. bro
>>107159294
dunno
>>107159308
>slow motion shit
lightx2 crap
>>107159256
https://vocaroo.com/1cHl7bT8AHk0
>no new better cards
>no new better models
it's so over for local
>>107159481
I'm literally posting SongBloom gens, dearest sir of the African persuasion.
>>107156778
cth-uwu.
>>107159495
do you know how you would prompt this?
https://www.bbc.com/news/articles/c1wl5jp94eno
I genuinely have no idea
Is normal forge still the only UI that uses Gradio 4?
>>107159504
you could try this maybe? it does alright
https://huggingface.co/spaces/fancyfeast/joy-caption-beta-one
>>107159519
>https://huggingface.co/spaces/fancyfeast/joy-caption-beta-one
>This photograph features a large, shiny, blue, abstract sculpture of a humanoid figure with a rounded, bulbous body and simplified, elongated limbs. The sculpture has a smooth, glossy texture, reflecting the surrounding environment. It stands outdoors on a paved area with a grassy patch behind it. In the background, there are palm trees and a building with a white facade and a red horizontal stripe near the top. The sculpture's head is slightly tilted downward, and its expression is indistinct due to its abstract nature. The bright blue color contrasts with the green grass and the white and red building.
I may try it.
>>107159452
nogen crying
>>107159562
I gen 24/7 but ok
>>107159618
cry moar kid
>>107159618
win VOMIT dows
local still can't compare to grok imagine
>>107159643
can grok do explicit porn? no? who cares.
>>107159618
WHOA lookout, we got a WINDOWS guy here
Any good alternatives to lightx2v, other than waiting 10 minutes for a gen?
>>107159679
Personally, I think SongBloom is the best replacement, but some may disagree, for very stupid reasons.
>>107154861
lol
>>107159561
Running it
>>107159652
Yes. I use freeBSD, headless debian and primarily Windows, anon.
>>107154883
thanks
SongBloom is way beyond lol
>>107158919
>4090
Shit, $2k is like 8GB of RAM these days.
>>107159727
>>107159452
>lightx2 crap
let's hope the new 4-step distillation method will make the slo-mo shit disappear >>107154918
>>107159737
why not just buy a 5090 at that point
>>107159816
I've said it a billion times but slow-mo isn't the only problem with lightx2. it nukes the liveliness of animations. everything is simply less animated. fewer things move.
>>107159702
image & text go together, but cba to make a video, so just play it and stare at the picture. or ask someone to wan it, I guess.
https://vocaroo.com/1otWsnlu7TwZ
>>107159835
That's an extra $700-900.
At that point may as well buy a RTX 6000 Pro. Just a few extra $$
May as well buy an H100
Shit, may as well buy an H200
>>107159835
:^)
so, apparently the 5090 sucks for ai, anyway.
So the only recent local developments are speedcrap for turdworld poorfag shitskins? Local really died with wan2.5, what a letdown
>>107159840
5090 is $2k bud
a pickle, for the knowing ones
>>107159308
>Uses ai to create something that basically doesn't exist. An east-asian with giant tits that aren't fake. This is the future of stable diffusion.
>another day, another lora
>>107159910
>2.2, still slow mo
https://vocaroo.com/1fh5yWC322DT
used joy caption on the top image from
https://www.artforum.com/features/yuk-hui-daniel-birnbaum-interview-1234733869/
>Photograph of a clear, rectangular ice cube suspended in mid-air against a bright blue sky with scattered white clouds. The ice cube is transparent with visible internal crystal structures and slight surface imperfections. In the background, there are blurred green trees and a tall evergreen tree, indicating an outdoor setting. The image has a sharp focus on the ice cube, with a shallow depth of field that blurs the background. The sunlight illuminates the ice cube from the front, highlighting its transparent and textured surface. The overall composition emphasizes the contrast between the sharp, detailed ice cube and the soft, blurred natural background.
chroma hd and SongBloom
>>107159910
What lora is that
>>107159960
who cares, they're all literally the same.
>>107159960
https://civitai.com/models/2063310?modelVersionId=2334783
just published the wan version
>>107157987
how many images for the dataset?
>>107159982
Nice
>>107157370
Unfathomably based
>>107157290
don't worry anon, that's just your dumb hormones
women are overrated
>>107160004
600, next version has 648
>>107154918
this is way closer than the lightning method, impressive
>gooning to your own gens
isn't this just a more convoluted and expensive way to goon to your own imagination? what's the point?
>>107160054
damn dude, i rarely go over 20 images. have you done any tests with smaller training datasets?
>>107160083
aphantasia
>>107160106
>have you done any tests with smaller training datasets?
Yeah. I prefer larger datasets for more variation
>using supervacetools to make long video
>long pauses between each gen
>swap the "patch sage attention kj" node with the "model patch torch settings" node
>no more retarded long pauses in between gens
>near double the speed
fp16 accumulation is pretty dope, wonder if it'll work on wan2.2
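For reference, the torch switch that node flips is, to my understanding, fp16 accumulation in cuBLAS matmuls; the exact attribute names below are my assumption about what current PyTorch exposes, so the sketch guards with hasattr:

```python
# Toggle fp16 accumulation for cuBLAS matmuls, roughly what the
# "model patch torch settings" node does. allow_fp16_accumulation only
# exists on newer PyTorch builds (hence the guard); both flags trade a
# little precision for speed.
import torch

matmul = torch.backends.cuda.matmul
if hasattr(matmul, "allow_fp16_accumulation"):
    matmul.allow_fp16_accumulation = True
matmul.allow_fp16_reduced_precision_reduction = True
print("fp16 accumulation:", getattr(matmul, "allow_fp16_accumulation", "n/a"))
```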
>>107159932
this i2v or t2v? what x2v are you using?
>>107157370
train a lora to make Wan have the prompt adherence of Sora 2
https://github.com/wallen0322/ComfyUI-Wan22FMLF
Improved tech just dropped.
And a qwen edit upscaler. https://huggingface.co/vafipas663/Qwen-Edit-2509-Upscale-LoRA
>>107160391
>Wan fuck my life
Start, middle, and end frame is a nice addition.
>>107160391
>- Dual MoE conditioning outputs (high-noise and low-noise stages)
>- Multi-motion frames support for dynamic sequences
>- Automatic video chaining with offset mechanism
>- SVI-SHOT mode for infinite video generation with separate conditioning
>- Adjustable constraint strengths for each stage
Interesting.
SPARK chroma fixed chroma.
>>107160523
Thought svi was for 2.1 and 2.2 5b? Would it properly work with 2.2 14b? Also pretty sure there's either going to be dedicated nodes or a comfyui native implementation, soon hopefully
>>107160582
Native comfyUI nodes oftentimes are riding free on ideas from others, or are poorly supported just for market capture. Dedicated nodes from other people tend to get faster updates and work better. More than once I've been disappointed with a Comfy implementation, i.e. inpainting and wan2.2
>>107160565
Show realism, then we talk
>>107160391
I was just asking about something like this a few threads ago. There are a ton of multi-image pixiv illustrations as well as my own ai ones that would make for great animation with this.
What's the state of the art for local photorealistic video gen?
PSA: pi-flow combined with loras seems to be slightly last slopped than the normal Qwen-Image experience. To get rid of plastic skin slop and "cinematic" stuff, avoid using words like "a photograph of (...)" or "an image of", and use "Amateur footage of (...)" instead and you will consistently get better photo-realistic results. Recommended model: Lenovo Ultrareal
>>107160622
makes me wish a different UI got all the community attention. anons are too doompilled on comfy since it focuses on saas more than anything nowadays
>>107160622
>>107160804
It's not too late for (You) to contribute to stable-diffusion.cpp
>>107160772
>Lenovo Ultrareal
My favourite LORA
>>107160772
>slightly last
*Slightly less. I am sleepy
>>107160083
it's gooning plus gambling on whether the prompt matches what you had in mind beforehand. twice the degeneracy and an excuse to edge endlessly
>>107156856
now that is just ugly
>Try to generate a headshot
>The top of the girl's head is always out-of-frame
How can I fix this?
>>107160837
tell the Chinese to switch to it, it's run by one of their own so I don't get it. it would also gatekeep western companies since they hire Indian slaves to poothon all day. would be hilarious if they changed their minds and cucked america by doing that though
In case you guys want to be disappointed in the state of local: prompt Qwen-Image for centaurs.
>>107160965
She looks like your average Bong woman
>>107160980
If putting "out of frame, cropped" in the negative prompt isn't enough you can always just outpaint.
Another option is to resize and draw in the hair color in an editor, then inpaint that to match the rest of the image.
Failing that, you could always generate a taller aspect ratio full body shot and crop out what you don't want to use.
>>107160980
Add more details to the prompt of what you want to see: eyes, hair, etc.
Does anyone know the default wan2.2 settings without light loras? 20 step (10 h + 10 l) and euler + simple, high start step 0 end step 5 / low start step 5 end step 10000?
>>107161034
Best I've got.
>>107160083
>isn't this just a more convoluted and expensive way to goon to your own imagination? what's the point?
For me it's a combination of aphantasia like the other anon said (I can't visualize things) as well as "playing with dolls" (some anon once mentioned that genning 1girls is the same delayed brain development as people who play with dolls, and I 100% agree because I independently came to the same realization/conclusion)
>>107161131
"Default" wan 2.2 to me sounds like 50 steps (25 each) on the unipc sampler with default CFG and flow shift values
>>107161160
Got it to work with the 20 steps but it only genned once. I tried genning again and got an error of:
CLIPTextEncode
'GGUFModelPatcher' object has no attribute 'named_modules_to_munmap'
I've already updated everything to the latest version. Doesn't seem to want to work without light loras, kek
>>107161034
It can, however, do the reverse (prompt was 'a headless horse', after some seed hopping)
>>107160582
The node doesn't work even without the svi lora.
>>107161206
Never seen that error before, you can open your ComfyUI folder in Visual Studio Code and give Copilot the error and see if it can help figure it out. Since this is a clip error I'm assuming you're doing image2video? Since only wan i2v should be using a clip model (clip vision to read your input images)
Text to video and image to video have different settings and nodes required. The default comfy workflows on the GitHub for wan are fp8 scaled and don't have any lightning enhancements, so you can use those to do your full-step gens I guess
>>107156291
on linux? please share your setup, what distro and kernel version
guys is qwen fp8 better or q8?
>>107161341
Yes, it's i2v. I just switched to the multigpu unet and clip nodes instead and that solved the problem. Yeah, there's always some kind of new error every update
>>107161397
>guys is qwen fp8 better or q8?
Q8 is fp8 with some layers kept unquantized so it should be strictly better
>>107161470
thanks. I don't know what it is with qwen, but it takes so long to gen. Wan videos are so much faster!
>>107161470
>Q8 is fp8 with some layers kept unquantized
No. Q8 is basically FP16. It is much better than FP8. This is common information you can google.
>>107161497
What you said didn't invalidate what I said. Q8 is some of the blocks at int8 and some blocks at f32. I'm planning on testing t5_xxl with the different fp8 versions versus Q8_0 today
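For anyone curious what Q8_0 actually is under the hood: per the GGUF spec it's blockwise int8 with one fp16 scale per 32 values (norms/biases stay at higher precision), which is why the round-trip error stays tiny compared to a straight fp8 cast. A numpy sketch of the Q8_0 round trip, for illustration only:

```python
# Q8_0 round trip: blocks of 32 values, one fp16 scale per block, int8 weights.
# Illustrates why Q8_0 stays close to the fp16/fp32 original.
import numpy as np

def q8_0_roundtrip(w: np.ndarray) -> np.ndarray:
    blocks = w.reshape(-1, 32)
    # scale = max |value| in the block mapped to int8 range, stored as fp16
    scale = (np.abs(blocks).max(axis=1, keepdims=True) / 127.0).astype(np.float16)
    safe = np.where(scale == 0, 1, scale)  # avoid div-by-zero on all-zero blocks
    q = np.round(blocks / safe).clip(-127, 127).astype(np.int8)
    return (q.astype(np.float32) * scale.astype(np.float32)).reshape(w.shape)

w = np.random.randn(4096 * 32).astype(np.float32)
err = np.abs(w - q8_0_roundtrip(w)).mean()
print(f"mean abs round-trip error: {err:.5f}")
```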
>>107161474
Don't you use a speed-up lora?
>>107161642
theres so much quality loss tho
>>107161648
are you real? finally a real person that agrees lightx2 and speedup loras are fucking dogshit
>>107161666
umm yeah, if u dont think theres quality loss then ur blind af
>>107161666
No one is denying that. But a genuinely improved t2v version came out recently; an i2v version did too, but it was worse than previous ones, so only t2v got an improvement.
>>107161341
i've been doing i2v without clip vision and it seems just fine, would i get better results if i add it?
>>107161730
I have absolutely no idea to be honest since I hardly do i2v locally; I never used it either but it's supposed to be required. It works on a GGUF workflow without clip vision but broke for me using kijai's nodes iirc
But I am also out of the loop on what nodes to use nowadays. My 2.2 workflow is full of deprecated and beta nodes since it just works, it's GGUF and there's no actual benefit I get from remaking it until there's a new model to run, which will have its own nodes and workflow needed anyways probably
>>107161730
>>107161775
Asked perplexity about clip vision for wan2.2...
>Clip Vision is generally not necessary or beneficial for WAN2.2 workflows, according to user reports and in-depth testing from the image-to-video AI community. WAN2.2 is designed so that it no longer relies on CLIP embeddings; this marks a shift from previous models like WAN2.1, which did have image cross-attention layers that could utilize CLIP vision. When Clip Vision is supplied in a WAN2.2 workflow, it is simply ignored, so it does not improve generation quality or prompt adherence, and may actually slow down video creation times by several minutes.
>>107159091
Don't mind the mean people
>>107161648
There's a new one that promises to be better, look up the thread. Alternatively, gen in stages, and only use the full model during the critical ones. I don't think I could tolerate genning with Qwen at all, but at less than 10 seconds per preview without optimizations, I'm chugging along merrily. But then I've split my workflow into so many sampling stages, the workflow is becoming unwieldy by itself.
>>107161882
>and may actually slow down video creation times by several minutes.
Good to know.
>>107161882
Then kijai is more of a vibe coder than I thought lol. Pretty sure it's not a false memory that I needed to download clip_vision_h in order to get one of his nodes to stop complaining, even if that node ended up never using it.
I'm bored and my new job starts in a month, so I'll spend today making an "opinionated 2.2 t2v guide/recommendation" rentry as well I guess
>>107161926
Probably a hallucination. Clip vision is like a 70mb model so even if it's being loaded and unloaded every generation without doing anything it can't be adding more than a couple of seconds max
>>107156462
>Stanford University, Adobe Research
yeah i'm not going to be installing that adobe research shit on my machine. I really just don't trust them not to sell my data for research purposes using some fuckery inside of their nodes. also no gguf support? TRASH
>>107161947
This post activated the neurons in my brain that reminded me that Tel Aviv University made a really good text-to-video model and put out a paper and then never released it. I think this was either before or during the wan 2.1 era
>>107161938
>Clip vision is like a 70mb model
>>107161995
clip_vision_h used for wan21 is like 1gb. still, on an ssd you'd barely notice it. if wan22 doesn't use it then there's no reason to have it.
>>107161995
I was wrong, but I also swear I downloaded a tiny clip vision h as a .pt before
>>107162021
>clip vision h for wan 2.1 only
That explains it. Thank God 2.2 got rid of the double text encoder autism that hunyuan introduced. Too bad we got refiner autism instead
do you get better prompt adherence with fp16 clip compared to fp8 scaled?
>>107162068
fyi, just ask claude this type of question. higher precision models will always be better. how much of a difference in quality/prompt adherence will always be subjective and debatable because it entirely depends on the prompt and model.
https://claude.ai/
>>107162109
please go away
>>107162123
you asked a question and got an objectively correct answer. if you're upset that it was ai generated while also posting in a general about generating ai content then you're a fucking retard.
>>107162140
>>107162109
>is model A good?
>according to benchmarks model A is the best...
>>107159646
why don't you just have sex?
new
>>107162296
>>107162296
>>107162296
>>107162296
>>107162259
the kind of sex i want is forbidden.
>>107162300
stop lusting after horses
>>107162300
ask the friendly fbi agents to kindly break all of your limbs
>>107160349
t2v
>>107162606
check the new thread, but is this with the new seko v2.0 version of lightx2v? in my testing slow motion has gotten much better most of the time