Discussion of Free and Open Source Text-to-Image/Video Models and UI

Prev: >>106464276

https://rentry.org/ldg-lazy-getting-started-guide

>UI
ComfyUI: https://github.com/comfyanonymous/ComfyUI
SwarmUI: https://github.com/mcmonkeyprojects/SwarmUI
re/Forge/Classic: https://rentry.org/ldg-lazy-getting-started-guide#reforgeclassic
Chromaforge: https://github.com/maybleMyers/chromaforge
SD.Next: https://github.com/vladmandic/sdnext
Wan2GP: https://github.com/deepbeepmeep/Wan2GP

>Checkpoints, LoRAs, Upscalers, & Workflows
https://civitai.com
https://civitaiarchive.com/
https://tensor.art
https://openmodeldb.info
https://openart.ai/workflows

>Tuning
https://github.com/spacepxl/demystifying-sd-finetuning
https://github.com/Nerogar/OneTrainer
https://github.com/kohya-ss/sd-scripts/tree/sd3
https://github.com/derrian-distro/LoRA_Easy_Training_Scripts
https://github.com/tdrussell/diffusion-pipe

>WanX
https://rentry.org/wan22ldgguide
https://github.com/Wan-Video
https://alidocs.dingtalk.com/i/nodes/EpGBa2Lm8aZxe5myC99MelA2WgN7R35y

>Chroma
https://huggingface.co/lodestones/Chroma1-Base
Training: https://rentry.org/mvu52t46

>Illustrious
1girl and Beyond: https://rentry.org/comfyui_guide_1girl
Tag Explorer: https://tagexplorer.github.io/

>Misc
Local Model Meta: https://rentry.org/localmodelsmeta
Share Metadata: https://catbox.moe | https://litterbox.catbox.moe/
GPU Benchmarks: https://chimolog.co/bto-gpu-stable-diffusion-specs/
Img2Prompt: https://huggingface.co/spaces/fancyfeast/joy-caption-beta-one
Samplers: https://stable-diffusion-art.com/samplers/
Txt2Img Plugin: https://github.com/Acly/krita-ai-diffusion
Archive: https://rentry.org/sdg-link
Bakery: https://rentry.org/ldgcollage

>Neighbours
https://rentry.org/ldg-lazy-getting-started-guide#rentry-from-other-boards
>>>/aco/csdg
>>>/b/degen
>>>/b/realistic+parody
>>>/gif/vdg
>>>/d/ddg
>>>/e/edg
>>>/h/hdg
>>>/trash/slop
>>>/vt/vtai
>>>/u/udg

>Local Text
>>>/g/lmg

>Maintain Thread Quality
https://rentry.org/debo
Blessed thread of frenship
They tell me this thread is blessed
neta is the future for anime, start learning now: https://neta-lumina-style.tz03.xyz/
>>106469516
>>106469514
all of God's children are blessed by his grace
>>106469536
>.xyz
Oh yes, anon, this link is definitely trustworthy.
>>106469565
Seems fine
>normalfags calling other normalfags out as being too eager to use the word "clanker" as a slur
ai bros stay winning
>>106469612
I didn't realize styxhexenhammer was this big
is there any hope for local t2v? except for making schizo videos, local t2v is obsolete. only local i2v is excellent for serious things
>>106469625
It's pretty good for porn with porn LoRAs, but yeah, most videos end up like a fever dream.
>>106469625
I mean, we're basically in the Ford Model A era of this stuff and here you are writing off the future of cars. Bit silly, don't you think?
>>106469621
i dont know who that is or how he relates
>>106469625
It's good for producing extremely plausible realistic single frames from the video. Like, if candid realism is your goal I'd just extract a frame from i2v.
Anti AI fags have gone fucking nuts the last few months.
what's the best tool for automated video captioning? I want to try training a t2v lora but I don't wanna caption my entire dataset by hand. I'm trying chatgpt but it can't caption videos for shit.
>>106469652
>the last few months
That's an understatement
>>106469625
remember how anon hyped Wan T2I? shame the diaperfag decided to tune qwen instead. maybe the bigasp guy will do it.
>>106469658
Gemini. It shits all over the other models for OCR stuff. For example: >>106469558
Based on the video provided, here is a description of what happens:
The video features two characters from the TV series Star Trek: The Next Generation: the android Lieutenant Commander Data (on the left) and Lieutenant Commander Geordi La Forge (on the right). They are seated side-by-side in the cockpit of what appears to be a shuttlecraft.
The sequence of events is as follows:
Data holds up a blue and white package resembling a pack of cigarettes.
He removes a single cigarette and places it in his mouth.
The tip of the cigarette spontaneously lights up on its own.
Data takes a drag from the cigarette and then exhales smoke.
Throughout this entire action, Geordi La Forge looks at Data with a surprised and bewildered expression.
>>106469536
>Boris Vallejo
Looked up his style and... that's quite sad. Of course, nano banana can pull off his style just fine.
love me ambiguous language
According to Gemini:
In this video: The scene opens with Captain Picard looking thoughtful in his ready room. In the background, the Vocaloid star Hatsune Miku, now impressively dressed in a red Starfleet command uniform, walks into the room holding a Starfleet PADD (Personal Access Display Device).
She stops and stands behind the captain. Then, in a move that would surprise even the Borg, Picard picks up a 20th-century-style handgun from his desk and calmly examines it.
It's a fantastic edit, but a few small clues give it away:
The Starfleet Guest Star: As you know, Hatsune Miku, while a cultural icon here in Japan and worldwide, never actually made it onto the Enterprise-D's crew roster.
The Firearm: The biggest giveaway is the prop. Starfleet exclusively uses energy-based weapons like phasers. A projectile-based handgun like that is a massive anachronism and would be completely out of place in Picard's ready room, unless he was in the middle of a Dixon Hill holodeck program.
This is another great example of combining real footage with a digitally inserted character to create a completely new, surreal, and humorous narrative.
>>106469755
Isn't Gemini paid? Do you use it with some local client like Kobold or SillyTavern? I'm trying to find a good uncensored LLM for images.
>>106469755
those are not good captions
>>106469705
>Two elf female characters with pointed ears. The character in the foreground has long, wavy blonde hair, light skin, and wears a blue, off-shoulder dress with white fur trim. She has a worried expression and blue markings on her forehead. The character behind her has long, orange hair, light brown skin, and wears a sleeveless purple top. She has green markings on her face and is embracing the blonde-haired character from behind, with one arm around her shoulder. The background is a textured, dark green and blue gradient, resembling a forest or cave
>>106469887
basterd bitch delete this
What's the most viable captioning method for deviantart-tier freak fetish stuff?
should i get my lazy ass out of bed and finish installing wan? how long do gens take with a 5090
>>106469998
they'll take no time at all you fuckin ass
>>106469705
>>106469887
>>106469998
sure / depends on settings, but on the order of a few minutes for most. you can do 1 minute gens at a not-too-terrible resolution if you take the fast options with 4 steps or so.
>>106470011
my bad man, i genuinely got no clue on this shit
>>106470024
thanks anon
>>106469998
it takes me roughly 4-5 minutes for a 720p 8 second video
>>106470020
Just like my futa doujins!
>>106469998
2.2 is so good it convinced me to try training video loras.
>>106469701
thanks bruv. got everything captioned, surprised it let me do them all for free
Damn, AI inventing new instruments.
>>106470133
very mongolian
>>106470141
I fed Gemini a Batzorig video screenshot lol
>>106469883
Yeah, but I didn't prompt it how to caption. I was just more interested to see if it could identify what was happening in the video at all.
>>106470085
>surprised it let me do them all for free
np. I assume they do it as a means to capture audience share. Their free stuff is very generous. I honestly just pay for Gemini as my GPUs are usually blasting away at training most of the time anyway. It's the best model for captioning in my opinion, and the fact google made veo 3 should indicate as much.
>>106469887
>>106470020
>Reference image
Literally just prompt for the guy:
>A caveman carrying a wounded woman while pointing a spear at a large flying bird over them while sitting atop a rocky hill by Boris Vallejo
https://files.catbox.moe/zcos9q.jpeg
Local would never.
did comfy fix the shitty qwen edit text encode node?
>>106470186
every fix breaks two more things. python was a mistake
>>106470176
Do you have some special version lol? I get this on nano which isn't even close.
>>106470176
Didn't think it would recognize it. Flux can't really handle a more complex composition like that.
>>106470235
Not bad. Unlike the original, nano banana can't show me booba, so a Chroma LoRA would win anyway. As for the results you're getting with nano banana, no idea what you're using. I can get his likeness right away, even across other seeds.
>A caveman with a shield standing atop a rocky hill while goblins are incoming. A woman kneels beside him by Boris Vallejo
https://files.catbox.moe/ak75k4.jpeg
ultra cozy
is there anything as good as veo3?
>>106470779
Yes, veo3 is as good as veo3.
>>106470211
if not for the toes I would say it's not a gen
>>106470885
I was not aware she had a cat.
2.2 for vace soon?
>>106470940
Expected miku to walk out of that... This thread is getting to me
>>106470885
Who are those two from? I recognize them from something...
not what I wanted at all but okay.
bros... i beg... do loras work with qwen nunchaku yet... bros...
>>106470993
Himawari and the flat-chested one from Yuru Yuri
>>106469701
>He removes a single cigarette and places it in his mouth.
obviously wrong, he was holding the cigarette already
>>106469755
>Then, in a move that would surprise even the Borg
wtf, implying they have feels
>>106471031
There was that one episode where the borg had feels.
>>106470885
Animate this.
>New furk post
See any issues here?
>>106471219
>water is wet
hey bros anyone got a spare 5090 to donate :) I promise ill train some qwen ToT loras with it
>>106471248
I also just found out he blocked me. But look at his loss. He's basically trained a broken LoRA and is bragging about it.
>>106471219
Is that 5600 steps??
>>106471219
nans for days
I'm not terribly impressed by how Qwen handles traditional media.
>>106471307
Ask furk to train a NaN lora for you.
>>106471219
well yeah, they aren't giving consumer cards 96gb vram because it would destroy their enterprise market overnight. that's why i'm hoping for a deepseek-level breakthrough from china but in the hardware space. they already have modded cards. they are also making 96gb custom cards but they're kind of shit because of low bandwidth, no cuda, and shit-tier drivers.
>>106471307
The paintings themselves in the back are honestly really well done, just the girl is slopped.
>>106469998
5 hours of genning god damn. having fun with the I2V
>>106471172
just imagine AI generating FPS walkthrough jump-scare game movies in perpetuity
I shouldn't have updated my OS. DRAM/VRAM management is kinda fucky now. God damn.
>>106471357
It's not his post that's cringe. It's that he's bragging about his hardware while being unaware he is basically showing the world that his LoRA is stillborn.
>>106471711
nta, but I can't help but question how the man is such a prolific (shit)poster seemingly everywhere but somehow missed that his training run was cooked from the get-go.
Anyone ever used the captioning tool in Onetrainer? Usable or megacopium only good for boorutags?
>>106471833
I'm convinced his low intelligence robbed him of his ability to second-guess and check himself, and by radiating enough confidence in a field most people knew little about, he was able to accidentally grift his way to notoriety by just being a fucking idiot.
>>106471933
usable, but you still have to check manually for any flops afterward; if anything it saves time by doing the heavy captioning for you.
>>106472188
Can it do nsfw?
I know you can upscale and interpolate wan video but is there anything to fix any fuckups in the video like when something gets blurred out or things like that?
with wan 2.2 you can save the latent from the high noise sampler and reroll with the low noise sampler to hopefully get a better result
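The mechanics are easy to sketch outside ComfyUI — the toy function below stands in for a real high/low-noise sampler pass, and the cache path is just illustrative:

```python
import os
import pickle
import random
import tempfile

def run_stage(latent, seed, strength=0.1):
    # Stand-in for one sampler pass: deterministic given (latent, seed).
    rng = random.Random(seed)
    return [x + strength * rng.gauss(0, 1) for x in latent]

# High-noise stage: run once and cache the intermediate latent.
init = [0.0] * 8
high_latent = run_stage(init, seed=42)
path = os.path.join(tempfile.gettempdir(), "wan_high_latent.pkl")
with open(path, "wb") as f:
    pickle.dump(high_latent, f)

# Low-noise stage: reroll with different seeds without redoing stage one.
with open(path, "rb") as f:
    cached = pickle.load(f)
reroll_a = run_stage(cached, seed=1)
reroll_b = run_stage(cached, seed=2)
```

In ComfyUI terms this is roughly a Save Latent node after the high-noise KSampler and a Load Latent node feeding the low-noise one with a fresh seed, so you only pay for the second stage on each reroll.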
Can I run Wan on my M4 Max? How's the speed?
>>106470133
>Playing 'viking boat'
>>106471711
Are you telling me that ohwx-man training of himself is not the true way?
Seriously, this guy has been doing this for such a long time yet he has learned nothing: still seems to think there's some magic token combination, still hasn't understood that repeats are only for balancing training data when doing multiple concepts at the same time, doesn't even understand the principle of A/B testing and instead changes lots of parameters between experimental runs. Snakeoil salesman if there ever was one.
>>106472298
You probably can, I saw someone posting that they got it running, but the speed was something horrendous.
I'm going insane trying to find a good noob-based model with decent coherence that allows some flexibility beyond basic tags. For example, this one https://civitai.com/models/1201815?modelVersionId=1491533 - you can actually add variations like 'blue glowing tattoo' instead of just 'tattoo' without it breaking. Problem is, these models all have shit mixes and can't follow artist styles closely like vanilla noob does. But I'm too much of a shitter to get kino results with plain noob. Is there a good middle ground model/remix that actually respects artist styles while being more forgiving?
>>106472485
anything that uses (only) CLIP will never give you the control you seek
>>106472485
Your best bet is to find a model that has an LLM text encoder slapped onto it. Idk if noob has a variant like that tho.
>>106472485
illustrious has limited natural language support
beeg birb
>>106471573
The 1girl machine keeps churning, but memory management sucks.
>>106472485
https://huggingface.co/Minthy/RouWei-Gemma
someone's been trying to stitch better encoders to sdxl but i don't see much difference so far
>>106471219
>NaN
That explains those terrible LoRAs of himself.
>>106472717
lul
>random character sheet out of nowhere
Thanks, I guess
>>106472904
What did you prompt for?
>>106472916
https://genshin-impact.fandom.com/wiki/Jahoda
The appearance paragraph xd. I just specified anime artstyle. I guess AI-generated articles are good prompts lmao
>>106472904
not a footfag but that little red foot is cute
>>106471391
At least it draws really good legs
>>106472639
That's not a Luger. Man.
beeg guy
>multiple seeds, samplers and schedulers
>it keeps giving me ref sheets
There is no way a paragraph can have such strong specific style "vibes" that it fucks with the model. Is this a thing?
>>106469887
Seb McKinnon Lora for Flux? That's not available on civitai, I want to test it.
>an entire separate general of waisloppers
mortifying
>>106473136
wrong post
>>106470020
>>106472717
It feels like progress has been pretty stagnant for a few months. too bad the software still sucks ass and just got worse. any new models on the horizon to look forward to at least?
>>106473269
qwen hinted at some updates but that's about it.
>>106473269
Goddamn man, we just had qwen not even 2 weeks ago. Is this the zoomer brain I heard about?
Chroma loras are so easy to bake, all you need is:
10-15 512x512 images
natural gemini captions
adamw optimizer
constant scheduler
batch size 1
set it for about 2K steps (about 150ish epochs)
>512x512
we're regressing, not progressing
>>106473301
I don't know if I would call that good quality, anon. This seems to have the same quality problem flux had when trained at such a low resolution.
>>106473282
probably some standard controlnet IP adapter thing for their existing stuff.
>>106473289
qwen isn't really as impressive as it should be at that size. synthetic slopped datasets are a step backwards and the two stage models for wan is just annoying for a 10% higher quality video than 2.1
>>106473301
>Chroma loras are so easy to bake,
Yes, it's shockingly easy to train Chroma loras effectively.
>natural gemini captions
Don't need this, JoyCaption is good enough, and if all you train is a single concept like a person (or even an art style, assuming it's consistent) you can train with just a simple 'foobar' nonsense tag and it will have no problem training it.
You didn't mention learning rate: for people I would suggest 0.0001 (1e-4); for art styles you probably want to go a bit higher since it's more abstract as a concept.
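The recipe above, written out as a hypothetical trainer config — the key names are illustrative only, not any particular trainer's actual schema:

```toml
# hypothetical LoRA training config; key names are illustrative only
[dataset]
num_images = 12            # 10-15 images
resolution = 512           # 512x512
captions   = "natural"     # natural-language captions

[training]
optimizer    = "adamw"
lr           = 1e-4        # ~1e-4 for people; a bit higher for art styles
lr_scheduler = "constant"
batch_size   = 1
max_steps    = 2000        # ~150ish epochs on a dataset this size
```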
>>106473365
>qwen isn't really as impressive as it should be at that size
you are right but a lot of people don't want to believe it. people will dismiss the arena rankings as nonsense, but qwen ranks around the same place as hidream and it honestly looks it. i went to train a qwen lora and it was like 60gb worth of slop. it's incredibly bloated for a model that does not feel anywhere near the top 10. if it was good it would've made way more strides like flux dev did compared to SD3/SDXL (though at the time we didn't realize how impossible it would be to tune).
>>106473365
You know I was gonna take you seriously but then
>10% higher quality video than 2.1
Ahh another vramlet seethe. Trust me, if you can't run this you'll probably just dip from the scene; it's gonna get worse from here.
recommend me some cool Illustrious base model hidden gems
>>106473301
people should just ignore chroma and work on qwen. chroma is like bigasp: big potential, crappy results. maybe another model merged with chroma will save the day
>>106473301
when will girls stop having their feet buried in the ground?
>>106473446
When you type 'full body shot' or prompt something regarding footwear
Do flux loras still work with chroma
>>106473431
>chroma is like bigasp
come on now, let's not be disingenuous... at least bigasp was trained at 1024x!
>>106473264
>>106473365
>>106473394
>qwen isn't really as impressive as it should be
I'd agree, but the alternatives in terms of prompt adherence and non-mangled hands/poses are kinda slim. But the shocking amount of sameface and general lack of variance between seeds hurts the model a lot. Picrel: 3 seeds, same prompt.
because qwen is DPO'd to shit, it has been said multiple times. you need to inject noise if you want an actual different image per seed
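A minimal sketch of that noise-injection idea — plain Python lists stand in for latents, and the scale value is something you'd tune per model:

```python
import random

def init_latent(seed, size=8):
    # The starting latent a sampler would normally derive from the seed.
    rng = random.Random(seed)
    return [rng.gauss(0, 1) for _ in range(size)]

def inject_noise(latent, seed, scale=0.3):
    # Perturb the starting point so a model that collapses nearby latents
    # onto near-identical images gets pushed toward genuinely different outputs.
    rng = random.Random(seed)
    return [x + scale * rng.gauss(0, 1) for x in latent]

base = init_latent(seed=0)
variant_a = inject_noise(base, seed=101)
variant_b = inject_noise(base, seed=202)
```

Same reroll workflow as usual, just with an extra perturbation node/step between the empty-latent seed and the sampler.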
>>106473497
Lack of seed variation is actually good for i2v gen purposes. You can change the pose or other details, and the face/character tends to stay the same. I agree it does take some tard wrangling to avoid unwanted generalizations.
>>106473473
Some do, some don't. You have to try them one by one.
>>106473497
I personally like it, it makes editing an image much easier without losing the entire damn composition.
>>106473360
It's not the resolution, most likely prompted to look like a phone camera shot. This is from a Chroma lora I recently trained at 512 resolution.
>>106473431
Qwen's comprehension of traditional media styles, as well as its creativity, is piss poor. I don't think it will be remedied by finetuning. It's a great model, but more experienced users will get more out of chroma.
>>106473431
One of the really popular SDXL model makers is making a finetune right now; I've used a prototype lora of it and it is VERY promising.
>>106473446
This nigga likes feet!
>>106473553
Yeah, they're prompted/oversharpened to look like phone shots after they're uploaded to IG.
I'm new to I2V and I'm following the guide. When I try to generate with the first workflow (https://rentry.org/wan22ldgguide) I get:
> ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
And ChatGPT insists that fp8e4nv doesn't work on a 3090. Is it wrong?
>>106473571
>I don't think it will be remedied by finetuning
I think you can, but it will be very expensive since the model is large and massively overtrained; not sure anyone with enough money would think it's worth it.
>>106473595
I had that same issue when I tried a new comfy install, even though I used the e4m3fn models with my old install no problem. I dunno. Just try the e5m2 models.
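For what it's worth, ChatGPT is right here: that error comes from Triton, whose fp8e4nv (e4m3) kernels require compute capability 8.9+ (Ada/Hopper), while the 3090 is Ampere (8.6) — hence the e5m2 workaround. A quick sketch of the check, with capability values hardcoded for illustration (on a real machine you'd feed in `torch.cuda.get_device_capability(0)`):

```python
def supports_fp8e4nv(capability):
    # Triton's fp8e4nv (e4m3) path needs compute capability 8.9 or newer;
    # older cards only get the fp8e5 (e5m2) fallback.
    return tuple(capability) >= (8, 9)

# Illustrative capability values for a few cards.
cards = {"RTX 3090": (8, 6), "RTX 4090": (8, 9), "H100": (9, 0)}
for name, cap in cards.items():
    choice = "fp8 e4m3fn OK" if supports_fp8e4nv(cap) else "use e5m2 models"
    print(f"{name}: {choice}")
```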
>>106473571
>remedied by finetuning
The fuck? If the distilled horseshit that was flux could be unfucked by finetuning, a non-distilled model should be 100-fold easier.
>>106473656
qwen would take forever. it needs a bit more elbow grease and cash than chroma
https://chromaawards.com/
I think 11labs is paying off civit to not add a chroma category because of this shit
>>106473571
>Qwen's comprehension of traditional media styles, as well as creativity, is piss poor
correct
>I don't think it will be remedied by finetuning
it CAN, but nobody will because the model is way too bloated.
>but more experienced users will get more out of chroma.
completely false
>>106473703
People keep saying that, but every big finetune needs a lot; chroma needed 105,000 H100 hours. Are you saying qwen would need more? SDXL needed a cluster; a finetune will need big hardware for any model. It's such a non-argument.
>>106473703
chroma is 8.9b. qwen is 20b
>>106473703
>are you saying qwen would need more?
it entirely depends on the size of the model. the other anon is right, it's too bloated, but it's also overtuned, which is why there isn't much seed variation
>>106473431
qwen is overfit and bloated to shit. just do wan
>>106473723
wan is slightly overfit as well, but at least it's in the realm of doable
>>106473301
hello beautiful babe
>>106473703
yes, qwen would absolutely need more because it's massive in comparison to chroma. chroma already had to cope by removing parameters and training at 1/4 the resolution of fucking sdxl. and even with all that, he still wound up spending $150k on it. acting like the compute costs for these models are the same as SDXL's is simply retarded
>>106473703
Use your brain
>>106473577
>prototype lora
Where?
>>106473732
Love it lmao
>>106473766
I've said too much.
>>106473797
Oh, you meant a chroma tune, that's good; I was wondering how you managed to make a qwen tune look so shitty lol
>>106469536
Neta is not perfect, but it's the only anime model that can handle multiple subjects on screen without mangling them.
>>106473431
Most people outside of here are ignoring chroma.
>>106473732
>>106473301
tranny hands
>>106473797
>prompt for indian man
>get a cholo
>>106473977
>t. has futa images saved
>>106473720
>white mans kriptonite.png
>>106473720
>>106473687
Could be, damn
Are flux dev and schnell loras interchangeable?
>>106474328
Hello, I'm trying to switch from Forge to ComfyUI. I prefer ComfyUI's interface because my entire txt2img + hires fix workflow fits on my screen without scrolling.
The problem is that I can't get it to work correctly. I've posted more details in the attached thread. Any help would be appreciated. Thanks!
json: https://files.catbox.moe/nv2b7k.json
>>106474388
I think sdxl wants -2 clip layer
>>106474103
tranny eyes
>>106473637
Yeah, that worked.
>>106474388
clip needs to be -2. Not sure what you're trying to do with the tiled VAE encode/decode nodes. Also, 'BREAK' commands don't work in the default CLIP Text Encode nodes; there are custom nodes that use the A1111 parser if you want to keep them, but in Comfy you should break each one out into separate text encode nodes and concat them.
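For context, "clip -2" just means taking the penultimate hidden layer of the text encoder instead of the final one, which is what SDXL-family models were trained against. Roughly, with toy strings standing in for real hidden states:

```python
def clip_set_last_layer(hidden_states, last_layer=-1):
    # last_layer=-1 -> final encoder layer; last_layer=-2 -> penultimate
    # layer, matching ComfyUI's CLIP Set Last Layer node semantics.
    return hidden_states[last_layer]

layers = ["embeddings", "layer_1", "layer_penultimate", "layer_final"]
chosen = clip_set_last_layer(layers, last_layer=-2)
```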
>>106469492
how do i generate abstract happy merchant memes? i am super retarded when it comes to prompting ai
>>106474602
literally just img2img and play with the denoise
>>106474536
Thanks, I am looking here and in /adt/ for answers. All those options were loaded by default when I dragged the gen made in Forge into Comfy. I will keep in mind what you told me.
>>106474631
you can also just disable the CLIP Set Last Layer node, ComfyUI will automatically use -2 with SDXL models.
>>106474631
>>106474388
I think it would be much faster to just build a workflow from the ground up
>>106470041
Fuck me, I don't know how you guys put up with that; 5 seconds for my 1girls feels insufferably, inexcusably long as it is. I'm sure someone will chime in with "it's worth it" or whatever, but that's in "too long for me to bother" territory. And that's with the fastest consumer card.
>>106471050
Well, it's a good thing the oral insertion lora bandit got bored, otherwise... ToT
Say I'm training a lora and I want 3000 steps total, with 20 images.
Is it better to do 3 epochs with 1000 steps each, or 6 epochs with 500 steps each?
Is there a noticeable difference between the two at the same step count (i.e., the first at epoch 2 / 2000 steps vs the second at epoch 4 / 2000 steps)?
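Worth noting the two numbers aren't independent: with a fixed dataset and batch size, the total step count pins down the epoch count. At batch size 1, one epoch over 20 images is 20 steps, so 3000 steps is 150 epochs no matter how you label it; "1000 steps per epoch" only happens if the trainer adds repeats. A quick sanity check (batch size 1 assumed):

```python
def epochs_for(total_steps, num_images, batch_size=1, repeats=1):
    # One epoch = one full pass over every (repeated) image.
    steps_per_epoch = (num_images * repeats) // batch_size
    return total_steps / steps_per_epoch

print(epochs_for(3000, 20))              # 150.0 epochs
# "1000 steps per epoch" would imply 50x repeats of the 20 images:
print(epochs_for(3000, 20, repeats=50))  # 3.0 epochs
```

So at the same cumulative step count (and seed) the two schedules are essentially the same training run; the epoch split mostly just changes when checkpoints and sample images get saved.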