Retarded Node Devs Edition
Discussion of Free and Open Source Text-to-Image/Video Models
Prev: >>107010364
https://rentry.org/ldg-lazy-getting-started-guide
>UI
ComfyUI: https://github.com/comfyanonymous/ComfyUI
SwarmUI: https://github.com/mcmonkeyprojects/SwarmUI
re/Forge/Classic/Neo: https://rentry.org/ldg-lazy-getting-started-guide#reforgeclassicneo
SD.Next: https://github.com/vladmandic/sdnext
Wan2GP: https://github.com/deepbeepmeep/Wan2GP
>Checkpoints, LoRAs, Upscalers, & Workflows
https://civitai.com
https://civitaiarchive.com/
https://openmodeldb.info
https://openart.ai/workflows
>Tuning
https://github.com/spacepxl/demystifying-sd-finetuning
https://github.com/Nerogar/OneTrainer
https://github.com/kohya-ss/sd-scripts/tree/sd3
https://github.com/derrian-distro/LoRA_Easy_Training_Scripts
https://github.com/tdrussell/diffusion-pipe
>WanX
https://comfyanonymous.github.io/ComfyUI_examples/wan22/
https://github.com/Wan-Video
>Neta Lumina
https://civitai.com/models/1790792?modelVersionId=2298660
https://gumgum10.github.io/gumgum.github.io/
https://neta-lumina-style.tz03.xyz/
https://huggingface.co/neta-art/Neta-Lumina
>Chroma
https://huggingface.co/lodestones/Chroma1-Base
Training: https://rentry.org/mvu52t46
>Illustrious
1girl and Beyond: https://rentry.org/comfyui_guide_1girl
Tag Explorer: https://tagexplorer.github.io/
>Misc
Local Model Meta: https://rentry.org/localmodelsmeta
Share Metadata: https://catbox.moe | https://litterbox.catbox.moe/
GPU Benchmarks: https://chimolog.co/bto-gpu-stable-diffusion-specs/
Img2Prompt: https://huggingface.co/spaces/fancyfeast/joy-caption-beta-one
Txt2Img Plugin: https://github.com/Acly/krita-ai-diffusion
Archive: https://rentry.org/sdg-link
Bakery: https://rentry.org/ldgcollage
>Neighbors
>>>/aco/csdg
>>>/b/degen
>>>/b/realistic+parody
>>>/gif/vdg
>>>/d/ddg
>>>/e/edg
>>>/h/hdg
>>>/trash/slop
>>>/vt/vtai
>>>/u/udg
>Local Text
>>>/g/lmg
>Maintain Thread Quality
https://rentry.org/debo
>Pony v7what we wrong??
>>107017125Didn't want to jump ship from auraflow after flux was dedistilled
Comfy UI is life. I have been experimenting with it after being only a forge/wan2gp fag and the node thing is genius if you want to do anything more than basic gens.Also it pairs extremely well with LLM assistance. I can quickly vibe code a python custom node for whatever niche functionality I need instead of dealing with spaghetti hell trying to mix and match other nodes. I wonder if I can come up with workflows to cover needs unrelated to AI gen. I imagine its good for anything pipeline-ish with modular nature, feels like playing factorio IRL.
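For anyone wondering what "a python custom node" amounts to, here is a minimal sketch. It follows the class layout ComfyUI custom nodes commonly use (`INPUT_TYPES`, `RETURN_TYPES`, `FUNCTION`, `CATEGORY`, plus a `NODE_CLASS_MAPPINGS` dict), but the node itself (`JoinStrings`), its category and its inputs are made up for illustration and are not from the post above.

```python
class JoinStrings:
    """Toy node (hypothetical): joins two strings with a separator."""

    @classmethod
    def INPUT_TYPES(cls):
        # Declares the input sockets/widgets the node exposes in the graph.
        return {
            "required": {
                "a": ("STRING", {"default": ""}),
                "b": ("STRING", {"default": ""}),
                "separator": ("STRING", {"default": ", "}),
            }
        }

    RETURN_TYPES = ("STRING",)   # one string output
    FUNCTION = "join"            # name of the method to call on execution
    CATEGORY = "utils/text"      # menu location (assumed, pick your own)

    def join(self, a, b, separator):
        # Skip empty inputs so we never emit a dangling separator.
        parts = [p for p in (a, b) if p]
        # Node outputs are returned as a tuple matching RETURN_TYPES.
        return (separator.join(parts),)


# Registration dicts picked up when the file sits in custom_nodes/.
NODE_CLASS_MAPPINGS = {"JoinStrings": JoinStrings}
NODE_DISPLAY_NAME_MAPPINGS = {"JoinStrings": "Join Strings"}
```

The class has no ComfyUI imports, so the logic can be unit-tested outside the UI before dropping the file into `custom_nodes/`.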
obese nerds using my images for video gens, nothing new under the sun
>>107017139it has shit tooling. node graphs are a feature, not the full experience which is the fatal flaw of comfyui. that and it's poothon
>>107017125Dogshit base model and who knows how much money wasted on futile training with sunk cost fallacy.Everything else is at best of tertiary importance.It's kinda crazy how it ended up being such a wet fart nothing burger after how sensational v6 was though.People were at least outraged and sad over SD3.Literally no one gives a fuck about that brony's autistic trainwreck now though.
>>107017125not using illustrious
>>107017125>abandoned artist tags, a suicide move after illustrious and noobai that contained hundreds of drawfags in the dataset>heavily censored the outputs>picked a mediocre model and decided to go through with it>requires autistic prompting style and SD3 tier novel sized prompts>wasted a shit ton of money in the process
>>107017234
>it has shit tooling.
What do you think is missing that is out of scope for custom nodes? The only thing I have some beef with for now is queue management, which seems lackluster.
Still, I can already see myself setting up some workflows to handle stuff of mine that involves clear composable modules to spaghetti with.
>>107017234canvas, sequencers, video editor, 3d editor, and vector drawing
please for the love of all that is loli save us ani
1:04 length output from LongCat. The original file was about 300 MB, so this is severely compressed. https://files.catbox.moe/62kssd.mp4 is still compressed to fit in the catbox, but not as badly.
It took about 4.5 hours to generate. LongCat works by generating multiple videos about 6 seconds long and then doing another pass to fuse them together. Based on this output, its default behavior seems to be to repeat the action described in the prompt in each subsegment, whereas you'd normally want the action to be spread out over the full length.
To get good results involving complex sequences of actions, it would probably work better if each segment had its own prompt, so that you explain in detail what's going on every 6 seconds. The stock inference code isn't set up to handle that, but it would probably work, although I don't know how the fusion pass will handle it. For people interested in repetitive actions, it may be totally fine as-is.
This sequence was supposed to be anime-styled, but it refused to do it for anything sci-fi themed like this and would only produce 3DCG variants (it will do anime for other subjects).
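The per-segment prompt idea can be sketched without touching any model code. This is only a planning helper under stated assumptions: `plan_segments` is a hypothetical name, the 6-second segment length is the rough figure from the post, and any overlap between segments or the behavior of the fusion pass is ignored. It is not LongCat's actual inference API.

```python
import math


def plan_segments(total_seconds, prompts, segment_seconds=6.0):
    """Assign one prompt to each fixed-length segment of a long video.

    If there are fewer prompts than segments, the last prompt is reused
    for the remaining segments.
    """
    if not prompts:
        raise ValueError("need at least one prompt")
    count = math.ceil(total_seconds / segment_seconds)
    plan = []
    for i in range(count):
        start = i * segment_seconds
        # The final segment may be shorter than segment_seconds.
        end = min(start + segment_seconds, total_seconds)
        prompt = prompts[min(i, len(prompts) - 1)]
        plan.append((start, end, prompt))
    return plan


if __name__ == "__main__":
    for start, end, prompt in plan_segments(64, ["ship launches", "ship lands"]):
        print(f"{start:5.1f}-{end:5.1f}s  {prompt}")
```

For a 64-second clip this gives 11 segments, so a full per-segment script would need 11 prompts written in advance.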
>>107017252ywn baw
>>107017244this is why comfy will always be shit, he will never add actually comfortable tools
um. songbloom is singing chinese lol
>>107017139>if you want to do anything more than basic gens.What are you even doing that requires a massive workflow something like swarm can't handle?
>>107017269snake oil addiction
Can I be put in the next OP?
>>107017255Watching this feels like you are developing schizophrenia. It's almost debilitating.I don't think it is working well.
>>107017276no, just ugly spaghetti screenshots. comfyui hasn't ruined the thread enough
>>107017255>took about 4.5 hours to generate
>>107017255based patiencechad
>>107017244In the context of out of the box quick AI gen infused gooning sessions I can see your point. The other UIs feels faster if you don't already have the comfy workflows integrated in other ergonomic software via API or something.>>107017269Never used swarm but niche pre and post processings, stitching together and daisy chaining different gen models, the possibilities seems infinite, specially if you already have a porn stash with years of curation.Just now I have set up a mpv keybinding that sends the current video frame to a comfy workflow that overlays my crotch and dick at the bottom of the frame and run pov insertion loras and continuations. Its a great time to be alive and women are done for.
>want to make fantasy races>realism model wont let me or it looks like a cheap costume>have to make my own races like black asians or humans with photoshopped proportions
>>107017369it's bad if you are a beginner or gooner if you just want to hop in and make shit and it's bad for professionals if you specialize in digital art or 3d modelling. it only really clicks with vfx node nuts but even then it breaks so many rules of node software they have their panties in a bunch. the only people that actually like comfyui and don't have a laundry list of problems are midwits who feel smart using it. this is funny because it pisses off smart people so much for actually being a pile of shit
I just found out something that SongBloom does...
Why are all of the examples of SongBloom wrong?
>>107017402>the only people that actually like comfyui and don't have a laundry list of problems are midwits who feel smart using itOr those are just regular power users / programmers facilitating hobby stuff instead of acting pretentious with professional needs.
Please saar the clients
>>107017454people clearly try using it in a professional sense and it's retarded
>>107017454>programmersweb shitters and poojeets aren't programmers
https://www.reddit.com/r/StableDiffusion/comments/1ogx7j4/chroma_radiance_mid_training_but_the_most/Those images / that WF in the comments...
>>107017234>queue managementWish I could edit queue and then add it back in the same position.
>>107017256you will never be a wataa
>>107017136auraflow is shit, but de-distilled flux is just as bad. look at chroma who wasted 50 epochs trying to make it unmelty yet still failed. both of them should've just waited for qwen
>>107017255>each segment had its own promptThat's not possible?Man, that's already something that pisses me off with wan.
All my gens are coming out shit for some reason
>>107017513are you retarded?chroma blows away qwenhttps://files.catbox.moe/fe427t.pnghttps://files.catbox.moe/d5gu67.pnghttps://files.catbox.moe/ui7a5x.png
>>107017523The flash / chroma HD is the version for these btwcustom node: https://github.com/silveroxides/ComfyUI_Hybrid-Scaled_fp8-Loaderhttps://huggingface.co/silveroxides/Chroma-Misc-Models/blob/main/Chroma1-HD-flash-heun/Chroma1-HD-flash-heun-fp8_scaled_original_hybrid_large_rev2.safetensorsLoras:https://huggingface.co/silveroxides/Chroma-LoRAs/tree/mainWF in image
>>107017504
But at the same time it's a huge plus that it actually runs headless on the comfy server. Other UIs I used seemed to handle it client side/browser based, which was a pain. Now I can finally queue a batch, close my laptop and go to sleep, or just close the tab and open it up another time without fucking up everything.
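Since the queue lives on the server, it can also be fed from a script instead of the browser. A minimal sketch, assuming a local ComfyUI instance on the default `127.0.0.1:8188`, its `POST /prompt` endpoint, and a workflow exported in API format. The function names and the `client_id` value are placeholders.

```python
import json
import urllib.request


def build_queue_request(workflow, host="127.0.0.1:8188", client_id="batch-script"):
    """Build (but do not send) the HTTP request that queues one workflow."""
    payload = json.dumps({"prompt": workflow, "client_id": client_id}).encode("utf-8")
    return urllib.request.Request(
        f"http://{host}/prompt",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def queue_workflow(workflow, **kwargs):
    """Send the request; needs a running server, returns its JSON reply."""
    with urllib.request.urlopen(build_queue_request(workflow, **kwargs)) as resp:
        return json.loads(resp.read())
```

Calling `queue_workflow` in a loop over edited copies of the same workflow dict (different seeds, prompts, input files) is one way to queue a batch and walk away.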
>>107017513Chroma is an amazing model, the fuck you're talking about
>>107017523chromamelt is unavoidable and only blind retards choose to ignore this shit. the model is a mess thanks to de-distillation. literally every aspect of the images is falling apart, it looks terrible
>>107017558you have to be trolling, qwen has nothing on this
>qwen has nothing on this
>>107017544Sure, I do like it overall, just wish it had more user friendly features.
>>107017578show me a image remotely similar both style wise and quality wise from qwen or sdxl or illustrious at the same res, also that is the point of chroma radience which is training, all models have that issue with tiny details
>>107017578yes sis, fucked hands can easily be fixed versus no seed variability of qwen that cant be fixed
I've been trying for a while but I'm stuck on this, so I thought I might ask regarding >>107017425
I found a solution, but it's the most caveman shit I've ever done with a computer. It turns out it works on video if I split the video file into chunks of 25 MB each and then join each tiny piece back into the original movie like a virtual puzzle.
I'm asking in case I'm missing something very obvious, like a way to divide the load automatically the way GGUF models do, but with a video file as input. It takes long to assemble and even longer to run the whole batch, but it does the same job as paid software like Unifab, which costs around 300 USD and is able to process the whole video file on the same computer in one piece without any hassle.
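The split-and-rejoin routine can be scripted. A sketch assuming ffmpeg is installed: it uses ffmpeg's `segment` muxer to cut by time and the `concat` demuxer to stitch the processed chunks back together. ffmpeg cuts by duration rather than file size, so the 25 MB target is converted to seconds from an assumed average bitrate. All file names and helper names here are placeholders.

```python
def segment_seconds_for_size(target_bytes, bitrate_bps):
    """Approximate chunk duration that yields roughly target_bytes per chunk."""
    return max(1, int(target_bytes * 8 / bitrate_bps))


def split_command(src, seconds, pattern="chunk_%04d.mp4"):
    """ffmpeg command that cuts src into chunks without re-encoding."""
    return ["ffmpeg", "-i", src, "-c", "copy", "-map", "0",
            "-f", "segment", "-segment_time", str(seconds),
            "-reset_timestamps", "1", pattern]


def concat_list(chunks):
    """Contents of the list file expected by ffmpeg's concat demuxer."""
    return "".join(f"file '{name}'\n" for name in chunks)


def join_command(list_file, dst):
    """ffmpeg command that joins the listed chunks without re-encoding."""
    return ["ffmpeg", "-f", "concat", "-safe", "0",
            "-i", list_file, "-c", "copy", dst]


if __name__ == "__main__":
    secs = segment_seconds_for_size(25 * 1024 * 1024, 8_000_000)
    print(" ".join(split_command("input.mp4", secs)))
    print(concat_list(["out_0000.mp4", "out_0001.mp4"]), end="")
    print(" ".join(join_command("chunks.txt", "joined.mp4")))
```

With `-c copy` the cuts land on keyframes, so chunk sizes are only approximate. The commands can be run with `subprocess.run(cmd, check=True)` once the paths are real.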
>>107017484>>107017578Part of me thinks that has to be trolling but I know that it isn't.I am not even trying to be a hater and hope that some finetune will unfuck the schizo anatomy down the line but he really posted that angel demon image with errors on the wings, hands and nonsensical snakes as the example of chroma not having any problems.Also I find their "come join and our cord and dilate with our xisters" attitude to anyone asking for documentation very bizarre.Like seriously, they have spent 100k + on training and can't be assed to put whatever material they have anywhere public? Do they not care about adoption?
>>107017578>$200k lateronly coomerboomer retards defend this failbake
>>107017606who is they? I dont think lodestone is running any kind of business, he is just making models, I dont think he is monetizing it at all
ovi is now currently supported on wan2gp with the latest v9.2 update. Not impressed with ovi at all, the audio quality is significantly inferior compared to ltx2.ovi: https://files.catbox.moe/5n22sv.mp4ltx2: https://files.catbox.moe/clx4i3.mp4
>>107017619only low iq browns cant extract the 95% of good from something and focus on the fixable 5% bad
>2b params: just fix the hands manually!>4b params: just fix the hands manually!>8b params: just fix the hands manually!>16b params: just fix the hands manually!why is AI getting more expensive yet not actually improving?
ok, trolling, got it, I can go back 100 threads and only find worse gens from worse models
>>107017632at least it does a pretty good job with the facial expressions of someone being forced into existance to say shit like that
>>107017519
The default LongCat inference code doesn't do it, but it's easy to change it so it does (I didn't test it yet because it takes so long).
I'm trying long I2V now and seeing that it's using the same input image for the start of each segment, which obviously isn't going to work well. I'm changing it to use the last frame of the previous segment as the input for the next segment. The tricky part will be knowing what the prompt should be for each successive segment in advance, since it's unpredictable how much of the prompt will actually be incorporated within each segment.
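The last-frame chaining described above boils down to a small loop. This is a sketch of the control flow only: `generate_segment` stands in for whatever image-to-video call is actually used (it is a hypothetical callable, not a LongCat function), and frames are treated as opaque objects.

```python
def generate_chained(first_image, prompts, generate_segment):
    """Generate one segment per prompt, seeding each with the previous last frame.

    generate_segment(init_image, prompt) must return a non-empty list of frames.
    """
    all_frames = []
    init = first_image
    for prompt in prompts:
        frames = generate_segment(init, prompt)
        if not frames:
            raise ValueError("segment produced no frames")
        all_frames.extend(frames)
        # The next segment starts from where this one ended.
        init = frames[-1]
    return all_frames


if __name__ == "__main__":
    # Stub generator so the chaining can be seen without a model.
    def fake_segment(init, prompt):
        return [f"{init}>{prompt}:{i}" for i in range(2)]

    for frame in generate_chained("img", ["walk", "sit"], fake_segment):
        print(frame)
```

If the model echoes the input image as the first output frame, the first frame of every segment after the first would need to be dropped to avoid duplicates.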
>>107017634vae, vae compression is the issue, even qwen's which only slightly hides it by being so overcooked that it is fully fitted to images
>>107017634Qwen is actually pretty good at fixing hands, it's the main reason i use it for inpainting with better models.
>>107017633>512x512>broken hands>melted details all over>no artist tags>no characters/celebsderanged coomers will see this inferior-to-sdxl model and start creaming themselves simply because it can render a titty.
>>107017634>manuallyyou could automatically fix the hands quickly since the earliest days of a1111 sis, thanks for exposing yourself as a retard
id rather have 99% perfect details with crazy good aesthetics and prompt following AND style flexibility and then just fix small details with adetailer over having nothing but the same image with better small details due to being overfit to death on a single styleIts one or the other, at least until the vae issue is fixed
>>107017689
Only reusing one frame will lead to the same issue Wan has: the new segment gets no information about motion.
>>107017125>PonyHim getting in Loadstone's ear to make a equally shitty model. The irony is this piece of shit is worse than chroma and instead of following THE FUCKING DOCUMENTATION he decided to listen to pony fag to make a stillborn model that can't even work with loras. Fuck both of them for burning so much time and money
>>107017656>512x512chroma gens at 1024x1024 by default or even more, like 1088x1344 without needing a double pass >>107011247again, brownoids literally just lying online every single thread, hope you're at least paid to be this retarded
>>107017623>who is they?People who regularly post about how great chroma is. Like that redditor. We have a few here, too.>I dont think lodestone is running any kind of business, he is just making models, I dont think he is monetizing it at allI guess this explains the attitude. 100k is awful amount of money to sink into this kind of hobby though. Another case of rich furry and suspicious amount of disposable income I suppose.>>107017656You can simply use flux loras for last two or train your own to be fair.Only illust/noob knows major styles and characters and pretty much no model knows celebrities out of the box.
>>107017656The gooner burden lies in seeds with things that ought be nice but have 6/7 digits.They can't be bothered to apply a mask that is wakanda levels of science fiction.
>>107017700
He was referring to training resolution.
Most of the chroma training was done on low res.
Which perhaps partially explains fucked hands.
(Remember how SD 1.5 was unable to learn hands at all due to that resolution?)
>>107017709
After it generates each individual segment, it regenerates everything only using the first outputs as a reference, which ought to take care of that.
>>107017709>He was referring to training resolution.So he repeated himself twice in the first two points again chroma seething about hands? Even lower iq
>>107017709
No, all recent models were trained at low res first. Wan was 256 res for most of its training
>>107017653>Qwen inpaintingshit, I didn't think of thatI tried it with i2i and it didn't work very well, but I might give it a shot
>>107017634>Flux devs give instructions how to finetune>furry decides to ignore it and do random retarded shitIt's tiresome
>>107017701loras are not a good alternative to character knowledge because they do not interact well at all and bleed all over. people continue to ignore the major flaws of loras as an excuse to cope with bad models
trying to imagine the kind of anon thats genuinely taken by surprise at pony
>>107017766Loras don't even work with chroma in the first place and are volatile between seeds mostly because token weights are fucked beyond repair.
>>107017782Personally I'm mad because pony fag is partially responsible for chroma turning into dogshit.
>>107017780I think the most surprised were style cluster 545 and style cluster 217.
>>107017788>trying to push that chroma is bad still>>107017782chroma loras are trainable in diffusion pipe, simpletuner and aitoolkit
>>107017803
>>107017800I made Chroma loras and the model itself does not respect the outcomes this has been discussed at length in this thread. That is a fundamental flaw with the model
>>107017484https://www.reddit.com/r/StableDiffusion/comments/1ogwi51/holy_crap_form_me_chroma_radiance_is_like_10/2 posts like that today, can't be organic
>>107017743Huh.I guess higher vae channels make the point moot or whatever. I am not going to LARP that I have a detailed understanding.>>107017782>because token weights are fucked beyond repair.Can you elaborate on this?>>107017788No idea how this ties into the rest but okay.
are we saved?https://xcancel.com/HuggingPapers/status/1982360432514883795#m
>>107017844>respect the outcomeswhat? try English this time. Or are you being intentionally unobtrusive to try and hide the fact that you are full of shit?
>>107017846>>107017484Obviously notWhat I'm still trying to figure out is why is he still training off of his broken architecture instead of restarting from a earlier epoch >>107017858Use a non realism lora and post 5 back to back seeds using the same prompt, I'm not fucking spoon feeding you
chroma bros, have you made more celebrity loras?
>>107017866>broken architecturecan you explain how its broken?
>>107017868usecase for celebrity loras?
>>107017873>Use a non realism lora and post 5 back to back seeds using the same prompt, I'm not fucking spoon feeding you
>>107017859that is what lodestone is already doing lol
>>107017879>usecase for celebrity loras?
>>107017859holy based. >inb4 nobody ever implements it in a real model
>>107017884no, pixnerd doesn't improve the training speed and works on the pixel space, that one works on something different than pixel space and latent space
>>107017892Me on the bottom
>>107017892>her head is burried in Panty's pantieskek, I got the joke!
>>107017899>lodestone once again starts a new chroma model
>>107017859>SVGHow can you trust anyone stupid enough to reuse initials like that?
>>107017868Hypothetically if I were to train one where would I even upload it at this point?Can't register to that broken piece of shit seaart at all and tensorart asks for glowing email.
>>107017879gooning
>>107017986huggingface then https://civitaiarchive.com/
>>107017859
>x62 faster training
>35x faster inference
Must be a catch somewhere. I assume it will take longer to train, although even if it's literally slightly worse resource-wise, it will be worth it to not have VAE quality loss, especially for the future of image gen models, which will be edit models, which need to lose their VAE
>>107018054they're saying it gets to the same quality level 62x and 35x faster.
>>107017909If it's 62 times faster to train and 35 times faster to gen I wouldn't mind him doing that.
>>107017866Well that anon wasn't me.I did what you asked and yes I can see that consistency is a problem.Can you now tell me why?
>>107017859is this why we got the slopped look? since everything is mixed together it does a sort of an average of concepts instead of focusing on the concept itself
>>107018079thats on the usual initial toy model size, seeing if it actually scales is what kills papers
>>107017859Tencent be like:>"we're gonna pretend this never existed, STACK MORE LAYERS"
>>107018121Or deliberately p-hacked BS that is useless in practice.So many fucking papers on arxiv that does not yield promised results when trying the experiment on different data/params.
>>107018111The creator decided to obfuscate tokens and then do yolo training methods not backed by any data. The model got destroyed because he decided to make it a yolo sandbox half way thought.
>>107018043Well hf regularly purges "problematic" models.I wonder if a gibberish name would fly under the rather or do they directly use that website to find them.
>>107018144
>So many fucking papers on arxiv that does not yield promised results
yep, that's called the replication crisis, a lot of researchers are frauds
https://en.wikipedia.org/wiki/Replication_crisis
>A 2016 survey by Nature on 1,576 researchers who took a brief online questionnaire on reproducibility found that more than 70% of researchers have tried and failed to reproduce another scientist's experiment results (including 87% of chemists, 77% of biologists, 69% of physicists and engineers, 67% of medical researchers, 64% of earth and environmental scientists, and 62% of all others), and more than half have failed to reproduce their own experiments.
>>107018156Create a torrent, port forward the client port, and post the magnet link on the civarchive page too as another mirror
>>107018153>yolo training methodsAI in general then? Everything is is a first in this field, wtf are you on about>destroyedlooks amazing to me>>107017133>>107017523
>>107017990perfect non real girls are hotter than random celebs
>>107018117It's a known issue and one of the reason details like hands or feet look bad.
>>107018172>Flux creators give training guidelines>decide to ignore them when the goal was to make a flux model without restrictionsAlso since you're pointing out single gens it's clear to me you're not at a level for this discussion, if a model is not consistent then it's worthless especially when you have to put so much time investment into creating chroma gens compared to other models even on top end hardware.I think we're done here because you fail to realize that 1 out of 5 gens is not acceptable when you're trying to maintain a look or style especially at the dog shit speeds chroma runs at
>>107018158actually this survey cannot be reproduced
>>107018153
>obfuscate tokens
What does this mean exactly?
>do yolo training methods not backed by any data.
I believe this, judging by the stream of weird experiments in his hf repo, and the results of the training speak for themselves. But could you elaborate? Do you mean stuff like mixing high res and low res steps?
>>107018166
I am cursed with CGNAT unfortunately.
>>107018189Show me a model that does as well as consistently as chroma at as many styles and is not qwen which is style locked
>>107018156>regularly purges "problematic" modelsonly if people snitch on them, they don't really check everything by themselves
>>107018195kek, the irony is beautiful innit?
>>107017859
https://arxiv.org/pdf/2510.15301
>We also find that classifier-free guidance (CFG) is less effective in our framework, indicating the need for better alternatives
nothingburger
>>107018200
Anything XL, anon. Again, you're being disingenuous so I'm going to ignore you now.
>>107018199
If you haven't noticed, Chroma can't do artist tags despite having a dataset that includes various artists and styles. This is a conscious choice by the creator, and it has historically led to massive issues with models, as we saw with SD3 being dumbed down and the latest victim being Pony V7. It always seems to do massive damage to models because concepts and other things that are often put into separate tags or related to some tags become completely fucked, and you get stuff like inconsistent styles or broken anatomy, like SD3 not being able to produce an image of a woman lying in the grass because they decided to obfuscate anything close to a sex act.
ignore the anti chroma troll, notice how he never posts gens
>>107018228>notice how he never posts gensthat sentence is /sdg/ coded btw
>>107017859Another dinov3 embedding model interesting>>107017986>where would I even upload it at this point?https://catbox.moe/https://gofile.io/
>>107018228Now post 5 images with the same prompt within the same batch so I can laugh at you>>107018234I have a feeling it is one of those waste of space bro, we already tested chroma for weeks and decided the model had deal breaker flaws I bet if I did post a gen he's going to shit his pants and reveal himself.
>>107018222I see what you mean.I actually haven't tried specific styles with chroma, mainly just realism stuff.Shame that he also did this stupid BS.
isnt there any local models that you can interact with like llms? >generate a unicornoutput.jpg>wait, make the horn purple, and make it bit longeroutput.jpgWhy isnt memory a thing for image models?
>>107018238>Now post 5 images with the same prompt within the same batch so I can laugh at youdo that with ANY model, any, it will either have issues OR the model will be so overfit you will have no variance at all, are you arguing against AI in general? Because every model has issues every few gens, chroma just has the least that is not also locked to a single style like illustrious / qwen
>>107018205impressive chroma gen
>>107018246hunyuan 3 is supposed to be like that, like an LLM that can generate images (why it's so big)
>>107018246edit models do that but they are slow and have to be loaded after the initial image is generated
>>107018246The memory is your prompt and an I2I workflow
>>107018249You're too stupid for this conversation please go back to your containment thread, no other model besides chroma is unable to maintain the same consistent art style even with the aid of loras
>>107018274thats what I thought
Give me a new idea for gen
>>107018257Is that the first model? So can we expect a distilled/smaller model in the future? I want to interact it like a LLM model with LM Studio. Kinda annoying trying to install billion different dependencies to generate image with python for every new model.
>>107018284Enormous ufo ominously hovering over the landscape of your choice
>>107018294holy shit... imagine if he cussed?
>>107018284plump bbw milf
>>107018294omg it can do 1character?? best model ever!
>>1070183091character more than you can looooooool retard
I'm tired of disabled retards trying to troll in this thread>>107018111Is why Chroma is shit and trained incorrectly, it shows the key flaw with this model and the retard is going to continue until he can find something else to annoy the general with.>>107018244I think the model is fine with realism but that's mostly due to flux so there's that positive to it. Realism is so strong tags like selfie will make a anime lora turn 3D in some seeds.
ldg mustve been linked elsewhere with this newfaggotry
>>107018322No it's the same faggot that griefs this thread daily, sora is now irrelevant and the api bullshit gets no traction so he's cycling through his autistic bullshit.I do think chroma is good for some memes if you can cope with the art constantly being inconsistent between seeds
also you basically have to use a style cluster or it looks like garbage, there are tons of different ones
>>107018335I think a lot of the initial tests disregarded the prompting instructions and tried to a-b test models with exactly the same prompt, getting terrible results.pony 7 still doesn't look great, but not as bad as the first mutagenic horrors people initally posted.
>>107018352
posted wrong image at first:
pony is better than originally thought (still not good) but it has major issues, he did not do uncond dropout
To somewhat fix it, spam tons of random nonsense tokens into the negative
It is also somewhat locked to long prompts = good image
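For context on "uncond dropout": classifier-free guidance relies on the model having seen an empty (unconditional) caption for some fraction of training steps, so that the unconditional branch used at inference means something. A minimal sketch of that data-side step, with a made-up function name and a 10% drop rate chosen purely as an illustration. Whether and how any particular model's training did this is the poster's claim, not something this sketch verifies.

```python
import random


def maybe_drop_caption(caption, drop_prob=0.1, rng=random):
    """Return an empty caption with probability drop_prob, else the caption.

    Training on the empty caption some of the time teaches the model an
    unconditional prediction, which CFG later mixes with the conditional one.
    """
    return "" if rng.random() < drop_prob else caption


if __name__ == "__main__":
    rng = random.Random(0)  # seeded only to make the demo repeatable
    batch = ["a watercolor forest priestess"] * 8
    print([maybe_drop_caption(c, 0.1, rng) != "" for c in batch])
```

In a real trainer this would be applied per sample before text encoding.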
>>107018321>>107018322And lastly, and once again I agree with you that it has issues, do you think it can be salvaged with extensive finetuning?If someone were to train it further with say 50000 decent images and proper methods could they unfuck it for like a few thousand bucks?I am asking this because I know that a major SDXL finetuner (actual finetuner, not shit-mixer) plans to move on to it soon. I wonder if I should have some hopium for it.
SongBloom chads have something to look forward to:
why does anon sometimes type like this
>>107016852the prompt was this in all cases:`a traditional media paper texture watercolor \(medium\) painting \(medium\) depicting a solitary crowned forest priestess rendered in fine inked linework and layered watercolor washes with an Art Nouveau botanical sensibility. A Caucasian woman, approximately 25 years old, stands in the middle of the composition, full body visible, facing forward with her eyes closed and a serene expression. Her long straight blonde hair is parted in the middle and falls down past her shoulders. She wears a flowing green gown with graduated watercolor tones and vertical drips, a fur-trimmed cloak at the top, and an embroidered chest panel showing a prominent crescent moon above stylized leaves. Her hands are on her hips with her left hand on the left side of her waist and her right hand on the right side of her waist. A delicate crown made of antlers, tiny beadlike orbs, and fern fronds rests atop her head, and a faint circular halo inked behind her head frames the crown. The background is a dense mossy woodland with gnarled tree trunks framing left and right, climbing ivy on the right, and tangled roots across the bottom foreground. Bioluminescent mushrooms with glowing undersides sit bottom left and bottom right, casting warm orange glow upward. The piece shows visible brush texture, wet-on-wet gradients, and pen hatching for detail.`base AuraFlow 0.2 almost certainly didn't actually know any of those generic Booru style tags at the front and yet it still did better than Pony V7, which basically seems to be ridiculously over-reliant on the completely undocumented `style_cluster_xyz` tags.
>>107018359You would need to go back to a earlier epoch and train it from there and it will take a long time. I really think he should have just trained it like flux and there wouldn't have been any issues. The model degraded the further along it went with training which is sad. I'm tempted to test loras with older models but I just can't be bothered to do it desu.>>107018362Redditor most likely
>>107018359lol no
>>107018362Redditors colonized this website and started reddit spacing everywhere for "legibility" or whatever.
>>107018352I would be MUCH more forgiving if there was actually ANY FUCKING DOCUMENTATION WHATSOEVER for the style clusters. (There isn't). Nobody fucking knows what any of them consisted of dataset wise, somehow, which is ridiculous. The model is literally useless until he publishes a list of every single cluster with at the very least a one-word description of what they're supposed to do.
>>107018366>base AuraFlow 0.2 almost certainly didn't actually know any of those generic Booru style tags at the front and yet it still did better than Pony V7, which basically seems to be ridiculously over-reliant on the completely undocumented `style_cluster_xyz` tags.it's irrelevant what the original auraflow was trained for after the pony tuning. if you're not using "score_9, rating_sensitive, style_cluster_430" even though it sounds retarded you'll obviously get bad results. using them doesn't mean you'll get /good/ results, but if you leave them out you're not even giving it a chance.
>>107018359i can think of at least three other models that are more worthy of tuning on ~50k images
>>107018358I don't think anons should waste their time with that flaming piece of shit. Everyone warned him and he didn't listen and now he has a model even his retarded ass doesn't understand.>>107018366I asked a few questions in the discord and his drones got overly defensive when asked how and why would he implement this tagging system into a model with zero understanding or documentation. Fuck them and let them fester in shit absolute clown car of a model.
>>107018359In my experience testing it a bit you can clean up Chroma reasonably well with even ONE lora, but it HAS to be trained at native 1024x1024, and it HAS to be captioned with proper (preferably hand-checked) English natural language captions, not slopmaxxed broken-grammar captions
>>107018388why 430 specifically when there's apparently literally 2048 clusters? or was that just an example?
melband roformer is neato.vocal separation.
im such a worthless gooner
>>107018359>50000datasets of legit meta changing finetunes are larger by orders of magnitude
>>107018394it's possible that they literally do not have documentation of the styles, because they trained a classifier first that assigned the style tags.
it's linked on their guide at https://civitai.com/articles/21107/captioning-and-prompting-primer-for-v7
they probably used that classifier and a captioner and never saved any correlation with artist names.
Where do people get ideas about aesthetic tags etc. in Chroma? They do seem to have an effect but what's the source? I don't want to have to join this guy's discord just to learn the basics of how images were tagged for training, that should be public
>>107017166You still haven't posted catboxes revealing the model and lora these gens are using. You claimed it's Chroma, but I am still waiting for the Catbox.
>>107018427All the more reason to ignore his garbage model. Also if I recall the model has the same style swing issue chroma has with less features.>>107018436Chroma guy talked to pony guy and then we got that which hurt the model, he said he wouldn't do it and he copied the retard's homework.
>>107018425I really think (if we're talking about photo stuff) as I said above you could really tighten Chroma up quite a lot by just training it on a dataset of ~1000 - 5000 actual photographs (no other content types whatsoever) at 1024x1024 with high-quality natural language captions.
>>107018361AceStep 1.5 mogs that and will be released soon. And apparently, the Qwen team is working on Musicgen as well and they confirmed it will be open
>>107018438If you can't tell what model an output comes from you should just give up honestly
>>107018460Stop replying to this bored dipshit his disability is going to get cut off in a few weeks so feel bad for him
>>107018438some of them do look like good examples of what Chroma can sometimes output. Other ones I've seen from that guy look suspiciously like Flux Krea moreso than Chroma though.
>>107018468There are a couple of gens from him that seems to be Wan 2.2 with a Lora or a SaaS model, but he keeps being a faggot withholding information and not posting the catbox
>Find a combination of an image and prompt that gets consistently good gens across several seeds in WAN Lightx2
>Load it into Wan2GP to generate a longer and higher-resolution video using full 30-step WAN 2.2
>Get some hokey rigid jerky 2x speed crap
>mfw I waited 2h30m for this shit
>>107018425My bad. I didn't realize more modest SDXL finetunes below pony/illust level still trained on millions.>>107018415>>107018447I am skeptical of the claim but would be great if you can publish the lora that can do that.
>>107017653>>107017744can't seem to find the proper way to do this in comfy, any tips?
>>107018451>will be released soon.:^)https://vocaroo.com/1niQmJYusH4G
>>107018511they don't, the only other super-high-image-count finetunes ever done (starting from a literal base model as in SDXL) are like Animagine and BigASP. Starting from something like Chroma which has heavy concept knowledge already though is going to be a totally different story even if Lodestones insists it's itself a "base model". A high-diversity high-quality dataset could make it way more coherent with nowhere remotely close to millions of images.
>>107018246Not quite local but doesn't Tensorart have this on their site? Where you can prompt any model they're hosting with a chat LLM?
>>107018054it's VRAM usage. the vae is for converting to and from latents. openai already did this for years because of the compute they have access to. it's faster but the entire point of sd1.x having a vae in the first place was to fit on consumer gpus
>>107018548ramtorch will save us unironically
>>107017844what do you mean by that exactly?>>107017800it's even been in CivitAI's trainer for a while now
>>107018257Hunyuan 3 should be able to do this in theory, but they never added support for it. I tried experimenting with it anyway (it does work as a VLM, so you can feed it back the images it generates) but the memory use was intractable.
>>107018491That's the fun of wan
>2.1 fluid, wavy and wobbly movement but often morphs, changes input image and poor quality
>2.2 believable, very high quality and "realistic" movement but too rigid and stiff
2.2 just isn't as fun. It tries too hard to be realistic. At least with 2.1, I can slap on 256 lightx2v i2v lora or/and PUSA to retain character consistency.
How could a model even begin to learn how to draw fine details when the VAE can't even encode it properly?
>>107018648that is indeed the issue
>>107018491using a seed that was made from lightx2 wont work good. are seeds even still relevant for trying to get the same output with wan?
>>107018648>>107018651Yep.
Despite still being 4 channel, the SDXL vae was a lot better than the turbo garbage in SD 1.5 (for which you wouldn't even need a comparison node to see how much it destroys the image).
Hence why it was still a noticeable improvement over its predecessors in terms of fine detail.
>>107018648Hunyuan 2.1 has a 32ch VAE
Actually, I don't remember if it uses it for the full process or only the refiner.
>>107018689>furry footum
>>107018689Hunyuan 4.2 has a 64 channel VAE
>>107018648how about flux vae?
Clothes changes for wan lora: https://civitai.com/models/2077374/sudden-outfit-change?modelVersionId=2350576get it before it's banned
>>107018648This is what Chroma Radiance is supposed to avoid.
>>107018729
>>107018737why would it be banned? There's plenty of i2v sex acts, just not ones where there's a thought crime of it being non consensual
>>107018757
>>107018773I expect it to be banned the same way nude loras for qwen image edit systematically get banned.
>>107018773someone somewhere can use that to make a real woman be in her underwear, that's super dangerous tech
I am curious does anyone know how major models SD, Wan, Flux, etc. handled bucketing during training?Did they use 64 like lora trainers, lower step numbers or resize and prepare all images separately beforehand?
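For reference, this is roughly what "64 like lora trainers" means: bucket sides are multiples of 64 under a fixed pixel budget, and each image goes to the bucket with the closest aspect ratio. A minimal sketch only. How SD, Wan or Flux actually bucketed isn't stated anywhere in this thread, and `make_buckets`, `pick_bucket`, the 1024x1024 budget and the 512 to 2048 side limits are all assumptions for illustration.

```python
def make_buckets(max_area=1024 * 1024, step=64, min_side=512, max_side=2048):
    """Enumerate (width, height) buckets whose sides are multiples of `step`
    and whose area stays within `max_area` (all defaults are assumptions)."""
    buckets = set()
    for width in range(min_side, max_side + 1, step):
        # Largest height that is a multiple of `step` and keeps the area in budget.
        height = (max_area // width) // step * step
        height = min(height, max_side)
        if height >= min_side:
            buckets.add((width, height))
            buckets.add((height, width))  # mirrored portrait/landscape bucket
    return sorted(buckets)


def pick_bucket(width, height, buckets):
    """Return the bucket whose aspect ratio is closest to the image's."""
    ratio = width / height
    return min(buckets, key=lambda b: abs(b[0] / b[1] - ratio))


buckets = make_buckets()
print(pick_bucket(1920, 1080, buckets))  # a 16:9 source lands in a wide bucket
```

A smaller `step` gives more buckets and less cropping per image, at the cost of fewer images per bucket.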
>>107018778because making her clothes go away is perverted, but her grabbing two cocks and going at it will make her giggle
>>107018791Don't ask me to find logic in that.
>>107018447I trained a Chroma rank 64 lora on 500ish images, should I release it? It sadly still needs negative prompts to get rid of the blurriness and lowres look (despite the fact the dataset consisted of highres images), but it does fix some anatomy issues Chroma has such as holding swords, guns, bows etc (but still far from perfect)
It's not as sharp/realistic as Chroma Flash but I think it fucks up anatomy a bit less often
how the fuck do you search while banning tags in civitai?
>>107018775>>107018757>>107018648>>107018729If you guys want quantifiable data, a few months ago I made a small experiment with encoding and decoding image with a vae and calculate VMAF score wrt original image.
Note that despite being objective values you should consider these numbers rough placements rather than absolute indicators of quality as one model can compress one image more accurately than the other and they can end up trading places in another image. But it is still indicative of broader trends.
Also there is some compounding effect from running it twice for compressing and decompressing.
SDXL 66
SD1.5 55
Flux VAE 89.52
Wan 2.1 VAE 84.12
Wan 2.2 VAE 86.15 #48 channel one for 5B
Hunyuan Video vae 89.85
Sd3 79.72
SD3.5 vae 80.44
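The loop behind numbers like these is: encode, decode, score the result against the original. A hedged sketch of that loop follows. VMAF needs an external tool, so plain PSNR is swapped in as a stand-in metric (it is not what produced the scores above), and a crude quantizer plays the part of the VAE roundtrip. `psnr` and `fake_roundtrip` are hypothetical names.

```python
import math


def psnr(original, reconstructed, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equal-length pixel lists."""
    if len(original) != len(reconstructed):
        raise ValueError("images must have the same number of pixels")
    mse = sum((a - b) ** 2 for a, b in zip(original, reconstructed)) / len(original)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


def fake_roundtrip(pixels, levels):
    """Stand-in for VAE encode+decode: quantize to `levels` gray levels."""
    step = 255.0 / (levels - 1)
    return [round(p / step) * step for p in pixels]


pixels = [(x * 37) % 256 for x in range(4096)]  # synthetic test "image"
for levels in (4, 16, 64):
    # Less lossy roundtrips should score higher.
    print(levels, round(psnr(pixels, fake_roundtrip(pixels, levels)), 2))
```

With a real VAE you'd replace `fake_roundtrip` with the model's encode and decode calls and feed every VAE the same source image, as the post describes.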
>>107018823What does it do anyway? It's compression?
>>107018823did you calculate the video output vmaf after saving videos in lossless quality?
>>107018834Variational auto-encoder is a lossy compressor, yes.
Used for speeding up training and lowering VRAM requirements for consumer inference, at the expense of some quality.
>>107018838Lol no.
I've fed the same image to all vaes.
Works fine on video vaes too as it just gets processed as single "frame".
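To put a rough number on "lossy compressor": count how many pixel values get squeezed into one latent value. Sketch below assumes the usual 8x spatial downscale and compares a 4-channel latent (SD1.5/SDXL style) with a 16-channel one (Flux style). `latent_ratio` is a made-up helper, and it counts values, not bytes.

```python
def latent_ratio(width, height, latent_channels, downscale=8, image_channels=3):
    """Pixel values per latent value for one image (value counts, not bytes)."""
    pixel_values = width * height * image_channels
    latent_values = (width // downscale) * (height // downscale) * latent_channels
    return pixel_values / latent_values


print(latent_ratio(1024, 1024, latent_channels=4))   # 4-channel latent
print(latent_ratio(1024, 1024, latent_channels=16))  # 16-channel latent
```

More latent channels means less squeezing per value, which lines up with the higher-channel VAEs scoring better in the list above.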
>>107018794>Don't ask me to find logic in that.
I realized that my life had improved significantly mentally once I stopped caring about the logic of society, it's completely incoherent and inconsistent, but that's just the way it is. It do be like that. Stoicism is rad babyyy!
>>107018859:^)Want to talk about the trinity?
>>107018823Same trend I am seeing as I go down my list of VAEs and eyeballing the error image brightness. Flux and Hunyuan are best for the random 3 anime images I tested.
>>107018888checkedI don't think it's comfy. I think it's a vae thing.
it's funny how Qwen Image Edit doesn't give a single fuck about the original image's style, it'll add the new element like you'd add something on paint and call it a day, that's soo lazyyy
what can i use for non realistic but also non anime? i find illustrious doesnt do my style. people shit on pony but i can get it to do what i need it to do as a base, but i need to do too much polishing and manual work
>>107018899That looks very cool, however.
>>107018899some people really knew how to wear suits with class
>>107018927almost any suit looks good if it's tailor made for you.
>>107018796rank 64 seems kinda high DESU, at least if you mean that in Kohya scaling where it's gonna thus be > 500 MBwhat training settings? was it trained at 1024x1024? If not that's your problem lol
SongBloom again. I find the input to be... sorta not really that controlling. But, you can get good gens.https://vocaroo.com/1mtRTQX4ZRnG
>>107018927>>107018937more like "when you're handsome, everything fits well for you"
>>107018900>i find illustrious doesnt do my style.Are you using a shitmix?Anyway base noob can do a wide variety of styles in my experience.
>>107018796Yes I am interested.Going to bed now but would check it out later.
Looking at the info for SongBloom, it's likely they'll be able to offer midi control at some point.
>>107018955that sounds great. i tried ace step and it was crap it couldnt even follow a consistent beat.
Has anyone tried the longcat lora on native yet, does it work? I think an anon earlier tried the full model but it took 4 hours or something https://huggingface.co/Kijai/LongCat-Video_comfy/tree/main
>>107018980>>107018955The chorus is pretty great lol.https://vocaroo.com/163Fm1cacGS7it skips lyrics some, idk why.
>>107018990>I think an anon earlier tried the full model but too 4 hours or somethingI'm looking at the benchmarks and it seems to be inferior to wan 2.2, so I don't see the point of using it
>>107018985Yeah, SongBloom really is solid, but weird in how really rn you can't much control style. In theory you give it a sample, and maybe it's my settings, but it uh... kind of does its own thing.There's supposed to be a text prompt version so you don't have to feed it an audio style sample, soon out:>>107018361Maybe by December???
WAN retards:is there a wan that can take audio and lipsynch it?:^)
>>107019052lol it made it unrealistically short.
>>107018952>what training settings? was it trained at 1024x1024? If not that's your problem lol
I used diffusion pipe's defaults. LR 1e-4, 68 epochs, over 500 images, took me about 3 days
>>107019072>took me about 3 days
Damn, what GPU?
> 68 epochs, over 500 images,
34k steps assuming no repeats and batch size 1?
Perhaps not that slow then.
>>107019113>Damn, what GPU?
Two 3090s
>34k steps assuming no repeats and batch size 1?
1 batch per gpu, no repeats per epoch
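The step math from these two posts, spelled out. Sketch assumes exactly 500 images (the post says "over 500"), no gradient accumulation, and that the two 3090s ran data-parallel so the effective batch is 2. The post only says "1 batch per gpu", so that last part is a guess. `optimizer_steps` is a hypothetical helper.

```python
import math


def optimizer_steps(images, epochs, batch_per_gpu=1, gpus=1, repeats=1):
    """Total optimizer steps, assuming data parallelism and no grad accumulation."""
    samples_per_epoch = images * repeats
    effective_batch = batch_per_gpu * gpus
    steps_per_epoch = math.ceil(samples_per_epoch / effective_batch)
    return steps_per_epoch * epochs


print(optimizer_steps(500, 68))          # single GPU, batch 1 -> 34000
print(optimizer_steps(500, 68, gpus=2))  # two GPUs, 1 batch each -> 17000
```

Either way the model sees 34k samples in total; the GPU count only changes how many optimizer updates that turns into.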
>>107019140what exactly is this lora, just photo realism?
>>107019140her forehead is red, did she hit her head or something? lol
>>107019173that car probably has those automatic seatbelts that stab you in the neck when lazily entering the vehicle
>>107019072idk what diffusion pipe is
so you don't even know what resolution they train at?
>>107019148I trained on "everything", not just photorealistic images
The goal was to fix Chroma's anatomy issues as it mostly has images of humans making poses or interacting with objects
And of course, it has some personal flavor to it (for example, it can consistently make 1990s ad scan photos without adding unprompted texts unlike the base model, it can make comic book artstyle without outputting comic pages, it can make ingame screenshots without TV offscreen shots unlike what base Chroma does etc)
>>107019207I trained on 1024x1024, sorry, I forgot to say
Can I merge 2 6.46gb sdxl models with 12gb vram? I haven't merged a model since 1.5 days and I remember running out of vram sometimes with 8gb.
>>107017112Are restarts a meme? I usually train with cosine with restarts and Adamw8 but I heard they are useless with adaptive optimizers. Anyone knows if this is true?
>>107019258how perfect are your adaptive optimizers?in most cases doing random shit from the stuff that could be possibly working and seeing what sticks is still the way to go
>>107019140Were this many steps necessary or did you just go for overkill?>>107019258>>107019291Restarts aren't "useless", they straight up rape adaptive optimizers (Or at least just prodigy, haven't tried the rest with restarts.)I believe they can boost quality for non adaptive optimizer like Adamw8 though.
>>107019356>Were this many steps necessary or did you just go for overkill?If anything, it's still undertrained / it's far from overfitting, lol
>>107019356Ok I had a fucking brainfart there. Excuse me.What I am saying is don't use it for Prodigy but you should be able to use them with Adam(8bit).
Who cares about music, is there a text2sfx model?
>>107017534images use different workflows, the bottom two use a ton of custom nodes and have a stack of loras lol
i can't replicate your results in the first one, i get a very blurry rendition
sounds like its back to v48 for me
>>107019394me I do, me me meI'm trying another source file :^)It's promising, we'll see.
>>107018737Did anyone figure out how to get it to change to nudity?
>>107019245I hope you're at least block merging them.
I love how the Chroma haters pretend that the Flux architecture isn't vastly superior to SDXL at concept knowledge, scene coherence and prompt following. You have to be blissfully ignorant to ignore that SDXL has elementary understanding of what the world looks like. Let alone any NSFW concept it's already seen without relying on overcooking it with LoRAs. SDXL doesn't learn concepts. It can't generalize. Without controlnet, or some kind of hack, it does not understand composition.
These are not problems that have been made up by Chroma devs. The SOTA for image gen had been out for a while, and I mean years (Dalle 3). Models came around that were complete shit (SD 3/3.5), models that were good in their own way but still not good enough (Sigma/HunyuanDiT), then Flux came around, compared to Dalle it was still significantly behind in world understanding, scene coherence, NSFW and prompt following. Chroma came around and bridged that gap in all but two things (styles/character knowledge), and now it properly surpasses in those areas.
Now Chroma is the best model for photorealism. There is no better model currently available, even API is far behind both layers of censorship and plastic. Official Flux finetunes (E.G. Krea) are still behind what Chroma has achieved for photorealism. There are no other models that give you proper photographic look on demand, where you can place any 1girl in any scenario that you can think of and still get the photorealistic look.
Chroma may not be the easiest model to use, had its rough edges (earlier versions), but Chroma Flash HD really is the most coherent model available right now. It's also just a base model, so further large scale tunings to bring back things that it's missing are still on the line. Flux itself felt like a hopeless dead end for catching up to Dalle in any way, and now we're here.
>>107019476>another wall of text
this guy is seriously mentally ill
>>107019472I didn't try yet, I don't think I can do it and I'd have to restart my UI and I'm generating anime girls. I dunno what that means, I was just going to 50/50 two models I like and get a gooder model(it's just that easy).
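For what it's worth, a plain 50/50 merge is just a per-tensor weighted average of two checkpoints with matching keys. Toy sketch below uses dicts of float lists in place of real checkpoints, so it says nothing about VRAM use. `weighted_merge` and the key name are hypothetical, and a real merge would run the same loop over loaded state dicts.

```python
def weighted_merge(model_a, model_b, alpha=0.5):
    """Per-key weighted average: (1 - alpha) * A + alpha * B."""
    if model_a.keys() != model_b.keys():
        raise ValueError("checkpoints must have identical keys")
    merged = {}
    for key, weights_a in model_a.items():
        weights_b = model_b[key]
        if len(weights_a) != len(weights_b):
            raise ValueError(f"shape mismatch for {key}")
        merged[key] = [(1 - alpha) * a + alpha * b
                       for a, b in zip(weights_a, weights_b)]
    return merged


model_a = {"unet.block0.weight": [0.0, 2.0]}  # toy stand-ins for checkpoints
model_b = {"unet.block0.weight": [1.0, 4.0]}
print(weighted_merge(model_a, model_b))  # alpha=0.5 is the 50/50 case
```

Block merging is the same idea with a different `alpha` per group of keys instead of one global value.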
>>107019476As someone who is only interested in text to video (and is burnt out of that and waiting for a local sora 2 so I can generate teens acting bratty with sound before going back to genning),The fact that local models are in such a state that you even wrote a three-paragraph persuasive essay is top grim. I don't think it could get any grimmer than this. At least time will heal all wounds and it's just a matter of surviving and waiting for things to get better.
>>107019492I think I remember merging XL with less VRAM back in the day (when it was first implemented in Comfy) for what that's worth. I could be misremembering, but I also am a simply retard as well.
>>107019531I'll give it a shot then, thanks.
>>107019476>"A picture is worth a thousand words">he went for the thousand words anywayloool
>>107019510>The fact that local models are in such a state that you even wrote a three-paragraph persuasive essay is top grim.this, it's a really cultish behavior, they're coping so hard they're pretending their toys is on par with what API has to offer, they don't seem to understand that you can enjoy local while admitting that your product is far from the best
>>107019476>>107019543>thousand wordshttps://youtu.be/0rIvp-AreGI?t=114(Sorry, I too hate FFX-2 but I couldn't help myself)
>>107019476bub you gotta learn to parse through the b8 better it hurts me to see you like this
>>107019476True and based chromaGOD, unfortunate about the blown out colors and chromatic aberration of some chroma posters here though but to each their own
bigASP is apparently also solid for some realism
are there any checkpoints that can do black women?
>>107019476>“The heart has its reasons which reason knows nothing of... We know the truth not only by the reason, but by the heart.”>― Blaise Pascal, Penséesin other words, don't try to justify why you enjoy Chroma, feel free to enjoy it, but also feel free to accept that some people won't enjoy that model as well
What's new this week? Ditto? Dype? How come /ldg/ don't have weekly news?
>>107019577>How come /ldg/ don't have weekly news?be the change you want to see, make this like debo does on /sdg/
*yawn*
>>107019476this is one of the saddest posts I've ever seen on 4chan, why are you so passionate about defending this model?
>>107019577>How come /ldg/ don't have weekly news?Because everyone who is based already knows whats new and is too busy genning to write news reports for newniggers
>>107019394Is a good music model not inherently a good "sfx model"
>>107019569what do you mean? every model can do black women
>>107019569>are there any checkpoints that can do black women?Have you tried (very dark skin:1.4), (dark skin:1.4)If you mean the actual genetic facial structure then good luck because I couldn't even get jungle women last time I tried and had to settle for using a Lilo and stitch Lora to un-beauty-standard the faces
>>107019569tons of models can do black women.
>>107019601>>107019620based
>>107019620lmaooo, that's amazing
>>107019577news maker tend to be highly schizo for some reason
>>107019569DUI checkpoints usually have a lot of cops
>>107019577https://www.youtube.com/watch?v=uQiqKFK5_0w&t=1155s
>>107019605>>107019615yes i mean actual black woman, i can only get a tan anime girl from pokemon. its hard to do any ethnicities with illustrious, even when i prompt for asian nothing happens
>>107019620nice
>>107019620this lore goes deeper than I thought
>>107019356> they straight up rape adaptive optimizers (Or at least just prodigy, haven't tried the rest with restarts.)it can go pathologically wrong, but really doesn't have to.yes, the adaptive scheduler would probably have continued with a lower/higher learning rate than the initial one, but the initial LR doesn't *necessarily* do damage. maybe it also IS moving the, uh, learned thing out of a local minimum/maximum.not all adaptive optimizers are good at doing this kind of a thing on their own either.
>>107019620>there's 2 debosI KNEW IT!!
>>107019620hahahahaahahahahahahahgen a crowd of normal people sitting below the screens with their backs turned, monitoring the screens above themadd to screens: poopdickschizo, pissbuttfag, asianposter #6374
>>107019476ignore even more rough edges, enjoy radiance
>>107019620>Debo#1 looks like ComfyUH OHH IM NOTICING
>>107019485Baiters consistently post walls worth of text since they are on every thread.>>107019571Just because I'm sharing my thoughts doesn't mean I take it personally when someone posts for the 1000th time that Chroma sucks because x or y reason. But some of the reasons sound sound enough to warrant argument. I'm not claiming Chroma is perfect, and I do acknowledge its imperfections.
>>107019680>Just because I'm sharing my thoughts doesn't mean I take it personallyyou make wall of texts and you don't take it personally? Sure...
Move>>107019684>>107019684>>107019684>>107019684Move
>>107019644>yes i mean actual black woman, i can only get a tan anime girl from pokemon. its hard to do any ethnicities with illustrious,Yeah I don't think you're gonna get it from illustrious. Pony could do it better. Don't have experience with noob. Sorry anon, I understand your struggleIt's funny that the reasons why AI struggles with generating ugly people is the same reasons why it struggles with nigs, and those reasons are unrelated to "black = ugly" but more related to AI averaging everything out
>>107019685If you're not a Zoomer that takes not long to read.
>>107019698that's not the point, you're writing bibles to defend such a subpar model, what's wrong with you? do you think we're gonna be convinced by reading words? do you understand that the goal of an image model is to produce good images? I don't care what you want to say, I look at the image, if it looks like shit that's pretty much it
>>107018823add qwen and lumina to this.
I think lumina uses the flux vae maybe? i dont remember
So has anyone actually tried to generate chinese cartoon smut with pony v7 yet? Thats what its for right? I see all these people seething about it not doing hyperrealism well but thats not even what its for?
>>107019761anon, it's not so good at that either. you'll prefer noob/illustrious or neta-yume lumina or perhaps even chroma.feel free to try tho.
>>107019603Music isn't sound effects
>>107019258Cosine With Restart @ 3 works well with like AdamW
they do nothing with adaptive though, you should just use Cosine for e.g. Prodigy or CAME
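What the schedule in this post does to the learning rate, as a minimal sketch: the LR follows a cosine decay and snaps back to the base value at the start of each cycle. Reading "@ 3" as three cycles is an assumption, there is no warmup here, and `cosine_with_restarts` is a hypothetical standalone function, not any trainer's actual implementation.

```python
import math


def cosine_with_restarts(step, total_steps, base_lr, num_cycles=3, min_lr=0.0):
    """LR at integer `step` for a cosine schedule with hard restarts (no warmup)."""
    if step >= total_steps:
        return min_lr
    # Position inside the current cycle in [0, 1), using exact integer math.
    cycle_progress = (step * num_cycles % total_steps) / total_steps
    cosine = 0.5 * (1.0 + math.cos(math.pi * cycle_progress))
    return min_lr + (base_lr - min_lr) * cosine


for step in (0, 50, 99, 100, 150, 299):
    print(step, cosine_with_restarts(step, total_steps=300, base_lr=1e-4))
```

The jump back to `base_lr` at steps 100 and 200 is the "restart". Whether that helps or does nothing with an adaptive optimizer like Prodigy is the posters' claim, not something this sketch shows.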
>>107019743Lumina reuses the Flux VAE yeah
>>107019932I'll give it a try later since I just bought a new drive. Apparently these new models need insanely verbose prompts to be any good.