Discussion of Free and Open Source Text-to-Image/Video Models

Prev: >>107049284

https://rentry.org/ldg-lazy-getting-started-guide

>UI
ComfyUI: https://github.com/comfyanonymous/ComfyUI
SwarmUI: https://github.com/mcmonkeyprojects/SwarmUI
re/Forge/Classic/Neo: https://rentry.org/ldg-lazy-getting-started-guide#reforgeclassicneo
SD.Next: https://github.com/vladmandic/sdnext
Wan2GP: https://github.com/deepbeepmeep/Wan2GP

>Checkpoints, LoRAs, Upscalers, & Workflows
https://civitai.com
https://civitaiarchive.com/
https://openmodeldb.info
https://openart.ai/workflows

>Tuning
https://github.com/spacepxl/demystifying-sd-finetuning
https://github.com/Nerogar/OneTrainer
https://github.com/kohya-ss/sd-scripts
https://github.com/tdrussell/diffusion-pipe

>WanX
https://comfyanonymous.github.io/ComfyUI_examples/wan22/
https://github.com/Wan-Video

>Neta Yume (Lumina 2)
https://civitai.com/models/1790792?modelVersionId=2298660
https://nieta-art.feishu.cn/wiki/RY3GwpT59icIQlkWXEfcCqIMnQd
https://gumgum10.github.io/gumgum.github.io/
https://neta-lumina-style.tz03.xyz/
https://huggingface.co/neta-art/Neta-Lumina

>Chroma
https://huggingface.co/lodestones/Chroma1-Base
Training: https://rentry.org/mvu52t46

>Illustrious
1girl and Beyond: https://rentry.org/comfyui_guide_1girl
Tag Explorer: https://tagexplorer.github.io/

>Misc
Local Model Meta: https://rentry.org/localmodelsmeta
Share Metadata: https://catbox.moe | https://litterbox.catbox.moe/
GPU Benchmarks: https://chimolog.co/bto-gpu-stable-diffusion-specs/
Img2Prompt: https://huggingface.co/spaces/fancyfeast/joy-caption-beta-one
Txt2Img Plugin: https://github.com/Acly/krita-ai-diffusion
Archive: https://rentry.org/sdg-link
Bakery: https://rentry.org/ldgcollage

>Neighbors
>>>/aco/csdg
>>>/b/degen
>>>/b/realistic+parody
>>>/gif/vdg
>>>/d/ddg
>>>/e/edg
>>>/h/hdg
>>>/trash/slop
>>>/vt/vtai
>>>/u/udg

>Local Text
>>>/g/lmg

>Maintain Thread Quality
https://rentry.org/debo
Imagine the kino level of shitpost if we really get suno 4.5 at home
>all chroma shit
>probably own gens too
kys OP frfr

>>107054061
>rainbow
>woman avatar
....... is he cooking?

>>107054106
>woman
I think it's a man avatar, the hair is short

>>107054061
If it was so easy, Suno would have been as good as Udio long ago, but Udio was always better, so I have my doubts that local can be as good.

>>107054015
gonna depend on the model, lora strength, captions, etc. this is generally the reason why you use nonstandard words to invoke the lora, to avoid confusing the model.

>>107054121
this, suno is overrated as fuck, they always pretend it's at the same level as udio when it's definitely not
>caring about the fagollage
Pay debo no mind he's disabled
>>107054047
https://vocaroo.com/1lhI4LNQojvT
Udio 1.0 of course.

>>107054183
udio is amazing desu

>>107054061
I listened to his samples; they're struggling to reach even YuE's quality.

So Wan Q8 GGUF is like 95% as good as fp16 and a little faster?

>>107054183
meh

>>107054221
show some of those samples here anon, I don't want to go to trooncord

>>107054227
with a little optimism and cope

>>107054227
yeah, the quality is really equivalent and it's 2x lighter in terms of size

>>107054227
I get almost the same speed with fp16 vs Q8 on a 5090; I just block swap half the model.

>>107054227
Half the precision, half as good
But being able to run it makes you twice as blind
Contrastive flow matching is tight.
>>107054444
>Contrastive flow matching
what is that?

>>107054206
Yes, I really want to hope that local will catch up, but that seems like a leap from nothing, not even SD, to a DALL-E 3 tier music model. High-quality, manually captioned audio data is probably a must for such results, and then a really good DPO process.

Open-source Emu3.5 at 32B, which according to the authors is supposed to be superior to Nano Banana in every way. Looking at the sample images, I have my doubts about that.

>>107054448
It's a new version of flow matching that encourages the model to find unique paths, which speeds up convergence, gives sharper results because it's not blending paths, and also encourages diverse outputs (because paths aren't blended).
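For anyone wanting to see the shape of it in code, here's a minimal sketch of a contrastive flow matching step. To be loud about the assumptions: the velocity-prediction model signature, the batch-roll negative pair, and the lam value are all illustrative, not anon's actual recipe.

```python
import torch
import torch.nn.functional as F

def contrastive_flow_loss(model, x0, cond, lam=0.05):
    # x0: clean latents (B, C, H, W); cond: text conditioning.
    b = x0.shape[0]
    noise = torch.randn_like(x0)                   # x1: the pure-noise endpoint
    t = torch.rand(b, device=x0.device).view(b, 1, 1, 1)
    xt = (1 - t) * x0 + t * noise                  # point on the straight path
    v_target = noise - x0                          # true velocity along that path
    v_pred = model(xt, t.flatten(), cond)          # hypothetical model signature

    fm = F.mse_loss(v_pred, v_target)              # standard flow matching term

    # contrastive term: push the prediction away from the velocity that
    # belongs to a *different* sample, discouraging blended/average paths
    v_neg = torch.roll(v_target, shifts=1, dims=0)
    contrast = F.mse_loss(v_pred, v_neg)

    return fm - lam * contrast
```

The sign is the point: the flow matching term is minimized while the contrastive term is maximized (hence subtracted), which is what forces paths apart instead of blending them.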
>>107054492
I see, and I guess you're using that method to make a LoRA, right?

>>107054498
I'm using the method to train a 1.5B model from scratch.

>>107054444
test finetune or what

>>107054508
it's chinkslop. if it's not good they lie and say it is. if it is good they make you pay for it. the good thing about china isn't that it's better, it's that it's cheaper

>>107054508
Tell us more. This sounds interesting.

>>107054508
nice, how much faster is it compared to the previous method?

>>107054508
i believe in bigma

>>107054482
even if it's true, 32b is just too big

>>107054518
It's just a revision of the Pixart model I'm working on: 1.5B, Pixart architecture with the HDM MLP, and Ostris's 16-channel VAE.
>>107054524
Insanely fast compared to MSE, dropping about 0.02 loss per day, which means a full from-scratch model on a single 5090 in 50 days.
>>107054564
>is just too big

>>107054531
32b for a model claiming to do everything they claim it can is quite impressive actually. the problem is benchmarks don't hold up against reality.

>>107054444
Based quads

>>107054482
>according to the authors
And why should I believe the authors this time? They lie as often as the common whore.

>>107054564
>And why should I believe the authors this time?
you should never believe them; like with everything, you test it out yourself and find that 95% of the time it's a big nothingburger

>>107054542
Pretty cool tinkering. What sort of dataset are you using?

>>107054564
My text consisted of two parts. Why are you asking me something that I answer in the second sentence?

>>107054482
It's unfortunate that all we get from the Chinese are models on par with Seedream that are made out to be something more. I'll give them props for catching up to Seedream by using Seedream-based synthetic slop in conjunction with 4o, though.

>>107054542
Sexy loss graph. The new training run switches from a 3D Perlin noise Automagic warmup to AdamW.
>>107054591
I scraped millions of images from DuckDuckGo, but I also have e621, danbooru, and gelbooru (200k images). Then a lot of Let's Plays from YouTube for games. A few movie screencaps from stuff I have / famous movies. It's generally pop culture and art centered.

>>107054636
so like the dark blue was the normal method and the light blue is the contrastive flow thing?
Weird how no local models can break into the top 10 anymore in arenas. Shame they fell so far behind
>>107054629
Arguably, Qwen already did that though. We are stuck getting the same model over and over again, each one slightly more fancy than the previous.

>>107054642
No, dark blue was me experimenting with an Automagic idea I had, which basically activates layers and parameters with 3D Perlin noise, much like how wind is simulated in a video game. Light blue is AdamW, which is much faster than the other optimizer, but I think the Automagic warmup was probably helpful, especially for forcing the layers and parameters to be randomly and usefully activated.
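Anon didn't post code, so this is only a guess at the mechanic: gate which parameters receive gradient each step using a smooth noise field over (parameter index, training step). The hand-rolled sinusoid stand-in for real 3D Perlin noise and the threshold value are pure illustration.

```python
import torch

def smooth_noise(n, step):
    # Cheap stand-in for Perlin noise: low-frequency sinusoids over
    # (parameter index, step), smooth and slowly drifting over time.
    idx = torch.arange(n, dtype=torch.float32)
    return (torch.sin(idx * 0.0500 + step * 0.010)
            + torch.sin(idx * 0.0185 + step * 0.023)) / 2

def gate_grads(model, step, threshold=0.0):
    # Zero the gradient wherever the noise field is below threshold,
    # so only a smoothly wandering subset of parameters updates.
    # Call this between loss.backward() and optimizer.step().
    for p in model.parameters():
        if p.grad is not None:
            flat = p.grad.view(-1)
            mask = smooth_noise(flat.numel(), step).to(flat.device) > threshold
            flat.mul_(mask.to(flat.dtype))
```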
>>107054636
>I scraped millions of images from DuckDuckGo
Yandex used to be so good for this. Now it barely works as an image search.

>>107054706
Unfortunately all search results are fucking garbage now because AI bullshit is everywhere. I scraped early last year, before the flood.

>>107054636
retard here, what exactly does loss mean in this context?

>>107054715
I think to be safe you should scrape only images from before 2022; it's the last year before the AI flood

>>107054725
the loss is like the error between the image you trained on and the model's recreation of it; the goal is to get the lowest loss possible

TTS with voice cloning capabilities that you can set up using docker compose with the relevant options for your system; it comes with an API and a GUI too.
https://github.com/devnen/Chatterbox-TTS-Server

>>107054725
0 loss means the model's prediction perfectly matches the objective: denoising an image or finding a flow. In practice, actually reaching 0 is catastrophic failure and memorization. Normally for diffusion models (e.g. SDXL), loss is how well the image is denoised. Models like Flux have a loss based on finding paths in latent space to the image. Contrastive flow adds an additional objective that paths must be unique.
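Written out, the combined objective described above looks something like the following; the weighting $\lambda$ and the expectation notation are illustrative, not from anon's code:

$$\mathcal{L} = \mathbb{E}\left[\lVert v_\theta(x_t, t, c) - v \rVert^2\right] - \lambda\,\mathbb{E}\left[\lVert v_\theta(x_t, t, c) - v^{-} \rVert^2\right]$$

where $v$ is the true velocity for the sample, $v^{-}$ is the velocity belonging to a mismatched sample, and minimizing the first term while maximizing the second is what keeps the paths unique.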
>>107054657
you forgot: slower every iteration

Hello people, it's my first time using Chroma; how can I get my pics to be higher quality? Higher steps or more negative prompts?

>>107054636
Cool stuff. It's a great learning experience.

>>107054706
Yeah, quite sad what they did to it.

>>107053778
3.5 Medium was stronger at the top end of its resolution range than the lower end, TBQH; generating at e.g. 1216x1600 would very often be way more coherent and better looking than 832x1216 on the same seed. So for a high-res use case like this it might make sense, especially if they've tuned it at all past the base model. Attached pic is a native 1280x1536 one-shot Medium gen, for example.
>>107054736
>>107054902
i see
interesting, and it makes me wish i wasn't a retard

>>107054980
It's not that complicated, and all the smart people already did the math for you. I'm half a retard and I just experiment with bleeding-edge research other people have already done. Really, to get into any of this you just have to drop into it and be willing to sweat; a lot of this shit is just tedious churning, especially captioning. For example, I'm working on finetuning JoyCaption to be better, which requires hand-captioning thousands of images.

>>107054947
It's an AI-filtered walled garden. Speculation-wise, maybe Cuckflare and others restricted their web crawlers because of the geopolitical incidents. It's okay because Google can leech everything.

>>107054902
>Contrastive flow adds an additional objective that paths must be unique.
that's quite a smart idea when you think about it; it forces the model to not be lazy and to work on every edge case
>>107054969
And this one's 1600x1216

>>107055046
illustrious is the peak of local diffusion. no other model comes close in terms of character knowledge, style knowledge, concept knowledge, and overall fidelity. Chroma is a disgusting blurry mangled mess, neta knows a fraction of the styles, qwen is a bloated stopgap that can't even compete with seedream 3. SDXL is an absolute triumph and will likely not be surpassed for years

>>107055077
depends on the scope of the finetune dataset. they'll probably manage to make the girls/boys hotter, among some other things. it's probably fixing the biggest popular issue in a bunch of months or so?
idk if anyone will get enough compute to train the boorus, real fashion/nsfw collections, cjk idols and so on.

>>107055061
(You)

>>107055096
i think people actually will start to finetune it with ramtorch or w/e. it'll likely be quite slow.

>>107055061
There are multiple small-model projects now that have proven you can train a from-scratch DiT model for a couple thousand dollars. The fact is many people don't want to stick their necks out, especially if they're in North America or Europe.

>>107055103
Not really. Multi-GPU is mainly used for higher batch sizes, not higher learning rates. A single GPU would only go ~4 times slower than the typical setup, and you would only rake in more donations by going slow and doing incremental releases. Recent papers have shown that with modern optimizers the results are no worse, either.

>>107055110
That's not what I said. There's nothing stopping you from taking HDM and scaling it 4x. The AMD 340M toy model was trained in 1.5 days. Pixart 600M is better than SD 1.5 and on par with SDXL. So you can assume a modern 2B DiT model could be trained in less than 30 days and be SOTA as a booru model.

>>107053799
>Udio partnership with UMG
https://desuarchive.org/g/thread/106957370/#106958310
So it has begun. First they will figure out which model is best, give their artists exclusive access to it, and then give the general public a watered-down version, if the public gets a version at all.
Sage attention 3 when.
>>107055167
Remember, Udio is no joke compared to everyone else. Going after them specifically is very strategic.

Finished baking the male vocals version of:
>>107053777
Lyrics I posted here already:
>>107053946
https://voca.ro/1muX2AJkOy1P

>>107055277
cringe
im waiting for acestep 1.5

>"Udio is da best!!!11!!1"
Reality is picrel.
You can make direct comparisons yourself in the Music Arena if you want.
Udio is only "better" if you are into gacha and generate parts of the song multiple times until it does something good, and Suno allows you to do something like that as well.

>>107055288
>mememarks
>udio v1.5

I haven't been in the game for like a year now and I'm starting to plan a full system upgrade. Why is there no normal middle ground between 16 and 32GB VRAM cards, nvidia?
I have a 2060 Super with 8GB VRAM. My question is: assuming I just go for a 5080 or something for the 16GB, could I use it together with the 2060 to pool the VRAM and leave the rest to regular RAM? Or should I not bother and just make do with the 5080? I could afford a 5090, I just can't help but feel like I'm being ripped off, and honestly, more than anything else, I'm worried about it melting and exploding or something. Is undervolting/underclocking a good idea?

>>107055321
I'm retarded and forgot to mention that I want to get into video generation. the pastebin doesn't go into much detail on multi-card drifting, for what it's worth

>>107055149
I think the best possible way to train an anime model would be to first train your own captioner model that took tag lists along with an image as input and interleaved the tags into proper sentences (in a way that would try to be grammatically correct, but not necessarily to a fault) based on what it could actually see, while also adding spatial information where it could and where it made sense. Then you could just run that model on the Danbooru dataset directly, with the original accurate tag lists for each image.
I'd also pick a maximum resolution and proportionally downscale larger images to that if needed, but not *upscale* anything whatsoever, rather bucketing everything at as close to the original upload res as possible (see the sketch below). So the end result would be a fairly robust mixed-res model that could coherently do a wide range of resolutions rather than just focusing on one range.
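The downscale-but-never-upscale bucketing part is easy to make concrete. A minimal sketch; the 1536 cap and the 64-pixel snap are made-up values for illustration, not anything that anon specified:

```python
from PIL import Image

MAX_SIDE = 1536   # illustrative cap on the longest side
STEP = 64         # snap to a latent-friendly multiple

def bucket_size(w, h):
    # Downscale proportionally if the longest side exceeds MAX_SIDE;
    # scale is clamped to 1.0 so nothing is ever upscaled.
    scale = min(1.0, MAX_SIDE / max(w, h))
    w, h = int(w * scale), int(h * scale)
    # Snap both sides down to the nearest STEP multiple.
    return max(STEP, w - w % STEP), max(STEP, h - h % STEP)

def load_bucketed(path):
    img = Image.open(path).convert("RGB")
    return img.resize(bucket_size(*img.size), Image.LANCZOS)
```

Images then get grouped into batches by their snapped (w, h), so every batch is a single resolution, which is the usual way mixed-res training is handled.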
>>107055288
Udio 1.0 was the best by far. They neutered the model after that. I've never seen a Suno gen on par with Udio 1.0 composition-wise.

>>107055321
If you're patient you could wait for the 50-series Super refresh, as the 5080 Super is supposed to have 24GB VRAM. Can't speak to multi-GPU use, but 8GB seems rather abysmal and not worth the hassle, especially when you consider you can offload to system memory.
As for the 5090, I have one and I've undervolted it, overclocked it, and capped the power at 80% (460W) without any issues. In any case, make sure to get at least 64GB of system RAM if you're gonna gen videos.

>>107055370
How I'd do it is have the same image with three different captions:
- tags as seen on the booru site
- short description
- long description
Each caption type really is a supported way for a user to prompt the model, and the model will naturally learn how to mix the different caption types. The problem we've now seen multiple times is people training on caption blob balls, forcing the model to be reliant on long captions if you want maximum output quality.

>>107055321
There's really no downside to undervolting a 5090: you lose 5% performance and reduce the power 30%.

>>107055277
Bridges and outro are pretty weak
https://voca.ro/12sNX01jyU6M

>>107055541
It's still pretty good if it's local, e.g. in terms of slop.

>>107055412
>>107055288
Suno is very likely trained on royalty-free music libraries like AudioJungle, which is why it sounds worse but also more "polished". I'm guessing Udio was trained on more copyrighted music even before the UMG deal, so it's more random but gives more interesting results.

>>107055321
3090s are pretty cheap

>>107055603
Old Udio was overfit on copyrighted stuff; if you input the same tags and lyrics some tracks had, you'd get nearly identical outputs
>>107055077
>>>/pol/520205588
How do I do this on my laptop?

>https://huggingface.co/nvidia/ChronoEdit-14B-Diffusers
anyone tried the new nvidia edit model?
>>107055697
>>107055524
I think your way might work if you literally swapped out the sets of captions for each image between epochs; probably better than slapping them all in one caption file

>>107055797
That's what I mean: you duplicate the image for each caption type. And if you practice VAE jitter you prevent memorization.
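In dataset terms that whole recipe is only a few lines. A rough sketch; the function names, the caption-dict shape, and the sigma value are assumptions for illustration, not anyone's actual trainer code:

```python
import torch

def expand_examples(latent, captions):
    # One training example per caption style for the same image, e.g.
    # captions = {"tags": "1girl, ...", "short": "...", "long": "..."}
    return [(latent, text) for text in captions.values()]

def jitter_latent(latent, sigma=0.02):
    # "VAE jitter" as described: perturb the cached latent slightly each
    # time it's sampled, so the model never sees the exact same target
    # twice. sigma is a guess; keep it small enough that the decoded
    # image looks unchanged.
    return latent + sigma * torch.randn_like(latent)
```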
>>107055826
Nice. Maybe more... try dynamic angle, rim light. Really good.
does infinite talk work with wan2.2?
>>107055826
>>107055951
Love it. What's the model? I should try and gen something related to this.

>>107055951
>>107055982
Qwen

>>107056008
Cinematic Redmond had these vibes. Cool that Qwen can be grainy too. Of course it's probably pretty stiff, but that's what they all are.

>>107054482
>emu3.5
HF links are all dead
I can test it if the models are available somewhere

>>107055752
I can't make her sit on the pig, but it's still funny

Is there a way for Librewolf (flatpak) to remember its last directory? On Linux Mint. It's somewhat tiring to use ComfyUI when I need to open a file dialog and traverse all the way up from /home/ to my work mount...

>>107055603
I don't think they're prompted the same. Udio has a better understanding of music; that shows because with a good prompt it destroys almost any Suno song I've ever heard
https://www.udio.com/songs/2bXYLKaVDyVwi1GAb6pSkR
This is a very hard song
https://www.udio.com/songs/7zrLreMnwCYrdBqQkGtEXM
The musical depth I've witnessed out of this model truly is insane. Unprecedented connection between lyrics and musical notes. It has mastered vocals and intonation in a way Suno has not. Using high-quality copyrighted music in conjunction with whatever royalty-free music is available for training the model is the way to go.

>>107056111
Yeah, that's just subjective, to people who have never played any instrument in their lives.

Do any of you have a recommendation for generating videos for a music video? I'm looking for 16:9, some kind of 35mm grain/look, mostly still shots but with some travelling too. The theme is urban 90's/2000's workers working daily shifts.
I only know of Veo 3 so far, and I'm looking at Runway.
so has the copyrightpocalypse finally started

>>107054444
>>107054508
>>107054542
>>107054636
Are you using TREAD like HDM too? And have you considered going VAE-less using SVG (or at least using EQ-VAE like HDM)? https://arxiv.org/pdf/2510.15301
If not, you are missing out on huge speedups

>>107055921
this is a very nice gen
what model did you use?

>>107056231
TREAD is much harder to implement if you want the real speedup. The 16-channel VAE I'm using has been EQ'd, yeah.

>>107056239
NovaOrangeXL_v120
I'm just testing my Linux installation; I deleted all my previous models. Don't have Noob or anything else.

>>107055149
>>107055103
THIS, we are on the cusp of home-baked local SOTA.

>>107056121
Udio literally just got acquired by the largest music label. That should tell you all you need to know.

>>107056281
I don't know, really.
Test
>>107056256
thank you anon, how's your linux experience going?

Yume hands do work, but you have to describe them in the prompt; treat it more like an LLM

/iemg/ lore, you wouldn't get it

>>107056349
Yeah, well, I'm an experienced faggot, but I wouldn't advertise it to normal people. Even with the most common interfaces, it's been 20 years and they still can't get a file save dialog right.
I have used IRIX and it never had these issues.
Like, save a file from Cum and it defaults to some ~/. Open a file...
It's great if you are a developer, but a normal person should just use Windows.
I feel like Linux environments have gone backwards since I last used them 15 years ago.

>>107056389
Flatpak browsers do not remember the save file location from Cum. This is what I mean: I need to browse 5+ levels deep just to get to the directory I want.
remember that guy that was training a model and said his image of a brown splotch for the prompt "a woman" was 80% of the way there?
>>107056375
Okay, then what's the optimal total prompt length in your experience?

Text can be consistent with small phrases from the looks of it, and it is really sensitive to artist tags. The wrong tag will completely fuck everything, which lends credence to "it needs more training".
>>107056415
I never pay attention to that; it doesn't matter from my testing

>>107056405
>>107056389
im a linux user, what distro r u on? im on debian and the default file save dialog in brave/mullvad remembers locations for extensions (actually not sure about mullvad because it has crazy settings, but firefox did work)
what environment are u using? i use dwm so less ram/vram is used

>>107056412
how's your dataset going? still 0%?

>>107056432
Yeah, comparing distros is like comparing dicks. I'm using Librewolf and it's a flatpak; this explains why it does not remember the directories.

>>107056440
how's your training going 2 years later? still 80% there?

>>107056421
>hands do work but you have to describe them in the prompt
Sounded like your approach is to really bloat the prompt with specifics, but I guess I misunderstood.

>>107056441
try firefox or brave, or look for a setting in about:config to disable forgetting the save directory
flatpak is likely the issue because of muh sandboxing
>>107056458
it doesn't take much philosophy to understand you don't do anything and thus won't achieve anything; don't put your insecurities of failure on me, thanks :)

>>107056465
yup, still 80% there confirmed lmao

>>107056421
do you switch the system prompt like the guide says to help with text, or? i have yet to really fuck with text on it

>>107056458
Nah, your advice is just like any of the useless non-tech advice: changing distro or even browser does not accomplish anything. If it works it works, and if it does not there is a way to fix it, but it sure as hell is not by reinstalling my disks.

>>107056476
actually that doesn't mean anything; for all you know, I've already released a model. but what we both know is that in 2 years you don't have anything except a bitter attitude
it's truly funny that I'm living rent free in your brain though

>>107056490
I treat it like Chroma; I also use my own system prompt. I'll look into the guide again, but pigeonholing it to anime only doesn't do much for me
>>107056497
for all anyone knows, you haven't released anything lmao

>>107056518
Feel free to explain why anyone would ever attach their professional work to 4chan. No one releasing a model with their name attached would ever link it to 4chan if they wanted to be taken seriously.

>>107055046
One more

>>107056537
how convenient
my locally trained 300B model is going great too

>>107056560
The only thing you're developing is suicidal thoughts.

>>107056567
let's see a 1girl gen, bro. I'm sure it will be great, bro. two years of improvement, bro
>>107055713
This is brilliant desu. Now they just need to make it uncensored, so that a guy jerking off shows a girl fingering her pussy. Then bye bye thots, any guy can become an OnlyFans whore.

>>107056581
Great stuff.

>>107056375
That's not really true at all for Yume 3.5, IMO; you can absolutely booru-prompt it straight up as long as you properly leave the Gemma boilerplate in, in my experience. That the generally recommended sampling configs are both not that good is more likely the issue for some people; DPM++ 2S Ancestral at 4.5 to 5.5 CFG gives massively better results most of the time for me. It is slower, though.

>>107056597
Oh, I forgot to say: that's with Linear Quadratic.

https://huggingface.co/lightx2v/Wan2.2-Distill-Models/tree/main
new models, anyone tested them?

>>107056491
did librewolf remember the directory when saving files?
Lustify is pretty good for off-topic gens.
>>107056596
ty

>>107056597
sadly that sampler is not in Neo Forge for some reason, but I have good luck with DPM++ 2M

>>107056640
>>107056581
huh? aren't these just film stills?

>>107056620
It does remember the last directory for images, but with cum ui it does not.

>>107056151
>Do any of you have a recommendation for generating videos for a music video? I'm looking for 16:9, some kind of 35mm grain/look, mostly still shots but with some travelling too. The theme is urban 90's/2000's workers working daily shifts.
>I only know of Veo 3 so far, and I'm looking at Runway.
This is the local general, so I'll give you advice for a model you can run on a GPU in your home computer. Your only real option for cinematic stuff is Wan 2.2, or 2.1 with the MoviiGen lora. I would recommend trying a 2.2 workflow + that lora at 720p using a 5090, or FusionX if you choose to use 2.1.
If you don't need the lack of censorship of Wan and you have money to spend, I'd just use Runway for this. Higgsfield AI may also be interesting to you because they have specific stuff for music videos.

>>107056640
I am going to give you a tip: watch Bram Stoker's Dracula (the 90's one) and take a couple of screenshots; there's Lucy and all that. Then img2img them. That'll be great.

>>107056614
>more i2v
I sleep
>>107056674
i wish this were me right now

Kek, I was playing the Suno side and thought local had already caught up somehow
https://levo-demo.github.io/
Very disingenuous demo

>>107056674
You should put him in a van. And make him go.

>>107056670
Cool, thanks

>>107056679
>Bram Stoker's Dracula
The best shots are a couple of still frames from inside the film, not these tiktok screenshots etc.

>>107056690
Kek, who trained this model? It spits out Adele unprompted?
https://levo-demo.github.io/static/audio_sample/overview/04_en.mp3
It might be good. How come I've never heard of it?

>>107056726
Is there a reason why you feel the need to talk about an unrelated subject in this thread when it could exist in its own thread, with actual documentation we could grab from the OP and all use? Just seems odd that you can't do that instead.

>>107056732
bigger

>>107056742
There's no Comfy workflow, and the model seems like some experimental half-trained model; what is there to talk about?

>>107056650
Remember this scene?
>>107056679
It's a decent movie, but I would unironically remake all of Keanu Reeves' dialogue with AI

>>107056819
bro don't upload that scary ass shit here
I'm getting closer
>>107056726
Tencent is actually training their own music model.
https://huggingface.co/tencent/SongGeneration
>TODOs
>Release SongGeneration-v1.5 (trained on a larger multilingual dataset, supports more languages, and integrates a Reward Model with Reinforcement Learning to enhance musicality and lyric alignment)
And the data is so copyrighted it spits out Adele unprompted, as you can see in their demo. That is wild; with Qwen doing the same, my faith in China has been restored.

>>107056890
What does this have to do with image diffusion?

>>107056867
to killing urself? never been happier for u

>>107056904
There is no music thread. It's either here or /lmg/, the only two places we can discuss local models.

>>107056904
this is the local diffusion general; we accept video- and audio-related content here.
i for one welcome our music gen brothers
>>107056935
>Discussion of Free and Open Source Text-to-Image/Video Models
>>107056915
>>107056926
You revealed yourself; go back to your containment thread

>>107056867
closer to approaching the quality of a quantized 2GB illustrious model? maybe

>>107056946
Comfy has audio models. We should be allowed to discuss anything Comfy adopts, as long as it is local.

>>107056983
>>107056984
You're so fucking pathetic dude

>>107056984
Besides, good audio models are pivotal for video. Since Sora 2 it's not the muted-audio era anymore; the SOTA has changed, so all discussion of audio research is welcome.

I'll give the NetaYume shill this: the model requires a whole lot of gacha, but at least it has some actual variation in its outputs.

I'm running ComfyUI and following the guide; I've been playing around with the hand and face detailer. Is there an equivalent for feet/toes? I'd like to be able to fix those too.

>>107056597
Some are better than others, clearly, but IMHO much of sampler/scheduler choice is subjective, the latter more so than the former in my estimation.

>>107057066
You haven't taken any steps to learn the model and it shows. Why not explore something before going on multi-day complaints?

>>107057119
I barely post in this thread; you're tilting at the wrong windmill, friend. And I'm saying I like the model: I get better results out of it for the particular thing I'm prompting than I get out of the other boomerprompt models.

>>107057137
Anything to show? There has been this constant wave of anons that complain about this model but don't post anything. I know you're just wasting time, but take your low skill ass to one of the other threads

>>107057066
>the model requires a whole lot of gacha
Describe the poses/gestures better

>>107057157
"Face and proportions that don't look weird"
>>107057145
>take your low skill ass to one of the other threads
OK

>>107057165
Fuck off now, thanks!

>>107057175
illustrious 2gb?
why is netayume so sloppy bros??
*yawn*
>Mindbroken because hen ever made anything good in his life
>>107055177
it's already there

>>107054935
both

>>107057227
damn you melting so hard you cant even spell yumebro
https://www.youtube.com/watch?v=xboXFT46XSo
>>107057111
DPM++ 2S Ancestral is pretty objectively better than Res Multistep, at least for details like hands and text, using Linear Quadratic for both, I'd say.

>>107054935
You typically don't need more than 25 steps. Most of my 50-step outputs have been either a sidegrade or even a downgrade in terms of quality.
Don't forget that Chroma can gen pics above 1024 dimensions.

>>107057172
It's clearly the same fairly bad troll as yesterday; he's blatantly ragebaiting

>>107057334
yeah, I agree, fellow yumebro, there's totally not a vast majority of people that find this model trash

>>107057334
It's the same retard from the rentry; he has spent years of his life doing this and is just reduced to a bitter faggot.

>>107057206
kek, yeah, that poster was an idiot
>>107057327
nice
>>107057352
You're right, there's in fact not a vast majority of such people

>>107057327
NTA. Your pic is neat af. This is also 25 steps

>>107057457
Oh, this is neat too. How's Radiance compared to DC-2K?

>>107057474
>How's Radiance compared to DC-2K?
Couldn't tell you, but I loved the 2K debug ones. There's still a lack of blending in the macro pixels, but it's mostly good

>>107057521
Cinematic Redmond is great.

>>107057589
"The lighting is even with no strong shadows." compared to "Cinematic lighting, dark background, deep shadows, detailed skin. Sharp HDR."
>>107057515
>>107057521
very cool

>>107057589
https://www.youtube.com/watch?v=ZEWGyyLiqY4

>>107054482
>32b
Mostly useless for local. Viable with quantization, especially Nunchaku, but LoRA training will be a nightmare, and a model without low-cost LoRA training is pointless beyond ten minutes of novelty use.

>>107054248
From the final pretrained model we haven't seen any samples, but this is how it sounded as it was training.
J-pop song
https://vocaroo.com/19CHG4V410OP
Some pop song
https://vocaroo.com/1i7OjKcLbmnO
Some opera song
https://vocaroo.com/1f64Fkmpn9Ax
Idk, maybe with an SFT phase it'll catch up to where it needs to be, but those outputs are very underwhelming. Just a bit concerning, but I don't know jack shit about these models.
Slowly getting it together; still need to learn composition better

What was that feature of comfyui that was being advertised a while ago where you bundle a bunch of nodes together and can then re-use them as one node?
did this ever actually happen?

>>107057657
subgraphs? didn't really change anything and was kind of a letdown. the node implementation in general is lacking too much, and everything done to the front end has been lipstick on a pig

>>107057657
subgraphs? they're pretty great for cleaning up workflows so you only see what you actually need to see

>https://huggingface.co/meituan-longcat/LongCat-Video
>We introduce LongCat-Video, a foundational video generation model with 13.6B parameters, delivering strong performance across Text-to-Video, Image-to-Video, and Video-Continuation generation tasks. It particularly excels in efficient and high-quality long video generation, representing our first step toward world models.
Anyone tried it? Works with KJ's WanVideoWrapper

>>107057589
I had an antiquated GPU. But jesus, the boost even SDXL has gotten in terms of noise... Sounds like faggotry.

>>107057615
>LoRA training will be a nightmare
ostrisai's trainer has supported 3-bit quants for a while now... wouldn't that be sub-16GB? https://xcancel.com/ostrisai/status/1953933728948121838
anyone know if infinite talk works with wan2.2, or is it just for 2.1?

>>107057589
what model is this?

>>107057677
some onions have been trying it out. doesn't look that much different from the context-window jerkiness after every 5 seconds

>>107057718
>onions
anons filters to onions sometimes or something? the more you know, I guess

>>107057713
Chroma-DC-2K-T2-SL4-Q8_0
>python?
>no, that shit is gay

>>107057788
based chink

>>107054044
slowly but surely, mistakes were made, just need to adjust values

>>107057813
Thank you, Ran. You wanted some attention.

>>107057648
It's impressive that the vocals don't sound anywhere near as robotic as original ACE-Step, though. If they catch up to Suno 4.5, maybe there's a chance of getting Udio-tier kino now and then.

>>107057813
you can't adjust values if you are worthless

>>107057655
>>107057813
the painted nails are nice

So far I've been using the Wan 2.1 workflow from the rentry but wanted to try out 2.2 from here: https://civitai.com/models/1818841/wan-22-workflow-t2v-i2v-t2i-kijai-wrapper (2.2 I2V)
Why isn't it recognizing the VAE? Everything looks correct to me; straight dragging the VAE output from the loader to the decoder doesn't do anything either

>>107057861
it's not connected to the decode node; pull the string from the VAE loader to the decode node to connect them
https://www.youtube.com/watch?v=Gu3TAuw3ZJ8
>>107057861
If it's not bait, as it probably is, it can still be useful for newfags: use the example workflow instead and just load the correct models: https://raw.githubusercontent.com/Comfy-Org/workflow_templates/refs/heads/main/templates/video_wan2_2_14B_i2v.json
1girl
>>107057823
*MrCatJak

>>107056890
need them to train a speech model with emotion prompting so we can be freed from the dead end known as vibevoice

>>107055149
That's what I want to hear.
Start a group, delegate simpler tasks to me, such as some manual captioning, and I'll contribute $250 toward training.
The only catch is that you share the training process and I get to ask a few technical questions.
We can find 20 others; there are plenty of interested people out there.
I don't care if it's a failure.

>>107057917
I wonder if Krea has a buttchin obsession too

>>107057922
You want to suck people off.

>>107057922
What took you so long? please offer your asshole.

>>107057655
hot

>>107057949
is that...
elongated 1girl
>>107057838
Thanks, I think I'm getting the hang of this model now. the hardest part is finding the right blend of tags for a presentable image, followed by adjustments; starting to feel like 60 steps is the magic number with this model. I wish neoforge had all the samplers, I don't know why he took some away.
One thing I like with this model is that I can game while genning due to how little VRAM it uses compared to Chroma.
Sorry, but I have a dedicated sperg who hates me and has been holding a grudge for years; just ignore him

>>107058031
?

>>107058031
I respected you for years, but not any longer. Seems like you are just spiteful.
>disabled retard noises
>>107057671
>>107057672
Thanks, I've started using subgraphs, but I can't figure out how to make all subgraphs reflect each other's changes when I edit one of them. Any ideas? I would expect them to work like Scenes in Godot.

>>107058084
Why do you refer to him in the 3rd person?

>>107058031
>ran wanted to come out
He manages to spit out a narcissist rant.

>>107058085
clone it

>>107058085
>I would expect them to work like Scenes in Godot.
there are a lot of expectations of modern node graphs, and comfyui ducks up 90% of what's standard

>>107058130
ah, I re-cloned and the clone is working now! I guess I must've cloned too early before, or there was a bug, which caused my clones to become unique (and no longer clones).

>>107055297
>pleeeeeease novel ai, i need the model files, my local model is kinda noisy

>>107058193
if you duplicate you get separate entities, and if you clone you get tied ones
>>107058024
Buffy x Slenderman

Yeah, I need to make loras for this model; it's the boost I needed. it should also be pretty fast compared to training chroma

>>107057893
doing that just crashed comfyui
>>107057915
not bait, I'm just a bit of a brainlet when it comes to this, but your workflow works fine, thanks

>>107058274
Netayume is fucking garbage, holy shit

>>107057744
>Chroma-DC-2K-T2-SL4-Q8_0
nta, nice gens. with a lora?

Netayume is fucking trash, and just having it write some text that looks like it was done in Paint doesn't make it redeemable. Chroma for complex stuff and Illustrious for hentai is the way to go.
uh oh meltie
>>107057457
Shit, I didn't notice you replied to me
>25 steps is enough
Thanks for the heads up, boss. Chroma fp16 with the fp16 text encoder doesn't run all that slow on my 5060 Ti 16GB if I keep it under 30 steps

>>107058315
Yeah, uploading to civitai right now
does using this node lead to loss in quality?
>>107058370
not anything visible
You can still download Udio songs on the fly as 320kbps btw, just downloaded a couple of bangers. No need to record or anything like that.
can a pitfag rate this pit for me
>>107058408
from what I read they're limited to 192kbps mp3
I'm bulk-downloading everything I saved there

>>107058432
pit's fine, but smaller boobs would be more harmonious

>>107058432
6/10
I prefer mine like this

>>107058432
It's nuts that I immediately spot a netayume pic every time, since it looks so off

Fresh
>>107058480
>>107058480
>>107058480
Fresh

>>107058441
Yeah, dunno, it's quite strange. Was able to download a few of them at 320kbps with fetchv, like https://www.udio.com/songs/hoCg4BmayTYXcJfjo4jvbT
But other ones are only 192kbps. Maybe for some reason some of them stream at 320kbps while others don't?

I see people making AI images of Trump and stuff like that. But some of the stuff is definitely better than others. Is there any way to set it up such that whatever prompt I give, the character is strictly that one character?
I mean, not just simply typing in the name of the character, but making it more realistic?
Like, in a way such that even when I make anime or caricature images, it seems like some professional artist drew them based on the likeness of the person?
I don't know much about loras and such, that's why I ask.