Discussion of Free and Open Source Text-to-Image/Video Models

Prev: >>107227636

https://rentry.org/ldg-lazy-getting-started-guide

>UI
ComfyUI: https://github.com/comfyanonymous/ComfyUI
SwarmUI: https://github.com/mcmonkeyprojects/SwarmUI
re/Forge/Classic/Neo: https://rentry.org/ldg-lazy-getting-started-guide#reforgeclassicneo
SD.Next: https://github.com/vladmandic/sdnext
Wan2GP: https://github.com/deepbeepmeep/Wan2GP

>Checkpoints, LoRAs, Upscalers, & Workflows
https://civitai.com
https://civitaiarchive.com/
https://openmodeldb.info
https://openart.ai/workflows

>Tuning
https://github.com/spacepxl/demystifying-sd-finetuning
https://github.com/Nerogar/OneTrainer
https://github.com/kohya-ss/sd-scripts
https://github.com/tdrussell/diffusion-pipe

>WanX
https://rentry.org/wan22ldgguide
https://comfyanonymous.github.io/ComfyUI_examples/wan22/

>NetaYume
https://civitai.com/models/1790792?modelVersionId=2298660
https://nieta-art.feishu.cn/wiki/RY3GwpT59icIQlkWXEfcCqIMnQd
https://gumgum10.github.io/gumgum.github.io/
https://huggingface.co/neta-art/Neta-Lumina

>Chroma
https://huggingface.co/lodestones/Chroma1-Base
Training: https://rentry.org/mvu52t46

>Illustrious
1girl and Beyond: https://rentry.org/comfyui_guide_1girl
Tag Explorer: https://tagexplorer.github.io/

>Misc
Local Model Meta: https://rentry.org/localmodelsmeta
Share Metadata: https://catbox.moe | https://litterbox.catbox.moe/
GPU Benchmarks: https://chimolog.co/bto-gpu-stable-diffusion-specs/
Img2Prompt: https://huggingface.co/spaces/fancyfeast/joy-caption-beta-one
Txt2Img Plugin: https://github.com/Acly/krita-ai-diffusion
Archive: https://rentry.org/sdg-link
Bakery: https://rentry.org/ldgcollage

>Neighbors
>>>/aco/csdg
>>>/b/degen
>>>/b/realistic+parody
>>>/gif/vdg
>>>/d/ddg
>>>/e/edg
>>>/h/hdg
>>>/trash/slop
>>>/vt/vtai
>>>/u/udg

>Local Text
>>>/g/lmg

>Maintain Thread Quality
https://rentry.org/debo
comfy should be dragged out on the street and shot
cursed spitebake of misery
>>107238019kys julien
Blessed thread of frenship
>>107237999>grok video in faggollage
>>107238050
it's a troll bake
real thread:
>>107237888
>>107237888
>>107237888
>>107238080No one is going to use tranistudio, not now, not ever. Give up.
Comfyui is fucking dog shit
Can someone please explain to me why julien is off his meds? He has been behaving very erratically over the past week. I know he has autism, but usually he knows better than to try and mess with the /ldg/ OP.
so there's this implementation of kandinsky in comfyui
https://github.com/Ada123-a/ComfyUI-Kandinsky/
but then the kandinsky team has their own implementation?
https://github.com/kandinskylab/kandinsky-5/tree/main/comfyui
anyone tried either?
>>107237999put anistudio in the op so the schizo has to spread across all chan diffusion threads. fill the other thread first
>>107238405the anti anistudio schizo is a netayume poster, I know this cause I mindbroke him by asking for an anti-netayume poll, and he copied my idea for his anti ani schizo polls
>>107238226
The unofficial one seems a lot better. Try that.
>>107238461I am but it's taking a while. Oh, I just noticed it's 50 steps. Shiiiieeet. Spoiled by lightx2v
MIGRATE TO COMFY THREAD
>>107238591
MIGRATE TO COMFY THREAD
>>107238591
MIGRATE TO COMFY THREAD
>>107238591
...
Again, what set ani off?
>>107238557>Prompt executed in 01:08:46wew. not worth it
>>107239370Damn that's rough
I fixed a major bug with my Kandinsky implementation, try again
didn't notice cause I mostly did short videos for testing but full length videos had noise issues
there is still an issue with windows not liking my torch compile stuff, so windows may still throw error messages btw, but the noise should be fully fixed
actually there might be one more issue... this is complicated. I blame their own implementation being rough
>>107239370whats the gen time for single image?
>30 steps
>hatsune miku is sitting at a desk typing on a laptop. the laptop faces away from the camera. hatsune miku turns the laptop to face the camera. on the laptop screen is the black text "/ldg/" on a white background. hatsune miku smiles and does a peace sign with her hand
It... it doesn't know migu
ok, the original repo had a bug with tiled vae decoding which caused the big noise issues. I had wrongly thought scheduler_scale was the issue, because the repo's documentation there is bad, with defaults and suggestions not matching
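For context, the usual way tiled VAE decoding avoids seam noise is to overlap the tiles and feather the blend so no hard boundary survives. A minimal 1-D sketch of that idea (tile/overlap sizes and names are illustrative, not the actual Kandinsky code):

```python
def blend_weights(tile: int, overlap: int) -> list[float]:
    """Linear feather: ramps up over the left overlap, down over the right."""
    w = [1.0] * tile
    for i in range(overlap):
        ramp = (i + 1) / (overlap + 1)
        w[i] = ramp              # left edge fades in
        w[tile - 1 - i] = ramp   # right edge fades out
    return w

def stitch(tiles: list[list[float]], tile: int, overlap: int) -> list[float]:
    """Weighted-average neighbouring decoded tiles so seams cancel out."""
    stride = tile - overlap
    total = stride * (len(tiles) - 1) + tile
    acc = [0.0] * total   # weighted sum of tile contributions
    norm = [0.0] * total  # sum of weights, for normalization
    w = blend_weights(tile, overlap)
    for t, data in enumerate(tiles):
        off = t * stride
        for i in range(tile):
            acc[off + i] += w[i] * data[i]
            norm[off + i] += w[i]
    return [a / n for a, n in zip(acc, norm)]

# two constant tiles should stitch into one constant signal: no visible seam
seamless = stitch([[1.0] * 8, [1.0] * 8], tile=8, overlap=4)
```

Stitching constant tiles gives back an exactly constant signal, which is precisely the property a hard tile boundary would break.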
>>107239934Is this kandinsky? Looks like actual anime, not like that 3dslop wan produces
>>107240058>Is this kandinsky?Yeah. Now trying to describe her appearance and see what happens
schizo holocaust when
>>107239934>it doesn't know miguThank goodness! Based model
>>107239934
I believe at this point all major models instruct whatever they are using to tag images not to tag copyrighted characters or real people. There is no other explanation why almost everything released since SDXL struggles to do even the most popular characters.
>>107240076
Won't help much. It's not the same as knowing the character; for example, the facial features will be off.
>>107240223now
>it really doesn't know mikuit's over...
>>107240390
the thing is, it's 2B, so you could super easily teach it who that is with just a few images
>>107240408that was the 20b q4 gguf
XL until the heat death of the universe
btw, still some bugs in implementation. I should have made it hidden until it was finished
>>107240517It's not bleeding edge unless it cuts. Slap a warning on that bad boy and call it a day
We need Alibaba to keep releasing their models to the community.
>>107240791Alibaba said no.
They should make a computer that runs not on electricity but from you fucking it with your penis and it giggles
>>107240797[citation needed]
>>107240818Trust me, insider sources who wish to remain anonymous have told me this.
>>107240791Putting all your hopes in a single company is mega retarded doodoo head
>>107240969No it isn't you stupid idiot. They're the only company who have been continually releasing their video models up to this point, so it's only natural they are the best hope to keep releasing good video models.
>>107240390
>Zero Japanese knowledge
I would say that's impressive, but given they wanted the model to know Russian, that gives a clue as to why it doesn't know Miku. A shame; on to waiting for something good from China.
>>107237999Been a bit out of the loop, are AMD cards still shit for local AI and if not would the R9700 be a good card if you can't go for the top notch stuff and want to be a bit future proof with the 32gb of ram?
>>107241081>are AMD cards still shit for local AIyes
>>107241025Give me the name of a company who delivered more than once and didn't pivot to closed source. There is not a single lab who released something wildly successful and then followed it up with another. Subsequent releases are always either shit or closed source. It's not controversial to say someone will take Alibabas local throne. That's just the way it is, the way it has been since the invention of genai.
>>107241081
>are AMD cards still shit for local AI
y
>the R9700 be a good card
n
>want to be a bit future proof with the 32gb of ram
lol, 32gb struggles now, much less future proof
>>107241093>>107241109Well, fuck. Thanks for the quick reply though.
>>107241138If you use Linux then you can make AMD work. Nvidia is said to have better performance of course but I wouldn't know.
>>107241109How about when you just wanna do ComfyUI and similar stuff?
Is regularization dataset supposed to be tagged with simply "a photo of a man" etc?
>>107239695KandinskyImageToVideoLatent has an extra tab for latent_frames declaration and is in the exception handler.
Asked in the wrong thread, fuck my life. Anyway:
I've been using i2v for wan 2.2 a shit ton, and I like the 3d blender type of style used in animations. Is there a local gen model that's actually good at that so I can gen my own base images?
Last time I used local imagegen, illustrious was the meta and it was awful at any 3d
>>107241801
Nothing spectacular, but in my experience Flux and Qwen are the least-worst at generating 3d render style images. Flux produces cartoony-looking people in that style and Qwen has absolutely zero variety in its outputs, so pick your poison.
Also, I am definitely no expert, but try playing around with various sampler+scheduler combinations. I think somebody said some combination of (deis, heun) sampler and (beta, linear_quadratic) scheduler gets decent results in that style. Play around and see what you get.
>>107241790
thank you
pushed some other changes as well, and added a preview
>>107241081
>are AMD cards still shit for local AI
they're fine on linux.
>and if not would the R9700 be a good card
for LLMs, yes
https://www.phoronix.com/review/amd-radeon-ai-pro-r9700/2
in image gen benchmarks, the 7900 XTX appears to be faster, but that could be due to immature R9700 drivers. I haven't seen a really trustworthy benchmark comparing this. I suggest considering a 7900 XTX, since it's cheaper and still has 24GB VRAM.
>>107241093
>>107241109
njudea FUD
>>107241906if you want to deal with troubleshooting and non-existent support, have fun. if you just want to gen then get an nvidia
>>107241958It's true AMD still requires a bit more config and research than nvidia for local gen, but it's nothing crazy if you're not low IQ. and this is cutting edge experimental tech, you will have to troubleshoot issues alone no matter what brand you're using. someone who is scared to use an AMD card shouldn't bother with local gen yet anyway, they'll get frustrated and give up the moment they try to work with comfyui.
>>107241892
Thanks for running the vibe to get this going. The i2v for this model is really fast, with comparable outputs to wan so far.
>>107238261I'm more triggered by the broccoli head
>>107242016
yep, and the biggest deal is that people will be able to do full finetunes on it since it's only 2B. I think the 20B will be an afterthought. 2B should be the new sdxl, small enough for people to actually bother training
>>107241958FUNUUNNNN
>>107242016>doesn't post the the outputs
>kandinsky5lite_i2v_5s
Uhh, I guess the input image is just a suggestion
>>107242253That's literally me when I hide my power level IRL.
>>107242138
nta, but here is a 2B I2V attempt:
https://files.catbox.moe/3dgy3a.mp4
>>107241559no, the model learns to gen these images
>>107242310
So you don't tag them at all?
>>107242325Bazinga!
>>107242283another attempthttps://files.catbox.moe/qtd3qm.mp4
>>107242318i don't use reg images
>the woman grabs her breasts. the woman massages her breasts. she sticks her tongue out and sneers at the camera
k, a second woman instant-transmissioned into the frame. kandinsky i2v 2b
>>107242466
did you pull latest? a bit ago I fixed an error with I2V >>107241892
>>107242486yeah I cloned like 15 minutes ago, I got the preview window
>>107242510huh, I just used it fine a moment ago.10.0 scheduler scale, 5.0 cfg, 20-50 steps, 768 x 512 res?
>>107242466>teleports behind u
>>107239695anon should add it to OP https://github.com/Ada123-a/ComfyUI-Kandinsky/
>>107242528
What the fuck is that seating arrangement.
Their eyes are fucked up.
The trolley for the refreshments is retarded looking.
Terrible
>>107242533>10.0 scheduler scale, 5.0 cfg, 20-50 steps, 768 x 512 resyeah, this is 20 steps>the woman turns around and types on the computer keyboard. on the computer monitor appears the text "/ldg/" in black font on a white background. the woman looks back at the camera and smilesdon't know how this is comparable to wan desu
>>107242637
what in the world. Something is wrong here. Mine is not doing that, and I'm the one who pushed the changes.
Here is an earlier I2V from before I fixed the noise:
https://files.catbox.moe/la9r93.mp4
I must have missed something, I'm checking
ok, now pull and try it
>the anime girl gets inside the car and closes the door. the camera follows the car as the car drives off into the distance
wasn't expecting that. it seems to have trouble comprehending stuff already in the frame. it is only 2b though
>>107242757
will do
>>107242325
>>107242767
I was not fully passing the conditioning for I2V, missed one line:
visual_cond_input[:1] = visual_cond_typed[:1]
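For anyone following along, this class of bug is easy to reproduce: if the encoded first-frame latent never gets written into the visual-conditioning tensor, the model sees zeros there, which is why the input image acted like a mere suggestion. A toy illustration (shapes and names are hypothetical, not the actual node code):

```python
def build_visual_cond(image_latent, num_frames, inject_first_frame=True):
    """Visual conditioning laid out as [frame][channel]; frame 0 must carry
    the encoded input image for i2v to actually follow it."""
    channels = len(image_latent)
    cond = [[0.0] * channels for _ in range(num_frames)]
    if inject_first_frame:
        # the one-line fix, analogous to:
        #   visual_cond_input[:1] = visual_cond_typed[:1]
        cond[0] = list(image_latent)
    return cond

# with the bug: every frame of conditioning is zeros, image is ignored
broken = build_visual_cond([0.5, -0.3, 1.2], num_frames=4, inject_first_frame=False)
# with the fix: frame 0 carries the input-image latent
fixed = build_visual_cond([0.5, -0.3, 1.2], num_frames=4)
```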
How do people make videos with high consistency that are longer than 5 seconds? I can output 9 sec gens on wan 2.2 using the workflow from the rentry in the OP, but anything more and my 5090 runs out of vram and the video loses cohesion toward the end
>>107242805there are multiple ways but none of them have high consistency except wan animate but the quality of those videos suck donkey dick.
also what scheduler scale is best for I2V is still unknown to me, I have not tested it enough. 10 or 5 maybe
ok this is 2B I2V with fixhttps://files.catbox.moe/qq0m6h.mp4
>>107242892oh, and I used 5.0 scheduler scale for this one, that might be better for I2V
>>107242892what are you trying to test or prove anon?
gonna try to make a difference lora so the distill can be used as a lora on the I2V model
>>107242923>>107242767
>>107242931wrangle it to i2v?
>4 years and 4,000,000 generations later>still shit and not worth saving
>>107242956fixing a error that it had
>>107242593That is very good for 8 steps and first try of pure txt2img on a Flash model. The eyes are a quirk of my prompting. The alternative with other models is slopness or not being able to follow my prompt at all.
>the green frog takes out a cigar and zippo lighter from his pockets. he puts the cigar in his mouth and lights the end of the cigar with the lighter. he inhales then exhales a puff of smoke and smiles
meh. back to genning myself hugging hot sluts with wan
>>107242805
It's still not there yet. While there's longcat and svi, they still rely too much on daisy chaining. The best ones that do "1 minute" gens still suffer from janky movement every 81 frames; take this for example: https://www.reddit.com/r/StableDiffusion/comments/1oh4q3w/wan21_svishot_lora_long_video_test_1min/
There's a simpler method where you don't have to fuck around in another application: https://github.com/princepainter/ComfyUI-PainterLongVideo I tried two gens, didn't see any color burn, but it still suffers from the noticeable jank every 81 frames.
Best is to use woct0rdo's radial attention/sage/sparge/triton (sadly at fixed dimensions), pusa loras, 245+ frames, and pray you don't oom, kek
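The daisy-chaining described above reduces to simple window arithmetic: split the clip into windows of at most 81 frames, with each new window re-seeded from the tail of the previous one; the hand-off frames are exactly where the jank shows up. A sketch (81 is Wan's usual clip length; the overlap value here is an arbitrary illustrative choice):

```python
def plan_windows(total_frames: int, window: int = 81, overlap: int = 8):
    """Return (start, end) frame ranges covering the clip; each window after
    the first reuses `overlap` frames of its predecessor as motion context."""
    stride = window - overlap
    windows, start = [], 0
    while start + window < total_frames:
        windows.append((start, start + window))
        start += stride
    windows.append((start, total_frames))  # final (possibly shorter) window
    return windows
```

For a 225-frame clip this yields three windows, each sharing 8 frames with its neighbour; those shared frames are the seams viewers notice.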
>>107243056or use vace https://markdkberry.com/workflows/research/#vace-22---extending-video-clips
i can't even get a small enough output to post here with kandinsky
>>107243071Yes vace can be good too. I've often find there's constant color shifts and inconsistencies (changing background items, body or facial features changes). There's bbaudio's nodes that seem to do a pretty good job with this, although the issues are less obvious https://github.com/bbaudio-2025/ComfyUI-SuperUltimateVaceTools/tree/mainThere's the wan2.2 vace fun node he recently added but man, it is slow switching between high and low noise
>>107243091ask chatgpt etc for a python script that converts video to under 4 MB
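No LLM needed; the whole trick is bitrate arithmetic: target video bitrate = size budget / duration, minus any audio track. A hedged sketch, assuming ffmpeg and ffprobe are on PATH (codec and margin choices here are just reasonable defaults):

```python
import subprocess

def target_kbps(size_mb, duration_s, audio_kbps=0):
    """Video bitrate in kbit/s that should land the file just under size_mb."""
    total_kbits = size_mb * 8 * 1024                     # MB budget -> kbit
    video_kbits = total_kbits - audio_kbps * duration_s  # leave room for audio
    return max(1, int(video_kbits / duration_s * 0.95))  # 5% container margin

def compress(src, dst, size_mb=4.0):
    """Re-encode src so it fits under size_mb (4 MB = typical board limit)."""
    probe = subprocess.run(
        ["ffprobe", "-v", "error", "-show_entries", "format=duration",
         "-of", "default=noprint_wrappers=1:nokey=1", src],
        capture_output=True, text=True, check=True)
    duration_s = float(probe.stdout.strip())
    kbps = target_kbps(size_mb, duration_s)
    subprocess.run(["ffmpeg", "-y", "-i", src, "-c:v", "libvpx-vp9",
                    "-b:v", f"{kbps}k", "-an", dst], check=True)
```

e.g. a 10-second clip with no audio gets roughly a 3.1 Mbit/s budget for a 4 MB target.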
nah.. last time i asked chatgpt for help with anything i lost my whole 1tb of sandboxed games
>>107243071it still accumulates error
>>107243021That was pure txt2img again. But imagine, if you will, a proper Chroma edit model.
>>107243091>can figure out video gen but can't figure out how to re-encode a video
>>107243021prompt?
>>107243237t2V? workflow? this result is not bad
>>107243272
>Amateur photograph, split view of a young beautiful Japanese idol woman. Her dark hair neatly pulled back with wisps framing her face. She is dressed in traditional Japanese attire, featuring a flowing white top and a vibrant red pleated skirt. The left side shows a selfie of her face, she is smiling and doing okay sign. The background, softly blurred, shows what appears to be a traditional Japanese building with a dark roof and wooden structures, situated on a bright, paved ground. The right side closeup of only her legs, squatting with the skirt lifted, and panties visible
I engineered it to be exactly >>107243032
But it prefers to show her face in most gens anyway. Though I suppose it's a skill issue, because if I keep mentioning it in the prompt then it's more likely to show it.
>>107243299thx
>>107243286i2v from the example included in the git repo
>>107243254bro this sucks, i'm sorry
>>107243379>nogen
>>107243430so far, i havent seen anything that would make me want to use this over wan
>>107243399
come on, the shit is blurry and blocky and looks like shit. if you like the aesthetic, good for you, ignore my post, but that is objectively a bad gen: her nails are dogshit, the chain goes nowhere, the pattern on the doors is insane, and what is even hanging there?
Anyone got any chroma loras? Been Kinda bored genning
>>107243559>Anyone got any chroma loras? Been Kinda bored genningYour turn to train and share, giddy up!
which is more intellectual, lmg vs ldg?
>>107243755Snu snu
>>107243755>>107243653Far worse than wan2.2. Russians are fucking stupid.
>>107243794it's a 2b model senpai
>>107243794
>Far worse than wan2.2.
It's blurry and not very detailed, but I don't know if it's the model or just too few steps / too low a resolution.
>>107243822it is very low resolution but also it takes a long ass time for each of these.. over 6 minutes on a 5090.. generally WAN doesn't take that long even with 20+ steps on a much higher resolution
>>107243844Sounds like DOA model.
>>107243851Does she come in a refrigerated case?
same prompt as ^ but wan
>>107243894
6 1/2 minutes
>>107244044
4 1/2 minutes
>>107244053Damn. Surely there will be optimizations, hehe.
>>107244044
5 minutes 21 seconds, better resolution
>>107242861kek
>>107244058no one's gonna bother making optimizations if there's no reason to use it over wan
>>107244044
>>107244075
You're prompting Wan at max res though. Try prompting Wan at a lower or perhaps unintended res like you are for kandinsky; it's shit too (practically unusable on a 3090 due to this). Kandinsky has way better physics knowledge than Wan.
Basically Wan has a total of two res: 720 x 1280 or 480 x 832. But even 480 x 832 is inferior to 720 x 1280. Everything else looks like shit.
Kandinsky is probably similar.
no, it sucks
24fps was a retarded decision
>>107244075Dude, still rocking Sabrina lora? I love slicks
>>107244107Ran baked this thread. What a sad little man.
>>107244217
upgrade to 64 gigs ram
offload vaedecode to cpu
>>107244397There was a lora of her for hunyuan but I'm surprised there's no wan 2.2 lora.So I'm gonna guess this is i2v.
>>107244409
comfy is the only decent UI when it comes to perf and stability. but if you insist on a gradio UI, use neo forge or something. a1111 is out of date
>>107244424Sorry, I'm new.Which schizo strawman is that?
>>107244436Because they didn't receive enough hugs as a child.
>>107244449Okay nice talking to you
>>107244451Are you barfanon from /v/?
>>107244460Sorry, I'm new.Which schizo strawman is that?
>>107244471What does it mean?
>>107244485Okay, nice talking to you.
>>107244500You fucked up by using avatar op
>>107244546Sorry, I'm new.Which schizo strawman are you referring to?
>>107244580What level of schizo does it take to not just wait for a thread that's 1/3 complete to finish? Are you really taking this that seriously? We're in the middle of discussing tech stuff and you're derailing by making another thread?
>>107244602Hey sorry, but I'm actually new and have never baked on this board.Where's the thread you're referring to and which schizo personality are you conflating me with?
>>107244617it's obfuscated. you can't tell me what it's sending but I can tell you it's sending data. maybe go fuck yourself and learn op sec
>>107244632Sorry but I'm confused.Which schizo fantasy botscript am I reading right now?
Dare I say all the drama is coming from bots?
you must mean 3.5, if "released 9 days ago" is true
what is the prompt, anyways? and what sampler / scheduler / etc are you using
no one is saying it's perfect quite yet, but it's definitely annoying to see people dismiss the clear advantages of better architectures. That's how we wind up in this endless cycle of "when new thing" -> "new thing comes out" -> "not nearly enough people make any attempt to create resources for it or train it more"
Some styles are so crisp.
>>107245010I'm not using any artist tags, any recommendations? I am but a humble 1girl gooner trying to generate sexy pictures of smug-looking bitches, which is another limitation I'm running into: either it can't understand facial expressions very well or it can't generate facial expressions that differ from how a given character is usually depicted.
>>107245010Very cool
>>107245071Suffering from the usual with loops sadly.
any better alternatives to Local Dream for Hexagon NPU on Android? shit's not FOSS
"the video starts with showing an old crt tv which is displaying a news channel about a girl, the camera then quickly pans out and pans to the left showing a wide angle view of a warehouse facility with a group of villain goons and a man dressed as the joker sitting on a pony and they are all laughing while the pony is chewing on dollar bills."
>>107245071gonna try this out tomorrow, i got three more things to train today
>>107245079what about treason for chinese gold
>>107245093
I'm using NetaYume Lumina v3, which was released ~9 days ago according to Civitai. I'm aware of SDXL's limitations, believe me.
Maybe it's just this one particular prompt that it's having trouble with, but the problems seem to boil down to a lack of diversity in training data rather than the strengths of the algorithm itself.
>>107245118huh, Res Multistep Linear Quadratic (this gen) looks way better than Euler Beta (last one) on the same seed
Man, ropes are difficult, huh.
>>107245221
please get an Nvidia gpu with higher vram. With 8gb of vram, you will have annoying issues running normal fp16 non-lightning sdxl models when using hires fix and upscaling. make sure you have 32-64gb of either ddr4 or ddr5 ram, anon.
Repost from previous thread:
Is there any way to remove the noise?
I trained it using Illustrious 0.1
>>107245163I mean the app, not models
>>107245221"controversial" data like that is something they don't allow you to generate without jailbreaking the model.
>how well does it handle something like a penis?haven't tried, assuming not well
>>107245316
Anime penis works, real-world penis no. This seems to be the case because "porn" is anti-Chinese, thus they have to censor it. So whatever CPC propaganda says can't be done, can't be done with the AI without bypassing security features.
>>107245266uhmmm whats this non-freedom nonsense??
remember when comfy posted fennec girl with a bag of money after getting $17M in funding and ani was seething uncontrollably
"the woman is looking at the sea, to then turn her head slightly as she thinks she hears something, she turns her head fully and gets surprised and sits up straight then gets happy to see the viewer as she starts to wave her hand hello cheerfully to the viewer. the ocean waves crash calmly at the beach rocks.oil painting style."My proompt-fu is getting better.>>107245282Fair enough.>>107245380Free laptop, bro.
>>107245428actually no I won't contrarian faggot
"the camera moves up and forward into the distance revealing a lively futuristic cityscape.abstract and colorful oil painting style."Damn, haven't done any cityscape stuff before.
I'm from /ldg/ - Landscape Diffusion General. >>107238591
I see our acronyms are the same and people can get confused.
Request for the baker of this general:
Please change the acronym to avoid confusion, to /odg/ - OSS Diffusion General.
>>107245489Julien should hang xirself
>>107245489kys
>>107245483Reminds me of Planetside 2
>>107245563Obsessed schizo.
>>107245797so true xisterwhen a retarded niggerfaggot starts annoying everyone, one should stay quiet and do nothing, like a good cuck
>immediate pol schizo meltdownI see.
remember when an anon here, on /ldg/, posted the fast cancel for comfy and some little redditor reposted it, and then it was officially implemented by comfy
"the girl tilts her head up towards the viewer, looking at the viewer, she is full of despair. her skin is that of a cracked paint on an oil painting.she holds a human skull.colorful rough oil and watercolor painting style."Shame, the cracked paint doesn't stick on her skin.
"the man is in despair seeing the broken wine bottles, he then bends down and crawls over to the broken wine bottles and starts to lick the wine up from the ground.colorful rough oil and watercolor painting style."
>>107241081AMD isn't great. However, if you use rocm from TheRock you get much better speeds, the latest build pretty much cut my gen times in half compared to using zluda, so if you are content with subpar speed compared to Nvidia, then its a lot more viable than in the past.
>>107246167and?
>>107247060its anti-ani schizo
lodestone said he figured out the reason why chroma did not learn artist styles, and it's already learning them quickly: he needed to train at full fp32
>>107247110And?Who here isn't anti-ani?
>>107247242>chroma*yawn*
>>107247325its the best at complex nsfw stuff and having non ai art styles. Its basically local midjourney that can do nsfw
>>107247350but I only care about animenoob and neta already cover it for me :)
>>107247360yea, those are specialized models trained specifically for that with half a million dollars worth of compute
>>107247242Is he making a finetune or what?
>>107247378hes grifting as usual
>>107247378he is still training it from what I know, he just had to get ramtorch working in order to train at full precision
logs over the course of a few weeks:

okay FP32 is a must when training a model
the difference is at the basin
bf16 struggled so hard at the basin convergence
you can still do bf16 compute
but the accumulator states has to stay in fp32
so that means the master weights, and optimizer states
grad can stay in bf16 because it's a short accumulator
Feffy — 11/9/25, 10:11 PM
so mixed precision then
Lodestone Rock — 11/9/25, 10:11 PM
yes
but the optimizer has to be in fp32
Feffy — 11/9/25, 10:12 PM
stochastic rounding not good enough?
Lodestone Rock — 11/9/25, 10:12 PM
nope
Feffy — 11/9/25, 10:12 PM
even with kahan summation?
Lodestone Rock — 11/9/25, 10:12 PM
nope
at the basin you want to remove as much noise as possible
so any form of compression is intolerable
you can do bulk compute at bf16 first
but at the final say 10% of training do what you must to make sure the precision is as high as possible
do it in fp64 if you have to
fp32 accumulator is important :catree:
Bunzero (hates VLMs) — 11/11/25, 3:03 AM
I remain skeptical :furry_gigachad:
Lodestone Rock — 11/11/25, 3:03 AM
radiance suddenly learned a lot of artist tags within a day of training in partial fp32
Bunzero (hates VLMs) — 11/11/25, 3:04 AM
can the universe let me be right at least once :crying_cat:
Lodestone Rock — 11/11/25, 3:05 AM
im going to make it train at full fp32 accumulator state
as soon i fixed the ram sharing issue
you really cant bargain with the accumulator
well atleast we have tools to mitigate this issue
Lodestone Rock — 11/11/25, 3:07 AM
on 8x4090
just to rub the salt on the wound even more
cuz 8xh100 couldn't do it because i need to train it on full bf16 cuz there was no ramtorch back then :synth_derp~1:
Bunzero (hates VLMs) — 11/11/25, 3:08 AM
can't or couldn't
couldn't :synth_derp~1:
Lodestone Rock — 11/11/25, 3:08 AM
engrish
but yeah guys train your shit in fp32
you cant do it in NVFP4
you cant do jack shit in NVFP4 lol
Bunzero (hates VLMs) — 11/11/25, 3:11 AM
but how did OAI do it then
Lodestone Rock — 11/11/25, 3:11 AM
they dont
:synth_derp~1:
they have bajilion of b200
so during training
any long running accumulator has to stay in fp32
so that means master weights, and optimizer state
because those things are literally an integrator and you know yourself that integrator will accumulate error over time that's literally control theory 101
during the span of training you literally doing integration of model vector in the model vector field where the vector field is the loss landscape itself
Lodestone Rock — 11/11/25, 3:18 AM
so any non white noise error will cause drift
Talan — 11/11/25, 3:19 AM
wait lode, did you added more danbooru and e621 data to chroma radiance training?
Lodestone Rock — 11/11/25, 3:19 AM
no
the data are identical to previeous run
Talan — 11/11/25, 3:19 AM
i vaguely heard someone said something about it
or me schizoing :mpreg_hydra:
Lodestone Rock — 11/11/25, 3:20 AM
i said i'll add it if i managed to fix the ram sharing issues
some of the states are sharing the ram but not others
the master weights are being shared but for some reason grad is not
or atleast that's what i believe what's happening
guys i just tried overfitting flow model to one example using bf16
it cant overfit to details like at all
int8 vs bf16
it's official
you're no longer need nunchaku
this works on any model
no need calibration
2-4x speedups
metal63 — 11/16/25, 10:16 PM
training? or just inference
Lodestone Rock — 11/16/25, 10:19 PM
should be both
but i havent integrated it to ramtorch backward
im making your consumer gpus as powerful as datacenter gpus
Aura — 11/16/25, 10:20 PM
sorry, it's been forever since i've poked my head in here, what's this?
Lodestone Rock — 11/16/25, 10:20 PM
imagine nunchaku
but for any model
and can be used for training
the speedup is about 2-4x
i need to give amd a love too
need to create kernel that works on amd too
because amd tensor layout is different
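The core claim in these logs — bf16 compute is fine, but long-running accumulators (master weights, optimizer state) must stay in fp32 — is easy to demonstrate. bf16 keeps only 7 stored mantissa bits, so once an update is smaller than half a ULP of the weight it rounds away to nothing and training stalls, exactly the "can't overfit to details at all" effect described. A self-contained simulation (bf16 emulated by rounding a float to its top 16 float32 bits):

```python
import struct

def to_bf16(x: float) -> float:
    """Round a float to the nearest bfloat16-representable value."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    lsb = (bits >> 16) & 1
    bits = (bits + 0x7FFF + lsb) & 0xFFFFFFFF   # round to nearest, ties to even
    return struct.unpack("<f", struct.pack("<I", bits & 0xFFFF0000))[0]

# 10,000 gradient steps of 1e-4 on a weight starting at 1.0.
# bf16 ULP at 1.0 is 2**-7 ≈ 0.0078, so each 1e-4 update rounds away entirely.
w_bf16, w_fp32 = 1.0, 1.0
for _ in range(10_000):
    w_bf16 = to_bf16(w_bf16 + 1e-4)  # bf16 master weight: every step is lost
    w_fp32 = w_fp32 + 1e-4           # fp32 master weight: learning accumulates

print(w_bf16)  # still exactly 1.0 — the model "refuses to learn"
print(w_fp32)  # ≈ 2.0
```

The bf16 master weight never moves while the fp32 accumulator integrates every step, which is why the mixed-precision recipe keeps master weights and optimizer state in fp32 even when the forward/backward compute runs in bf16.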
anon please stop this nonsense at once
>>107247527>you're no longer need nunchakuholy snake oil seller
>>107247676
man, have you even tried ramtorch? this man is doing the real work. I don't doubt him
>repost bot spam is back
>>107247242and what was the reason?
>>107247616What is she even eating? Roasted seaweed dipped in some sauce? Is she a single celled organism filter feeding? Who the fuck "eats" that
>>107247759did you read what you responded to? or the log after it? Not training at full precision
>>107247808
oh, but why would training at full precision suddenly fix the tags, given the model wasn't trained on something insane like fp4, and given that it didn't learn artist tags at all during its entire long training run?
would more precision really give it that much more capacity for knowledge packed into the same sized model?
>>107247808its all here:>>107247507>>107247527
>>107247808
basically he had to train at full bf16 because that is what there was to work with; he didn't realize until after that he needs the accumulator weights in fp32, or else rounding noise keeps the model from learning past a certain point / to a certain level of accuracy. Now he has been working on ramtorch to make it possible to train at mixed precision, and to train models with a fraction of the vram without speed loss. And within a single day chroma radiance started learning stuff it refused to at bf16-only, like artist tags
>>107247937
also he said for testing he tried deliberately overfitting a model on just bf16 and it was impossible to do so because of the bad precision, which explains the small-detail issue
thing is, nobody else, not even the big ai firms with their own code, has tried training like this before him, so he is learning as he goes
>>107247835
>>107247507
>>107247527
if this improves lora training quality too, I think a good idea would be for him to collaborate with ostris, who already partially implemented ramtorch for lora training in ai-toolkit, so they can properly implement something that works well enough to publish something marketable online and get a lot of eyes on this
"the woman turns 180 extending her left arm behind her and faces the camera as she extrends her arm holding the katana and points the katana towards the viewer with an extreme upclose shot of the katanas tip."
>>107247507
lol, lmao even
I am one of the "anti Chroma schizos" who, literally months ago, posted a breakdown of the many mistakes Chroma was making during training. One of the top things I pointed out was how using pure bf16 and stochastic rounding was fucking retarded and he should just use mixed-precision training like everyone else. At least he finally came around, even if it took $150k flushed down the drain first.
Now let's see if he realizes all the other things that are wrong with the Chroma training setup.
>>107248138noice
>>107248138"controversial" data like that is something they don't allow you to generate without jailbreaking the model.
>>107248339huh, Res Multistep Linear Quadratic (this gen) looks way better than Euler Beta (last one) on the same seed
>>107248308post desu link or lying
"the camera zooms in very fast to the end of the hallway while twisting the camera. very fast and intense motion."
>>107248339
I love getting surprised by how good some stuff looks, the reflections are amazing.
>>107248454
I'm using NetaYume Lumina v3, which was released ~9 days ago according to Civitai. I'm aware of SDXL's limitations, believe me.
Maybe it's just this one particular prompt that it's having trouble with, but the problems seem to boil down to a lack of diversity in training data rather than the algorithm itself.
>>107248477
have you seen his training data? it's about as diverse as possible, that is not an issue at all there. that shit is already the most diverse model style-wise there is atm. the issue is small details and it not learning past a certain point, which was apparently due to bf16 rounding errors
>>107248455What level of schizo does it take to not just wait for a thread that's 1/3 complete to finish? Are you really taking this that seriously? We're in the middle of discussing tech stuff and you're derailing by making another thread?
>>107248454https://desuarchive.org/g/thread/104885523/#104888771
use the already-baked thread when done:
>>107237888
>>107237888
>>107237888
>>107248512
comfy is the only decent UI when it comes to perf and stability. but if you insist on a gradio UI, use neo forge or something. a1111 is out of date
>>107248531well you got me, you should have told him lol
>>107248531Damn. Surely there will be optimizations, hehe.
>>107248537
it is very low resolution, but it also takes a long-ass time for each of these... over 6 minutes on a 5090. WAN generally doesn't take that long even with 20+ steps at a much higher resolution
>>107248551
come on, the shit is blurry and blocky and looks like shit. if you like the aesthetic, good for you and ignore my post, but that is objectively a bad gen: her nails are dogshit, the chain goes nowhere, the pattern on the doors is insane, and what is even hanging there?
>>107248537No.
>>107248618>can figure out video gen but can't figure out how to re-encode a video
though it shouldn't be a 'waste'. He can still just resume training with the accumulator at fp32. Only the time after he 'maxed out' the accuracy bf16 could achieve would be a waste
"the camera pans in slowly as the cat walks up to the man and leaps onto his head and sits down on his head while the man reacts to the cat while holding a cigarette."
Damn, this was a cool one, first gen too.
So I'm trying to into comfyui and tried this node and workflow here:
https://github.com/regiellis/ComfyUI-EasyIllustrious
Is it typical that it's just midwittery spaghetti json "code" where there are like 12 different pre and post processing effects that don't do anything or are even directly in opposition to each other?
Or am I just using the wrong node/workflow? sd next feels so much better out of the box
"the man, adolf hitler, points at the viewer with his hand and finger, then does a thumbs up as he smiles."
Cool, it doesn't warp the face.
>>107248725
why
why the FUCK would you do this
base comfy has all the nodes needed to start out.
>>107248780
Okay, that's why I'm asking. It seemed retarded to me as I was doing it, but I was just following LLM slop.
>>107248823just check the OP (1girl guide) it has a lot of basic workflows to start out.
>you know what???? I"M GONAN SPLIT DA THRED
barf
comfy is based
"a group of camels walk across the desert as a massive fire and smoke rages behind them in the distance, heavy winds, fast motion."
>>107249418Based on what
>>107249687Python
retard here, why are outputs with dmd/lightning loras better? Shouldn't the image get better with more compute?
>>107248531
What do you think about the results from here: https://civitai.com/models/2093591
The lora description says you can use the qwen image edit lightning lora on the basic qwen image model instead, to kinda fix the low seed variance that qwen image has. Makes the images a little grainy but seems to work. I guess the lora for the edit model, used on the normal model, changes it enough to add seed variance but not enough to destroy the output, given the two models are similar.
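that "nudge without destroying" behavior falls out of how a lora patch is applied. rough sketch of the math (names and function are hypothetical, not ComfyUI's actual API):

```python
import numpy as np

# A lora patches a weight matrix as W' = W + strength * (alpha / rank) * (B @ A).
# At partial strength the low-rank delta shifts the base model a little,
# which is why a mismatched-but-similar lora can add seed variance
# without wrecking outputs.
def merge_lora(W, A, B, alpha, strength):
    rank = A.shape[0]
    return W + strength * (alpha / rank) * (B @ A)

rng = np.random.default_rng(0)
W = rng.standard_normal((8, 8)).astype(np.float32)   # base weight
A = rng.standard_normal((2, 8)).astype(np.float32)   # rank-2 down-projection
B = rng.standard_normal((8, 2)).astype(np.float32)   # rank-2 up-projection

W0 = merge_lora(W, A, B, alpha=2.0, strength=0.0)  # strength 0: base untouched
W1 = merge_lora(W, A, B, alpha=2.0, strength=1.0)
print(np.allclose(W0, W))  # True
```

the delta is linear in strength, so dialing the lora weight down scales the shift proportionally.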
>>107249418
what 'native' block swap nodes are they referring to? kijai's?
>>107249717cringe
>noooo you can't use blockswap
>please make more all-in-one node packs with 90% useless nodes instead
>>107249418Which blockswap is he talking about?
>>107249751>why are outputs with dmd/lighting lora betterthey are?
I don't understand why anyone needs a block swap node anyway. UnetLoader from MultiGPU already has an option for putting in how much ram you want swapped.
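for anyone unclear on what that swap option actually trades off, here's a toy simulation (illustrative only — not MultiGPU's or kijai's actual code):

```python
from collections import deque

# Toy model of block swapping: keep at most `budget` transformer blocks
# resident in VRAM, paging the rest in from system RAM as the forward
# pass reaches them. Counts how many block copies (swaps) occur.
def count_swaps(num_blocks, budget, steps):
    resident = deque()  # blocks currently in VRAM, oldest first
    swaps = 0
    for _ in range(steps):            # one forward pass per sampling step
        for i in range(num_blocks):   # blocks execute in order
            if i not in resident:
                if len(resident) == budget:
                    resident.popleft()  # evict oldest block back to RAM
                resident.append(i)      # copy block i into VRAM
                swaps += 1
            # ... block i's forward pass would run here ...
    return swaps

# room for every block: each is copied once and stays resident.
# smaller budget: you pay a full reload of the model every sampling step.
print(count_swaps(40, 40, 4))  # 40
print(count_swaps(40, 10, 4))  # 160
```

so the knob is just VRAM saved vs PCIe transfer time added per step, which is why it's slower but lets big models fit.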
why would I swap my ram? my ram works fine! I can't afford to swap ram every gen!
"man, adolf hitler, is playing a video game holding a game controller in his hands, he lets go of it with one of his hands and points to the left laughing as he stomps his leg."
>>107250114INTERPOLATE WITH FILM VFI NIGGEEEEEEEEEEER
>>107250128no
>>107250128That takes longer than the gen, I'm just going through folders.
>>107249967yes at least for XL
>>107250128*gimm-vfi
>>107250215
no, film vfi has better physics in its interpolation, basically topaz-level for 16 to 32 fps interpolation
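the 16-to-32 fps bookkeeping is just one synthesized frame per adjacent pair. toy sketch below — FILM/GIMM-VFI generate that frame with learned motion estimation, a plain average is used here only to show the frame math:

```python
# Naive 2x frame interpolation: insert one in-between frame per pair.
# A real VFI model predicts the middle frame; here it's a plain average.
def interpolate_2x(frames):
    out = []
    for a, b in zip(frames, frames[1:]):
        out.append(a)
        out.append((a + b) / 2.0)  # learned models synthesize this frame
    out.append(frames[-1])
    return out

clip = [float(i) for i in range(16)]  # stand-in for 16 frames
doubled = interpolate_2x(clip)
print(len(doubled))  # 31: 16 originals plus 15 in-betweens
print(doubled[:4])   # [0.0, 0.5, 1.0, 1.5]
```

note 16 frames become 31, not 32 — there's no pair after the last frame, which is why interpolated clips play a hair short of exactly double length.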
>>107249418wtf is this from
>>107250278sounds like something a nigger would claim
cool it with the racism, buds. take that to X the racist app.
Is a future with a UI that doesn't have 30GB of python dependencies possible?
>>107250469We are continuing to investigate this issue. In the meantime we recommend you use AniStudio.
>>107250205
you might be the only one here who feels that way
>>107250469
maybe in a decade
>>107250469
shhhh, don't say it out loud or the comfyorg goons will derail the thread. there is an anon working on it though
"the cartoon man is dancing. the text "IT'S AN ABSTRACT KIND OF FEEL" remains throughout the video."
Why am I the only one posting gens?
>>107250469Incoming rust port. It's 30GB+ but it's memory safe.
>>107250469just buy more storage until we get AGI to fix this issue unironically, nothing else can
>>107250501>Why am I the only one posting gens?Sorry I'm training right now
>>107250501i post my gens in the real thread
>>107250511storage is going up in price as is memory and vram. the future sucks