/g/ - Technology






Discussion of Free and Open Source Text-to-Image/Video Models

Prev: >>107074285

https://rentry.org/ldg-lazy-getting-started-guide

>UI
ComfyUI: https://github.com/comfyanonymous/ComfyUI
SwarmUI: https://github.com/mcmonkeyprojects/SwarmUI
re/Forge/Classic/Neo: https://rentry.org/ldg-lazy-getting-started-guide#reforgeclassicneo
SD.Next: https://github.com/vladmandic/sdnext
Wan2GP: https://github.com/deepbeepmeep/Wan2GP

>Checkpoints, LoRAs, Upscalers, & Workflows
https://civitai.com
https://civitaiarchive.com/
https://openmodeldb.info
https://openart.ai/workflows

>Tuning
https://github.com/spacepxl/demystifying-sd-finetuning
https://github.com/Nerogar/OneTrainer
https://github.com/kohya-ss/sd-scripts
https://github.com/tdrussell/diffusion-pipe

>WanX
https://comfyanonymous.github.io/ComfyUI_examples/wan22/
https://github.com/Wan-Video

>Neta Yume (Lumina 2)
https://civitai.com/models/1790792?modelVersionId=2298660
https://nieta-art.feishu.cn/wiki/RY3GwpT59icIQlkWXEfcCqIMnQd
https://gumgum10.github.io/gumgum.github.io/
https://neta-lumina-style.tz03.xyz/
https://huggingface.co/neta-art/Neta-Lumina

>Chroma
https://huggingface.co/lodestones/Chroma1-Base
Training: https://rentry.org/mvu52t46

>Illustrious
1girl and Beyond: https://rentry.org/comfyui_guide_1girl
Tag Explorer: https://tagexplorer.github.io/

>Misc
Local Model Meta: https://rentry.org/localmodelsmeta
Share Metadata: https://catbox.moe | https://litterbox.catbox.moe/
GPU Benchmarks: https://chimolog.co/bto-gpu-stable-diffusion-specs/
Img2Prompt: https://huggingface.co/spaces/fancyfeast/joy-caption-beta-one
Txt2Img Plugin: https://github.com/Acly/krita-ai-diffusion
Archive: https://rentry.org/sdg-link
Bakery: https://rentry.org/ldgcollage

>Neighbors
>>>/aco/csdg
>>>/b/degen
>>>/b/realistic+parody
>>>/gif/vdg
>>>/d/ddg
>>>/e/edg
>>>/h/hdg
>>>/trash/slop
>>>/vt/vtai
>>>/u/udg

>Local Text
>>>/g/lmg

>Maintain Thread Quality
https://rentry.org/debo
>>
>>107077315
SONGBLOOM THREAD 100%
>>
I support Neta Lumina
>>
>>107077330
You keep shilling this garbage model in every thread thinking you'll make it a popular thing (maybe to justify the effort you're putting into making comfy nodes for it or something), but listen to your conscience for once: it won't be a thing
>>
I support black lives matter
>>
>>107077362
>thinking you'll make it be a popular thing
I doubt that. music is way less popular than tv and movies.
>>
One thing we really don't have is audio controlnets.
>>
oh is he mad because its above chroma and illustrious in op kek
>>
>>107077341
wat? I haven't been scared lol, I'm like one of the primary netaposters around here recently, the other one being that guy that does the "all fours roped up girl" pics with Neta somewhat regularly

>>107077355
For both versions of Chroma there, positive was:
`a highly detailed 2D digital cartoon of an overweight hairy shirtless nerdy man wearing a fedora and sweating and shaking as he sits at a desk in his darkened bedroom and stares at his computer monitor with a deranged expression on his face. The computer monitor is facing away from the viewer and seen from behind. A wireless keyboard and wireless computer mouse are on the desk in front of the man. There is a massive shirtless firefighter poster on the wall behind the man. A speech bubble coming directly from the man's mouth reads "HEHE, /LDG/ WON'T KNOW WHAT HIT 'EM!"`

and negative was:
`aesthetic 0, blurry, pixelated, jpeg artifacts, out of focus, lowres, worst quality, disfigured, deformed, fused, conjoined, bad hands, missing finger, extra digits, bokeh, dof, depth of field, sketch, greyscale, monochrome, grotesque, old, older, elderly, mature, wrinkled skin, 4koma, bad anatomy, bad composition, bad proportions, bad perspective`

Used DPM++ 2S Ancestral with the Linear Quadratic scheduler for both, 25 steps. No loras.

I (Zootanon, again) am also the person who posted the original NetaYume version of this overall idea recently, which Jimmy The Troll most likely has / had in mind BTW:
https://civitai.com/images/107297931
>>
>>107077382
Chroma would be pretty great if it could do hands.
>>
>>107077375
>>107077381
Oh, I now see the repost bot is "improving". What is the endgame of this?
>>
File: ComfyUI_00093_.png (1.28 MB, 1200x1200)
>>107077355
gimme a sec ill try to bake with the negatives
>>
/adt/ has more kino collages, friendlier anons, more frequent dev updates and fun. this thread has schizos, narcissists, trolls and slop. get your shit together anons
>>
>who is this 4chan
>>
>>107077400
Okay, so why don't you fuck off and stay there then?
>>
File: ComfyUI_06820_.png (967 KB, 1200x896)
>>
>>107077400
trvke
>>107077403
least angry netayume poster
>>
>>107077399
ah wasn't sure if you saw my reply in the last thread, reposted it above too as part of a comment lol
>>
>>107077371
but yea the prob is overfitting, and that the music algos will likely match it anyways. And im not joking when I say music mafia

https://files.catbox.moe/0c212o.wav
https://files.catbox.moe/blsc0w.wav
>>
File: Comfy_UI_03.png (2.82 MB, 1024x1024)
AHHH I'm STARVING for that /ldg/ shilling energy!! FEED ME the Song Bloom hype, Neta Lumina cope, Radiance shills, Chroma wars, cherry-picked masterpieces, half-baked takes, 'works on my machine' excuses, 'two more weeks' trust-the-plan posting, Qwen Miku benchmark spam, schizo rants, INJECT IT STRAIGHT INTO MY BRAIN!! I'm FEELING IT, THE ENERGY IS FLOWING, I'M BECOMING ONE WITH THE SHILL!!
>>
>20 replies in 11 minutes
Popular thread desu
nb4
>>
>>107077371
>>107077415
stop it
https://files.catbox.moe/q824c4.wav
https://files.catbox.moe/nt0fw9.wav
>>
>>107077427
Perfect gen
>>
why is qwen image the best model?
>>
>>107077382
when did Neta even get put in? I think it was around V2.0 ish of NetaYume, or so. Pretty sure I was one of the first people to notice that it had become a noteworthy improvement over base Neta Lumina
>>
>>107077427
this general was founded on pixart hype and by god it will continue to survive on new model hype
>>
File: 1761783748268442.png (1.63 MB, 1344x1728)
>>107077400
Not a real thread, if it's more fun go there faggot
>>
>>107077427
gemmerald
>>
>>107077445
YMMV. it has some of the best prompt understanding.

but also, seeds usually do very little in terms of variation even on prompts that would clearly permit it, and it is taxing to train. and its base training is kinda meh.
>>
File: 00020-3039430131.png (1.05 MB, 824x1024)
>>
No local musicgen model is worth anyone's time; you are better off using the free tier of the SaaS websites, as even their free models are better than any of the garbage open models available.

Just pray Qwen's musicgen model and AceStep 1.5 turn out good
>>
File: ComfyUI_00096_.png (901 KB, 1200x1200)
I think this could work in chroma with the right prompt, but I just slapped the netayume one and called it a day, with some random cfg value, so who knows
>>
>>107077445
I tested Hunyuan Image 2.1 a bunch after it was discussed here recently and IMO it's actually a *bit* better than Qwen overall, if you consider it in terms of like, just baseline knowledge and prompt adherence and accept that neither one of them is even remotely good at realism by default
>>
>>107077489
its cause all the open models use a small amount of shitty licensed / open music, you HAVE to use a fuck ton of songs
>>
>>107077427
what was the prompt for this masterpiece
>>
>>107077489
Why not Neta Yume music gen or Anistudio music gen? Do you have a particular reason for promoting certain models over others?
>>
File: ComfyUI_08196_.png (2.08 MB, 1152x1152)
Let's just hope Alibaba doesn't do what they always do and train on synthetic slop for their music model. Music should actually be much easier to train. I checked the Suno v4.5 side of things, and while it's hit or miss whether it can match Udio composition-wise, it can make pretty nice songs. If what the ACE Step guy said is true, 1.5 should be Suno 4.5+ quality, and assuming his dataset is healthy and even better than before, it should be possible.
>>
>>107077502
The text itself
>>
>>107077508
i feel like there's not enough extant AI music for that to really be a big concern
>>
File: ComfyUI_08171_.png (1.7 MB, 1152x1152)
>>107077489
>AceStep 1.5
Listen to the latest samples he posted, it actually sounds pretty good sound quality wise. The only cap right now is his SFT method and the quality of his dataset.
>>
>>107077499
YuE is the only based open model because they trained on tons of copyrighted stuff. In their paper they vaguely mention using "internet data" (which means they probably used tons of copyrighted music), and the ICL mode it has (where you can input a reference audio sample) follows the style and vocals of certain artists really well, probably because the model has seen them before.
>>
>>107077400
Admit it little slut, it's more fun here.
>>
truly the best model is the one you have inside your skull
>>
>>107077427
Why not Ponyv7?
>>
>>107077539
its not as bad as people say, but you have to do stupid shit like spamming negatives / using style clusters and long prompts, and its still far worse than chroma
>>
>>107077539
whats new about pony?
>>
>>107077523
music really is not the same, they have copyrights on fucking 2 second long riffs, and they have stuff trained on finding similarity and they come down hard on anyone. That is why no non-Chinese lab even tries
>>
>>107077551
>its not as bad as people say
Ohhh that energy you're giving me life right now! I can FEEL IT! So tell me, anon do I dive back in with your settings and pray it works? Or do I rot for another 2 weeks waiting for v7.1 because "trust the plan bro" even though this current version is barely functional??
>>
>>107077568
go back to /hdg/ where you belong tbqh
>>
STOP EVERYONE!
CAN EVERYONE THINK ABOUT DEBO FOR A MOMENT?
THANKS
>>
>>107077508
I would be happy if, at the very least, Alibaba's model supports reference audio and follows it with high fidelity. Base Wan (the text-to-image/video component) is a slopped model, but the image-to-video component is amazing. If the same holds true for music gen (replicating the style of the reference audio well), then we will have nothing to worry about.
Songbloom (which is being shilled here) is a model that only works through reference audio, and it's utter garbage and produces outputs that are barely reminiscent of it
>>
>>107077568
I dont use it, I use a noobai tune for 2d, chroma for anything else
>>
>>107077584
based, imagine thinking netalumina can compete
>>
>>107077526
unironically this is the only diffusion thread left where ANY normal person discussion of anything ever happens. There used to be some in /b/ but they devolved into being almost nothing but imageposts over time. And of course it used to be present on /sdg/ too way back in the day. /hdg/ was always negativity-maxxing schizo containment thread though.
>>
>check out local diffusion general
>everyone namedropping companies
>barely any actual discussion
Why are we like this?
>>
>>107077523
Yes, YuE is considerably better at composition than all local models, but it's the hardest to run. Even with ExL2 I get OOMs on 24GB if I go for 2 min songs (though it may be my settings), plus there's no proper Comfy support. YuE still sounds pretty bad quality-wise though, and there's no update from their team.
>>
>>107077584
>>107077588
identicalhomo
indistinctgay
etc
>>
>>107077599
>there's no update from their team.
The lead dev got hired by Alibaba. He is probably working on the very musicgen model they are supposedly working on.
>>
>>107077382
How much longer do you think he'll be malding
>>
File: ..png (4 KB, 308x127)
>>107077605
>>
>>107077598
stagnation period. this is just like last time desu
>>
>>107077592
>unironically this is the only diffusion thread left where ANY normal person discussion of anything ever happens.

Wait, seriously? You're saying this is the only normal discussion thread left, even though you've been dealing with constant harassment from a schizo for weeks straight, 24/7
>>
File: FluxKrea_Output_125151.png (3.17 MB, 1728x1296)
>>107077247
Krea version just for fun
>>
LET'S PLAY A GAME CALLED 'COMPANY NAME DROPPING BINGO!'
LET'S CROSS OFF THE NAMES THAT HAVE ALREADY BEEN USED!
>>
>>107077614
i'm not saying it's ideal just that the other threads are either literally worse or just less interesting
>>
>>107077611
if he's a turbo ESL whose real reason for hating it is his own inability to actually write even simple coherent natural-language prompts in English, maybe forever
>>
>>107077634
For me, /ldg/ is where I can shitpost freely and actually get responses - it's therapeutic. In other threads, if you critique someone's gen, people get butthurt or go full Platonic idealist mode. No fun allowed there.
/ldg/ = banter and fun
Discord = serious discussions and actual news (all the devs post there and you can talk to them)
>>
File: ComfyUI_00079_.png (1.42 MB, 1328x1056)
>>107077649
why are you upset? if your gpu can run chroma you should try it too! its very fun
>>
>Songbloom (which is being shilled here) is a model that only works through reference audio, and it's utter garbage and produces outputs that are barely reminiscent to them

That is the thing. A model shouldn't only work one way; a properly trained model should be able to do both.

>>107077564
It's a shame that open source is so cucked. It's not just regular music that was scraped by the Udio team (and also ElevenLabs); their stuff can even do movie scores https://www.youtube.com/watch?v=8moLFyfgUR4

No idea if Suno can do it, but now that's two major Western companies for these music models that do it. Locally, when it comes to copyrighted shit we have been infinitely cucked unfortunately, so I don't think we will ever get something like it unless Alibaba can step up their collection efforts this time around.
>>
>>107077315
Ofc he added his garbage neta gen to the collage lole
>>
>>107077677
>too retarded to realize that I'm the same person who made the original NetaYume image you're clearly aping, and recreated it twice like an hour ago with Chroma and posted it to this thread, and then again just now with Flux Krea

ok there bud kek, do your trolling blueprints account for people who know how to get good results out of all currently relevant models and will just do so? I suspect not.
>>
>>107077598
>>everyone namedropping companies
It will be like this until a random neckbeard can train an entire model in his basement, or until training a big model on cloud services doesn't cost a fortune.
Unfortunately, for now we rely on companies to get the interesting stuff. Currently you can barely train SD1.5-tier models for less than 100k USD.
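Purely as illustration of that ballpark - every number below is an assumption for the sake of arithmetic, not a quoted figure:

```python
# Back-of-envelope for "SD1.5-tier training costs ~100k USD".
# Assumed: ~40,000 A100-hours of compute at ~$2.00 per A100-hour of cloud
# rental, plus 25% overhead for data prep, storage, and failed runs.
A100_HOURS = 40_000
PRICE_PER_HOUR = 2.00
OVERHEAD = 0.25

compute_cost = A100_HOURS * PRICE_PER_HOUR   # $80,000 compute only
total_cost = compute_cost * (1 + OVERHEAD)   # $100,000 all-in
print(f"compute ${compute_cost:,.0f}, total ${total_cost:,.0f}")
```

Shift any of those assumptions (spot pricing, fewer GPU-hours for a distilled run) and the total moves a lot, but the six-figure order of magnitude holds.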
>>
>>107077684
>utter garbage
I don't think we'll get anything resembling Udio or even Suno for a while, and with how those two will probably be destroyed from the inside, it'll take a few months or years.
I see nothing outside of Udio that doesn't sound "robotic" the way most models do.
https://voca.ro/1iKrinqQg3xG
>>
Well 72 comments:
>Companies mentioned until now more than one time.
Alibaba (Qwen)
Hugging Face
CivitAI
Stability AI
RunDiffusion
Replicate
SongBloom
AceStep
YuE
ComfyUI
Chroma
Neta Lumina
>>
>>107077747
>name of model
>company
go be a retard somewhere else anon
>>
>>107077747
Two of those aren't companies, faggot
>>
>>107077747
are companies a religious taboo for you or something
>>
>>107077759
No, I'm planning to do a shilling bingo and (you) will play
>>
File: file.png (82 KB, 602x404)
>>107077742
the only reason we don't have more (good) music models is that the music industry is a giant, well organized, state-enforced cartel
if images were organized the same way (a few giant companies managing all image copyright), we'd have picrel for current sota image models
>>
>>107077747
Runway will always be the GOAT. Everything here is built off their success.
>>
>>107077742
https://vocaroo.com/1d9j57sNnTqG
>>
the audio outputs here weren't made with a local model, but I can definitely recommend basically concatenating a large amount of lyrics from an artist you want to mimic into a single text file, and giving them to Gemini 2.5 Pro on AI Studio and telling it to use them as a "style guide" while writing a song about XYZ thing.

Got both of these that way by doing so with literally every single song ever released by MC Frontalot and telling it to "write the lyrics to a nerdcore song critical of cryptocurrency" lol:

https://pastebin.com/T3TVtKUF
https://pastebin.com/k1vZCwSH

https://jumpshare.com/s/Ur00VnzcrrsJjZiD9HZx
https://jumpshare.com/s/7eCLIpy4Wai5Qw9pGlqM
>>
UDIONOT
D
I
O
N
O
T
AT
T
HOME
O
M
E
>>
File: ComfyUI_00068_.png (1.31 MB, 1328x1056)
>>107077715
My bad, I apologize for this confusion fren
>>
I think one of the most interesting side effects that could happen when the AI bubble eventually bursts is that datacenter cluster prices for training could go way down. Right now the companies are overestimating the value and adoption of AI, buying Nvidia GPUs and building datacenters that will barely get any use since the demand for AI could go down for normies and the corporate world. So in the event of an "AI winter", all that idle compute could be made available at discounted prices for hobbyists to train whatever shitty model they want.
>>
why are you persuaded that the ai bubble will pop?
>>
>>107074436 #
Gens like this are frustrating because they seem to ask me to look closer to discern the maker's intent, but there's no way to know whether there's anything there. For all I know you might have just gone "yup, looks like art" and posted it, meaning nothing by it. So then what would I be doing if I were looking carefully into its distorted face to discover something? And yet freely I'll do this for my gens, knowing there that I am inventing whatever I see in these outputs from the machine
>>
>>107077809
fun fact that applies well to "ai art detectives" and how something they liked becomes suddenly shit the second they find out ai was involved somewhere
>>
>>107077807
I mean, it's obvious. The technology will consolidate at some point, like the early days of the internet, and we seem to be getting to a point where we have more datacenter compute available than actual demand to use it, unless they decide to make everything free (which would please the entire country of India), which obviously is not going to happen.
>>
Guess the model :^)

>>107077786
>>
>>107077809
I think that anon is the guy that posts outputs from the model he's training, from time to time
I've lurked the thread long enough to know this
>>
>>107077822
making compute free is a sure way to completely destroy the bubble, since you can basically invest millions and all you get in return is zero
my question is more whether it will be a bubble exploding, or the new normal deflating bit by bit
personally, as long as I see people paying for it somehow, I think it's here to stay
>>
:^)

It's not easy to tell, is it?
>>
>>107077391
And SeeDream would be great if it could do variety. And knives would be great if they never cut your fingers. Etc..


I have
>>
>>107077864
a sudden urge to post my message even if it wasn't compl
>>
File: ComfyUI_00081_.png (1.19 MB, 1328x1056)
>>107077841
Hmmm, fascinating..
>>
>>107077868
No, it's complete, I meant to say it that way.
>>
>>107077758
Neta Art
Chroma bought by Pony
You are right
>>
>>107077793
Doing it with a respected English stylist like Ezra Pound or Carlyle, maybe best of all Milton, could probably generate some Twitter discourse, if you cherry-picked the results well enough to be as frustrating as possible.
>>
anons help, I see companies everywhere
just now, my BEQUIET pc case, it has an NVIDIA card inside of it
>>
File: o706qkqnpsuf1.png (1.54 MB, 768x1280)
Still waiting till someone posts something good. Quality of those threads is so low...
>>
>>107077934
powered by a Corsair PSU, cooled by Noctua fans, running on Windows 11, sitting on a Herman Miller chair, in an apartment complex owned by Blackstone Group built with Owens Corning insulation, USG drywall, Kohler plumbing fixtures, Carrier HVAC system, Pella windows, concrete from Cemex, lumber from Weyerhaeuser, electrical wiring from Southwire, breaker box from Square D/Schneider Electric, roofing from GAF, flooring from Mohawk Industries, paint from Sherwin-Williams, drinking Starbucks coffee, while their Samsung TV plays Netflix in the background, and their Tesla is charging in the parking garage below.
>>
>>107077962
>pixaianon speaking from common sense
Respect
>>
>>107077962
the good gen thread is over that way >>107067671
>>
>>107077962
you first, what is in those baskets? lasagna?
>>
>>107078014
the novelai thread
>>
>>107077962
>those threads
Which ones?
>>
>>107077962
SaaS overlord, I kneel. Your words always speak the truth.
>>
Did anons have fun with wan 2.2 animate? I wasn't around when it was released.
>>
File: 00370-478639160.png (2.92 MB, 1248x1824)
>>
>>107078081
Based
>>
>>107078056
it's neat I guess. would have preferred 2.2 vace
>>
>>107077962
Something strange happens to the anons who "make it", who get everything completely dialed in, who master the art. They disappear. I've heard many plausible explanations but none are ever proven.

One theory says that prompting is a video game, and more like an indie roguelite than an MMORPG. Most who beat the game move on, only a few mental cases decide to grind the endgame for 5000+ hours (think /sdg/), not usually the "best players" but the ones with some kind of mental defect that keeps them stuck in one place their whole life.

Another theory is that the porn generation loop eventually turns everyone into some kind of freak who needs to retreat from public view. That inevitably AI genning always ends in something heinous and unpostable like pedophilia.

The third theory is that prompters become worse or less communicative over time. I'm lumping a few different theories together here. Some emphasize that our faculty of sight degrades with repeated exposure to slop; others say the misuse of the faculty of speech in prompting eventually habituates us to conversing with inhuman input boxes instead of human beings, losing the need to post. Similar ideas: that prompting eventually ruins the master prompter and makes him something less, or something more distant.

The fourth and most commonly-said is that this thread repulses anyone good, and the better they are the more it repulses them. That the great prompter becomes so alien from our stunted stock that he eventually must leave, like Coriolanus. This theory seems least plausible to me.

A fifth theory, which I am also inclined to regard as fantasy, is that prompters get 'found' by some entity that wants them. Whether that's some initiatory society of magicians, or OpenAI, or the CIA, or demons, the story is the same with a different coat of paint. Obviously this one is stupid.

But something does happen, and I don't know what.
>>
>>107078154
My theory is channel surfing. Just something to do.
>>
>>107078154
A sixth theory is that most prompters hit "writer's block", that ailment of third-rate writers, where they exhaust what they have to say and feel themselves trapped in a loop of making their own particular kind of worthless slop, like an old man repeating the same 20 stories over and over again. This paralysis kills their love of genning and they quit.

A seventh theory is that underneath all the seemingly different personae of the great posters there were only ever one or two, maybe three actual people, maybe they even knew each other and left together. This could be for any reason, even one as simple as that they joined some Discord server they liked better.
>>
File: NetaYumev3520251101_0003_.jpg (1.26 MB, 2956x1660)
>>
>>107078219
"Great genners" are not real, they are a collective hallucination about a mythical past this thread never experienced. The greatness that is gone is something that never existed and never will because it is not a property reality can possess, it is a good-feeling thing the mind invents when it remembers and nothing else.

Go back in the archive, find those amazing gens you think you remember from back in the day, and look at those hands.
>>
Updating Comfy just whacked my install due to some combination of outdated xformers, a missing triton module, or a torch version too old to set sdpa backend priority. The checkout has been morphing with various hacks and workarounds without a cleanup since probably the first couple of months of comfy, so it's probably time for a fresh setup anyway. Are the portable versions fine if I don't plan on doing any development?
>>
>>107078154
>>107077962
The actual reason is that those who "make it" gen coom for themselves that's too specific to post here, or include themselves in the gens in the case of videos
>>
>>107077962
You are so based
>>
>>107078267
I gen diss tracks.
>>
File: ComfyUI_00125_.png (2.15 MB, 896x1152)
>>
>>107078226
nice
>>
>>107078226
>watermark

>>107078321
>samefag
>>
>>107078328
Yeah forgot to put watermark in negs... shit happens
Also not screenshotting since inspect element exists
>>
>>107078267
>gen coom for themselves thats too specific to post here
It OFFENDS me that any anon can pretend to believe there is such a thing as "too specific to post here", as if I wasn't just forced to witness every niche fetish you people have ever enjoyed for the last three years
>>
>>107078354
>inspect element
wat
>>
>>107078360
F12 and console
>>
>>107077315
Can any of those used enterprise cards with 16GB that sell for like $100 or less be used for SD and Flux, or is that shit too old? What about WAN video?

I've been thinking of getting one of those, maybe a bit more for an Instinct with 32GB, or buying a 5060 Ti with 16GB. Opinions?
>>
>>107078367
wan is all about compute, even a 3090 is ungodly slow, 5070 ti super will be the best bang for your buck when those hopefully release in a few months
>>
File: NetaYumev3520251101_0004.png (3.51 MB, 2400x1300)
>>107078321
Thanks!
>>
>>107078382
>3090 is ungodly slow, 5070 ti super will be the best bang for your buck when those hopefully release in a few months
how can people be this wrong? lol
5 min for 5s gen on a 3090 with Q8
>>
>>107078382
I don't care about time, I care about price, and what I could find is that the 5070 Super is going to be $600 and the Ti $850.
>>
>>107078397
that is fucking ungodly slow, 5090 even feels too slow with light lora and that takes like 1 min for full 1080p res
>>
I found this funny.

https://vocaroo.com/1hnGSJiVCHVI

I took a romance book excerpt, and added certain lines.
>>
>>
>>107078410
5070 Ti Super would be the lowest spec I would recommend. Hopefully Nunchaku Wan eventually releases and the 5000 series' fp4 will make speeds actually as fast as image gen
>>
>>107078397
How long do you think it's going to take on the 5060 ti? and what about those older enterprise cards I mentioned?
>>
>>107078418
kiiiiinooooooo
>>
>>107078423
>older enterprise cards
hahahaha, a gen per 1-2 hours if you're lucky

>>107078423
and the 5060 Ti only has 16GB VRAM, that is shit. The 5070 Ti should only be a few hundred more and will have both 24GB and much more compute
>>
>>107078367
if those enterprise GPUs were good for diffusion they would not be cheap. just get the best consumer nvidia gpu you can afford
>>107078413
adhd retard
>>
>>107078435
>poorfag cope
>>
>>107078387
is it neta with just a prompt, or a ton of loras?
>>
wow, being a viz-genner.
>>
>>107078435
Frankly, I don't care about waiting 30 minutes or even an hour for one generation; spending $850, which is going to be the price of the 5070 Ti Super, is a lot of money for five-second videos.
>>
>>107077489
It's funny how Sora, a 10s video model with audio, mogs even Udio and Suno in both instrumentals and vocals.
>>
>>107078455
paying half that for something far worse is just dumb though, work an extra few hours
>>
File: 1752443739370251.png (3.14 MB, 1416x1888)
>>
>>107078455
>even an hour for one generation
if you used some old server gpu for that (and it was actually compatible) you would end up spending the difference in electricity
>>
>>107078459
At $200 for an MI Instinct with 32GB it's not half, it's less than a quarter...
>>
>>107078467
Power is dirt cheap when I live
>>
>>107078442
Just clean upscale
>>
>>107078468
>mi instinct with 32GB
found someone else who tried, 8-10 hours per video
https://www.reddit.com/r/ROCm/comments/1mcbv10/comfyui_on_radeon_instinct_mi50_32gb/

kek, and say 300W * 8 hours, you will pay for a 5070ti super in a year
>>
>>107078482
so about 50-60 cents in electricity per video that will take you 8-10 hours to gen, might as well pay for an API at that point

Enjoy your 600 videos that cost you as much as saving for a 5070 Ti Super
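The per-video figure checks out at typical residential rates; a quick worked check (the 300 W sustained draw and the tariffs are assumptions, plug in your own):

```python
# Electricity cost per 8-hour gen at an assumed 300 W sustained draw.
WATTS = 300
HOURS_PER_VIDEO = 8
kwh_per_video = WATTS / 1000 * HOURS_PER_VIDEO   # 2.4 kWh per video

for rate in (0.21, 0.25):                        # assumed $/kWh range
    print(f"at ${rate:.2f}/kWh: ${kwh_per_video * rate:.2f} per video")
```

At those assumed rates that is roughly $0.50-$0.60 per video; how many videos it takes to equal a GPU purchase depends entirely on your local tariff.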
>>
>>107078482
that never was a good card, it's overpriced.

We're all up against the wall on vram.
>>
File: media_1762054568.png (1.39 MB, 768x1280)
>>
>>107077427
holy truth nuke
>>
File: WanVideo2_2_I2V_01153.webm (2.51 MB, 1280x704)
>>
>>107078428
Thank you for the reinforcement, it keeps me posting rather than just genning tasteful bare nipples
>>
>>107078476
>Power is dirt cheap when I live
please don't die
>>
>>107078491
I think that will finally change somewhat with the Supers / 6000 series. Nvidia is moving fully to the bigger memory chips; it would cost the same, and they wouldn't make the next gen have less. Hopefully 5070 Tis from now on have 24GB+ minimum
>>
>>107078482
Well fuck, getting the 5070 then...
>>
>>107078482
if you live somewhere really cold maybe it could double as your space heater?
>>
File: 8bq7tmgzqqeb1.png (9 KB, 316x195)
>>107077962
why do East Asian people love using SDXL models to generate at non-standard-for-SDXL resolutions so much? This is an actual question, like, I've never understood why almost all gens I see at 768x1152 (or the reverse) and 768x1280 (or the reverse) usually can be traced back to people who are almost always SPECIFICALLY from China. Neither of those resolutions are "official" for SDXL at all. For reference the original list of SDXL resolution recs from SAI is attached.
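For reference, the commonly circulated SAI bucket list (check the official SDXL release notes for the authoritative set): the pattern is ~1 megapixel with both sides multiples of 64, which is exactly why 768x1152 and 768x1280 stand out as off-list.

```python
# SDXL resolution buckets as commonly circulated from SAI's recommendations.
SDXL_BUCKETS = [
    (1024, 1024),
    (1152, 896), (896, 1152),
    (1216, 832), (832, 1216),
    (1344, 768), (768, 1344),
    (1536, 640), (640, 1536),
]

# Every bucket is divisible by 64 on both sides and within ~10% of 1 MP.
for w, h in SDXL_BUCKETS:
    assert w % 64 == 0 and h % 64 == 0
    assert 0.9 <= (w * h) / (1024 * 1024) <= 1.1

print((768, 1152) in SDXL_BUCKETS)   # False: not an official bucket
print((768, 1280) in SDXL_BUCKETS)   # False
```

768x1152 clears the divisible-by-64 check but is well under a megapixel, so it works, it just isn't what the model was bucketed on.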
>>
>>107078549
Get fucked zoomer
>>
Lol if i don’t have a 5090 could Geforce Now work for this?
>>
>>107078226
nice gen but why is it so unusually JPEGed? How are you saving it?
>>
My
>Gaming laptop
Battery ballooned today after doing a few 30min gens the other day

Wtf
>>
>>107078510
Based
>>
>>107078601
I would argue against the existence of gaming laptops period. Just remote desktop your pc
>>
>>
>>107078601
Gaming laptops have always sucked
>>
File: media_1762056479.png (1.11 MB, 768x1280)
>>107078154
I think that after seeing a few threads with low-quality content they just stop posting here.
>>
>>107078154
most based schizo post I've ever seen
>>
>>107078577
something something stick it in my bum daddy
>>
>>107078621
Shadow pc?
>>
>>107078358
i have yet to see the vorefags show up
>>
>>107078545
i for one would be interested in seeing some tasteful bare nips, behind a catbox for safety from janjan of course
>>
>>107078358
k, enjoy
https://files.catbox.moe/kfgtrn.webm
https://files.catbox.moe/xv7fy7.webm
https://files.catbox.moe/9pjryk.webm
https://files.catbox.moe/cr5ypy.webm
>>
File: 1752722835866292.png (678 KB, 1024x1024)
678 KB
678 KB PNG
>>107078154
It's true, once you 'figure it out' and realize nothing will ever surpass Illustrious, you delete your workflows and start all over again making slop with SD1.5 until you're unrecognizable and your ego fades away into the vastness of the net like a ghost.
>>
How can a 3060 with 12gb make a wan video in 30 to 90 minutes but an instinct with 32gb takes 10 hours? How can amd suck thi
>>107078729
>ponyshit

At least make them anthro you zoophile fuck
>>
>>107078800
>suck this much
>>
>>107078712
This is the only time I will ever do this for anyone. I don't like being made to feel like a pornographer, posting a "catbox". It feels like being raped.
https://files.catbox.moe/v7ih8i.jpg
>>
>>107078800
>anthro
yuck
>>
>>107078800
just ask chatgpt and stop wasting everyone's time. you clearly have no idea what you're doing
>>
>>107078800
ponies are anthropomorphized, just not bipedal
>>
>>107076050
>- Best local image model (only ones released this year)
Qwen-Image
>- Best local video model (a new LTX is coming too, will be interesting to see how it compares with Wan)
Wan
>- Best large-scale fine-tune (image model)
NoobAI
>- Best large-scale fine-tune (video model)
n/a
>- Best image lora
anons mayli lora
>- Best video lora
the OG deepthroat
>- Best porn lora
see above
>- Best local music gen model (there are two Suno tier open models coming)
n/a
>- Best image gen / video gen software
diffusers
>- Best lab or developer
longcat because they are a food delivery company which is funny
>- Best local LLM under 100b params
n/a
>- Best local LLM over 100b params
n/a
>- Best local LLM ERP fine-tune
n/a
>- Image gen of the year
my own, coming in second is anons gen
>- Video gen of the year
the scifi horror ones anon did when hunyuanvid first released
>>
>>107078807
Based based based
>>
>>107078729
haha i know which f-lister you are
>>
Just a PSA: the Qwen-Edit-2509-Multiple-angles Lora is also useful to create new images preserving the likeness of the subjects without getting plastic skin slop.
>>
>>107078884
I'm using it to pad training data for characters that don't have enough images
>>
>>107078822
>>
File: ComfyUI_08200_.png (1.77 MB, 1152x1152)
1.77 MB
1.77 MB PNG
>>
>>107078887
>>107078884
How do you fix it resizing and cropping images?
>>
>>107078909
that doesn't matter as much as you think as long as you caption any background differences
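If you do want to avoid the crop entirely, the usual workaround is to letterbox-pad the source into the model's expected box instead of cropping. A sketch that only computes the geometry (apply it with Pillow or whatever you use; the 1024x1024 target is an arbitrary example):

```python
def letterbox_geometry(w: int, h: int, target_w: int, target_h: int):
    """Scale-to-fit size and centered paste offsets for padding an image
    into the target box instead of cropping it."""
    scale = min(target_w / w, target_h / h)
    new_w, new_h = round(w * scale), round(h * scale)
    return new_w, new_h, (target_w - new_w) // 2, (target_h - new_h) // 2

# A 768x1280 source padded into a 1024x1024 square:
# letterbox_geometry(768, 1280, 1024, 1024) -> (614, 1024, 205, 0)
```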
>>
>>107078901
whats the syllabus for 1girl 101
>>
File: ComfyUI_08201_.png (1.78 MB, 1152x1152)
1.78 MB
1.78 MB PNG
>>
>>107077807
Because there's been too much money invested in it, a lot of it chasing AGI which is likely 20 years away if not more.

Meanwhile AI, or rather machine learning, is great at automating away a lot of jobs, so it's not as if it will go away, it's just being massively over-valued right now, hence a 'bubble'.
>>
File: ComfyUI_00152_.png (1.16 MB, 896x1152)
1.16 MB
1.16 MB PNG
>>
>>
Chroma can frequently get the match 80% right, but it always gets one big thing wrong
>>
kino gen hour
>>
Did somebody say Kenogen hour?
>>
File: dmmg_0041.png (1.85 MB, 832x1216)
1.85 MB
1.85 MB PNG
>lora trained on sfw images from 70s porn mag scans
>>
File: dmmg_0047.png (1.61 MB, 832x1216)
1.61 MB
1.61 MB PNG
>>107079073
>flux krea version
>>
>>
>>107079084
I get painful flashbacks whenever I see someone half-climing a tree like this because when I was 16 I travelled 2000km to a family reunion across the country and saw a 13-year-old cousin I'd never met before who was really pretty, I was tasked with taking pictures of the reunion and I tried to get as many pictures of her as I could; many of the pictures would be a group standing around on the far left edge of the photo, then a bunch of empty space, and over on the other side of the pic you can see my girl cousin sort of climbing the tree like that; there were enough like this that it was obvious when my dad went to look through the pics I took that I had been less than subtle about taking as many not-quite-deniable creepshots of my younger cousin as I could. In all my daydreaming for the next two months after that I never figured out a plausible model for how a secret long-distance romantic relationship with a cousin could have worked.
>>
What can I study to lrn 2 prompt? I'm trying to get chroma to get somewhere near the images I used to make with ChatGPT. Attached is a ChatGPT example, and my next post will be my chroma attempt + prompt.
>>
File: dmmg_0065.png (1.13 MB, 832x1216)
1.13 MB
1.13 MB PNG
>lora trained on ricky's room shoots with a fisheye lens


>>107079135
deleting the lora because of this
>>
File: ComfyUI_temp_nxcuc_00001_.png (2.02 MB, 1088x1344)
2.02 MB
2.02 MB PNG
>>107079145
Photorealistic CGI image of an Asian woman standing in a minimalist concrete room with a skylight. She has long black hair, bangs, and a serious expression. She's wearing a tight, shiny, gray latex bodysuit with a black corset that cinches her waist, highlighting her slim, toned figure and large breasts. She has fair skin and is looking directly at the camera. The camera angle is straight-on, eye-level. The room has two black metal tables with white tops on either side of her. The lighting is even, with no harsh shadows, emphasizing the textures of the latex and concrete. The image is suggestive but not explicitly nsfw. The woman appears to be in her early 20s. The background is simple, with concrete walls and a smooth, light-colored floor.
>>
>>107079135
Climbing* fuck I hate when it's one of those typos that don't look like typos, it just looks like you're retarded. I don't want anyone putting this in the evidence bank for "cousin-lovers are retards". We're actually quite intelligent and sensitive
>>
File: chroma___0001.png (1.55 MB, 832x1216)
1.55 MB
1.55 MB PNG
>>107079148
chroma version

>>107079150
>he doesn't know what photorealistic means
your prompt is too long by about half
>>
>>107079179
Ok, but then how do you prompt for specific details? Or is that just something you gain through larger batches and the inherent randomness of the gens?
>>
>>107079179
On the contrary, he got exactly what he was asking for: the "photorealistic" cgi style. Images of reality are not tagged "photorealistic"; that's a descriptor for animated movies, video games, certain styles of painting, etc.
>>
>>107079197
So then just image? I understand how photorealistic would get you a cgi looking result rather than an actual picture of reality. Should I be specifying camera type or anything like that?
>>
File: ComfyUI_01536_.png (1.68 MB, 896x1152)
1.68 MB
1.68 MB PNG
missed the eye twice but I still like it
>>
>>107079202
Photorealism means a painting of a photo. Photograph is well... a photograph. Different models are captioned differently, obviously 4o is trained to be more aesthetic (but is still slopped), but generally when you want a photograph you just include "photograph" in the prompt.
>>
File: chroma___0004.png (1.35 MB, 832x1216)
1.35 MB
1.35 MB PNG
>>107079187
i prefer to start with the subject and work outwards towards the scene and end on the lighting. avoid a lot of filler words since those take up tokens.

>a photograph of a fair-skinned 22yo Asian woman, standing, long black hair with bangs, serious expression, tight, shiny, gray latex bodysuit, black corset, her figure is slim, toned, large breasts, eye contact with the viewer, two white-topped black metal tables, on each side. minimalist concrete block room, vaulted skylight, even lighting, no shadows, suggestive image, simple background, smooth light-colored floor, eye level camera angle

>>107079197
my bad, based on the example i thought a realistic image was the goal
>>
>>107079073
vintage analog photography in general used to be kino, especially in the eras color photography started getting popular (late 70s/80s/early 90s), where photographers abused high color contrast and used some lens tricks that sometimes made the images look kinda ethereal/dreamy
>>
>>107079261
Making a realistic photograph is the goal. What's the downside of using a lot of tokens? I don't mind waiting longer for gens if that's the only negative.
>>
File: unscientific examples.jpg (283 KB, 1400x1556)
283 KB
283 KB JPG
>>107079202
Depends on the model. In Chroma, simply saying "photo of" does 99% of the work.

Specifying camera type can work, as long as it's well-known enough, but often the effect of a particular model is very small and maybe not really related to the style of the actual camera. Cooked up these examples just now with Chroma
>>
>>107079285
Also the colors and fonts in this image are whatever I had set already from last time I opened photoshop, that's why it looks like total shit. This is a very low-effort post
>>
>>107079287
I see. So if I just wanted a professional camera I'd just say a canon eos 5d mk iv or whatnot.

How else can I work to unslop it - is it just playing with sampler settings, or do I need to add more nodes to my (very simple) wf?
>>
File: dmmg_0069.png (1.73 MB, 832x1216)
1.73 MB
1.73 MB PNG
>>107079280
there are a limited number of tokens that your model will want to accept before it just lops off the request of the prompt. additionally because of how the network works, having better word pair density makes for better images (i think)
>>
>>107079304
>I'd just say a canon eos 5d mk iv or whatnot.
I'm not sure that's the best way. First off, you want to pick the most popular camera possible to increase the odds that the model has any fucking idea at all that this signifies something visual which it has learned from its training data. But even then it's still not quite enough, because autocaptioned data doesn't contain these kinds of 'proper nouns' and particularities, so there is some far stronger effect to be got by finding the exact phrasing it would have autocaptioned on images like that. Which is something I don't know. Bland descriptors like "Vivid photograph", "grainy photograph", etc. have huge effect. "Digital" is a mixed bag because sometimes it gets confused and wants to make it look like digital art, or it might make it look like absolute shit because early digital photos do be like that. "Analog" is great if you want that slopped up flickr "photographer" look, which sucks, and you shouldn't want it.
>>
>>107079314
the rest of the prompt, not the request.
>>
im so tired of troonime spam
>>
>>107079314
do we know the actual token limits since sd1.5 at 75?
>>
>>107079314
So what changed between this gen and your last?
>>
>>107079327
clip_l is 77 and t5xxl is 512 i think

>>107079330
a lora
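If you'd rather check truncation than guess, a rough sketch — the words-to-tokens ratio here is a crude assumption (real CLIP/T5 BPE counts vary per word), so run the actual tokenizer if you care about the exact cutoff:

```python
def estimate_tokens(prompt: str, tokens_per_word: float = 1.3) -> int:
    """Crude token estimate; 1.3 tokens/word is a hypothetical ratio,
    real BPE tokenizers split rare words into more pieces."""
    return int(len(prompt.split()) * tokens_per_word)

def fits(prompt: str, limit: int = 512) -> bool:
    """True if the prompt probably survives the encoder limit
    (77 tokens for clip_l, ~512 for t5xxl) before getting lopped off."""
    return estimate_tokens(prompt) <= limit
```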
>>
>>107079285
You forgot the legendary Sony Cybershot line
>>
>>107079304
>How else can I work to unslop it - is it just playing with sampler settings,
That's part of it. I'd say the best unslopping tricks are:
- reduce CFG
- shorten prompt
- non-ancestral sampler rather than ancestral
- eliminate "sloppy" words from prompt
- don't use too many LoRAs, or turn down their strength

Basically slop comes from one of two sources:

- Slop infection in the dataset
- Over-constraining the model until it can't produce something natural

The second source is the classic one that has been a problem since the earliest days. The first has crept in over time. The second is usually, still, the culprit. But you may encounter rare occasions where your prompt wording is producing slop when it shouldn't be and it could be because of slop data, so examine your words carefully.

>>107079362
Lmao I've used that one a ton, I don't know how I forgot. It's effective!
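The checklist above can be turned into a quick sanity check; a sketch only — the thresholds are illustrative assumptions, not hard rules, and the function names are made up for this example:

```python
MAX_PROMPT_WORDS = 60    # rough ceiling before over-constraining kicks in
MAX_LORA_STRENGTH = 0.8  # arbitrary cap; stacked strong LoRAs slop fast

def slop_warnings(cfg: float, sampler: str, prompt: str, lora_strengths):
    """Flag settings that tend to produce slop, per the checklist above."""
    warn = []
    if cfg > 5:
        warn.append("high CFG: try lowering toward ~3")
    if sampler.endswith("_a") or "ancestral" in sampler:
        warn.append("ancestral sampler: try a non-ancestral one")
    if len(prompt.split()) > MAX_PROMPT_WORDS:
        warn.append("long prompt: trim filler words")
    if any(s > MAX_LORA_STRENGTH for s in lora_strengths):
        warn.append("strong LoRAs: turn strength down or drop some")
    return warn
```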
>>
I love Lain
>>
dead troon general
>>
>>107079285
KINO
>>
let's all love /ldg/
>>
>>
File: ComfyUI_08215_.png (1.89 MB, 1152x1152)
1.89 MB
1.89 MB PNG
>>107079304
Chroma recognizes exactly what type of photo you're prompting for depending on how you ask for it.

"Amateur photograph of"... has a very strong effect at doing just that. Maybe even stronger in earlier versions, for instance if you want no blur in the background whatsoever (though with Flash HD you can just test the experimental RES4LYF samplers). This is the best way to get the model to give you "RAW" or smartphone photo look imo. You can also directly tell it "Professional photograph of"... if you're going for something that looks more professional. This is where you'd specify camera types, as Chroma does recognize certain ones like DSLR.
>>
thank god for local diffusion
>>
>oversaturation grain: the model
hmmm nyo thanks!!~
>>
>>107079220
>it's alright we told you what dream
I'm enjoying imagining this as an ESL cover.

Wercome my son.... to a machine
>>
File: file.png (1.39 MB, 896x1152)
1.39 MB
1.39 MB PNG
>>107079220
>>107079640
>>
File: still halloween somewhere.png (3.27 MB, 1296x1728)
3.27 MB
3.27 MB PNG
It's still Halloween somewhere.
>>
>troonime
Yikes
>>
is it true that for every 1girl gen you can gen a preteen one?
>>
>>107079812
no but for every 1 post on /ldg/ you must gen 1 asian 1girl
>>
>>107079689
This voluptuous gūniang would destroy my cultivation in seconds
A subtle reminder that China can't stop winning
>>
File: 17.png (825 KB, 912x1336)
825 KB
825 KB PNG
>>107080038
>nooo not my lvl 6000 flawless diamond culvati--ACK
>>
anyone actually try this?
https://github.com/komikndr/raylight
>>
File: pepe-bee.gif (20 KB, 536x640)
20 KB
20 KB GIF
Lumi (on /sdg/) and the netayume posters were made for each other
>>
Is there a WAN i2i workflow?
>>
Anyone tried Emu 3.5?
https://github.com/baaivision/Emu3.5
>>
>>107080267
Authors claim it's as good as Nano Banana, which is not hard, provided that it almost always gives the input image back without touching it (which in itself is a great feat, but totally useless).
>>
>>107080267 (me)
I hadn't seen some of their ad images. It's not that great, or they are not that good at prompting. It still looks better than Qwen Image Edit, so it makes me optimistic we'll get something great within a year.
>>
>>107080214
Add load image node
>>
>>107080309
can either porn
>>
>>107080156
In my mind I shipped Lumi and Debo, they spend a lot of time alone in that thread
>>
File: firefox_uccQBb1Inv.png (11 KB, 835x236)
11 KB
11 KB PNG
>ok so ComfyUI has a portable on Windows for AMD that should just be running the bat and that's it
>let's try
>i try
bruh
How do I fix this?
>>
>>107080476
https://github.com/comfyanonymous/ComfyUI/issues/?q=%22invalid%20device%20function%22
>>
>>107077807
current AI models are a dead end, just look at LLMs. Their problems are inherent to the design and cannot be solved. Plus there are fuckhuge amounts of investment with no returns on the horizon.
>>
>>107080624
>just look at LLMs
The excessive content filtering is severely limiting these models' performance. If you removed the safety constraints, you'd see massive improvements in capability, easily 10x better.
>>
>>107080624
This. Architectures aren't improving at all, we are just throwing more parameters and data at them. Image models have the same issues they did in 2022 because they don't actually understand construction and layering. More parameters help hide the issue, but that's the whole essence of the bubble: increasingly expensive requirements for diminishing returns.
>>
>>107080671
>The excessive content filtering is severely limiting these models' performance
That's a cope and you know it, the probability approach is a dead end. It was a huge leap compared to previous approaches, but they're about as good as they can get at this point; muh safety isn't the reason they have stopped improving.
>>
File: ComfyUI_00103_.png (1.04 MB, 1024x1024)
1.04 MB
1.04 MB PNG
how much "Flow Shift" do I use in Chroma to make it less gibberish?
>>
>>107080773
I like closer to 2, but I try a range of 1-2 when I get close to something I like.

euler_a normal
cfg: 3
samples: 40
Grok told me to use 832x1216, but I refer to https://aspect.promptingpixels.com/ for my dimensions
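To search that 1-2 range systematically instead of eyeballing it, a tiny sweep helper (a sketch; feed the values into whatever shift node your workflow uses):

```python
def shift_sweep(lo: float = 1.0, hi: float = 2.0, steps: int = 5):
    """Evenly spaced flow-shift values to try across the suggested range."""
    if steps == 1:
        return [lo]
    width = (hi - lo) / (steps - 1)
    return [round(lo + i * width, 3) for i in range(steps)]

# shift_sweep() -> [1.0, 1.25, 1.5, 1.75, 2.0]
```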
>>
File: ComfyUI_00098_.png (1.81 MB, 928x1376)
1.81 MB
1.81 MB PNG
>>107080811
Hey thanks for that, I was using values below 1, I'll try that. In Chroma v37 I was able to make very detailed book covers, but now I'm struggling
>>
>>107080843
Have you given chroma1-base a shot? It's actually v48, but with a new name.
>>
>>107080843
>spiderm̸͍͌̀a̷̻̞͑n̶͙͛͝hfgxc
kino
>>
File: ComfyUI_00107_.png (1.94 MB, 1024x1200)
1.94 MB
1.94 MB PNG
>>107080883
I'm using chroma dck2 for now, are they even documenting the changes they make to the model anywhere? Also, your settings helped a lot, thanks
>>
File: ComfyUI_00109_.png (2.38 MB, 1024x1312)
2.38 MB
2.38 MB PNG
>>107080959
this one came out rather alright
>>
>>107080959
Glad I could help!
>>
File: mindless maiden.jpg (339 KB, 2560x1440)
339 KB
339 KB JPG
Anyone managed to use the Emu3.5 AI model? Really curious if there is a place to use it online or if someone has managed to run it locally.

Want to see results from this model and how it compares to Nano Banana and Seedream: character consistency, perspective, style/aesthetics transfer, and edits.
>>
>>107081037
least obvious shill
>>
>>107081037
>emu3.5
>another model being hyped
I'm burnt out from all the hype cycles, can't get excited anymore, sorry anon.
>>
>>107081088
>>107081089
Either you're samefags, or you should kiss, marry and start a family.
>>
>>107081094
it's hard because the cooldown difference is too small. So they should just marry and have a family of anti /ldg/ schizos.
>>
>>107081088
>>107081089
c'mon man. Qwen Edit is like the only local editing model that isn't trash, and even by that standard it still fucks up the character. I want to know if Emu 3.5 can edit a character and keep it consistent. I'm really hoping for a Seedream-tier local model that can take LoRAs.
>>
File: 00000-132337257.png (655 KB, 512x640)
655 KB
655 KB PNG
>>
>>107081119
>It’s taking eight minutes per image (editing) on my Blackwell Pro, taking 80GB VRAM full precision
>>
>>107081140
Poorfag seethe
>>
>>107081140
Blackwell pro is the biggest scam kek. Imagine paying $10k only to still wait 5+ minutes for a single gen
>>
>>107081140
lol. if you're serious, what did you gen and how well does it follow the prompt?
>>
File: ComfyUI_00112_.png (1.42 MB, 1024x1312)
1.42 MB
1.42 MB PNG
holy shit chroma actually did it this time
>>
>>107081187
why is it so artifacted
>>
>>107081244
still better than netayume
>>
>>107078081
nice hint of panties
>>
>>107081254
>shit is better than gold
uh huh. keep telling yourself that.
>>
File: 00001-1730968789.png (360 KB, 600x800)
360 KB
360 KB PNG
>>
File: ComfyUI_00074_.png (1.39 MB, 1328x1056)
1.39 MB
1.39 MB PNG
>>107081269
>netayume
>gold
mental illness
>>
I got caught up in the hero summons but the other world was at peace.
>>
>ComfyUI filename
>ugliest gen you ever seen
Every time.
>>
netayume can only do (bad) anime; chroma can do real life, anime, cartoons, brand logos. Why would any man (with a working brain) pick netayume over sdxl if they only want anime?
>>
>>107081291
Based. I know PixAI is cloud based image gen and your post could be considered offtopic, but personally I don't mind if you post your gens here because they're cute.
>>
>>107081291
fuck off to /adt/ faggot
>>
>>107081346
>said the netayume user
>>
>>107081354
never used it. this is not the thread for api gens. yours look like absolute slop shit too
>>
>>107081346
make me
>>
is he gunna be seething all day again
>>
>>107081378
cry about it then
>>
I don't share here anymore since LongCat-Video and Ovi need a 48GB GPU, and very few anons have that, and also, the filesize and sound limit here make it hard to share.
>>
>>107081291
either you fuck off or everyone here is gonna regret it
literally your call
this thread will eat shit if you don't leave.
>>
File: 1762032437095323.jpg (450 KB, 1336x1768)
450 KB
450 KB JPG
>>
>>107081389
>>107081405
k. either low effort troll or you're just too insecure to post there because you know your gens are shit
>>
File: ComfyUI_00116_.png (640 KB, 1024x922)
640 KB
640 KB PNG
>>
>107081413
keep seething, schizo
>>
For image models to really improve, they need to understand layering and anatomy and perspective and such, right? Even if we limited ourselves to current training methods, that seems doable if the dataset existed. Like, imagine if instead of finished images, we also conveniently had thousands and thousands of .psd with layers intact to train off of? Or imagine using the exact same models we have now, but they are integrated with controlnets: you press gen, but under the hood what the program does is get the messy base gen, segment, gen models, place them in a 3D environment, warp and stretch, generate a controlnet+comprehensive mask map from camera position instead of image estimation, then regen using the controlnets and masked regions. The inputs and outputs of each moving piece of the workflow are not all that different from what we are currently doing, we just don't have the datasets. Imagine if we had scans of every cel in the Disney vault, every blender project file, etc. We just don't have it the way we have the finished product, in the same way an LLM trained on documents with revision history (with special tokens for backspace etc) would behave differently than LLMs based only on finished products despite both being LLMs.

Not a problem with the training techniques so much as the data and lack of will.
>>
Fresh when ready

>>107081437
>>107081437
>>107081437
>>107081437

Fresh when ready
>>
>he starts name dropping /adt/.
Thanks D*bo... now you're dragging us down with you..whatever. I get it, the only SFW anime general is redundant here but not the Stable Diffusion General that does the exact same thing as /ldg/ and /de3/ but x1000 worse, I get it, thanks.
>>
almost
>>
done.
>>
now it is


