

Discussion and Development of Local Image, Video, and Music Models

Previous: >>108990829

https://rentry.org/ldg-lazy-getting-started-guide

>UI
ComfyUI: https://github.com/comfyanonymous/ComfyUI
SwarmUI: https://github.com/mcmonkeyprojects/SwarmUI
SDWebUI: https://rentry.org/ldg-lazy-getting-started-guide#the-stable-diffusion-web-ui-lineage
Wan2GP: https://github.com/deepbeepmeep/Wan2GP

>Checkpoints, LoRAs, & Upscalers
https://civitai.com
https://civitaiarchive.com/
https://openmodeldb.info

>Tuning
https://github.com/spacepxl/demystifying-sd-finetuning
https://github.com/ostris/ai-toolkit
https://github.com/Nerogar/OneTrainer
https://github.com/tdrussell/diffusion-pipe
https://github.com/kohya-ss/sd-scripts
https://github.com/kohya-ss/musubi-tuner

>Z
https://huggingface.co/Tongyi-MAI/Z-Image

>Anima
https://huggingface.co/circlestone-labs/Anima
https://tagexplorer.github.io/
https://animadex.net

>Qwen
https://huggingface.co/collections/Qwen/qwen-image

>Klein
https://huggingface.co/collections/black-forest-labs/flux2

>Wan
https://github.com/Wan-Video/Wan2.2

>LTX-2.3
https://huggingface.co/collections/Lightricks/ltx-23

>Chroma
https://huggingface.co/lodestones/Chroma1-Base
https://rentry.org/mvu52t46

>Misc
Local Model Meta: https://rentry.org/localmodelsmeta
Share Metadata: https://catbox.moe | https://litterbox.catbox.moe/
Txt2Img Plugin: https://github.com/Acly/krita-ai-diffusion
Archive: https://rentry.org/sdg-link
Collage: https://rentry.org/ldgcollage

>Neighbors
>>>/aco/csdg
>>>/b/degen
>>>/gif/vdg
>>>/d/ddg
>>>/e/edg
>>>/h/hdg
>>>/trash/slop
>>>/vt/vtai
>>>/u/udg

>Local Text
>>>/g/lmg

>Maintain Thread Quality
https://rentry.org/debo
https://rentry.org/animanon
>>
ANCHOR!!!!!!!!!!!!!!
>>
Blessed thread of frenship
>>
localpozzed status
>flux 2: CENSORED
>krea 2: CENSORED
>ideogram: CENSORED
>>
>your mom: UNCENSORED
>>
>>108996925
lmao

Can't wait for a scene release of Men In Black without black people.
>>
>>108996998
I want to try this but how do I prompt with it? Do I say girl in this image? I assume I use both the latent and model patch nodes?
>>
More ACEStep XL ZUTOMAYO LoRA kino, this time just J-Rock. These are the raw, unmastered outputs, without DCW enabled.
https://vocaroo.com/1cCVdYQg5ZnQ
https://vocaroo.com/1mpl1HJOqEvD

For those not aware, I made and refined an ACEStep XL LoRA training guide a while back https://rentry.co/s8fg8ber

I think I have now found the definitive way to inference ACEStep XL and get the most quality out of it, both with and without a LoRA. I tested against prompts on the official showcase- https://ace-step.github.io/ace-step-v1.5.github.io/

Notice how all the music Turbo makes sounds washed out. The model you want to fix this while retaining most of Turbo's musical abilities is the Base Turbo XL merge from https://huggingface.co/scragnog/ace-step-1.5-gguf-merge-models/tree/main

I have tested other merges as well and have arrived at this one as the best for all LoRAs trained on base, including for non-LoRA outputs as well.

With this model alone, there's no immediate need to master the outputs, because they aren't noisy by default. The UI I now use is https://github.com/scragnog/HOT-Step-CPP
since it has more samplers and settings (DPM++ 3M which I used here, etc...). As always, on 90% of prompts, DiT-only outputs are best, 50 steps, with a guidance scale of 12+. DCW is not needed with this model, as it's that good, and previous DCW settings I shared were not the best on this model (seems to have muted some instrumentals).
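
The settings above can be kept in one place as a small preset. This is only a sketch: the field names are made up for illustration and are not HOT-Step-CPP's actual option names. Only the values (DPM++ 3M, 50 steps, guidance 12+, DiT-only, DCW off) come from the post.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class InferencePreset:
    """Hypothetical container; field names are illustrative, not a real API."""
    sampler: str = "DPM++ 3M"      # sampler used for the posted gens
    steps: int = 50                # step count recommended in the post
    guidance_scale: float = 12.0   # the post says "12+", so 12 is treated as a floor
    dit_only: bool = True          # DiT-only outputs reported best on ~90% of prompts
    dcw_enabled: bool = False      # DCW reported as unnecessary with the merge model


def validate(preset: InferencePreset) -> list[str]:
    """Return warnings where a preset departs from the post's advice."""
    warnings = []
    if preset.guidance_scale < 12.0:
        warnings.append("guidance_scale below 12")
    if preset.steps != 50:
        warnings.append("steps differ from 50")
    if preset.dcw_enabled:
        warnings.append("DCW enabled (post reports muted instrumentals)")
    return warnings


default_preset = InferencePreset()
print(asdict(default_preset))
print(validate(default_preset))
```

The remaining 10% of prompts may want different values, so the checks only warn rather than reject.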
>>
Nobody cares about unreleased loras
>>
>>108997072
>J-Rock
GIRUGAMESH
>>
>>108997072
can i plug instrumentals into this model and generate pure vocals from it?
>>
Ideogram is fun for making degenerate comics. Someone should train it on hentai manga.
>>
>>108997131
how did you get actual degenerate comics to work? i feed it "a woman eating a popsicle" in json and I get censored somehow by this SAFETY fucked model
>>
>>108997144
Read the last thread
>>
File: Untitled.png (128 KB, 1137x496)
Automatic retopology using mesh and optional image input. Kinda shit tho.
>>
>>108997144
agreed, wait for the abliteration.
>>
>>108997154
very unoptimized poles
>>
>mfw Resource news

06/06/2026

>HuggingFace VFS Plugin: Native Total Commander file system for Hugging Face models
https://github.com/mikinko/HuggingFace_WFX

>ComfyUI Lance AIO: Custom nodes to run Lance-3B
https://github.com/SteveImmanuel/comfyui-lance-aio

>Cube: Generative AI System for 3D
https://github.com/Roblox/cube

>The token bill comes due: Inside the industry scramble to manage AI’s runaway costs
https://techcrunch.com/2026/06/05/the-token-bill-comes-due-inside-the-industry-scramble-to-manage-ais-runaway-costs

06/05/2026

>RhymeFlow: Training-Free Acceleration for Video Generation with Asynchronous Denoising Flow Scheduling
https://simon-dcs.github.io/Website-of-RhymeFlow

>Complexity-Balanced Diffusion Splitting
https://noamissachar.github.io/CBS

>Can We Predict The Human Preference For Text-to-Image Content Prior To Generation And Is It Even Useful To Do So?
https://github.com/LSU-ATHENA/HPM-Predict

>SAM-Flow: Source-Anchored Masked Flow for Training-Free Image Editing
https://github.com/chwbob/Sam-Flow

>Geometry-Aware Dataset Condensation for Diffusion Model Training
https://github.com/2018cx/GADC

>StoryVideoQA: Scaling Deep Video Understanding with a Large-Scale, Multi-Genre and Auto-Generated Dataset
https://github.com/nercms-mmap/StoryVideoQA

>Lightricks to split into two companies as it cuts 75 jobs
https://www.calcalistech.com/ctechnews/article/r1dgjt5gmg

>Akium Sampler: Custom k-diffusion sampler for Stable Diffusion Forge / A1111
https://github.com/AkiumAI/akium-sampler

>When AI builds itself: Our progress toward recursive self-improvement, and its implications
https://www.anthropic.com/institute/recursive-self-improvement

>U.S. Government Officials In Talks To Acquire Shares In AI giants
https://www.notus.org/technology/trump-ai-stake-openai

06/04/2026

>Echo-Infinity: Learning Evolving Memory for Real-Time Infinite Video Generation
https://echo-team-joy-future-academy-jd.github.io/Echo-Infinity
>>
>mfw Research news

06/06/2026

>Physics-Informed Video Generation via Mixture-of-Experts Latent Alignment
https://arxiv.org/abs/2606.04737

>Real-Time Generation of Streamable Talking Portrait Video with Reference-Guided Deep Compression VAEs
https://arxiv.org/abs/2606.01620

>SymTRELLIS: Symmetry-Enforced Voxel Latents for 3D Generation
https://arxiv.org/abs/2606.04108

>Resonant Minds: Closed-Loop Social Avatars with Theory of Mind
https://arxiv.org/abs/2606.05896

>Pool-Select-Refine: Allocation-Aware Generative Dataset Distillation with Soft-Label-Guided Latent Refinement
https://arxiv.org/abs/2606.01920

>MeshWeaver: Sparse-Voxel-Guided Surface Weaving for Autoregressive Mesh Generation
https://arxiv.org/abs/2606.04688

>CV-Arena: An Open Benchmark for Instructional Computer Vision Problem Solving with Human-AI Collaborative Preferences
https://arxiv.org/abs/2606.00931

>Diversity Matters: Revisiting Test-Time Compute in Vision-Language Models
https://arxiv.org/abs/2605.30713

>Beyond False Stability: High-Noise Drift Gating for Test-Time Adversarial Defenses in Vision-Language Models
https://arxiv.org/abs/2606.03730

>Density-Aware Translation of Spurious Correlations in Zero-Shot VLMs
https://arxiv.org/abs/2606.01710

>Splatshot: 3D Face Avatar Generation from a Single Unconstrained Photo
https://arxiv.org/abs/2606.01493

>Disentangling Visual and Factual Correctness in LVLMs' Visualization Literacy
https://arxiv.org/abs/2606.03142

>Chroma Clues: Leveraging Color Statistics to Detect Synthetic Images
https://arxiv.org/abs/2606.02224

>Beyond Encoder Accumulation: Measuring Encoder Roles in Multi-Encoder VLMs
https://arxiv.org/abs/2606.03879

>P2-DPO: Grounding Hallucination in Perceptual Processing via Calibration Direct Preference Optimization
https://arxiv.org/abs/2606.03376

>Visual Persuasion: What Influences Decisions of Vision-Language Models?
https://arxiv.org/abs/2602.15278
>>
>>108997173
It either gets the poles right and fucks up the bars or something else.
Either way, the model would be dirt cheap to render. The quality just isn't that good.
>>
File: anima edit test.png (1.71 MB, 1605x1152)
>>108997048
Ok you don't need the patch and you need a lora like this https://civitai.red/models/2652469/anima-edit-experimental?modelVersionId=2978373 or this https://civitai.red/models/2650553/anima-edit-nude-filter-clothes-change-more?modelVersionId=2976234
Unfortunately they seem like very underbaked proof of concepts or specialized for narrow tasks like changing clothing.
I wonder how many image pairs you would need until the model obtains general purpose edit capability, as in applying t2i concepts it knows properly to i2i even if it never saw the precise task during training? Would you need a proper finetune or can that still be done as a lora?
>>
>>108996994
Krea 2 is not a local model, and anyone suggesting that Ideogram 4 ISN'T the most censored local model of all time by a ridiculously massive margin should actually rope ASAP. It's like six gorillion times worse than anything else ever was, period, end of story. Only a shill would claim otherwise.
>>
>>108996611
again why the fuck would I bother with this shit unless the model is like 10x better than Klein 9B without being significantly slower? Which I'm certain it's absolutely not.
>>
>>108997072
Lyrics following is just perfect for those two gens btw.

Miku (DECO 27) generalizes well too

https://vocaroo.com/152fHPEB9In7
https://vocaroo.com/1iYfkcT7xV0N

>>108997078
Actually I was thinking of dropping these LoRAs, but given nobody has released anything for ACEStep XL, releasing them might get music mafia after me, because they're insane. Training LoRAs is very easy, at least on Modal with a rented H100 it's pretty quick.
>>
Retarded 1girl sloppers get filtered so easily. Who knew?
>>
>>108997284
>releasing them might get music mafia after me
I don't normally agree with schizo shit like this, but the music mafia is not to be fucked with
>>
>>108997251
There's an extreme astroturfing campaign on Reddit right now for it.
It makes the Qwen shilling look organic.
>>108996994
All of their models are fully uncensored. On the level of pretty much every popular local finetune. Had some stuff slip through the cracks unprompted.
I would be very surprised if they released even one of those versions locally without safety alignment.
We'll see I guess.
>>
Ideogram is going to be fun for a few weeks and then everyone will go back to whatever SDXL shitmerge they were using previously because you aren't allowed to train LoRAs or finetune it for NSFW.
>>
>>108997307
>meant Krea's models
>>
>>108997303
they can't touch china.
>>
>>108997315
>
Anima won.
>>
File: Untitled.png (278 KB, 1105x496)
More retopo
>>
is 6 GB of VRAM a curse?
>>
>>108997315
Ok hear my yap out:
>you aren't allowed to train LoRAs or finetune it for NSFW
IANAL obviously but technically the license only says:
>However, we may also implement certain safety measures, content protections and other technological measures for the Model, including content filters and watermarking, and you agree that you will not circumvent, remove, alter, deactivate, degrade or thwart any such measures.
Without explicitly forbidding nudity anywhere in the license. Like I wonder if you can argue that you are just finetuning the model for boobies and the like (perfectly legal and you never agreed not to do that anywhere) and any change in the way the filter functions was a side effect. Like the model can already generate naked people with the correct json prompt. It just generates weird flesh sludge for the nether regions because it never saw enough cunts and penises during training. If these requests aren't flagged by the filter already with json prompting, and I am just making them look less shit, how am I circumventing the filter?
And no json prompting isn't circumventing since it's an explicit feature they trained the model for.
>>
File: mafw.png (254 KB, 900x806)
>>108997386
>IANAL
you anal?
>>
>>108997315
people will keep using the Z family and/or Klein is more like it, given Ideogram 4 isn't even actually better than either of those in any meaningful way, even putting the laughable inbuilt safety filters aside.
>>
>>108997394
instructions unclear, my penis is now DEEP in that Anon's butt (he is tight)
>>
>>108997386
The license also says that any usage of the model or derivative models has to abide by their referenced usage policy, and that usage policy prohibits you from generating anything lewd or pornographic. Obviously they can't stop people from doing what they want on their own devices but they can absolutely tell CivitAI or Huggingface to stop hosting finetunes or LoRAs that allow people to violate that part of the license, and that's an issue that at least Civit has folded on before.
>>
File: Ernie-Image_00073_.png (1.7 MB, 1200x896)
>>108997072
>>108997284
>>108997082
For me, it's my Fate Gear LoRA. Pure kino, able to do so much complex stuff with its instruments.
https://vocaroo.com/1iHt3NwfVDPi
https://vocaroo.com/12ERUa8VqKVB


>>108997083
>can i plug instrumentals into this model and generate pure vocals from it?

Model should be able to handle instrumentals and vocals separately, depending on what you mean by that. For pure vocals (acapellas) you need a LoRA as far as I know.

>>108997303
These are just model weights though. Maybe community fear is overblown.
>>
>>108997395
if you peruse the /r/stablediffusion subreddit right now, you'd think ideogram is the second coming. A giant leap forward. But with all the shilling, I'm yet to see a single gen that looks like anything approximating a real photo.

I don't know what to call the style, hyper-realistic? Like a lifelike drawing combined with cgi? But certainly nothing real looking. Is ideogram training on AI data or something?
>>
>>108997405
Yep ok I see it.
Oh well, nevertheless.
>>
>>108997444
I think it looks fine but in a way that is totally equivalent to other models that exist. Nobody I've seen has even tried to explain how it's actually BETTER than other models. Like can it even do editing? I've seen zero examples of that.
>>
>>108997442
>Model should be able to handle instrumentals and vocals separately, depending on what you mean by that.
i want to compose an instrumental and then use the model as the singer so i can have all of my stems separate for mixing
>>
>>108997315
> you aren't allowed to train LoRAs or finetune it for NSFW
wait, I thought the point of open weights was that they could be finetuned to do anything? Localkeks are getting excited over censored garbage they can't even teach to do porn?
>>
>>108997472
bro you don't understand bro, censorship is actually fine if it's the most egregious censorship ever seen in any local model, the acceptability wraps around bro
>>
>>108997472
I remember a time when they released Llama 1 for researchers only and nobody gave a fuck and used it to goon and shared the weights anyway.

Now everyone is kvetching and clutching pearls over a license that cannot feasibly touch them. I don't know why this place is full of weeping vaginas now.
>>
>>108997484
Anon, you're talking to a bunch of schizos who spend all day baiting in this general. They don't even use AI.
>>
>>108997444
It's funny how that subreddit was up in arms over SD2 and SD3's censorship and now it's exclusively populated by retarded thirdies who salivate over any new model regardless of how censored it is or how dogshit the results look.
>>
Most underrated Danbooru tag?
>>
>>108997174
>>108997176
thanks!
>>
>>108997487
we need more kinos in here!!!!
>>
>>108997539
skindentation
>>
>>108997174
> https://github.com/Roblox/cube
> Acknowledgments
> We thank the leadership, Nishchaie Khanna, Karun Channa, Anupam Singh, and David Baszucki, for their support and guidance throughout this work.
lol
>>
What's stopping someone from taking Anima and finetuning it to follow the same JSON BBOX prompting shit that Ideogram has?
>>
>>108997600
Money and possibly pride
>>
>>108997600
everything that's stopping local from being good. competent people and compute
>>
Not sure if this was posted yet but bigASP 3 is being trained on Klein 9B
https://huggingface.co/fancyfeast/bigasp-3
>>
>>108997619
damn, this might be good
>>
>>108997619
cool. I don't know wtf lodestone is doing but I've given up expecting anything usable from him
>>
>>108997476
>local models rank #1 on the safety arena
holy fuck saaskeks btfo
>>
>>108997631
Lodestone has unsurprisingly fucked up Zeta Chroma and has shifted to working on some 2.5B pixel space model instead while Zeta and Radiance train in the background in the hope that they'll be usable by July.
>>
https://huggingface.co/circlestone-labs/Anima/discussions/174#6a1ef4729f9c1460465d145f

>TensorArt's commercial license is permissive, and they can choose to use the model and charge for it however they want. They pay only a per-image fee, nothing else is restricted or costs anything. They can allow whatever creator monetization programs they want. The license doesn't require the model to be gated, paywalled, or anything else. It doesn't charge for model training either.

>I see a great number of people calling me greedy and unreasonable. I think the license fees I'm charging for Anima are very reasonable, and much lower than you would get from almost any other comparable model, based on the information I can find. I'm trying to build a sustainable business, and if I allow large platforms to use Anima for free, I will just go out of business and never train another model again. If you believe I'm being greedy and unreasonable, then please explain specifically what you think I should do differently.
>>
>quantized distilled untrainable releases
>restrictive licenses
>built in censorship
not even trolling, the future of local is looking grim
>>
>>108997619
What a retard, why on earth would you train on a model with a license that says BFL can yank your right to distribute it should they not like the NSFW capacity it has?

kek
>>
nobody here cared about ideogram to begin with desu
>>
>>108997651
you license faggots are so annoying
>>
>>108997619
HOLY FUCK
PLEASE PLEASE PLEASE PLEASE end up being good.
This guy went radio silent months ago. I guess this is what he ended up working on.
Not trying to jinx it but in theory this has the ingredients to succeed: a decent quality base model, large TE, best vae, someone who made large scale finetunes in the past without going too schizo about them. But obviously lots of things can still go wrong easily, I wish him the luck it needs to succeed.
This guy is also training out of his own pocket I believe, so non-commercial restrictions shouldn't be a problem.
>>
>>108997631
He's been wasting 6-8 months on chasing pixel space which he just can't manage to converge with, radiance is dead no more training, zeta is practically dead

Eventually he will give up on pixel space, but he will have wasted SO much time that it's just over at this point.
>>
>>108997639
>Lodestone has unsurprisingly fucked up Zeta Chroma and has shifted to working on some 2.5B pixel space

I'd say I told you so to the anons I told this would happen but I'm sure they've all roped by now.
>>
>>108997651
Was that only for the original BFL license, and the Klein license isn't as draconian when it comes to arbitrary termination?
I might be misremembering but I believe something like this should be the case.
>>
File: Ideogram_00014_.png (2.8 MB, 1264x1264)
>>
>>108997666
>radiance is dead
he uploaded a new version 25 minutes ago lol
but I'm not testing it out
>>
>>108997658
Retards who think you can just ignore licenses are more annoying, picking a model with such a shitty license is stupid beyond belief, but yeah, it's his money to waste
>>
>>108997670
No, it's entirely arbitrary, if they interpret it as you circumventing their safety protections, they can just yank your right to distribute it, meaning sites like CivitAI etc will have to take it down
>>
>>108997680
once it's released and online, it's out. What's the worst that can happen? You have to download it only with a torrent?
>>
>>108997680
>Retards who think you can just ignore licenses are more annoying

I've trained dozens of LoRAs for models that told me I can't. You've never seen them. Nobody ever will. It doesn't affect me at all. I hate license faggots because they all carry the implication that everything will be served to them on a silver platter if the license allows it.

It's just outrage at not being spoon fed wrapped in the guise of pretending to care about the law.

Utter faggotry.
>>
File: zeta loss.png (117 KB, 1029x454)
>>108997639
He ditched dino stuff from Zeta, do you know if that's the reason it's training much faster than before, or if he pruned dataset or if he also did something else?
Not that I expect anything with this loss curve to turn out good. So that's why I hope BigAsp guy succeeds.
While we are at it do you know if anything else changed with Radiance?
And lastly, what's this 2.5b model? I can't think of anything public that fits the bill.
>>
just a bit of banter
>>
>>108997700
CivitAI and other sites won't be able to carry any loras, meaning it will effectively be dead
>>
>>108997691
Damn it's extra stupid to make a public repo before it's ready to be released then, maybe we should give a heads up to him?
>>
>>108997716
civitai are supreme faggots but other sites? Lol I don't think they're monitoring jack shit. As long as you're not posting it with nsfw previews. Anyway, if a model is really good and does all the nsfw stuff, loras required would be few and far between.
>>
>>108997706
>I've trained dozens of LoRAs for models that told me I can't.
That's not the issue, what you do on YOUR computer can't be controlled, but you can't build an ecosystem around a model if you can't share loras / finetunes for it, which is the case here should BFL yank the license.

Why even take the risk, he could have gone with Z-Image Base or Flux 2 Klein 4b.
>>
>>108997710
I haven't been paying super close attention to what he's been doing with Zeta, just that he changed some training stuff and was giving it until the end of July. I remember reading on his Discord that he fucked with the batch size so that might be why it's training faster. He started Radiance up again for some reason (I don't know the exact reason) around the time that people started noticing that Zeta wasn't getting much better.

This is apparently the 2.5B model https://huggingface.co/lodestones/debug-flow
>>
has there been a single instance these ai faggot corps taken anyone into the court?
>>
>>108997727
When was the last time we got a large scale fine tune worth a shit from anyone regardless of license? Nobody gives a fuck about lode of shit stones money furnace.

As long as I have a training script and a GPU I don't care what others think about the license or their shitty ecosystem. Again. Expecting to be served wrapped in concern for a license.
>>
>>108997710
I know nothing about lodestone's new models. I tried zeta about a week ago and all I could gen were human-shaped blobs
>>
>>108997716
if the model is good enough, loras will emerge. it really is that simple. there has yet to be an actual good model that failed to take off. i said the same thing back when chroma amounted to 3 loras per week and chromakeks pissed and shat themselves but look at it now, everyone moved on because it sucked.
if there truly was a generational leap in nsfw models people would find a way to build a community around it. remember that civitai only got popular because people wanted to easily browse and share loras for the nai leak
>>
enough yapping, put up or shut up
>>
>>108997730
I have seen that repo but assumed it was an experiment or test from the name DEBUG-flow.
Yeah I am not hopeful for anything 2.5B even if he somehow doesn't fuck it up.
Thanks for the info.
>>
File: Ideogram_00031_.png (2.31 MB, 1456x1088)
>>
>>108997828
what did you prompt for this style?
>>
why are the people in the local language model general all rich people with hundreds of gigabytes of memory, while everyone here are pajeets who can't run anything and just troll each other?
>>
>>108997731
They don't have to, they just contact sites like Civitai and tell them that model x is violating their license, and it will be removed

Even a mega-autist like lodestone has been very clear that he would never do a large finetune on anything not permissively licensed
>>
>>108997836
Natural filtering. They talk about images there too. The only difference is the poors can only participate here. It's like a natural sieve.
>>
*yawn*
>>
>>108997834
World of Warcraft ingame footage
>>
>>108997471
The closest that I know of is the Cover NoFSQ feature.

I can think of a workflow that might work. Haven't tried pure instrumentals, will that and let you know. But here's something that might work. You generate with your instruments, then you lower the cover strength, then you should have lyrics that are aligned over similar instrumentals. Then you place the vocals on the original instruments.

Here's the cover NoFSQ feature used on Black No.1 at 0.3 strength, lyrics are aligned though the instruments are off so this should theoretically work.

https://vocaroo.com/12EgDtyMeM0z

>>108997619
Was his dataset really that good or is everyone nostalgiafagging? Last I checked a bigASP model was SDXL days, and since then he's had plenty of opportunities to shine (Chroma.1 HD tune, Lumina, etc...) And he never delivered.
>>
File: radiance.jpg (85 KB, 1024x1536)
>>108997666 >>108997678
it's working quite ok, not sure what you're complaining about.

sam altman didn't give him 1.5 trillion USD, and on consumer hardware training takes quite a lot of time. unsurprisingly. including attempts that didn't work he still basically achieved yet another model.
>>
File: radiance.jpg (103 KB, 1024x1536)
>>
>>108997878
>will that and let you know
will try*
>>
>>108997878
i don't understand. did you get an output that was just the vocals and nothing else? i don't want the model to apply the vocals to the song, i need the separate vocals so i can mix it in my own DAW and apply effects without affecting the instruments
>>
File: 1778349712251270.png (18 KB, 1000x1000)
where can i find that N64 lora
>>
>>108997881
can you post a realistic image?
>>
comfy local won't launch anymore
u guys were right, they really did kill it
>>
File: Ernie-Image_00076_.png (1.86 MB, 1200x896)
>>108997878
Model's meme potential is just insane.
>>108997897
Nah, this was the workflow ->

Input song, custom lyrics on ACEStep.cpp. Enable the cover-nofsq feature and use the instrumentals you want as a source (Src option checked).
Write caption, metadata, and lyrics for your generation. Cover strength should be anywhere from 0.5 down to 0.01 (experiment to find what works best for your particular song). I found 0.3 to be a sweet spot for this on the Base Turbo merge model.

The model makes a cover with similar instruments and aligned lyrics (in this case it has lyrics, but yours won't so it will just place vocals described in the caption over the song that doesn't have them). That's what that vocaroo is.

Then theoretically you'd want to use something like a vocalremover API or model to isolate the vocals from the generation and mix them that way.
>>
>>108997954
you're just retarded. shut the fuck up
>>
I'm creatively bankrupt
give me ideas
>inb4 X23
>>
>>108997649
>then please explain specifically what you think I should do differently.

obviously he should be doing it for free, regardless of how many hours it takes
is he actually stupid?
>>
>>108997968
Iwakura Lain
>>
File: zeta.png (969 KB, 720x1280)
>>108997710
>Not that I expect anything with this loss curve to turn out good.
it's not a single training with unchanged settings or on the low part of the lr trajectory from high to low lr so this doesn't mean so much

in the end it's mostly just a question if the non-dino distance training now works better. zeta is very rough still.
>>
>>108997968
Laura Kinney
>>
>>108997965
Worth noting, if you have just the vocals of a song, there's a complete feature that adds instruments to it https://github.com/ace-step/ACE-Step-1.5/blob/main/docs/en/ace_step_musicians_guide.md

No clue if that works for instruments, worth a try for your usecase, and if that doesn't work then cover-nofsq feature it is.
>>
>open comfyui
>press X at top right of screen to close it
>it does not close

bros????????????
>>
>>108997965
eh, a vocal isolation model wouldn't sound as good as the original. i guess i have to wait for someone to make a model that does it since i already have all of my drums and other stuff as separate tracks, so i want the vocals to be the same for full control
>>
File: based.png (81 KB, 1691x477)
finally some good fucking food
also licensekeks BTFO, please tell me all about the lawsuit BFL will definitely for real file against BigASP man for training Klein 9B this way
>>
>>108998031
It's unreal how much I hate licencekeks. Imagine seeing everyone having fun and having to remind everyone about the license the moment they mention a model.
>>
what's the deal with ideogram??? it seems to have the most censorship of any local model, yet bypassing it apparently unlocks some of the craziest prompt comprehension available? so it's like early dall-e 3? is it possible this gets any finetuning attention or is it just another point-and-look like flux [dev]?
>>
File deleted.
>>108997974
is this considered nsfw if nothing shows
>>
>>108998059
>it seems to have the most censorship of any local model, yet bypassing it apparently unlocks some of the craziest prompt comprehension available? so it's like early dall-e 3?

Got examples of that? Isn't this model slopped to hell and back btw.
>>
>>108998059
Just a very well trained model with a novel prompting style with some baffling censorship. People will get around it eventually. Until then there's doomers and license cucks going nuts.
>>
>>108998072
I was recently banned for posting megumin in bikini (all covered)
>>
>>108998081
You should have stayed that way
>>
>>108998084
seethe, troon
>>
>>108998072
TJD
>>
>>108998089
oh i will! i don't need your permission!
>>
>>108998073
I mean, for all intents and purposes we already have an uncensored Dalle 3 (Chroma 1 HD). I doubt any model will compare to that. Won't try Ideogram just to get my hopes up for a nothingburger.
>>
>>108998073
i genuinely don't know, i'm just reading around. all i've seen is just generic shit i would expect from nano banana. some people are saying the model is really good if you bypass the filter using a long json prompt, and others are saying it's censored slop unusable for nsfw and unable to be finetuned. i'm curious if anyone actually has any interesting nsfw outputs or if it's all just a shill brigade making shit up
>>
wake up, deepbeepmeep. wake up. you broke something. KLEIN IS BROKEN!
>>
File: Ideogram_00051_.png (2.73 MB, 1680x944)
2.73 MB PNG
>>
>>
>>108998095
Dalle 3 was never capable of anything that wasn't slopped 4-channel VAE bullshit, people talk about it with the largest pair of rose colored lenses ever
>>
>>108998124
Def. slopped but the model had power and potential. Not as much as Chroma which is basically a non-slopped version of it.
>>
>>108998095
chroma is garbage that doesn't even know 1/10 the characters of dall-e 3. anima is actually comparable. ideogram too has a lot of character knowledge
>>
File: Ideogram_00056_.png (3.79 MB, 1936x1088)
3.79 MB PNG
>>
Does id4 have the GUI for the regional prompting in comfy yet?
>>
>create a comic in Ideogram
>copy the JSON format
>paste it in an LLM and ask it to continue the comic's story along with the official prompting instructions
>spits out a new JSON prompt that continues the story
It's not perfect but it's like a visual chatbot instead of just text. Schizos can seethe all they want but I haven't had this much fun with image models in a while.
>>
>>108998170
My eyes still haven't adjusted to it yet, so the outputs don't look like AI to me.
>>
are you guys using this? https://huggingface.co/bertbobson/Ideogram-4-INT8-ConvRot/tree/main
>>
File: Ideogram_00059_.png (3.58 MB, 1936x1088)
3.58 MB PNG
>>
>>108998169
Just use the ID4 prompt builder from kjnodes
>>
>>108997828
Ugh, WoD models, soulless.
/spit
>>
>>108998209
>/spit

Sorry you can’t do that anymore.
>>
>>108998152
>ideogram too has a lot of character knowledge
Does it? I didn't try too much but couldn't get it to gen any other Vocaloid girl besides Miku properly.
It seems to know "a bit" about a wide variety of characters, but the number of characters it can recognizably gen without major errors isn't very high.
Oh and don't get me wrong, it is still more than local base model releases nowadays, most are completely safetycucked out of captioning any during training. But still it knows a tiny fraction of what booru models know.
>>
>>108998192
Yep.
Legit revitalized my 3060. Convrot int8 is the highest quality 8-bit quant, there is probably some value in checking it out even if you are on newer GPUs.
>>
File: radiance.png (2.25 MB, 1024x1536)
2.25 MB PNG
>>108997948
sure.

i think it's not as interesting for 1girl, realistic atm (qwen/z-image and so on are better)
>>
File: radiance.jpg (108 KB, 1024x1536)
108 KB JPG
>>108997948
>>
File: ComfyUI_00044_.jpg (1.19 MB, 1184x1776)
1.19 MB JPG
>>108998197
That's not a GUI, anon.
>>
File: radiance.jpg (119 KB, 1536x1024)
119 KB JPG
>>108997948
beach with a fortress on a rock in the ocean
>>
>>108997917
https://files.catbox.moe/w57tcb.safetensors
>>
>>108998253
>>108998257
thanks. It leaves a lot to be desired
>>
>>108998247
what workflow do you use with it? catbox any chance
>>
I guess I just need to learn how to prompt for id4. But at least it did follow the prompt, wanted to test extremes.
>>
>>108998260
The node has the regional prompting interface you're asking about. Not sure what else you're looking for.
>>
>>108997828
the man has blue tayota prius 2012 he doesn't need elven bitches
>>
>>108998274
those look like letters. it's trying to tell you something
>>
Oof.

>>108998277
Yeah I know it will translate your regional prompting, but I want it visually, it's so much faster and easier.

>>108998281
I have a feeling that id4 is good at taking directly from its trained images, so it injects whatever else there was in the image if your prompt is tiny. So this was probably a promotional wallpaper for some anime shit.
>>
>>108998270
You only need "Load Diffusion Model INT8 (W8A8)" node technically. You should enable on the fly quantization and convrot toggles, set the appropriate model type and then you can load any bf16 checkpoint. You can use save int8 node to make your own checkpoints so that you don't need to wait through quantization in the future.
Lora situation is a bit messy. Dynamic and preloading have the highest quality. Dynamic has 10% speed hit and pre-load means you need to sit through quantization again every time you change loras.
Stochastic and None still have decent quality, when they work. Some loras will work with both. Some prefer one or the other. Stochastic is the safer choice overall in my experience.
Here, I re-ran my Brazil Miku gen from the last thread with int8 >>108996329. Just in case there is confusion, you obviously don't need CFG stuff:
https://litter.catbox.moe/027m1d9fs3aswy79.png
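
For anyone wondering what the "quantization" step is doing in principle, here is a minimal sketch of symmetric per-tensor int8 weight quantization in plain Python. This is not the node's actual code and it does not cover the ConvRot part at all; `quantize_int8` and `dequantize_int8` are made-up names for illustration, and the weights are toy values.

```python
def quantize_int8(values):
    """Symmetric per-tensor quantization: map floats to integers in [-127, 127]."""
    max_abs = max((abs(v) for v in values), default=0.0)
    # One shared scale for the whole tensor; fall back to 1.0 for an all-zero tensor.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return quantized, scale


def dequantize_int8(quantized, scale):
    """Recover approximate float values from the int8 codes."""
    return [q * scale for q in quantized]


# Toy weights (hypothetical values, not taken from any checkpoint).
weights = [0.5, -1.27, 0.003, 0.9]
q, scale = quantize_int8(weights)
restored = dequantize_int8(q, scale)

# The rounding error per weight is bounded by half a quantization step.
max_error = max(abs(a - b) for a, b in zip(weights, restored))
```

Small weights like 0.003 collapse to zero here, which is roughly why a single large outlier in a tensor hurts quality and why fancier schemes exist.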
>>
>goes close-ups of areola and nipple just fine
>prompt for penis

Breasts are more prominent in "art" I suppose, for training data.
>>
Holy, shit, vaginas are horrible.

https://files.catbox.moe/95ryv9.png
>>
File: Flux2-Klein_01110_.png (1.53 MB, 896x1152)
1.53 MB PNG
>>108997619
So I tried it and yeah it needs a lot of work.
I wish him the best.
>>
>>108998329
Vagina is technically the term for the internal portion so that image sort of makes sense, probably has a bit of medical internal camera shit in its data somewhere. Try labia or vulva or cleft of venus or something.
>>
>>108998309
Thanks man
>>
>>108998336
TRY PUSSY
>>
>>108998352
It's going to give you a cat or something. Probably.
>>
>>108998352
pussy will literally give you a cat on censored models
cock will give you a chicken
>>
File: 5246.gif (532 KB, 250x188)
532 KB GIF
>>108998357
>>
>>108997619
please be good
>>
File: radiance.jpg (125 KB, 1024x1536)
125 KB JPG
>>108998268
no problem
>>
i have to remake my images cause i used a reference image that had a mistake i didn't realize
>>
File: radiance.jpg (189 KB, 1024x1536)
189 KB JPG
>>
File: ComfyUI_00058_.png (3.48 MB, 1184x1776)
3.48 MB PNG
I wonder if there's a way to reduce how much the prompt takes from the training data. Like this is just straight up like 80% of the original image I bet.

>>108998336
Yeah all the results, even for anus, looked like the peephole for intestinal surgery.
With more detail it was just something resembling a crotch but still body horror.
>>
File: ComfyUI_00063_.png (1.39 MB, 848x1264)
1.39 MB PNG
>>108998414
It's time to go full abstract, my forte.
>>
File: ComfyUI_00083_.png (1.47 MB, 848x1264)
1.47 MB PNG
>>108998426
Damn, wasn't as fun as with normal models.
>>
dang it i cant get beast girls with this realism model
it either gives me fully anthro cat women with large ass cat heads and very sus detailed fur+gens or women with cat ears and gloves/leggings
>>
>>108998279
>blue tayota prius 2012
It's 2017 blue toyota prius. Thank you.
>>
>>108998266
thank you
>>
File: Evie_Anima_to_ZiT.jpg (783 KB, 2688x2688)
783 KB JPG
I realized I have skill issues in realism 1girl composition. Been 1girli-ing on anime too long.
>>
File: Ideogram_00060_.png (2.32 MB, 1936x1088)
2.32 MB PNG
>>
I'm so grateful that the Anima shilling or tdrussell botting arc has stopped, I can breathe fresh air again, see new ideas, new things. This is the real /ldg/.
>>
>>108998577
I still don't know why he doesn't have LTX 2.3 on his trainer.
>>
>>108998590
Oh wait, just saw he finally added it.
>>
File: 1768709435594642.jpg (1.84 MB, 2048x1536)
1.84 MB JPG
>its the year 2032
>agi is near, but most image models coming out are still less realistic than Z Image Turbo
how do you respond without getting mad?
>>
File: 1759495099466894.png (1.87 MB, 1122x1402)
1.87 MB PNG
>>108998577
anima shills moved onto shilling ideogram.
there are discord servers where you can put up bounties for jeets to promote your models
>>
>>108998626
i generate more kinos on my klein 300b
>>
It's interesting to me. I sometimes go on those isthisAI subreddits (I know I know) and I notice how a lot of the gens here are generally more realistic than the very obvious saas slop you see there. I wonder how much of this just escapes normie filters entirely.
>>
>>108998630
klein 300b with a vae
>>
>>108998634
boomers think videos of cats playing drums on ring door cameras are real
>>
File: Ideogram_00070_.png (3.17 MB, 1936x1088)
3.17 MB PNG
>>
File: ZIT.png (2.48 MB, 1536x1536)
2.48 MB PNG
>>
Anyone else implement AMD yet beside SDNext?
>>
File: 1748351684074420.png (217 KB, 716x659)
217 KB PNG
>>108996994
but why are they censored, since the people of these models do not have genitals?
>>
>>108998782
Sd.cpp
>>
Anima is as fast as Microsoft Lens. Does anyone know why Microsoft Lens was so horribly broken when it came to faces and hands?
>>
>>108998782
ROCm works fine for comfy and forge neo
>>
>Yes, Ideogram lacks native support for complex, non-Latin character sets with unusual diacritics. Generating Polytonic Greek (which includes breathings like psili and dasia, as well as multiple accents) and specifically combining these with macrons will likely result in jumbled, hallucinated, or completely incorrect letters.

Saved myself a download. It doesn't do anything useful to me.
>>
>>108998833
rocm support has gotten better this year.

However, nobody at all has rdna2 support for trellis. One guy, last month, managed to get a fork of Microsoft's trellis code to work with rdna3.

rdna4 apparently works fine.
>>
>>108998846
>rdna2
still pissed at the fennec faggot for dropping support entirely
>>
>>108998864
rdna2 still works with comfyui, but it's getting slower.
>>
File: output_1780833333.png (1.63 MB, 832x1216)
1.63 MB PNG
anima. I guess I butchered the prompt, but I like it.
>>
>>108998878
support sdcpp more
>>
>>108998890
sd.cpp is shit tho
>>
>>108998626
Box please?
>>
>>108998904
>sd.cpp is shit tho
Computers are hard. Scripting is hard.
>>
File: Ideogram_00076_.png (3.74 MB, 1264x1680)
3.74 MB PNG
>>
>>108998905
Deleted it
https://civitai.red/models/2088956/famegrid-2nd-gen-z-image-qwen?modelVersionId=2604982
>>
File: 637865.webm (3.72 MB, 768x576)
3.72 MB
3.72 MB WEBM
you can set a character reference as the first 6 frames and it will mostly get it right
>>
>>108998890
I've basically switched. But, gguf aren't working with at least my rdna2 card. not sure what's up:
https://github.com/leejet/stable-diffusion.cpp/issues/1488

I don't have a github account *shrug*.

I remember trying to get into discussing tech topics and getting confused by all the weird down voting and the massive bad attitude, lots of places, but a big one was Stack Exchange. Indians really are totally incompatible with us in every sense possible, socially.
>>
>>108998904
It's already better than comfyui.
>>
>>108998937
kek good one
>>
>>108998829
correction, anima is half as fast as msft lens (lens is trash, but idk maybe lens2 will be good?)

>>108998889
ok I think it's because I had cfg=1.
>>
Chicoms are winning.
>>
File: Ideogram_00079_.png (2.36 MB, 1920x832)
2.36 MB PNG
>>
File: output_1780835577.png (1.31 MB, 832x1216)
1.31 MB PNG
cute girls will say hi to you. go to church.

>>108998925
It was good at the start, but the ai "moot chinned" her at the end. real women just rarely have massive boobs or massive chins, but ai thinks moar boob=moar indiangood and moar chin =moar indiangood

>>108999015
It is, I have totally switched. Use Obsidian.

you have things to paste between ``` marks like

```
cd [I typed the path to the bin folder here]
HSA_OVERRIDE_GFX_VERSION=10.3.0 HIP_VISIBLE_DEVICES=0 ./sd-cli --diffusion-model ~/ComfyUI/models/diffusion_models/anima-base-v1.0.safetensors --vae ~/ComfyUI/models/vae/qwen_image_vae.safetensors --llm ~/ComfyUI/models/text_encoders/qwen_3_06b_base.safetensors -p "1girl, laughing, yellow socks, green dress, pews. The girl is standing with her legs spread out on top of a church. @kanosawa" -n "worst quality, low quality, score_1, score_2, score_3, blurry, jpeg artifacts, sepia" --cfg-scale 6.0 -v -W 832 -H 1216 -s -1 --offload-to-cpu --steps 8 -o "output_$(date +%s).png" --sigmas "1.0000, 0.9982, 0.9962, 0.9939, 0.9913, 0.9883, 0.9848, 0.9807, 0.9759, 0.9700, 0.9628, 0.9538, 0.9421, 0.9266, 0.9051, 0.8744, 0.8290, 0.7627, 0.6769, 0.5911, 0.5247, 0.4793, 0.4486, 0.4272, 0.4117, 0.4000, 0.3968, 0.3931, 0.3890, 0.3842, 0.3788, 0.3724, 0.3650, 0.3561, 0.3456, 0.3328, 0.3172, 0.2982, 0.2752, 0.2478, 0.2163, 0.1825, 0.1486, 0.1172, 0.0898, 0.0667, 0.0477, 0.0322, 0.0194, 0.0088" --sampling-method heun --preview proj --preview-path ./preview.png
```


That's just Tan2 sigmas pulled from comfyui. You can steal any sigmas using the preview as text node thing.
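
If you copy sigmas out of comfyui as a list of numbers, something like this rough sketch turns them into the comma-separated string the `--sigmas` flag in the command above takes. `format_sigmas` is a made-up helper, and the "must not increase" check is only my assumption about what a sane schedule looks like, not something sd-cli is known to enforce.

```python
def format_sigmas(sigmas, digits=4):
    """Format a list of sigma values as a comma-separated string for a CLI flag."""
    if not sigmas:
        raise ValueError("need at least one sigma")
    # Assumed sanity check: a denoising schedule normally goes from high noise to low.
    for earlier, later in zip(sigmas, sigmas[1:]):
        if later > earlier:
            raise ValueError("sigmas should not increase")
    return ", ".join(f"{s:.{digits}f}" for s in sigmas)


# First few values from the schedule in the command above.
copied = [1.0, 0.9982, 0.9962, 0.9939]
arg = format_sigmas(copied)
print(f'--sigmas "{arg}"')
```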

it works fine with Powershell, you just have to use backslashes idk stuff like that.

>>108999039
you did it. You found the use case for idiogram: backrooms and Mall World dream simulation.

btw, have you ever been to Mall World in your dreams? It's like a mall, only where idk it stretches forever, and the geometry of it is goofy. Sometimes stores and things that are almost never in malls show up
>>
is there an anima character wildcard?
I don't know many anime girls
>>
>>108999054
>have you ever been to Mall World in your dreams?
I'm more of a finding a room in your house that never existed before type guy. Usually a very large public bathroom.
>>
instagram just banned my ai girl
i had 70k followers :(
>>
File: 24645.webm (740 KB, 768x576)
740 KB
740 KB WEBM
what is moot chin? i know about the flux butt chin, but i don't know about the other AI chin variants
>>
>>108999079
KWAB
I hope you made some money off it at least
what reason did they give you?
>>
>>108999072
i don't have specifically that but the "danbooru" search result wildcards on civitai should work to a pretty large degree, pick any or multiple of them
>>
File: output_1780836795.png (1.58 MB, 832x1216)
1.58 MB PNG
>>108999054
>>
using anima and the prompt scheduler from asagi4, can i specify that i want something like [dog:cat:4]? so i want cat to replace dog after 4 steps? instead of using 0.1 or whatever? there's an advanced node on comfyui that has the number of steps parameter, would i have to repeat the prompt multiple times and then specify only the differing part on each one, in order to assign the number of steps i want? e.g. <full text> + dog with 4 steps, then <full text> + cat on another?
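
One possible workaround, sketched under the assumption that the scheduler only takes a fraction in the `[before:after:fraction]` form (I have not confirmed whether it also accepts raw step counts): convert the step you want into a fraction of your total steps. `switch_fraction` and `scheduled_prompt` are made-up helper names.

```python
def switch_fraction(switch_step, total_steps):
    """Convert 'switch after N steps' into a fraction of the whole schedule."""
    if not 0 < switch_step < total_steps:
        raise ValueError("switch_step must be between 0 and total_steps")
    return round(switch_step / total_steps, 3)


def scheduled_prompt(before, after, switch_step, total_steps):
    """Build a [before:after:fraction] string (syntax assumed, see note above)."""
    return f"[{before}:{after}:{switch_fraction(switch_step, total_steps)}]"


# Switch from dog to cat after 4 of 30 steps.
example = scheduled_prompt("dog", "cat", 4, 30)
print(example)
```

Depending on how the scheduler rounds fractions back to steps, the switch could land one step early or late, so check a preview before trusting it.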
>>
File: file.png (123 KB, 526x633)
123 KB PNG
>>108999094
i only made a little bit of money from paypigs but thats about it.
apparently you get banned if you don't label your account as ai. wtf is this shit?
IM MAKING CONTENT ON YOUR SITE NIGGER WHO GIVES A FUCK IF ITS "REAL" OR NOT!!
>>
>>108999194
instagram is gay as fuck
I hope someone nukes meta hq
>>
File: output_1780838771.png (1.56 MB, 832x1216)
1.56 MB PNG
>>108999167
>>
>>108999194
benchod
>>
File: 1758697838363401.png (122 KB, 960x509)
122 KB PNG
why do you need to generate an image when you can just imagine it instead?
>>
>>108999309
I can't turn my thoughts into a wallpaper
>>
>>108999309
Because image generation doesn't actually adhere to your ideas, but adheres to your prompt.

For this reason, like seeing another artist's painting, you see from another inner eye.
>>
>>108999313
"1girl, transgender, handgun, wall"
>>
>>108999309
my goon chamber needed posters
>>
File: zeta.png (1.47 MB, 720x1280)
1.47 MB PNG
>>
>>108997619
If he is now training on non-commercial models, why not at least use Anima? That way the model is starting with full booru tag knowledge plus it's probably 3x faster (and therefore cheaper) to train. Anima's realism is already almost good, a few million image finetune easily converts it to a competent realism model.
>>
>>108999369
catastrophic forgetting
>>
>>108999309
because imagination takes active focus, no matter how small, meaning you are not exploring something but creating every aspect of it, same as why you cant enjoy a story in the same way when you are just reading it for the first time vs when you were the one to have to come up with everything in it with nothing to surprise you coming up

and imagination doesnt have the rng engine of the ai which creates interesting things that you wouldn't have thought of easily or wouldn't have known
>>
File: 1749810492186288.png (16 KB, 192x282)
16 KB PNG
>>108999382
>>
>>108999378
>catastrophically forgets all the weebshit while learning the realism
sounds like a win to me
>>
>>108999397
what i said has nothing to do with what one can and cant imagine because of his ability, unless you give yourself brain damage after writing a story, you will never be surprised by a big twist in that story when reading it, retard
>>
>>108999369
anima is a commercial model. non commercial means no way of monetizing
>>
>>108999453
retard
>>
>>108999486
how?
>>
>>108999453
it is really slimy Russ won't just call it what it is. It's like he's lying on purpose to seem more like an open source guy when he is just a slimy salesman
>>
>>108999397
those are unironically blessed
I wish I was able to erase some images from my mind
>>
whats the difference between changing samplers and lora mid generation vs generating my image with one sampler then img2img with the other sampler?
>>
File: ComfyUI_00019_.png (3.2 MB, 1536x1536)
3.2 MB PNG
>>
>>108997251
>and anyone suggesting that Ideogram 4 ISN'T the most censored local model of all time by a ridiculously massive margin should actually rope ASAP
its less censored than SD 2 and SDXL
>>
didn't stable diffusion come out in like... 2023? why are we still talking about it?
>>
>>108999626
why are you here
>>
>>108999558
>>
>>108999626
sdxl is still king unfortunately
>>
File: Untitled.png (255 KB, 2288x2000)
255 KB PNG
>>108999453
GROKKED
>>
Here's the official image gen rankings for "takes too long":

40- seconds: i eat that shit for breakfast
50 seconds: decent
60 seconds: you're pushing it
70 seconds: tone it down buddy
80+ seconds: retard
>>
>>108999644
if you have a lot of images in the style you like (like 30-100) you could bake a lora for a general model
>>
>>108999654
open-weights != Non commercial. it's an open commercial model because there isn't a restriction to monetizing the model, it's just that you need to pay. commercial is not impossible with this licence ergo it's non commercial. grok being a sycophant and not realizing this is also hilarious. wtf is Elon even doing anymore? judging by the ipo, he's trying to gtfo
>>
>>108999677
why are you reading slop output?
>>
>>108999677
>ergo it's non commercial
*it's NOT non commercial*
my bad
>>
>>108999687
why are you reading slop output?
>>
>>108999677
retard
>>
>>108998298
apple twig? what the fuck
>>
>>108999701
interesting hypothesis. care to elaborate without looking like a child?
>>
File: ComfyUI_00741_.png (405 KB, 896x1152)
405 KB PNG
>>
>>108999999
>>
File: ComfyUI_00742_.png (590 KB, 896x1152)
590 KB PNG
>>
>>108999856
fucked hand and toes
>>
>>108999664
I gen 2,5k Z Image pics with DPM++ SDE on a 4070 12GB with 6 mins per image
>>
File: 2456.webm (1.71 MB, 768x576)
1.71 MB
1.71 MB WEBM
where's the kinos at?
>>
File: Anima_02626_.png (761 KB, 832x1216)
761 KB PNG
>>
File: 454467888819881.png (2.41 MB, 1600x1152)
2.41 MB PNG
>>
Klein 999999b when?
>>
>>108999992
what's the artist/style?
>>
>>108999992
artist?
>>
File: 1773542612112793.png (190 KB, 512x422)
190 KB PNG
>>109000000
>>
>>108999297
>goto chuchy
>>
>>109000000
Wasted
>>
sirs, I believe I was promised one (1) z-image edit model. Please make providings
>>
>>108999194
What are your options now? Where will you find paypigs?
>>108999039
Wow, mysterious.
>>
File: ComfyUI_00743_.png (423 KB, 896x1152)
423 KB PNG
>>
File: 1780828711790032.jpg (78 KB, 885x498)
78 KB JPG
>>109000000
tomorrow
>>
>>108999873
thats the goal
>>
>>109000086
>no background
>no details
>simple colors
>slop colors
>tiny resolution
yup, average trAnima gen.

now that base trAnima failed, who is gonna finetroon it so it actually gits gud?
>>
>>109000120
>nogen
>>
best current zimage (base or turbo)/klein 9b finetunes for realism?
>>
File: ComfyUI_00744_.png (891 KB, 896x1152)
891 KB PNG
>>109000120
here
>>
File: 151264163087903.png (1.56 MB, 1152x1472)
1.56 MB PNG
>>109000010
>>109000014
Luis Royo lora with (@murata range:0.6)
>>
>>108997619
good to see he hasnt abandoned local training. i hope he recaptioned his data set and didnt just literally reuse the joycaption data though, its become a bit dated
>>108997651
not just that ive heard kleins arch isnt as good as people initially thought itd be. i wonder why he still picked it over z-image, i believe in december when turbo came out he said somewhere that he's considering it for bigasp3
>>108999378
literal skill issue, just dont butcher the LR and regularize
>>108999369
klein has the superior arch (in theory, ive heard people have grown somewhat sour on it), same for z-image
>>108997878
>Was his dataset really that good or is everyone nostalgiafagging?
he constantly delivered AND documented what he did in quite some detail. could end up being nostalgiafagging but so far none of his releases have been outright bad. though he also tried to train a bigasp 2.6 that he abandoned since it ended up being shit. beyond that hes also the same guy who trained joycaption
>since then he's had plenty of opportunities to shine
i think chroma came out around the time when he was playing around with SDXL rectified flow models.
>>
File: 1760075868571105.png (1.67 MB, 896x1152)
1.67 MB PNG
>>109000086
>>
>>109000154
>murata range
thanks
>>109000171
Klein?
>>
>>108998257
>>108998262
you can literally get this quality out of ootb anima. like im not even joking, this is about the same quality as "photo (medium), cosplay photo" as tags give you + "a photograph of..." nl prompt
>>
>>109000150
lmao
>>
File: ComfyUI_00745_.png (1.08 MB, 896x1152)
1.08 MB PNG
>>
i think Anima models might secretly require 36 steps instead of 30
>>
File: 120713207887162.png (1.93 MB, 1152x1472)
1.93 MB PNG
>>
>>109000301
You should use 50-150
>>
File: 158285479283534.png (2.31 MB, 1152x1472)
2.31 MB PNG
>>
>>109000442
No there's actually a point where the image gets worse, with the composition I mean.
I guess it depends on the sampler, but for the samplers that change into a new composition every X amount of steps they do get worse (more boring or whatever).

It also depends on the model obviously, some huge models want 50 steps to work properly.
>>
File: 246452002939920.png (2.03 MB, 1152x1472)
2.03 MB PNG
I just do 30 on Anima.
>>
>>109000442
Yeah use 150 with res_2s
>>
>>109000442
I use 300 personally
>>
is optimal steps actually just dependent on resolution?
like maybe the reason high resolution gives you mistakes and boring compositions is simply because you're using the same old 30 steps when you should be using higher?
>>
>>108999926
have you considered learning new words?
>>
>>109000600
>>109000593
Big dogs
>>
>>108998335
ghetto miko
>>
>>109000618
it is from my experience, though ive seen other anons reflect on this. the higher you go the more steps you want. of course theres other factors like sampler etc but this seems a pretty clear correlation
some anons here talking about 150 steps unironically, but ive had gens at 4MP with anima that actually benefitted from shit in that range, though obviously not with res samplers
honestly tdrussell should add this to the readme on the HF too
>>
>>108997154
>>108997352
Is this trellis? is the retopo a new feature?
>>
File: 1776775319598367.png (1.92 MB, 896x1152)
1.92 MB PNG
>>109000183
yeah
>>
>>109000717
I member an anon that posted improvements at higher steps when using Euler with Noob and other models.....
>>
>>109000754
that makes sense. youll usually see lower recommendations because it saves time and from a certain point onwards you simply have diminishing returns from adding even more steps, as in youre close enough to convergence anyway.
though it also seems like higher resolution -> convergence takes longer. since anima supports more than just 1024, that matters
30 steps is close enough to convergence at 1024, at 1536 you want to aim higher and at 2048 even higher than that from my experience
might post some examples later
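
Until then, a rough sketch of the rule of thumb as a starting guess. The only number taken from experience here is 30 steps at 1024; scaling linearly with pixel count and the `exponent` knob are pure assumptions, not anything measured, and `suggest_steps` is a made-up name.

```python
import math


def suggest_steps(width, height, base_steps=30, base_side=1024, exponent=1.0):
    """Guess a step count by scaling base_steps with the pixel-count ratio.

    Assumption: steps grow with (pixels / base pixels) ** exponent. This is a
    heuristic to tune per sampler and model, not a measured relationship.
    """
    ratio = (width * height) / (base_side * base_side)
    if ratio <= 1.0:
        # Never go below the baseline at or under the stock resolution.
        return base_steps
    return math.ceil(base_steps * ratio ** exponent)


for side in (1024, 1536, 2048):
    print(side, suggest_steps(side, side))
```

Lower the exponent if the high-res numbers look like overkill for your sampler.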
>>
>>109000754
more steps rarely makes it worse it just becomes diminishing returns
>>
>>109000830
This is slow gpu cope
>>109000812
It is true that higher steps help above stock resolution, but with the first pass only. You should target as high as the model will allow you on the first pass
>>
>>109000838
you don't get a better image by going to 300 steps nothing to do with gpu cope
>>
Someone please make a sugoihi lora for anima...
>>
>>109000843
Did anyone mention 300 steps anon?
>>
How did that DGX Spark work out?
>>
File: 173512CUI_00002_.png (2.06 MB, 1152x1536)
2.06 MB PNG
>>109000154
Kino
>>
>>109000878
yes me
>>
Anyone playing with anima?
You guys use the checkpoints or base?

Also, it is good with natural language right? Why do checkpoints use all those quality tags then?
>>
>>109001009
base is still undertrained so the previous ones are even worse
>>
>>109001022
So base is still the best?

Should i remain at sdxl?
>>
>>109000920
Not good for anyone here because it's not great at inference.
>>
>>109000196
sure? both are not the best realistic trainings so far, they're more training to address the gap for 2d/2.5d/3d artwork including questionable/nsfw that most of the commercial base models suck at
>>
File: ideogram4_00001_.jpg (373 KB, 1264x848)
373 KB JPG
>>
>>109001030
it's why it's funny they doubled down on it as a laptop
>>
>>109001049
Model is too slopped. I prefer Ernie image.
>>
with great computeR becomes great responsibility
>>
>>109001049
>proudly trained on nanofluxgpt 2-pro outputs!
>>
>>108999339
>>
>>109001049
what's the deal here? why are the Wu quintuplets having a hysterical meltdown out in the rain in the glue factory parking lot?
it's a monster, isn't it? there's a monster on the loose out there.


