/g/ - Technology


Thread archived.




File: collage.jpg (2.98 MB, 3696x4400)
Discussion and Development of Local Image, Video, and Music Models

Previous: >>109080151

https://rentry.org/ldg-lazy-getting-started-guide

>UI
ComfyUI: https://github.com/comfyanonymous/ComfyUI
SwarmUI: https://github.com/mcmonkeyprojects/SwarmUI
SDWebUI: https://rentry.org/ldg-lazy-getting-started-guide#the-stable-diffusion-web-ui-lineage
Wan2GP: https://github.com/deepbeepmeep/Wan2GP

>Checkpoints, LoRAs, & Upscalers
https://civitai.com
https://civitaiarchive.com/
https://openmodeldb.info

>Tuning
https://github.com/spacepxl/demystifying-sd-finetuning
https://github.com/ostris/ai-toolkit
https://github.com/Nerogar/OneTrainer
https://github.com/tdrussell/diffusion-pipe
https://github.com/kohya-ss/sd-scripts
https://github.com/kohya-ss/musubi-tuner

>Z
https://huggingface.co/Tongyi-MAI/Z-Image

>Anima
https://huggingface.co/circlestone-labs/Anima
https://tagexplorer.github.io/
https://animadex.net

>Qwen
https://huggingface.co/collections/Qwen/qwen-image

>Klein
https://huggingface.co/collections/black-forest-labs/flux2

>Wan
https://github.com/Wan-Video/Wan2.2

>LTX-2.3
https://huggingface.co/collections/Lightricks/ltx-23

>Chroma
https://huggingface.co/lodestones/Chroma1-Base
https://rentry.org/mvu52t46

>Misc
Local Model Meta: https://rentry.org/localmodelsmeta
Share Metadata: https://catbox.moe | https://litterbox.catbox.moe/
Txt2Img Plugin: https://github.com/Acly/krita-ai-diffusion
Archive: https://rentry.org/sdg-link
Collage: https://rentry.org/ldgcollage

>Neighbors
>>>/aco/csdg
>>>/b/degen
>>>/gif/vdg
>>>/d/ddg
>>>/e/edg
>>>/h/hdg
>>>/trash/slop
>>>/vt/vtai
>>>/u/udg

>Local Text
>>>/g/lmg

>Maintain Thread Quality
https://rentry.org/debo
https://rentry.org/animanon
>>
First for fuck the malware spreading news spam schizo
>>
Blessed thread of frenship
>>
>>109085384
thats a lot of words for "im a pedo", be more concise
>>
>inb4 nigbo shits up the bread again
>>
File: gork is this true.png (84 KB, 1222x627)
>>109085384
>>109085404
>>
>>109085431
I think he had to visit his social worker today so lets enjoy the peace.
>>
File: 00229-3569229045.png (1.77 MB, 1280x768)
>>
File: file.png (1.7 MB, 1024x1536)
its been 3 hours and hermes (i named it mia) is still chugging along on that database. it asked to pull more tags to complete it so i think thats what is taking it so long.
im not sure what to do anons. im looking around the house and im thinking of selling some things. i have a v1 moddable switch i never use with some games, a 3ds i never use collecting dust in a drawer, a steam deck and dock i never use collecting dust under the tv, an ipad that i just use as a third monitor on my mac.
i think if i sell all of those i will have enough for either 64gb of ddr4 ram for more context for my llm, or a better gpu with 16gb that i can use for llama.cpp and then use the 4070 as a comfyui gpu.
what route should i go? i want to keep making pictures and posting them
>>
If I want to gen an obscure character, what's the best way to introduce them to my workflow? A LoRA?
>>
>>109085623
yes
>>
File: anima1_00070_.jpg (447 KB, 1152x1648)
>>109085623
>>109085634
Like anon said, lora. If using anima, adjust rank according to amount/quality of source images
>>
>>109085640
>>109085634
Thank you. I'll be honest, looking up LoRAs has been cool, but the number of them I've seen using generated images concerns me, 'copy of a copy' type shit. Perhaps it would behoove me to make some of my own, especially if its a subject I'm autistically fascinated about and already have resources of.
>>
>>109085640
Oof
>>
>>109085661
Making loras is fun, I can recommend
>>
>>109085661
whats important is maintaining a high quality and well captioned dataset
>>
This shit is getting out of hand. I wonder if it's even worth using anything other than the 12b gemma4 model for this task to begin with. 26b is fine, but is it worth all the space being used for a task that really shouldn't require that much compute?
>>
File: 00583-478965148.jpg (529 KB, 1152x1920)
>>
>>109085741
remember when I told you to just use sdcpp?
>>
>>109085741
I want an assistant like that too. Which vendor is this? Is this new safer version of Fable?
>>
>>109085750
that was me you told, not him. i havent gotten around to it yet. i work full time and have a 1 year old. right now just building that database still
>>
>>109085760
cool. just rip out comfy and use sdcpp instead of whatever multi day retardation you've been on
>>
>>109085766
ive been having fun with my multiday retardation though
>>
>>109085349


One Rentry to rule them all
https://rentry.org/LDG_vital_info
>>
>>109085770
good morning saar
>>
File: 1781496180810105.jpg (237 KB, 1024x1536)
>>109085807
good morning saar
>>
>>109085779
We need to gatekeep those two
>>109085750
Does it handle inpainting better than comfy because this has to be the most shit tier implementation of inpainting I have ever dealt with in my life, it's horrible in regards to approach and performance. Also you have me confused with someone.
>>109085759
Cline and jailbroken Qwen 3.6. You don't need those lobotomized models, just prompt the jailbreak correctly
>>
>mfw Resource news

06/18/2026

>UniTemp: Unlocking Video Generation in Any Temporal Order via Bidirectional Distillation
https://lzhangbj.github.io/projects/unitemp

>Reasoning as Intersection: Consensus-Frame Alignment for Visual Focus in Video-MLLMs
https://github.com/1Pansy/VideoCFR

>Moebius: 0.2B Lightweight Image Inpainting Framework with 10B-Level Performance
https://hustvl.github.io/Moebius

>From Bounding Boxes to Visual Reasoning: An On-Policy Data Annotation Tool for Vision-Language Models
https://github.com/WnQinm/Annotator

>Boogu-Image-0.1-Edit GGUF
https://huggingface.co/realrebelai/Boogu-Image-Edit_GGUFs

>FlexAvatar: Learning Complete 3D Head Avatars with Partial Supervision
https://tobias-kirschstein.github.io/flexavatar

06/17/2026

>Ostris releases 2-8 step Ideogram 4 Turbo LoRa
https://huggingface.co/ostris/ideogram_4_turbotime_lora

>Neodragon: Mobile Video Generation Using Diffusion Transformer
https://huggingface.co/karnewar/Neodragon

>Unified Multimodal Autoregressive Modeling with Shared Context-Visual Tokenizer is Key to Unification
https://sharelab-sii.github.io/uniar-web

>DreamX-World 1.0: A General-Purpose Interactive World Model
https://amap-ml.github.io/DreamX_World

>Universal Image Restoration via Internalized Chain-of-Thought Reasoning
https://github.com/gy65896/CoTIR

>Boogu-Image-0.1: Boosting Open-Source Unified Multimodal Understanding and Generation
https://huggingface.co/Boogu/Boogu-Image-0.1-Edit
https://huggingface.co/Comfy-Org/Boogu-Image/tree/main/diffusion_models

>LTX-2 Trainer
https://github.com/Lightricks/LTX-2/tree/main/packages/ltx-trainer

>ComfyUI-DKTWrapper: Transparent-object depth + surface-normal estimation built on WAN 2.1 + MoGe
https://github.com/BeeeFX/ComfyUI-DKTWrapper

06/16/2026

>PermaVid: Consistent Video Generation Across Edits
https://ys-imtech.github.io/projects/PermaVid

>DifFRACT: Diffusion Feature Reconstruction and Attribution for Circuit Tracing
https://github.com/Artalmaz31/DifFRACT
>>
File: rulez.jpg (369 KB, 2282x1755)
> evil bbox
for the first time in a long while, Im in the mood to gen again
>>
>mfw Research news

06/18/2026

>Data-Forcing Distillation: Restoring Diversity and Fidelity in Few-Step Video Generation
https://arxiv.org/abs/2606.18478

>The Market in the Model: Latent Diffusion as Neural Economy
https://arxiv.org/abs/2606.19151

>FlowObject: Flow Steering for Bridging Generative Priors and Reconstruction Fidelity
https://yuchenrao.github.io/projects/flowObject/flowObject.html

>Moving Beyond Diversity: Visual Token Pruning as Subspace Reconstruction for Efficient VLMs
https://arxiv.org/abs/2606.18681

>Bridging Creative Intent and Visual Quality: Creator-Driven Recurrent Video Generation with Agentic Feedback Loops
https://arxiv.org/abs/2606.18591

>BindEdit: Taming Attention Leakage for Precise Multi-Object Image Editing
https://arxiv.org/abs/2606.18906

>Show, Don't Ask: Generative Visual Disambiguation for Composed Image Retrieval with Turn-Valid Coverage
https://arxiv.org/abs/2606.18992

>Physics-IQ Verified
https://arxiv.org/abs/2606.18943

>SpectralDiT: Timestep-Conditioned Spectral Residual Correction for Flow-Matching DiTs
https://arxiv.org/abs/2606.18765

>A Multi-Domain Benchmark for Detecting AI-Generated Text-Rich Images from GPT-Image-2
https://arxiv.org/abs/2606.19259

>Forged Calamity: Benchmark for Cross-Domain Synthetic Disaster Detection in the Age of Diffusion
https://arxiv.org/abs/2606.18554

>NeuMesh++: Towards Versatile and Efficient Volumetric Editing with Disentangled Neural Mesh-based Implicit Field
https://zju3dv.github.io/neumeshplusplus

>MUFASA: A Multi-Layer Framework for Slot Attention
https://visinf.github.io/mufasa
>>
>>109085820
yeah it's more similar to a1111/forge style inpainting. they have a webui and it's pretty shit simple to run without touching any python bullshit
>>
>>109085844
How was your social worker visit?
>>
>>109085852
Do they have adetailer? If so, I will migrate. I only picked up comfy because I felt it was more flexible, but most of these advanced detection methods and flows actually don't require anything from comfy at all. I can grab the models myself and take the logic directly from the custom nodes.
>>
File: Klein_00117_.jpg (403 KB, 864x1488)
>>
>>109085879
no unfortunately. I feel like getting a better inpaint is worth it. only a matter of time before it gets added anyways
>>
>>109085845
Griftnigger got gpu with neetbux? Nice
>>
>>109085844
>>109085851
Fuck off debo
>>
>>109085680
like this thread? >Maintain Thread Quality
>>
>>109085900
True I can leverage other detection methods. I'm still floored that they honestly would use such a shit tier approach with such poor adjustment values
>>
File: 70623.jpg (673 KB, 1472x1792)
>>
feels like the poothonic era is ending
>>
ponyv7 just hit 100k downloads on android, say something nice about astrolite
https://play.google.com/store/apps/details?id=ai.fictional.app
>>
>>109085889
yep, thats a butt
>>
File: debo_ccg_fia_00056_.png (2.15 MB, 1792x977)
>>109086025
>>
>>109085845
God DAMNNNNNNNN it feels good to be an ideogram CHAD
>>
Can someone vouch for these links not including malware? Because in the past there was malware linked multiple times
>>109085844
>>109085851
>>
https://github.com/yanokusnir-ai/one-node-flux-2-klein

interesting node to try, all in one img + edit node for flux klein
>>
>>109086126
Crazy that you're so lonely you even join the gay larp lmao
>>
There are like a billion different optimizers out there.
Are there any that add anything meaningful to diffusion lora training over good old AdamW, or maybe using Prodigy to determine optimal LR when you are unsure?
>>
File: debo_ccg_fia_00096_.png (3.95 MB, 1432x2144)
>>
File: z_00147_.jpg (418 KB, 984x1264)
>>109086212
CAME 8bit, REX or constant. It's so fucking good.
>>
>>109086185
thanks for the link Ján Kušnír
>>
File: debo_ccg_fia_00059_.png (2.43 MB, 1792x977)
>>109086229
welcome to the club
>>
>>109086249
idk who that is nor am I czech

just seems like an interesting tool, im happy with the 1/2 image edit workflow though.
>>
>>109086212
i don't know what's different about it but I see a trend of people using automagic.
>>
so is ideogram with turbo any good? speed? edit capability?
>>
>>109086229
>>109086260
Why are you replying to yourself schizo?
>>
>>109086248
Skimming through it, CAME seems to be primarily a memory optimization? Keeping it in mind for larger models regardless.
Rex seems like a decent alternative to cosine scheduling. Always felt like cosine curve wasn't particularly efficient, I might actually use this.
>>109086272
That's like ostris's alternative to prodigy, right? And he seems to be working on a newer version recently. Not sure if I want to waste too much time tinkering with it, but it seems interesting.
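For comparison, a minimal sketch of the REX decay profile next to plain cosine annealing. It assumes the form given in the REX paper ("REX: Revisiting Budgeted Training with an Improved Schedule"): with progress z = step / total_steps, lr = base_lr * (1 - z) / (0.5 + 0.5 * (1 - z)). The function names are illustrative only, not any trainer's actual API.

```python
import math


def rex_lr(step: int, total_steps: int, base_lr: float) -> float:
    """REX profile (assumed form): stays high through mid-training, then drops to 0."""
    z = step / total_steps  # training progress in [0, 1]
    return base_lr * (1.0 - z) / (0.5 + 0.5 * (1.0 - z))


def cosine_lr(step: int, total_steps: int, base_lr: float) -> float:
    """Standard cosine annealing from base_lr down to 0, for comparison."""
    z = step / total_steps
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * z))


if __name__ == "__main__":
    total = 1000
    for step in (0, 250, 500, 750, 1000):
        # REX keeps more of the LR late in the run (e.g. 0.4 vs ~0.15 at 75%).
        print(step, rex_lr(step, total, 1.0), cosine_lr(step, total, 1.0))
```

At 50% progress REX is at about 2/3 of base LR while cosine is at 1/2, so most of the "efficiency" difference is REX spending more of the budget at a useful LR before the final drop.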
>>
>>109086297
He is using bot
>>
>>109086315
Yes that's it. I tried it once and got worse results than with AdamW but some people praise it.
>>
>>109086332
The rentry covers a lot of his antics. I think he's too low IQ to do something as complex as a bot
>>
>>109086315
I'm not sure how it works compared to others but CAME has been my primary optimizer for ages. I use it with large rank and large batch, max both. Only way so far to make multiple concepts work with single lora without terrible bleeding
>>
So is Boogu boogood or nah?
>>
>>109086354
this one? >>109085779
>>
>>109086370
That's a cope rentry after the entire community wouldn't budge. The terms were simple all he had to do is leave us alone but his cope thread is so dead he's here on his knees for attention
Like a disabled dog
>>
Hello, I am a tourist.

Can you guys point me what is the current local meta for:

1 - Anime image gen
2 - Non-anime image gen
3 - Image editing
4 - NSFW fine-tune
5 - Videogen

Also, are the Boogu models that released this week any good? Like this anon asked >>109086367
>>
>>109086203
Not a larp, I just don't gen often uwu
>>
>>109086387
anime: illustrious/noobAI (nova anime is pretty good), or anima

editing: qwen edit or klein edit 9b

image gen no anime: z image turbo is good and fast, ideogram

videogen: ltx 2.3, wan 2.2

nsfw: cant say but you'll find lots of stuff online
>>
Is Ideogram any good at handling known IPs? I can prompt Chroma for pretty much anything in that regard. Want to know before I give it a try.
>>
>>109086444
only like nintendo shit, outside of that its completely clueless
>>
>>109086444
>not generating tifa
Homo & gay
>>
>>109086462
Correct, but because you asked.
>>
File: 1756052175736316.png (1.34 MB, 1024x1024)
>>109086424
example of klein edit 9b (can use one or two images)

fun stuff.
>>
Is Klein Edit still local sota for editing? Has anyone tried Boogu edit?
>>
>>109086505
Maybe it's better, maybe it's not but boogu uses flux 1 vae so you will get more latent compression degradation on texture level detail at minimum
>>
File: 1780995671675163.jpg (209 KB, 768x1024)
my vibe coded natural language to tag database search to prompt to comfyui is complete. it needs some tweaking but this was the first attempt.
>use prompt_builder. we are looking at the back of a beautiful anime girl with medium length light reddish brown hair wearing a blue two piece bikini is standing at the water’s edge of a realistic beach shoreline. in front of her is the sun setting in the horizon casting beautiful colors onto the clouds and onto the peaceful water
>>
>>109086505
Klein 9b will be hard to beat, even saas struggle
>>
>>109086424
You don't have to keep recommending SDXL anon it's okay you can just say Anima
>>
File: file.png (51 KB, 676x1036)
behold, the future of prompting
>>
File: 212259CUI_00002_.png (1.23 MB, 1152x1536)
>>
>>109086424
Nsfw is still Chroma.
>>
>>109086565
haha nice- wait
>>
>>109086565
*past
Regional prompting has been around since SD(non-xl)
>>
>>109086565
damn, ideogram really is next level
this kind of complex image composition just wouldnt be possible without bboxes
>>
File: debo_ccg_fia_00065_.png (2.48 MB, 1792x977)
>>
File: 1781511711241194.jpg (182 KB, 768x1024)
second attempt. still not quite what i’m going for but it’s getting there. might have to v2 the script tonight
>>
>>109086638
Those aren't waves, they're mountains.
>>
>>109086229
the fake debo couldn't even match his level of melty shit, it still looks too coherent
>>
>>109086658
He's not disabled which is why he's failing at it, you need to be broken in a way that can't be replicated
>>
9 days of nofap. This is how we cure cancer.
>>
>>109086687
Enjoy the enlarged prostate
>>
File: file.png (350 KB, 455x552)
combining loras on ideogram renders me all sorts of body horror
>>
File: 00354-1160499192.png (1.55 MB, 1152x896)
>>
>>109086701
Perfect for anal stimulation
>>
It's depressing there is no Sora tier open model yet, with multiscene support and memetic potential.
The chinks were going to open-source HappyHorse originally but Alibaba pulled the plug, we were so close. It's apparently 15b and is Seedance tier
>>
pytorch status?
>>
>>109086799
>Seedance tier
Fucking kek no it's not even close. It's fucking garbage. They just gamed the leaderboards to get to the top. It could've been salvaged if it was open source but Alibaba just wants money now.
>>
>>109086811
tensorflow usecase?
>>
File: 00369-2605446145.png (1.84 MB, 1152x896)
>>
>>109086811
bloating
>>
File: 221558CUI_00001_.png (1.26 MB, 1152x1536)
>>
>>109086812
>It could've been salvaged if it was open source but Alibaba just wants money now
Yeah, and they are also hoarding Qwen-Image-2 API-only for no reason (it has since been mogged by both closed and open models). Its 7b size would have been really nice for fine-tuning
Their latest Wan offerings also got mogged by other API competitors, so why not release the weights for that too? They could easily replace LTX2.3, given Wan2.2 still has better quality (but no audio and only 5 sec). If I am going to pay for API, I am just going to use Seedance, Grok or Kling instead
>>
File: image.jpg (400 KB, 1421x1421)
>>109085349
Holy shit, when did this hobby get so commercialized?
I reached out to 5+ LoRA makers on Civitai asking if they'd be willing to port their Illustrious LoRAs to Anima. Even offered to do the training myself if they just shared the dataset, I'd send them the finished LoRA privately, no public release needed.
Every single one basically hinted they'd only do it for payment.
Thought this was supposed to be gooners helping gooners, not an e commerce platform.
>>
Fuck it, I'm going to fork webui inpainting and make a custom comfy node. Fuck it all, I will vibe this shit into existence. Fuck you comfy for once again failing at basic functionality and then acting surprised when you get pushback.
>>
>>109086896
you chose not to support Ani's open source future. you were complacent with cumfart and now the space is just grifters
>>
>>109086896
>Every single one basically hinted they'd only do it for payment.
No shit. You're essentially "offering" to have them send you their datasets.
>>
>>109086900
It do be like that. Reddit shitters will downvote you if you insinuate web UI might be a better tool to use.
>>
>>109086900
just use sdcpp
>>
>>109086900
Can you vibe code the supermerger extension while you're at it.
>>
>>109086915
>>109086896
Honest offers are not good enough for anything in this planet.
You should have tried to extort them somehow but of course that's difficult without first stalking them and so on. Problem with the internet is that you can't just go there and beat someone physically.
Also: it's probably not that difficult to figure out some shitty lora on your own.
>>
>>109086900
someone built it just now
https://www.reddit.com/r/StableDiffusion/comments/1u9g3vy/i_built_a_single_comfyui_node_for_flux2_klein_t2i/
>>
>>109086909
:S
>>109086915
>>109086929
But my CivitAI account has a few models, likes, and followers. Why would I throw my reputation out the window if I’m offering to train LoRAs for them?
>>
>>109086821
>balding issue
isn't it inevitable anyway?
>>
>>109086917
They need to catch up and be bleeding edge, until then I'm using this shit.
>>109086916
Redditors can eat my asshole
>>
File: sph mom.mp4 (1.6 MB, 704x1280)
>>
>>109086955
datasets are far more valuable than lora training itself.
setting up ai toolkit isn't particularly difficult, but dataset curation is tedious grunt work that no one would just hand out for free
>>
>>109086955
You should have made him an offer he couldn't refuse.
>>
>>109086951
I want to do this programmatically; I don't need a UX for this. The default values and handling in comfy are fucking garbage, and the handling of things like masks and even mask blending is the equivalent of dragging your nut sack against glass. It's piss easy to automate detailing in webui through adetailer without a care in the world once you dial in the arguments; in comfy it's a fucking nightmare and a chore just to get the values right.
>>
>>109086565
Wake me up when you're able to do that in a 3D space
>>
>>109087038
Scripting is for winners.
>>
>>109087061
You can.
>>
File: debo_ccg_fia_00066_.png (2.21 MB, 1792x977)
>>
https://civitai.red/models/2712733/ideogram-4-gguf-workflow
>Tested on RTX 3060 12GB with 16GB RAM
vramlets bros??
>>
>>109087094
Use case?
Default fp8 model, or better the int8 quant works fine on my 3060 with the shitty comfy workflow or kijai workflow.
>>
>>109087109
Didn't know that. How is it at editing images? Is it better than Qwen?
>>
>>109086896
this is weird behaviour, asking for their datasets instead of just training it yourself. they probably told you to pay up as a polite fuck off
>>
File: q_04as5a.png (1.59 MB, 1536x1024)
>>
>>109073927
Thank you for letting us know.
>>
>>109087138
The model itself doesn't have any edit capability. Some people are experimenting with lora training for that, but it will be a while, if ever, before any decent ones show up.
>>
>>109087170
I assumed people here also follow /h/ but I did it
>>>/h/8899957
>>
>>109087186
I have forgotten /h/ altogether.
>>
>>109087080
Proof?
>>
File deleted.
I will not be denied and I will not activate this workflow in this ugly UI I will use it in my own. Just needed to make sure it loads before running it
>>
>>109087215
>>
>>109086961
>bleeding edge
what exactly is bleeding edge?
>>
>>109087232
They get stuff first
>>
>>109087238
sdcpp only adds the models that matter a few days after they release. You let the redditors bite the bullet and try out model releases. Fuck spending time and energy on being a guinea pig
>>
>>109087199
Use your own initiative instead of bothering other people.
>>
how do you use the kj ideogram node in the default workflow? want to try it out
>>
>>109087246
I would like to use things like music as well in my interface, comfy is just a catch all for other tools.
>>
Should I train a lora on my OC to make pics of me fucking my waifus?
... should I train a lora of myself to make pics of me fucking movie stars?
>>
>>109087307
train on yourself then post the lora here
>>
File: q_oav1cr.png (1.23 MB, 1536x1024)
>>
>>109087270
Just use the kj's workflow.
Anyway you pipe prompt builder output into the text section of the default text encode node.
>>
>>109087335
where is his official workflow? didnt see a github page for ideogram
>>
File: 1781716693502750.png (151 KB, 1159x846)
are you kidding me
>>
File: 1780172931493868.png (306 KB, 1176x822)
>>109087375
ah, the background part is required
>>
>>109087375
natural language prompting is for api only.
stop being a promptlet and learn to draw bboxes the ldg way
>>
File: 1759915423896298.png (430 KB, 1165x840)
okay, so the kj prompt builder works, just output that to a string node then use that as the prompt.
>>
>>109087338
Maybe I got confused about the official part but anyway you do something like this (NSFW):
https://litter.catbox.moe/vbxce2tsjyjmajkt.png
>>
File: 1770329234642711.png (832 KB, 1268x945)
okay maybe ideogram isnt that bad.

38 seconds, people were saying the gens took minutes wtf, lies
>>109087424
do you need loras to get nsfw out the box?
>>
does (thing:1.2) even work with ideogram? I suppose not right? is there anything close to it?
>>
>>109087454
The answer is yes as always, pretty much same for any model that isn't community finetune.
>>
>>109087509
CLIP was super sensitive to weighting but llm based text encoders need much higher weights to get similar effects, try (thing:3.0) or whatever.
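A rough sketch of how `(text:weight)` spans can be split out of a prompt before encoding, assuming the flat, non-nested A1111/ComfyUI-style syntax. `parse_weights` is a hypothetical helper, not ComfyUI's actual parser (which also handles nesting, escapes, and bare parentheses). How each weight is then applied to the text encoder output is model- and UI-specific, which is why the useful range differs between CLIP and LLM encoders.

```python
import re

# Matches a flat "(text:weight)" span such as "(red hair:1.2)"; no nesting or escapes.
WEIGHTED = re.compile(r"\(([^():]+):\s*([0-9]*\.?[0-9]+)\)")


def parse_weights(prompt: str) -> list[tuple[str, float]]:
    """Split a prompt into (text, weight) chunks; unweighted text gets weight 1.0."""
    chunks: list[tuple[str, float]] = []
    pos = 0
    for m in WEIGHTED.finditer(prompt):
        plain = prompt[pos:m.start()].strip(" ,")
        if plain:
            chunks.append((plain, 1.0))
        chunks.append((m.group(1).strip(), float(m.group(2))))
        pos = m.end()
    tail = prompt[pos:].strip(" ,")
    if tail:
        chunks.append((tail, 1.0))
    return chunks


if __name__ == "__main__":
    print(parse_weights("1girl, (red hair:1.2), beach, (sunset:3.0)"))
```

Note that a semicolon typo like `(thing;1.2)` does not match the pattern, so it would be treated as plain text at weight 1.0.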
>>
File: 70.png (2.51 MB, 1024x1216)
kek
>>
>>109087454
There's only ~9+B in Ideogram. It should be on par with Klein 9B more or less.
Might actually download this but I really wouldn't like to update ComfyUI at this point.
>>
File: 01864-46555205676.png (1.85 MB, 1024x1536)
>>
File: ComfyUI_00074_.png (2.57 MB, 1536x1536)
ideogram is the ultimate filter for those who don't know how to run llms
images are already captioned by llms in image model training data
slop prompting is the future
>>
>>109087563
I did an update, broke nothing, just added templates: all my klein edit stuff still works fine.
>>
>>109087621
>inb4 credential miner attack
>>
File: 2484-294-.png (1.57 MB, 1264x1680)
>>
>>109087630
You should be using docker/podman for Comfy if you aren't already.
>>
>>109085349
kyoto
>>
>ideogram STILL doesn't have a category filter on Civitai.

This fucking piece of shit site really is dead as fuck, really just a reflection of the entire industry at this point.
>>
File: x.mp4 (2.93 MB, 576x1056)
>>109087675
maybe they're still reviewing the license or something or trying to get another license from ideogram.

it's more restrictive than most after all
>>
File: 1765018953566682.png (812 KB, 1662x993)
okay this workflow has the kj node and minimal bloat.

https://pastebin.com/VU0PcdtS
>>
File: 1757294658238890.png (795 KB, 1593x923)
what the fuck it knows pepe NATIVELY? no lora?
>>
File: 1771540759688096.png (3.1 MB, 1264x1680)
>>109087707
>>
how
does
it
just
know
?
>>
ACE-Step 1.5

https://vocaroo.com/1hViYlE0WYSG
>>
>>109087675
Two more weeks and Ideogram's buried with Ernie and Z Image. Models are disposable but your captioning dataset and LoRA skills aren't, invest there.
>>
File: q_494ih2.png (952 KB, 1024x1024)
>>
>>109087791
Shit I might as well post what I got
https://vocaroo.com/12m2m2jI519m
I need to make my own frontend for this before doing more
>>
File: 1753808012337776.png (1.1 MB, 1734x1101)
thank god for kijai node cause otherwise making ideogram prompts would be literal cancer

in fact I really like that you can do the boxes, it's like region specific inpainting.
>>
>>109087812
I miss the Pony Illustrious days, when finetunes were like pillars holding the scene up. Feels like Anima's will be the last of the mohicans on that front and after that it's every man for himself.
>>
File: 234879.png (1.49 MB, 1264x1680)
ideogram added the english text on its own
>>
>>109087871
If you act as if your world is crumbling then it will crumble.
>>
>>109087898
His mental crumbled even before the days he was nostalgic for.
>>
>>109087898
spoken like a true socialite cultist
>>
>>109087871
>finetunes
you mean shitmixes right
i dont miss that
>>
File: 1752765127220944.png (960 KB, 1590x1074)
neat
>>
File: debo_csa_fia_00020_.png (2.07 MB, 1792x977)
>>
File: schizo.png (17 KB, 91x99)
>>109088006
>>
>>109086951
It's not impossible to make a hardware version of a diffusion model :^) like that literally can only do that one model.
>>
>>109087997
>>109087855
This is giving me Krita diffusion flashbacks, are you kidding me? You know this was already possible with the early diffusion models and the Krita plugin, right? Why is everyone acting like this is some new breakthrough when it's been doable forever?
>>
>>109086955
>my reputation
izzt status
>>
>>109088011
That's right, you are schizo.
>>
>>109088006
the other anons are right
fuck off namefag
>>
>>109088011
I see ai stickers in ecommerce places like ebay and etsy all the time and these low IQ grifters dont even bother to fix the fucking hands... I mean people are buying them, but it just goes to show that the world is filled with literal goyim
>>
>>109088011
I now automatically start counting people's fingers irl
>>
just in case
>>
he posts intentionally dogshit images because he is upset we moved away from sdg like a year ago, sad
>>
File: 1763447989435712.png (868 KB, 1717x1051)
868 KB PNG
lmao

I think I cracked the code. is the filter...generated?
>>
>>109088097
who is we, are they in the same room with you right now?
>>
File: 00118-80090.jpg (1.19 MB, 1383x2144)
>>
File: 1772929575421213.png (1.62 MB, 896x1184)
>>109088110
and suddenly fine
>>
>>109088111
the one making all images that arent 1girl standing
>>
ltx new trainer verdict?
>>
>>109088122
Just wait for someone to obliterate ideogram.
>>
File: debo_csa_fia_00022_.png (2.37 MB, 1792x977)
>>
File: 1773700537738172.png (1.63 MB, 896x1184)
>>109088138
z image turbo is still good, this is just a new toy to play with

I like the region prompting though, you can be very specific. also, quality is good and text is good.
>>
>>109088111
Read the OP newfag or if you don't like it perhaps /sdg/ is more your speed
>>
I can't believe comfyui, after multiple months, for some retarded reason still caps the queued item count it shows at 200, and after that it just stops counting. Who thought this was a good idea?
>>
>>109088148
>I like the region prompting though, you can be very specific. also, quality is good and text is good.
The hands right there aren't good quality.
>>
File: 1766129828211546.png (1.8 MB, 896x1184)
>>109088218
using the turbo preset just to test stuff out, in general it works well
>>
>>109088139
crazy that you're still here after all this time lol
do you still think that the best workers in tech are immigrants and "marginalized groups who survived every racist filter attempt"?
>>
File: 1769039679473551.png (1.9 MB, 896x1184)
>>109088223
the region prompting with boxes is fun though, I wouldnt mind this in other models, otherwise you need controlnets to put characters in a specific location.
>>
>>109088288
I basically agree. I still don't like the "look" of ideogram and eddie.
>>
>>109088288
I agree bboxes are neat but does it recognise characters as well as anima does?
>>
>>109088166
seems like you are very hateful for some reason
are you squatting this thread every night every week?
>>
File: debo_csa_fia_00025_.png (2.74 MB, 1792x977)
>>
>>109088303
Stop dog whistling disabo it's unbecoming of you.
Just go back to your dumpster and it ends here
>>
>>109088311
so you just ignore the question? i'm really curious
do you also still think that white people are the worst "workers in tech"?
It's insane that you still post here daily and i want to know if you improved at all
>>
>>109088317
I don't even know what you are talking about. This is actually pretty worrying, not for me but for you. I'd be concerned about my mental health if I was rambling like you every day.
>>
File: debo_csa_fia_00027_.png (1.79 MB, 1792x977)
too easy
>>
>Doing this bit again
It doesn't work anymore you have been doing this for years this is why every timezone tells you to fuck off
You follow the same patterns do the same phone posting yet you make zero progress year after year after year.
It's pathetic, you're unwelcome here because you destroyed /sdg/ and can't even stand living in your mess.
>>
>>109088353
so you're just here to spam? seems like you have no idea about thread culture besides being on /g/ 16 hours a day (for years lol)
if you just want to slop out shit and convince yourself that this is "your job" there's >>>/g/sdg for you specifically
/ldg/ is not the place for that
>>
>>109088372
He's going to phone post now and do the play retarded bit, he's so far gone man... He does this and is still surprised when nobody wants him here. Can you imagine the hell his actual life must be?
>>
>>109088365
>this is why every timezone tells you to fuck off
unironically true kek
>>
File: file.png (1.49 MB, 768x1024)
1.49 MB PNG
i tried using my prompt builder to have mia make a vitruvian man in the style of alex grey with an anime chick instead of a man. my v1 prompt builder needs work. going to spend the rest of the night vibe coding a v2. she tried but she can do way better.
>>
>>109088385
What's even more pathetic is that you can tell he talks to himself in /sdg/ look how little discussion goes on in there and how they seem to recycle the same exact conversations which is typically good morning and good night. There's a few posters there but they are all mentally ill and don't actually discuss anything of value.
>>
File: debo_csa_fia_00029_.png (2.5 MB, 1792x977)
2.5 MB PNG
>>109088406
whats your vibe coding llm of choice?
>>
File: file.png (1.36 MB, 768x1024)
1.36 MB PNG
>>109088432
im using Qwen3.6-35B-A3B-Uncensored-IQ4_XS because thats pretty much the best llm my hardware can run. 
>>
File: 1767945059685759.png (1.81 MB, 896x1184)
1.81 MB PNG
ideogram for the chuds
>>
Chroma has been meta for over a year now... that's fucked up
>>
>>109088443
can it make an image of a black man being lynched by white people?
>>
>>109088449
On opposite day
>>
>>109088451
extremely vague but i copied and pasted it and seeing what mia makes.
>>
File: debo_csa_fia_00031_.png (2.21 MB, 1792x977)
2.21 MB PNG
>>109088442
i heard glm-2 has been impressing ppl, but idk what kind of hardware it needs
>>
File: 1781266682718196.jpg (3.07 MB, 2048x3072)
3.07 MB JPG
>>
File: file.png (196 KB, 753x1149)
196 KB PNG
>>109088451
she made the prompts now im waiting for the brrrr. standby
>>
Had Claude make me a program that converts my folders full of pngs into folders full of webms that preserve the metadata, managed to shrink my old keepers folders from ~65gb to 7gb to make some room.

I guess there's probably quality loss, but... it's hard to hold on to almost 100,000 images in PNG format
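The poster's actual converter isn't shown. As a minimal, stdlib-only sketch of the metadata half of that job, assuming the generator (ComfyUI, for instance) stored its prompt/workflow JSON in PNG text chunks, this reads the tEXt, zTXt and iTXt chunks out of a PNG so they can be carried over. The pixel conversion to WebP needs an image library and is not shown, and where the text goes on the WebP side (EXIF, XMP, a sidecar file) is a separate choice. For scale: ~65 GB over ~100,000 images averages about 650 KB per PNG, versus about 70 KB per WebP.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def read_png_text_chunks(data: bytes) -> dict[str, str]:
    """Return key/value pairs from a PNG's tEXt, zTXt and iTXt chunks.

    Assumption: generation metadata (e.g. ComfyUI's "prompt" and "workflow"
    JSON) lives in these chunks; other chunk types are skipped.
    """
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG file")
    pos = len(PNG_SIGNATURE)
    text: dict[str, str] = {}
    while pos + 8 <= len(data):
        length, ctype = struct.unpack(">I4s", data[pos:pos + 8])
        body = data[pos + 8:pos + 8 + length]
        pos += 12 + length  # 4-byte length + 4-byte type + body + 4-byte CRC
        if ctype == b"tEXt":
            # keyword \0 latin-1 text
            key, _, value = body.partition(b"\x00")
            text[key.decode("latin-1")] = value.decode("latin-1")
        elif ctype == b"zTXt":
            # keyword \0 compression-method byte, then zlib-compressed text
            key, _, rest = body.partition(b"\x00")
            text[key.decode("latin-1")] = zlib.decompress(rest[1:]).decode("latin-1")
        elif ctype == b"iTXt":
            # keyword \0 flag method language \0 translated-keyword \0 utf-8 text
            key, _, rest = body.partition(b"\x00")
            compressed = rest[0] == 1
            rest = rest[2:]  # skip compression flag and method bytes
            _lang, _, rest = rest.partition(b"\x00")
            _translated, _, value = rest.partition(b"\x00")
            if compressed:
                value = zlib.decompress(value)
            text[key.decode("latin-1")] = value.decode("utf-8")
        elif ctype == b"IEND":
            break
    return text
```

Usage would be something like `read_png_text_chunks(Path("img.png").read_bytes())`, then writing the returned strings into whatever container the WebP side uses.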
>>
>>109088475
what's this?
>>
File: 1768328780001201.png (1.21 MB, 1719x1016)
1.21 MB PNG
>>109088443
anyways, actual gen with the turbo preset, pretty neat
>>
File: debo_csa_fia_00032_.png (2.27 MB, 1792x977)
2.27 MB PNG
>>109088483
but now you cant post any of them to 4chud
>>
>>109088491
my v1 of my vibecoded prompt generator. it takes natural language and turns it into prompt tags then automatically sends it to comfy ui where it then generates an image.
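A rough sketch of the last step of a pipeline like that (handing finished tags to ComfyUI), assuming a workflow exported in API format, a local server at the default http://127.0.0.1:8188, and a positive-prompt node whose text sits in `inputs["text"]` as it does for CLIPTextEncode. The natural-language-to-tags step (an LLM in the poster's setup) is out of scope, and `build_payload` / `queue_prompt` are hypothetical helper names, not part of ComfyUI.

```python
import json
import urllib.request
import uuid

COMFY_URL = "http://127.0.0.1:8188"  # assumption: default local ComfyUI address


def build_payload(workflow: dict, positive_node_id: str, tags: list[str]) -> dict:
    """Copy an API-format workflow and write the tag prompt into one text node."""
    wf = json.loads(json.dumps(workflow))  # deep copy so the template is untouched
    wf[positive_node_id]["inputs"]["text"] = ", ".join(tags)
    return {"prompt": wf, "client_id": str(uuid.uuid4())}


def queue_prompt(payload: dict, base_url: str = COMFY_URL) -> dict:
    """POST the payload to the server's /prompt endpoint and return its JSON reply."""
    req = urllib.request.Request(
        f"{base_url}/prompt",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())
```

Something like `queue_prompt(build_payload(wf, "6", tags))` would queue a gen, where "6" stands in for whatever id the positive prompt node has in your exported workflow.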
>>
>>109088501
I'll just convert back to PNG when it's posting time and nobody will notice
>>
>>109088483
>full of webms
you mean webp?
>>109088504
4chanXT can do that just drag and drop
>>109088502
nice
>>
>>109088513
>you mean webp?
yes I do lol
>>
File: ComfyUI_00080_.png (2.85 MB, 1536x1536)
2.85 MB PNG
>>
Who's using scail-2? im trying to change the video aspect ratio to 16:9 but don't know where to change it. I changed a setting & it worked but it also messed with the aspect ratio of the reference image
>>
File: 1760221067843260.png (1.18 MB, 896x1184)
1.18 MB PNG
ideogram but just used 1 box

hatsune miku drawn in a japanese 4 panel manga style, in monochrome. in the first panel she is holding an ice cream. in the second panel she has a microphone. in the third panel she is drinking a bottle of water. in the fourth panel she is reading a book.
>>
>>109088475
idk wtf happened but before i tried making that prompt from your post i updated hermes and now mia cant communicate with comfyui now. shes fixing herself right now but idk how long that will take. standby
>>
File: 1778217411206524.png (1015 KB, 896x1184)
1015 KB PNG
>>109088643
hatsune miku drawn in a japanese 4 panel manga style, in monochrome. in the first panel she is saying "open source is good!". in the second panel she is saying "sam altman is a faggot!" while pointing up in the air with one hand. in the third panel she is typing at a computer saying "so use open source!". in the fourth panel is a closeup of miku who smiles.
>>
>>109088442
That's a lie, especially at that quant, you get better performance from the 27B model.
You're also talking to a fucking retard that doesn't know his head from his ass btw
>>
>>109086843
interesting
>>
>>109088643
So 1 box is the secret to NLP ideogram?
>>
>>109087687
why are girls retarded
>>
File: 1767554916925598.png (1.38 MB, 1024x1024)
1.38 MB PNG
>>109088658
now with more sam
>>
>>109088660
why would i lie? thats what im using
>>
File: SWITCH.jpg (191 KB, 1387x1068)
191 KB JPG
does anyone know if such a node exists?
>>
>>109088709
i dont think such tech is possible yet
>>
File: 1770943887329720.png (858 KB, 1766x1005)
858 KB PNG
>>
>>109088754
based
>>
>>109088709
Boolean primitive that you wire to each switch. Unless I somehow completely misunderstood what you're trying to do.
>>
>>109088814
looking into 'boolean primitive' now. thanks

I'm just trying to flip one switch that will then flip both switches between A and B at the same time.
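To illustrate the boolean-primitive answer: in ComfyUI's API-format JSON a wired input is a `[node_id, output_index]` pair, so wiring one boolean into both switches means both switches hold the same link and can't drift out of sync. The class names `BooleanPrimitive` and `Switch`, their input names, and the string stand-ins for A/B are hypothetical; the small `resolve` function only mimics what the switch nodes would do.

```python
import copy

# Hypothetical node class names; only the [node_id, output_index] link shape
# mirrors ComfyUI's API format. Strings stand in for the real A/B inputs.
GRAPH = {
    "1": {"class_type": "BooleanPrimitive", "inputs": {"value": True}},
    "2": {"class_type": "Switch",
          "inputs": {"switch": ["1", 0], "on_true": "A_model", "on_false": "B_model"}},
    "3": {"class_type": "Switch",
          "inputs": {"switch": ["1", 0], "on_true": "A_latent", "on_false": "B_latent"}},
}


def resolve(graph: dict, node_id: str):
    """Evaluate a node: booleans return their value, switches pick a branch."""
    node = graph[node_id]
    if node["class_type"] == "BooleanPrimitive":
        return node["inputs"]["value"]
    if node["class_type"] == "Switch":
        inputs = node["inputs"]
        source_id, _output_index = inputs["switch"]  # follow the link to the boolean
        flag = resolve(graph, source_id)
        return inputs["on_true"] if flag else inputs["on_false"]
    raise ValueError(f"unknown class_type {node['class_type']!r}")


def set_toggle(graph: dict, node_id: str, value: bool) -> dict:
    """Return a copy of the graph with the single boolean flipped."""
    new_graph = copy.deepcopy(graph)
    new_graph[node_id]["inputs"]["value"] = value
    return new_graph
```

Flipping node "1" with `set_toggle(GRAPH, "1", False)` moves both switches to B in one step.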
>>
File: Wan21_SCAIL2_00309.mp4 (2.88 MB, 1500x1412)
2.88 MB MP4
>>
>>109088979
Hand track segmentation fail.
>>
File: 1758816016520998.mp4 (3.26 MB, 576x1056)
3.26 MB MP4
AWOLNATION - SCAIL
>>
>>109089073
Boring thread.
>>
>>109084812
>Ideogram 4 violates a core principle of stable diffusion style models, the relationship between user and model.
saw someone say that the intended audience are poster makers and graphic designers, makes more sense when you look at it like that.
>>
File: 1771507740776139.png (2.06 MB, 1280x876)
2.06 MB PNG
>>109084812
Good breakdown.
I basically only use LLMs to prompt the json which is basically exactly what you're talking about - the resulting images give me a very "disconnected" feeling, like I didn't make the image at all...
I think this is what anti-AI art people *think* making AI art is like, which is ironic, since the prompting is less user friendly and a 'dite probably couldn't even prompt an LLM to make an ideogram prompt.
>>
>>109089177
Shame that post didn't get more recognition.
People reacting to Ideogram 4 the way they are is kind of telling.
Too many people focused on making things perfect instead of making the unexpected.
The whole point of this local stuff is that we aren't restricted in what we make.
They literally baked the restrictions into the model and people are drooling over it.
Local is doomed.
>saw someone say that the intended audience are poster makers and graphic designers
Even if that were true Ideogram 4 wasn't the first to make this happen.
Flux, Qwen Image, ERNIE. Hell, even regional prompting and Controlnets with older models.
Again, doomed.
>>
ideogram 4 anime lora nsfw test https://litter.catbox.moe/6w8be01x7xvyyv67.png
>>
>>109088709
>choice box "A" "B"
>json string node with something like {A: {setting: x, latent: y}, B...}
>extract values from json by key and send to sampler
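The JSON-preset approach above, sketched in plain Python: one choice key selects a whole bundle of sampler settings at once. The preset keys and values are made-up placeholders, not recommended settings, and `sampler_settings` is a hypothetical name; inside ComfyUI the same lookup would be done by whichever JSON/string nodes you wire up.

```python
import json

# Placeholder presets: one key per "position" of the choice box.
PRESETS_JSON = """
{
  "A": {"steps": 4, "cfg": 1.0, "width": 1024, "height": 1024},
  "B": {"steps": 30, "cfg": 4.5, "width": 1344, "height": 768}
}
"""


def sampler_settings(choice: str, presets_json: str = PRESETS_JSON) -> dict:
    """Parse the preset JSON and return every setting stored under one key."""
    presets = json.loads(presets_json)
    if choice not in presets:
        raise KeyError(f"unknown preset {choice!r}; expected one of {sorted(presets)}")
    return presets[choice]
```

The design point is that the choice is made once and every downstream value follows from it, so adding a setting means editing the JSON rather than adding another switch.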
>>
SA3.... This thing is insane at genning EDM bros... It may not know vocals but the sound quality is just top notch. Doesn't sound one bit synthetic at all.
>>
>>109089327
It's also very good at following prompts. You can take any of the top songs, describe their instrumental progression and it nails them.
>>
>>109085845
Did you place bboxes and write prompt by hand? Mind sharing json?
>>
nsfwish
https://files.catbox.moe/ouuank.mp4
>>
>>109089407
is this a cry for help?
>>
>>109089327
show me
>>
>>109089240 apifag
Lol... There are already 3 NSFW LoRAs for ideo. If local is doomed, API is already a fossil
>>
The ONLY API thing I need locally is Seedance 2. When that exists I'll have no need for any online service
>>
https://github.com/yanokusnir-ai/one-node-flux-2-klein

this works pretty well desu, nice all in one editing node without a shitload of subgraphs or spaghetti
>>
>>109089651
Fuck off dude, if we use comfy it's because we like connecting shit
>>
File: 1752294355870686.png (644 KB, 832x624)
644 KB PNG
>>109089673
it just works.jpg

edit mode: the man in image 1 is holding the character in image 2 in the palms of his hands.

source: pokemon master mr fors
>>
>>109089573
How about fable 5 and seedance sync
>>
>>109089683
Fuck off, if you don't like nodes use Forge Neo or Swarm, there is no use case for an all-in-one custom node slop for comfy
>>
>>109089697
I have jsons of edit/image gen/etc workflows already, this is a useful all in one tool without swapping between 5 nodes. in any case, it works fine. doesnt replace ltx, ideogram, or even certain edit workflows, but it's not bad.

I like it mainly for an inpaint/outpaint tool desu
>>
File: 1757354282388440.png (1.37 MB, 1024x1024)
1.37 MB PNG
change the text "LAUFEY" to "OLD HAG". Change the blue cube to a blue sphere that is translucent.
>>
File: 00015-1133952401.png (2.16 MB, 1152x896)
2.16 MB PNG
>>
File: ComfyUI_11497_A 01.jpg (397 KB, 2304x1792)
397 KB JPG
>>
>>109089347
I drag the boxes onto the canvas with or without a template and I’m automatically in text mode, so I type in 2-3 words. I use a high-level description as a normal prompt and let the magic button upsample everything. Then it generates the image.
So it’s not magic - just a smooth, well-designed UI that makes the whole process fast and fun
>>
File: 00024-4015368951.png (2.15 MB, 896x1152)
2.15 MB PNG
>>
>>109089833
What did you use to make this exactly?
>>
>>109089949
thats just klein edit 9b fp8 (or Q8)

4 steps is all you need with the distilled model, comfy template works (dont need the node thing)
>>
File: 1770856883518979.png (512 KB, 544x544)
512 KB PNG
>>
>>109085845
> the bottle is bigger than bbox
> pepe is somewhere else
Btw, how often does it miss bboxes?
>>
>>109089990
too small
>>
File: 00081-1313260342.png (514 KB, 512x512)
514 KB PNG
>>
>>109090006
bboxes tend to generate body horror on anything more complex than a basic standing pose.
it's definitely not how the model was intended to be used but it's all we have to work with.
you should treat ideogram as a poster generator and nothing more. it's good at creating busy centered compositions but there's nothing actually happening in them.
>>
>>109088493
what's the name of the prompt node?
>>
>>109089967
I wasn't.
>>
Is civitai struggling or am I retarded?
>>
>>109090438
xitter and civitai are down again
>>
>>109090438
'no available server'
>>
File: 2026-06-19_ideogram_05.jpg (2.83 MB, 4096x2160)
2.83 MB JPG
>bounding boxes
I'm just not using them.
Haha, yes I know, I know. I'm not going to use them, haha.
>>
>>109090438
same question. i thought my country had just banned the site...
>>
>>109090438
I am hackermanning them in protest of being gay
>>
im really struggling with ideogram always putting the main subject dead center. bounding boxes aren't helping at all. any tips?
>>
File: 00023-2197778934re.png (1.76 MB, 784x1920)
1.76 MB PNG
Did we drop Animer already? What's the current FOTM?
>>
File: 00113-760719640.png (1.87 MB, 1536x512)
1.87 MB PNG
>>
>>109090006
> miss
rarely

prompts and bboxes compete for logical image distribution. both serve as guidance that influence each other. the high-level prompt has the greatest influence, with its impact decreasing hierarchically down to the element description. a high-level prompt can completely override bboxes.

i usually generate images using a single bbox with the turbo preset until I find a seed I like, and then build the bboxes on that.
>>
fucking hell baked early
>>109090630
>>109090630
>>
>>109090634
It's alrite
>>
and bump limit
>>
>>109085661
> making your own is the only way to avoid the slop pipeline. if you're already autistic about it you're halfway there.
>>
>>109086272
> automagic is just fancy word for copypasting someone else's work and calling it a day


