/g/ - Technology




File: y-u-no-bake.png (840 KB, 832x1216)
/ldg/ - Local Diffusion General

Discussion of free and open source text-to-image models

Previous /ldg/ bread : >>102610271

>Beginner UI
Fooocus: https://github.com/lllyasviel/fooocus
EasyDiffusion: https://easydiffusion.github.io
Metastable: https://metastable.studio

>Advanced UI
Forge: https://github.com/lllyasviel/stable-diffusion-webui-forge
Automatic1111: https://github.com/automatic1111/stable-diffusion-webui
ComfyUI: https://github.com/comfyanonymous/ComfyUI
InvokeAI: https://github.com/invoke-ai/InvokeAI
SD.Next: https://github.com/vladmandic/automatic
SwarmUI: https://github.com/mcmonkeyprojects/SwarmUI

>Use a VAE if your images look washed out
https://rentry.org/sdvae

>Model Ranking
https://imgsys.org/rankings

>Models, LoRAs & training
https://civitai.com
https://huggingface.co
https://aitracker.art
https://github.com/Nerogar/OneTrainer
https://github.com/derrian-distro/LoRA_Easy_Training_Scripts
https://github.com/kohya-ss/sd-scripts/tree/sd3

>Flux
https://huggingface.co/spaces/black-forest-labs/FLUX.1-schnell
https://comfyanonymous.github.io/ComfyUI_examples/flux

>Pixart Sigma & Hunyuan DIT
https://huggingface.co/spaces/PixArt-alpha/PixArt-Sigma
https://huggingface.co/comfyanonymous/hunyuan_dit_comfyui
Nodes: https://github.com/city96/ComfyUI_ExtraModels

>Index of guides and other tools
https://rentry.org/sdg-link
https://rentry.org/rentrysd

>Try online without registration
txt2img: https://www.mage.space
img2img: https://huggingface.co/spaces/huggingface/diffuse-the-rest
sd3: https://huggingface.co/spaces/stabilityai/stable-diffusion-3-medium

>Maintain thread quality
https://rentry.org/debo

>Related boards
>>>/aco/sdg
>>>/aco/aivg
>>>/b/degen
>>>/c/kdg
>>>/d/ddg
>>>/e/edg
>>>/h/hdg
>>>/tg/slop
>>>/trash/sdg
>>>/u/udg
>>>/vt/vtai
>>
Blessed thread of frenship
>>
i can't believe you've done this
>>
I can't believe you used my 1girl as the OP. I did not allow that.
>>
>>102622526
Fair use. I'm going to use it in my 48B model. Any suggested tags?
>>
>
>>
>>102623030
fkey, as that was the lora I used
>>
are there any good non shit ui's out yet? reforge starts bugging out for me during long inpainting sprees and comfyui is comfyui
>>
>>102623379
Invoke
>>
File: 2388284649.png (851 KB, 1152x896)
>>
New model
https://github.com/THUDM/CogView3
>>
>>102623591
no ComfyUI implementation yet desu
>>
File: 3884501246.png (3.71 MB, 1920x1536)
>>
>>102623700
why not today
>>
>pixart pride month over and no new pixart
>>
>>102623591
Quite possibly the worst instructions I've seen (in English, at least) by any recent devs.
Words do not mean the objects they are described as representing.
The Cog group needs to get its shit together, be more organised, and have someone proofread their docs. Skill issue.
>>
>>102623591
>https://github.com/THUDM/CogView3
Chicom spyware
>>
File: 0.jpg (283 KB, 1024x1024)
>>
File: 0.jpg (334 KB, 1024x1024)
>>
File: file.png (1.2 MB, 720x720)
here's an emu3 image I found on /lmg/ kek
https://github.com/baaivision/Emu3
>>
https://github.com/ToTheBeginning/PuLID/commit/eb4004cfcc4c7611c9cc56a68c190754b9d03df1
>release PuLID-FLUX-v0.9.1 model in 2024.10
we'll get a better pulid model for flux soon, nice, but we still can't make it work on ComfyUI goddamn :(
>>
>>102624182
kek
>>
How to run Flux.1-dev Upscaler ControlNet on comfy-ui?
>>
>
>>
>>102624435
what are you stuck on?
>>
>>102622356
A rare no collage edition
>>
File: bComfyUI_114620_.jpg (1.37 MB, 3072x1536)
there a mech lora yet?
>>
Genned 1500 or so images overnight. Let's go see if any are worth posting
>>
>>
bigma status?
>>
File: bComfyUI_122235_.jpg (305 KB, 768x1152)
>>102625486
post the shit ones too sometimes they are funny
>>
>>102625535
Our time will come a-gain
>>
>>102625446
that looks pretty good desu
>>
>>102625536
ok
>>
File: bComfyUI_114626_.jpg (1.44 MB, 3072x1536)
>>102625651
yeah but it's like 1 out of 10 gens that'll look alright and not have fucked up proportions or 20+ weapons on it making it look goofy
>>
>>102625659
Neat
>>
>>102625659
kinda looks like a mannequin
>>
>>102625446
>>102625690
Kino as heck my nigger
>>
>>
>>102625530
booooba
>>
I wonder if Next Token with VQ VAE with Transformers is the real future. Get off this diffusion noise bus.
>>
>>102626417
does this next token architecture thing means we won't need a text encoder anymore because it's inside the architecture already?
>>
>>102626437
No, you would still use the text encoder; the difference is that you predict images as tokens instead of denoising. You also get multi-modal stuff built in, like "change this from blue to red" edits or "describe this image" visual captioning, without major changes to the architecture, because everything is done in tokens.
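A minimal sketch of that prediction loop (hypothetical module names, greedy decoding for brevity; real samplers use top-k/temperature, and this is not any particular model's code):

import torch

def generate_image_tokens(transformer, text_tokens, n_image_tokens=1024):
    # text tokens condition the sequence; image tokens are predicted one
    # at a time, then a VQ decoder turns them back into pixels
    seq = text_tokens  # (1, T) conditioning prefix
    for _ in range(n_image_tokens):
        logits = transformer(seq)  # (1, len(seq), vocab_size)
        next_tok = logits[:, -1].argmax(-1, keepdim=True)  # greedy pick
        seq = torch.cat([seq, next_tok], dim=1)
    return seq[:, text_tokens.shape[1]:]  # keep only the image tokens

# image = vq_decoder(generate_image_tokens(model, encoded_prompt))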
>>
>>102626475
>you predict images by tokens instead of denoising.
I think this is a big deal when you think about it, the denoising process always was some hack because we don't know how to solve the diffusion equation problem, going for tokens resolves that issue and we won't have to deal with millions of cope samplers kek
>>
>>102626519
> won't have to deal with millions of cope samplers kek
>tfw ive been stockholm-syndromed into enjoying the choice of different samplers
>>
File: 0.jpg (420 KB, 1024x1024)
>>
>>102626406
so true
>>
Any new text-to-video processes that are better than Comfy+AnimateDiff? They're so short for me, can't seem to get more than 16-64 frames in my current workflow.
>>
>>102627225
nice

>>102627035
good shit man
>>
>>102622356
shoulder to hip ratio says "cock and balls behind the curtain"
>>
>>102627413
that makes it even better
>>
>>102627463
It gives it a twist, but I dunno what to think about those tits now
>>
>>102627413
I just jerked off too...
>>
>>102627252
CogVideoX?
>>
>>102627505
PS OR, how about
>fake, ceremonial tits that adhere to skin under the bra etc
The priestess is a man, b/c god requires a bride to run the temple or something. Not far from some bullshit that actually happened in Syria.
>>
>>102627529
haven't tried it, loading it up now, thanks!
>>
File: 0.jpg (188 KB, 1024x1024)
>>102627337
>>
slow monday
>>
>>102628101
post some gens
>>
>>102628119
You
>>
File: file.png (204 KB, 3571x837)
https://huggingface.co/alibaba-pai/CogVideoX-Fun-V1.1-5b-Pose/blob/main/README_en.md
looks like they updated CogVideoX-Fun
>>
>>102628122
i'll do it if u do it
>>
>>102628155
Good 4 u
>>
im scared
>>
>>102628172
why
>>
>>102628228
aliens are talking to me
>>
>>102628414
Tell 'em to STFU.
>>
File: 0.jpg (136 KB, 1024x1024)
>>
>>
>no new toys for anon
>>
>>102629146
he barely played with the toys he currently has, spoiled little shit
>>
Why doesn't llama.cpp statically link rocm? And why isn't there a binary version so I don't have to shit up my local install with amd's notoriously vile amdgpu scripts?
>>
>>102629478
sorry, wrong thread
>>
/lmg/ won
>>
>ComfyUI uses a web interface
>browsers are vram pigs
>>
>>102629511
can't you disable that?
>>
File: catbox_rmia1w.jpg (1.28 MB, 1728x1344)
>>102629146
illustriousxl
https://files.catbox.moe/2vtome.png
>>
>>102629723
that's really nice ty for catbox
>>
>>102629734
https://civitai.com/models/811067/illustrious-xl-smoothft
the specific checkpoint
and this lora seems to generically increase quality at low power but im not married to it
https://civitai.com/models/798443/some-style-for-illustrious-xl
>>
>>102629511
I've found the manager node (essential really) kills FPS as well. It's bullshit.
>>
>>102629797
What's the point of the new model if the base quality is so bad that you need a lora to fix it?
>>
File: 72601.jpg (490 KB, 1944x1504)
something something big bara tiddies
>>
>>102629882
enhance the bulge
>>
>>102629865
the base model is underbaked but only just so, and it's new enough that we're in the early schizo phase of plugging in different tech sorcery to see what works
>>
File: pic.jpg (970 KB, 1430x1800)
>>
>>102629949
what happened
>>
>>102627225
What model/lora(s) is this?
>>
>>102630143
flux dev nf4.

here's the workflow:
https://files.catbox.moe/mhpv01.png
and this is the image I used for img2img:
https://files.catbox.moe/q9iu2m.png

and this post explains the workflow somewhat: >>102610322
>>
>>102630204
can you post more pics from your batch you did last night
>>
>>102630242
ok maybe a few more
>>
>>
>>102630204
So good
>>
>>
>>
>>102629954
My cat ran over my keyboard.
>>
>>102630295
>>102630287
>>102630274
cool stuff man, some of em give me a nostalgic vibe from msn messenger webcam days
>>
anyway now I'm revisiting an old prompt idea that never quite turns out the way I want it to—but the idea itself is stuck in my head, so occasionally I return to it. Someday I'll figure out how to prompt what I'm looking for
>>
honk mimimimimi...... hoonk... mimimimimi
>>
File: 0.jpg (200 KB, 1024x1024)
>>
File: 0.jpg (277 KB, 1024x1024)
>>
>>102629949
I've seen her before
>>
>still like two years from doing this locally
Why live
>>
>>102630922
erm, what the sigma?
>>
>>102630922
nice, catbox?
>>
>>102630922
anon
>>
>>
>>
how do we make stuff similar to Nijijourney using local? I see all these images on X and it makes me so jealous
>>
>>102630331
I never thought of it as nostalgia, but maybe I'm just 100 years old now and out of touch. I guess most people's images haven't been that shitty in over a decade.
>>
>>102631223
for some reason it just reminded me of the shitty webcams of that period. the good times in msn add threads on /b/ from 2005-2008 that were popular back then.
>>
>>102622356
What model is this? Flux? Catbox?
>>
>>102631181
Loras are the closest you'll get and they're still piss poor imitations. Iktf though anon, iktf
>>
>>102631181
Ikr, Midjourney Niji can make beautiful drawings, I love their shadows; that's the model that can truly measure up to good artists and doesn't look like AI slop
>>
File: file.png (1.54 MB, 1024x1024)
https://reddit.com/r/StableDiffusion/comments/1ft9kjw/cogview3plus3b_really_great_prompt_comprehension/
>CogView-3Plus-3B
Yeah... it doesn't look great at all
>>
>>102630988
https://files.catbox.moe/uoq7kk.jpeg
>>102631044
Ye?
>>
>>102631644
>https://files.catbox.moe/uoq7kk.jpeg
kek i should have expected that
>>
>>102631538
pitiful
>>
>>102631538
>that plastic look
the chinks trained their model with a shit ton of AI pictures innit?
>>
File: ComfyUI_21616_.png (1.82 MB, 1040x1520)
SD3 kind ok sometimes
no bumchin in site at least
>>
>>102631764
wow how did i misspell sight lol oops
>>
>>102631538
where do you even actually download this shit
>>
>>102631710
it kinda looks like maybe it needs a different sampler. Hunyuan is like that, it's absolutely horrendous dogshit at photographic gens with like Euler SGM Uniform and shit, but if you switch to SDE-based stuff it's 1000x better
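In ComfyUI that's just the sampler dropdown on the KSampler node; for reference, the equivalent swap in diffusers would look something like this (a sketch only; the model id is just an example, and the SDE scheduler needs torchsde installed):

from diffusers import StableDiffusionXLPipeline, DPMSolverSDEScheduler

pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0")  # example model id
# swap the default scheduler for an SDE-based one
pipe.scheduler = DPMSolverSDEScheduler.from_config(pipe.scheduler.config)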
>>
>>102631788
https://github.com/THUDM/CogView3
>>
So where's the demo?
>>
File: file.png (1.66 MB, 850x1170)
>>102631764
>kind ok sometimes
>>
>>102631826
there's none, and now it's obvious why, it looks like shit
>>
>>102631811
definitely operator error. the examples on the github aren't anything to write home about but they're not as bad as that post
>>
>>102631844
>the examples on the github
maybe they're from the regular base model and not the "Plus" one? Idk what's the difference between the 2 of them lol
>>
>>102629506
In thread faggotry competition.
>>
>>102631827
I think this is my fault DESU, I had the denoise too high during hi-res fix; looking at it, the original gen didn't have the double finger. I didn't even catch that though lol
>>
File: 00337.png (553 KB, 576x1024)
>>
File: 00343.png (452 KB, 576x1024)
>>
>>102630204
>enormous boobs titcow. freshman year!! my friend min jee park, ugh she is the worst roommate so messy, but her body wow! ugly butterface korean. I want make sex to her chubby cleavage and huge fat booty we go to ucla together, hourglass figure hotness holy cow. first year at UCLA in the Dykstra Hall dorm. She's probably going to get into kappa delta which is honestly perfect for her. the tube top is aerie and the cheeky shorts are abercrombie
kek
>>
File: IMG_0340.jpg (1.03 MB, 1125x1367)
>>102631684
I have no idea what else you could have been asking for
>>
>>102632232
i was just joking but it gave me a laugh, nice cock bro

>>102632214
so this is prompt engineering
>>
File: 00350.png (473 KB, 576x1024)
>>
>>102632232
Prompt?
>>
right now I'm getting some very fake-looking women because my prompt is too ambiguous and I can see it yo-yoing back and forth from artstation slop style to anime to 3D digital art to photograph etc as it gens, usually eventually settling on a photograph but sometimes ending up some kind of photoshopped hybrid thing

>>102632214
the thing about a good base model is that you can say a few things that you know work and then spend the other 80% of the prompt trying shit that probably doesn't do much, but you hope it will—in other words you can just have fun with it, and sometimes it just werks

>>102632178
nice, we love some plump homely gals don't we
>>
>>102632645
to be clear, I don't "like" this style and wasn't aiming for it, but I do find it neat as a natural consequence of denoising being performed in steps with the prompt's ambiguity being resolved at each step, each time potentially in a different direction than before.
>>
File: 0.jpg (148 KB, 1024x1024)
>>
>>
>>102632822
face is a little munged, but i like the style
>>
File: 00353.png (628 KB, 768x1024)
>>102632645
>plump homely gals
Indeed
>>
>>102632546
>a hospital tray on a grubby tile counter. on the tray, a staten island charcuterie board.
(It’s a real photo sorry)
>>
>>102632853
the face only looks funny because of the video game lighting on the nose. It's a quirk of the style. Other than that it's too small to see if there's anything wrong with it, I think it looks basically fine
>>
File: 00356.png (644 KB, 768x1024)
>>
>>
File: file.png (117 KB, 256x256)
All I do is train now. Now I want to try training a VQ-VAE. Also training a 1B Pixart 16-channel VAE model. Now that the rumor is the 5090 will have 32 GB of VRAM (maybe 48 GB for a Titan RTX AI), I'm pretty hyped.
>>
>>102631764
it can look better than that even
>>
File: 00370.png (463 KB, 1024x768)
>>
>>102633221
>VQ-VAE
What's that?
>>
dead general
>>
hibernation mode
>>
>>102630922
>two years
Too optimistic
>>
File: 00379.png (539 KB, 576x1024)
>>
File: 00380.png (567 KB, 576x1024)
>>
Is there anything for comfy that will let me view checkpoint / lora info saved in a txt document? It's annoying trying to figure out which loras have activation texts and what those are
>>
>>102634191
I just keep a list in Obsidian.
>>
https://pytorch.org/blog/pytorch-native-architecture-optimization/
will this work on gguf quants? that looks interesting
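From that blog, usage looks roughly like this (a hedged sketch; exact names may shift between torchao versions). It rewrites regular PyTorch weights in place, which is also why it can't consume GGUF files as-is:

import torch
from torchao.quantization import quantize_, int8_weight_only

model = torch.nn.Sequential(torch.nn.Linear(4096, 4096)).cuda()
quantize_(model, int8_weight_only())  # swaps Linear weights to int8 in place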
>>
Official pixart bigma waiting room
>>
5090 waiting room
>>
I'm calling it now. The 5090 will draw 1200 watts.
>>
To purchase a 5090, you will require a notarized letter from a master electrician attesting your intent to comply with local, state, and federal guidelines on the installation of industrial equipment.
>>
To purchase a 5090, you will be required to read the nda aloud, while the ai assesses the authenticity of your performance.
>>
>>102634588
there was a leak that was saying that the 5090 would be 32gb + 600W
>>
>>102634618
Sounds underpowered, hopefully that's the 5080
>>
File: 00002-2584379894.png (1.52 MB, 1152x1632)
>>
cough
>>
>>102634466
This is basically bitsandbytes made by Pytorch with NIH syndrome. Since that doesn't play nice with GGUFs, there is no reason this will either until someone works on it.
>>
The 5090 will be the first card designed during covid, won't it be?
>>
so are we gonna migrate back to /sdg/ or what? this is a little too dead for me
>>
>>
>>102635048
0/10, better luck next time.
>>
>>102635048
I'd rather one post per month than deal with the insufferable avatarfags desu
>>
>>102635048
I downloaded a bunch of pornstar loras. I'm saying Flux is so back.
>>
>>102635127
true
>>
Anyone made a local chatbot for prompting? I'm running Mistral-Small-Instruct-2409-IQ3_M and I'm trying to make it as versatile as possible. Struggles with booru tags.
>>
>>102635496
What are you trying to do?
Prompt expansion?
Mistral might not have seen the tags during training.
>>
did comfy break in the last 10 hours while I was asleep and you all fixed it immediately?
>>
nvm, pip broke for some fucking reason
>>
Is it possible to gen with zluda on rx570/580 4gb? How scuffed will the experience be? Is it much better than directml? Thinking about trying fooocus, will it be friendly with vram usage? 4gb is probably atrociously low for ai slop.
>>
>>102636647
quick search on google and browsing the github/reddit links that popped up says you can gen on it. if you can get it working, good luck man.
>>
I'm trying to train a style (artist) lora for SDXL. I've tried some settings from guides, tried some presets made for kohya_ss, also tried to use the same settings from existing working lora metadata.
But the results look suspiciously similar - lora samples are horribly distorted, without any artist-specific features. Loss graphs look similar too, with a quick drop to 0.07-0.06 and then just fluctuating there the whole time (with random spikes).
I just don't understand what the fuck I'm doing wrong...
>>
>>102636027
Just something I can bounce ideas and prompts off. It can be steered pretty well with a starting message, and prompt knowledge can be expanded with SillyTavern's world lore settings. It knows the structure and what words to avoid, but it repeats itself way too much
>>
>>102636928
Did you try it with basic Prodigy settings? Did you use tags? You better make it save every X epochs so you can properly test if it works or not with the actual checkpoint
>>
>>102637339
>but It repeats itself way too much
i think that happens when you got the instruct format wrong. mistral did some really confusing changes to the whitespaces around [INST] [/INST], even i'm not sure if i have the right instruct format or not, try asking >>>/g/lmg/
>>
just came back from pooping, did anything interesting happen while i was gone?
>>
>>102637548
Gemma2 might be better for it
>>
>>102637545
Yes, I tried Prodigy too. It worked pretty much the same.
I use captions, I use WD14 with removal of low level tags like bow, underwear, etc. and also prune artist specific tags.
I already save it every epoch and generate samples every 200 iters.

Here's a note - almost all guides say you don't need 20-50 epoch trains and you can get working lora in about 2000-4000 iters. Right now I'm trying to train on a dataset of 200 images with 2 iterations for 50 epochs to test if it's underfitting.
>>
>>102637685
If you want to train just style you could try training without captions
>>
>>102633627
It's a VAE that outputs tokens, which could be useful for transformer-based models that predict tokens rather than denoising.
>>
desu waiting for the next habbening
>>
>>102637685
Quick test with prodigy: lower the dataset to 50-60 images, 800 steps, batch 2, gradient 2, save every 5 epochs
>>
File: bComfyUI_119104_.jpg (218 KB, 1024x1024)
>>
aCHOO *sniff*
>>
File: ComfyUI_temp_zuxtq_00007_.png (3.18 MB, 1704x1280)
>>
>>102638656
impressive
>>
File: ComfyUI_temp_zuxtq_00016_.png (3.44 MB, 1704x1280)
>>
>>102635048
Anon in cryosleep waiting for Bigma
>>
haven't been here for about a month, any good Flux news? new controlnets, finetunes, anything?
mainly looking for:
>inpaint finetune
>upscaling that actually remains consistent vs. typical tiling
>facial image prompting so i don't have to train loras and curate a dataset for specific people
>>
Someone tell them to stop making giant loras
https://www.reddit.com/r/StableDiffusion/comments/1ftmapd/ultrarealistic_lora_project_flux/
>>
trying to switch to raw sd-scripts training and i've been cleaning up the script i'm working off.

Traceback (most recent call last):
File "F:\sd-scripts\sdxl_train_network.py", line 185, in <module>
trainer.train(args)
File "F:\sd-scripts\train_network.py", line 512, in train
accelerator.print("running training / \u5b66\u7fd2\u958b\u59cb")
File "F:\sd-scripts\venv\lib\site-packages\accelerate\accelerator.py", line 1086, in print
self.state.print(*args, **kwargs)
File "F:\sd-scripts\venv\lib\site-packages\accelerate\state.py", line 970, in print
PartialState().print(*args, **kwargs)
File "F:\sd-scripts\venv\lib\site-packages\accelerate\state.py", line 696, in print
print(*args, **kwargs)
File "C:\Program Files\Python310\lib\encodings\cp1252.py", line 19, in encode
return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode characters in position 19-22: character maps to <undefined>


i googled it and i guess the file isn't in utf-8 which is what sd-scripts wants, but i feel like just converting the file encoding is a shitty solution, anyone else run into this?
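One workaround (a sketch, not the sd-scripts-sanctioned fix) is to force Python's stdout/stderr to UTF-8 before the prints happen, instead of editing the file:

import sys

# reconfigure the standard streams to UTF-8 (Python 3.7+)
sys.stdout.reconfigure(encoding="utf-8", errors="replace")
sys.stderr.reconfigure(encoding="utf-8", errors="replace")

# or, before launching the script, set PYTHONUTF8=1 (or
# PYTHONIOENCODING=utf-8) in the environment so the Windows
# console stops defaulting to cp1252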
>>
>>102639032
>giant model
>not giantess
>>
>>102639120
actually i just saved it in utf-8 in notepad so i dunno lol
>>
>>102638993
no
>>
>>102639120
The cp1252 tells me there is some Windows involved ;) you could pass an encoding to the open(), but if you didn't write it yourself you might have a less bad time just converting the file tho.
>>
>>102639120
\u5b66\u7fd2\u958b\u59cb
yummy
>>
>>102640129
yeah i did some more looking at there are a bunch of info messages it's trying to print in jp which causes it to bug out for some fuckin reason
>>
>>102640190
because you don't have jap installed, just delete them
>>
How do I use loss masks with sd-scripts? Like, do I put the masks in their own folder, name them the same as the training images, then specify that folder in the dataset config or something? What's the command for it? The readme on kohya just talks about controlnet and I'm too stupid to get what it means.
>>
>>102640208
good idea, the message it was trying to print was "starting training," lol
>>
>>102640208
it shouldn't need to be installed for utf to work, my guess is his string (internally utf-8) gets converted into win-1252 somewhere, which obviously fails.
>>
>>102640390
win10 issue maybe
>>
File: ComfyUI_34194_.png (1.46 MB, 848x1024)
>>
File: ComfyUI_34218_.png (1.31 MB, 848x1024)
>>
>>102640190
does this thing log?
>>
File: ComfyUI_34219_.png (1.09 MB, 848x1024)
>>
>>102640390
it's a python bug because the terminal doesn't support the string; it's not the "code", it's the terminal throwing the error
>>
>>102640419
i didn't see it. the scripts have en messages and jp messages listed for each thing it tries, so i imagine there's something early on that's supposed to detect the system lang and print it in the appropriate one, but didn't, so it defaulted to the creator's lang? anyway, just deleting the jp text like anon suggested fixed it.
>>
>>102640400
>>102640415
>>102640423
good stuff
>>
>>102640465
>so i imagine there's something early on that's supposed to detect the system lang and print it in the appropriate one
Nope, it just prints both
>>
>>102640462
or it's this, i'm no programmer
>>
>>102640399
python 3 has funky ideas about text, you see: internally it's utf, output is shits and giggles if it's windows. Old python would just ignore it and move on; with 3 it throws.
>>
>>102640462
I'd say the bug is in the code; python just did what it was told. If it can't map a character it has to react, and the default is to throw
>>
>>102640612
RETARD IF THE TERMINAL THROWS AN ERROR THAT IS PYTHON'S PROBLEM
kys
I don't know why it's hard for faggots like you
>>
>>102638724
nice
>>
>>102640677
the terminal displays what python throws, yes ;)
>>
When you load the SapianF Flux finetune with fp16 T5, you get bluescreened in your face.
>>
>>102640998
it's time for memtest, bucko
>>
>>102641015
My mem is fine, kthxbai
>>
>>102640998
>SapianF
>The dataset for males now contains 175 images, and the female dataset now consists of 75 images
>>
>>102641096
Finally some decent dicks.
>>
>>102638287
Tried this, 50 images with captions, batch 2, grad accumulation 2, 800 iters total.
Result is pretty much the same...
>>
File: 00564-920720096.jpg (1.01 MB, 1620x2160)
>>102641393
If settings are good then it has to be the dataset. Prodigy overfits if anything
>>
>>102641827
too flat
>>
File: file.png (519 KB, 638x803)
it really is crazy watching an AI learn how to make colors, there really should be a study about how it learns compared to biology, because it always starts with blobs and darks and lights and slowly things get coherent, color gets added, details, etc
>>
>>102636928
>lora samples are horribly distorted

Too much or too little variation in the dataset, too high learning rate, too high output resolution, and too high or too low network dimension can cause this problem.

My first suggestion is to add noise or blurriness to your images. There's a setting to add noise called "Noise offset" but I usually do it manually.
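If you do it manually, a sketch of what that could look like (hypothetical paths, sigma is something you'd tune; as I understand it, the trainer's built-in noise offset perturbs the latent noise during training, whereas this perturbs the source pixels):

import numpy as np
from PIL import Image

def add_noise(src, dst, sigma=8.0):
    # add mild gaussian noise to a training image before training on it
    img = np.asarray(Image.open(src).convert("RGB"), dtype=np.float32)
    noisy = img + np.random.normal(0.0, sigma, img.shape)
    Image.fromarray(np.clip(noisy, 0, 255).astype(np.uint8)).save(dst)

# add_noise("dataset/0001.png", "dataset_noisy/0001.png")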
>>
>>102642001
>Too much or too little variation in the dataset
Is there any specific requirements for style datasets?
>>
File: 00007-736784397.png (265 KB, 512x512)
Illustrious is pretty bad so far. Doesn't seem to even compare to Pony V6
>>
>>102641953
That is pretty interesting. PixArt?

>>102642092
512x512 might be too low resolution for it
>>
>>102640218
in case anyone else needs this in the future, /hdg/ gave me: https://rentry.org/d2ckzxmq
>>
>>102642225
That's the VQ-VAE learning image reconstruction
>>
>>102642063
No, as far as I know. You have to figure it out yourself.
>>
>>102642416
there is no checkpoint, just vae alone is doing that?
>>
>>102642542
Yeah, it's a from-scratch VQ-VAE being trained on raw images, but really all it is is a neural network compression system. It learns how to compress an image down into tokens and how to turn those tokens back into an image. Give those tokens to a transformers network and you do next-token prediction conditioned on caption tokens, and now you have a text-to-image model. Also, since you're working with image tokens, you can do interesting image-to-image stuff, because we're just working with tokens and trying to predict with tokens. So you could give it something like <img_tokens> "Change their shirt blue" <txt_tokens> as a prompt and it would be able to do it.

One of the hard parts is getting the VAE trained though and apparently VQ-VAEs can be temperamental and are actually quite old. I'm starting with 20m parameters with cross and multi-head attention on 256px patches and seeing how it goes because it needs to more or less produce 99% accurate reconstructions.
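For the curious, the VQ bottleneck itself is small; a minimal sketch with the straight-through estimator (hypothetical sizes; the real model described above adds the encoder/decoder, attention, and the codebook/EMA loss terms):

import torch
import torch.nn.functional as F

class VectorQuantizer(torch.nn.Module):
    def __init__(self, n_codes=8192, dim=256):
        super().__init__()
        self.codebook = torch.nn.Embedding(n_codes, dim)

    def forward(self, z):  # z: (B, N, dim) encoder output
        flat = z.reshape(-1, z.size(-1))                   # (B*N, dim)
        idx = torch.cdist(flat, self.codebook.weight).argmin(-1)
        idx = idx.view(z.shape[:-1])                       # (B, N) token ids
        q = self.codebook(idx)                             # quantized vectors
        commit = F.mse_loss(z, q.detach())                 # commitment loss
        q = z + (q - z).detach()                           # straight-through grad
        return q, idx, commit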
>>
>>102630791
Kirsten Dunst in "Interview With The Vampire"
>>
>>102642675
So it's the same tech that's being used with this https://github.com/buaacyw/MeshAnything ? Cool stuff, can't wrap my head around it.
>>
>>102638724
this is nuts. Flux?
>>
>>102642840
Ultimately it's just turning things into matrices and running some predefined formulas, it's really just really temperamental cooking.
>>
Pulid on Flux seems to be working on ComfyUi now, let's test that out:
https://github.com/balazik/ComfyUI-PuLID-Flux
>>
>>102643100
how's the testing going?
>>
File: file.png (712 KB, 3840x1739)
>>102643204
not good lol, I'll try the regular workflow he provided to see if it works his way
>>
File: file.png (1.8 MB, 3388x1563)
>>102643204
>>102643100
Ok ok, it's working, here's my workflow for those interested
https://files.catbox.moe/raseau.png
>>
>>102643328
>https://files.catbox.moe/raseau.png
TY anon
>>
File: file.png (1.91 MB, 1024x1024)
>>102643100
I'm not gonna lie that's pretty good, too bad it doesn't seem to work with cfg > 1 on that node
>>
>>102643100
>V0.1.0: Working node with weight, start_at, end_et support (attn_mask not working)
>attn_mask not working
What does that mean?
>>
>>102643490
>doesn't seem to work with cfg > 1
DOA
>>
>>102643581
it's supposed to be working at CFG > 1; I guess this node is just a proof of concept, he'll improve on that for sure
>>
File: file.png (1.5 MB, 1024x1024)
>>102643100
kek, for a setup that has an early version of PuLID (we'll get a new version in October), without mask attention and with only CFG = 1, that's not that bad
>>
Are there any good UIs that can talk to a backend API and don't install CUDA etc? I'm using AMD on Windows (yes, I know...)
I've got great performance using stable-diffusion.cpp and ROCm, but it looks like every UI out there is like "And now install pytorch" which I can't do...
Surely someone has made something that just calls an API endpoint instead of needing to run everything locally?
Worst case I can implement the StableHorde API and pretend to be a horde of 1, but that looks painful compared to running a /txt2img endpoint.

Btw I tried DirectML and it's dog slow compared to ROCm
>>
File: file.png (1.58 MB, 1024x1024)
>>102643490
>>
>>102643708
have you tried a web browser?
>>
>>102643708
The reason we have these problems is that none of the programmers own AMD cards, and they barely care about them. AMD has very minor market share.

AMD cards are vastly more powerful than nvidia ones, however.
>>
I only need 3 gaming computers.

1. for Flux, genning all the time
2. llm for someone to talk to about my day and stuff
3. for gaming while I think of something to say
>>
File: file.png (1.14 MB, 1024x1024)
>>102643733
it also works fine with loras, for example I went for Wind Waker Lora here
>>
File: file.png (951 KB, 1024x1024)
>>102643809
>>
File: file.png (2.52 MB, 1024x1024)
>>102643877
>>
>>102643737
My AMD GPU is in my desktop gaming PC which is running windows. I don't see how a web browser helps here.
>Put your gpu in another computer
>Dual boot linux
No.

>>102643753
In my experience my 7900XTX underperforms a 4090 but at a much better price point - I'm getting ~1.10 it/s on SDXL, for reference. But having a "cheap" 24GB card lets me just about run SDXL and a quantised 8b LLM model at the same time which is pretty cool.

But I'm not really complaining "Wah AMD no worky" - AMD works fine for the actual image generation, it's just frustrating that all the UIs seem to want to bundle the entire stable diffusion ecosystem and CUDA into their backend instead of being able to use an API endpoint running somewhere else (i.e. my own jury-rigged setup)

Anyone know of anything or do I have to build it myself / rip out the internals of A1111 and replace with my own implementation? I'm lazy so I'd rather not
>>
File: 0.jpg (206 KB, 1024x1024)
>>
File: file.png (2.15 MB, 1024x1024)
>>102643907
>>
>>102643544
have faith in man from china
>>
File: file.png (2.03 MB, 1024x1024)
>>102643947
when using PuLID it kinda destroys the text abilities though, can't even write "OWN" correctly kek
>>
>>102643914
>In my experience my 7900XTX underperforms a 4090
Yeah, but it's going through a translation layer. Basically, the xtx is emulating the 4090, which is crazy.
>>
>>102644075
It's not emulating it, I think? HIP/ROCm just provide a very similar (but legally different) API to CUDA, see src/ggml-cuda/vendors/hip.h in the ggml source code (used by stable-diffusion.cpp). So it is running natively, but I think the compute kernels just aren't as optimised as the ones in CUDA and the 7900XTX doesn't have as many cores as a 4090 either.
>>
>>102643914
>I just want an API (for a hosted AI)
That's what web browsers are for.
>>
>>102644122
My understanding is some vram is used by rocm to translate what it can't do natively. Was I wrong?
>>
>>102644042
Just wait for the 5090 and Titan AI releases and you're going to see a lot small but excellent 1B models.
>>
>>102644195
I'll tell biden-harris to ban the 5090 because it's used by sexists to emulate abuse of women.

:^)
>>
>>102644206
The powers that be like AI given Nancy Pelosi was for blocking the anti-AI bill in CA. So I think I'll be fine.
>>
>>102644122
HIP is the runtime that actually does the compute, ROCm is the stack. Nvidia calls everything CUDA which is the difference. Intel has a similar naming philosophy to AMD, they call their stack oneAPI but their actual runtime is using SYCL or their HIP equivalent which is Level-Zero. And yes, you are spot on.
>>
DON'T vote for Trump - vote for ME - I will hold public lynchings of furries before every football game.
>>
>>102644214
:^)

she has no idea what any of this is. kammy is a Swiftie
>>
>>102644236
Kammy will do what the powers that be want, you fucking retard. AI is cheap labor so it will be allowed; in case you missed the memo, the Uniparty doesn't care about anything except grifting the populace.
>>
>>102644178
That's not what I said. I *have* an API that wraps stable-diffusion.cpp and provides the /txt2img API endpoint of A1111 - What I am looking for is a nice piece of software that can use that API, that I can use to play around with various parameters and models, without having to keep track of everything myself and calling the API manually/via a script.

Yeah, I'd love to use a web browser to connect to a web-based UI that uses my API as its backend, but as far as I can tell no such UI exists. At least, that's the question I'm asking - Does such a UI exist?

I'm currently using SillyTavern for this because it is *somewhat* serviceable, but the limitations (1 image at a time, doesn't record the seed or the full prompts) are beginning to be a pain.
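For what it's worth, a sketch of a tiny client against an A1111-style /sdapi/v1/txt2img endpoint that records seeds; the payload/response fields follow A1111's API, so your wrapper may differ:

import base64, json, requests

def txt2img(prompt, url="http://127.0.0.1:7860"):
    payload = {"prompt": prompt, "steps": 20, "seed": -1,
               "width": 1024, "height": 1024}
    r = requests.post(url + "/sdapi/v1/txt2img", json=payload, timeout=600)
    r.raise_for_status()
    data = r.json()
    info = json.loads(data["info"])  # A1111 puts the seed/params in here
    for i, b64 in enumerate(data["images"]):
        with open("gen_%s_%d.png" % (info.get("seed", "x"), i), "wb") as f:
            f.write(base64.b64decode(b64))

# txt2img("a cozy cabin in the woods, 35mm photo")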
>>
File: 0.jpg (143 KB, 1024x1024)
>>
File: file.png (1.84 MB, 1024x1024)
>>102643490
kek
>>
>>102640415
Nice
>>
File: bComfyUI_124109_.jpg (703 KB, 1440x1080)
>>
>>102644220
I don't think so, I think that amd has to translate lots of commands, because of how the code works.
>>
>>102644254
heheheheheheheheheh

she's gonna ban profits dude
>>
>>102623700
period only 3 months out of the year? hmmm
>>
File: linux003.png (292 KB, 2640x1036)
>>102645099
It's quite easy to see what their stack is doing, the code is open source and they have architecture diagrams. Pic related. There is no translation involved if you aren't using CUDA/HIP translation. If you are, then yes, but only because you aren't running the ML code optimally.
>>
>>102645404
Is hip too confusing without rocm help?
>>
>>102645404
It's not, it's just being used as a qualifier here. As I said in >>102644220, ROCm is just being used as a way to indicate that the package belongs there; but unlike with CUDA, where the runtime and stack aren't really differentiated, here they are.
>>
File: file.png (2.21 MB, 1024x1024)
>>
Has anyone benchmarked rocm on nvidia hardware for a comparison?
>>
>>102645568
do one with the penis arm lora
>>
>>102645621
kek, go for it anon, everyone has access to this thing
>>
If you trained schnell on https://huggingface.co/datasets/nyanko7/danbooru2023 it would basically be ponyxl right?
>>
>>102645652
it would be way better, schnell is a way better base model than SDXL
>>
File: file.png (2.31 MB, 1024x1024)
>>102644456
>>
File: file.png (1.86 MB, 1024x1024)
>>102643100
it works better if you go for the celebrity's name instead of going for "A man" or "A woman" because flux knows them a bit so it helps
>A man juggles a virus with his tennis racket, wears a hat that says “Novax”, and wears a t-shirt that says “DjoCovid”.
>>
https://youtu.be/cPVGs0_fu1U?t=213
this is insane, the chinks are getting too stronk
>>
>>102645729
>examples

lmao sucke
>>
>>102645729
>closed source
>>
>>102645729
>stronk
Go back from which you came
>>
>>102645729
>>
File: 0.jpg (236 KB, 1024x1024)
>>
File: file.png (1.87 MB, 1024x1024)
>>102643100
bruh
>>
>>102645877
Is this the behind the scenes from Tom and Jerry banned episodes?
>>
>>102645898
more like behind the scenes from Diddy's party kek
https://www.youtube.com/watch?v=i2CMpVf_e4o
>>
File: file.png (1.9 MB, 1024x1024)
>>102645877
>>
File: file.png (1.2 MB, 1024x1024)
>>102643100
Now that we can do Celebrities in an instant, Flux is fun again, yayyy
>>
File: file.png (1.56 MB, 1024x1024)
>>102643100
kek
>>
File: file.png (1.93 MB, 1024x1024)
>>102645987
>>
File: file.png (1.19 MB, 1024x1024)
>>102645946
>>
https://xcancel.com/LumaLabsAI/status/1840820602296320083
>Welcome to the era of Hyperfast video generation: with 10x faster inference, you can now generate full-quality Dream Machine v1.6 clips in under 20 seconds. No "turbo" or "distilled" models - just uncompromised quality.
How did they do that? If we had this secret sauce we wouldn't have to wait minutes for CogVideoX kek
>>
Next Bred:

>>102646216
>>102646216
>>102646216


