/g/ - Technology

Discussion of Free and Open Source Text-to-Image/Video Models and UI

Prev: >>106464276

https://rentry.org/ldg-lazy-getting-started-guide

>UI
ComfyUI: https://github.com/comfyanonymous/ComfyUI
SwarmUI: https://github.com/mcmonkeyprojects/SwarmUI
re/Forge/Classic: https://rentry.org/ldg-lazy-getting-started-guide#reforgeclassic
Chromaforge: https://github.com/maybleMyers/chromaforge
SD.Next: https://github.com/vladmandic/sdnext
Wan2GP: https://github.com/deepbeepmeep/Wan2GP

>Checkpoints, LoRAs, Upscalers, & Workflows
https://civitai.com
https://civitaiarchive.com/
https://tensor.art
https://openmodeldb.info
https://openart.ai/workflows

>Tuning
https://github.com/spacepxl/demystifying-sd-finetuning
https://github.com/Nerogar/OneTrainer
https://github.com/kohya-ss/sd-scripts/tree/sd3
https://github.com/derrian-distro/LoRA_Easy_Training_Scripts
https://github.com/tdrussell/diffusion-pipe

>WanX
https://rentry.org/wan22ldgguide
https://github.com/Wan-Video
https://alidocs.dingtalk.com/i/nodes/EpGBa2Lm8aZxe5myC99MelA2WgN7R35y

>Chroma
https://huggingface.co/lodestones/Chroma1-Base
Training: https://rentry.org/mvu52t46

>Illustrious
1girl and Beyond: https://rentry.org/comfyui_guide_1girl
Tag Explorer: https://tagexplorer.github.io/

>Misc
Local Model Meta: https://rentry.org/localmodelsmeta
Share Metadata: https://catbox.moe | https://litterbox.catbox.moe/
GPU Benchmarks: https://chimolog.co/bto-gpu-stable-diffusion-specs/
Img2Prompt: https://huggingface.co/spaces/fancyfeast/joy-caption-beta-one
Samplers: https://stable-diffusion-art.com/samplers/
Txt2Img Plugin: https://github.com/Acly/krita-ai-diffusion
Archive: https://rentry.org/sdg-link
Bakery: https://rentry.org/ldgcollage

>Neighbours
https://rentry.org/ldg-lazy-getting-started-guide#rentry-from-other-boards
>>>/aco/csdg
>>>/b/degen
>>>/b/realistic+parody
>>>/gif/vdg
>>>/d/ddg
>>>/e/edg
>>>/h/hdg
>>>/trash/slop
>>>/vt/vtai
>>>/u/udg

>Local Text
>>>/g/lmg

>Maintain Thread Quality
https://rentry.org/debo
>>
Blessed thread of frenship
>>
They tell me this thread is blessed
>>
neta is the future for anime start learning now
https://neta-lumina-style.tz03.xyz/
>>
>>106469516
>>106469514
all of God's children are blessed by his grace
>>
File: WanVideo2_2_I2V_00302.webm (565 KB, 1248x720)
>>
>>106469536
>.xyz
Oh yes, anon, this link is definitely trustworthy.
>>
File: 1739662154288883.png (897 KB, 1068x950)
>>106469565
Seems fine
>>
>normalfags calling other normalfags out as being too eager to use the word "clanker" as a slur
ai bros stay winning
>>
>>106469612
I didn't realize styxhexenhammer was this big
>>
File: AnimateDiff_00265.mp4 (2.01 MB, 720x1280)
>>
File: 1756562176106549.gif (1.56 MB, 500x500)
is there any hope for local t2v? except for making schizo videos, local t2v is obsolete. only local i2v is excellent for serious things
>>
>>106469625
It's pretty good for porn with porn LoRAs but yeah most videos end up like a fever dream.
>>
>>106469625
I mean, we’re basically in the Ford Model A era of this stuff and here you are writing off the future of cars. Bit silly, don’t you think?
>>
>>106469621
i don't know who that is or how he relates
>>
>>106469625
It's good for producing extremely plausible realistic single frames from the video.
Like if candid realism is your goal, I'd just extract a frame from i2v
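
A rough sketch of that frame grab, assuming ffmpeg is installed and on PATH. The function names, file names and timestamp are placeholders for illustration, not anyone's actual workflow.

```python
import subprocess
from pathlib import Path


def frame_grab_cmd(video: str, timestamp: float, out_png: str) -> list[str]:
    """Build an ffmpeg command that saves one frame at or just after `timestamp` seconds."""
    return [
        "ffmpeg",
        "-y",                       # overwrite the output if it already exists
        "-ss", f"{timestamp:.3f}",  # placed before -i so ffmpeg seeks in the input
        "-i", video,
        "-frames:v", "1",           # stop after writing a single video frame
        out_png,
    ]


def grab_frame(video: str, timestamp: float, out_png: str) -> Path:
    """Run ffmpeg and return the output path; raises CalledProcessError on failure."""
    subprocess.run(frame_grab_cmd(video, timestamp, out_png), check=True)
    return Path(out_png)
```

Something like grab_frame("gen.mp4", 2.5, "frame.png") would then write the still as a PNG, which avoids adding another round of lossy compression on top of the video codec.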
>>
Anti AI fags have gone fucking nuts the last few months.
>>
File: 2.2 t2v _00183.mp4 (526 KB, 480x880)
what's the best tool for automated video captioning? I want to try training a t2v lora but I don't wanna caption my entire dataset by hand. I'm trying chatgpt but it can't caption videos for shit.
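
Whatever model ends up writing the captions, the bookkeeping side can be sketched like this. It assumes the trainer reads a same-named .txt file next to each clip, which is a common convention but should be checked against your trainer's docs. The extension set and function names are made up for illustration.

```python
from pathlib import Path

VIDEO_EXTS = {".mp4", ".webm", ".mkv", ".mov"}  # assumed set, adjust for your dataset


def missing_captions(dataset_dir: str) -> list[Path]:
    """Return video files that have no same-named .txt caption beside them."""
    root = Path(dataset_dir)
    return sorted(
        p for p in root.iterdir()
        if p.suffix.lower() in VIDEO_EXTS and not p.with_suffix(".txt").exists()
    )


def write_caption(video: Path, caption: str) -> Path:
    """Save `caption` as a sidecar .txt next to `video` and return its path."""
    out = video.with_suffix(".txt")
    out.write_text(caption.strip() + "\n", encoding="utf-8")
    return out
```

Looping over missing_captions(...) and calling write_caption on each result means a run that gets interrupted can be resumed without redoing clips that already have captions.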
>>
>>106469652
>the last few months.
That's an understatement
>>
>>106469625
remember how anon hyped Wan T2I? shame the diaperfag decided to tune qwen instead. maybe the bigasp guy will do it.
>>
>>106469658
Gemini. It shits all over the other models for OCR stuff.

For example. >>106469558

Based on the video provided, here is a description of what happens:

The video features two characters from the TV series Star Trek: The Next Generation: the android Lieutenant Commander Data (on the left) and Lieutenant Commander Geordi La Forge (on the right). They are seated side-by-side in the cockpit of what appears to be a shuttlecraft.

The sequence of events is as follows:

Data holds up a blue and white package resembling a pack of cigarettes.

He removes a single cigarette and places it in his mouth.

The tip of the cigarette spontaneously lights up on its own.

Data takes a drag from the cigarette and then exhales smoke.

Throughout this entire action, Geordi La Forge looks at Data with a surprised and bewildered expression.
>>
File: 44406930.mp4 (3.7 MB, 960x544)
>>
>>106469536
>Boris Vallejo
Looked up his style and... that's quite sad. Of course, nano banana can pull off his style just fine.
>>
love me ambiguous language
>>
File: WanVideo2_2_I2V_00303.webm (399 KB, 1248x720)
According to Gemini

In this video:

The scene opens with Captain Picard looking thoughtful in his ready room. In the background, the Vocaloid star Hatsune Miku, now impressively dressed in a red Starfleet command uniform, walks into the room holding a Starfleet PADD (Personal Access Display Device).

She stops and stands behind the captain. Then, in a move that would surprise even the Borg, Picard picks up a 20th-century-style handgun from his desk and calmly examines it.

It's a fantastic edit, but a few small clues give it away:

The Starfleet Guest Star: As you know, Hatsune Miku, while a cultural icon here in Japan and worldwide, never actually made it onto the Enterprise-D's crew roster.

The Firearm: The biggest giveaway is the prop. Starfleet exclusively uses energy-based weapons like phasers. A projectile-based handgun like that is a massive anachronism and would be completely out of place in Picard's ready room, unless he was in the middle of a Dixon Hill holodeck program.

This is another great example of combining real footage with a digitally inserted character to create a completely new, surreal, and humorous narrative.
>>
>>106469755
Isn't Gemini paid? Do you use it with some local client like kobold or SillyTavern? I'm trying to find a good uncensored LLM for images
>>
>>106469755
those are not good captions
>>
File: 3818249416.jpg (2.03 MB, 1920x2703)
>>106469705
>Two elf female characters with pointed ears. The character in the foreground has long, wavy blonde hair, light skin, and wears a blue, off-shoulder dress with white fur trim. She has a worried expression and blue markings on her forehead. The character behind her has long, orange hair, light brown skin, and wears a sleeveless purple top. She has green markings on her face and is embracing the blonde-haired character from behind, with one arm around her shoulder. The background is a textured, dark green and blue gradient, resembling a forest or cave
>>
>>106469887
basterd bitch delete this
>>
File: 1728175481459494.png (133 KB, 491x549)
What's the most viable captioning method for deviantart-tier freak fetish stuff?
>>
should i get my lazy ass out of bed and finish installing wan? how long do gens take with a 5090
>>
>>106469998
they'll take no time at all you fuckin ass
>>
File: 3388629620.jpg (1.86 MB, 1920x1999)
>>106469705
>>106469887
>>
>>106469998
sure / depends on settings, but on the order of a few minutes with many settings. you can do 1 minute gens at a not too terrible resolution if you take the fast options with 4 steps or so.
>>
>>106470011
my bad man, i genuinely got no clue on this shit
>>
>>106470024
thanks anon
>>
>>106469998
it takes me roughly 4-5 minutes for a 720p 8 second video
>>
>>106470020
Just like my futa doujins!
>>
>>106469998
2.2 is so good it convinced me to try training video loras.

>>106469701
thanks bruv. got everything captioned, surprised it let me do them all for free
>>
File: 775498300.png (1.05 MB, 1536x640)
>>
File: ComfyUI_temp_slaev_00002_.jpg (880 KB, 1664x1152)
Damn, AI inventing new instruments.
>>
File: 3285549005.png (903 KB, 1536x640)
>>106470133
very mongolian
>>
>>106470141
I fed Gemini a Batzorig video screenshot lol
>>
>>106469883
Yeah but I didn't prompt it how to caption. I was just more interested to see if it could identify what was happening in the video at all.
>>
>>106470085
>surprised it let me do them all for free
np. I assume they do it as a means to capture audience share. Their free stuff is very generous. I honestly just pay for gemini as my GPUs are usually blasting away at training most of the time anyway. It's the best model for captioning in my opinion. And the fact google made veo 3 should indicate as much.
>>
>>106469887
>>106470020

>Reference image

Literally just prompt for the guy
>A caveman carrying a wounded woman while pointing a spear a large flying bird over them while sitting atop a rocky hill by Boris Vallejo

https://files.catbox.moe/zcos9q.jpeg

Local would never.
>>
did comfy fix the shitty qwen edit text encode node?
>>
>>106470186
every fix breaks two more things. python was a mistake
>>
File: ComfyUI_temp_rccnj_00005_.png (2.49 MB, 1152x1664)
>>
File: 1729387153496869.png (1.52 MB, 1024x1024)
>>106470176
Do you have some special version lol? I get this on nano which isn't even close.
>>
File: 3209517266.png (1.36 MB, 1152x896)
>>106470176
Didn't think it would recognize it. Flux can't really handle a more complex composition like that.
>>
File: WanVideo2_2_I2V_00304.webm (1.66 MB, 1248x720)
>>
>>106470235
Not bad. Unlike the original, nano banana can't show me booba, so a Chroma LoRA would win anyway.
As for the results you're getting with nano banana, no idea what you're using. I can get his likeness right away even across other seeds.
>A caveman with a shield standing atop a rocky hill while goblins are incoming. A woman kneels beside him by Boris Vallejo

https://files.catbox.moe/ak75k4.jpeg


