/g/ - Technology

File: square-witch.png (105 KB, 498x498)
Hello /g/, is there any way AI can be used to sort my anime pics collection? I have a folder where I've been saving random pics for a couple of years now and it has thousands of them; sorting it manually would probably take me months. Bonus points if you know of any tool that can be run locally on Linux to get the job done, but I'll take anything.
>>
>>106656940
sort it how?
I have used digikam in the past to delete duplicates and group similar pics together, but it's slow af
>>
>>106656940
deepdanbooru-tagger
use google
>>
>>106656940
your parents must be really proud of you
>>
>>106656940
i'm working on something that does this.
it just uses free API calls to google gemini: send it a photo and it shits out json like this
{
  "standards": [
    "Screenshot",
    "Video Game"
  ],
  "image_content": {
    "subjects": [
      "Soldier aiming a rifle in a video game"
    ],
    "objects_identified": [
      "soldier",
      "rifle",
      "helmet",
      "tactical vest",
      "gloves",
      "grass",
      "trees",
      "hills",
      "sky"
    ],
    "people": {
      "count": 1,
      "description": [
        "A soldier aiming a rifle"
      ]
    },
    "text_visible": [],
    "activity_and_events": [
      "A soldier aiming a rifle"
    ]
  },
  "visual_properties": {
    "image_type": "Screenshot",
    "composition": [
      "Close-up",
      "Over the shoulder"
    ],
    "photographic_style": [],
    "artistic_style": [
      "Photorealistic"
    ],
    "dominant_colors": [
      "Green",
      "Brown",
      "Beige"
    ],
    "lighting": [
      "Natural Light"
    ]
  },
  "context_and_mood": {
    "setting": [
      "Outdoors",
      "Hills",
      "Daytime"
    ],
    "mood_and_atmosphere": [
      "Serious",
      "Focused"
    ],
    "abstract_themes": [
      "Military",
      "Combat",
      "Warfare"
    ]
  },
  "internet_culture": {
    "is_meme": false,
    "meme_format": [],
    "source_media": [],
    "humor_type": [],
    "subculture_and_community": [
      "Gaming"
    ]
  },
  "origin_and_platform": {
    "platform": "Unknown",
    "ui_elements_present": []
  },
  "content_advisory": {
    "is_nsfw": false,
    "potential_sensitivities": [
      "Violence"
    ]
  },
  "file_info": {
    "original_filename": "marksmen_recoil.jpg",
    "md5_hash": "46f2a4cedc96f5ce7419809a694eb5c7"
  }
}

now that can get added to a sqlite3 database. a web browser frontend can act as your search and selector.
the hardest part is coming up with an organization system beforehand that you want all future images to adapt to.
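
A minimal sketch of the sqlite3 step above, in Python. The table layout (`images`, `objects`) and the helper names are made up for illustration and are not the poster's actual schema. It only indexes `objects_identified` and `is_nsfw`, and keeps the full JSON as text so other fields can be indexed later.

```python
import json
import sqlite3

def open_db(path=":memory:"):
    """Create a hypothetical schema: one row per image, one row per object tag."""
    db = sqlite3.connect(path)
    db.executescript("""
        CREATE TABLE IF NOT EXISTS images (
            hash     TEXT PRIMARY KEY,   -- file hash taken from file_info
            filename TEXT NOT NULL,
            is_nsfw  INTEGER NOT NULL,
            raw_json TEXT NOT NULL       -- full model output, kept for re-parsing
        );
        CREATE TABLE IF NOT EXISTS objects (
            hash TEXT NOT NULL REFERENCES images(hash),
            name TEXT NOT NULL,
            PRIMARY KEY (hash, name)
        );
    """)
    return db

def add_record(db, record):
    """Store one JSON record shaped like the example above."""
    info = record["file_info"]
    with db:  # commits on success, rolls back on error
        db.execute(
            "INSERT OR REPLACE INTO images VALUES (?, ?, ?, ?)",
            (info["md5_hash"], info["original_filename"],
             int(record["content_advisory"]["is_nsfw"]), json.dumps(record)),
        )
        db.executemany(
            "INSERT OR IGNORE INTO objects VALUES (?, ?)",
            [(info["md5_hash"], name.lower())
             for name in record["image_content"]["objects_identified"]],
        )

def find_by_object(db, name, allow_nsfw=False):
    """Return filenames of images tagged with the given object."""
    rows = db.execute(
        """SELECT i.filename FROM images i JOIN objects o ON o.hash = i.hash
           WHERE o.name = ? AND (i.is_nsfw = 0 OR ?) ORDER BY i.filename""",
        (name.lower(), int(allow_nsfw)),
    )
    return [r[0] for r in rows]
```

A web frontend would then just call something like `find_by_object(db, "rifle")` behind its search box.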
>>
>>106658084
I'm currently doing something similar to replace my old system.

My notes:
Don't use md5 hashes, go with sha512. I haven't had an md5 collision yet but they could be manufactured.

Use tags, way more flexible.
Also, for the content advisory, the openai content moderation api is free. And if you don't mind "sharing" your "conversations", you can get a few million tokens for free each month there too.

Also, if you have local hardware with at least 8gb, the joycaption model can provide additional information that even larger models don't have.

I know openai doesn't let their models identify real people in images, maybe Gemini allows it idk but joycaption has no issues.
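
On the sha512 note above, a rough sketch of hashing a folder with Python's standard `hashlib` and grouping byte-identical files. The function names `sha512_of` and `find_duplicates` are made up for this example. Note this only catches exact duplicates, not resized or re-encoded copies.

```python
import hashlib
from collections import defaultdict
from pathlib import Path

def sha512_of(path, chunk_size=1 << 20):
    """Hash a file in 1 MiB chunks so large videos don't need to fit in RAM."""
    h = hashlib.sha512()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def find_duplicates(folder):
    """Map each sha512 digest to the files sharing it, keeping only real duplicates."""
    by_hash = defaultdict(list)
    for p in sorted(Path(folder).rglob("*")):
        if p.is_file():
            by_hash[sha512_of(p)].append(p)
    return {h: paths for h, paths in by_hash.items() if len(paths) > 1}
```

The hex digest can also serve as the primary key in the database instead of the md5 field.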
>>
File: 1758015792038127.png (47 KB, 190x240)
>>106658316
>I know openai doesn't let their models identify real people in images, maybe Gemini allows it idk but joycaption has no issues.
it does but only their good one, gemini-pro. i have got it identifying people in "niche" photos. pic related. so it's definitely doing some image data mining to get you the exact correct details.

and yeah tagging is the way to go. but combining both. like using the json for tag-categories and then just filling those in as you go, with the initial suggestions from the AI.

the reason you want a really refined json system like that is that it ensures the same types of pictures are actually going to match. if you depend on just keyword tagging, pepe might be "green frog" in one image and "frog cowboy hat" in the next one, not really categorized well. but with a json system, as u can see, it's got shit like is_nsfw booleans, humor_type, is_meme. that structured categorization is super powerful
but it limits you a lot because gemini-pro probably won't be cheap to run on your 10k images
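
One way to combine both ideas, sketched in Python: flatten the structured JSON into `category:value` tags, so the fixed categories stay but everything is still searchable as plain tags. The function names and the rule of skipping `file_info` and numeric fields are assumptions made for this sketch.

```python
def iter_tags(record):
    """Yield 'category:value' tags from a nested record like the JSON above."""
    for key, value in record.items():
        if isinstance(value, dict):
            yield from iter_tags(value)      # descend into nested sections
        elif isinstance(value, bool):
            if value:
                yield key                    # e.g. "is_nsfw", only when true
        elif isinstance(value, list):
            for item in value:
                if isinstance(item, str):
                    yield f"{key}:{item.strip().lower()}"
        elif isinstance(value, str):
            yield f"{key}:{value.strip().lower()}"
        # numbers (e.g. the people count) are skipped in this sketch

def to_tags(record, skip=("file_info",)):
    """Return the sorted, de-duplicated tag list, ignoring bookkeeping sections."""
    trimmed = {k: v for k, v in record.items() if k not in skip}
    return sorted(set(iter_tags(trimmed)))
```

Because the category is part of the tag, `humor_type:...` and `subculture_and_community:...` never collide even when the values overlap.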
>>
>>106658316
No-one is going to manufacture an md5 hash collision just so you accidentally delete an anime picture
>>
>>106658439
>but it limits you a lot because gemini-pro probably won't be cheap to run on your 10k images
the way i'm going to deal with it is just queuing and batching. see how many i can do a day and design a system to track success/fails.
see if it's viable to run for a month and then sure why not, let it do it for free in the background and any new images get queued too.
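
A possible shape for that queue, as a Python sketch on top of sqlite3. The table layout, the `daily_limit` and `max_attempts` parameters, and the `tag_image` callback (whatever function actually calls the API) are all hypothetical.

```python
import sqlite3

def open_queue(path=":memory:"):
    """Create a hypothetical queue table with per-file status tracking."""
    db = sqlite3.connect(path)
    db.execute("""
        CREATE TABLE IF NOT EXISTS queue (
            path     TEXT PRIMARY KEY,
            status   TEXT NOT NULL DEFAULT 'pending',  -- pending | done | failed
            attempts INTEGER NOT NULL DEFAULT 0,
            error    TEXT
        )""")
    return db

def enqueue(db, paths):
    """Add new files; files already queued are left untouched."""
    with db:
        db.executemany("INSERT OR IGNORE INTO queue (path) VALUES (?)",
                       [(p,) for p in paths])

def run_batch(db, tag_image, daily_limit, max_attempts=3):
    """Process up to daily_limit unfinished files, least-attempted first."""
    rows = db.execute(
        "SELECT path FROM queue WHERE status != 'done' AND attempts < ? "
        "ORDER BY attempts, path LIMIT ?", (max_attempts, daily_limit)).fetchall()
    for (path,) in rows:
        try:
            tag_image(path)
            status, error = "done", None
        except Exception as exc:          # record the failure and move on
            status, error = "failed", str(exc)
        with db:
            db.execute(
                "UPDATE queue SET status = ?, attempts = attempts + 1, error = ? "
                "WHERE path = ?", (status, error, path))
    return len(rows)

def summary(db):
    """Count queue entries per status."""
    return dict(db.execute("SELECT status, COUNT(*) FROM queue GROUP BY status"))
```

Run `run_batch` once a day from cron or a timer, and call `enqueue` whenever new images land in the folder.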
>>
File: 1737734508522436.jpg (64 KB, 727x364)
>>106658084
pic related btw
>>
>>106658439
>if you depend on just keyword tagging, pepe might be "green frog
I already considered this two years ago and have a solution. You can generate the word embeddings for each tag, or rather for the words/tokens in each tag, and store those in a vector database to enable a very fast, relatively accurate distance measurement. So even though an image might be incorrectly tagged "green frog" instead of "pepe", if the word embedding is decent they won't be too far apart in the n-dimensional space.
However, existing word embeddings might not be sufficient for this. After all, they might not associate pepe with just the frog and might instead tend to group it with something else.
For that, we can include another mechanism: from the captions, character names, descriptions etc. that the models output, we can extract how often certain words occur together and thus associate words like frog with pepe even if a pre-existing word embedding doesn't know this. If we combine that with a fuzzy search on your text input, you can have a powerful search anyway.
And if you have an nsfw tag then you can still have a binary filter for that even if it's not in a fixed json structure.
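
A toy Python sketch of the co-occurrence part, using only the standard library. It counts which words show up together in captions, treats each word's count row as a crude vector for cosine similarity, and uses `difflib.get_close_matches` for the fuzzy lookup. This swaps real pretrained embeddings and a vector database for plain co-occurrence counts, and the function names and cutoff value are made up for illustration.

```python
import difflib
import math
from collections import Counter, defaultdict
from itertools import combinations

def cooccurrence(captions):
    """Count how often two words appear in the same caption."""
    counts = defaultdict(Counter)
    for caption in captions:
        words = sorted(set(caption.lower().split()))
        for a, b in combinations(words, 2):
            counts[a][b] += 1
            counts[b][a] += 1
    return counts

def cosine(u, v):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm = (math.sqrt(sum(x * x for x in u.values()))
            * math.sqrt(sum(x * x for x in v.values())))
    return dot / norm if norm else 0.0

def related(word, counts, cutoff=0.6, top=3):
    """Fuzzy-match `word` to a known word, then list its most frequent companions."""
    match = difflib.get_close_matches(word.lower(), list(counts), n=1, cutoff=cutoff)
    if not match:
        return []
    return [w for w, _ in counts[match[0]].most_common(top)]
```

With captions like "pepe green frog" and "pepe frog cowboy hat", a typo such as "peppe" still resolves to "pepe" and brings up "frog" as its strongest association.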

Also, I'm up to 44k images and short videos now xD

>>106658444
Never say never, this entire concept is already autistic, I'm sure there's another autist that would spend the time to find a hash collision just to fuck other autists over. I'd do it.


