/g/ - Technology




File: 1769315636523.jpg (65 KB, 960x895)
Is there an AI agent I can use to search my entire meme folder and find the meme/screencap containing the text I'm looking for? I have like 160,000 meme images.
>>
yes this blog steps you through it

https://danielvanstrien.xyz/posts/2024/11/local-vision-language-model-lm-studio.html
>>
I'm thinking of a straightforward program that's just a window or sliding pane on the desktop where you can just ask "find the image with X and Y" and it'll open it for you in Explorer.
It would just plug in some cheap-to-run Gemma model with vision, index the folders you want, and nothing else.

I might actually vibe code something like that if it's possible and it doesn't exist.

>>108864944
I skimmed over it, reads like he's just trying to sort files through the LLM, not actually searching through them.
>>
>>108863866
Excire, but it's paid
>>
>>108865023
>he's just trying to sort files through the LLM, not actually searching through them
I want you to have a critical think here about how each of these tasks is achieved, and see if you can perhaps spot the reason this method can achieve the requested aim.
>>
>>108865023
>model with vision
Too expensive for what it is, unless you want a description of the visuals in the image, not just text recognition.
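For the text-recognition-only route, a rough sketch of what the index could look like: OCR each image once, store the text in SQLite, then search the stored text. The `ocr` callable is a placeholder and every function name here is made up for illustration. With Tesseract and Pillow installed, something like `lambda p: pytesseract.image_to_string(Image.open(p))` could be passed in, but that part is an assumption and not tested here.

```python
import sqlite3
from pathlib import Path
from typing import Callable, Iterable

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".gif", ".webp"}


def find_images(root: Path) -> Iterable[Path]:
    """Yield every image file under root, recursively."""
    for path in sorted(root.rglob("*")):
        if path.is_file() and path.suffix.lower() in IMAGE_SUFFIXES:
            yield path


def build_index(db: sqlite3.Connection, root: Path,
                ocr: Callable[[Path], str]) -> int:
    """OCR each image not yet indexed and store its text. Returns count added."""
    db.execute(
        "CREATE TABLE IF NOT EXISTS memes (path TEXT PRIMARY KEY, text TEXT)"
    )
    added = 0
    for path in find_images(root):
        key = str(path)
        if db.execute("SELECT 1 FROM memes WHERE path = ?", (key,)).fetchone():
            continue  # already indexed, so skip the expensive OCR call
        # ocr is a hypothetical stand-in for whatever OCR engine is used
        db.execute("INSERT INTO memes VALUES (?, ?)", (key, ocr(path).lower()))
        added += 1
    db.commit()
    return added


def search(db: sqlite3.Connection, query: str) -> list[str]:
    """Return paths whose OCR text contains every word of the query."""
    words = query.lower().split()
    if not words:
        return []
    where = " AND ".join("instr(text, ?) > 0" for _ in words)
    rows = db.execute(
        f"SELECT path FROM memes WHERE {where} ORDER BY path", words
    )
    return [row[0] for row in rows]
```

Re-running `build_index` only processes new files, so the slow pass over 160,000 images happens once. Substring matching is the simplest option; OCR on memes is noisy, so fuzzy matching might be needed in practice.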
>>108863866
Probably nothing you can just use right away, I can bet money that with the current state of things (security, vibe slop, bloat, 3-letter fags of all sorts, unemployed scammers etc) it is easier to do it yourself.
Will take like 10 minutes to install Docker and the Zed editor, disable telemetry, create a devcontainer config, and spin up a container where you grant Claude Code all access and prompt it to do the thing.
If you're too lazy to do that, then maybe you don't really need it.
>>
>>108863866
Do you really want AI training off of your rare pepes?
>>
r5y7u8olp'[
>>
>>108863866
I don't know, whenever I need to post a Pepe I just search it on iFunny or knowyourmeme. If I were you I'd try googling "ai image describer site:github.com" and see what's what.
>>
>>108863866
>Article 13 compliant frog
hahahahhhAHAH
>>
Yes, use an embedding model (I think CLIP large could work). It basically transforms images and text into very big vectors in the same space, so if you convert a text query to a vector, the stored image vectors closest to it will be the most similar images. You'd need to run each image through CLIP once and store its embedding, though, and with that many images it'd take at least a few hours of processing. I use it for an 8k image folder and it works well.
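The search half of that is small enough to sketch. This assumes the embeddings already exist as plain lists of floats keyed by file path. Producing them with CLIP is left out, and the function names are made up for illustration. It ranks stored image vectors by cosine similarity to the query vector.

```python
import heapq
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def normalize(vec: Vector) -> List[float]:
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def build_store(embeddings: Dict[str, Vector]) -> Dict[str, List[float]]:
    """Normalize every image embedding once, at indexing time."""
    return {path: normalize(vec) for path, vec in embeddings.items()}


def top_k(store: Dict[str, List[float]], query_vec: Vector,
          k: int = 5) -> List[Tuple[str, float]]:
    """Return the k (path, similarity) pairs closest to the query vector."""
    q = normalize(query_vec)
    scored = (
        (path, sum(a * b for a, b in zip(vec, q)))
        for path, vec in store.items()
    )
    return heapq.nlargest(k, scored, key=lambda item: item[1])
```

In real use the image vectors would come from the model's image encoder and `query_vec` from its text encoder. At 160,000 images a pure-Python loop like this would probably be slow per query, so a NumPy matrix product or a vector index would be the more practical choice.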
>>
put all images in one big pdf file


