Is there an AI agent I can use to search my entire meme folder and find the meme/screencap containing the text I want? I have like 160,000 meme images.
Yes, this blog steps you through it: https://danielvanstrien.xyz/posts/2024/11/local-vision-language-model-lm-studio.html
I'm thinking of a straightforward program that's just a window or sliding pane on the desktop where you can ask "find the image with X and Y" and it'll open it for you in Explorer.
It would just plug in some cheap-to-run Gemma model with vision, index the folders you want, and nothing else.
I might actually vibe code something like that if it's possible and it doesn't exist.
>>108864944
I skimmed over it. It reads like he's just trying to sort files through the LLM, not actually searching through them.
>>108863866
Excire, but it's paid.
>>108865023
>he's just trying to sort files through the LLM, not actually searching through them
I want you to have a critical think here about how each of these tasks is achieved, and see if you can spot the reason this method can achieve the requested aim.
>>108865023
>model with vision
Too expensive for what it is, unless you want a description of the visuals in the image, not just text recognition.
>>108863866
Probably nothing you can just use right away. I can bet money that with the current state of things (security, vibe slop, bloat, 3-letter agency types of all sorts, unemployed scammers etc.) it is easier to do it yourself.
It will take like 10 minutes to install Docker and the Zed editor, disable telemetry, create a devcontainer config and spin up a container, where you grant all access to Claude Code and prompt it to do the thing.
If you're too lazy to do that, then maybe you don't really need it.
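For the "just text recognition" route, a minimal sketch of the search half, assuming some OCR tool has already been run over every image once and its output saved. The `ocr_text` dict, the file names, and the helper names are all made up for illustration; only the Python standard library is used, and the OCR step itself is not shown.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_text_index(ocr_text_by_path):
    """Map each word to the set of image paths whose OCR text contains it."""
    index = defaultdict(set)
    for path, text in ocr_text_by_path.items():
        for word in tokenize(text):
            index[word].add(path)
    return index


def find_images(index, query):
    """Return the sorted paths whose OCR text contains every word in the query."""
    words = tokenize(query)
    if not words:
        return []
    hits = set(index.get(words[0], set()))
    for word in words[1:]:
        hits &= index.get(word, set())
    return sorted(hits)


# Hypothetical OCR output: image path -> text recognized in that image.
ocr_text = {
    "memes/a.png": "It's over",
    "memes/b.jpg": "We're so back",
    "memes/c.png": "it is so over",
}

text_index = build_text_index(ocr_text)
so_over = find_images(text_index, "so over")
over = find_images(text_index, "over")
print(so_over)
print(over)
```

This only matches exact words, so OCR mistakes or typos in the query will miss. A fuzzy matcher or a full-text search engine would be the obvious next step for 160,000 files.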
>>108863866
Do you really want AI training off of your rare Pepes?
r5y7u8olp'[
>>108863866
I don't know, whenever I need to post a Pepe I just search for it on iFunny or knowyourmeme. If I were you I'd try googling "ai image describer site:github.com" and see what's what.
>>108863866
>Article 13 compliant frog
hahahahhhAHAH
Yes, use an embedding model (I think CLIP large could work). It basically transforms images and text into long vectors in the same space, so if you convert a text query to a vector, the stored image vectors most similar to it belong to the best-matching images. You'd need to run each image through CLIP once and store the embedding for each, though; with that many images it'd need at least a few hours of processing. I use it for an 8k image folder and it works well.
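A rough sketch of the lookup step, assuming the embeddings are already computed and stored. The 3-dimensional vectors, file names, and function names here are toy stand-ins invented for illustration; real CLIP vectors are much longer and would come from running the model on each image and on the text query, which is not shown. The ranking is plain cosine similarity.

```python
import math


def normalize(vec):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def build_index(embeddings):
    """Store one unit-length vector per image path."""
    return {path: normalize(vec) for path, vec in embeddings.items()}


def search(index, query_vec, top_k=3):
    """Return the top_k (path, score) pairs, most similar to the query first."""
    q = normalize(query_vec)
    scored = [
        (sum(a * b for a, b in zip(q, vec)), path)
        for path, vec in index.items()
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [(path, score) for score, path in scored[:top_k]]


# Toy stand-ins for stored image embeddings (hypothetical values).
toy_embeddings = {
    "pepe_sad.png": [0.9, 0.1, 0.0],
    "wojak_cope.jpg": [0.1, 0.9, 0.1],
    "screencap_01.png": [0.0, 0.2, 0.9],
}

index = build_index(toy_embeddings)
# Toy stand-in for the embedding of a text query.
results = search(index, [0.8, 0.2, 0.0], top_k=2)
for path, score in results:
    print(path, round(score, 3))
```

The expensive part is the one-time pass that embeds every image. After that, a brute-force scan like this over 160,000 stored vectors is cheap, though a vectorized or approximate nearest-neighbor search would be faster.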
put all images in one big pdf file