/g/ - Automatic content analysis on 4chan + archives - Technology



File: contentdescription.png (372 KB, 1601x659)

Automatic content analysis on 4chan + archives Anonymous 09/05/25(Fri)20:07:44 No.106497005

How long have images been automatically tagged and described in archives? Is content analysis being performed behind the scenes on everything posted? Is this being done on 4chan itself now, or just on the archives?
See the text tooltip above the hovered image in the screenshot. I use Imagus for hover zoom on all sites.
Has anyone discovered this before?
Discuss.

Anonymous 09/05/25(Fri)20:10:00 No.106497023

You're retarded if you think glowies haven't been doing this for at least 10+ years
It has only become cheap enough for literally anyone with some spare system resources to run it in the background in the past couple of years, so I'd say 3 years at most

Anonymous 09/05/25(Fri)20:15:26 No.106497074

>>106497005
the people who do this are autistic and probably have btc being old fags, explains a lot.
theres whole wikis dedicated to cataloging the status, history, and news of archives.
the fact they give it to us is proof their intentions are good. that feature allows you to search for images better, rather than just being able to use filename search. and their "alt text" is not very good. so theyre not spending A LOT on it, demonstrably.
for example it wont really identify characters or people well, its quite generic.
basically these people deserve respect not scorn or paranoia.

Anonymous 09/05/25(Fri)20:16:59 No.106497085

>>106497005
I have it set up for some boards so they filter some retard threads based on image/OP post description the llm gives to a filter.
It's really easy and gets rid of idiots. For example, now i am gonna add some tweaks so your type of idiotic questions also gets filtered.
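A rough sketch of what filtering on a generated description could look like. This is not the poster's actual setup: the keyword list and the function name are made up for illustration, and it assumes the description arrives as a plain string.

```javascript
// Hypothetical blocklist; these keywords are illustrative only.
const BLOCKED_KEYWORDS = ["tweet", "twitter", "social media post"];

// Returns true when a machine-generated description mentions a blocked keyword.
function shouldHide(description, keywords = BLOCKED_KEYWORDS) {
  // No usable description: leave the thread visible rather than guess.
  if (typeof description !== "string") return false;
  const text = description.toLowerCase();
  return keywords.some((kw) => text.includes(kw.toLowerCase()));
}
```

Substring matching is crude (it will also hit words that merely contain a keyword), so a real filter would probably want word boundaries or a score threshold.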

Anonymous 09/05/25(Fri)20:18:14 No.106497097

>>106497085
It gets rid of bbc/isreali spam and propaganda ;) fuck em

Anonymous 09/05/25(Fri)20:19:33 No.106497108

>>106497085
>a one-off question that will never be asked again
you can't filter one-time posts, lmao

Anonymous 09/05/25(Fri)20:22:02 No.106497120

>>106497074
>theres whole wikis dedicated to cataloging the status, history, and news of archives.
https://wiki.bibanon.org/FoolFuuka

Anonymous 09/05/25(Fri)20:30:43 No.106497179

>>106497085
>For example, now i am gonna add some tweaks so your type of idiotic questions also gets filtered.
and what kind of "type of idiotic question" does the OP fall under by your own metrics? to me, this is precisely the kind of thread i would encourage over the majority of threads on this board. nor do i see how you would effectively filter this type of thread, or why you would even want to do so. a thread which i am not interested but by all means is applicable and on-topic doesn't warrant any filtering, rather than just hiding the particular thread. i use filters for some stuff that i never want to engage with, but i don't really see the point in continuing to use a website if you have to filter 99% of everything posted on it.

so that brings me to my main question for you, why do you continue to use this website when there are surely other places with discussions that are relevant to you, in which you won't be wasting so much of your own time self-moderating and filtering the majority of the content in said place.

i think you might be the idiotic one here, but i'm not sure how i'd go about filtering out posters like you. maybe i could use something with your grammar structure which is sort of unique. i mean you use some proper capitalization at the beginning of sentences, and OP was capitalized, but then you go and write LLM is lowercase like a retard. not sure how i'd go about dealing with this.

Anonymous 09/05/25(Fri)20:43:30 No.106497270

>>106497179
>and what kind of "type of idiotic question" does the OP fall under by your own metrics?
Stupid question with obvious answers that any human worth taking notice on any board should be able to answer for himself with few minutes of research.
Rest is tl;dr;dc so whatever you wrote, good for you and thanks for reminding me to remove this thread from watcher.

Anonymous 09/05/25(Fri)21:31:28 No.106497616

File: file.png (313 KB, 1702x726)

in UK the archives filter out 18+ stuff, but its retarded and filters out half of the board, eg. >>>/tv/214307220

Anonymous 09/05/25(Fri)21:43:52 No.106497720

>>106497616
kek

Anonymous 09/05/25(Fri)22:33:09 No.106498066

>>106497005
4plebs added that last year

https://x.com/4plebs/status/1802267305356763593

Anonymous 09/05/25(Fri)23:13:23 No.106498307

>>106497085
I've been waiting for someone to figure out how to block all twitter related posts via image classification, how did you set it up?

Anonymous 09/05/25(Fri)23:16:59 No.106498318

>>106498307
nta but it seems like he might just use the archive api to get their keywords (then he blocks an image based on a common keyword)
that can be achieved with a userscript making api calls.
on twitter, it depends on whether there is some alt data available (probably), then you can filter directly off that, again with a userscript.
if youre talking about classifying images yourself thats out of my realm of expertise besides running something like cloudflare image classifier workers

Anonymous 09/05/25(Fri)23:20:02 No.106498332

>>106498318
my bad im high i didnt realize you just wanted to block all twitter posts. alright ill try to set something up that does this.

Anonymous 09/05/25(Fri)23:25:37 No.106498356

>>106498307
>>106498318
>>106498332
so tell me more about how you want it to block and what. from the catalog, or replies too?
i need to be careful about the API calls. you cant make too many, you need to batch them. something like: if a new thread is found, its not shown until its been classified as good. that can happen slowly so it wont be instant, but it will work better that way
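A minimal sketch of the "hidden until classified, in batches" idea, under a few assumptions: `classify` is a hypothetical async function that returns "good" or something else for a thread ID, and the batch size and delay are arbitrary defaults.

```javascript
// Splits pending thread IDs into fixed-size batches, one batch per round of API calls.
function toBatches(ids, batchSize) {
  if (!Number.isInteger(batchSize) || batchSize < 1) {
    throw new RangeError("batchSize must be a positive integer");
  }
  const batches = [];
  for (let i = 0; i < ids.length; i += batchSize) {
    batches.push(ids.slice(i, i + batchSize));
  }
  return batches;
}

// A thread is shown only once a verdict exists and that verdict is "good".
function isVisible(threadId, verdicts) {
  return verdicts.get(threadId) === "good";
}

// Classifies every ID batch by batch, pausing between batches (not after the last one).
async function classifyAll(ids, classify, { batchSize = 5, delayMs = 2000 } = {}) {
  const verdicts = new Map();
  const batches = toBatches(ids, batchSize);
  for (let b = 0; b < batches.length; b++) {
    const results = await Promise.all(batches[b].map((id) => classify(id)));
    batches[b].forEach((id, i) => verdicts.set(id, results[i]));
    if (b < batches.length - 1) {
      await new Promise((resolve) => setTimeout(resolve, delayMs));
    }
  }
  return verdicts;
}
```

Unclassified threads have no entry in the map, so `isVisible` keeps them hidden by default, which matches the "not shown until its been classified as good" behaviour.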

Anonymous 09/05/25(Fri)23:57:55 No.106498529

>>106498356
Hiding literally just any thread starting with a screenshot or quote from twitter (and/or reddit) would drastically improve the site. I usually use the catalog or quickmedia

Anonymous 09/06/25(Sat)00:04:55 No.106498559

>>106498356
this board doesn't get new threads fast enough for that to matter, it's not like you'll be firing off queries more often than once every 5 mins or so
there were 249 threads made yesterday, so on average one new thread every 5.8 minutes
if you do replies then yeah it becomes rather ridiculous, you have mental issues if you need to go that far though
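The arithmetic behind that estimate, as a quick check. The 249 figure is the one quoted in the post; the function name is just for illustration.

```javascript
const MINUTES_PER_DAY = 24 * 60; // 1440

// Average gap between new threads, in minutes, for a given daily thread count.
function averageGapMinutes(threadsPerDay) {
  return MINUTES_PER_DAY / threadsPerDay;
}

console.log(averageGapMinutes(249).toFixed(2)); // prints "5.78"
```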

Anonymous 09/06/25(Sat)00:08:15 No.106498579

>>106498559
issue is if 1000 people start using it, it adds up on poor old 4plebs.
same thread should never be looked up more than once on their api - use localstorage to keep keys, etc
ideally the thing would share from the same database so we all only communicate with that specific server that just posts "twatter thread YES/NO" but i dont want to run a server for that
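A sketch of the "never look up the same thread twice" part, assuming a storage object with the `getItem`/`setItem` shape of the browser's `localStorage`. The key prefix and the synchronous `lookup` callback are simplifications for illustration; a real lookup would be an async API call, with the same check-then-store pattern around an `await`.

```javascript
const KEY_PREFIX = "twcls:"; // hypothetical key prefix

// In-memory stand-in for the Web Storage API, only so the sketch runs outside a browser.
function memoryStorage() {
  const data = new Map();
  return {
    getItem: (key) => (data.has(key) ? data.get(key) : null),
    setItem: (key, value) => { data.set(key, String(value)); },
  };
}

// Returns the verdict for a thread, calling `lookup` at most once per thread ID.
function cachedVerdict(threadId, lookup, storage) {
  const key = KEY_PREFIX + threadId;
  const hit = storage.getItem(key);
  if (hit !== null) return hit; // already classified: no API call
  const verdict = lookup(threadId);
  storage.setItem(key, verdict);
  return verdict;
}
```

In a userscript you would pass `window.localStorage` as `storage`. This only deduplicates per browser; the shared-server idea from the post would still be needed to deduplicate across users.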

Anonymous 09/06/25(Sat)00:12:13 No.106498602

>>106498579
I thought you were talking about making your own thing that makes an api call to an llm to tell you if it's a twitter screenshot.
4plebs doesn't archive /g/ so you can't use it here. And who knows if the model 4plebs is using can identify twitter screenshots accurately or will include it in the description.

Anonymous 09/06/25(Sat)00:49:04 No.106498756

>>106498529
>>106498602
heres the cloudflare worker script

export default {
  async fetch(request, env) {

    // Answer the CORS preflight that browsers send before a cross-origin POST.
    if (request.method === 'OPTIONS') {
      return handleOptions(request);
    }

    if (request.method !== 'POST') {
      return new Response('Expected POST request', { status: 405, headers: corsHeaders });
    }

    try {
      // The request body is JSON of the form { "imageUrl": "..." }.
      const { imageUrl } = await request.json();

      if (!imageUrl) {
        return new Response('Missing imageUrl in request body', { status: 400, headers: corsHeaders });
      }

      // Fetch the image server-side with browser-like headers.
      const imageResponse = await fetch(imageUrl, {
        headers: {
          'Referer': 'https://boards.4chan.org/g/catalog',
          'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/108.0.0.0 Safari/537.36'
        }
      });

      if (!imageResponse.ok) {
        // 502: the upstream image host failed, not the worker itself.
        return new Response(`Failed to fetch image. Server responded with ${imageResponse.status}`, { status: 502, headers: corsHeaders });
      }

      // The model input is the image as a plain array of byte values.
      const imageBuffer = await imageResponse.arrayBuffer();
      const inputs = { image: [...new Uint8Array(imageBuffer)] };
      // env.AI is the Workers AI binding; it must be named "AI".
      const response = await env.AI.run('@cf/microsoft/resnet-50', inputs);

      return new Response(JSON.stringify(response), {
        headers: { 'Content-Type': 'application/json', ...corsHeaders },
      });

    } catch (e) {
      return new Response(`Error: ${e.message}`, { status: 500, headers: corsHeaders });
    }
  },
};

// CORS headers attached to every response so a userscript on another origin can call the worker.
const corsHeaders = {
  'Access-Control-Allow-Origin': '*',
  'Access-Control-Allow-Methods': 'POST, OPTIONS',
  'Access-Control-Allow-Headers': 'Content-Type',
};

function handleOptions(request) {
  return new Response(null, { headers: corsHeaders });
}
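For reference, a sketch of how a client could call a worker like the one above. This is not the code of the linked userscript: the function names are made up, `workerUrl` stands for whatever URL your worker is deployed at, and the response is assumed to be an array of `{ label, score }` objects.

```javascript
// Builds the fetch() options the worker expects: a POST with a JSON body holding imageUrl.
function buildRequest(imageUrl) {
  return {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ imageUrl }),
  };
}

// Picks the highest-scoring prediction, or null when there is nothing to pick from.
// Assumes predictions is an array of { label, score } objects.
function topLabel(predictions) {
  if (!Array.isArray(predictions) || predictions.length === 0) return null;
  return predictions.reduce((best, p) => (p.score > best.score ? p : best));
}

// Sends one thumbnail URL to the worker and returns its top prediction.
async function classifyThumbnail(workerUrl, imageUrl) {
  const res = await fetch(workerUrl, buildRequest(imageUrl));
  if (!res.ok) throw new Error(`Worker responded with ${res.status}`);
  return topLabel(await res.json());
}
```

Which labels actually correspond to a twitter screenshot is something you would have to find out by testing the model on a few known examples.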

Anonymous 09/06/25(Sat)00:50:54 No.106498765

>>106498756
after creating that worker, you then need to create a binding named "AI"
>Worker Settings
>Bindings
>Add a binding: Workers AI
>Named AI

then install this userscript
https://greasyfork.org/en/scripts/548543-4chan-thumbnail-classifier-advanced/code
>add your worker url to the script

ive made the base here, you can experiment with better models and stuff probably.

demo
https://files.catbox.moe/g3y88o.webm

Anonymous 09/06/25(Sat)00:54:33 No.106498781

>>106498765
sign up on greasyfork and PM me if you want to in the future

Anonymous 09/06/25(Sat)00:56:38 No.106498793

Do any of those archives offer full text search?
