/g/ - Why is AI so useless at data processing? I ask it - Technology



Anonymous
12/20/25(Sat)11:36:45 No.107613076

File: Chat-GPT-logo.png (37 KB, 768x404)

Why is AI so useless at data processing? I ask it to fuzzy match two lists and give me a confidence score for how close each match is, and for about 5% of the data it matches to the wrong item while giving a 100% confidence score, even when there's a perfect match that it ignored. This sort of thing is one of the only real productive use cases I can imagine for AI, and it fails miserably at it.

Anonymous
12/20/25(Sat)11:39:27 No.107613107

>>107613076
What model? Is it a reasoning model? If so, what kind of reasoning budget are you using? Are you running each item separately or trying to batch them?

Anonymous
12/20/25(Sat)11:40:43 No.107613119

>>107613076
it's a scam

Anonymous
12/20/25(Sat)11:47:11 No.107613168

>>107613107
I don't know what any of that shit is. I'm just writing prompts at chatgpt.com. The specific thing I was trying to do was this: I have two lists of movie titles with their release years, but some of the titles have slight variations in spelling and punctuation, and some of the years are off by +/- 1. So I asked it to map one list to the other, and it ended up mapping some to the wrong movie, even when there was an exact match on title and year in both lists. I've had similar results with other kinds of data sets too. It can never give me an accurate result no matter what kind of data processing I'm trying to do, and it ends up being a complete waste of time because I have to manually go through the entire list to find the mistakes.

Anonymous
12/20/25(Sat)11:54:06 No.107613213

>>107613168
>I'm just writing prompts at chatgpt.com
I had a feeling that was the case. That's the least powerful and least effective way to use LLMs.
If you can program, write a script to do this. Experiment with different ways to present the data to the model. Given lists A and B, I would probably make len(B) calls where each call includes the full list A plus one entry from B. Ideally cache the prompt prefix up to the last entry of A, since that part is identical in every call, to save on cost. Then do something about any leftover entries.
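A minimal sketch of the one-call-per-entry layout described above, assuming each movie is a (title, year) tuple. The function name `build_prompts` and the prompt wording are made up for illustration, and no actual API call is made; the point is only that every prompt shares an identical prefix containing all of list A, which is the part a provider could cache.

```python
def build_prompts(list_a, list_b):
    """Return one (prefix, suffix) pair per entry of list_b.

    The prefix holds all of list A plus the instructions and is identical
    across calls; only the short suffix changes from call to call.
    The wording of the prompt is a placeholder, not a tested prompt.
    """
    lines = [f"{i}. {title} ({year})" for i, (title, year) in enumerate(list_a, 1)]
    prefix = (
        "Below is list A, one movie per line.\n"
        + "\n".join(lines)
        + "\n\nReply with the number of the list A entry that is the same movie "
        "as the candidate, or NONE if there is no such entry.\n"
    )
    return [(prefix, f"Candidate: {title} ({year})\n") for title, year in list_b]


list_a = [("Alien", 1979), ("Blade Runner", 1982)]
list_b = [("Bladerunner", 1982), ("Aliens", 1986), ("Alien", 1979)]
prompts = build_prompts(list_a, list_b)
print(len(prompts))   # 3, one prompt per entry of list B
print(prompts[0][1])  # the only part that differs between calls
```

Entries that come back as NONE are the leftovers that still need separate handling.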

If you can't program then you could try vibecoding it with something like Gemini.

Anonymous
12/20/25(Sat)11:56:57 No.107613234

>>107613076
You're asking a language model to do math and score data.

Anonymous
12/20/25(Sat)11:57:50 No.107613239

>>107613213
It would take less time for me to just manually go through my list of 7000 entries than to experiment with how to program something like this. I could just filter for the non-perfect-matching rows in Excel and only have a few hundred entries to fix manually. The only reason to use AI was to save me the time of having to do manual work, and it fails at that.
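The "filter out the perfect matches first" step can also be done outside Excel. A rough sketch, assuming each movie is a (title, year) tuple and that neither list contains duplicate entries; the function name `split_exact` is made up for illustration.

```python
def split_exact(list_a, list_b):
    """Pair entries that are identical in both lists; return the rest untouched."""
    in_b = set(list_b)
    exact = [entry for entry in list_a if entry in in_b]
    matched = set(exact)
    left_a = [entry for entry in list_a if entry not in matched]
    left_b = [entry for entry in list_b if entry not in matched]
    return exact, left_a, left_b


list_a = [("Alien", 1979), ("Blade Runner", 1982), ("Se7en", 1995)]
list_b = [("Alien", 1979), ("Bladerunner", 1982), ("Seven", 1996)]
exact, left_a, left_b = split_exact(list_a, list_b)
print(exact)   # [('Alien', 1979)]
print(left_a)  # [('Blade Runner', 1982), ('Se7en', 1995)]
print(left_b)  # [('Bladerunner', 1982), ('Seven', 1996)]
```

Only `left_a` and `left_b` then need fuzzy matching or manual review.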

Anonymous
12/20/25(Sat)11:59:45 No.107613257

>>107613076
Useless task

Anonymous
12/20/25(Sat)12:02:03 No.107613271

I agree with OP
"AI" should be able to do this by now.

Anonymous
12/20/25(Sat)12:26:15 No.107613433

>>107613271
AI absolutely can do this.
The consumer-facing website chatgpt dot com apparently cannot do this, which makes sense because it's limited in the tools it can access and is optimised to aggressively save on tokens and to use the cheapest model it can.
Using a different client and forcing use of a high-end reasoning model might be enough.

Anonymous
12/20/25(Sat)12:28:58 No.107613454

>>107613076
Ask it to write a python program that does the task and to execute it on your data and it will always succeed.
Skill issue.

But don't worry, language models will soon learn how to do this by themselves, in the background, without even telling you it's happening and then they can do tasks everyone thought would be impossible for LLMs.

Anonymous
12/20/25(Sat)12:33:01 No.107613491

>>107613454
They already do. OP just used the free demo that's not intended to actually be useful for anything.

Anonymous
12/20/25(Sat)12:42:08 No.107613573

>>107613239
>The only reason to use AI was to save me the time of having to do manual work and it fails at that.
Skill issue gramps. You're a promptlet and you don't know how to use LLMs effectively, plain and simple. Git gud.

Anonymous
12/20/25(Sat)12:55:25 No.107613689

>>107613076
Just ask it to give you a Python script to do that task. It will most likely nail it on the first try.
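For reference, a script for OP's task could look roughly like the sketch below. It is one possible approach, not what any model would necessarily produce: it uses the standard library's `difflib.SequenceMatcher` for title similarity, skips candidates more than one year apart, and subtracts an arbitrary 0.05 per year of difference so that an identical title and year scores 1.0, the highest possible score. The names `normalize` and `best_match` and the penalty value are assumptions made for illustration.

```python
import re
from difflib import SequenceMatcher


def normalize(title):
    """Lowercase and drop punctuation so small formatting differences vanish."""
    return re.sub(r"[^a-z0-9 ]", "", title.lower()).strip()


def best_match(entry, candidates, max_year_gap=1):
    """Return (candidate, score) for the closest candidate, or (None, 0.0).

    The score is difflib's similarity ratio on normalized titles, minus a
    small penalty per year of difference between the two entries.
    """
    title, year = entry
    best, best_score = None, 0.0
    for cand_title, cand_year in candidates:
        gap = abs(year - cand_year)
        if gap > max_year_gap:
            continue  # too far apart in time to be the same release
        score = SequenceMatcher(None, normalize(title), normalize(cand_title)).ratio()
        score -= 0.05 * gap
        if score > best_score:
            best, best_score = (cand_title, cand_year), score
    return best, round(best_score, 3)


list_b = [("Alien", 1979), ("Aliens", 1986), ("Blade Runner", 1982)]
print(best_match(("Alien", 1979), list_b))        # (('Alien', 1979), 1.0)
print(best_match(("Bladerunner", 1983), list_b))  # close title, year off by one
print(best_match(("Heat", 1995), list_b))         # (None, 0.0)
```

Because the score is computed rather than guessed, a 1.0 here always means the normalized title and the year are identical, which is the guarantee OP was missing.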

Anonymous
12/20/25(Sat)12:56:46 No.107613712

>>107613168
Ohhh I see now you're just stupid or 12 years old.

Anonymous
12/20/25(Sat)12:57:48 No.107613728

>>107613076
Are you making it guess the probabilities? You're supposed to make it write code retard??

Anonymous
12/20/25(Sat)13:00:32 No.107613758

>>107613076
>give me a confidence score
IT'S NOT THINKING
IT'S JUST PREDICTING ONE WORD AFTER ANOTHER BASED ON STATISTICAL LIKELIHOOD
It's almost fucking 2026

Anonymous
12/20/25(Sat)13:24:55 No.107613961

Lol at the elitist retards ITT
That's actually not an easy task for an LLM

The problem is that you have a bag of problems to solve:
1) an entry in list A might have a misspelled title but the correct year, while list B has the opposite
2) two movies can have the same name and different years and actually be two different movies
3) sometimes the misspellings are just abbreviations
4) every time there is a mismatch you should track both titles on the side so that you can rematch them

You should really try to simplify the problem before feeding this bag of hair to the LLM.
I would start with list A and forget the years, only the titles.
I would ask the LLM to check whether every movie in the list exists, and whether it has an alternative title or a more correct spelling. Do the same for list B.
Then do some line-by-line manual scaffolding and reordering in Excel.
At this point you should have the movie titles in lockstep, and you'd have a better chance asking the LLM to check the year (don't ask for probabilities, just ask for the correct year). After that, go to another LLM and ask it to verify the years again. Any mismatch you should check by hand.
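The last step, collecting the year mismatches for manual checking once the titles are in lockstep, is simple enough to script. A small sketch, assuming the pairs are already aligned by title; the name `year_mismatches` and the `tolerance` parameter are made up for illustration.

```python
def year_mismatches(pairs, tolerance=0):
    """Return the aligned pairs whose years differ by more than `tolerance`.

    pairs: list of ((title_a, year_a), (title_b, year_b)), already in lockstep.
    """
    return [(a, b) for a, b in pairs if abs(a[1] - b[1]) > tolerance]


pairs = [
    (("Alien", 1979), ("Alien", 1979)),
    (("Heat", 1995), ("Heat", 1996)),        # off by one, probably the same movie
    (("Solaris", 1972), ("Solaris", 2002)),  # same title, two different movies
]
print(year_mismatches(pairs))               # the Heat and Solaris pairs
print(year_mismatches(pairs, tolerance=1))  # only the Solaris pair
```

With `tolerance=0` every disagreement goes to the manual pile; `tolerance=1` lets the off-by-one years through and leaves only the cases from point 2.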
