Why is AI so useless at data processing? I ask it to do some fuzzy matching on two lists and give me the confidence score of how close the match is, and for about 5% of the data it ends up matching to the wrong item while giving a 100% confidence score, when there's a perfect match that it ignored. This sort of thing is one of the only real productive use cases I can imagine for AI, and it fails miserably at it.
>>107613076
What model? Is it a reasoning model? If so, what kind of reasoning budget are you using? Are you running each item separately or trying to batch them?
>>107613076
it's a scam
>>107613107
I don't know what any of that shit is. I'm just writing prompts at chatgpt.com. The specific thing I was trying to do was this: I have two lists of movie titles with their release years, but with slight variations in spelling and punctuation in some of the titles, and some of the years are off by +/- 1. So I asked it to map one list to the other, and it ended up mapping some to the wrong movie, even when there was an exact match with title and year in both lists. I've had similar results with other kinds of data sets too. It can never give me an accurate result no matter what kind of data processing I'm trying to do, and it ends up being a complete waste of time because I have to manually go through the entire list to find the mistakes.
>>107613168
>I'm just writing prompts at chatgpt.com
I had a feeling that was the case. That's the least powerful and least effective way to use LLMs.
If you can program, write a script to do this. Experiment with different ways to present the data to the model. Given lists A and B, I would probably make len(B) calls where each call includes the full list A plus one entry from B. Ideally cache the prompt after the last entry of A to save on cost. Then do something about any leftover entries.
If you can't program then you could try vibecoding it with something like Gemini.
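For anyone who wants to try the one-call-per-entry layout from this post, here is a minimal sketch of just the prompt construction. It makes no API call, and `build_prompts`, `INSTRUCTIONS` and the prompt wording are made up for illustration, not taken from any provider's docs. The only point is that the part containing list A is identical for every call, which is what a prompt cache would need.

```python
# Sketch only: builds the prompt text, makes no API call.
# build_prompts, INSTRUCTIONS and the wording are hypothetical.

INSTRUCTIONS = (
    "You are matching movie titles. Below is list A. "
    "Reply with the index of the entry that matches the query, or NONE."
)


def build_prompts(list_a, list_b):
    """Return one (shared_prefix, query) pair per entry of list B."""
    lines = [f"{i}: {title} ({year})" for i, (title, year) in enumerate(list_a)]
    # The prefix ends right after the last entry of A, so it is identical
    # across calls and a prompt cache could reuse it.
    prefix = INSTRUCTIONS + "\n\nList A:\n" + "\n".join(lines)
    return [(prefix, f"Query from list B: {title} ({year})") for title, year in list_b]


# Small sample data, only to show the shape of the result.
list_a = [("Alien", 1979), ("The Thing", 1982)]
list_b = [("Thing, The", 1982), ("Alien", 1979), ("Aliens", 1986)]
prompts = build_prompts(list_a, list_b)
print(len(prompts))   # one prompt per entry of B
print(prompts[0][1])  # the variable part of the first call
```

How the prefix and the query get sent (system message, cached block, and so on) depends on the client you use, so that part is left out on purpose.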
>>107613076
you're asking a language model to do math and score data
>>107613213
It would take less time for me to just manually go through my list of 7000 entries than to experiment with how to program something like this. I could just filter for the non-perfect matches in Excel and only have a few hundred entries to manually fix. The only reason to use AI was to save me the time of having to do manual work, and it fails at that.
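The Excel filtering step described here can also be scripted. A minimal sketch, assuming each list is a list of `(title, year)` tuples with no duplicates inside a list; `split_exact` is a made-up name:

```python
def split_exact(list_a, list_b):
    """Separate entries that appear identically in both lists from the rest."""
    in_b = set(list_b)
    exact = [entry for entry in list_a if entry in in_b]
    done = set(exact)
    # Whatever is left is the part that still needs fuzzy matching or a manual look.
    left_a = [entry for entry in list_a if entry not in done]
    left_b = [entry for entry in list_b if entry not in done]
    return exact, left_a, left_b


# Sample data; the 2018 is a deliberate off-by-one to imitate a dirty list.
list_a = [("Alien", 1979), ("Spider-Man: Homecoming", 2017)]
list_b = [("Spiderman Homecoming", 2018), ("Alien", 1979)]
exact, left_a, left_b = split_exact(list_a, list_b)
print(exact)
print(left_a, left_b)
```

Taking the exact matches out first means nothing downstream can map them to the wrong item.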
>>107613076
Useless task
I agree with OP
"AI" should be able to do this by now.
>>107613271
AI absolutely can do this.
The consumer-facing website chatgpt dot com apparently cannot do this, which makes sense because it's limited in the tools it can access and is optimised to aggressively save on tokens and to use the cheapest model it can.
Using a different client and forcing use of a high-end reasoning model might be enough.
>>107613076
Ask it to write a Python program that does the task and to execute it on your data and it will always succeed.
Skill issue. But don't worry, language models will soon learn how to do this by themselves, in the background, without even telling you it's happening, and then they can do tasks everyone thought would be impossible for LLMs.
>>107613454
They already do. OP is just using the free demo that's not intended to actually be useful for anything.
>>107613239
>The only reason to use AI was to save me the time of having to do manual work and it fails at that.
Skill issue gramps. You're a promptlet and you don't know how to use LLMs effectively, plain and simple. Git gud.
>>107613076
Just ask it to give you a Python script to do that task. It will most certainly nail it on the first try.
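For reference, the kind of script being suggested could look roughly like this. It is a sketch under assumptions, not what any model would actually produce: the lists are `(title, year)` tuples, similarity comes from the standard library's `difflib.SequenceMatcher`, and `normalize`, `match_lists` and the 0.8 threshold are arbitrary choices. Pairs are accepted best score first, so a perfect title-and-year match is always taken before any fuzzy one.

```python
import re
from collections import defaultdict
from difflib import SequenceMatcher


def normalize(title):
    """Lowercase and collapse punctuation so 'Spider-Man:' and 'spider man' agree."""
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()


def match_lists(list_a, list_b, min_score=0.8):
    """Map index in A -> (index in B, score); each B entry is used at most once."""
    by_year = defaultdict(list)
    for j, (_, year_b) in enumerate(list_b):
        by_year[year_b].append(j)

    candidates = []
    for i, (title_a, year_a) in enumerate(list_a):
        key_a = normalize(title_a)
        # Only compare against B entries whose year is within +/- 1.
        for year in (year_a - 1, year_a, year_a + 1):
            for j in by_year.get(year, []):
                score = SequenceMatcher(None, key_a, normalize(list_b[j][0])).ratio()
                if score >= min_score:
                    # At equal score, an exact year sorts ahead of an off-by-one year.
                    candidates.append((score, year == year_a, i, j))

    candidates.sort(reverse=True)  # best pairs first
    used_a, used_b, matches = set(), set(), {}
    for score, _, i, j in candidates:
        if i in used_a or j in used_b:
            continue
        used_a.add(i)
        used_b.add(j)
        matches[i] = (j, score)
    return matches


# Sample data; the 2018 is a deliberate off-by-one to imitate a dirty list.
list_a = [("Alien", 1979), ("Spider-Man: Homecoming", 2017), ("Solaris", 1972)]
list_b = [("Solaris", 2002), ("Spiderman Homecoming", 2018), ("Aliens", 1986), ("Alien", 1979)]
matches = match_lists(list_a, list_b)
for i, (j, score) in sorted(matches.items()):
    print(list_a[i], "->", list_b[j], round(score, 3))
```

The score here is a string similarity ratio, not a probability, so it only reaches 1.0 when the normalized titles are identical. Entries of A missing from `matches` are the ones to check by hand.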
>>107613168
Ohhh I see now, you're just stupid or 12 years old.
>>107613076
Are you making it guess the probabilities? You're supposed to make it write code, retard??
>>107613076
>give me a confidence score
IT'S NOT THINKING
IT'S JUST MATCHING ONE WORD AFTER ANOTHER BASED ON STATISTICAL LIKELIHOOD
It's fucking 2026 almost
Lol at the elitist retards ITT
That's actually not an easy task for an LLM
The problem is that you have a bag of problems to solve:
1) the first line of list A might have a movie misspelled in the title but with the correct year, and list B the opposite
2) two movies can have the same name but different years, and actually be two different movies
3) sometimes the misspellings are just abbreviations
4) every time there is a mismatch you should track both titles on the side so that you can rematch them
You should really try to simplify the problem before feeding this bag of hair to the LLM
I would start with list A, forget the year, only the titles.
Would ask the LLM to check if every movie in this list is an existing movie, if it has an alternative title, or a more correct spelling. Do the same for list B
do some line by line manual scaffolding and reordering with Excel
at this point you should have movie titles in lockstep and you would have a better chance asking the LLM to check the year (don't ask for probabilities, just ask for the correct year). After that go to another LLM and ask it to verify the years again. Any mismatch you should check by hand
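Points 2) and 4) above can be sketched without an LLM at all. This is only an illustration under assumptions: entries are `(title, year)` tuples, `triage` and the 0.85 threshold are made up, and the titles in the sample data are invented. Every entry of list A ends up in one of three buckets: matched automatically, ambiguous (several plausible candidates, e.g. same name with neighbouring years), or unmatched and kept on the side for a manual pass.

```python
import re
from difflib import SequenceMatcher


def normalize(title):
    """Lowercase and collapse punctuation before comparing titles."""
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()


def triage(list_a, list_b, min_score=0.85):
    """Split list A into (auto, review, unmatched) against list B."""
    auto, review, unmatched = [], [], []
    for title, year in list_a:
        key = normalize(title)
        hits = []
        for other_title, other_year in list_b:
            if abs(other_year - year) > 1:
                continue  # year too far off to be the same movie
            other_key = normalize(other_title)
            if SequenceMatcher(None, key, other_key).ratio() >= min_score:
                hits.append((other_title, other_year, key == other_key))
        # A candidate with identical normalized title and identical year wins outright.
        exact = [h for h in hits if h[2] and h[1] == year]
        if len(exact) == 1:
            auto.append(((title, year), exact[0][:2]))
        elif len(hits) == 1:
            auto.append(((title, year), hits[0][:2]))
        elif hits:
            review.append(((title, year), [h[:2] for h in hits]))
        else:
            unmatched.append((title, year))
    return auto, review, unmatched


# Invented titles, only to exercise the three buckets.
list_a = [("Red Harbor", 1984), ("Night Shift", 1990), ("Lost Reel", 2001)]
list_b = [("Red Harbor", 1984), ("Red Harbor", 1985),
          ("Night Shift", 1989), ("Night Shift", 1991)]
auto, review, unmatched = triage(list_a, list_b)
print(auto)
print(review)
print(unmatched)
```

Only the `review` and `unmatched` buckets would then need an LLM or a human, which is a much smaller bag of hair than the full lists.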