What are your criticisms of AI benchmarks? How would you fix them?
>ai benchmarksAnother fucking filter to add, jesus you nigs are painful
>>108705493
New models are just the same model fine-tuned to have a bigger number on AI benchmarks (yes, basically what PewDiePie did in that video where he fine-tuned a model to perform better than ClosedAI o3).
>>108705493
My criticism is that a benchmark measures one particular metric, like how many operations per second an algorithm or piece of hardware can perform, how much you can bench press, how fast you can run a mile, and so on.
It's difficult to distill a wide range of very different tasks, like you find within the field of software development, down to a single metric, or even just a finite number of them. Even if you throw different AI models at the same large sample of diverse tasks, evaluating their performance is still subjective beyond "does the result match the spec."
And that's not even getting into evaluating their performance on the things not in the spec: how readable the code is, how expandable/maintainable it is, how clean it is and to what degree it fits the rest of the codebase (e.g. reusing existing abstractions rather than reinventing them), and whether the model considered things not stated in the spec, like the business context. LLMs are infamously bad at that last one, and it's one reason why we are not yet able to fully automate software development.
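To make the single-number problem concrete, here is a rough sketch. The model names, the four axes, and every score are made up for illustration; nothing here comes from a real benchmark. It only shows that two models can tie on the headline average while ranking differently depending on which axis you care about.

```python
from statistics import mean

# Hypothetical per-axis scores (0-100) for two made-up models.
# The axes mirror the ones named in the post; the numbers are invented.
scores = {
    "model_a": {"matches_spec": 90, "readability": 30,
                "maintainability": 30, "fits_codebase": 50},
    "model_b": {"matches_spec": 50, "readability": 50,
                "maintainability": 50, "fits_codebase": 50},
}


def headline(axes):
    """Collapse all axes into the single number a leaderboard would show."""
    return mean(axes.values())


def ranking(axis):
    """Order the models by one axis only, best first."""
    return sorted(scores, key=lambda m: scores[m][axis], reverse=True)


if __name__ == "__main__":
    for name, axes in scores.items():
        print(name, "headline:", headline(axes))
    for axis in scores["model_a"]:
        print(axis, "->", ranking(axis))
```

Both models get the same headline score, but model_a wins on "matches_spec" and model_b wins on "readability". How you weight the axes is exactly the subjective part.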
***PEDOPHILE THREAD******PEDOPHILE THREAD***CAUTION: YOU HAVE JUST ENTERED A PEDOPHILE THREAD***PEDOPHILE THREAD******PEDOPHILE THREAD***
>>108705493I want to benchmark her if you know what I mean
>>108705500kek give up 4chan already
>>108705493benchmark her fertility capabilities by trying to go for a baby multiple times in row
>>108705493
They exist so the correct answers can be overtrained for, aka benchmaxxed, to make bad models look like they perform well. There is no fixing this because no AI house putting out models is going to be honest about doing it.
>>108706291
Yeah, those massive hanging tits, tiny waist and wide hips really scream "little girl".