/g/ - Technology


Thread archived.
You cannot reply anymore.




File: AI_training.png (226 KB, 505x789)
Research teams have entire divisions that do nothing but create new RL training environments specifically designed to boost benchmark scores. They treat AIME, SWE-bench, and MMLU like standardized tests. The model practices 10,000 hours on competitive programming problems until every proof technique is at its fingertips.
Then it fails to fix a simple bug in production without introducing two new ones.
Sutskever used the perfect analogy. Student A grinds 10,000 hours of competitive programming. Memorizes every algorithm, every edge case, every proof technique. Becomes the #1 ranked competitive coder in the world. Student B practices 100 hours but has it. Intuition. Taste. The ability to learn new things quickly. Who has the better career? Student B. Current AI models are all Student A.
The benchmark gaming runs deeper than most realize. Studies have shown data contamination inflates model scores by 20-80% on popular benchmarks. The training-test boundary is porous. Models memorize answers rather than learn concepts. And when you control for contamination, much of what looks like intelligence is pattern-matching on seen data.
This explains the economic puzzle Ilya pointed to. Models score 100% on AIME 2025. They hit 70%+ on GDPval, beating human professionals. Yet businesses still struggle to extract value. The benchmark performance says genius. The P&L says otherwise.
The sample efficiency gap tells you everything. A human teenager learns to drive any car after 10 hours. An AI model might need millions of examples and still fail on slight variations. A human learns a concept once and applies it everywhere. Models need to see the exact pattern thousands of times and still choke when the formatting changes slightly.
Sutskever's diagnosis: we're moving from the age of scaling (2020-2025) back to the age of research. The belief that 100x more compute would transform everything is dying.
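The contamination effect described above can be sanity-checked with a toy n-gram overlap test: if a benchmark question's word n-grams appear verbatim in the training corpus, a high score may reflect memorization rather than understanding. This is a minimal illustrative sketch; the function names and the 8-gram window are assumptions, not any lab's actual decontamination pipeline.

```python
# Toy contamination check: fraction of a test item's word 8-grams
# that appear verbatim in the training corpus. High overlap suggests
# the model may have seen (and memorized) the item during training.
# Illustrative sketch only; real decontamination is far more involved.

def ngrams(text: str, n: int = 8) -> set:
    """Return the set of word-level n-grams in `text`."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def contamination_score(test_item: str, corpus: str, n: int = 8) -> float:
    """Fraction of the test item's n-grams found verbatim in the corpus."""
    test_grams = ngrams(test_item, n)
    if not test_grams:
        return 0.0
    return len(test_grams & ngrams(corpus, n)) / len(test_grams)

corpus = "the quick brown fox jumps over the lazy dog every single day without fail"
leaked = "quick brown fox jumps over the lazy dog every single day"
fresh = "a completely novel question no training document has ever contained before"

print(contamination_score(leaked, corpus))  # 1.0: every 8-gram seen in training
print(contamination_score(fresh, corpus))   # 0.0: genuinely unseen
```

Decontamination studies apply this kind of overlap filter at corpus scale, then compare scores on the "clean" vs. "leaked" test splits; the gap between the two is the inflation the OP is citing.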
>>
We know this already, and Cukman is only buying up RAM to stagnate his competition's progress at everyone else's expense. It's no surprise a charlatan would do this, wouldn't you agree?
>>
>>107549717
>not enough generalization
more like none at all
>>
>>107549717
Ok, what do I choose when I use GitHub Copilot? All I do is ask it things like "is there a built-in library that does X" or "write the boilerplate testing class for Y". There are too many models and I just don't fucking care
>>
>>107549877
claude
>>
The goal is to improve coding capabilities to automate more and more of the process of coding the models themselves. If they succeed at that first, then they can quickly boost generalisation capabilities.
>>
>>107549717
ERPfags knew this ages ago. They've been employing LLMs for an unintended use case at a scale dwarfing all others except programming, and since 2023 there have been no meaningful improvements and even some regressions. There is no generalized improvement, just MOAR COMPUTE + benchmaxxxing.
>>
>>107550005
Why don’t we see any real impact of AI on GDP and productivity? It’s been three full years since ChatGPT’s release. More than a trillion dollars invested so far, every smart human out there involved in AI, but we’re still not seeing any documented impact beyond the benchmarks?
>>
>>107550320
What do you mean? There's been a huge GDP impact
>Microsoft pays OpenAI $100b
>OpenAI pays Oracle $100b
>Oracle pays Nvidia $100b
>Nvidia pays Anthropic $100b
>Microsoft pays Amazon $100b to pay Anthropic $100b
>Anthropic pays Nvidia and Microsoft $200b
>GDP increases by $800b
It's called a "service economy" bro, they're creating services. The services create more services and those services require services to sustain the services. It's all very valuable and productive.
>>
wait, is Ilyasviel actually smart? I kept seeing his face in all the AGI-in-2.5-months threads so I thought he was the same brand of scammer
>>
>>107549877
use ur brain nigger
>>
What AI do I use to make a userscript to make 4chan suck less?
>>
a couple hundred trillion worth of GPUs and sammy can fix it
orbital slop generator soon
>>
>>107549717
Make an AI generate the test questions so they can't bullshit their way out of it by memorizing
>>
>>107550057
This. LLMs peaked with GPT-4.
>>
File: ach mein gott.jpg (125 KB, 331x506)
>we've created the AI version of the leetcode maxxing jeetcoder
sasuga


