Wasn't there a model trained specifically to pass the benchmarks that ended up doing just as well despite being less than 1B parameters? I remember it couldn't hold a conversation or anything since all of its data was geared towards the benchmark.
>>108193077 Idk
>>108193091 the current path to AGI is just making benchmarks around novel idea/algorithm discovery in a big while loop
>>108193077 It's Google, they don't need to benchmaxx, the results are probably real. The catch is that 'thinking' is a fuzzy term and they might have let the model run way longer than they would in any version provided to end users, while evaluating against what other providers offer to normal customers. All details are here: https://storage.googleapis.com/deepmind-media/gemini/gemini_3-1_pro_model_evaluation.pdf
Now what are the energy consumption levels?
>>108193118 There isn't infinite electricity
>>108193156 that's the problem of the red states that built the data centers. but hey, at least they got 2.5 FTEs of minimum wage security guard jobs out of it