/g/ - This has to be bullshit right? How are they benchmaxxing it? - Technology

File: file.png (222 KB, 1080x1350)

This has to be bullshit right? How are they benchmaxxing it? Anonymous 02/19/26(Thu)21:46:43 No.108193077 Archived

Anonymous
02/19/26(Thu)21:50:34 No.108193091

Wasn't there a model trained specifically to pass the benchmarks that ended up doing just as well despite having fewer than 1B parameters?
I remember it couldn't hold a conversation or anything since all of its data was geared towards the benchmark.

Anonymous
02/19/26(Thu)21:50:57 No.108193093

>>108193077
Idk

Anonymous
02/19/26(Thu)21:56:49 No.108193118

File: 1740045265564137.png (99 KB, 1024x836)

>>108193091
the current path to AGI is just making benchmarks around novel idea/algorithm discovery in a big while loop

Anonymous
02/19/26(Thu)22:03:04 No.108193146

>>108193077
It's Google, they don't need to benchmaxx; the results are probably real. The catch is that 'thinking' is a fuzzy term, and they might have let the model run way longer than they would in any version provided to end users, while evaluating against what other providers offer to normal customers. All details are here: https://storage.googleapis.com/deepmind-media/gemini/gemini_3-1_pro_model_evaluation.pdf

Anonymous
02/19/26(Thu)22:04:56 No.108193156

Now what are the energy consumption levels?

Anonymous
02/19/26(Thu)22:11:21 No.108193197

>>108193118
There isn't infinite electricity

Anonymous
02/19/26(Thu)22:12:24 No.108193204

File: 1746340489312813.jpg (108 KB, 728x1299)

>>108193156
That's the problem of the red states that built the data centers. But hey, at least they got 2.5 FTEs of minimum-wage security guard jobs out of it.