dwarkeshp on AI benchmarks
2 quotes from AI researchers about benchmarks, models, and evaluation
"If Gemini 3 or Claude 4.5, whatever, solves a problem, it is not the case that its own understanding of math has progressed. You run a new session and it's forgotten what it just did."
Terence Tao @dwarkeshp · 2026-03-20
"There are research efforts to try to create automated conjectures, and maybe there are ways to benchmark these and simulate this, but it's all very new science."
Terence Tao @dwarkeshp · 2026-03-20