2 quotes from AI researchers about benchmarks, models, and evaluation
"If Gemini 3 or Claude 4.5, whatever, solves a problem, it is not the case that its own understanding of math has progressed. You run a new session and it's forgotten what it just did."
"There are research efforts to try to create automated conjectures, and maybe there are ways to benchmark these and simulate this, but it's all very new science."