fchollet on AI benchmarks — benchmark.space

4 quotes from AI researchers about benchmarks, models, and evaluation

"The new model from Meta is already looking like a disappointment: overoptimized for public benchmark numbers at the detriment of everything else. Knowing how to evaluate models in a way that correlates with actual usefulness is a core competency for AI labs, and any new lab is"

François Chollet @fchollet · 2026-04-08 · 640 likes

"Join the ARC Prize team -- help us build ARC-AGI-4 and ARC-AGI-5"

François Chollet @fchollet · 2026-04-07 · 128 likes

ARC-AGI 2

"One funny thing about the recent rise of LRMs is that the people who were adamant that base LLMs from 2023-2024 could already reason completely missed it, as they didn't know what to look for. You can't notice something you don't expect."

François Chollet @fchollet · 2026-04-06 · 56 likes

"The new model from Meta is already looking like a disappointment, over-optimized for public benchmark numbers at the detriment of everything else. Knowing how to evaluate models in a way that correlates with actual usefulness is a core competency for AI labs."

François Chollet @fchollet · 2026-04-10

Muse Spark