8 quotes from AI researchers about benchmarks, models, and evaluation
"In different hands, Mythos would be an unprecedented cyberweapon. I am not sure how we deal with this, except to note a narrow window where we know only 3 companies could be at this level of capability. But it may be Chinese models (maybe open weights ones?) get there in 9 months"
"So we now have a pretty good picture of the state of the frontier AI model makers.
US closed source models continue to lead. Google, OpenAI, and Anthropic stand well ahead of the pack, and may have signs of recursive self-improvement. xAI has fallen from frontier status for now."
"I think the most obvious is that Meta has its own frontier model and can use that to extract additional value out of its customer base/explore new markets for its products. Very few companies can say that, and it has value on its own."
"After playing with it a bit, Meta’s Muse Spark Thinking is fine so far, but really doesn’t match the current Big Three models. It also is a bit... weird. Like some strange language & tone, a little loose with facts, etc."
"So what's the deal with Amazon Nova? They released Nova 2 in December, and even then, the top flight Nova 2 model trailed Sonnet 4.5. And it still hasn't left preview."
"The US frontier labs have all walked away from open weights. They continue to occasionally release excellent open models (Gemma 4, etc), but they are smaller models that are not competitive with their closed weights models. So all eyes are on Chinese AI labs for open models."
"Anyhow, its not bad. Just not the vibe level that the benchmarks might indicate. And, for a first re-entry into the frontier model space, given the engineering efficiencies they achieved, it feels like a solid attempt. I am sure we will see better from Meta in the future."