emollick on AI benchmarks
8 quotes from AI researchers about benchmarks, models, and evaluation
"In different hands, Mythos would be an unprecedented cyberweapon. I am not sure how we deal with this, except to note a narrow window where we know only 3 companies could be at this level of capability. But it may be Chinese models (maybe open weights ones?) get there in 9 months"
Ethan Mollick @emollick · 2026-04-08 · 529 likes
"So we now have a pretty good picture of the state of the frontier AI model makers. US closed source models continue to lead. Google, OpenAI, and Anthropic stand well ahead of the pack, and may have signs of recursive self-improvement. xAI has fallen from frontier status for now"
Ethan Mollick @emollick · 2026-04-09 · 342 likes
"I think the most obvious is that Meta has its own frontier model and can use that to extract additional value out of its customer base/explore new markets for its products. Very few companies can say that, and it has value on its own."
Ethan Mollick @emollick · 2026-04-08 · 197 likes
"After playing with it a bit, Meta’s Muse Spark Thinking is fine so far, but really doesn’t match the current Big Three models. It also is a bit... weird. Like some strange language & tone, a little loose with facts, etc."
Ethan Mollick @emollick · 2026-04-09 · 84 likes
"So what's the deal with Amazon Nova? They released Nova 2 in December, and even then, the top flight Nova 2 model trailed Sonnet 4.5. And it still hasn't left preview."
Ethan Mollick @emollick · 2026-04-09 · 45 likes
"The US frontier labs have all walked away from open weights. They continue to occasionally release excellent open models (Gemma 4, etc), but they are smaller models that are not competitive with their closed weights models. So all eyes are on Chinese AI labs for open models."
Ethan Mollick @emollick · 2026-04-09 · 41 likes
"Anyhow, its not bad. Just not the vibe level that the benchmarks might indicate. And, for a first re-entry into the frontier model space, given the engineering efficiencies they achieved, it feels like a solid attempt. I am sure we will see better from Meta in the future."
Ethan Mollick @emollick · 2026-04-09 · 22 likes
"It is a good release for December, 2025. With the current round of new releases on deck, like Mythos, it is trailing."
Ethan Mollick @emollick · 2026-04-08 · 1 like