1 quote from AI researchers about benchmarks, models, and evaluation
"With adaptive thinking, maximum effort, and Python tools, Claude Mythos preview scored almost 93% on finding specific UI elements in high-resolution screenshots. That is 10% higher than Claude Opus 4.6."