gian_anthropic on AI benchmarks

voices

1 quotes from AI researchers about benchmarks, models, and evaluation

"Claude Mythos is arguably the biggest step change in AI capabilities since the GPT-4 jump. When Mythos is allowed to think longer, act deeper, and better explore the solution space, it passes 92% of Terminal Bench task attempts."

Gian (formerly Replit, now Anthropic) @gian_anthropic · 2026-04-09 view on x

Terminal-Bench 2.0 Claude Mythos Preview