teortaxesTex on AI benchmarks

1 quote from AI researchers about benchmarks, models, and evaluation

"very interesting Only Opus 4.6 and GPT 5.4 manage to absolutely avoid total bankruptcy in long-term betting. This is not so much a question of reasoning as one of learning from mistakes. Clearly, when things start going south, they can adjust towards safety. Open models can't."

Teortaxes▶️ (DeepSeek 推特🦊铁粉 2023 – ∞) @teortaxesTex · 2026-04-09 · 34 likes

Claude Opus 4.6, GPT-5.4