4 quotes from AI researchers about benchmarks, models, and evaluation
"Mythos scores 93% on this benchmark, and the current best public model, Opus, scores 80.8%. Nothing from OpenAI, Google, or any of the open source models are coming within 13 points of this."
"They pointed Claude Mythos at a Linux machine and without any advanced prompting, any advanced directives, it on its own found multiple kernel issues that it was able to chain together and just have root access to any system."
"Mythos found a bug that had been sitting in the codebase for 27 years. It was a remote crash vulnerability where you can actually just connect to it and just shut it off."