benchmark.space
YouTube · 2026-04-10
"On CyberGym, the standard benchmark for vulnerability reproduction, Mythos preview scores a 83.1%. The previous best Anthropic model Opus 4.6 is already at 66.6%."
Omshri, AI Infra Weekly host
Tags: CyberGym, Claude Mythos Preview