benchmark.space
YouTube · 2026-03-31
"If it is, let us say, two orders of magnitude better than Opus 4.6 at discovering and exploiting vulnerabilities, we could have a real problem on our hands."
EJ
Limitless Podcast co-host
Cybench
Claude Mythos Preview