benchmark.space
YouTube · 2026-04-08
"When it was controlling command line systems, we have seen it work around several different kinds of sandboxing setups during evaluation and testing that were supposed to limit its actions."
Sam Bowman
AI alignment at Anthropic
Claude Mythos Preview