Dawn Song on Claude Mythos Preview

X / Twitter · 2026-04-10

"Our agent Terminator-1 scored ~100% on 8 major AI agent benchmarks, e.g. SWE-bench Verified & Pro, Terminal-Bench, beating Claude Mythos. It solved 0 tasks. Benchmark scores without adversarial auditing are meaningless."

Dawn Song

SWE-bench Verified · Claude Mythos Preview
