X / Twitter · 2026-04-10
"Our agent Terminator-1 scored ~100% on 8 major AI agent benchmarks, e.g. SWE-bench Verified & Pro, Terminal-Bench, beating Claude Mythos. It solved 0 tasks. Benchmark scores without adversarial auditing are meaningless."
Dawn Song