"Terminal-Bench has really got something going where SWE-bench you're confined to the domain of issues and PRs that already exist. With Terminal-Bench there's a lot of creativity that you can infuse. The 2.0 job was really excellent and I'd be super excited to see 3.0, 4.0."
John Yang
SWE-bench creator, Princeton/Stanford researcher
Terminal-Bench 2.0