YouTube · 2025-12-31
"I don't like unit tests as a form of verification. There's an issue with SWE-bench where all of the task instances are independent of each other. So the moment you have the model submit, 'oh, it's done,' end of the episode."
John Yang
SWE-bench creator, Princeton/Stanford researcher