DwarkeshPatel on AI benchmarks

1 quote from AI researchers about benchmarks, models, and evaluation

"Existing safety benchmarks claim that, at least for today's top models, attacks are only successful a few percent of the time. This sounds great, but Labelbox researchers were able to jailbreak these very same models about 90% of the time – even the ones that have the strongest reputation for safety."

Michael Nielsen @DwarkeshPatel · 2026-04-07