Michael Nielsen on AI benchmarks

YouTube · 2026-04-07

"Existing safety benchmarks claim that, at least for today's top models, attacks are only successful a few percent of the time. This sounds great, but Labelbox researchers were able to jailbreak these very same models about 90% of the time – even the ones that have the strongest reputation for safety."

Michael Nielsen

Physicist, author, AI researcher
