"Existing safety benchmarks claim that, at least for today's top models, attacks are only successful a few percent of the time. This sounds great, but Labelbox researchers were able to jailbreak these very same models about 90% of the time – even the ones that have the strongest reputation for safety."
Michael Nielsen
Physicist, author, AI researcher