YouTube · 2026-02-20
"What happens if you take away the multiple-choice questions, get the models to answer in an open-ended fashion, and then get a blind grader model to compare their answers to the hidden correct answer? You still get some pretty impressive scores, but just not quite as high. Call it a 15 to 20 percentage point drop."
AI Explained
AI YouTube channel