"If we zoom in on just incorrect answers, and whether the models hallucinated an incorrect answer or refused to answer, Gemini 3.1 does well at 50% of its incorrect answers being hallucinations, but Claude Sonnet 4.6 is down at 38%, which is better."
AI Explained
AI analysis YouTube channel
Gemini 3.1 Pro
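The metric in the quote can be read as a simple ratio: of the questions a model did not get right, the share where it gave a wrong answer instead of refusing. Lower is better. Below is a minimal sketch of that arithmetic; the function name `hallucination_share` and the counts are hypothetical, chosen only to reproduce the quoted 50% and 38%, and are not taken from the benchmark itself.

```python
def hallucination_share(hallucinated: int, refused: int) -> float:
    """Share of not-correct responses that were wrong answers rather than refusals."""
    total = hallucinated + refused
    if total == 0:
        raise ValueError("no incorrect or refused responses to score")
    return hallucinated / total


# Hypothetical counts (assumed for illustration, not benchmark data).
gemini = hallucination_share(hallucinated=50, refused=50)
sonnet = hallucination_share(hallucinated=38, refused=62)

print(f"Gemini 3.1 Pro:    {gemini:.0%}")
print(f"Claude Sonnet 4.6: {sonnet:.0%}")
```

Under this reading, a model lowers its score by refusing when unsure rather than guessing, which is why the smaller figure is the better one.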