Claude Opus 4.6
18 benchmarks
Benchmark               Rank        Score
AIME                    #2 of 39    100.0%
MRCR v2                 #2 of 5     93.0%
GPQA Diamond            #8 of 49    91.3%
BigLaw Bench            #1 of 1     90.2%
ARC-AGI 2               #1 of 6     85.0%
BrowseComp              #4 of 17    84.0%
MMLU Pro                #17 of 29   82.0%
Terminal-Bench 2.0      #3 of 14    81.8%
SWE-bench Verified      #2 of 40    81.42%
SWE-bench Multilingual  #2 of 3     77.8%
LiveCodeBench           #16 of 28   76.0%
OSWorld                 #5 of 16    72.7%
Aider Polyglot          #6 of 7     72.0%
CyberGym                #3 of 3     66.6%
Humanity's Last Exam    #5 of 24    53.1%
USAMO 2026              #3 of 3     42.3%
SWE-bench Multimodal    #2 of 2     27.1%
ARC-AGI 3               #3 of 4     0.25%