benchmark.space
Anthropic
Claude Opus 4.7
17 benchmarks

Benchmark                Rank         Score
HumanEval                #2 of 18     96.3%
GPQA Diamond             #3 of 95     94.2%
MMMLU                    #1 of 6      92.0%
BigLaw Bench             #2 of 3      90.9%
CharXiv                  #1 of 10     89.0%
SWE-bench Verified       #2 of 77     87.6%
MMLU Pro                 #9 of 50     87.0%
SWE-bench Multilingual   #2 of 12     85.7%
LiveCodeBench            #13 of 49    83.0%
BrowseComp               #9 of 28     79.3%
Aider Polyglot           #7 of 12     79.0%
OSWorld                  #3 of 30     78.0%
Terminal-Bench 2.0       #7 of 24     77.0%
CyberGym                 #2 of 4      73.8%
CursorBench              #1 of 1      70.0%
SWE-bench Pro            #2 of 22     64.3%
SWE-bench Multimodal     #2 of 3      35.0%