Head to head: Claude Opus 4.6 vs DeepSeek V4 Flash

11 shared benchmarks: Claude Opus 4.6 wins 9, ties 0, DeepSeek V4 Flash wins 2
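| Benchmark | Claude Opus 4.6 | DeepSeek V4 Flash |
| --- | --- | --- |
| AIME | 100.0% | 94.8% |
| BrowseComp | 84.0% | 73.2% |
| GPQA Diamond | 91.3% | 88.1% |
| Humanity's Last Exam | 40.0% | 34.8% |
| LiveCodeBench | 76.0% | 91.6% |
| MMLU Pro | 82.0% | 86.2% |
| MRCR v2 | 93.0% | 78.7% |
| SWE-bench Multilingual | 77.8% | 73.3% |
| SWE-bench Pro | 53.4% | 52.6% |
| SWE-bench Verified | 81.42% | 79.0% |
| Terminal-Bench 2.0 | 65.4% | 56.9% |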
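
The win, tie, and loss counts reported for this comparison can be re-derived from the per-benchmark scores. The sketch below is one way to do that check, not the site's own method. It assumes that a higher score is better on every benchmark and that a tie means exactly equal scores unless a tolerance is given. The names `SCORES`, `tally`, and `tolerance` are hypothetical and introduced only for illustration.

```python
# Scores copied from the comparison, in percent:
# (Claude Opus 4.6, DeepSeek V4 Flash).
SCORES = {
    "AIME": (100.0, 94.8),
    "BrowseComp": (84.0, 73.2),
    "GPQA Diamond": (91.3, 88.1),
    "Humanity's Last Exam": (40.0, 34.8),
    "LiveCodeBench": (76.0, 91.6),
    "MMLU Pro": (82.0, 86.2),
    "MRCR v2": (93.0, 78.7),
    "SWE-bench Multilingual": (77.8, 73.3),
    "SWE-bench Pro": (53.4, 52.6),
    "SWE-bench Verified": (81.42, 79.0),
    "Terminal-Bench 2.0": (65.4, 56.9),
}


def tally(scores, tolerance=0.0):
    """Return (first_model_wins, ties, second_model_wins).

    Assumption: higher is better on every benchmark. Two scores count
    as a tie when they differ by no more than `tolerance` points.
    """
    first = ties = second = 0
    for a, b in scores.values():
        if abs(a - b) <= tolerance:
            ties += 1
        elif a > b:
            first += 1
        else:
            second += 1
    return first, ties, second


print(tally(SCORES))  # (9, 0, 2)
```

Passing a nonzero `tolerance` (for example `tally(SCORES, tolerance=1.0)`) treats near-equal results such as SWE-bench Pro as ties, which can be a more cautious reading when score differences are small.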