DeepSeek V4 Pro
15 benchmarks
Benchmark               Rank        Score
AIME                    #18 of 108  95.2%
LiveCodeBench           #2 of 51    93.5%
GPQA Diamond            #16 of 101  90.1%
MMLU Pro                #6 of 53    87.5%
MRCR v2                 #4 of 13    83.5%
BrowseComp              #9 of 33    83.4%
SWE-bench Verified      #6 of 79    80.6%
HumanEval               #14 of 19   76.8%
SWE-bench Multilingual  #6 of 15    76.2%
Terminal-Bench 2.0      #12 of 28   67.9%
SimpleQA Verified       #2 of 3     57.9%
SWE-bench Pro           #12 of 27   55.4%
Toolathlon              #2 of 2     51.8%
Humanity's Last Exam    #16 of 50   48.2%
Humanity's Last Exam    #16 of 50   37.7%