benchmark.space
Head to head: Claude Mythos Preview vs DeepSeek V4 Flash
Record across 7 shared benchmarks: Claude Mythos Preview 7 wins, 0 ties, DeepSeek V4 Flash 0 wins.
Benchmark                 Claude Mythos Preview   DeepSeek V4 Flash
BrowseComp                86.9%                   73.2%
GPQA Diamond              94.6%                   88.1%
Humanity's Last Exam      64.7%                   34.8%
SWE-bench Multilingual    87.3%                   73.3%
SWE-bench Pro             77.8%                   52.6%
SWE-bench Verified        93.9%                   79.0%
Terminal-Bench 2.0        82.0%                   56.9%
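The win/tie record is a per-benchmark comparison of the two scores. The page does not state its exact tallying rule, so the following is a minimal sketch that assumes "higher score wins" on every benchmark. It also uses a hypothetical tie tolerance `tol`, which defaults to 0 points. The helper names `tally` and `margins` are illustrative only. With these assumptions it reproduces the 7-0-0 record and lists each benchmark's score gap in percentage points.

```python
# Scores (percent) from the comparison above: (Claude Mythos Preview, DeepSeek V4 Flash).
SCORES = {
    "BrowseComp": (86.9, 73.2),
    "GPQA Diamond": (94.6, 88.1),
    "Humanity's Last Exam": (64.7, 34.8),
    "SWE-bench Multilingual": (87.3, 73.3),
    "SWE-bench Pro": (77.8, 52.6),
    "SWE-bench Verified": (93.9, 79.0),
    "Terminal-Bench 2.0": (82.0, 56.9),
}


def tally(scores, tol=0.0):
    """Return (first-model wins, ties, second-model wins).

    Assumes higher is better on every benchmark; `tol` is a hypothetical
    tie tolerance in percentage points (not something the site documents).
    """
    a_wins = b_wins = ties = 0
    for a, b in scores.values():
        if abs(a - b) <= tol:
            ties += 1
        elif a > b:
            a_wins += 1
        else:
            b_wins += 1
    return a_wins, ties, b_wins


def margins(scores):
    """Score gap (first minus second) per benchmark, rounded to 0.1 points."""
    return {name: round(a - b, 1) for name, (a, b) in scores.items()}


if __name__ == "__main__":
    print(tally(SCORES))  # (7, 0, 0)
    for name, gap in margins(SCORES).items():
        print(f"{name}: {gap:+.1f} points")
```

Raising `tol` would turn near-equal results into ties. Every gap here is at least 6.5 points, so any tolerance below that leaves the 7-0-0 record unchanged.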