benchmark.space
MARS-0.5B
3 benchmarks
Benchmark       Rank         Score
HumanEval       #16 of 17    40.2%
GPQA Diamond    #92 of 92    19.4%
MMLU Pro        #47 of 47    12.4%