benchmark.space
MARS-7B
3 benchmarks
Benchmark      Rank        Score
HumanEval      #10 of 17   81.7%
MMLU Pro       #45 of 47   44.4%
GPQA Diamond   #87 of 92   26.6%