DeepSeek-R1-Distill-Llama-8B + SeLaR
2 benchmarks
AIME: #70 of 103, 46.67%
GPQA Diamond: #82 of 92, 40.91%