benchmark.space
GPT-4.1-mini
2 benchmarks
Benchmark      Rank        Score
AIME           #52 of 74   48.67%
GPQA Diamond   #67 of 76   47.55%