benchmark.space
GPT-4.1-mini + PRISM-MCTS
2 benchmarks
GPQA Diamond: #68 of 92, 65.08%
AIME: #63 of 103, 53.33%