Phi-4-reasoning-plus
3 benchmarks
AIME: 81.3% (#35 of 39)
MMLU Pro: 76.0% (#25 of 29)
GPQA Diamond: 68.9% (#44 of 49)