Phi-4-reasoning
3 benchmarks
AIME: 75.3% (#37 of 39)
MMLU Pro: 74.3% (#27 of 29)
GPQA Diamond: 65.8% (#46 of 49)