GRPO (Qwen2.5-Math-7B)
1 benchmark
AIME
#88 of 103
25.1%