WeMath leaderboard
2 models tested · Updated 2026-04-14 · Verified sources only
Qwen3.5 27B leads at 84.0%
1. Qwen3.5 27B · 84.0%
   Qwen · arxiv/2604.08644 · 2026-04-14
   From EXAONE 4.5 technical report. Best among sub-33B models.
2. EXAONE 4.5 33B · 79.1%
   LG AI Research · HuggingFace/LGAI-EXAONE · 2026-04-14
   Beats GPT-5 mini (70.3%) by 8.8 points and Qwen3-VL 32B (71.6%) by 7.5 points.