CharXiv leaderboard
4 entries (3 models) tested · Updated 2026-04-08 · Verified sources only
Muse Spark leads at 86.4%
1. Muse Spark · 86.4%
   Meta · Blog/Meta · 2026-04-08
   State of the art on chart reasoning. Per Meta's announcement, it beats GPT-5.4 (82.8), Gemini 3.1 Pro (80.2), and Claude Opus 4.6 (65.3).

2. Qwen 3.5 397B · 80.8%
   Alibaba · HuggingFace/Qwen · 2026-02-16
   Chart understanding and reasoning. Ranks between Claude Opus 4.6 with tools (78.9) and Muse Spark (86.4).

3. Claude Opus 4.6 (with tools) · 78.9%
   Anthropic · arxiv/Mythos-System-Card · 2026-04-07
   Chart reasoning on arXiv figures, with tools.

4. Claude Opus 4.6 (no tools) · 61.5%
   Anthropic · arxiv/Mythos-System-Card · 2026-04-07
   Chart reasoning on arXiv figures, without tools.