MRCR v2 leaderboard
5 models tested · Updated 2025-12-11 · Verified sources only
GPT-5.2 leads at 98.0%.
1. GPT-5.2 (OpenAI): 98.0%
   Source: Vellum, "GPT-5.2 Benchmarks", 2025-12-11
   Note: 4-needle variant at 128k context. 8-needle 128k: 85%.
2. Claude Opus 4.6 (Anthropic): 93.0%
   Source: Anthropic, "Introducing Claude Opus 4.6", 2026-02-05
   Note: Highest published match ratio at 256k context. Drops to 76% at 1M tokens.
3. Gemini 3.1 Pro (Google): 84.9%
   Source: Google DeepMind blog, 2026-02-19
   Note: Long-context retrieval. Strong at 128k, drops to 26.3% at 1M tokens.
4. Gemma 4 31B (Google): 66.4%
   Source: Google model card, 2026-04-02
   Note: 8-needle variant at 128k context. Solid long-context retrieval for a 31B dense model.
5. GLM-5 (Zhipu): 26.3%
   Source: DocsBot AI benchmark comparison, 2026-02-11
   Note: Sharp degradation at extreme context length. Drops from 77% at 128k.