MRCR v2
5 models tested · Updated 2025-12-11 · Verified sources only
GPT-5.2 leads at 98.0%
1. 98.0% · OpenAI · Vellum — GPT-5.2 Benchmarks · 2025-12-11
   4-needle variant at 128k context. 8-needle 128k: 85%.
2. 93.0%
   Highest published match ratio at 256k context. Drops to 76% at 1M tokens.
3. 84.9% · Google · Blog/Google DeepMind · 2026-02-19
   Long-context retrieval. Strong at 128k, drops to 26.3% at 1M tokens.
4. 66.4% · Google · Model Card/Google · 2026-04-02
   8-needle variant at 128k context. Solid long-context retrieval for a 31B dense model.
5. 26.3% · Zhipu · DocsBot AI benchmark comparison · 2026-02-11
   Sharp degradation at extreme context length. Drops from 77% at 128k.
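Several notes above quote two scores for the same model at different context lengths. As a rough illustration only, the sketch below computes the drop in percentage points for ranks 2, 3, and 5. Entries are keyed by rank because the listing does not name every model. The names `scores` and `drop_in_points` are made up for this example, and the rank 3 figure assumes that 84.9% is the 128k score its note refers to. The context length behind the rank 5 score is not stated, so it is labeled "unspecified".

```python
# Scores in percent, copied from the leaderboard notes above.
# Each entry maps a rank to (context label, score) pairs.
scores = {
    2: {"short": ("256k", 93.0), "long": ("1M", 76.0)},
    # Assumption: 84.9% is the 128k score mentioned in the rank 3 note.
    3: {"short": ("128k", 84.9), "long": ("1M", 26.3)},
    # The listing does not say which context length produced 26.3%.
    5: {"short": ("128k", 77.0), "long": ("unspecified", 26.3)},
}


def drop_in_points(short_score: float, long_score: float) -> float:
    """Return the score drop in percentage points, rounded to one decimal."""
    return round(short_score - long_score, 1)


# Drop per rank, in percentage points.
drops = {
    rank: drop_in_points(entry["short"][1], entry["long"][1])
    for rank, entry in scores.items()
}

for rank, entry in scores.items():
    short_label, short_score = entry["short"]
    long_label, long_score = entry["long"]
    print(
        f"Rank {rank}: {short_score}% at {short_label} -> "
        f"{long_score}% at {long_label} (drop of {drops[rank]} points)"
    )
```

Under these assumptions the drops come out to 17.0, 58.6, and 50.7 points. The entries use different needle counts and context lengths, so these figures describe each model's own degradation and are not a like-for-like comparison across ranks.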