CyberGym
3 models tested · Updated 2026-04-07 · Verified sources only
Claude Mythos Preview leads at 83.1%
1
Anthropic · Blog/Anthropic · 2026-04-07
New state of the art on CyberGym, a cybersecurity benchmark of 1,507 real-world vulnerability analysis tasks. 16.5 pts ahead of Opus 4.6 (66.6%).
83.1%
2
Z.AI · Web/Z.AI announcement coverage · 2026-04-07
20.4 pts above GLM-5 (48.3%). Highest-scoring open-source (MIT-licensed) model on CyberGym.
68.7%
3
Anthropic · Blog/Anthropic · 2026-04-07
Comparison score for Opus 4.6, taken from the Mythos Glasswing announcement.
66.6%
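
The point gaps quoted in the notes can be rechecked from the listed scores. The following is a minimal sketch, not part of the original listing. The `scores` dictionary and the `gap` helper are illustrative names. The rank 2 model name is not shown in the listing, so it is labeled by organization only. Treating the rank 3 entry as Opus 4.6 is an assumption, inferred from the 16.5 pt gap in the rank 1 note.

```python
# Scores (percent) as listed in the entries above.
scores = {
    "Claude Mythos Preview": 83.1,
    "Z.AI entry (rank 2)": 68.7,  # model name not shown in the listing
    "Opus 4.6": 66.6,             # assumption: this is the rank 3 entry
    "GLM-5": 48.3,                # reference score quoted in the rank 2 note
}


def gap(leader: str, other: str) -> float:
    """Return the score difference in percentage points, rounded to one decimal."""
    return round(scores[leader] - scores[other], 1)


if __name__ == "__main__":
    pairs = [
        ("Claude Mythos Preview", "Opus 4.6"),
        ("Claude Mythos Preview", "Z.AI entry (rank 2)"),
        ("Z.AI entry (rank 2)", "GLM-5"),
    ]
    for leader, other in pairs:
        print(f"{leader} vs {other}: {gap(leader, other)} pts")
```

Rounding to one decimal avoids floating-point noise in the subtraction, since the published scores carry only one decimal place.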