CyberGym leaderboard
3 models tested · Updated 2026-04-07 · Verified sources only
Claude Mythos Preview leads at 83.1%.
1. Claude Mythos Preview: 83.1%
Anthropic · Blog/Anthropic · 2026-04-07
New cybersecurity benchmark SOTA on 1,507 real-world vulnerability analysis tasks. 16.5 pts ahead of Opus 4.6.

2. GLM-5.1: 68.7%
Z.AI · Web/Z.AI announcement coverage · 2026-04-07
20.4 pts above GLM-5 (48.3). Leading open-source (MIT-licensed) model on CyberGym.

3. Claude Opus 4.6: 66.6%
Anthropic · Blog/Anthropic · 2026-04-07
Comparison score from Mythos Glasswing announcement.
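
The point gaps quoted in the entry notes can be checked directly from the listed scores. The following is a minimal sketch, not part of the leaderboard itself: the names `SCORES`, `GLM_5_BASELINE`, and `gap` are hypothetical helpers introduced for illustration, and the values are copied from the entries above.

```python
# Scores (percent) copied from the leaderboard entries; names are illustrative.
SCORES = {
    "Claude Mythos Preview": 83.1,
    "GLM-5.1": 68.7,
    "Claude Opus 4.6": 66.6,
}
GLM_5_BASELINE = 48.3  # GLM-5 score quoted in the GLM-5.1 entry note


def gap(a: float, b: float) -> float:
    """Return a - b in percentage points, rounded to one decimal place."""
    return round(a - b, 1)


# Gap between the top entry and Claude Opus 4.6.
mythos_vs_opus = gap(SCORES["Claude Mythos Preview"], SCORES["Claude Opus 4.6"])

# Gap between GLM-5.1 and the GLM-5 score quoted in its note.
glm_vs_baseline = gap(SCORES["GLM-5.1"], GLM_5_BASELINE)

print(mythos_vs_opus, glm_vs_baseline)
```

Rounding to one decimal place keeps floating-point noise out of the comparison, since the scores themselves are reported to one decimal.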