CyberGym Leaderboard 2026 — Results Across 6 Real AI Models


6 models tested · Updated 2026-04-07 · Verified sources only

Claude Mythos Preview leads at 83.1%

Anthropic · Blog/Anthropic · 2026-04-07

New state of the art on this cybersecurity benchmark, which spans 1,507 real-world vulnerability analysis tasks. 16.5 points ahead of Opus 4.6.

83.1%

OpenAI · OpenAI Blog · 2026-04-23

Nearly 9 points above Opus 4.7 (73.1%).

81.8%

OpenAI · OpenAI Blog · 2026-04-23

Below GPT-5.5 (81.8%).

79.0%

Anthropic · Blog/Anthropic · 2026-04-16

Same as the updated Opus 4.6 score. Cyber capabilities are intentionally kept below those of Mythos.

73.8%

Just over 20 points above GLM-5 (48.3%). Open-source, MIT-licensed model; the leading open-source entry on CyberGym.

68.7%

Anthropic · Blog/Anthropic · 2026-04-07

Comparison score from Mythos Glasswing announcement.

66.6%
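
The point gaps quoted in the notes above can be checked against the listed scores. The snippet below is a minimal sketch that uses only numbers appearing on this page; the `gap` helper and the variable names are illustrative, and entries are referred to by rank rather than by model name, since several names are not shown here.

```python
# Scores (percent) as listed on this page, in leaderboard order.
scores = [83.1, 81.8, 79.0, 73.8, 68.7, 66.6]

# Reference scores quoted in the entry notes on this page.
quoted = {"Opus 4.6": 66.6, "Opus 4.7": 73.1, "GLM-5": 48.3}


def gap(score, reference):
    """Difference in percentage points, rounded to one decimal place."""
    return round(score - reference, 1)


# Gap between the top entry and the quoted Opus 4.6 score.
first_vs_opus46 = gap(scores[0], quoted["Opus 4.6"])
# Gap between the second-ranked entry and the quoted Opus 4.7 score.
second_vs_opus47 = gap(scores[1], quoted["Opus 4.7"])
# Gap between the fifth-ranked entry and the quoted GLM-5 score.
fifth_vs_glm5 = gap(scores[4], quoted["GLM-5"])

print(first_vs_opus46, second_vs_opus47, fifth_vs_glm5)
```

Rounding to one decimal place matches the precision of the published scores and avoids floating-point noise in the subtraction.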