GraphWalks BFS leaderboard
1 model tested · Updated 2026-04-07 · Verified sources only
Claude Mythos Preview leads at 80.0%
1 · Claude Mythos Preview · Anthropic · Blog/Anthropic · 2026-04-07 · 80.0%
Long-context graph reasoning (256K-1M). About 2x the score of Opus 4.6 (38.7%). GPT-5.4 scored 21.4%.
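The "2x" figure can be checked from the three scores quoted in this entry. The snippet below is a minimal sketch of that arithmetic only; the `ratio_to_leader` helper and the dictionary layout are illustrative assumptions, not part of the leaderboard or of any GraphWalks tooling.

```python
# Scores as quoted in the leaderboard entry (percent).
scores = {
    "Claude Mythos Preview": 80.0,
    "Opus 4.6": 38.7,
    "GPT-5.4": 21.4,
}


def ratio_to_leader(scores, leader):
    """Return leader_score / other_score for every model except the leader.

    Hypothetical helper written for this illustration.
    """
    top = scores[leader]
    return {name: top / s for name, s in scores.items() if name != leader}


ratios = ratio_to_leader(scores, "Claude Mythos Preview")
for name, r in ratios.items():
    # Two decimals is enough to see how close "2x" is to the exact ratio.
    print(f"{name}: {r:.2f}x")
```

On these numbers the lead over Opus 4.6 comes out at about 2.07x and the lead over GPT-5.4 at about 3.74x, so "2x" is a slight round-down.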