WebArena-Verified leaderboard
1 model tested · Updated 2026-03-05 · Verified sources only
GPT-5.4 leads at 67.3%
1. GPT-5.4 · OpenAI · Blog/OpenAI · 2026-03-05 · 67.3%
Record score using both DOM and screenshot-driven interaction. Previous SOTA was GPT-5.2 at 65.4%.