HealthBench Hard Leaderboard 2026: Results Across 1 Real AI Model

1 model tested · Updated 2026-04-08 · Verified sources only

Muse Spark leads at 42.8%

Meta · Blog/Meta · 2026-04-08

New state of the art (SOTA) on health benchmarks. Trained with 1,000+ physician collaborators. Beats GPT-5.4 (40.1%) and Gemini 3.1 Pro (20.6%).