benchmark.space
HealthBench Hard leaderboard
1 model tested · Updated 2026-04-08 · Verified sources only
Muse Spark leads at 42.8%
1. Muse Spark · Meta · Blog/Meta · 2026-04-08 · 42.8%
New state of the art on health benchmarks. Trained with more than 1,000 physician collaborators. Beats GPT-5.4 (40.1%) and Gemini 3.1 Pro (20.6%).