benchmark.space
Simular AI
Agent S3 + Behavior Best-of-N
1 benchmark
OSWorld: ranked #9 of 30, 72.6%