2 quotes from AI researchers about benchmarks, models, and evaluation
"Two days ago, Anthropic cut off third-party harnesses from using Claude subscriptions — not surprising. Three days ago, MiMo launched its Token Plan — a design I spent real time on, and what I believe is a serious attempt at getting compute allocation and agent harness"
"A bigger problem: many third-party harnesses compress tool responses every 3 steps when approaching the context limit, leading to very low cache hit rates."