_LuoFuli on AI benchmarks — benchmark.space

voices

_LuoFuli on AI benchmarks

2 quotes from AI researchers about benchmarks, models, and evaluation

"Two days ago, Anthropic cut off third-party harnesses from using Claude subscriptions — not surprising. Three days ago, MiMo launched its Token Plan — a design I spent real time on, and what I believe is a serious attempt at getting compute allocation and agent harness"

Fuli Luo @_LuoFuli · 2026-04-05 ·1751 likes view on x

Claude Opus 4.6

"A bigger problem: many third-party harnesses compress tool responses every 3 steps when approaching the context limit, leading to very low cache hit rates."

Fuli Luo @_LuoFuli · 2026-04-06 ·89 likes view on x