GPTAIclips on AI benchmarks
4 quotes from AI researchers about benchmarks, models, and evaluation
"SWE-Bench Pro, 58.4. That is a new state of the art, beating both GPT 5.4 and Claude Opus 4.6."
"SWE-bench Verified at 78.8. Built on chain of thought that stays focused across hundreds of agent steps."
"The 31B dense model beats models 20 times its size on benchmarks. Supports text, images, audio, and video."
"Competitive benchmarks with full-precision 8B models at 14 times less memory."