Two quotes from AI researchers about benchmarks, models, and evaluation
"I do remember trying frontier models on the ARC-AGI puzzles and getting strange results. It can't see the starting state accurately. It can't accurately define which boxes are colored what colors."
"The GLM team and their mixture-of-experts model, especially for OCR tasks, are surpassing what's possible with closed models with their 9-billion-parameter model, which is impressive work."