3 quotes from AI researchers about benchmarks, models, and evaluation
"Four is being built primarily for coding. Internal benchmarks, not yet independently verified, suggest it's targeting over 80% on SWE-bench, the benchmark for solving real-world software engineering problems."
"In January 2025, they dropped DeepSeek R1. That one broke the internet. Matched GPT-4 on key benchmarks. Cost roughly $6 million to train compared to the estimated $100 million it cost to train GPT-4."
"DeepSeek's engineers rewrote core parts of V4's code specifically for Huawei silicon. They gave Huawei early testing access and froze out US chip makers entirely."