youtube_JulianGoldieSEO on AI benchmarks

voices

3 quotes from AI researchers about benchmarks, models, and evaluation

"Four is being built primarily for coding. Internal benchmarks, not yet independently verified, suggest it's targeting over 80% on SWE-bench, the benchmark for solving real-world software engineering problems."

Julian Goldie @youtube_JulianGoldieSEO · 2026-04-09

SWE-bench Verified · DeepSeek V4

"In January 2025, they dropped DeepSeek R1. That one broke the internet. Matched GPT-4 on key benchmarks. Cost roughly $6 million to train compared to the estimated $100 million it cost to train GPT-4."

Julian Goldie @youtube_JulianGoldieSEO · 2026-04-09

DeepSeek R1

"DeepSeek's engineers rewrote core parts of V4's code specifically for Huawei silicon. They gave Huawei early testing access and froze out US chip makers entirely."

Julian Goldie @youtube_JulianGoldieSEO · 2026-04-09

DeepSeek V4