YouTube · 2026-04-08
"The key tests are SWE-bench for coding, MMLU and GPQA for knowledge and reasoning, and large multimodal benchmarks for image and video generation. If V4 hits 80% or higher on SWE-bench, it is genuinely competitive with frontier Western models."
Alex
Co-host, Asia Tech Macro