"They are training a 3 billion parameter model just to interpret a 1B Llama. Pretty crazy scale there. But the diffusion loss follows a smooth power law. Scaling directly affects downstream tasks. Both steering performance and probing accuracy improve with compute closely tracking the diffusion loss."
Vibhu Sapra
Paper Club Presenter
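The "smooth power law" claim can be made concrete with a small sketch: a power law L(C) = a · C^(-b) is linear in log-log space, so it can be fit with ordinary least squares on the logs. The compute/loss values below are synthetic placeholders, not numbers from the paper.

```python
import numpy as np

# Synthetic (compute, loss) pairs that follow an exact power law
# L(C) = a * C^(-b) with a = 5.0, b = 0.05 -- illustration only.
compute = np.array([1e18, 1e19, 1e20, 1e21, 1e22])
loss = 5.0 * compute ** -0.05

# In log space the power law is linear: log L = log a - b * log C,
# so a simple degree-1 polyfit recovers both parameters.
slope, intercept = np.polyfit(np.log(compute), np.log(loss), 1)
a, b = np.exp(intercept), -slope
print(f"a = {a:.3f}, b = {b:.3f}")
```

With real training runs the fit would be done the same way, with the caveat that measured losses are noisy and the fit only holds over the compute range actually observed.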