"The model was blind-graded by experts against human outputs from across 44 white-collar occupations, selected for their impact on GDP, hence the name of the benchmark, GDPval. And GPT-5.4 beats the human first attempt 70.8% of the time. If you include ties, it is 83% of the time."
AI Explained
AI commentary YouTube channel
GDPval, GPT-5.4