karpathy on AI benchmarks — benchmark.space

karpathy on AI benchmarks

2 quotes from AI researchers about benchmarks, models, and evaluation

"@kepano I just tried it this morning on the 245-page Mythos pdf and it failed badly and the outputs were all mangled. Converting pdfs is really hard, I think it has to probably be a Skill not a program, for a SOTA LLM for it to work properly."

Andrej Karpathy @karpathy · 2026-04-09 · 990 likes

Claude Mythos Preview

"@chalish_b @kepano In my experience there are approx. one thousand different pdf converters that are all equally terrible for anything except the simplest documents. Post the converted Mythos pdf, figures, tables and all. If good, happy to retweet as this is essential and missing infrastructure."

Andrej Karpathy @karpathy · 2026-04-09 · 57 likes

Claude Mythos Preview