AI Engineer YouTube · June 4, 2026

The Art & Science of Benchmarking Agents — Vincent Chen, Snorkel AI

Why it matters

ARC AGI 3 launched a few weeks before this talk with every task human-solvable and frontier models scoring under 1%. That gap is the argument: our ability to measure AI has fallen behind our ability to build it, and the benchmarks that actually shape the field are bets on where capabilities are going, not snapshots of where they are.
