AI Engineer YouTube · May 22, 2026

Fast Models Need Slow Developers — Sarah Chieng, Cerebras

Why it matters

Codex Spark, a model Cerebras built with OpenAI, generates code at 1,200 tokens per second. The Sonnet and Opus families run at 40 to 60 tokens per second. At that difference of 20x or more, a context window that used to take ten minutes to fill now takes 30 seconds, and every habit built around slow generation starts producing technical debt at a
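The arithmetic behind those figures can be checked with a rough sketch. The generation rates come from the summary above. The 36,000-token budget is an assumption, derived from "ten minutes" at the 60 tokens-per-second end of the slow range, and the function name is hypothetical.

```python
def fill_time_seconds(context_tokens: int, tokens_per_second: float) -> float:
    """Seconds needed to generate `context_tokens` at a constant rate."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return context_tokens / tokens_per_second


# Rates quoted in the summary (tokens per second).
SLOW_RATE = 60    # upper end of the 40-to-60 range
FAST_RATE = 1200  # Codex Spark figure

# Assumed budget: what ten minutes of generation at SLOW_RATE produces.
budget = 10 * 60 * SLOW_RATE

slow_seconds = fill_time_seconds(budget, SLOW_RATE)
fast_seconds = fill_time_seconds(budget, FAST_RATE)
speedup = FAST_RATE / SLOW_RATE

print(f"budget: {budget} tokens")
print(f"slow: {slow_seconds:.0f} s, fast: {fast_seconds:.0f} s, speedup: {speedup:.0f}x")
```

At the 40 tokens-per-second end of the slow range, the same budget would take 15 minutes and the ratio rises to 30x, so 20x is the conservative figure.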
