Why it matters
Cloudflare report on testing security-focused frontier models against real infrastructure code. Relevant to evaluating AI-assisted vulnerability discovery and production security workflows.
My takeaway: Project Glasswing: what Mythos showed us is a model-evaluation signal. The practical read is to tie capability claims to evidence, launch criteria, and regression tests rather than relying on demos or benchmark headlines.