Name: Spec-Driven Testing for Agents With A Brain the Size of A Planet — Steven Willmott, SafeIntelligence
Uploaded: 2026-05-31
Description: Wrapping a malicious instruction in a poem is an effective jailbreak against large models and not against small ones. Small models don't understand the poem. Large models do and execute the instruction. Steven Willmott from Safe Intelligence argues this is one reason bigger is not straightforwardly safer: a larger mode

Why it matters

Wrapping a malicious instruction in a poem works as a jailbreak against large models but not against small ones. Small models do not understand the poem, so the attack fails; large models do understand it, and execute the instruction. Steven Willmott from Safe Intelligence argues this is one reason bigger is not straightforwardly safer: a larger mode