AI Engineer YouTube · May 20, 2026

Any-to-Any: Building Native Multimodal Agents - Patrick Löber, Google DeepMind

Why it matters

Draw arrows on a map, ask Gemini to generate a picture of what you see, and it produces the Golden Gate Bridge. It gets there not by matching pixels but because the image generation model is built on top of Gemini's world understanding and knows what those arrows are pointing at. Patrick Löber walks through the full any-to-an