AI Engineer YouTube · June 9, 2026

From Transcription to Live Music: Gemini's Audio Stack — Thor Schaeff, Google DeepMind

From Transcription to Live Music: Gemini's Audio Stack — Thor Schaeff, Google DeepMind video thumbnail
Why it matters

One API call to Gemini 3 Flash Preview: speaker labels by name, timestamps, emotion tags, language detection with English translation, and a full summary. That is the audio understanding layer that underlies everything else Thor Schaeff demos here, including speech generation directed by a "director's note" rather than

My takeaway: One API call to Gemini 3 Flash Preview: speaker labels by name, timestamps, emotion tags, language detection with English translation, and a full summary. That is the audio understanding layer that underlies everything else Thor Schaeff demos here, including speech generation directed by a "director's note" rather than