Google Launches Gemini 3.1 Flash TTS in Preview


TL;DR

  • Launch: Google launched Gemini 3.1 Flash TTS in preview on April 15 with prompt-based control over speech delivery.
  • Controls: The docs show developers can steer pace, tone, accent, and multi-speaker scenes through structured prompts and audio tags.
  • Rollout: Availability spans Gemini API, AI Studio, Vertex AI, and Google Vids, with pricing and SynthID watermarking shaping early adoption.

Google launched Gemini 3.1 Flash TTS on April 15 as a preview speech model that turns text-to-speech into something closer to directed performance. Through the model ID gemini-3.1-flash-tts-preview, developers can shape delivery inside the Gemini API instead of treating synthetic speech as a plain readout layer.

More unusual is how much direction Google expects to live inside the prompt itself. Simon Willison calls the prompting guide “surprising, to say the least.” Google’s launch post pairs audio tags for vocal style with the preview’s basic shape: text goes in, audio comes out.
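For a sense of what that text-in, audio-out shape could look like on the wire, here is a minimal sketch that only builds a request body and does not send it. The model ID comes from the launch; everything else is an assumption carried over from how the Gemini API has exposed earlier TTS previews: the endpoint pattern, the JSON field names, and the voice name "Kore" may all differ for this model, so check the current docs before relying on any of it.

```python
import json

# Model ID as reported at launch.
MODEL_ID = "gemini-3.1-flash-tts-preview"

# Assumption: the endpoint follows the Gemini API's usual generateContent pattern.
ENDPOINT = (
    "https://generativelanguage.googleapis.com/v1beta/models/"
    f"{MODEL_ID}:generateContent"
)


def build_tts_request(text: str, voice_name: str = "Kore") -> dict:
    """Build a text-in, audio-out request body.

    Assumption: field names mirror earlier Gemini TTS previews and are
    not confirmed for this model. "Kore" is an illustrative voice name.
    """
    return {
        "contents": [{"parts": [{"text": text}]}],
        "generationConfig": {
            # Ask for audio rather than text in the response.
            "responseModalities": ["AUDIO"],
            "speechConfig": {
                "voiceConfig": {
                    "prebuiltVoiceConfig": {"voiceName": voice_name}
                }
            },
        },
    }


if __name__ == "__main__":
    body = build_tts_request("Say warmly: thanks for listening.")
    print(ENDPOINT)
    print(json.dumps(body, indent=2))
```

Keeping the payload builder separate from any network call makes it easy to inspect or log exactly what direction is being sent before spending preview quota on it.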

Prompt Engineering Becomes Voice Direction

Inside the docs, the structured prompt format reads more like a production brief than a standard API call. Google asks for an audio profile, scene description, director’s notes, sample context, and transcript. The same guide also exposes style, accent, pace, and tone controls through natural-language prompting.
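To make the production-brief idea concrete, the five parts can be assembled into a single prompt string. The sketch below is illustrative only: the VoiceBrief class, the section labels, and the "Speaker: line" transcript layout are hypothetical choices for this example, not the exact template from Google’s guide.

```python
from dataclasses import dataclass


@dataclass
class VoiceBrief:
    """Hypothetical container for the five parts the docs ask for."""

    audio_profile: str
    scene: str
    directors_notes: str
    sample_context: str
    transcript: str

    def to_prompt(self) -> str:
        # Section labels are assumptions made for this sketch;
        # the official guide may use different headings or ordering.
        sections = [
            ("AUDIO PROFILE", self.audio_profile),
            ("SCENE", self.scene),
            ("DIRECTOR'S NOTES", self.directors_notes),
            ("SAMPLE CONTEXT", self.sample_context),
            ("TRANSCRIPT", self.transcript),
        ]
        return "\n\n".join(
            f"# {title}\n{body.strip()}" for title, body in sections
        )


if __name__ == "__main__":
    brief = VoiceBrief(
        audio_profile="Ana: warm, mid-30s radio host. Ben: dry, older guest.",
        scene="A late-night call-in show, quiet studio.",
        directors_notes="Relaxed pace. Ana smiles through her lines.",
        sample_context="They have just finished discussing the weather.",
        transcript="Ana: Welcome back.\nBen: Glad to be here.",
    )
    print(brief.to_prompt())
```

Treating the brief as structured data rather than one hand-written string makes it easier to change a single element, such as the accent in the audio profile, and compare the resulting audio.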

“This world-building context helps characters remain ‘in-character’ and react to one another naturally across multiple turns.”

Gemini team, Google authors (via Google Blog)

From a product perspective, the important shift is less that Google shipped another speech model than that delivery is now directed through the prompt. Outside coverage highlighted Gemini 3.1 Flash TTS’s director-level controls and format templates. Google, for its part, says creators can cast characters with Audio Profiles and Director’s Notes. Willison’s hands-on testing sharpened the point: the interface he used to try the model showed speaker labels, voice selection, script formatting, and WAV downloads, and he said that changing the example’s location from Brixton to Newcastle and Exeter altered the result.

Preview Rollout Comes With Pricing and Limits

Across Google’s developer surfaces, the rollout is broad for an early preview. Google’s launch covers Gemini API and Google AI Studio, along with Vertex AI and Google Vids, while the docs list 30 prebuilt voice options and direct users to AI Studio for testing. That scope makes the release more than a narrow demo feature.

For developers weighing whether to try it now, the practical constraints matter as much as the pitch. According to the docs, Gemini text-to-speech (TTS) is in Preview, and the model is framed around exact text recitation rather than the Live API’s open-ended conversational audio. Pricing includes a free tier; on the paid tier, text input costs $1 per million tokens and audio output costs $20 per million tokens.
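Those paid-tier rates make for simple back-of-the-envelope arithmetic. The sketch below assumes the two per-million-token prices are the only charges; the token counts in the example are made up for illustration, and real usage figures would have to come from the API’s own reporting.

```python
# Paid-tier rates as quoted in the docs (USD per million tokens).
TEXT_INPUT_USD_PER_MILLION = 1.00
AUDIO_OUTPUT_USD_PER_MILLION = 20.00


def estimate_cost_usd(text_input_tokens: int, audio_output_tokens: int) -> float:
    """Estimate paid-tier cost, assuming no other fees or discounts apply."""
    text_cost = text_input_tokens / 1_000_000 * TEXT_INPUT_USD_PER_MILLION
    audio_cost = audio_output_tokens / 1_000_000 * AUDIO_OUTPUT_USD_PER_MILLION
    return text_cost + audio_cost


if __name__ == "__main__":
    # Hypothetical job: 2,000 text tokens in, 50,000 audio tokens out.
    print(f"${estimate_cost_usd(2_000, 50_000):.4f}")
```

At a 20-to-1 price ratio, audio output dominates the bill for almost any job, so a long director’s brief adds little to the cost compared with the length of the generated speech.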