Turn your notes into a two-host podcast — over a weekend

Here's the idea in one line: you drop your notes or an article link into the app, and out comes an mp3 where two hosts talk it through out loud. Pop in your earbuds on a walk, and you've "read" the thing you've been putting off for a month.
And here's what's new: a year ago this was nowhere near as easy. To make a lively chat you needed two distinct voices, lines stitched together by hand, pauses, intonation: a whole separate audio chore. Now Gemini has a multi-speaker TTS mode with expressive tags for a chuckle, a pause, or emphasis. One request, and the model assigns the lines to the two voices itself. Ten minutes of narration costs around $0.09. That's what the whole project rides on.
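To get a feel for that price, here is a back-of-the-envelope sketch. It assumes the cost scales linearly from the "$0.09 per ten minutes" figure above; actual billing is likely per token and may differ. The function name `estimate_cost_usd` is made up for this example.

```python
# Assumption: cost scales linearly from the article's rough figure
# of about $0.09 per 10 minutes of narration. Real billing may differ.
COST_PER_10_MIN_USD = 0.09


def estimate_cost_usd(minutes: float) -> float:
    """Return a rough narration cost in USD for an episode of this length."""
    if minutes < 0:
        raise ValueError("minutes must be non-negative")
    return COST_PER_10_MIN_USD * minutes / 10


if __name__ == "__main__":
    # Print rough estimates for a short episode, the reference case, and an hour.
    for minutes in (2, 10, 60):
        print(f"{minutes:>3} min: about ${estimate_cost_usd(minutes):.3f}")
```

By this estimate, a two-minute episode comes to roughly two cents, so experimenting with prompts and voices stays cheap.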
What you'll learn
It's a small project, but it has the full "text → sound" loop that lots of apps are built on.
- Building a script prompt. The model first writes a dialogue from your text, then voices it.
- Steering the voices. Setting two hosts, names, roles and tone — that's already multi-speaker TTS.
- Saving the result. Pulling the audio out of the model's reply and writing it to an mp3.
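The last two steps can be sketched without any network call. This is a minimal sketch under stated assumptions, not the definitive API:

- The request fields (`speechConfig`, `multiSpeakerVoiceConfig`, `inlineData` and so on) follow the Gemini API speech-generation docs for earlier TTS preview models, and may differ for the model named in the source.
- The voice names `Kore` and `Puck` are examples of prebuilt voices.
- The audio is assumed to come back as base64-encoded raw PCM (24 kHz, 16-bit, mono).
- The helper names are hypothetical.

The sketch writes a WAV file. Turning that into an mp3 would need a separate encoder such as ffmpeg.

```python
import base64
import io
import wave


def build_tts_request(script: str, voices: dict[str, str]) -> dict:
    """Build a multi-speaker TTS request body.

    Assumption: field names mirror the Gemini REST API for speech
    generation; check them against the current documentation.
    """
    speaker_configs = [
        {
            "speaker": speaker,  # must match the name used in the script lines
            "voiceConfig": {"prebuiltVoiceConfig": {"voiceName": voice}},
        }
        for speaker, voice in voices.items()
    ]
    return {
        "contents": [{"parts": [{"text": script}]}],
        "generationConfig": {
            "responseModalities": ["AUDIO"],
            "speechConfig": {
                "multiSpeakerVoiceConfig": {"speakerVoiceConfigs": speaker_configs}
            },
        },
    }


def extract_pcm(response: dict) -> bytes:
    """Pull the base64 audio payload out of a response dict (assumed shape)."""
    part = response["candidates"][0]["content"]["parts"][0]
    return base64.b64decode(part["inlineData"]["data"])


def write_wav(target, pcm: bytes, rate: int = 24000) -> None:
    """Wrap raw 16-bit mono PCM in a WAV container (path or file object)."""
    with wave.open(target, "wb") as wav:
        wav.setnchannels(1)   # mono
        wav.setsampwidth(2)   # 16-bit samples
        wav.setframerate(rate)
        wav.writeframes(pcm)


if __name__ == "__main__":
    body = build_tts_request("Ana: Hi!\nMax: Hello.", {"Ana": "Kore", "Max": "Puck"})
    print(list(body["generationConfig"]["speechConfig"]))
    # Stand-in for a real reply: one second of silence, base64-encoded.
    silence = base64.b64encode(b"\x00\x00" * 24000).decode()
    fake = {"candidates": [{"content": {"parts": [{"inlineData": {"data": silence}}]}}]}
    buffer = io.BytesIO()
    write_wav(buffer, extract_pcm(fake))
    print(f"WAV size: {len(buffer.getvalue())} bytes")
```

Keeping the request builder and the file writer separate from the HTTP call makes both easy to test with a fake response before spending any quota.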
A ready starter prompt
Don't write "make a podcast from this text" — the model will guess the format, length and tone. Give it roles, constraints, and an example of how the chat should sound:
Weak: Make a podcast from this text.

The difference is that the strong prompt leaves no room to guess: you get a finished two-voice script on the first try, not a dry summary.
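One way to assemble a prompt with roles, constraints, and a sample exchange is sketched below. The wording is illustrative rather than the article's own starter prompt. The function name `build_script_prompt` and its parameters are hypothetical, and the square-bracket cue syntax (`[pause]`, `[chuckles]`) is an assumption to check against the TTS documentation.

```python
def build_script_prompt(source_text: str,
                        asker: str = "Ana",
                        explainer: str = "Max",
                        max_minutes: int = 2) -> str:
    """Assemble a script prompt: roles, constraints, a tone example, the source."""
    roles = (
        "Write a podcast dialogue between two hosts. "
        f"{asker} is curious and asks short questions. "
        f"{explainer} explains clearly, with everyday examples."
    )
    constraints = (
        f"Constraints: about {max_minutes} minutes when read aloud; "
        f"every line starts with '{asker}:' or '{explainer}:'; "
        "use only facts from the source text; "
        # Assumption: bracketed cues are a placeholder for the real tag syntax.
        "add an occasional cue such as [pause] or [chuckles]."
    )
    example = (
        "Example of the tone:\n"
        f"{asker}: Okay, so what is this actually about?\n"
        f"{explainer}: [chuckles] Good question. In short..."
    )
    source = "Source text:\n" + source_text.strip()
    return "\n\n".join([roles, constraints, example, source])


if __name__ == "__main__":
    print(build_script_prompt("Caching stores results so repeated requests are fast."))
```

Prefixing every line with a host name matters for the next step: the speaker labels in the script are what a multi-speaker voice configuration would be matched against.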
What the result looks like
A two-minute episode.mp3: Ana asks, Max explains, with real pauses and chuckles in between. Your work notes, a book chapter or a long article become a mini-episode you actually listen to, instead of "I'll read it someday".
Start with one short article and take it all the way to a playable audio file, and you've got a pipeline that turns "should read this" into "already listened".
Source: Gemini 3.1 Flash TTS Preview — multi-speaker TTS and expressive tags





