An app reads your kid a bedtime story — in your voice, while you're away

Here's the idea in one line: you record half a minute of your speech, and after that the app reads any text — a bedtime story, a shopping list, an article — in your voice. Not robotic, not someone else's. Yours.
And here's what's fresh: a year ago this wasn't easy at all. Cloning a voice used to need a studio, an hour of clean audio, and a sound engineer. Then on June 2 Microsoft showed MAI-Voice-2 — a model that picks up a voice from a short sample and speaks in 15 languages. One small clip is enough. That's what the whole idea rides on.
Why this one
Picture the real scene. You're away for a couple of days, and your kid won't fall asleep without a story in your voice. Or grandma lives far off, and you want the grandkid to hear her exactly as she sounds. Generic synthesis is "the smart speaker reads a book." A story in your own voice is personal: it sounds like you.
And there's less "magic" here than it looks. The app is a pipe: take your sample, take the text, hand it to the model, get audio back, hit play. All the complexity lives in one careful request.
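The pipe described above can be sketched in a few lines. This is a minimal illustration, not the app's actual code: the article does not specify how the model is called, so the `synthesize` and `play` callables are hypothetical stand-ins that you would replace with a real model client and a real audio player.

```python
from typing import Callable

# Hypothetical signature: (voice_sample_bytes, text) -> audio_bytes.
# The real model call is not described in the article, so it is injected.
Synthesizer = Callable[[bytes, str], bytes]
Player = Callable[[bytes], None]


def read_aloud(sample: bytes, text: str,
               synthesize: Synthesizer, play: Player) -> bytes:
    """Take the sample and the text, hand both to the model, play the result."""
    if not sample:
        raise ValueError("voice sample is empty")
    if not text.strip():
        raise ValueError("nothing to read")
    audio = synthesize(sample, text)  # the one careful request
    play(audio)                       # hit play
    return audio


def fake_synthesize(sample: bytes, text: str) -> bytes:
    """Stand-in for the model so the sketch runs without any service."""
    return b"AUDIO:" + text.encode("utf-8")


played = []  # stand-in "speaker": collects whatever was played
result = read_aloud(b"\x00\x01", "Once upon a time", fake_synthesize, played.append)
```

Passing the model and the player in as arguments keeps the pipe itself trivial to test, and swapping the fake for a real client does not change `read_aloud`.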
What you'll learn
- Voice as both input and output. For the first time you give a model audio in and get audio out. Not text — sound.
- Sample + text are two different inputs. The clip is the voice example; the text is what to say. The model won't mix them up if you don't mix them up in the request.
- "The prompt is the feature." Reading in your voice isn't a separate technology you have to invent. It's an instruction to the model: "here's the example, here's the text, read it the same way." A good request is your main feature.
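One way to keep the sample and the text from blurring together is to give each its own named field in the request. The sketch below assumes a JSON payload; the field names (`instruction`, `voice_sample_b64`, `text`) are made up for illustration and are not taken from any real API.

```python
import base64
import json


def build_request(sample: bytes, text: str) -> str:
    """Build a request body with the voice sample and the text kept separate."""
    if not sample:
        raise ValueError("voice sample is empty")
    if not text.strip():
        raise ValueError("text is empty")
    payload = {
        # The instruction is the "feature": it says how the two inputs relate.
        "instruction": "Read the text in the voice of the sample.",
        # Hypothetical field names; a real service will define its own.
        "voice_sample_b64": base64.b64encode(sample).decode("ascii"),
        "text": text,
    }
    return json.dumps(payload, ensure_ascii=False)


request_json = build_request(b"fake-audio-bytes", "Buy milk and bread")
decoded = json.loads(request_json)
```

Because each input has its own key, neither the agent nor the model has to guess which part is the voice example and which part is the script.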
A ready starter prompt
Don't ask the agent to "make an app that talks in my voice" — it'll guess where the sample comes from and in what format. Give it a scenario, a sample, and limits:
Make an app that reads text in my voice.
A strong prompt leaves no room for guessing: you can see where the sample is, where the text is, the behavior and the buttons, and the line you shouldn't cross. The first result lands closer to what you wanted.
What you end up with
You're at the station, ten minutes to the train. You open the app, paste in "The Gingerbread Man," hit play, and send the audio home. That evening your kid falls asleep to the story — in your voice, even though you're not there. You didn't sit in a studio. You recorded half a minute, once.
One thing to settle before anything else: only clone your own voice, or the voice of someone who has clearly agreed. That's the line not to cross, and it's worth keeping in mind from the first line of code.
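Keeping that rule in mind "from the first line of code" could look like a small gate that runs before any sample reaches the model. This is only a sketch under assumptions: the `VoiceSample` type and its `is_own_voice` and `consent_given` flags are hypothetical, and how consent is actually collected and recorded is up to the app.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VoiceSample:
    audio: bytes
    speaker: str
    is_own_voice: bool = False   # hypothetical flag set when you record yourself
    consent_given: bool = False  # hypothetical flag set when the speaker agreed


def ensure_allowed(sample: VoiceSample) -> VoiceSample:
    """Refuse any sample that is neither your own voice nor consented."""
    if not (sample.is_own_voice or sample.consent_given):
        raise PermissionError(f"no consent recorded for speaker {sample.speaker!r}")
    return sample


mine = ensure_allowed(VoiceSample(b"...", "me", is_own_voice=True))

try:
    ensure_allowed(VoiceSample(b"...", "stranger"))
    blocked = False
except PermissionError:
    blocked = True
```

Both flags default to `False`, so a sample is rejected unless someone explicitly marks it as allowed.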
Source: Microsoft: launching seven new MAI models (MAI-Voice-2)

