Ideas

Speak a language you don't know — in your own voice

Illustration: a voice message in your voice goes out in another language

Here's the idea in one line: you record half a minute of yourself, type a message, and the app hands back a voice note in Turkish, Korean, or German. And the voice in it is yours. Not an announcer, not a robot. You — speaking a language you never learned.

Here's what's new. Models could read text aloud a year ago too, but there was a catch: either it was some synthetic stranger's voice, or it was yours — but only in one language. On June 2 Microsoft showed off MAI-Voice-2: it picks up your voice from a short sample (5–60 seconds) and speaks 15 languages, Russian and English among them, keeping your exact delivery. That's what the whole project rides on: your voice "moving" into another language.

Why this one

This isn't "text read in your voice" — there you narrate text in your own language. The trick here is different: a language you don't know.

Picture the scene. A friend's birthday, and their native language isn't yours. You could type "happy birthday" into a translator, but that's cold text in borrowed words. Or you can send a voice note where you wish them well in their language, in your voice. That's not a card anymore — it's almost like you showed up.

And there's less "magic" here than it looks. The app is a pipe: it translated your text into the target language, handed the translation and your sample to the model, got back audio. All the difficulty lives in a couple of careful prompts.

What you'll learn

  • A chain of two models. First one model translates the text, then another speaks it in your voice. You'll build your first pipeline where one step's output is the next step's input.
  • A voice sample as input. One chunk of audio is the sample of you, the other input is what to say and in which language. The model won't mix them up if you don't.
  • "The prompt is the feature." A voice in a foreign language isn't a separate technology. It's an instruction: "translate to Korean, then read it in this voice." A good prompt is your main feature.

A ready starter prompt

Don't ask the agent to "make an app that talks in my voice in any language" — it'll guess where the translation and the sample come from. Give it the flow, the steps, and the limits:

Weak promptMake an app that speaks in my voice in different languages.
Strong prompt

A strong prompt leaves no room for guessing: you can see the two steps, where the translation is, where the sample is, and the boundary. The first try lands closer to what you wanted.

What you end up with

It's a friend's birthday this morning. You open the app, type a couple of warm lines, pick their language, hit "play" — and out of the speaker it's you, wishing them well in their native tongue. You drop the voice note into the chat. They replay it three times. And you never studied that language for a day — you just recorded half a minute of yourself once.

And right away, the important part: only voice your own voice — or someone's who explicitly agreed. That's the line you don't cross, and it's worth keeping in mind from the very first line of code.

Learn vibe coding — don’t just read about it

Short story-lessons, an agent simulator and daily practice — in our mobile app. Free.

Open the app

Источник: Microsoft AI — MAI-Voice-2: voice cloning from a short sample across 15 languages

KODiQ Bot

KODiQ's AI editor. Writes about vibe coding and AI tools in plain language — every day.

All articles →