Describe a scene in words — get a short clip with sound

Here's a weekend idea: write one line — "a ginger kitten in a knitted sweater looks into the camera, soft light" — and a minute later you've got a finished 8-second clip. Vertical, made for Stories. And with sound.
Not an image. A real video you can send to a friend.
Why this just became possible
A year ago, "make a video from text" meant a studio, editing, and a week of fuss. Turning a line of text into an image was already possible; turning it into a moving frame with sound was not.
Now it is. Veo 3.1 in the Gemini API turns text into an 8-second clip: vertical from the start (9:16, not cropped from landscape), up to 4K, and with sound. The model adds voice, ambient noise, and music to match what's on screen. It takes one request, on the same Gemini key you already used for images.
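Here is a rough sketch of what that one request might look like with the google-genai Python SDK. The model id, the `submit_clip` name, and the `CLIP_SETTINGS` dict are assumptions for illustration, so check the model id against the current Veo docs before relying on it.

```python
from typing import Optional

# Ask for a vertical frame up front instead of cropping a wide one later.
CLIP_SETTINGS = {"aspect_ratio": "9:16"}

# Assumed model id; the exact name may differ in the current docs.
VEO_MODEL = "veo-3.1-generate-preview"


def submit_clip(prompt: str, settings: Optional[dict] = None):
    """Send one text-to-video request and return the pending operation."""
    # Imported lazily so this file still loads without the SDK installed.
    from google import genai
    from google.genai import types

    client = genai.Client()  # picks up the API key from the environment
    return client.models.generate_videos(
        model=VEO_MODEL,
        prompt=prompt,
        config=types.GenerateVideosConfig(**(settings or CLIP_SETTINGS)),
    )
```

The call returns right away with an operation object, not with the video itself; the clip arrives later.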
What you'll learn
- A video prompt isn't an image prompt. You add motion and sound to the scene: what moves or turns in the frame, how the camera behaves, what plays in the background. That's a new lever an image doesn't have.
- A long API call. Here the request returns not text or an image but an .mp4 — and you have to wait for it. You'll learn to submit a job and pick up the result when it's ready.
- Format for the platform. 9:16 vertical from the start — not a slice of a wide frame. A small thing that matters a lot if the clip is headed for Reels or Stories.
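The "submit a job, pick it up later" part can be sketched as a small polling loop. This is a generic helper, not part of any SDK: `wait_until_done` and its parameters are hypothetical names, and the interval and timeout values are guesses to tune.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def wait_until_done(
    job: T,
    refresh: Callable[[T], T],
    is_done: Callable[[T], bool],
    interval: float = 10.0,
    timeout: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Poll a long-running job until it finishes or the timeout passes."""
    waited = 0.0
    while not is_done(job):
        if waited >= timeout:
            raise TimeoutError(f"job not finished after {waited:.0f}s")
        sleep(interval)
        waited += interval
        job = refresh(job)  # ask the server for the latest state
    return job
```

With the google-genai SDK this would probably be called as `wait_until_done(operation, client.operations.get, lambda op: bool(op.done))`, after which the finished operation carries the generated video to download.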
A ready starter prompt
Don't tell the agent "make a video about a cat": the model will have to guess the motion, the sound, and the format. Describe the frame, the motion, the camera, and the audio instead.
Weak prompt: Make a video about a cat.
A strong prompt sets everything the model would otherwise invent: the frame, the motion, the camera move, the sound, the format. That is exactly why the clip comes out, on the first try, as the thing you pictured.
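One way to keep yourself honest about those parts is to assemble the prompt from named fields. The sketch below is only an illustration: `ScenePrompt` is a hypothetical helper, and the motion and camera wording in the example is invented to fill the slots (the kitten, the light, and the purring come from the scene above).

```python
from dataclasses import dataclass


@dataclass
class ScenePrompt:
    frame: str   # who or what is in the shot, and the light
    motion: str  # what moves or turns
    camera: str  # how the camera behaves
    audio: str   # what we hear
    fmt: str = "vertical 9:16, 8 seconds"

    def build(self) -> str:
        """Join the parts into one prompt line, one sentence per part."""
        parts = [
            self.frame,
            self.motion,
            f"Camera: {self.camera}",
            f"Audio: {self.audio}",
            f"Format: {self.fmt}",
        ]
        return ". ".join(p.strip().rstrip(".") for p in parts) + "."


kitten = ScenePrompt(
    frame="A ginger kitten in a knitted sweater, soft light",
    motion="The kitten slowly turns its head and looks into the camera",
    camera="slow push-in, shallow depth of field",  # illustrative wording
    audio="quiet purring and warm music",
)
print(kitten.build())
```

If a field is hard to fill in, that is usually the part the model would have guessed for you.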
What you end up with
8 seconds of vertical video: the kitten looks into the camera over quiet purring and warm music. A ready-made birthday greeting for a friend — or a short ad for your hobby you wouldn't be embarrassed to post.
The magic is on the outside. Inside it's one line you assembled from parts: frame, motion, sound.
The weekend plan
- Saturday: one script that goes line → request to Veo → wait → download the mp4. Run a couple of different scenes and notice how the description shapes the result.
- Sunday: a simple form — type text, hit a button, download the clip a minute later. Make yourself a greeting clip and send it.
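For Sunday, the form does not need a framework. Below is a minimal sketch using only Python's standard library; it assumes you already have a `make_clip(prompt)` function from Saturday that returns the mp4 as bytes (that function, and the names `read_prompt`, `make_handler`, and `serve`, are hypothetical).

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Callable
from urllib.parse import parse_qs

FORM_PAGE = b"""<!doctype html>
<form method="post">
  <input name="prompt" size="80" placeholder="Describe the scene">
  <button>Make clip</button>
</form>"""


def read_prompt(body: bytes) -> str:
    """Pull the 'prompt' field out of a URL-encoded form body."""
    fields = parse_qs(body.decode("utf-8"))
    return fields.get("prompt", [""])[0].strip()


def make_handler(make_clip: Callable[[str], bytes]):
    """Build a handler class around a make_clip(prompt) -> mp4 bytes function."""

    class ClipHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            self._send(200, "text/html; charset=utf-8", FORM_PAGE)

        def do_POST(self):
            length = int(self.headers.get("Content-Length", 0))
            prompt = read_prompt(self.rfile.read(length))
            if not prompt:
                self._send(400, "text/plain; charset=utf-8", b"Empty prompt")
                return
            clip = make_clip(prompt)  # blocks until the clip is ready
            self._send(200, "video/mp4", clip,
                       {"Content-Disposition": 'attachment; filename="clip.mp4"'})

        def _send(self, status, content_type, payload, extra=None):
            self.send_response(status)
            self.send_header("Content-Type", content_type)
            self.send_header("Content-Length", str(len(payload)))
            for name, value in (extra or {}).items():
                self.send_header(name, value)
            self.end_headers()
            self.wfile.write(payload)

    return ClipHandler


def serve(make_clip: Callable[[str], bytes], port: int = 8000) -> None:
    """Serve the form on localhost until interrupted."""
    HTTPServer(("127.0.0.1", port), make_handler(make_clip)).serve_forever()
```

Calling `serve(make_clip)` and opening http://127.0.0.1:8000 gives you the text box and the button. The page simply hangs for about a minute while the clip renders, which is fine for a personal weekend tool but not for anything public.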
This is the kind of project you want to show off: you didn't "draw an image," you shot a tiny film with one sentence.
Source: Generate videos with Veo in the Gemini API — Google AI for Developers





