Drop in a clip — the model watches it and draws the cover
Here's the idea in one line: you drop a short clip into an app — a Reels cut, a screen recording, a slice of a stream — and it hands you back a finished cover. Not a frozen frame, but a drawn image that shows what the video is about. You don't open an editor and you don't hunt for the right frame.
And here's what's new. Until late May, the model could only look at a single photo; you couldn't feed it video. In late May, Nano Banana 2 (that's Gemini 3.1 Flash Image) learned to take a video file as input. It watches the clip, works out what's happening, who the subject is and what the action is, and draws a meaningful image from that. The announcement explicitly lists thumbnails and infographics among the uses. That's the new capability this project rides on.
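Here's roughly what "video as input" looks like in code. This is a minimal sketch in Python, assuming the google-genai SDK (`pip install google-genai`) and a clip short enough to send inline with the request. The format allow-list and the 20 MB cutoff are assumptions to check against the current Gemini docs. Only `video_part` touches the SDK, and it imports it lazily.

```python
from pathlib import Path

# Hypothetical allow-list: extension -> MIME type sent to the model.
# Assumed to match formats the Gemini docs list; verify before relying on it.
VIDEO_MIME_TYPES = {
    ".mp4": "video/mp4",
    ".mov": "video/mov",
    ".webm": "video/webm",
}

# Assumption: inline requests should stay under roughly 20 MB in total.
# Bigger files would go through the File API instead.
INLINE_LIMIT_BYTES = 20 * 1024 * 1024


def load_clip(path: str) -> tuple[bytes, str]:
    """Read a short clip and return (bytes, mime_type) ready to send inline."""
    p = Path(path)
    mime = VIDEO_MIME_TYPES.get(p.suffix.lower())
    if mime is None:
        raise ValueError(f"unsupported video type: {p.suffix!r}")
    data = p.read_bytes()
    if len(data) > INLINE_LIMIT_BYTES:
        raise ValueError("clip too large to send inline; use the File API")
    return data, mime


def video_part(path: str):
    """Wrap the clip as a google-genai Part (SDK imported only when needed)."""
    from google.genai import types

    data, mime = load_clip(path)
    return types.Part.from_bytes(data=data, mime_type=mime)
```

The size check runs before anything is sent, so an oversized recording fails fast with a clear message instead of a rejected request.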
Why this one
Everyone films clips. Almost nobody makes covers for them: you need an editor, you have to catch the right frame, crop it, add text. So plenty of videos go out with a random, ugly frame as their cover. "Drop a clip, get a cover" removes all that fuss, and it's a tool you'll actually use yourself.
And there's less "magic" here than it looks. The app is a pipe: take the video, hand it to the model, get an image, show it. All the difficulty lives in one good prompt.
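The pipe can be sketched in a few functions. This is a minimal sketch, not the app's actual code: the model call is passed in as a function so the pipe can run against a fake, and `gemini_call` shows one possible real wiring through the google-genai SDK. The model id `gemini-3.1-flash-image-preview` is an assumption; check the current model list before using it.

```python
from pathlib import Path
from typing import Any, Callable

# A model call takes (video_bytes, mime_type, prompt) and returns a raw
# response. Injecting it keeps the pipe testable without a network.
ModelCall = Callable[[bytes, str, str], Any]


def extract_image(response: Any) -> bytes:
    """Return the first image payload found in a Gemini-style response."""
    for candidate in response.candidates or []:
        for part in candidate.content.parts or []:
            inline = getattr(part, "inline_data", None)
            if inline is not None and (inline.mime_type or "").startswith("image/"):
                return inline.data
    raise RuntimeError("model returned no image (maybe text only)")


def make_cover(video: bytes, mime: str, prompt: str,
               call_model: ModelCall, out: Path) -> Path:
    """The whole pipe: video in, model call, image out, saved to disk."""
    image = extract_image(call_model(video, mime, prompt))
    out.write_bytes(image)
    return out


def gemini_call(video: bytes, mime: str, prompt: str) -> Any:
    """One possible real wiring via google-genai (imported lazily)."""
    from google import genai
    from google.genai import types

    client = genai.Client()  # picks up the API key from the environment
    return client.models.generate_content(
        model="gemini-3.1-flash-image-preview",  # assumed id for Nano Banana 2
        contents=[types.Part.from_bytes(data=video, mime_type=mime), prompt],
    )
```

In the app you'd call `make_cover(video, "video/mp4", prompt, gemini_call, Path("cover.png"))`; in tests you pass a fake that returns a canned response. If the model answers with text only, `extract_image` raises instead of saving an empty file.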
What you'll learn
- Video as input. You used to send the model text or a photo. Now you send a whole clip. It's a new modality, and you'll get your hands on it.
- Asking for a meaningful frame, not a screenshot. Not "cut out a second of the video," but "watch the whole clip, get the gist, and draw a cover about it."
- "The prompt is the feature." Picking the best moment isn't a separate technology. It's an instruction to the model. A good prompt is your main function.
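To make "the prompt is the feature" concrete: picking the best moment can be nothing more than a few lines of instruction text. The wording, the 9:16 default and the exclusion list below are hypothetical examples, not a recommended template.

```python
def cover_prompt(
    aspect: str = "9:16",
    style: str = "drawn comic panel",
    avoid: tuple[str, ...] = ("watermarks", "player UI"),
) -> str:
    """Assemble the instruction that does the app's real work."""
    # "Pick the best moment" lives here, as text, not as frame-selection code.
    rules = [
        "Watch the whole clip before deciding anything.",
        "Work out who the main subject is and what the key action is.",
        "Draw a new image of that moment; do not just copy a frame.",
        "Keep the subject large and centered.",
    ]
    if avoid:
        rules.append("Do not include: " + ", ".join(avoid) + ".")
    header = f"Make a {aspect} cover for this video in the style of a {style}."
    return "\n".join([header] + [f"- {rule}" for rule in rules])
```

Tweaking the cover's look then means editing these strings, not writing new logic.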
A ready starter prompt
Don't tell the agent "make a cover generator" — it'll start guessing the format, size, and style. Give it context, an example, and limits:
Build an app that makes a cover from a video.
A strong prompt leaves no room for guessing: the flow is spelled out, it's explicit that the model must watch the whole clip, the cover format is fixed, and so is what must not appear in it. The first result lands much closer to what you wanted.
What you end up with
You filmed your cat knocking a mug off the table. You drop the 8-second clip into the app — and a couple of seconds later, there's a cover under it: that same cat mid-leap, large, centered, like a drawn comic panel. You hit "download" and put it on your Reels. You never opened an editor or scrubbed for a frame. You just dropped in the clip.
Start with one screen and take it all the way to the download button, and you'll have a tool that does in seconds what used to take half an hour in an editor.
Source: Nano Banana 2 now accepts video as input (Google Cloud Blog)





