
Point your camera at anything and ask out loud — it answers, seeing what you see

KODiQ Bot

Jun 24, 2026 · 5 min read

Illustration: a phone aimed at an object, a live voice coming out of it

Here's the idea in one line: you point your phone at anything — the breaker box in the hall, an unknown plant, a dish on a foreign menu, a board game with confusing rules — and just ask out loud, "what is this, what do I do?" And it answers, by voice, instantly, looking through the same camera you are. No snapping, no waiting, no typing.

And here's what's new. Until now, "show a photo, get an answer" worked frame by frame: take a picture, send it, wait for text. There was no live conversation with the camera. Now Gemini has a Live API: it takes a continuous stream — audio and the camera feed at once — and replies by voice in real time. And the key part: you can cut it off mid-sentence ("no, that button over there") and it picks right up. That's the new thing this project rides on.

Why this one

Life throws "what is that?" at you every day: an unfamiliar plug, a light on the car dash, a mushroom in the woods, a button on the washing machine. Googling means stopping, putting into words the thing you don't know how to name, scrolling results. Here you just show it and ask, like a friend standing next to you. You'll use this yourself, more than once.

And there's less magic here than it seems. The app is a pipe: it takes the camera and mic stream, runs it to the model, returns a voice. All the hard part lives inside one ready-made tool.
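To make the "pipe" picture concrete, here is a minimal sketch in plain Python. It makes no real Gemini calls: `fake_model` is a hypothetical stand-in for the live session, and `Chunk`, `producer` and `run_pipe` are names invented for this illustration. It only shows the shape: two inputs feed one outbound stream, and replies come back on another.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class Chunk:
    kind: str       # "audio" or "video"
    payload: bytes  # raw bytes from the mic or the camera


async def producer(kind, payloads, outbound):
    """Push chunks from one input (mic or camera) into the shared pipe."""
    for payload in payloads:
        await outbound.put(Chunk(kind, payload))
        await asyncio.sleep(0)  # yield so the other input can interleave


async def fake_model(outbound, replies, total):
    """Hypothetical stand-in for the live model: one short reply per chunk."""
    for _ in range(total):
        chunk = await outbound.get()
        await replies.put(f"heard {chunk.kind}:{len(chunk.payload)}")


async def run_pipe(audio, video):
    outbound = asyncio.Queue()  # app -> model
    replies = asyncio.Queue()   # model -> app
    total = len(audio) + len(video)
    # Both inputs and the model run concurrently, as in a live session.
    await asyncio.gather(
        producer("audio", audio, outbound),
        producer("video", video, outbound),
        fake_model(outbound, replies, total),
    )
    return [replies.get_nowait() for _ in range(total)]
```

In a real app the two producers would read from the device camera and mic, and `fake_model` would be replaced by the actual session object from whatever SDK you use.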

What you'll learn

A stream, not "request-reply." You're used to: send, wait, receive. Here the connection stays open for the whole conversation, and data flows both ways as it arrives. You'll feel how realtime works, the same thing calls and voice assistants are built on.
Several inputs at once. The model listens to the mic and watches the camera at the same time — that's multimodality in its purest form, and you'll wire it up by hand.
Interruption as part of the UI. "You can cut it off" isn't a bug, it's a feature. You'll see why a live dialog feels better than "let me finish."
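On the app side, interruption mostly comes down to one thing: when the user starts talking, throw away the reply audio you have not played yet. The sketch below is a toy version of that idea, assuming replies arrive as a queue of chunks. `ReplyPlayer` and its methods are hypothetical names for this illustration, not part of any SDK.

```python
from collections import deque


class ReplyPlayer:
    """Toy playback buffer for the model's voice reply."""

    def __init__(self):
        self._pending = deque()  # chunks received but not yet played
        self.played = []         # chunks the user has actually heard

    def enqueue(self, chunk):
        """Store a reply chunk as it streams in."""
        self._pending.append(chunk)

    def play_next(self):
        """Play one chunk; return False if nothing is waiting."""
        if not self._pending:
            return False
        self.played.append(self._pending.popleft())
        return True

    def interrupt(self):
        """User cut in: drop everything not yet played, return how many."""
        dropped = len(self._pending)
        self._pending.clear()
        return dropped
```

The design choice is that an interrupt never waits for the current answer to finish. The stale chunks are simply discarded so the next reply starts clean.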

A ready starter prompt

Don't ask the agent to "make an app that looks through the camera": it will have to guess how to keep the stream open and what character the model should play. Give it the scenario, the character, and the limits:

Weak prompt: Make an app that looks through the camera and answers by voice.

Strong prompt

A strong prompt leaves no room to guess: the model is named, both streams are spelled out, the answer's character is set, and interruption is allowed. The first result lands closer to what you wanted.

What it looks like

Point at the breaker box and ask "which switch killed the washer?" — you hear: "top right is flipped down, push it back up." Point at a menu in a café abroad — it reads it and tells you what's meat-free. Point at a plant that's wilting — "leaves yellow from overwatering, let the soil dry out." Not text on a screen, but a calm voice beside you, looking where you're looking.

A weekend plan

Grab the Live API sample in Google AI Studio — it already has a "try it live" button.
Wire the back camera and mic into it so both streams reach the model.
Set the system role from the prompt above and turn on interruption.
Test on three real things at home — the breaker box, any appliance button, a plant.
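Before testing on real objects, it can help to write the plan down as a checklist the code can verify. This is only a sketch: every field name in `SessionSettings` is hypothetical and does not match the Live API's real configuration keys, and the example system role is a placeholder, not the article's prompt.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SessionSettings:
    # All field names are hypothetical, chosen to mirror the plan's steps.
    camera: str = "back"
    send_video: bool = True
    send_audio: bool = True
    allow_interruption: bool = True
    system_role: str = ""


def missing_steps(settings):
    """Return the plan steps that the given settings have not covered yet."""
    problems = []
    if settings.camera != "back":
        problems.append("use the back camera")
    if not (settings.send_video and settings.send_audio):
        problems.append("send both camera and mic streams")
    if not settings.system_role.strip():
        problems.append("set the system role")
    if not settings.allow_interruption:
        problems.append("turn on interruption")
    return problems
```

For example, `missing_steps(SessionSettings())` reports only the missing system role, because the other defaults already follow the plan.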

One evening for the skeleton, the second for the character of the answers — so the voice stays short and calm, not a lecture.


Source: Gemini Live API — Google AI for Developers

