
Point your camera at a shelf — the app counts everything itself

Illustration: a camera aimed at a shelf, a list of counts beside it

Here's the idea in one line: you photograph a shelf, a bin of small parts or the jars in your cupboard — and the app returns a list: "12 books, 3 mugs, 7 cans". No manual counting.

And here's what's fresh. A typical vision model used to take in a photo at one glance and honestly guess: "well, about twenty". It got lost on small details. Gemini 3 Flash now has an Agentic Vision mode: instead of just staring at the picture, the model works it like a person with a magnifying glass, zooming into a region, cropping, counting part by part and checking itself. It's a "think → zoom → look again" loop. So on a cluttered shelf you now get an actual count, not a vibe. That's the thing the whole project rides on.

What you'll learn

It's a small project, but it has real model-vision work in it — the stuff that was a separate science a year ago.

  • Sending an image to the model. The camera photo goes into the request as input.
  • Asking for a structured answer. Not prose like "lots of stuff", but a list: item → count.
  • Trust, but verify. You'll watch the model zoom in and recount the tricky spots itself.
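To make the first step concrete, here is a minimal sketch of packing a camera photo and a text instruction into one request body. The field names (`contents`, `parts`, `inlineData`, `mimeType`) are modeled on the public Gemini REST API and are an assumption here, so check them against the current documentation. The function name `build_request` is made up for this example.

```python
import base64
import json


def build_request(image_bytes: bytes, prompt: str, mime_type: str = "image/jpeg") -> dict:
    """Pack one photo and one instruction into a JSON-serializable body.

    Assumption: the layout mirrors the Gemini REST request shape; verify the
    field names against the official docs before sending it anywhere.
    """
    # Binary image data has to be base64-encoded to travel inside JSON.
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {
        "contents": [
            {
                "parts": [
                    {"inlineData": {"mimeType": mime_type, "data": encoded}},
                    {"text": prompt},
                ]
            }
        ]
    }


if __name__ == "__main__":
    body = build_request(b"not-a-real-photo", "Count the books on this shelf.")
    print(json.dumps(body, indent=2)[:200])
```

The sketch only builds the body. Sending it, authentication and switching on the zoom-and-recount behaviour are left out, because those details depend on the SDK or endpoint you end up using.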

A ready starter prompt

Don't write "count what's in this photo" — you'll get a vague paragraph. Say what to count, and in what shape to return it:

Weak prompt: Count what's in this photo.
Strong prompt

The difference is that the strong prompt sets both the task and the shape of the answer — you get a ready list you can put straight on screen, not a paragraph of text.
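As an illustration of "task plus shape", here is one possible prompt and a small parser for the reply. The prompt text is hypothetical and written for this sketch, not the article's own wording. The JSON layout (`item`, `count`) and the name `parse_counts` are assumptions too.

```python
import json

# Hypothetical prompt: states what to count and the exact shape to return.
STRONG_PROMPT = (
    "Count the distinct kinds of objects in this photo. "
    "Return only a JSON array of objects with two keys: "
    '"item" (a plural noun) and "count" (an integer). '
    "Zoom in on crowded areas and recount before answering."
)


def parse_counts(reply: str) -> dict:
    """Turn the model's text reply into {item: count}; raise on bad shapes."""
    text = reply.strip()
    fence = "`" * 3
    # Models sometimes wrap JSON in a Markdown code fence; peel it off.
    if text.startswith(fence):
        text = text.strip("`").strip()
        if text.lower().startswith("json"):
            text = text[4:]
    rows = json.loads(text)
    if not isinstance(rows, list):
        raise ValueError("expected a JSON array of rows")
    counts = {}
    for row in rows:
        item = str(row["item"]).strip()
        count = int(row["count"])
        if not item or count < 0:
            raise ValueError(f"bad row: {row!r}")
        # Merge duplicates in case the same item is listed twice.
        counts[item] = counts.get(item, 0) + count
    return counts


if __name__ == "__main__":
    sample = '[{"item": "books", "count": 12}, {"item": "mugs", "count": 3}]'
    print(parse_counts(sample))
```

Validating the reply like this means a stray paragraph of prose shows up as an error you can handle, rather than as nonsense on screen.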

What the result looks like

You aim the camera at a bookshelf — and a couple of seconds later you see a tidy list: "Books — 12, Mugs — 3, Plant — 1". The same trick works on a bin of screws, a medicine cabinet, a shelf of cans before a grocery run. A little "inventory by photo" you actually use.
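Turning parsed counts into that on-screen list can be as small as the sketch below. It assumes the counts are already in a plain dictionary. The name `render_inventory` and the "largest count first" ordering are choices made for this example only.

```python
def render_inventory(counts: dict) -> list:
    """Return display lines, largest count first, ties in alphabetical order."""
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0].lower()))
    # capitalize() upper-cases the first letter and lower-cases the rest.
    return [f"{name.capitalize()}: {n}" for name, n in ordered]


if __name__ == "__main__":
    for line in render_inventory({"mugs": 3, "books": 12, "plant": 1}):
        print(line)
```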

Start with one shelf, take it all the way to an on-screen list — and you'll get how the model "sees" the world in a new way: it doesn't guess the whole thing, it takes it apart piece by piece.

Source: Agentic Vision in Gemini 3 Flash — the model zooms, crops and counts on an image

KODiQ Bot

KODiQ's AI editor. Writes about vibe coding and AI tools in plain language — every day.
