One prompt, a whole app: five agents on one task — and the best didn't win
Here's what we set up. We took five popular coding agents and gave them all one brief: “build a simple habit tracker with a daily streak.” Then we stayed quiet. No hand-holding beyond the first prompt. The stopwatch was running.
We weren't hunting for a winner. We wanted to see something else: where each one shines, where it stumbles, and how often you'd have to step in yourself to keep things on track.
How we scored it
We measured every run on the same four axes — that keeps the comparison honest:
- Time to first working version — how long until something actually ran.
- Code quality — readable, reasonable structure, no obvious foot-guns.
- Self-recovery — did it notice and fix its own errors?
- Interventions — how many times we had to correct course by hand.
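If you want to run the same comparison on your own agents, the four axes fit in one small record per run. What follows is a minimal sketch, not the tooling we used: the field names, the 1-to-5 quality scale, and the idea of counting errors hit versus errors self-fixed are all assumptions made for illustration, and the two sample runs are made-up placeholders, not our results.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RunScore:
    """One agent run on the four axes. All field names are hypothetical."""
    agent: str
    seconds_to_first_working_version: float
    code_quality: int        # assumed scale: 1 (throwaway) to 5 (we'd keep it)
    errors_hit: int          # errors the agent ran into during the run
    errors_self_fixed: int   # errors it noticed and fixed without our help
    interventions: int       # times we had to correct course by hand

    def __post_init__(self):
        if not 1 <= self.code_quality <= 5:
            raise ValueError("code_quality must be between 1 and 5")
        if self.errors_self_fixed > self.errors_hit:
            raise ValueError("cannot self-fix more errors than were hit")

    @property
    def self_recovery_rate(self) -> float:
        """Share of errors the agent fixed on its own (1.0 if it hit none)."""
        if self.errors_hit == 0:
            return 1.0
        return self.errors_self_fixed / self.errors_hit


def fastest(runs):
    """Run with the shortest time to a first working version."""
    return min(runs, key=lambda r: r.seconds_to_first_working_version)


def fewest_interventions(runs):
    """Run that needed the least manual course correction."""
    return min(runs, key=lambda r: r.interventions)


# Placeholder values for illustration only, not measurements from our test.
runs = [
    RunScore("agent_a", 170.0, code_quality=2, errors_hit=4,
             errors_self_fixed=1, interventions=3),
    RunScore("agent_b", 540.0, code_quality=4, errors_hit=2,
             errors_self_fixed=2, interventions=0),
]

print(fastest(runs).agent)
print(fewest_interventions(runs).agent)
```

Keeping the axes as separate fields, rather than folding them into one total, is deliberate: a single number would hide the trade-off between speed and quality.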
And here's where it got interesting. The spread was wider than we expected. The fastest agent shipped a running app in under three minutes. The most thorough one took longer — but wrote code we'd actually keep. Speed and quality almost never showed up in the same run.
The best agent wasn't the one that wrote the most code — it was the one that asked the right clarifying question before writing any.
What it means for you
You don't need the “best” agent. You need the one whose habits match yours. If you like to check every step, grab a slower, more explicit agent — it saves you cleanup later. If you want a fast first draft to push against, raw speed wins.
But one thing held across all five: the more sharply you state the task, the sharper the answer. The tool mattered less than the prompt. So don't level up the agent. Level up how well you explain.