Claude Code agents, wired up for Flutter
For most of my career the accepted shape of “programmer tools” was: you type, the editor helps a little, the compiler yells a lot. The loop was human-driven and fast because the compiler was fast.
Claude Code changes the shape. The AI isn’t in the gutter of my editor autocompleting names — it’s running the toolchain. It reads the repo, runs flutter analyze, picks a failing test, reads the test, reads the code under test, proposes a patch, applies it, reruns. That’s not autocomplete. That’s a teammate.
Where this broke my Flutter workflow (in a good way)
The first thing I noticed was that the agent is boring-good at the boring parts. Adding a new screen, wiring routing, plumbing a model through a repository into a view — the kind of work that used to eat a Tuesday morning now happens while I’m refilling coffee.
The second thing I noticed was the opposite: on genuinely tricky things the agent is polite about uncertainty. It writes a test, runs it, reads the output, comes back and says “the test proves the bug is in the serializer, not the widget; I think the fix is here — want me to try?” That’s a better conversation than I had with myself before.
The recipe format
Everything gets better when you give the agent shape. I keep a .claude/skills/ directory in every Flutter repo with short, imperative markdown files — each one a “recipe” for a specific SDLC step. One for adding a feature. One for fixing a bug. One for shipping a release. Each recipe tells the agent which commands to run, in what order, and what counts as “done.”
A fix recipe might look like: (1) reproduce the bug with a failing test; (2) make the test pass with the smallest patch; (3) run the full suite; (4) run the linter; (5) commit with a Conventional Commits message. The agent follows that flow and I get to review a pull request shaped exactly how I’d shape it myself.
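Spelled out as a file, that flow might look like this. The file name and exact wording are my own convention, not anything Claude Code mandates:

```markdown
# Fix a bug

1. Reproduce the bug with a failing test before touching any code.
2. Make that test pass with the smallest patch you can.
3. Run the full suite: `flutter test`.
4. Run the linter: `flutter analyze`. Zero issues allowed.
5. Commit with a Conventional Commits message, e.g. `fix: guard against null route args`.

Done means: steps 3 and 4 are clean, and the new test fails when the patch is reverted.
```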
The trick is that the recipes are short. Long recipes rot. If a recipe has more than a dozen bullets I split it in half.
What I’ve stopped doing
I used to keep a running TODO list in my head for every feature. Now I dispatch the agent and do something else. When it comes back I read the diff.
I used to context-switch between a dozen tabs. Now the agent reads the docs for me and paraphrases. I spot-check when the answer matters.
I used to write scaffolding code. Now I write the contract — interfaces, schemas, test names — and the agent fills in the implementation that satisfies it.
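As a sketch of what “writing the contract” looks like in Dart: every name below (SessionRepository, Session) is made up for illustration, and the agent supplies the concrete implementation plus the test bodies.

```dart
// The contract: an interface plus the test names that pin down behavior.
// All names here are hypothetical; the agent fills in the implementation.

abstract class SessionRepository {
  Future<Session?> find(String id);
  Future<void> save(Session session);
}

class Session {
  Session({required this.id, required this.startedAt});
  final String id;
  final DateTime startedAt;
}

// The test names double as the spec; the bodies are left for the agent:
// test('find returns null for an unknown id', ...)
// test('save then find round-trips the session', ...)
```

The point isn't the specific repository; it's that interfaces, models, and test names are cheap for me to write and precise enough for the agent to build against.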
I still write the hard parts by hand. The parts where I need to think through a subtle invariant or a tricky cache. The AI isn’t better at those yet. But it’s freed up so much of the easy-work time that I can actually think about the hard parts instead of being numb from having fought through a thousand tiny ceremonies first.
What I’m still figuring out
The interesting failure mode isn’t “AI writes bad code.” It’s “AI writes fine code for the wrong question.” You give it a bug report that says “the button doesn’t work” and it adds a disabled-state check, when the real issue is that a prior refactor made the tap event handler unreachable. The fix works for the reported symptom and hides the underlying problem.
The antidote is being honest about what the bug actually is. Which is good practice anyway. It just becomes more expensive to skip.
If you want to try this
Put this in your repo and go: a .claude/skills/ folder with fix.md, feature.md, ship.md. Each one short. Each one telling the agent what “done” means for that kind of task. Trust your linter and your test suite — they’re the agent’s feedback signal, and if they’re weak the agent will drift. The better your tools, the better your agent.
Flutter happens to have great tools. flutter analyze, flutter test, dart format, dart fix. Plug those into the agent and the quality bar is self-enforcing.
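One way to wire those in (again, my own convention, not a Claude Code feature) is a single “done” gate at the bottom of every recipe, so the agent checks itself with the same commands I would:

```markdown
## Done means
- `dart fix --apply` and `dart format .` leave no pending changes
- `flutter analyze` reports no issues
- `flutter test` passes with nothing skipped
```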
The actual change for me wasn’t about code. It was about what I do all day. Less typing. More thinking. More shipping.