Draft

Notes on Coding Agents

I've been using coding agents (like Claude Code and Cursor) extensively for the past few months and wanted to share some notes.

(Unexpected) Great Use Cases

Let the agent create minimal reproductions of a (suspected) bug in an open source project + open an issue with detailed instructions and context
Let the agent help maintain forks (e.g. by rebasing the Git history)
Let the agent verify bug reports and turn bug reports into test cases + fixes

Testing is everything
- Regression testing beyond correctness
  - Performance
- Architecting/structuring tests will be as important as the code itself. Tests need to be maintainable over time and should be as orthogonal as possible.
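A minimal sketch of what a performance regression check could look like, assuming a stored baseline timing and a 20% tolerance (both numbers, and the helper names `best_of` and `check_regression`, are made up for illustration):

```python
import time


def best_of(fn, repeats=5):
    """Return the fastest wall-clock time (in seconds) over `repeats` runs of fn."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    # Taking the minimum reduces noise from other processes on the machine.
    return min(timings)


def check_regression(measured, baseline, tolerance=0.2):
    """True if `measured` is at most (1 + tolerance) times the baseline."""
    return measured <= baseline * (1 + tolerance)


if __name__ == "__main__":
    # Hypothetical workload and baseline; in practice the baseline would be
    # recorded from a known-good commit and stored next to the tests.
    elapsed = best_of(lambda: sorted(range(100_000), reverse=True))
    print("within budget:", check_regression(elapsed, baseline=0.05))
```

The point is that the agent gets a pass/fail signal for performance the same way it does for correctness, so it can't silently trade one for the other.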
Do things that compound
- https://every.to/c/compounding-engineering
I'm pretty sure containers will play a bigger role again to containerize/isolate development environments for agents
We'll probably need to figure out a better version control story to collaborate with agents (Dagger could be an interesting building block)
We might also benefit from better code review/diffing tools
There should be an improved Markdown variant
- Basically what TypeScript was to JavaScript
- Ideas:
  - Symbolic referencing (jump to definition, refactoring, etc.)
Principled engineering is more important than ever to make sure the agent doesn't do something stupid
The boundaries between local dev, CI and prod will blur
Some aspects are still very scary (e.g. security/correctness/performance implications)
If something is hard to do, instead of throwing spaghetti at the wall, tell the agent to write a script to iterate on it until it works
API design: It's a cool pattern to ask the AI how it wishes the API had looked. This often results in a more intuitive/elegant API.

Coding agents so far "can't see"
They are only trained on Tailwind web stuff, not on historical application UIs (e.g. iTunes, Windows 95, etc.)

Examples
- Generating debugging / visualization UIs
- Running little analytics experiments (e.g. how did lines of code change over time?)
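As a rough sketch of the "lines of code over time" experiment: the script below sums the added/deleted counts from `git log --numstat` into a running total per commit. The `@` date marker and the function names are my own choices for illustration, and net added-minus-deleted lines is only an approximation of real LOC.

```python
import subprocess


def parse_numstat(log_text):
    """Turn `git log --numstat` output into [(date, cumulative_net_lines), ...]."""
    history = []
    total = 0
    for line in log_text.splitlines():
        if line.startswith("@"):
            # Commit header line, written by our --pretty format as "@<date>".
            history.append([line[1:], total])
        elif line.strip() and history:
            added, deleted, _path = line.split("\t", 2)
            if added == "-":
                # Binary files are reported as "-\t-\t<path>"; skip them.
                continue
            total += int(added) - int(deleted)
            history[-1][1] = total
    return [tuple(entry) for entry in history]


def loc_history(repo="."):
    """Run git in `repo` and return the net line count after each commit."""
    out = subprocess.run(
        ["git", "-C", repo, "log", "--reverse", "--numstat",
         "--date=short", "--pretty=format:@%ad"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_numstat(out)


if __name__ == "__main__":
    for date, lines in loc_history():
        print(date, lines)
```

This is the kind of throwaway script I'd have the agent write and then extend (e.g. per directory or per file type).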
Other ideas
- Collecting data about own workflows
  - e.g. what leads mostly to merge conflicts?
  - what pollutes the AI context unnecessarily?

Workflow idea: Drawing machine architecture diagrams (e.g. in TLDraw or ASCII art) to express intent of workflow

Use cases
- Strategic debugging and root cause analysis (tree of possible root causes)
Hierarchical agent architecture
- Fan out and fan in
Poor man's version:
- Use cursor-terminal "cd path/to/project && claude 'solve problem XYZ'" to create n new terminals with a new agent each
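A scripted variant of the same fan-out/fan-in idea could look roughly like this. It's a sketch under assumptions: `run_agent` just mirrors the command above, and it only makes sense if the agent CLI can run non-interactively and exit on its own (check your agent's docs). The worker is injectable so the orchestration can be tried with any function.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def run_agent(task, project="path/to/project"):
    """Run one agent on one task and return its output.

    Assumption: the CLI accepts the task as an argument and terminates by
    itself; an interactive session would block here.
    """
    result = subprocess.run(
        ["claude", task], cwd=project, capture_output=True, text=True
    )
    return result.stdout


def fan_out_fan_in(tasks, worker=run_agent, max_workers=4):
    """Fan out: one worker per task in parallel. Fan in: collect results by task."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map returns results in the same order as `tasks`.
        results = list(pool.map(worker, tasks))
    return dict(zip(tasks, results))


if __name__ == "__main__":
    # Dry run with a stand-in worker instead of a real agent.
    print(fan_out_fan_in(["solve problem X", "solve problem Y"], worker=str.upper))
```

A parent agent (or a human) can then read the collected results and decide what to merge, which is the "fan in" half.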

How to let the agent use/work with long-running processes?
- I've been looking into process-compose but have been running into some issues
- Related aspects:
  - Long-running processes
  - Port clashes -> force explicit ports and coordinate on per-machine level (e.g. in ~/.config/ports)
  - Restarting
  - Dependencies
  - Logs -> canonical log files
- Maybe also use docker compose to run the agent in a container?
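For the port-clash point, here is a minimal sketch of a per-machine registry. The file location comes from the note above; storing it as a JSON map from service name to port, the `claim_port` name, and the starting port 3000 are all assumptions for illustration. There is no file locking, so two processes claiming at the same moment could still race.

```python
import json
from pathlib import Path

DEFAULT_REGISTRY = Path.home() / ".config" / "ports"


def claim_port(service, registry=DEFAULT_REGISTRY, start=3000):
    """Return a stable port for `service`, recording it in the registry file."""
    registry = Path(registry)
    ports = json.loads(registry.read_text()) if registry.exists() else {}
    if service not in ports:
        taken = set(ports.values())
        port = start
        # Pick the lowest port at or above `start` that no service has claimed.
        while port in taken:
            port += 1
        ports[service] = port
        registry.parent.mkdir(parents=True, exist_ok=True)
        registry.write_text(json.dumps(ports, indent=2))
    return ports[service]


if __name__ == "__main__":
    print(claim_port("web"), claim_port("api"))
```

Every agent (and every worktree) would then ask the registry for its port instead of guessing, which keeps the port explicit and repeatable across restarts.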
How to make sure the agent strictly (!) follows the rules (e.g. CLAUDE.md)?
How to let the agent "see" so it can actually confidently do pixel-perfect UI work?
How to let the agent use OpenTelemetry (OTel) traces?
Agents tend to make things more complicated than they need to be
They create a lot of stuff
- Needs to be cleaned up
Agents hitting walls and going in circles