Johannes Schickling
Draft

Notes on Coding Agents

I've been using coding agents (like Claude Code/Cursor) very extensively for the past few months and wanted to share some notes.

(Unexpected) Great Use Cases

Open source workflows

  • Let the agent create minimal reproductions of a (suspected) bug in an open source project + open an issue with detailed instructions and context
  • Let the agent help maintain forks (e.g. by rebasing the Git history)
  • Let the agent verify bug reports and turn bug reports into test cases + fixes

Product management

  • Let the agent maintain/create GitHub issues

Development workflows

  • Investigate bugs / issues and create a detailed report

Other thoughts

  • Testing is everything
    • Regression testing beyond correctness
      • Performance
    • Architecting/structuring tests will be as important as the code itself. Tests need to be maintainable over time and should be as orthogonal as possible.
  • Do things that compound
  • I'm pretty sure containers will play a bigger role again to containerize/isolate development environments for agents
  • We'll probably need to figure out a better version control story to collaborate with agents (Dagger could be an interesting building block)
  • We might also benefit from better code review/diffing tools
  • There should be an improved Markdown
    • Basically what TypeScript was to JavaScript
    • Ideas:
      • Symbolic referencing (jump to definition, refactoring, etc.)
  • Principled engineering is more important than ever to make sure the agent doesn't do something stupid
  • The boundaries between local dev, CI and prod will blur
  • Some aspects are still very scary (e.g. security/correctness/performance implications)
  • If something is hard to do, instead of throwing spaghetti at the wall, tell the agent to write a script to iterate on it until it works
  • API design: It's a cool pattern to ask the AI how it wishes the API had looked. This often results in a more intuitive/elegant API.
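
The "regression testing beyond correctness" point above could be sketched roughly like this in Python. This is only an illustration: `parse_items` is a hypothetical function under test and the time budget is an arbitrary assumption, not a recommendation.

```python
import time


def parse_items(raw: str) -> list[int]:
    """Hypothetical function under test (illustrative only)."""
    return [int(part) for part in raw.split(",") if part]


def best_of(fn, *args, repeats: int = 5) -> float:
    """Return the fastest wall-clock run in seconds (less noisy than the mean)."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn(*args)
        timings.append(time.perf_counter() - start)
    return min(timings)


def check_performance(budget_seconds: float = 0.5) -> bool:
    """Check correctness first, then compare the timing against an assumed budget."""
    raw = ",".join(str(i) for i in range(10_000))
    assert parse_items(raw)[:3] == [0, 1, 2]
    return best_of(parse_items, raw) <= budget_seconds
```

A check like this can sit next to the normal test suite, so an agent that makes the code slower gets the same kind of red signal as one that breaks behavior. The budget would need tuning per machine or CI runner.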

UI work

  • Coding agents so far "can't see"
  • Are only trained on Tailwind web stuff, not on historical application UIs (e.g. iTunes, Windows 95, etc.)

Perspective: What can we afford doing now that wouldn't be viable before agents?

  • Examples
  • Other ideas
    • Collecting data about own workflows
      • e.g. what leads mostly to merge conflicts?
      • what pollutes the AI context unnecessarily?

Tools I tried

  • Claude Code
  • Cursor
  • OpenAI Codex
    • Limitations:
      • Can't resume previous conversations
        • Easy to accidentally press Ctrl+C to cancel the conversation and then you can't resume it
      • CLI very buggy
      • Can't see how much of the current usage allowance is left
    • Observations:
      • Very strong results so far. Even better than Opus with Claude Code.
  • Conductor
    • Only works on local machine
  • Catnip
    • Benefits:
      • Portable (even works from a web browser, including mobile web)
      • Isolated in containers
        • Can even set resource usage per container
    • Limitations:
      • Docker setup: SSH tunnel / Nix / authed tools (gh, ...)
      • Home CLAUDE.md file

Building machines

  • Workflow idea: Drawing machine architecture diagrams (e.g. in TLDraw or ASCII art) to express intent of workflow

Building custom agent workflow systems

  • Use cases
    • Strategic debugging and root cause analysis (tree of possible root causes)
  • Hierarchical agent architecture
    • Fan out and fan in
  • Poor man's version:
    • Use cursor-terminal "cd path/to/project && claude 'solve problem XYZ'" to create n new terminals with a new agent each
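
The fan-out/fan-in idea and the "poor man's version" above could be sketched in Python as follows. The function names (`build_command`, `run_in_shell`, `fan_out_fan_in`) are hypothetical, and the sketch assumes the agent command runs non-interactively and prints its result to stdout, which may not hold for every agent CLI.

```python
import shlex
import subprocess
from concurrent.futures import ThreadPoolExecutor


def build_command(project_dir: str, prompt: str) -> str:
    """Build the shell command from the notes for a single agent."""
    return f"cd {shlex.quote(project_dir)} && claude {shlex.quote(prompt)}"


def run_in_shell(command: str) -> str:
    """Default worker: run the command in a shell and return its stdout."""
    done = subprocess.run(command, shell=True, capture_output=True, text=True)
    return done.stdout


def fan_out_fan_in(commands, worker=run_in_shell, max_workers: int = 4):
    """Fan out: one worker call per command. Fan in: results in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(worker, commands))
```

Passing the worker as a parameter keeps the orchestration testable without launching real agents; a parent agent (or a plain script) could then summarize or rank the collected outputs.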

Open questions / challenges

  • How to let the agent use/work with long-running processes?
    • I've been looking into process-compose but have been running into some issues
    • Related aspects:
      • Long-running processes
      • Port clashes -> force explicit ports and coordinate on per-machine level (e.g. in ~/.config/ports)
      • Restarting
      • Dependencies
      • Logs -> canonical log files
    • Maybe also use docker compose to run the agent in a container?
  • How to make sure the agent strictly (!) follows the rules (e.g. CLAUDE.md)?
  • How to let the agent "see" so it can actually confidently do pixel-perfect UI work?
  • How to let the agent use Otel traces?
  • Agents tend to make things more complicated than they need to be
  • They create a lot of stuff
    • Needs to be cleaned up
  • Agents hitting walls and going in circles
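
For the port-clash point above, a per-machine registry could look roughly like this in Python. This is a sketch under assumptions: the JSON file format, the `claim_port` name, and the starting port are all made up for illustration, and there is no file locking, so concurrent agents would need extra coordination.

```python
import json
from pathlib import Path


def claim_port(service: str, registry: Path, first_port: int = 3000) -> int:
    """Return the port reserved for `service`, reserving the next free one if needed."""
    # Assumed format: a JSON object mapping service name to port number.
    ports = json.loads(registry.read_text()) if registry.exists() else {}
    if service not in ports:
        taken = set(ports.values())
        port = first_port
        while port in taken:
            port += 1
        ports[service] = port
        registry.parent.mkdir(parents=True, exist_ok=True)
        registry.write_text(json.dumps(ports, indent=2))
    return ports[service]
```

Usage could be as simple as `claim_port("web", Path.home() / ".config" / "ports")`, with each agent told to ask the registry for its ports instead of picking one itself.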

Good resources