Boss-Fight Coding

Decades of hand-crafting code and we still ended up with Workday and Jira!

For systems development, gen/AI can reduce cognitive load on parts of the process we've always known have the most impact: trackable requirements, falsifiable acceptance criteria, meaningful tests and unemotional adversarial analysis based entirely on facts. If we improve these parts of the process, it doesn't matter who writes the code, quality goes up!

Let's take a beat to think through the reality of system development:

  • Hand-crafted code is often garbage, and the older and bigger it gets, the more true this is. (Human overrated)
  • A spoonful of sewage in a glass of wine is a glass of sewage. (AI overrated)
  • We cannot enter a P1 firefight with an AI that wrote the code — today we need context, situational awareness and understanding! (AI overrated)
  • We built the wrong thing — or we built the thing no one wanted. (Human overrated)

These somewhat contradictory things can all be true because writing code is just one part of what is mostly a messy, mundane business activity.

But we can remedy many of the messy problems if we adapt the banking industry's LoD2 regulatory process for gen/AI-based development — a process I call Boss-Fight Coding.

Context Is Everything!

For much of 2025 this was my experience of agentic gen/AI coding:

Start a task, Claude Code takes a while to warm up until it hits a sweet spot where it reliably produces consistent, good work for many turns, and then, blammo!, context collapse. At which point my only options are /clear or /compact. Arg!

Context Collapse

We know from the NoLiMa paper that longer contexts become diffuse which result in the haystack problem: earlier signals that lack lexicographical matching get buried in the mass of tokens. At some point the agent starts to go haywire — often when we're near the context window limit. This is super frustrating because it comes without warning and interrupts flow. Once this happens, I find that /clear is the only workable option. I wanted a better option than to simply start over.

Introducing Context-Curator

https://github.com/0x6a77/context-curator

Context-curator is a set of Claude Code slash commands that allow you to manage context in a task-oriented way. (You can read below how context-curator came to exist.) The general idea is that context, which includes the CLAUDE.md files, really should be task-specific so that Claude Code laser focuses on just what's needed for the task.

Now I can warm up a context and save it as part of a task, and when I encounter context collapse, I just reload the warmed-up context. Or when I return to a task, I get the warmed-up context. (If we reload a context within the cache window, we'll save tokens. This will usually be true for /resume after a collapsed context.)

Pelican Rides a Bicycle

Below are two SVG solutions to the Simon Willison's benchmark: "Generate an SVG of a pelican riding a bicycle."

Claude Sonnet 3.5:

claude-3-5-sonnet-20240620.svg

Claude-Code:

pelican-bicycle.svg

What Changed?!

As agentic gen/ai exploded in 2025, it felt like Willison's benchmark became a solvable problem if we used these new agentic powers: skills, sub-agents, tool-use and more. Using Claude-Code, I built a pipeline that now solves this problem.

The Github repo github.com/0x6a77/pelican-rides-a-bicycle let's you replicate this work and see an example of Claude-Code skills with tool-calling.

Return to ZeroDiff

After a near decade-long break, I've returned to ZeroDiff with a new perspective: it isn't just for hardware, it's a universal approach to product design and iteration. When we got clobbered in 2016 I needed a break, so I became a contractor with a focus on cloud compute, systems programming and data systems. I shaped that work into a what I thought might be a startup, Karrots, but with two kids in college I needed to give everyone a couple years break from startup stress and grind.

Eventually I want to return to robots and hardware, but because my life these days is mostly systems software at scale, I want to share some of the way that I have applied ZeroDiff to software over the years. The goal now is to run more formal experiments and write about them.