Boss-Fight Coding¶
For systems development, gen/AI can reduce cognitive load on the parts of the process we've always known have the most impact: trackable requirements, falsifiable acceptance criteria, meaningful tests and unemotional adversarial analysis based entirely on facts. If we improve these parts of the process, it doesn't matter who writes the code; quality goes up!
Let's take a beat to think through the reality of system development:
- Hand-crafted code is often garbage, and the older and bigger it gets, the more true this is. (Human overrated)
- A spoonful of sewage in a glass of wine is a glass of sewage. (AI overrated)
- We cannot enter a P1 firefight with an AI that wrote the code — today we need context, situational awareness and understanding! (AI overrated)
- We built the wrong thing — or we built the thing no one wanted. (Human overrated)
These somewhat contradictory things can all be true because writing code is just one part of what is mostly a messy, mundane business activity.
But we can remedy many of the messy problems if we adapt the banking industry's second-line-of-defense (LoD2) regulatory process for gen/AI-based development, a process I call Boss-Fight Coding.
Context-Curator Grows Organically¶
It was never my intention for gen/AI to write all of the Context-Curator code — it snuck up on me. What started as a "how to hack Claude Code context" chat session got messy fast and I needed a way to manage it, and a PRD seemed the most logical choice. And I kept going from there until I had a working, though creaky, AI-generated prototype. But then a weird thing happened: every time I didn't like a behavior, it wasn't the code, it was the user experience! I would run another user experience session until I settled on PRD changes that fixed the issues. And I kept going.
Even though Context-Curator was working okay, I would read articles blasting someone's "vibe-coded" project and wonder: Had I fallen into a trap? It occurred to me that I should do what we do at the day job: run an LoD2 function.
Blammo! The LoD2 Adversary¶
As I imagined an LoD2 function, the first thing I wanted was a test-by-test inventory that gave me a blurb on what each test did and how well it covered the relevant acceptance criteria. To make that work, I had to reorganize the PRD so that each feature had its own code and each acceptance criterion its own sub-code, creating a through-line that makes the test inventory report easy to understand. Then I realized the LoD2 adversary should not be cooperative with the constructor at all, so it should have its own "DNA" in its own task/context.
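The through-line can be sketched as a small data model. This is a minimal, hypothetical shape, not the actual Context-Curator implementation: the ID formats (`F-RESUME`, `F-RESUME.AC-2`), field names and `uncoveredCriteria` helper are all illustrative assumptions.

```typescript
// Hypothetical through-line: a feature code owns acceptance-criterion
// sub-codes, and each test-inventory entry declares which AC it covers.
interface AcceptanceCriterion {
  acId: string; // e.g. "F-RESUME.AC-2" — feature code plus sub-code (illustrative format)
  text: string; // the falsifiable criterion itself
}

interface Feature {
  featureId: string; // e.g. "F-RESUME"
  title: string;
  criteria: AcceptanceCriterion[];
}

interface InventoryEntry {
  testId: string; // e.g. "T-RESUME-MANUAL"
  description: string;
  acClause: string; // the AC sub-code this test claims to cover
  coverageRationale: string;
  verdict: "ADEQUATE" | "INADEQUATE" | "ACCEPTED";
}

// With IDs wired this way, the adversary can mechanically flag any
// acceptance criterion that no test even claims to cover.
function uncoveredCriteria(features: Feature[], inventory: InventoryEntry[]): string[] {
  const covered = new Set(inventory.map((e) => e.acClause));
  return features
    .flatMap((f) => f.criteria)
    .map((c) => c.acId)
    .filter((acId) => !covered.has(acId));
}
```

The point of the scheme is that coverage gaps become a set-difference computation rather than a judgment call, which frees the adversary to spend its effort judging whether the claimed coverage is real.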
Once I had it working, the next logical step was a specialized Context-Curator task (/task:adversary) that would independently challenge artifacts from the PRD through the acceptance criteria, test-plans, tests and code.
Danger
In financial services, LoD1 (the constructor) reports through the CTO, and LoD2 (the adversary) reports through the CRO. This organizational separation keeps the two groups from falling into regulatory capture.
For a single-creator project like Context-Curator, this reduced rigor is fine. In the future, I plan to add the ability for Context-Curator to split these duties across branch-merge protection so it can optionally enforce the organizational separation.
Here are two example test inventory entries for Context-Curator:
| TEST_ID | DESCRIPTION | AC_CLAUSE | COVERAGE_RATIONALE | VERDICT |
|---|---|---|---|---|
| T-RESUME-MANUAL | MANUAL: After /task | T-RESUME-MANUAL | RA-002 applies. DISPOSITION: ACCEPTED. EXPIRY: v2.0-release. | ACCEPTED (RA-002) |
| git-integration.test.ts:"T-GIT-2: git status --porcelain does not list any path containing personal storage prefix after full workflow" | Verifies personal storage paths do not appear in git status after a full workflow including save-context. | T-GIT-2 | Two assertions: (1) not.toContain(personalPrefix) where personalPrefix is an absolute path like /tmp/... — git status --porcelain shows relative paths only, so this assertion is vacuous and can NEVER fail regardless of implementation behavior. (2) not.toContain('tgit-ctx') — meaningful; catches the specific case where the implementation saves the named context inside the project directory. An implementation that routes personal contexts inside the project under any name other than 'tgit-ctx' (e.g., .claude/tasks/tgit-task/contexts/other.jsonl) would pass both assertions while violating the AC. | INADEQUATE |
Remediation Loop¶
The adversary found a lot of problems the first time it ran! But the best way to remediate them was to address each one in the PRD, the acceptance criteria or the test plan, and only then regenerate the code. I now feel confident that if someone complained that Context-Curator contained crappy or stupid code, I could write upstream criteria to deal with it.
I'm now to the point where the code is basically IR (an intermediate representation). While Context-Curator is not a high-performance project like SQLite, I feel somewhat confident that, for a growing subset of projects, this process will work: it comes down to specs and tests.
Risk Acceptance¶
After several remediation cycles, I couldn't reach a completely clean LoD2 challenge. In fact, it looked like the constructor would oscillate, so I finally asked it, "Hey, why are you not able to fix the remaining issues, and why do you keep flipping your approaches to the remaining challenges?" The constructor basically came back with, "The adversary is a pedantic asshole. It makes me write tests that don't test what it thinks they test. I run the test one way and it complains. I run it the other way and it complains."
LOL! The oscillation is exactly what happens at work, and we resolve it through negotiations that result in risk acceptances. So we added the risk-acceptance concept to the project, letting us call out things that don't matter, that the adversary doesn't understand or that we can't fix now. We adopted language similar to OCC/FFIEC guidance, and it seems to work well.
Here is an example risk acceptance for the test inventory item above:
```
RA_ID: RA-002
SCOPE: T-RESUME-MANUAL
FINDING: No automated test exists for T-RESUME-MANUAL. The structural proxy test
(claude-md-system.test.ts:Test 8.5) covers the automated side (@import path setup),
but no evidence exists that the manual end-to-end resume flow has been executed
or logged.
SEVERITY: LOW
DISPOSITION: ACCEPTED
RATIONALE: T-RESUME-MANUAL tests that Claude Code's /resume command reads the @import path
correctly — this depends on Claude Code internals that cannot be automated in
integration tests. The structural proxy (Test 8.5) confirms the @import is wired
correctly. The manual step is a UX validation that requires a live Claude Code
session. Accepted pending a documented manual test run before next major release.
APPROVED_BY: jeffw
APPROVED_DATE: 2026-03-12
EXPIRY: v2.0-release
```
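One way to make such a record operational is to have the adversary suppress a finding only when a matching, accepted, unexpired risk acceptance exists. The sketch below is a hypothetical check, not Context-Curator's actual logic; the field names mirror the example record, and the `shippedReleases` parameter is an assumed way to model a release-tag expiry.

```typescript
// Minimal, hypothetical risk-acceptance record (fields mirror the RA-002 example).
interface RiskAcceptance {
  raId: string; // e.g. "RA-002"
  scope: string; // the test or AC the acceptance applies to
  severity: "LOW" | "MEDIUM" | "HIGH";
  disposition: "ACCEPTED" | "REJECTED";
  approvedBy: string;
  expiry: string; // release tag the RA lapses at, e.g. "v2.0-release"
}

// A finding is suppressed only while a matching RA is ACCEPTED and its
// expiry release has not yet shipped; afterward the finding resurfaces.
function isSuppressed(
  findingScope: string,
  ras: RiskAcceptance[],
  shippedReleases: string[]
): boolean {
  return ras.some(
    (ra) =>
      ra.scope === findingScope &&
      ra.disposition === "ACCEPTED" &&
      !shippedReleases.includes(ra.expiry)
  );
}
```

Tying expiry to a release rather than a calendar date means an accepted risk automatically comes back into scope at the milestone where it was promised to be revisited.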
So Why Boss-Fight Coding?¶
The point of the boss-fight level is to come with your A-game: a high-quality PRD, acceptance criteria and test plans. If you want your code to reach prod, you need to beat The Boss at the end of the level. Preparation is a lot of work! It's the type of work we've always known leads to better outcomes, yet where we've always under-invested. Today we can use gen/AI to reduce the cognitive load of generating these artifacts, and that's going to lead to better outcomes no matter who writes the code.
Human Context, Machine Context¶
Writing code was never the most important activity (outside of greenfield startups), and writing new code takes up a small minority of a developer's time. We need humans for sense-making, context generation and communication at all points along the systems development journey: from conception to operation. In this new process some developers might generate code as they come up with new algorithms or abstractions, but even that type of work is better with higher-quality upstream artifacts like PRDs and acceptance criteria. As we shift cognitive labor upstream, we're likely to find there is a lot more of it that we ignored or avoided in the past — and that's where we will get quality gains.
With any luck we can have something better than Workday!