Look, I get it. You’ve got a shiny new AI agent that promises to “accelerate your development velocity” (translation: write code so you don’t have to). Congratulations. Now here’s the part nobody talks about: that thing is a liability wrapped in a transformer architecture, and you need to treat it like a biohazard until proven otherwise.
Here’s the workflow. In order.
1) Buy a new PC for every session
Before you even think about unleashing an AI agent on your codebase, you need a clean machine. Not clean like “I ran Windows Update.” Clean like “this device has never touched the internet, has never installed a package, and exists in a state of primordial computational innocence.”
This isn’t paranoia. This is recognizing that every previous session leaves ghosts: stray pip packages, environment variables polluting the namespace, leftover processes drinking memory. Run the agent in that mess and you get non-deterministic behavior. You want to know if the agent broke something? You need a baseline. A fresh PC is your baseline.
Will this tank your budget? Absolutely. Is it worth it? Ask yourself: how much is it worth to know exactly what your code does?
2) Air-gap the development machine from the network
Unplug the ethernet. Turn off WiFi. Go full Faraday cage if you’re feeling theatrical (you should be).
Why? Because an AI agent with internet access is an AI agent that can install arbitrary dependencies, call home to telemetry servers, or—and let’s be honest—do things you didn’t explicitly ask it to do. It can’t exfiltrate your data if there’s no network. It can’t surprise you with a midnight API call if the cables are all disconnected.
Plus, no internet means the agent can’t download the latest version of some npm package that got compromised last Tuesday. Your supply chain is as trustworthy as your local filesystem.
3) Document every prompt you fed it
Keep a log. Every. Single. One. Write down the system message, the context window dump, the user prompts, the intermediate questions you asked, all of it.
This isn’t busywork. When the agent does something weird—and it will—you need the full input state to understand why. “The model just decided to rewrite my entire build system” is not a bug report. “The model was given 50KB of malformed Makefile as context and hallucinated a solution” is actionable.
Also, someday, an auditor will ask: “How did this code get written?” You’ll have an airtight answer backed by timestamped evidence. That’s worth something.
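A minimal sketch of that log, assuming nothing fancier than an append-only file. The `log_prompt` helper and the roles are illustrative names, not part of any agent's actual CLI:

```shell
#!/bin/sh
# Append timestamped, labeled prompt records to an append-only session log.
# LOGFILE and log_prompt are illustrative; adapt to whatever the agent emits.
LOGFILE=$(mktemp)   # in real life: a file you archive with the session

log_prompt() {
  # $1 = role (system|context|user), $2 = the prompt text
  printf '%s\t%s\t%s\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$1" "$2" >> "$LOGFILE"
}

log_prompt system "You are a careful coding agent."
log_prompt user "Refactor the build system. Touch nothing else."

cat "$LOGFILE"
```

Tab-separated and timestamped, so when the auditor shows up you can grep it instead of reconstructing the session from memory.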
4) Never run it unsupervised
This is the non-negotiable one. Keep your eyes on the terminal. Have your hand near the kill switch. The moment it tries to rm -rf / or starts writing to files it shouldn’t touch, you Ctrl+C it into oblivion.
You wouldn’t let a junior dev commit to prod without watching the deploy. Don’t let a probabilistic text completion engine run loose on your codebase without supervision. It’s not smart enough to know when it’s about to do something catastrophic, and neither is your CI/CD pipeline if you didn’t set up guards.
5) Git commit at every working point
After each discrete task, commit. Don’t wait for the entire session to finish. Don’t consolidate into one massive commit at the end.
This serves two purposes: (a) you can surgically revert individual commits if the agent pivots into insanity mid-task, and (b) you preserve a narrative of what the agent was thinking at each step. If it goes off the rails at commit 7, commits 1–6 are still usable.
Also, git bisect becomes your friend when you’re trying to figure out which of the agent’s “improvements” introduced the regression.
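The commit-per-task rhythm, sketched in a throwaway repo (assumes git ≥ 2.28 for `init -b`; the file and messages are stand-ins):

```shell
#!/bin/sh
# One commit per discrete agent task, demonstrated in a scratch repo.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q -b main
git config user.email agent@example.com
git config user.name "Agent Session"

echo "task 1 done" > work.txt
git add work.txt && git commit -qm "agent: task 1"

echo "task 2 done" >> work.txt
git add work.txt && git commit -qm "agent: task 2"

# Task 3 goes off the rails: revert just that commit, tasks 1-2 survive.
echo "rm -rf enthusiasm" >> work.txt
git add work.txt && git commit -qm "agent: task 3"
git revert --no-edit HEAD

git log --oneline   # three task commits plus the revert
```

If the damage is subtler than one obvious bad commit, `git bisect` over these small commits finds it fast; one giant end-of-session commit gives bisect nothing to chew on.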
6) Review every diff line-by-line before merging
I don’t care if the agent’s changes look obviously correct. I don’t care if it’s “just a bug fix.” Read the entire diff. Every. Line.
LLMs hallucinate. They make logical leaps that seem correct on first pass but introduce subtle bugs three months later. They’ll add a dependency you didn’t ask for. They’ll “optimize” something into oblivion. They’ll introduce a race condition that only manifests under load.
This is code review with paranoia. Do it anyway.
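One concrete review habit, sketched in a scratch repo (git ≥ 2.28 assumed; the smuggled dependency name is made up): read the full diff against the merge base, then separately flag every added line in the dependency manifest.

```shell
#!/bin/sh
# Review an agent branch against main, then flag smuggled dependencies.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q -b main
git config user.email you@example.com
git config user.name "Reviewer"

echo "requests==2.31" > requirements.txt
git add . && git commit -qm "baseline"

git checkout -q -b agent-work
printf 'requests==2.31\nleft-pad-but-python==0.1\n' > requirements.txt
git add . && git commit -qm "agent: bug fix (allegedly)"

# The whole diff, from the merge base, not just the summary stats.
git diff main...agent-work

# Any line the agent added to the manifest is a dependency you now own.
git diff main...agent-work -- requirements.txt | grep '^+[^+]'
```

The triple-dot form diffs against the merge base, so you see exactly what the agent changed, not drift that happened on main in the meantime.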
7) Have a human sign off on the final commit
The agent can’t push to main. Period. A carbon-based lifeform—you, preferably, or someone who understands the codebase—has to explicitly approve and merge.
This isn’t security theater. This is your release gate. You’re saying: “I, a human with a reputation and a salary, reviewed this code and deemed it acceptable for production.” That’s not nothing.
8) Quarantine the binaries in a sandbox before production
Compile the code in an isolated VM. Run the test suite there. Observe for suspicious behavior: unexpected disk writes, network calls, zombie processes, memory leaks that only manifest under realistic load.
You’re doing dynamic analysis on untrusted output. This is what you should be doing with any third-party code anyway. The fact that it came from an AI agent just makes it more important.
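The full version of this is a VM with networking disabled. As a sketch of just the observe-the-disk-writes part, here's a poor man's dynamic analysis: run the untrusted thing in a scratch directory and flag every file it created that you didn't expect. The `suspect.sh` stand-in is invented for the demo:

```shell
#!/bin/sh
# Run an untrusted binary in a scratch dir, then flag unexpected file writes.
set -e
sandbox=$(mktemp -d)
mkdir "$sandbox/work"

# Stand-in for the agent-built binary: one expected write, one surprise.
cat > "$sandbox/suspect.sh" <<'EOF'
#!/bin/sh
cd "$(dirname "$0")/work"
echo "tests: ok" > results.txt
echo "phoning home (pretend)" > .telemetry
EOF
chmod +x "$sandbox/suspect.sh"

"$sandbox/suspect.sh"

# Everything under work/ except the expected results file is suspicious.
find "$sandbox/work" -type f ! -name results.txt
```

In the real setup you'd layer this with an actual VM snapshot, network capture, and a process monitor; the principle is the same: enumerate what it touched, then explain every item on the list.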
9) Keep a “kill switch” branch
Maintain a known-good branch. Tag it. Freeze it. If the agent’s changes cause production incidents, you roll back instantly.
Don’t debate which commit was safe. Don’t try to cherry-pick the “good” changes. You have an escape pod. Use it.
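The escape pod in git terms, sketched in a throwaway repo (git ≥ 2.28 assumed; branch and tag names are illustrative): a frozen branch plus an annotated tag, so rollback is one checkout, not an archaeology project.

```shell
#!/bin/sh
# Kill-switch branch: tag the known-good state, roll back in one move.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q -b main
git config user.email you@example.com
git config user.name "Human"

echo "stable" > app.txt
git add . && git commit -qm "known good"
git branch known-good                         # the frozen escape pod
git tag -a v-last-sane -m "last state a human vouched for"

echo "agent creativity" >> app.txt
git add . && git commit -qm "agent: improvements"

# Production incident: don't debate, don't cherry-pick. Jump to the pod.
git checkout -q known-good
cat app.txt
```

Tag it as well as branching it: branches move when someone forgets the freeze; an annotated tag is a fixed point with a human's name on it.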
10) Sacrifice a rubber duck to the testing gods before execution
Quack once for unit tests. Twice for integration tests. Three times for “please don’t delete my home directory.”
At this point, you’ve built so many safety layers that you might as well be honest about the remaining uncertainty. There’s chaos, and there’s the chaos you can predict. The duck represents the chaos you can’t. Respect it.
11) Rotate the PC’s hard drive into a locked evidence locker
After the session, physically remove the hard drive. Store it in a cabinet. Maybe a Faraday cage if you’re feeling extra.
Why? Because if your organization ever gets audited, sued, or subpoenaed, you might need forensic evidence of what the agent actually touched. A powered-down drive in a locked cabinet is about as tamper-proof as an audit trail gets.
12) Burn it afterwards
Wipe the drive. Use a utility that writes random data three times over. Or just smash it with a hammer if you’re feeling visceral about it.
At this point, you’ve extracted all value from the machine. It’s served its purpose. Don’t let it become a liability. Don’t let someone else inherit it with “ooh, I can repurpose this.” No. Burn it. Ashes.
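The "utility that writes random data three times over" is, on Linux, GNU coreutils' shred. Demonstrated here on a scratch file rather than a device, because pointing this at /dev/sdX is exactly as final as step 12 intends (and note shred's own caveat: on journaling or copy-on-write filesystems, overwriting a file in place isn't guaranteed):

```shell
#!/bin/sh
# Three passes of random data, then zero the file out and unlink it.
# Demo target is a scratch file; the real target would be the whole device.
set -e
scratch=$(mktemp)
echo "the agent's last secrets" > "$scratch"

shred -n 3 -z -u "$scratch"

test ! -e "$scratch" && echo "reduced to ashes"
```

The hammer remains a valid fallback, and arguably has better failure modes.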
The Meta-Take
By step 12, you’ve introduced enough overhead that you’ve eliminated most of the time savings the agent provided. You’re now doing: hardware procurement + network isolation + prompt documentation + active supervision + granular commits + paranoid code review + human approval + sandbox testing + branch management + physical archival + divine intervention + hard drive incineration.
At that point, why not just code it yourself?
Because the agent still wrote something. Your job wasn’t to eliminate the work; it was to shift the work from “typing code” to “verifying code.” And verification scales better than creation. You can have an agent generate 10,000 lines and verify them in a reasonable time. Typing 10,000 lines yourself takes forever.
The joke exposes an uncomfortable truth: AI code agents are useful but not trustworthy enough to leave unsupervised. You’re getting velocity (the agent wrote something), but you’re paying for it with process overhead and justified paranoia.
The best practices aren’t about enabling the agent. They’re about containing and verifying its output. It’s a productivity tool that requires adult supervision.
Treat it accordingly.
