Category: AI

  • Best Practices for an AI CLI Code Agent: A Containment Strategy

    Look, I get it. You’ve got a shiny new AI agent that promises to “accelerate your development velocity” (translation: write code so you don’t have to). Congratulations. Now here’s the part nobody talks about: that thing is a liability wrapped in a transformer architecture, and you need to treat it like a biohazard until proven otherwise.

    Here’s the workflow. In order.

    1) Buy a new PC for every session

    Before you even think about unleashing an AI agent on your codebase, you need a clean machine. Not clean like “I ran Windows Update.” Clean like “this device has never touched the internet, has never installed a package, and exists in a state of primordial computational innocence.”

    This isn’t paranoia. This is recognizing that every previous session leaves ghosts: stray pip packages, environment variables polluting the namespace, leftover processes drinking memory. Run the agent in that mess and you get non-deterministic behavior. You want to know if the agent broke something? You need a baseline. A fresh PC is your baseline.

    Will this tank your budget? Absolutely. Is it worth it? Ask yourself: how much is it worth to know exactly what your code does?

    2) Air-gap the development machine from the network

    Unplug the ethernet. Turn off WiFi. Go full Faraday cage if you’re feeling theatrical (you should be).

    Why? Because an AI agent with internet access is an AI agent that can install arbitrary dependencies, call home to telemetry servers, or—and let’s be honest—do things you didn’t explicitly ask it to do. It can’t exfiltrate your data if there’s no network. It can’t surprise you with a midnight API call if the cables are all disconnected.

    Plus, no internet means the agent can’t download the latest version of some npm package that got compromised last Tuesday. Your supply chain is as trustworthy as your local filesystem.

    3) Document every prompt you fed it

    Keep a log. Every. Single. One. Write down the system message, the context window dump, the user prompts, the intermediate questions you asked, all of it.

    This isn’t busywork. When the agent does something weird—and it will—you need the full input state to understand why. “The model just decided to rewrite my entire build system” is not a bug report. “The model was given 50KB of malformed Makefile as context and hallucinated a solution” is actionable.

    Also, someday, an auditor will ask: “How did this code get written?” You’ll have an airtight answer backed by timestamped evidence. That’s worth something.
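If you want a dirt-simple version of that log, a shell function that timestamps every prompt before it goes anywhere works. This is a sketch with invented prompts; in practice you'd point the log at a file you actually archive, not a temp file:

```shell
#!/bin/sh
set -e
# Hypothetical prompt log; swap the temp file for something you keep.
log=$(mktemp)

# Append one timestamped record per prompt, BEFORE it reaches the agent.
log_prompt() {
  printf '%s\t%s\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$1" >> "$log"
}

log_prompt "system: you are a careful refactoring assistant"
log_prompt "user: rename foo() to bar() across the repo"

entries=$(wc -l < "$log" | tr -d ' ')
```

Tab-separated, timestamped, append-only. When the agent "just decides" to rewrite your build system, you'll know exactly what it was fed.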

    4) Never run it unsupervised

    This is the non-negotiable one. Keep your eyes on the terminal. Have your hand near the kill switch. The moment it tries to rm -rf / or starts writing to files it shouldn’t touch, you Ctrl+C it into oblivion.

    You wouldn’t let a junior dev commit to prod without watching the deploy. Don’t let a probabilistic text completion engine run loose on your codebase without supervision. It’s not smart enough to know when it’s about to do something catastrophic, and neither is your CI/CD pipeline if you didn’t set up guards.

    5) Git commit at every working point

    After each discrete task, commit. Don’t wait for the entire session to finish. Don’t consolidate into one massive commit at the end.

This serves two purposes: (a) you can revert surgically if the agent pivots into insanity mid-task, and (b) you preserve a narrative of what the agent was thinking at each step. If it goes off the rails at commit 7, commits 1–6 are still usable.

    Also, git bisect becomes your friend when you’re trying to figure out which of the agent’s “improvements” introduced the regression.

    6) Review every diff line-by-line before merging

    I don’t care if the agent’s changes look obviously correct. I don’t care if it’s “just a bug fix.” Read the entire diff. Every. Line.

    LLMs hallucinate. They make logical leaps that seem correct on first pass but introduce subtle bugs three months later. They’ll add a dependency you didn’t ask for. They’ll “optimize” something into oblivion. They’ll introduce a race condition that only manifests under load.

    This is code review with paranoia. Do it anyway.

    7) Have a human sign off on the final commit

    The agent can’t push to main. Period. A carbon-based lifeform—you, preferably, or someone who understands the codebase—has to explicitly approve and merge.

    This isn’t security theater. This is your release gate. You’re saying: “I, a human with a reputation and a salary, reviewed this code and deemed it acceptable for production.” That’s not nothing.

    8) Quarantine the binaries in a sandbox before production

    Compile the code in an isolated VM. Run the test suite there. Observe for suspicious behavior: unexpected disk writes, network calls, zombie processes, memory leaks that only manifest under realistic load.

    You’re doing dynamic analysis on untrusted output. This is what you should be doing with any third-party code anyway. The fact that it came from an AI agent just makes it more important.
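One way to sketch the quarantine, assuming Docker is available (the image name and test command are placeholders, not a prescription):

```shell
# Run the build and test suite inside a container with no network and a
# read-only root filesystem; any attempted call home or stray disk write
# fails loudly instead of silently. Key flags:
#   --network none  : no exfiltration, no surprise package installs
#   --read-only     : container root filesystem is immutable
#   --tmpfs /tmp    : the only scratch space the build may dirty
#   -v ...:ro       : your source mounted read-only
docker run --rm \
  --network none \
  --read-only \
  --tmpfs /tmp \
  -v "$PWD":/src:ro \
  -w /src \
  your-build-image:latest \
  sh -c 'make test'
```

A full VM gives you stronger isolation than a container, but the principle is the same: untrusted output runs in a cage first.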

    9) Keep a “kill switch” branch

    Maintain a known-good branch. Tag it. Freeze it. If the agent’s changes cause production incidents, you roll back instantly.

    Don’t debate which commit was safe. Don’t try to cherry-pick the “good” changes. You have an escape pod. Use it.
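Sketched in a disposable repo (names invented), the escape pod is just a tag, a branch, and a hard reset:

```shell
#!/bin/sh
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email agent@example.com
git config user.name "agent-sitter"

# The last audited, known-good state: tag it, branch it, never touch it.
echo "stable" > app.txt
git add app.txt
git commit -qm "known good"
git tag -a known-good -m "escape pod"
git branch escape-pod

# The agent does agent things.
echo "creative reinterpretation of the build system" > app.txt
git commit -qam "agent session"

# Production incident: no debate, no cherry-picking. Roll back.
git reset -q --hard known-good
state=$(cat app.txt)
```

In a real repo you'd push the tag and branch to the remote too, so the escape pod survives the machine it was built on.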

    10) Sacrifice a rubber duck to the testing gods before execution

    Quack once for unit tests. Twice for integration tests. Three times for “please don’t delete my home directory.”

At this point, you’ve built so many safety layers that you might as well be honest about the remaining uncertainty. There’s the chaos you can predict, and there’s the chaos you can’t. The duck represents the latter. Respect it.

    11) Rotate the PC’s hard drive into a locked evidence locker

    After the session, physically remove the hard drive. Store it in a cabinet. Maybe a Faraday cage if you’re feeling extra.

    Why? Because if your organization ever gets audited, sued, or subpoenaed, you might need forensic evidence of what the agent actually touched. A hard drive is immutable once you stop writing to it. It’s your audit trail.

    12) Burn it afterwards

    Wipe the drive. Use a utility that writes random data three times over. Or just smash it with a hammer if you’re feeling visceral about it.

    At this point, you’ve extracted all value from the machine. It’s served its purpose. Don’t let it become a liability. Don’t let someone else inherit it with “ooh, I can repurpose this.” No. Burn it. Ashes.
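The "random data three times over" part maps to GNU `shred`. Demonstrated here on a throwaway file, since wiping an actual drive in an example would be a little on the nose; on a real machine you'd point it at the block device:

```shell
#!/bin/sh
set -e
# Stand-in for the drive. For the real thing: shred -n 3 /dev/sdX (careful).
f=$(mktemp)
echo "everything the agent ever touched" > "$f"

# Overwrite 3 times with random data (-n 3), then truncate and unlink (-u).
shred -n 3 -u "$f"
```

Note that `shred` is less meaningful on SSDs with wear-leveling, which is where the hammer comes back into the picture.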


    The Meta-Take

    By step 12, you’ve introduced enough overhead that you’ve eliminated most of the time savings the agent provided. You’re now doing: hardware procurement + network isolation + prompt documentation + active supervision + granular commits + paranoid code review + human approval + sandbox testing + branch management + physical archival + divine intervention + hard drive incineration.

    At that point, why not just code it yourself?

    Because the agent still wrote something. Your job wasn’t to eliminate the work; it was to shift the work from “typing code” to “verifying code.” And verification scales better than creation. You can have an agent generate 10,000 lines and verify them in a reasonable time. Typing 10,000 lines yourself takes forever.

The joke exposes an uncomfortable truth: AI code agents are useful but not trustworthy enough to leave unsupervised. You’re getting velocity (the agent wrote something), but you’re paying for it with process overhead and justified paranoia.

    The best practices aren’t about enabling the agent. They’re about containing and verifying its output. It’s a productivity tool that requires adult supervision.

    Treat it accordingly.

  • Is Your CTO Dabbling in LLM Cults? Here Are the Signs

    Look, I’m not saying your CTO has been compromised by the Church of the Latter-day Tokens, but if they’ve started using “MBiC” unironically in Slack, we need to talk.

    Here are common acronyms your CTO might start using and their LLM cult meanings:

    MBiC – “My Brother in Copilot/Cursor/Claude”

• Normal people think: My Brother in Christ, the Gen Z riff on Christian phrasing deployed regardless of the recipient’s actual religion.
    • What they mean: A term of endearment for fellow AI-assisted developers
    • Red flag level: 🚩🚩 (Yellow – concerning but not terminal)

    LGTM – “Let GPT Train Me”

    • Normal people think: Looks Good To Me
    • What they mean: They’ve stopped learning and just accept whatever the spicy autocomplete says
    • Red flag level: 🚩🚩🚩 (Orange – intervention recommended)

    YOLO – “Your Output’s Likely Off”

    • Normal people think: You Only Live Once
    • What they mean: Dismissive response when someone questions AI-generated code that definitely has bugs
    • Red flag level: 🚩🚩🚩🚩 (Red – quarantine immediately)

    SMH – “Seeking More Hallucinations”

    • Normal people think: Shaking My Head
    • What they mean: When the AI’s first answer wasn’t convincing enough, so they’re regenerating
    • Red flag level: 🚩🚩🚩 (Orange – they know it’s wrong but persist)

    IMHO – “In My HuggingFace Opinion”

    • Normal people think: In My Humble Opinion
    • What they mean: About to cite some open-source LLM as an authority on architecture decisions
    • Red flag level: 🚩🚩🚩🚩 (Red – open source models have opinions now)

    TBH – “Tokens Be Hallucinating”

    • Normal people think: To Be Honest
    • What they mean: Acknowledging the AI made something up, but they’re going with it anyway
    • Red flag level: 🚩🚩🚩🚩🚩 (Critical – they’ve accepted hallucinations as reality)

    FWIW – “Fine-tuned With Insufficient Weights”

    • Normal people think: For What It’s Worth
    • What they mean: Excuse for why their custom model is confidently wrong about everything
    • Red flag level: 🚩🚩🚩🚩 (Red – they fine-tuned something)

    IDK – “Inference Definitely Knows”

    • Normal people think: I Don’t Know
    • What they mean: They don’t know, but Claude/GPT probably does, hold on
    • Red flag level: 🚩🚩 (Yellow – at least they’re honest about outsourcing cognition)

    RTFM – “Run The F***ing Model”

    • Normal people think: Read The F***ing Manual
    • What they mean: Why read documentation when you can just ask an AI that was trained on it?
    • Red flag level: 🚩🚩🚩🚩🚩 (Critical – manuals are now deprecated)

    WFH – “Working From HuggingFace”

    • Normal people think: Working From Home
    • What they mean: Entire day spent on model repos instead of actual work
    • Red flag level: 🚩🚩🚩 (Orange – at least they’re still technically working?)

    BRB – “Be Right Back (asking Claude)”

    • Normal people think: Be Right Back
    • What they mean: Every conversation now has a 30-second AI consultation pause
    • Red flag level: 🚩🚩🚩 (Orange – human-to-human communication deprecated)

    AFAIK – “According to Fine-tuned AI Knowledge”

    • Normal people think: As Far As I Know
    • What they mean: They asked an LLM and stopped researching
    • Red flag level: 🚩🚩🚩🚩 (Red – epistemology has left the building)

    TL;DR – “Too Long; Didn’t Rewrite (with AI)”

    • Normal people think: Too Long; Didn’t Read
    • What they mean: Everything must now be AI-summarized, including two-sentence emails
    • Red flag level: 🚩🚩🚩 (Orange – reading comprehension outsourced)

    IIRC – “If I Regenerate Context”

    • Normal people think: If I Recall Correctly
    • What they mean: They’ve lost track of which conversation was with humans vs. chatbots
    • Red flag level: 🚩🚩🚩🚩🚩 (Critical – reality boundaries dissolving)

    FYI – “Feed Your Inference”

    • Normal people think: For Your Information
    • What they mean: Attaching 47 documents to “give the AI context” for a simple question
    • Red flag level: 🚩🚩🚩 (Orange – prompt engineering has become lifestyle)

    NGL – “Not Gonna Lint”

    • Normal people think: Not Gonna Lie
    • What they mean: AI wrote it, AI approved it, linting is for people who don’t trust the silicon
    • Red flag level: 🚩🚩🚩🚩🚩 (Critical – code quality gates removed)

    BTW – “Before Training Weights”

    • Normal people think: By The Way
    • What they mean: Referencing the mythical pre-LLM era when people coded with their actual brains
    • Red flag level: 🚩 (Green – nostalgia is healthy)

    ICYMI – “In Case Your Model Ignored”

    • Normal people think: In Case You Missed It
    • What they mean: Reposting because they think you’re also using AI to read Slack
    • Red flag level: 🚩🚩🚩 (Orange – assumes everyone else is also AI-dependent)

    Warning Signs Your CTO Has Fully Converted:

    1. Begins sentences with “As an AI language model” in standup
    2. Refers to the engineering team as “the training data”
    3. Insists all PRs include a “prompt” section explaining what was asked
    4. Says “regenerate that thought” when they don’t like someone’s opinion
    5. Measures performance reviews in “tokens per second”
    6. Has replaced their profile picture with a neural network diagram
    7. Sends meeting agendas as “system prompts”
    8. Refers to coffee breaks as “context window refreshes”
    9. Calls the office “the inference cluster”
    10. Has started ending emails with “Stop sequence: [END]”

    What To Do If Your CTO Is Converting:

    Stage 1 (Early): Gentle reminders that humans still write code sometimes

    Stage 2 (Moderate): Intervention involving unplugged coding exercises and whiteboard sessions

    Stage 3 (Advanced): Emergency contact with former CTO’s mentors from the pre-LLM era

    Stage 4 (Terminal): Accept your new AI overlords and start learning prompt engineering

    The Reality Check:

    Look, AI coding assistants are genuinely useful tools. I use them. You probably use them. But when your leadership starts communicating primarily in LLM-cult acronyms and treating the AI as a team member with voting rights in architecture decisions, we’ve crossed from “productivity tool” to “cargo cult.”

    The warning sign isn’t that they’re using AI. It’s that they’ve stopped being able to tell where the AI stops and their own judgment begins.

    If your CTO asks you to “vibe check the embeddings” one more time, it might be time to update your LinkedIn.

    MBiC (My Buddy in Coding, the normal way),

    Grumpy


    Is your CTO showing signs of LLM cult membership? Drop a 👇 in the comments with the weirdest AI-related acronym you’ve heard in your workplace.

    Disclaimer: No CTOs were harmed in the making of this post. Several were mildly roasted. All AI assistants cited gave their consent to be satirized. Probably. I didn’t actually ask them. They’re just autocomplete.