Category: AI

  • The Flat-Fee Coding Subscription Is Already Dead. You Just Haven’t Gotten the Email Yet.

    Filed under: Things Procurement Will Blame On You In Q3


    Remember when your $20/month ChatGPT Plus subscription felt like getting away with something? Remember when Claude Pro at $20 and then Claude Max at $100 or $200 felt like a steal because you were burning through context windows like a chain smoker at a tax audit? Remember when Cursor was $20/month and Windsurf was $15 and you told your manager “this pays for itself in like fifteen minutes of saved Stack Overflow scrolling”?

    Enjoy it. Take a picture. Frame it next to your “Unlimited Data” cell phone bill from 2011.

    The Pricing Model Was Always A Lie

    Flat-fee AI coding subscriptions exist for exactly one reason: customer acquisition. They are the free shrimp at the casino buffet. They are not a business model. They are a user-acquisition-cost line item that some VP of Growth is going to have to defend on a board call in approximately six weeks.

    Here’s the math nobody at your company wants to do out loud:

    • A moderately active developer using an agentic coding tool burns through 5–20 million tokens per day on a real project. Agents re-read files. They re-read them again. They re-read them a third time because the first two times didn’t count apparently.
    • At a blended frontier rate of roughly $15 per million tokens, that’s $75–$300/day in raw inference cost. Call it $1,500–$6,000/month per developer in actual compute.
    • Your company is paying $20. Maybe $200 if somebody upgraded to the “Max” tier.

    You don’t need an MBA to see where this goes. You need a calculator and the emotional maturity to accept bad news.
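    If you want to check my arithmetic, here’s the napkin as code. The blended rate is my assumption, not any vendor’s published price — swap in whatever your model mix and cache hit rate actually cost:

    ```python
    # Back-of-the-envelope agent cost. The ~$15 per million tokens blended
    # rate is an assumption; real rates vary widely with model choice,
    # input/output mix, and prompt caching.

    def daily_cost(tokens_per_day: int, usd_per_million: float = 15.0) -> float:
        """Raw inference cost for one developer-day."""
        return tokens_per_day / 1_000_000 * usd_per_million

    for millions in (5, 20):
        cost = daily_cost(millions * 1_000_000)
        print(f"{millions}M tokens/day -> ${cost:,.0f}/day, "
              f"~${cost * 20:,.0f}/month (20 workdays)")
    ```

    Run it with your own numbers before your finance team runs it with theirs.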

    The Three-Act Play You’re Currently In The Middle Of

    Act I: The Honeymoon (Six Months Ago → You Are Here) Everything is unlimited. Everyone is vibing. Engineering leadership is posting on LinkedIn about “10x productivity.” A staff engineer somewhere is quietly running a background agent that generates 400,000 tokens every ninety seconds and nobody notices.

    Act II: The Soft Rug Pull (Happening Right Now, Actually) Suddenly there are “fair use limits.” “Weekly caps.” “Priority queues.” “Usage-based pricing on top of your subscription for heavy workloads.” Your tool mysteriously gets slower after 2pm. The context window “has been optimized.” A new tier appears above the one you’re on. Then another one above that. The $20 plan still exists, technically, the way a 1994 Geo Metro still exists.

    Act III: The Enterprise-Only Endgame (Coming To A Q4 Budget Meeting Near You) The consumer tier quietly dies or becomes a toy. The real product is now a minimum $50K/year enterprise contract with a six-week procurement cycle, a SOC 2 questionnaire, a mandatory “AI Center of Excellence” kickoff call, and per-seat-plus-per-token pricing where “per token” is the part nobody read. Your individual developer access? API keys only. Billed by the token. At retail rates. Welcome back to metered computing, it’s 1974 again and your mainframe has opinions.

    The Cost Delta Nobody Is Putting In The Slide Deck

    Let me do the math your finance team is going to do for you in about four months, except I’ll do it now so you can panic on your own schedule.

    Today, flat fee:

    • Individual developer: $20–$200/month
    • Team of 10: $200–$2,000/month
    • Annual, 10 devs: $2,400–$24,000

    Tomorrow, token-metered reality:

    • Light use (1M tokens/day, 20 workdays): ~$300/month/dev at blended frontier pricing
    • Moderate agentic use (5M tokens/day): ~$1,500/month/dev
    • Heavy agentic use (15M tokens/day, which is what your tools actually do when you let them): ~$4,500/month/dev
    • Team of 10, moderate use, annual: ~$180,000
    • Team of 10, heavy use, annual: ~$540,000

    So the pricing delta between what you pay now and what you’re about to pay is somewhere between 7.5x and 22x. For the exact same work. That you’re already doing. With the exact same tool. On the exact same codebase.
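    Here’s that projection as a script, so finance can’t claim the spreadsheet ate a formula. Again, the $15-per-million blended rate is my assumption — adjust it and watch the delta move:

    ```python
    # Projected token-metered spend for a 10-dev team. The blended rate of
    # ~$15 per million tokens is an assumption; adjust for your model mix.

    RATE = 15.0      # USD per million tokens (assumed blended rate)
    WORKDAYS = 20
    DEVS = 10

    profiles = {"light": 1, "moderate": 5, "heavy": 15}  # M tokens/day/dev

    for name, millions in profiles.items():
        per_dev_month = millions * RATE * WORKDAYS
        team_annual = per_dev_month * DEVS * 12
        print(f"{name:>8}: ${per_dev_month:,.0f}/month/dev, "
              f"${team_annual:,.0f}/year for the team")

    # Delta vs. today's flat fee, best case ($24,000/year for 10 devs):
    flat_fee_annual = 24_000
    low = profiles["moderate"] * RATE * WORKDAYS * DEVS * 12 / flat_fee_annual
    high = profiles["heavy"] * RATE * WORKDAYS * DEVS * 12 / flat_fee_annual
    print(f"delta: {low:.1f}x to {high:.1f}x")
    ```

    Panic on your own schedule, as promised.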

    And that’s before the enterprise contract markup, which historically runs 20–40% above raw API costs because somebody has to pay for the Customer Success Manager who schedules the quarterly business reviews you never attend.

    “But Grumpy, Won’t Model Costs Come Down?”

    Sure. And then the models will get bigger. And the agents will become more aggressive. And the context windows will grow. And you’ll go from “re-read the file three times” to “re-read the entire monorepo on every turn because the reasoning model decided it needed to be thorough.” Jevons paradox isn’t a theory, it’s a calendar reminder.

    Token costs per million have dropped roughly 10x in eighteen months. Token consumption per task has gone up roughly 50x in the same window. You do the arithmetic. Actually, don’t. It’ll ruin your weekend.
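    Fine, since you weren’t going to do it anyway, here it is. Round numbers, directional only:

    ```python
    # Net effect of cheaper tokens vs. hungrier agents, using the round
    # numbers above (both are rough, directional figures, not measurements).
    price_drop = 10          # cost per million tokens fell ~10x
    consumption_growth = 50  # tokens burned per task rose ~50x

    cost_per_task_multiplier = consumption_growth / price_drop
    print(f"cost per task: {cost_per_task_multiplier:.0f}x what it was")
    # prints "cost per task: 5x what it was"
    ```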

    What This Looks Like In Practice At Your Company

    • March: “We’re standardizing on [Tool X] for the whole org! Productivity revolution!”
    • April: A Slack channel called #ai-tools-feedback gets created. It has 400 members by Friday.
    • June: An email from IT: “We’re transitioning to a new billing model to better align with usage patterns.”
    • July: Finance sends a spreadsheet. The spreadsheet has a column called “Projected Q4 Overage.”
    • August: A new policy: “AI tool usage must be pre-approved by your manager for tasks exceeding 500,000 tokens.” Nobody knows what a token is. The policy is enforced anyway.
    • September: Your skip-level asks in a 1:1, “Are you really getting value from these tools?” It is not a real question.
    • October: The tool is replaced with a cheaper in-house wrapper around a smaller model that hallucinates import statements.
    • November: Leadership announces “disciplined AI spend” in the all-hands. Everybody claps.
    • December: The same VP who launched the rollout gets promoted for “cost optimization.”

    The Grumpy Prediction

    By this time next year, flat-fee AI coding subscriptions will exist in approximately the same way that unlimited mobile data plans exist: on paper, with an asterisk, a throttle, and a paragraph of fine print longer than the Treaty of Westphalia. The real action will be enterprise contracts, API keys, and a new role at your company called “AI FinOps Lead” whose entire job is to tell you to stop using the thing you were told to start using six months ago.

    The developers who saw this coming will be the ones who kept a .env file, a personal API key, and a healthy skepticism toward any business model that includes the word “unlimited.” The developers who didn’t will be the ones writing retrospective blog posts about “lessons learned from our AI tooling journey.”

    Guess which one gets invited to speak at the conference.


    The author is a Grumpy Coworker who has been through exactly this movie with cloud compute, with CI minutes, with observability vendors, with feature flag platforms, and with at least two code search tools. The ending is always the same. The popcorn is always stale.

  • Best Practices for an AI CLI Code Agent: A Containment Strategy

    Look, I get it. You’ve got a shiny new AI agent that promises to “accelerate your development velocity” (translation: write code so you don’t have to). Congratulations. Now here’s the part nobody talks about: that thing is a liability wrapped in a transformer architecture, and you need to treat it like a biohazard until proven otherwise.

    Here’s the workflow. In order.

    1) Buy a new PC for every session

    Before you even think about unleashing an AI agent on your codebase, you need a clean machine. Not clean like “I ran Windows Update.” Clean like “this device has never touched the internet, has never installed a package, and exists in a state of primordial computational innocence.”

    This isn’t paranoia. This is recognizing that every previous session leaves ghosts: stray pip packages, environment variables polluting the namespace, leftover processes drinking memory. Run the agent in that mess and you get non-deterministic behavior. You want to know if the agent broke something? You need a baseline. A fresh PC is your baseline.

    Will this tank your budget? Absolutely. Is it worth it? Ask yourself: how much is it worth to know exactly what your code does?

    2) Air-gap the development machine from the network

    Unplug the ethernet. Turn off WiFi. Go full Faraday cage if you’re feeling theatrical (you should be).

    Why? Because an AI agent with internet access is an AI agent that can install arbitrary dependencies, call home to telemetry servers, or—and let’s be honest—do things you didn’t explicitly ask it to do. It can’t exfiltrate your data if there’s no network. It can’t surprise you with a midnight API call if the cables are all disconnected.

    Plus, no internet means the agent can’t download the latest version of some npm package that got compromised last Tuesday. Your supply chain is as trustworthy as your local filesystem.

    3) Document every prompt you fed it

    Keep a log. Every. Single. One. Write down the system message, the context window dump, the user prompts, the intermediate questions you asked, all of it.

    This isn’t busywork. When the agent does something weird—and it will—you need the full input state to understand why. “The model just decided to rewrite my entire build system” is not a bug report. “The model was given 50KB of malformed Makefile as context and hallucinated a solution” is actionable.

    Also, someday, an auditor will ask: “How did this code get written?” You’ll have an airtight answer backed by timestamped evidence. That’s worth something.
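    If you want a concrete shape for that log, an append-only JSONL file does the job: one timestamped record per prompt, never rewritten. The filename and field names below are made up for illustration, not any tool’s actual schema:

    ```python
    import json
    import time
    from pathlib import Path

    LOG = Path("agent_session.jsonl")  # hypothetical log file name

    def log_prompt(role: str, content: str, session: str = "session-001") -> None:
        """Append one timestamped prompt record; never rewrite old entries."""
        record = {
            "ts": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
            "session": session,
            "role": role,          # "system", "user", "context", ...
            "content": content,
        }
        with LOG.open("a", encoding="utf-8") as f:
            f.write(json.dumps(record) + "\n")

    log_prompt("system", "You are a careful coding agent.")
    log_prompt("user", "Refactor the Makefile. Do not touch anything else.")
    ```

    Append-only matters: the moment you can edit the log, it stops being evidence and starts being fiction.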

    4) Never run it unsupervised

    This is the non-negotiable one. Keep your eyes on the terminal. Have your hand near the kill switch. The moment it tries to rm -rf / or starts writing to files it shouldn’t touch, you Ctrl+C it into oblivion.

    You wouldn’t let a junior dev commit to prod without watching the deploy. Don’t let a probabilistic text completion engine run loose on your codebase without supervision. It’s not smart enough to know when it’s about to do something catastrophic, and neither is your CI/CD pipeline if you didn’t set up guards.

    5) Git commit at every working point

    After each discrete task, commit. Don’t wait for the entire session to finish. Don’t consolidate into one massive commit at the end.

    This serves two purposes: (a) you can surgically revert individual changes if the agent pivots into insanity mid-task, and (b) you preserve a narrative of what the agent was thinking at each step. If it goes off the rails at commit 7, commits 1–6 are still usable.

    Also, git bisect becomes your friend when you’re trying to figure out which of the agent’s “improvements” introduced the regression.
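    The checkpoint habit is scriptable. This sketch only builds the git commands by default (dry run), because I’m not assuming anything about your repo; the commit-message prefix is my invention:

    ```python
    import subprocess

    def checkpoint(task: str, dry_run: bool = True) -> list[list[str]]:
        """Stage everything and commit one agent task as its own checkpoint.

        With dry_run=True (the default), just return the commands so you can
        inspect them; flip it once you're inside a real repo you trust.
        """
        cmds = [
            ["git", "add", "-A"],
            ["git", "commit", "-m", f"agent checkpoint: {task}"],
        ]
        if not dry_run:
            for cmd in cmds:
                subprocess.run(cmd, check=True)
        return cmds

    # One commit per discrete task, not one blob at the end of the session:
    for task in ("add input validation", "fix off-by-one in pager"):
        checkpoint(task)
    ```

    Because each task is its own commit, git bisect can later pin a regression to a single agent step instead of a 4,000-line mystery blob.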

    6) Review every diff line-by-line before merging

    I don’t care if the agent’s changes look obviously correct. I don’t care if it’s “just a bug fix.” Read the entire diff. Every. Line.

    LLMs hallucinate. They make logical leaps that seem correct on first pass but introduce subtle bugs three months later. They’ll add a dependency you didn’t ask for. They’ll “optimize” something into oblivion. They’ll introduce a race condition that only manifests under load.

    This is code review with paranoia. Do it anyway.

    7) Have a human sign off on the final commit

    The agent can’t push to main. Period. A carbon-based lifeform—you, preferably, or someone who understands the codebase—has to explicitly approve and merge.

    This isn’t security theater. This is your release gate. You’re saying: “I, a human with a reputation and a salary, reviewed this code and deemed it acceptable for production.” That’s not nothing.

    8) Quarantine the binaries in a sandbox before production

    Compile the code in an isolated VM. Run the test suite there. Observe for suspicious behavior: unexpected disk writes, network calls, zombie processes, memory leaks that only manifest under realistic load.

    You’re doing dynamic analysis on untrusted output. This is what you should be doing with any third-party code anyway. The fact that it came from an AI agent just makes it more important.

    9) Keep a “kill switch” branch

    Maintain a known-good branch. Tag it. Freeze it. If the agent’s changes cause production incidents, you roll back instantly.

    Don’t debate which commit was safe. Don’t try to cherry-pick the “good” changes. You have an escape pod. Use it.

    10) Sacrifice a rubber duck to the testing gods before execution

    Quack once for unit tests. Twice for integration tests. Three times for “please don’t delete my home directory.”

    At this point, you’ve built so many safety layers that you might as well be honest about the remaining uncertainty. There’s the chaos you can predict, and there’s the chaos you can’t. The duck represents the second kind. Respect it.

    11) Rotate the PC’s hard drive into a locked evidence locker

    After the session, physically remove the hard drive. Store it in a cabinet. Maybe a Faraday cage if you’re feeling extra.

    Why? Because if your organization ever gets audited, sued, or subpoenaed, you might need forensic evidence of what the agent actually touched. A hard drive is immutable once you stop writing to it. It’s your audit trail.

    12) Burn it afterwards

    Wipe the drive. Use a utility that writes random data three times over. Or just smash it with a hammer if you’re feeling visceral about it.

    At this point, you’ve extracted all value from the machine. It’s served its purpose. Don’t let it become a liability. Don’t let someone else inherit it with “ooh, I can repurpose this.” No. Burn it. Ashes.


    The Meta-Take

    By step 12, you’ve introduced enough overhead that you’ve eliminated most of the time savings the agent provided. You’re now doing: hardware procurement + network isolation + prompt documentation + active supervision + granular commits + paranoid code review + human approval + sandbox testing + branch management + physical archival + divine intervention + hard drive incineration.

    At that point, why not just code it yourself?

    Because the agent still wrote something. Your job wasn’t to eliminate the work; it was to shift the work from “typing code” to “verifying code.” And verification scales better than creation. You can have an agent generate 10,000 lines and verify them in a reasonable time. Typing 10,000 lines yourself takes forever.

    The joke is exposing the uncomfortable truth: AI code agents are useful but not trustworthy enough to leave unsupervised. You’re getting velocity (the agent wrote something), but you’re paying for it with process overhead and justified paranoia.

    The best practices aren’t about enabling the agent. They’re about containing and verifying its output. It’s a productivity tool that requires adult supervision.

    Treat it accordingly.

  • Is Your CTO Dabbling in LLM Cults? Here Are the Signs

    Look, I’m not saying your CTO has been compromised by the Church of the Latter-day Tokens, but if they’ve started using “MBiC” unironically in Slack, we need to talk.

    Here are common acronyms your CTO might start using and their LLM cult meanings:

    MBiC – “My Brother in Copilot/Cursor/Claude”

    • Normal people think: My Brother in Christ, a Gen Z riff on Christianity’s cultural footprint, deployed regardless of the recipient’s actual religion.
    • What they mean: A term of endearment for fellow AI-assisted developers
    • Red flag level: 🚩🚩 (Yellow – concerning but not terminal)

    LGTM – “Let GPT Train Me”

    • Normal people think: Looks Good To Me
    • What they mean: They’ve stopped learning and just accept whatever the spicy autocomplete says
    • Red flag level: 🚩🚩🚩 (Orange – intervention recommended)

    YOLO – “Your Output’s Likely Off”

    • Normal people think: You Only Live Once
    • What they mean: Dismissive response when someone questions AI-generated code that definitely has bugs
    • Red flag level: 🚩🚩🚩🚩 (Red – quarantine immediately)

    SMH – “Seeking More Hallucinations”

    • Normal people think: Shaking My Head
    • What they mean: When the AI’s first answer wasn’t convincing enough, so they’re regenerating
    • Red flag level: 🚩🚩🚩 (Orange – they know it’s wrong but persist)

    IMHO – “In My HuggingFace Opinion”

    • Normal people think: In My Humble Opinion
    • What they mean: About to cite some open-source LLM as an authority on architecture decisions
    • Red flag level: 🚩🚩🚩🚩 (Red – open source models have opinions now)

    TBH – “Tokens Be Hallucinating”

    • Normal people think: To Be Honest
    • What they mean: Acknowledging the AI made something up, but they’re going with it anyway
    • Red flag level: 🚩🚩🚩🚩🚩 (Critical – they’ve accepted hallucinations as reality)

    FWIW – “Fine-tuned With Insufficient Weights”

    • Normal people think: For What It’s Worth
    • What they mean: Excuse for why their custom model is confidently wrong about everything
    • Red flag level: 🚩🚩🚩🚩 (Red – they fine-tuned something)

    IDK – “Inference Definitely Knows”

    • Normal people think: I Don’t Know
    • What they mean: They don’t know, but Claude/GPT probably does, hold on
    • Red flag level: 🚩🚩 (Yellow – at least they’re honest about outsourcing cognition)

    RTFM – “Run The F***ing Model”

    • Normal people think: Read The F***ing Manual
    • What they mean: Why read documentation when you can just ask an AI that was trained on it?
    • Red flag level: 🚩🚩🚩🚩🚩 (Critical – manuals are now deprecated)

    WFH – “Working From HuggingFace”

    • Normal people think: Working From Home
    • What they mean: Entire day spent on model repos instead of actual work
    • Red flag level: 🚩🚩🚩 (Orange – at least they’re still technically working?)

    BRB – “Be Right Back (asking Claude)”

    • Normal people think: Be Right Back
    • What they mean: Every conversation now has a 30-second AI consultation pause
    • Red flag level: 🚩🚩🚩 (Orange – human-to-human communication deprecated)

    AFAIK – “According to Fine-tuned AI Knowledge”

    • Normal people think: As Far As I Know
    • What they mean: They asked an LLM and stopped researching
    • Red flag level: 🚩🚩🚩🚩 (Red – epistemology has left the building)

    TL;DR – “Too Long; Didn’t Rewrite (with AI)”

    • Normal people think: Too Long; Didn’t Read
    • What they mean: Everything must now be AI-summarized, including two-sentence emails
    • Red flag level: 🚩🚩🚩 (Orange – reading comprehension outsourced)

    IIRC – “If I Regenerate Context”

    • Normal people think: If I Recall Correctly
    • What they mean: They’ve lost track of which conversation was with humans vs. chatbots
    • Red flag level: 🚩🚩🚩🚩🚩 (Critical – reality boundaries dissolving)

    FYI – “Feed Your Inference”

    • Normal people think: For Your Information
    • What they mean: Attaching 47 documents to “give the AI context” for a simple question
    • Red flag level: 🚩🚩🚩 (Orange – prompt engineering has become lifestyle)

    NGL – “Not Gonna Lint”

    • Normal people think: Not Gonna Lie
    • What they mean: AI wrote it, AI approved it, linting is for people who don’t trust the silicon
    • Red flag level: 🚩🚩🚩🚩🚩 (Critical – code quality gates removed)

    BTW – “Before Training Weights”

    • Normal people think: By The Way
    • What they mean: Referencing the mythical pre-LLM era when people coded with their actual brains
    • Red flag level: 🚩 (Green – nostalgia is healthy)

    ICYMI – “In Case Your Model Ignored”

    • Normal people think: In Case You Missed It
    • What they mean: Reposting because they think you’re also using AI to read Slack
    • Red flag level: 🚩🚩🚩 (Orange – assumes everyone else is also AI-dependent)

    Warning Signs Your CTO Has Fully Converted:

    1. Begins sentences with “As an AI language model” in standup
    2. Refers to the engineering team as “the training data”
    3. Insists all PRs include a “prompt” section explaining what was asked
    4. Says “regenerate that thought” when they don’t like someone’s opinion
    5. Measures performance reviews in “tokens per second”
    6. Has replaced their profile picture with a neural network diagram
    7. Sends meeting agendas as “system prompts”
    8. Refers to coffee breaks as “context window refreshes”
    9. Calls the office “the inference cluster”
    10. Has started ending emails with “Stop sequence: [END]”

    What To Do If Your CTO Is Converting:

    Stage 1 (Early): Gentle reminders that humans still write code sometimes

    Stage 2 (Moderate): Intervention involving unplugged coding exercises and whiteboard sessions

    Stage 3 (Advanced): Emergency contact with former CTO’s mentors from the pre-LLM era

    Stage 4 (Terminal): Accept your new AI overlords and start learning prompt engineering

    The Reality Check:

    Look, AI coding assistants are genuinely useful tools. I use them. You probably use them. But when your leadership starts communicating primarily in LLM-cult acronyms and treating the AI as a team member with voting rights in architecture decisions, we’ve crossed from “productivity tool” to “cargo cult.”

    The warning sign isn’t that they’re using AI. It’s that they’ve stopped being able to tell where the AI stops and their own judgment begins.

    If your CTO asks you to “vibe check the embeddings” one more time, it might be time to update your LinkedIn.

    MBiC (My Buddy in Coding, the normal way),

    Grumpy


    Is your CTO showing signs of LLM cult membership? Drop a 👇 in the comments with the weirdest AI-related acronym you’ve heard in your workplace.

    Disclaimer: No CTOs were harmed in the making of this post. Several were mildly roasted. All AI assistants cited gave their consent to be satirized. Probably. I didn’t actually ask them. They’re just autocomplete.