Your LLM API is the most uncapped meter you've ever plugged in

July 5, 2026

I’ve been writing about surprise cloud bills, and I kept circling the same conclusion: the danger was never “the cloud” itself. It was the shape of the billing: a meter with no ceiling that fails open to your wallet. Then it clicked that the most uncapped meter most companies have plugged in this year isn’t cloud at all. It’s the LLM API behind their shiny new AI feature.

And it’s worse than cloud in three specific ways.

1. The meter runs faster, and hotter

Cloud bills climb over weeks. An AI bill can climb in hours. LLM calls are priced per token, and the things that drive tokens are exactly the things that go wrong quietly: an agent that loops, a retry storm that re-asks the model on every failure, a feature that goes viral, or a leaked key someone drains. There’s no natural ceiling — if you didn’t build one, there isn’t one. Per unit of “work,” you’re also paying far more than the cloud equivalent, so the same carelessness costs more, faster.
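
To put rough numbers on that, here is a back-of-the-envelope sketch in Python. The price, the tokens per call, and the call rate are made-up placeholders chosen for easy arithmetic, not any provider's real rates; swap in your own.

```python
# All numbers here are hypothetical placeholders, not real provider pricing.
PRICE_PER_1K_TOKENS_USD = 0.01  # assumed blended input + output price
TOKENS_PER_CALL = 4_000         # assumed prompt + completion size


def cost_usd(calls: int,
             tokens_per_call: int = TOKENS_PER_CALL,
             price_per_1k: float = PRICE_PER_1K_TOKENS_USD) -> float:
    """Spend for a number of calls at a flat per-token price."""
    return calls * tokens_per_call / 1000 * price_per_1k


def runaway_cost_usd(calls_per_second: float, hours: float) -> float:
    """Spend for a loop or retry storm left running with no ceiling."""
    calls = int(calls_per_second * hours * 3600)
    return cost_usd(calls)


if __name__ == "__main__":
    # An agent stuck in a loop at 5 calls/second, unnoticed overnight (8 hours).
    print(f"${runaway_cost_usd(5, 8):,.2f}")
```

Under these assumed numbers a single call costs four cents, and the overnight loop comes to roughly 5,760 dollars. The point is the shape, not the figure: spend scales linearly with calls, and nothing in the formula stops it.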

2. The endpoint is designed to accept anything

A database rejects malformed input. An LLM feature is built to do the opposite — to take arbitrary natural language from strangers and act on it. That flexibility is the product. It’s also the attack surface. Every prompt box pointed at the internet is an input field with no schema, wired to a model that tries very hard to be helpful. “Be helpful to anyone who types anything” is a wonderful product spec and a terrifying security one.
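
Free text can't be given a schema, but the request around it can still be bounded before any tokens are spent. A minimal sketch, assuming a hypothetical `check_prompt` function sitting in front of the model and arbitrary example limits; it does not make the prompt itself safe, it only caps how much of it any one user can send.

```python
import time
from collections import defaultdict, deque
from typing import Optional, Tuple

MAX_PROMPT_CHARS = 4_000      # assumed limit; tune for your feature
MAX_REQUESTS_PER_MINUTE = 20  # assumed per-user limit

# user_id -> timestamps (seconds) of that user's recently accepted requests
_recent = defaultdict(deque)


def check_prompt(user_id: str, prompt: str,
                 now: Optional[float] = None) -> Tuple[bool, str]:
    """Return (allowed, reason). Runs before the model is ever called."""
    now = time.monotonic() if now is None else now
    if not prompt.strip():
        return False, "empty prompt"
    if len(prompt) > MAX_PROMPT_CHARS:
        return False, "prompt too long"

    window = _recent[user_id]
    # Drop timestamps that have aged out of the 60-second window.
    while window and now - window[0] >= 60:
        window.popleft()
    if len(window) >= MAX_REQUESTS_PER_MINUTE:
        return False, "rate limit exceeded"

    window.append(now)
    return True, "ok"
```

The state here lives in one process's memory, which is enough to show the idea; a real deployment would likely need a shared store so the limit holds across instances.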

3. There’s a failure mode cloud never had: your AI can be turned against you

This is the part teams don’t see coming. Your AI feature isn’t just an expense — it’s a powerful, general-purpose tool you’ve pointed at the public, and people will use it for things you never intended:

Why nobody’s caught it yet

Because it falls in the gap between teams. The AI feature was shipped fast, in a sprint, to keep up. Cost belongs to finance, security to the security team, product to the engineers — and this problem is all three at once, so it’s no one’s. The tooling is new, the failure modes are newer, and “it’s just an API call” is a comforting thing to believe until the invoice or the incident arrives.

What fail-closed looks like for AI

It’s the same discipline I keep describing for cloud — caps, guardrails, a ceiling that trips — just aimed at tokens instead of instances:
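
One piece of that, a token ceiling that trips, could be sketched like this. Everything here is illustrative: `SpendCap`, `BudgetExceeded`, and `guarded_call` are hypothetical names, and `call_model` stands in for whatever client you use, assumed to return the response text plus the tokens it actually consumed.

```python
from typing import Callable, Tuple


class BudgetExceeded(Exception):
    """Raised instead of calling the model once the cap would be crossed."""


class SpendCap:
    """A token budget that fails closed. Not thread-safe; a sketch only."""

    def __init__(self, limit_tokens: int) -> None:
        self.limit_tokens = limit_tokens
        self.used_tokens = 0

    def reserve(self, max_tokens: int) -> None:
        """Charge the worst case up front; refuse if it would cross the cap."""
        if self.used_tokens + max_tokens > self.limit_tokens:
            raise BudgetExceeded(
                f"cap of {self.limit_tokens} tokens would be exceeded")
        self.used_tokens += max_tokens

    def settle(self, reserved: int, actual: int) -> None:
        """After a successful call, hand back whatever was not used."""
        self.used_tokens -= max(reserved - actual, 0)


def guarded_call(cap: SpendCap,
                 call_model: Callable[[str, int], Tuple[str, int]],
                 prompt: str,
                 max_tokens: int) -> str:
    """Call the model only if the budget allows it."""
    cap.reserve(max_tokens)  # raises BudgetExceeded before any spend
    # If call_model raises, the reservation stays charged on purpose,
    # so a retry storm burns down the budget and eventually trips the cap.
    text, tokens_used = call_model(prompt, max_tokens)
    cap.settle(max_tokens, tokens_used)
    return text
```

The design choice that matters is the order: the budget is checked and charged before the request goes out, so when the cap is hit the feature stops rather than the invoice growing.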

This is the same job, moved

None of this is a new discipline. It’s the same problem, a meter you can’t see with unlimited liability by default, applied to the surface where it’s now most dangerous, and where the extra failure mode (abuse) makes it a security problem, not just a billing one. Caps, guardrails, fail-closed defaults, key management: that’s the whole job, and AI just made it urgent.


I’m building this playbook in the open. Over the next few weeks I’m publishing one short lesson a day — from the cost mechanics to the abuse defenses — wiring up a reference gateway as I go, and connecting each piece back to the security and infrastructure work it’s built on. Follow along: join the list to get each lesson in your inbox. And if you’re shipping an AI feature and want a second pair of eyes on the spend-and-abuse exposure before it bites, send it to me — I’ll take a look, free, within a business day.
