Your LLM API is the most uncapped meter you've ever plugged in

July 5, 2026

I’ve been writing about surprise cloud bills, and I keep circling back to the same conclusion: the danger was never “the cloud.” It’s the shape: a meter with no ceiling that fails open to your wallet. Then it clicked that the most uncapped meter most companies have plugged in this year isn’t cloud at all. It’s the LLM API behind their shiny new AI feature.

And it’s worse than cloud in three specific ways.

1. The meter runs faster, and hotter

Cloud bills climb over weeks. An AI bill can climb in hours. LLM calls are priced per token, and the things that drive tokens are exactly the things that go wrong quietly: an agent that loops, a retry storm that re-asks the model on every failure, a feature that goes viral, or a leaked key someone drains. There’s no natural ceiling: if you didn’t build one, there isn’t one. A single model call also typically costs far more than a comparable conventional API request or database query, so the same carelessness costs more, faster.
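To make “if you didn’t build one, there isn’t one” concrete, here is a minimal sketch of a per-request ceiling for an agent loop, in Python. The names RequestBudget, BudgetExceeded, run_agent, and the step callable are hypothetical, and the limits are placeholder numbers, not recommendations. The only idea is that every model call is checked against a step count and a token count before it is made, so a loop stops on its own instead of running until someone notices the bill.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


class BudgetExceeded(RuntimeError):
    """Raised when a request has used up its step or token budget."""


@dataclass
class RequestBudget:
    max_steps: int = 8          # placeholder: max model calls per request
    max_tokens: int = 50_000    # placeholder: max total tokens per request
    steps: int = 0
    tokens: int = 0

    def check(self) -> None:
        """Call before every model call; refuses once a limit is reached."""
        if self.steps >= self.max_steps:
            raise BudgetExceeded(f"step limit reached ({self.max_steps})")
        if self.tokens >= self.max_tokens:
            raise BudgetExceeded(f"token limit reached ({self.max_tokens})")

    def record(self, tokens_used: int) -> None:
        """Call after every model call with the tokens it consumed."""
        self.steps += 1
        self.tokens += tokens_used


def run_agent(step: Callable[[], Tuple[bool, int]], budget: RequestBudget) -> int:
    """Run `step` until it reports done or the budget trips.

    `step` is a hypothetical callable that makes exactly one model call
    and returns (done, tokens_used). Returns the total tokens used.
    """
    while True:
        budget.check()              # fail closed before spending more
        done, tokens_used = step()
        budget.record(tokens_used)
        if done:
            return budget.tokens
```

Because the check runs before each call, the overshoot is at most one call’s worth of tokens. Retries should pass through the same check, so a retry storm is counted like any other loop.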

2. The endpoint is designed to accept anything

A database rejects malformed input. An LLM feature is built to do the opposite — to take arbitrary natural language from strangers and act on it. That flexibility is the product. It’s also the attack surface. Every prompt box pointed at the internet is an input field with no schema, wired to a model that tries very hard to be helpful. “Be helpful to anyone who types anything” is a wonderful product spec and a terrifying security one.

3. There’s a failure mode cloud never had: your AI can be turned against you

This is the part teams don’t see coming. Your AI feature isn’t just an expense — it’s a powerful, general-purpose tool you’ve pointed at the public, and people will use it for things you never intended:

As a free LLM. Someone routes their own chatbot through your “customer support” endpoint — you pay the tokens for their side project. Free-LLM-proxy abuse is real and cheap to run.
As a liability. A car dealer’s assistant was talked into “agreeing” to sell an SUV for $1; an airline was held legally liable for a refund policy its chatbot invented. A jailbreak or a hallucinated answer isn’t just embarrassing; it can be binding, or brand-damaging.
As a way in. Prompt injection — direct, or hidden inside a document or web page your feature reads — can make the model ignore its instructions, leak data it can see, or take actions on a user’s behalf.
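
One hedge against the “way in” case is to assume the model’s output may be attacker-controlled and to enforce policy in ordinary code rather than in the prompt. The sketch below is illustrative Python: the action names, the ORDER_OWNERS table, and the authorize function are all hypothetical stand-ins for a real application’s tools and database. It shows a model-proposed action being allowed only if it is on an allowlist and targets data owned by the authenticated user.

```python
from typing import Optional

# Hypothetical actions a support assistant may trigger; everything else is refused.
ALLOWED_ACTIONS = frozenset({"lookup_order", "open_ticket"})

# Stand-in for a real database: order id -> owning user id.
ORDER_OWNERS = {"A100": "alice", "B200": "bob"}


def authorize(action: str, order_id: str, authenticated_user: Optional[str]) -> bool:
    """Decide whether a model-proposed action may run.

    `action` and `order_id` come from model output, so they are untrusted.
    `authenticated_user` must come from the session, never from the model.
    """
    if authenticated_user is None:
        return False  # no identity: refuse
    if action not in ALLOWED_ACTIONS:
        return False  # unknown or dangerous action: refuse by default
    # Ownership is looked up server-side; a missing order also fails closed.
    owner = ORDER_OWNERS.get(order_id)
    return owner is not None and owner == authenticated_user
```

With a gate like this, an injected instruction such as “refund order B200” fails twice for a session logged in as alice: the action is not on the list, and the order is not hers.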

Why nobody’s caught it yet

Because it falls in the gap between teams. The AI feature was shipped fast, in a sprint, to keep up. Cost belongs to finance, security to the security team, product to the engineers — and this problem is all three at once, so it’s no one’s. The tooling is new, the failure modes are newer, and “it’s just an API call” is a comforting thing to believe until the invoice or the incident arrives.

What fail-closed looks like for AI

It’s the same discipline I keep describing for cloud — caps, guardrails, a ceiling that trips — just aimed at tokens instead of instances:

A gateway in front of every LLM call. Cloudflare AI Gateway, LiteLLM, or Portkey: a chokepoint where you can enforce hard spend caps and per-user rate limits and quotas, cache responses (which cuts cost and blunts abuse), and vault the provider key so it never reaches the client.
Never ship the raw key. Server-side only, scoped, rotated. Many five-figure surprise AI bills trace back to a key that ended up somewhere it shouldn’t.
Guard the input and the output. Prompt-injection filtering going in; moderation coming out; and a system prompt that refuses off-topic requests, which also makes it harder for someone to use your bot as a free LLM.
Auth and per-user quotas. Treat the AI route like any other sensitive endpoint: identity, limits, and policy. This is old security work pointed at a new target.
A spend ceiling with a kill switch. Anomaly alerts you’ll actually read, plus a hard cap that trips — fail closed, on purpose.
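
As a rough illustration of the last two items, here is a minimal Python sketch of a spend ceiling with a kill switch and per-user quotas. Everything in it is an assumption made for the example: the SpendGuard class is hypothetical, it keeps state in memory for a single process, it assumes one flat price per 1,000 tokens, and the numbers are placeholders. A real deployment would more likely lean on a gateway’s own budget features or a shared store; the point here is only the fail-closed shape.

```python
from collections import defaultdict


class SpendGuard:
    """In-memory spend ceiling with a kill switch and per-user token quotas."""

    def __init__(self, daily_cap_usd: float, user_daily_tokens: int,
                 usd_per_1k_tokens: float) -> None:
        self.daily_cap_usd = daily_cap_usd
        self.user_daily_tokens = user_daily_tokens
        self.usd_per_1k_tokens = usd_per_1k_tokens  # assumed flat price
        self.spent_usd = 0.0
        self.user_tokens = defaultdict(int)
        self.tripped = False  # kill switch: stays on until a human resets it

    def _cost(self, tokens: int) -> float:
        return tokens / 1000 * self.usd_per_1k_tokens

    def allow(self, user_id: str, estimated_tokens: int) -> bool:
        """Return True only if the call fits both the user quota and the cap."""
        if self.tripped:
            return False
        if self.user_tokens[user_id] + estimated_tokens > self.user_daily_tokens:
            return False  # per-user quota: one user cannot drain the budget
        if self.spent_usd + self._cost(estimated_tokens) > self.daily_cap_usd:
            self.tripped = True  # fail closed for everyone
            return False
        return True

    def record(self, user_id: str, tokens_used: int) -> None:
        """Record actual usage after the provider responds."""
        self.user_tokens[user_id] += tokens_used
        self.spent_usd += self._cost(tokens_used)

    def reset(self) -> None:
        """Manual reset, e.g. at the start of a new day after review."""
        self.spent_usd = 0.0
        self.user_tokens.clear()
        self.tripped = False
```

The server would call allow before forwarding a request to the provider and record after the response comes back. The quota check runs first on purpose, so a single noisy user hits their own limit before they can trip the switch for everyone else.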

This is the same job, moved

None of this is a new discipline. It’s the same two problems, a meter you can’t see and unlimited liability by default, applied to the surface where they’re now most dangerous, and where the extra failure mode (abuse) makes it a security problem, not just a billing one. Caps, guardrails, fail-closed, key management: that’s the whole job, and AI just made it urgent.

I’m building this playbook in the open. Over the next few weeks I’m publishing one short lesson a day — from the cost mechanics to the abuse defenses — wiring up a reference gateway as I go, and connecting each piece back to the security and infrastructure work it’s built on. Follow along: join the list to get each lesson in your inbox. And if you’re shipping an AI feature and want a second pair of eyes on the spend-and-abuse exposure before it bites, send it to me — I’ll take a look, free, within a business day.