The way out of the CloudWatch trap: a flat-rate log stack, and the math

July 2, 2026

I was a bit annoyed after the last post. I’d explained how CloudWatch Logs Insights quietly bills you per gigabyte scanned — how an always-on dashboard becomes a metered query loop that can cost more than the servers it watches — and then I stopped at “move it to a flat-rate store.” That’s the right answer, but it’s exactly the kind of hand-wave I complain about in other people’s writing. So I went and worked out the actual setup: the hardware, the software, the migration, and — because it’s the whole point — the savings.

Here’s how you make the alternative actually work.

The principle, in one line

CloudWatch charges you every time a query looks at your logs. A box you rent flat doesn’t. So you move the storing and the querying onto a machine you pay a fixed monthly price for, and a dashboard that refreshes every ten seconds costs exactly the same as one that never refreshes, because it’s just CPU you already bought. Everything below is how to do that without it becoming a second job.

The stack: Grafana + Loki

Three moving parts (plus an optional fourth). Crucially, you keep Grafana, so your dashboards barely change — you just repoint them.

| Piece | Role | Replaces |
|---|---|---|
| Grafana Alloy (or Vector / Fluent Bit) | Runs on your hosts/containers, tails logs, ships them out. Can also pull existing CloudWatch groups during cutover. | the CloudWatch agent |
| Loki | Stores + queries logs. Indexes only labels (service, host, level), keeps bodies as compressed chunks, and does not bill per GB scanned; queries are just compute on your box. | CloudWatch Logs + Logs Insights |
| Grafana | The dashboards you already have. Swap each panel’s datasource to Loki (LogQL). Refresh as fast as you like; it’s free now. | CloudWatch dashboards |
| Object storage (optional) | Point Loki’s chunk store at MinIO on the box, or Backblaze B2 / Hetzner Object Storage, so long retention is cheap and the box stays disposable. | CloudWatch Logs retention |

Why Loki specifically: it was built to be cheap. It doesn’t full-text-index every byte the way OpenSearch does — it indexes labels and greps compressed chunks — so it’s light on RAM and disk, and querying carries no per-scan meter.

The hardware: one flat-rate box

The entire point is a fixed monthly price with generous, unmetered traffic — so Hetzner or OVH. Size by log volume:

| Volume / retention | Box | ~Cost/mo |
|---|---|---|
| A few GB/day, weeks of retention | Hetzner Cloud CPX31 (4 vCPU, 8 GB, 160 GB NVMe) | ~$15 |
| Moderate, want NVMe + headroom | Hetzner EX44 / AX42 dedicated (64 GB RAM, 2× NVMe) | ~$45 |
| Big / long retention (100s of GB–TBs) | Dedicated + Loki chunks on B2 / Object Storage (or a Hetzner SX box for big HDDs) | ~$45–100 |

Loki is light — 8–16 GB RAM handles a lot. Keep hot, recent chunks on NVMe for snappy dashboards and push older chunks to object storage for cheap retention.
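A quick sanity check on the first row of that table, as a sketch: the 3 GB/day and 28 days below are my own illustrative stand-ins for "a few GB/day, weeks of retention", not measured figures.

```latex
\[
D_{\text{raw}} = v \times t = 3\ \tfrac{\text{GB}}{\text{day}} \times 28\ \text{days} = 84\ \text{GB}
\]
```

That is the uncompressed upper bound, and it sits well inside the CPX31's 160 GB disk. Since Loki stores log bodies as compressed chunks, the on-disk figure should come in lower, though by how much depends on your logs.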

The setup: one docker-compose

The whole footprint is a single Compose file behind nginx + certbot, backed up with Restic. Roughly:

services:
  loki:
    image: grafana/loki:3.0.0
    command: -config.file=/etc/loki/config.yml
    volumes: ["./loki-config.yml:/etc/loki/config.yml", "loki-data:/loki"]
    # No published port: Loki is reachable only on the Compose network.
    restart: unless-stopped
  grafana:
    image: grafana/grafana:latest       # pin a specific tag once it works
    ports: ["127.0.0.1:3000:3000"]      # nginx terminates TLS in front
    environment: ["GF_SECURITY_ADMIN_PASSWORD=change-me"]   # replace before first start
    volumes: ["grafana-data:/var/lib/grafana"]
    depends_on: ["loki"]
    restart: unless-stopped
  alloy:
    image: grafana/alloy:latest         # pin a specific tag once it works
    command: run /etc/alloy/config.alloy
    # Host logs are mounted read-only so Alloy can tail them.
    volumes: ["./alloy.alloy:/etc/alloy/config.alloy", "/var/log:/var/log:ro"]
    depends_on: ["loki"]
    restart: unless-stopped
volumes:
  loki-data:        # Loki chunks and index
  grafana-data:     # Grafana dashboards, users, settings

alloy.alloy says what to ship and where (tail these files / scrape these containers → push to http://loki:3100); loki-config.yml sets retention and, if you want it, the object-storage backend. Add the Loki datasource in Grafana pointed at http://loki:3100, then rewrite each panel’s Logs Insights query in LogQL; the panels themselves stay as they are.
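For orientation, here is a minimal sketch of what loki-config.yml could look like for a single-box, single-tenant setup on Loki 3.0. The 30-day retention, the schema start date, and the directory names under /loki are my assumptions, not values from this post; check them against the Loki configuration reference for the version you pin.

```yaml
auth_enabled: false              # single tenant, no X-Scope-OrgID header needed

server:
  http_listen_port: 3100         # matches http://loki:3100 used by Alloy and Grafana

common:
  instance_addr: 127.0.0.1
  path_prefix: /loki             # the loki-data volume from the Compose file
  replication_factor: 1          # one box, one copy
  ring:
    kvstore:
      store: inmemory
  storage:
    filesystem:                  # chunks live on the box's own disk
      chunks_directory: /loki/chunks
      rules_directory: /loki/rules
    # Object-storage alternative (placeholder values; use instead of `filesystem`,
    # and set object_store / delete_request_store below to s3):
    # s3:
    #   endpoint: s3.example.com
    #   bucketnames: my-loki-chunks
    #   access_key_id: REPLACE_ME
    #   secret_access_key: REPLACE_ME
    #   s3forcepathstyle: true

schema_config:
  configs:
    - from: 2024-01-01           # assumed start date for this schema period
      store: tsdb
      object_store: filesystem
      schema: v13
      index:
        prefix: index_
        period: 24h

limits_config:
  retention_period: 720h         # assumed: keep 30 days of logs

compactor:
  working_directory: /loki/compactor
  retention_enabled: true        # without this, retention_period is not enforced
  delete_request_store: filesystem
```

Retention is enforced by the compactor, so `retention_period` on its own does nothing until `retention_enabled` is switched on.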

The migration: don’t rip out CloudWatch — split it

Low-risk because you run both side by side until you trust the new one:

  1. Stand up the stack on the box.
  2. Ship logs in parallel — point Alloy at Loki alongside CloudWatch for a week, so you can compare and trust it.
  3. Repoint the dashboards — switch your Grafana operational panels to the Loki datasource.
  4. Cut over app/host logs; keep a thin slice of CloudWatch for AWS-native alarms and metrics you can’t get elsewhere.
  5. Cut CloudWatch retention on the moved log groups (retention is itself billed) and watch the next invoice drop.
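Step 3 presumes Grafana already has a Loki datasource. One way to get it without clicking through the UI is a provisioning file, sketched here. It assumes you mount the file into the Grafana container under /etc/grafana/provisioning/datasources/ (the Compose file above does not do that yet), and the datasource name is just a label I picked.

```yaml
# Hypothetical file: ./grafana/provisioning/datasources/loki.yml
apiVersion: 1

datasources:
  - name: Loki                   # label shown in the panel datasource picker
    type: loki
    access: proxy                # Grafana's backend talks to Loki, not the browser
    url: http://loki:3100        # service name on the Compose network
    isDefault: false             # keep the existing default during the parallel run
```

Leaving `isDefault` off during the parallel week means existing CloudWatch panels keep working until you deliberately switch each one.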

The savings — the actual math

Take the example from the last post: a 30 GB log group re-queried by an always-on dashboard runs roughly $650/month in Logs Insights scan charges, and it grows every time you add a panel or speed up a refresh. On the box, that same querying is $0 at the margin; you only pay the rent.
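To see what a $650 scan bill implies, here is the back-of-envelope version. It assumes a Logs Insights price of $0.005 per GB scanned (the us-east-1 list price as I understand it; check your own region) and a 30-day month.

```latex
\[
\begin{aligned}
\text{GB scanned per month} &= \frac{\$650}{\$0.005/\text{GB}} = 130{,}000\ \text{GB} \\[4pt]
\text{full scans of the 30 GB group} &= \frac{130{,}000\ \text{GB}}{30\ \text{GB}} \approx 4{,}333\ \text{per month} \approx 144\ \text{per day}
\end{aligned}
\]
```

That works out to one full scan roughly every ten minutes. Six panels that each scan the whole group once an hour would get you there (6 × 24 = 144), which is not an exotic dashboard.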

| | CloudWatch (this pattern) | Grafana + Loki on a box |
|---|---|---|
| Storing the logs | ~$1/mo | included |
| Querying them (always-on dashboard) | ~$650+/mo, and rising | $0 at the margin |
| The box | n/a | ~$15–50/mo flat |
| 5-year cost of the query bill | ~$39,000+ | ~$900–3,000 (box) |

The honest footnote: that box is not free to run. Someone configures the stack, sets retention, and keeps it patched. Price that at engineer rates and a self-run log stack is realistically $200–500/month all-in. That still comes in under a ~$650-and-climbing scan bill; the point is to compare against the honest number, not a fantasy “$0.” And back Loki’s chunks to object storage so the box itself is disposable: if it dies, you redeploy the Compose file and re-attach the data.
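Putting the five-year figures side by side, using only the numbers above and 60 months. The CloudWatch line holds the scan bill flat at $650, which is generous to CloudWatch given that it tends to rise.

```latex
\[
\begin{aligned}
\text{CloudWatch scan charges} &: \$650 \times 60 = \$39{,}000 \\
\text{Box, rent only} &: (\$15 \text{ to } \$50) \times 60 = \$900 \text{ to } \$3{,}000 \\
\text{Box, all-in with engineer time} &: (\$200 \text{ to } \$500) \times 60 = \$12{,}000 \text{ to } \$30{,}000
\end{aligned}
\]
```

So against the honest all-in number, the five-year gap is roughly $9,000 to $27,000 rather than the $36,000+ that the rent-only comparison suggests.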

When to keep some CloudWatch

Don’t rip it all out. Keep a thin slice for AWS-native alarms (auto-scaling triggers, managed-service internals you can’t get elsewhere) and, if your volume is genuinely small, the free tier. And if nobody will own the stack, a managed Grafana Cloud tier may beat a self-host you’ll let rot — honest monitoring you don’t run is worse than a bill you understand.

The alternatives, briefly


That’s the whole thing: one box, one Compose file, and querying that no longer bills you by the gigabyte. If you’d like the “what am I paying to scan?” line broken out of your own bill — and a target stack sized to your team and volume — send me a recent cloud bill and I’ll send back a one-page teardown within a business day. For the short version, see the CloudWatch Logs Insights vs Loki comparison.
