CloudWatch costs too much: the self-hosted stack that cuts it ~90%

TL;DR. CloudWatch is priced so that watching your servers can cost as much as running them. Log ingestion is about $0.50/GB, every custom metric is about $0.30/month, and ad-hoc log queries (Logs Insights) are billed per GB scanned. At a few terabytes of logs a month, that is thousands of dollars just to see what your system is doing. The open-source replacement — Prometheus + Grafana + Loki (or the leaner VictoriaMetrics / VictoriaLogs) on one flat-rate box — does the same job for roughly 90% less, and you stop being billed by the gigabyte for your own telemetry (your logs and metrics). This is usually the highest-percentage saving in a whole cloud-exit. Below: what CloudWatch actually costs at scale, the stack that replaces it, and the honest cases where you should keep some CloudWatch anyway.

Why CloudWatch surprises people

Monitoring feels like it should be cheap — it is “just logs and numbers.” CloudWatch’s pricing model turns that intuition upside down, because it meters the three things that grow fastest in a busy system:

Log ingestion — ~$0.50/GB. Every gigabyte of logs your apps emit is billed on the way in. A chatty service at debug level can ingest terabytes a month without anyone noticing until the invoice.
Custom metrics — ~$0.30 per metric per month. Innocent until you discover cardinality: CloudWatch bills each unique combination of metric name and dimension values as a separate metric, so one metric tagged per-customer or per-endpoint becomes thousands of billable metrics. This is the classic CloudWatch bill-shock.
Logs Insights — ~$0.005 per GB scanned. Every investigation re-scans your logs and bills you for it. Debugging an incident at 2 a.m. is also a metered query.
Dashboards (~$3 each/mo), alarms, and per-request API charges come on top.

None of these are abusive on their own. Together, on a system at real scale, they produce a monitoring bill that routinely lands in the thousands per month — to observe infrastructure that may itself cost less than that.
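To see how the meters stack up, here is a rough Python estimator using the approximate prices above. It is a sketch under simplifying assumptions: it ignores the free tier, volume discounts on metrics, log storage, alarms and API charges, and the usage figures in the example are made up.

```python
from dataclasses import dataclass

# Approximate list prices quoted in this article (June 2026); verify before use.
LOG_INGEST_PER_GB = 0.50
CUSTOM_METRIC_PER_MONTH = 0.30
INSIGHTS_PER_GB_SCANNED = 0.005
DASHBOARD_PER_MONTH = 3.00


@dataclass
class MonthlyUsage:
    """Hypothetical monthly usage figures for one AWS account."""
    logs_ingested_gb: float
    custom_metrics: int
    insights_scanned_gb: float
    dashboards: int


def cloudwatch_estimate(u: MonthlyUsage) -> dict[str, float]:
    """Rough monthly CloudWatch cost per meter, plus a total.

    Ignores the free tier, tiered metric pricing, log storage,
    alarms and API request charges.
    """
    parts = {
        "log_ingestion": u.logs_ingested_gb * LOG_INGEST_PER_GB,
        "custom_metrics": u.custom_metrics * CUSTOM_METRIC_PER_MONTH,
        "logs_insights": u.insights_scanned_gb * INSIGHTS_PER_GB_SCANNED,
        "dashboards": u.dashboards * DASHBOARD_PER_MONTH,
    }
    # Right-hand side is evaluated before the "total" key exists.
    parts["total"] = sum(parts.values())
    return parts


# Hypothetical mid-size account: 2 TB of logs, 3,000 custom metrics,
# 4 TB scanned by Logs Insights, 10 dashboards.
estimate = cloudwatch_estimate(MonthlyUsage(2_000, 3_000, 4_000, 10))
for meter, usd in estimate.items():
    print(f"{meter:>15}: ${usd:,.0f}")
```

That hypothetical account lands at roughly $1,950 a month, and custom metrics are nearly half of it.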

What CloudWatch costs at scale

Approximate monthly cost of log ingestion alone (before metrics, queries, dashboards or alarms), compared with a self-hosted stack on one flat-rate box. 1 TB is taken as 1,000 GB. Prices approximate, June 2026; verify before quoting.

Logs ingested / month	CloudWatch ingest	Self-hosted (box + ops)
100 GB	~$50	rent of one box (~$50–90)
500 GB	~$250	rent only
1 TB	~$500	rent only
5 TB	~$2,500	rent + ops time
10 TB	~$5,000	rent + ops time
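The table also implies a break-even point: the monthly log volume at which ingestion alone costs as much as renting the box. A minimal sketch, using the ~$0.50/GB figure and the $50–90 rent range above, and treating 1 TB as 1,000 GB as the table does:

```python
LOG_INGEST_PER_GB = 0.50  # approximate CloudWatch ingest price, USD per GB


def ingest_cost(gb_per_month: float, price_per_gb: float = LOG_INGEST_PER_GB) -> float:
    """Monthly CloudWatch log-ingestion cost, ignoring the free tier."""
    return gb_per_month * price_per_gb


def break_even_gb(box_rent: float, price_per_gb: float = LOG_INGEST_PER_GB) -> float:
    """Log volume (GB/month) at which ingestion alone equals the box rent."""
    return box_rent / price_per_gb


for rent in (50, 90):
    print(f"${rent}/month box: break-even at {break_even_gb(rent):,.0f} GB/month")

for tb in (1, 5, 10):
    print(f"{tb} TB/month ingested: ${ingest_cost(tb * 1_000):,.0f}")
```

Past roughly 100–180 GB a month, ingestion alone out-costs the box, before a single custom metric or query is billed.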

Now add the rest of the bill; custom metrics in particular are often the biggest line:

Add-on	CloudWatch	Self-hosted
1,000 custom metrics	~$300/mo	$0 (Prometheus scrapes for free)
10,000 custom metrics (cardinality)	~$3,000/mo	$0
Ad-hoc log queries (2 TB scanned/mo)	~$10/mo (grows with every re-scan)	$0 (Loki/VictoriaLogs query for free)
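Cardinality is where the second table's big numbers come from. CloudWatch treats every unique combination of metric name and dimension values as its own metric, so the worst case is the product of the distinct values in each dimension. A sketch, with made-up dimension names and counts:

```python
from math import prod

CUSTOM_METRIC_PER_MONTH = 0.30  # approximate price, first pricing tier only


def billable_metrics(dimension_values: dict[str, int], metric_names: int = 1) -> int:
    """Upper bound on billable CloudWatch metrics.

    Each unique (metric name, dimension values) combination is a separate
    metric, so the worst case is the product of distinct values per dimension.
    Only combinations that are actually published get billed.
    """
    return metric_names * prod(dimension_values.values())


# Hypothetical: one latency metric tagged by customer and endpoint.
count = billable_metrics({"customer": 200, "endpoint": 50})
print(count, f"${count * CUSTOM_METRIC_PER_MONTH:,.0f}/mo")
```

200 customers × 50 endpoints on a single metric is 10,000 billable metrics, the ~$3,000/month row. The same labels cost nothing per series in Prometheus, although very high cardinality still costs memory there.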

A mid-size system easily reaches $1,000–4,000/month on CloudWatch. The same observability on one Hetzner/OVH box running Prometheus + Grafana + Loki costs the rent of the box — call it $50–90 plus the ops time to run it — a ~90% cut, often more as metric cardinality grows.

The honest footnote. That box is not free. Someone configures the stack, sets retention, and keeps it patched; priced at engineer rates, a self-run observability stack is realistically $200–500/month all-in. It still beats four-figure CloudWatch bills decisively; the point is to compare against the honest number, not a fantasy “$0.”
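The honest comparison is easy to run yourself. A minimal sketch, plugging in the article's own ranges ($1,000–4,000/month on CloudWatch, $200–500/month all-in self-hosted); swap in your own numbers:

```python
def saving_pct(cloudwatch_monthly: float, self_hosted_all_in: float) -> float:
    """Percentage of the CloudWatch bill saved by self-hosting, ops time included."""
    if cloudwatch_monthly <= 0:
        raise ValueError("cloudwatch_monthly must be positive")
    return 100.0 * (1 - self_hosted_all_in / cloudwatch_monthly)


# Corners of the ranges quoted in this section.
for cw in (1_000, 4_000):
    for sh in (200, 500):
        print(f"CloudWatch ${cw:,} vs self-hosted ${sh}: {saving_pct(cw, sh):.1f}% saved")
```

The corners run from 50% (a $1,000 bill against a $500 all-in stack) to 95% (a $4,000 bill against $200). The bigger the CloudWatch bill, the closer you get to the headline number.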

The stack that replaces it

Tool	Replaces	Notes
Prometheus	CloudWatch custom metrics	Pull-based metrics, the de-facto standard. Scrapes exporters; no per-metric charge.
Grafana	CloudWatch dashboards	Dashboards + alerting UI over all of the below. Far better visualisations than CloudWatch.
Loki	CloudWatch Logs	Log aggregation that indexes labels, not full text: cheap to store, and fast when a query narrows by label first.
Alloy / Vector / Fluent Bit	the CloudWatch agent	Ship logs and metrics from your hosts to Loki/Prometheus.
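VictoriaMetrics / VictoriaLogs	Prometheus + Loki (leaner)	Far more resource-efficient at scale, so fewer/cheaper boxes for the same volume. VictoriaMetrics is largely Prometheus-compatible (remote-write ingestion, PromQL-style queries); VictoriaLogs accepts the Loki push protocol but has its own query language, LogsQL.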
Netdata	per-second host metrics	Zero-config, per-second granularity; great for single-host and edge.
Uptime Kuma	CloudWatch Synthetics / status	Self-hosted uptime checks + a public status page.
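Why “no per-metric charge” holds: Prometheus pulls metrics by scraping an HTTP endpoint (conventionally /metrics) that returns plain text in its exposition format, and each label combination is just another line of that text. A stdlib-only sketch of rendering a counter in that format; the metric name and labels are hypothetical, and in a real service you would use the official prometheus_client library rather than hand-rolling this:

```python
def _escape(value: str) -> str:
    """Escape a label value as the text exposition format requires."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render_counter(name: str, help_text: str,
                   samples: list[tuple[dict[str, str], float]]) -> str:
    """Render one counter and its labelled samples in Prometheus text format."""
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} counter"]
    for labels, value in samples:
        # Sort labels so the output is stable between scrapes.
        pairs = ",".join(f'{k}="{_escape(v)}"' for k, v in sorted(labels.items()))
        lines.append(f"{name}{{{pairs}}} {value}" if pairs else f"{name} {value}")
    return "\n".join(lines) + "\n"


print(render_counter(
    "http_requests_total",
    "Total HTTP requests handled.",
    [({"path": "/", "code": "200"}, 1027), ({"path": "/login", "code": "500"}, 3)],
))
```

Adding a customer label adds lines to this response, not lines to an invoice.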

A common, boring, effective layout: Alloy ships logs+metrics → VictoriaMetrics + VictoriaLogs store them → Grafana displays and alerts. One box for most teams; two for redundancy.
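What “Alloy ships logs” means on the wire: shippers push batches of lines to Loki's HTTP push endpoint (/loki/api/v1/push) as JSON, grouped into streams identified by their labels, with timestamps as strings of Unix epoch nanoseconds. VictoriaLogs also accepts this Loki protocol on its own endpoint path (check its docs). A stdlib sketch for illustration only; the labels and URL are hypothetical, and in production you would let Alloy, Vector or Fluent Bit do this with batching and retries:

```python
import json
import time
import urllib.request
from typing import Optional


def build_push_payload(labels: dict[str, str], lines: list[str],
                       ts_ns: Optional[int] = None) -> dict:
    """Build a Loki push body: one stream (identified by its labels) holding the lines."""
    start = time.time_ns() if ts_ns is None else ts_ns
    # Offset each line by 1 ns so lines in one batch keep their order.
    values = [[str(start + i), line] for i, line in enumerate(lines)]
    return {"streams": [{"stream": labels, "values": values}]}


def push(url: str, payload: dict, timeout: float = 5.0) -> int:
    """POST a payload to a Loki-compatible push endpoint and return the HTTP status."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status


# Example, assuming a local Loki on its default port 3100 (not run here):
# push("http://localhost:3100/loki/api/v1/push",
#      build_push_payload({"app": "api", "env": "prod"}, ["GET /health 200"]))
```

Keep labels to a few low-cardinality values (app, env, host) and leave request or customer IDs in the line itself: every unique label set is a separate stream, and Loki's index stays small only if the number of streams does.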

When to keep some CloudWatch (don’t rip it all out)

I tell clients where CloudWatch genuinely earns its place:

AWS-native control loops. Alarms that drive auto-scaling, AWS infra metrics you can’t easily get any other way (some are CloudWatch-only), and Lambda/managed-service internals. Keep a thin slice of CloudWatch for those even after you ship application logs elsewhere, or scrape them into Prometheus with YACE (Yet Another CloudWatch Exporter).
Small volume. The free tier (5 GB logs, 10 custom metrics, 3 dashboards) covers a small app. Don’t stand up a monitoring cluster to save $40/month.
No ops appetite. If nobody will own the stack, the free or cheap tier of a managed service such as Grafana Cloud or Datadog may beat a self-host you’ll let rot. Monitoring nobody maintains is worse than a bill you understand.
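One caveat on the YACE route: the exporter reads metrics through the CloudWatch API, and AWS meters those reads per metric requested (about $0.01 per 1,000 metrics for GetMetricData at the time of writing; treat that price as an assumption and check the current pricing page). A rough sketch of what a polling interval costs, assuming one read per metric per poll:

```python
# Assumed price: USD per 1,000 metrics requested via GetMetricData. Verify before use.
PRICE_PER_1000_METRICS = 0.01
SECONDS_PER_MONTH = 30 * 24 * 3600  # a 30-day month


def polling_cost(metrics: int, interval_s: int) -> float:
    """Approximate monthly API cost of reading `metrics` metrics every `interval_s` seconds."""
    polls_per_month = SECONDS_PER_MONTH / interval_s
    return metrics * polls_per_month * PRICE_PER_1000_METRICS / 1000


for interval in (60, 300):
    print(f"500 metrics every {interval}s: ${polling_cost(500, interval):,.2f}/month")
```

Polling only the metrics you actually alert on, at an interval no shorter than the metric's own resolution, keeps this line small.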

The win is concentrated where the meter runs hardest: high log volume and high-cardinality custom metrics. That’s what to move first.

How the migration actually goes

Stand up the stack on one box (Prometheus/VictoriaMetrics + Grafana + Loki/VictoriaLogs).
Ship telemetry in parallel. Point Alloy/Vector/Fluent Bit at the new stack alongside CloudWatch, and run both for a week so you can compare the two and learn to trust the new one.
Rebuild the dashboards and alerts in Grafana (often clearer than the originals).
Cut over app logs and custom metrics; keep the thin CloudWatch slice for AWS-native alarms/metrics.
Turn down CloudWatch retention on the moved log groups (stored logs are billed every month they are kept) and watch the next invoice drop.
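The retention step is one API call per log group. A minimal sketch using boto3's CloudWatch Logs put_retention_policy; the log group names are hypothetical, the function defaults to a dry run, and the retention value must be one of the fixed day counts the API accepts (7 is one of them):

```python
from typing import Iterable


def shorten_retention(client, log_groups: Iterable[str], days: int = 7,
                      dry_run: bool = True) -> list[str]:
    """Set a short retention on log groups that now ship to the self-hosted stack.

    `client` is expected to be a boto3 CloudWatch Logs client, e.g. boto3.client("logs").
    With dry_run=True nothing is changed; the function only returns the groups it would touch.
    """
    touched = []
    for name in log_groups:
        if not dry_run:
            client.put_retention_policy(logGroupName=name, retentionInDays=days)
        touched.append(name)
    return touched


# Example (hypothetical log group names; needs AWS credentials, not run here):
# import boto3
# shorten_retention(boto3.client("logs"), ["/app/api", "/app/worker"], days=7, dry_run=False)
```

Run it as a dry run first and check the list before letting it change anything.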

The reason this is low-risk: you run both side by side until the self-hosted view is the one you instinctively open during an incident. Then the CloudWatch bill stops being the price of knowing what your system is doing.

A four-figure CloudWatch bill is exactly the kind of line I pull apart in a Cloud-Exit Assessment — with the real numbers and a target stack sized to your team. Or send me a recent cloud bill and I’ll break out the monitoring spend and estimate your saving in 24 hours, free. Read by me, never shared.