Abbeal

Urban mobility · Paris + Montréal

Mobility scale-up: −30% cloud bill, same SLOs.

AWS bill doubled in 18 months without matching traffic growth. GreenOps audit, refactor, Karpenter, ARM64. Measured outcome.

KPI

−30%

cloud bill

Duration

9 months

Team

4 engineers

Hub(s)

Paris + Montréal

Go · Kubernetes · Karpenter · Prometheus · OpenTelemetry

An AWS bill that doubles in 18 months while traffic fails to keep pace: that's rarely growth, it's architecture debt being paid in cash.

The context

Urban mobility scale-up, 180 people, Paris and Montreal hubs, 4 million active users. Microservices platform in production for four years, run by a client-side platform team of 12 engineers. The CFO sounded the alarm: the annual cloud bill had passed 2.1M USD, against only 22% traffic growth over the same period.

The problem

  • AWS spend ×2 in 18 months for +22% traffic
  • No per-team budgets, no FinOps practice in place
  • Systematic over-provisioning of EKS nodes (14% average CPU)
  • MTTR at 47 minutes, noisy alerting, no distributed tracing
  • Engineers unable to attribute a cost to a service
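
To make that gap concrete, here is a back-of-the-envelope sketch in Python. It uses only the two figures above (spend ×2, traffic +22%) and assumes both cover the same 18 months; the function name is ours and purely illustrative.

```python
def unit_cost_multiplier(spend_multiplier: float, traffic_growth: float) -> float:
    """Return how much the cost per unit of traffic changed over the period."""
    return spend_multiplier / (1.0 + traffic_growth)


# Figures from the problem statement: spend x2, traffic +22%.
multiplier = unit_cost_multiplier(spend_multiplier=2.0, traffic_growth=0.22)
increase_pct = round((multiplier - 1.0) * 100)
print(f"Cost per unit of traffic: x{multiplier:.2f} (+{increase_pct}%)")
```

Under that assumption, each unit of traffic cost roughly 64% more than it did 18 months earlier, which is why the bill pointed to architecture rather than growth.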

The approach

Six weeks of GreenOps audit, then seven months of incremental remediation. No big bang, no re-platforming. We started by measuring, then cut obvious fat, then rethought what needed rethinking.

The four workstreams

  • Full observability: Prometheus, Grafana, Tempo, cost attribution per namespace via Kubecost
  • Migration from Cluster Autoscaler to Karpenter: tight bin-packing, spot first, aggressive consolidation
  • ARM64 Graviton on 60% of stateless workloads after benchmarks
  • Smart scheduling: nightly batch jobs on interruptible spot capacity, taints and tolerations reworked from scratch
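
As an illustration of what per-namespace cost attribution means in the first workstream, here is a deliberately simplified sketch: a node's cost is split across namespaces in proportion to their CPU requests. This is an assumption made for the example, not Kubecost's actual model, which also weighs other resources and idle capacity. The namespace names and figures are hypothetical, not client data.

```python
def attribute_node_cost(node_cost_usd: float, cpu_requests: dict[str, float]) -> dict[str, float]:
    """Split one node's cost across namespaces, proportionally to CPU requests."""
    total_cpu = sum(cpu_requests.values())
    if total_cpu == 0:
        # Nothing scheduled: the whole node is idle cost, attributed to nobody.
        return {}
    return {
        namespace: node_cost_usd * cpu / total_cpu
        for namespace, cpu in cpu_requests.items()
    }


# Hypothetical example: a 100 USD/month node shared by three namespaces.
shares = attribute_node_cost(100.0, {"booking": 2.0, "pricing": 1.0, "batch": 1.0})
for namespace, cost in shares.items():
    print(f"{namespace}: {cost:.2f} USD")
```

Even a rough split like this gives each team a number to own, which is the point of the exercise.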

The stack

  • Go 1.22, Kubernetes 1.29 on EKS
  • Karpenter 0.34, Graviton2/3 (c7g, m7g)
  • Prometheus, Grafana, Tempo, OpenTelemetry SDK
  • Kubecost for attribution, Terraform for IaC

The results

  1. Cloud bill: -30% at constant SLOs over 9 months (-630k USD/year)
  2. MTTR: 47 min to 11 min (more than 4× faster)
  3. Average cluster CPU: 14% to 51%
  4. Traffic absorbed: +35% with no added capacity
  5. Estimated carbon footprint: -38% (client Scope 3 report)
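
These headline figures can be cross-checked with a few lines of Python. The sketch takes 2.1M USD as the baseline annual bill (the context says the bill "exceeded" that amount, so it is a lower bound); variable names are ours.

```python
# Baseline and results as stated in this case study.
annual_bill_usd = 2_100_000      # lower bound: the bill "exceeded 2.1M USD"
reduction_pct = 30
mttr_before_min, mttr_after_min = 47, 11
cpu_before_pct, cpu_after_pct = 14, 51

# Integer arithmetic keeps the savings figure exact.
savings_usd = annual_bill_usd * reduction_pct // 100
mttr_factor = mttr_before_min / mttr_after_min
cpu_factor = cpu_after_pct / cpu_before_pct

print(f"Annual savings: {savings_usd} USD")
print(f"MTTR improvement: x{mttr_factor:.1f}")
print(f"CPU utilization: x{cpu_factor:.1f}")
```

The savings come out at 630,000 USD, matching the figure above, with MTTR improved by a factor of about 4.3 and average CPU utilization multiplied by about 3.6.
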

"Abbeal taught us to look at our bill as an engineering signal, not as fate. We recovered budget to reinvest in the product."
CTO · Urban mobility scale-up

What we learned

Karpenter is a game changer but demands rigor on pod disruption budgets, because aggressive consolidation means pods get evicted and rescheduled far more often. ARM64 works on 60% of workloads, not 100%: some third-party C++ binaries held out against us for two months. The real sustainable lever is FinOps embedded in the team: we trained two internal champions so the savings hold after we leave.
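
To show why pod disruption budgets matter here, the sketch below reproduces the basic arithmetic Kubernetes applies to a budget expressed as minAvailable: voluntary evictions are allowed only while the number of healthy pods stays above that floor. The function and the replica counts are illustrative assumptions, not the client's settings.

```python
def allowed_disruptions(healthy_pods: int, min_available: int) -> int:
    """Number of pods that may be voluntarily evicted right now under a minAvailable budget."""
    return max(0, healthy_pods - min_available)


# Hypothetical service with 3 healthy replicas.
strict = allowed_disruptions(healthy_pods=3, min_available=3)
relaxed = allowed_disruptions(healthy_pods=3, min_available=2)
print(f"minAvailable=3: {strict} eviction(s) allowed")
print(f"minAvailable=2: {relaxed} eviction(s) allowed")
```

A budget that allows zero evictions keeps its node from being drained, so consolidation stalls on that node; a budget that is too loose lets consolidation take out more replicas at once than the service can afford.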

Facing a similar situation?

Talk to an architect