Abbeal

Transport · Montréal

Canadian operator: 12 data silos → lakehouse, real-time KPIs.

Inconsistent KPIs, dashboards 48h late. Databricks lakehouse, medallion, dbt, self-service BI.

KPI

60%

analystes autonomes

Duration

9 mois

Team

6 engineers

Hub(s)

Montréal

DatabricksdbtTableauAirflowAzure

When every department gives you a different number for the same KPI, you're not making decisions: you're arbitrating between opinions.

The context

Canadian transport operator, 11,000 employees, Montreal hub. 12 historical data silos (operations, HR, finance, ticketing, maintenance, etc.), aging Oracle data warehouse, Excel dashboards sent by email.

The problem

  • 12 data silos with no shared governance
  • Inconsistent KPIs across departments (up to 18% gap on the same indicator)
  • 48h late dashboards, manual updates
  • No data catalog, duplicates and ambiguous definitions
  • Analysts stuck on SQL extracts, low business autonomy

The approach

Databricks data lakehouse with medallion architecture (bronze/silver/gold), Unity Catalog governance, versioned dbt transformations, self-service Tableau BI with semantic layer.

The pillars

  • Real-time ingestion via Auto Loader (Kafka + files)
  • dbt dimensional modeling, mandatory quality tests
  • Unity Catalog for governance, lineage, RBAC
  • Semantic layer exposed to Tableau (centralized business definitions)
  • Enablement program: 60% of analysts trained in 4 months

The stack

  • Databricks Lakehouse Platform on Azure
  • dbt Cloud for versioned transformations
  • Unity Catalog for governance and lineage
  • Tableau Cloud with semantic layer
  • Apache Airflow for ingestion orchestration

The results

  1. Single source of truth: 100% of operational KPIs reconciled
  2. Dashboard latency: 48h to real-time (sub-minute on 80% of KPIs)
  3. Autonomous analysts: 60% in 4 months (vs 12 targeted)
  4. Data costs: -22% despite 3x on processed volumes
  5. Data quality issues: -76% in 9 months
« For the first time in 15 years, my operations and finance teams argue about action levers, no longer about numbers. That's the ROI of a data platform. »
Head of Data · Canadian transport operator

What we learned

Unity Catalog is Databricks' real differentiator, not the Spark engine. dbt scales very well up to 800 models, beyond that you need to invest in modularization. Mistake: we shipped the gold layer before consolidating silver, expensive rollbacks. To redo: never open analyst access before 90% of dbt tests are green. Otherwise, you lose trust and don't get it back.

A similar case at your place?

Talk to an architect