Transport · Montréal

Canadian operator: 12 data silos → lakehouse, real-time KPIs.

Inconsistent KPIs, dashboards 48h late. Databricks lakehouse, medallion, dbt, self-service BI.

KPI

60%

analystes autonomes

Duration

9 mois

Team

6 engineers

Hub(s)

Montréal

DatabricksdbtTableauAirflowAzure

When every department gives you a different number for the same KPI, you're not making decisions: you're arbitrating between opinions.

The context

Canadian transport operator, 11,000 employees, Montreal hub. 12 historical data silos (operations, HR, finance, ticketing, maintenance, etc.), aging Oracle data warehouse, Excel dashboards sent by email.

The problem

12 data silos with no shared governance
Inconsistent KPIs across departments (up to 18% gap on the same indicator)
48h late dashboards, manual updates
No data catalog, duplicates and ambiguous definitions
Analysts stuck on SQL extracts, low business autonomy

The approach

Databricks data lakehouse with medallion architecture (bronze/silver/gold), Unity Catalog governance, versioned dbt transformations, self-service Tableau BI with semantic layer.

The pillars

Real-time ingestion via Auto Loader (Kafka + files)
dbt dimensional modeling, mandatory quality tests
Unity Catalog for governance, lineage, RBAC
Semantic layer exposed to Tableau (centralized business definitions)
Enablement program: 60% of analysts trained in 4 months

The stack

Databricks Lakehouse Platform on Azure
dbt Cloud for versioned transformations
Unity Catalog for governance and lineage
Tableau Cloud with semantic layer
Apache Airflow for ingestion orchestration

The results

Single source of truth: 100% of operational KPIs reconciled
Dashboard latency: 48h to real-time (sub-minute on 80% of KPIs)
Autonomous analysts: 60% in 4 months (vs 12 targeted)
Data costs: -22% despite 3x on processed volumes
Data quality issues: -76% in 9 months

« For the first time in 15 years, my operations and finance teams argue about action levers, no longer about numbers. That's the ROI of a data platform. »

— Head of Data · Canadian transport operator

What we learned

Unity Catalog is Databricks' real differentiator, not the Spark engine. dbt scales very well up to 800 models, beyond that you need to invest in modularization. Mistake: we shipped the gold layer before consolidating silver, expensive rollbacks. To redo: never open analyst access before 90% of dbt tests are green. Otherwise, you lose trust and don't get it back.

// Read next

A similar case at your place?

Talk to an architect