Edge computing: when offloading computation to the edge becomes profitable
Latency, bandwidth, offline: edge computing promises a lot. But it also adds real operational complexity. Where do you draw the line?
Edge computing has been coming up in every infra discussion for two years now. The promise: bring computation closer to the data to reduce latency, cut bandwidth, work offline. Reality on the ground: it's true, but it comes with an operational cost that many underestimate. Deploying code to a fleet of dispersed devices, monitoring them, patching them, managing network failures… it's a job in itself.
We've supported several edge projects over the past three years, from retail with real-time video analysis to isolated industrial sites. The pattern repeats: the technical architecture comes together quickly, and it's the ops side that hurts. This article lays out the real decision criteria and shares the pitfalls we've seen in prod.
No marketing bullshit: if your use case works with 200 ms latency and a stable connection, stay in the cloud. But if you're in one of the cases below, edge becomes a real option.
The four triggers that make edge profitable
Edge computing isn't justified by hype. We consider it when one of these four constraints becomes blocking in production.
1. Incompressible latency
When every millisecond counts. Typically: video surveillance with real-time intrusion detection, quality control on production lines, driver assistance. If your SLA requires a decision in under 50 ms, a cloud round-trip alone can exceed the budget. Local inference becomes mandatory.
Concrete example: a retail client analyzes in-store flows to detect suspicious behavior. The model runs on Jetson Nano (€99/device), inference at 30 FPS, decision in 15-20 ms. Impossible with a round-trip to AWS, even in a nearby region.
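A quick way to sanity-check this trigger is to add up the latency budget. The sketch below is illustrative only: the function name and the 60 ms round-trip figure are assumptions, not measurements from the project. Only the 50 ms SLA and the 15-20 ms local decision time come from the text above.

```python
def fits_latency_budget(sla_ms: float, inference_ms: float, network_rtt_ms: float = 0.0) -> bool:
    """Return True if inference plus any network round-trip fits within the SLA."""
    return inference_ms + network_rtt_ms <= sla_ms


# Local inference: 20 ms decision, no network hop.
local_ok = fits_latency_budget(sla_ms=50, inference_ms=20)

# Cloud inference: same 20 ms of model time plus an ASSUMED 60 ms round-trip.
cloud_ok = fits_latency_budget(sla_ms=50, inference_ms=20, network_rtt_ms=60)

print(local_ok, cloud_ok)  # True False
```

Measure the real round-trip from the site, at peak hours, before trusting any number you plug in here.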
2. Prohibitive bandwidth
Sending 4K video streams, LiDAR point clouds, or high-frequency IoT flows to the cloud is expensive—in euros and network latency. Edge allows you to filter, aggregate, and only push back anomalies or metadata.
We saw an industrial project go from 12 TB/day pushed to the cloud to 400 GB/day by deploying edge processing that only pushes qualified events. Direct savings on the AWS bill (~$1,000/month) and on network load.
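The filtering idea can be sketched in a few lines. Everything here is hypothetical (the `Event` type, the score field, the 0.8 threshold); it only illustrates "push qualified events, drop the rest". The ratio at the end assumes both volumes above are daily figures in decimal units.

```python
from dataclasses import dataclass


@dataclass
class Event:
    sensor_id: str
    score: float  # hypothetical anomaly score in [0, 1] from the local model


def qualified_events(events, threshold=0.8):
    """Keep only events whose anomaly score reaches the (assumed) threshold."""
    return [e for e in events if e.score >= threshold]


def reduction_ratio(before_gb: float, after_gb: float) -> float:
    """Fraction of traffic removed by edge filtering."""
    return 1 - after_gb / before_gb


events = [Event("cam-1", 0.10), Event("cam-1", 0.95), Event("cam-2", 0.40)]
to_push = qualified_events(events)
print([e.sensor_id for e in to_push])  # ['cam-1']

# 12 TB/day -> 400 GB/day, assuming 1 TB = 1,000 GB.
print(round(reduction_ratio(12_000, 400) * 100, 1))  # 96.7
```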
3. Mandatory offline availability
Isolated sites, oil platforms, autonomous vehicles, warehouses with spotty network coverage. If your service must keep running when the WAN link drops, you need local autonomy. Edge becomes your fallback by design.
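Local autonomy usually implies some form of store-and-forward: keep producing events while the link is down, then drain the buffer when it comes back. Here is a minimal sketch using SQLite from the Python standard library. The class name, table layout, and `send` callback are assumptions for illustration, not a description of any specific product.

```python
import json
import sqlite3


class OfflineQueue:
    """Durable FIFO buffer: events stay on disk until the uplink accepts them."""

    def __init__(self, path=":memory:"):  # use a real file path on a device
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS outbox ("
            "id INTEGER PRIMARY KEY AUTOINCREMENT, payload TEXT NOT NULL)"
        )
        self.db.commit()

    def enqueue(self, event: dict) -> None:
        self.db.execute("INSERT INTO outbox (payload) VALUES (?)", (json.dumps(event),))
        self.db.commit()

    def pending(self) -> int:
        return self.db.execute("SELECT COUNT(*) FROM outbox").fetchone()[0]

    def flush(self, send) -> int:
        """Send buffered events in order; stop at the first connection failure."""
        sent = 0
        rows = self.db.execute("SELECT id, payload FROM outbox ORDER BY id").fetchall()
        for row_id, payload in rows:
            try:
                send(json.loads(payload))
            except ConnectionError:
                break  # link is down: keep the remaining events for later
            self.db.execute("DELETE FROM outbox WHERE id = ?", (row_id,))
            self.db.commit()
            sent += 1
        return sent


def send_while_offline(event):
    raise ConnectionError("WAN link down")  # simulated outage


delivered = []
queue = OfflineQueue()
queue.enqueue({"type": "intrusion", "site": "A"})
queue.enqueue({"type": "intrusion", "site": "B"})

sent_while_down = queue.flush(send_while_offline)   # nothing leaves the device
pending_while_down = queue.pending()                # both events still buffered
sent_after_recovery = queue.flush(delivered.append)  # link is back: drain in order
```

An event is deleted only after `send` returns, so a crash mid-flush can produce a duplicate but not a loss. The receiving side should therefore tolerate duplicates.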
4. Data sovereignty
GDPR, sector regulations, contractual clauses: some data simply cannot leave the site. Video analysis in hospitals, HR data, confidential industrial processes. Edge computing allows processing on-site and only pushing back anonymized or aggregated data.
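As a rough illustration of "process on-site, push only aggregates", the sketch below reduces raw detections to per-zone counts and replaces identifiers with a keyed hash. Field names, zone labels, and the key are invented for the example. Note that a keyed hash is pseudonymization, not anonymization, so whether the output may leave the site remains a legal question, not a coding one.

```python
import hashlib
import hmac
from collections import Counter


def pseudonymize(identifier: str, site_key: bytes) -> str:
    """Replace an identifier with a keyed hash; the key is meant to stay on-site."""
    return hmac.new(site_key, identifier.encode(), hashlib.sha256).hexdigest()[:16]


def aggregate(detections):
    """Reduce raw detections to per-zone counts, dropping every identifier."""
    return dict(Counter(d["zone"] for d in detections))


# Hypothetical raw data that must not leave the site as-is.
detections = [
    {"badge": "E-1042", "zone": "ward-3"},
    {"badge": "E-2210", "zone": "ward-3"},
    {"badge": "E-1042", "zone": "lobby"},
]

summary = aggregate(detections)                 # safe-to-push aggregate
token = pseudonymize("E-1042", b"local-secret")  # stable per-site pseudonym
print(summary)  # {'ward-3': 2, 'lobby': 1}
```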
Edge vs cloud: complementary, not exclusive
Edge doesn't replace the cloud. Both work together in a well-designed hybrid architecture. The cloud centralizes model training, metric aggregation, long-term storage, and global supervision. Edge executes inference, filters data, makes critical decisions locally.
Classic pattern we deploy for our clients:
- Edge: real-time inference (TensorRT, ONNX Runtime), local decisions, critical data cache, keeps running offline.
- Cloud: model training and fine-tuning, cross-site aggregation, global dashboards, S3 storage, OTA update distribution.
- Bidirectional sync: edge pushes anomalies and metrics, cloud pushes new models and configs. MQTT, gRPC, or HTTP/2 depending on network constraints.
This decoupling maintains local resilience while keeping the cloud's power for everything that isn't time-critical.
Edge AI: inference closest to the sensor
Deploying ML models directly on the device is edge AI. Concretely: you train in the cloud (A100 GPUs, large datasets), then export a model optimized for embedded targets (INT8 quantization, pruning, distillation) and deploy it on a local accelerator.
Typical stack we use in prod:
- Hardware: NVIDIA Jetson (Nano, Xavier, Orin depending on budget), Google Coral Edge TPU, Intel NCS2, or even Raspberry Pi 4 for light workloads.
- Formats: ONNX (maximum interop), TensorRT (max perf on Jetson), TFLite (mobile/edge).
- Optimization: INT8 post-training quantization (weights about 4x smaller than FP32, and up to 4x faster on accelerators with native INT8 support), pruning if the model tolerates it.
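To make the INT8 idea concrete, here is a toy sketch of symmetric per-tensor quantization in plain Python. It is not what TensorRT or TFLite do internally (production toolchains add calibration on representative data and per-channel scales); the function names and sample weights are made up for the example.

```python
def quantize_int8(weights):
    """Symmetric per-tensor quantization: map floats to integers in [-127, 127]."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q, scale):
    """Recover approximate float values from the quantized integers."""
    return [v * scale for v in q]


weights = [0.50, -1.27, 0.03, 1.00]  # illustrative values only
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)

# Rounding error is bounded by half a quantization step.
max_error = max(abs(a - b) for a, b in zip(weights, restored))

fp32_bytes = 4 * len(weights)  # 4 bytes per FP32 weight
int8_bytes = 1 * len(weights)  # 1 byte per INT8 weight
print(q, fp32_bytes // int8_bytes)  # [50, -127, 3, 100] 4
```

The 4x memory saving is mechanical. The speedup is not: it depends on whether the target accelerator actually has INT8 execution paths.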
Example with numbers: a classic ResNet-50 does ~25 FPS on Jetson Nano. Quantized INT8 + TensorRT, we get to 90 FPS. That changes everything for multi-camera video analysis.
The trap: not all models quantize well. Test post-quantization accuracy on your real data. We've seen cases where accuracy dropped from 92% to 78% after INT8, making the model unusable. In that case, either keep FP16 or fall back to the cloud.
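That decision can be turned into an explicit gate in the export pipeline. A minimal sketch, assuming you already have accuracy numbers measured on your own validation data: the function name, the 2-point tolerance, and the FP16 figure are hypothetical, while 92% and 78% are the values from the case above.

```python
def choose_precision(baseline_acc, candidates, max_drop=0.02):
    """Pick the first candidate whose accuracy stays within max_drop of the baseline.

    candidates: list of (name, accuracy) pairs, ordered from fastest to slowest.
    Returns the chosen name, or "cloud" when no on-device variant is good enough.
    """
    for name, acc in candidates:
        if baseline_acc - acc <= max_drop:
            return name
    return "cloud"


# INT8 fell from 92% to 78%; the FP16 accuracy below is an ASSUMED value.
choice = choose_precision(0.92, [("int8", 0.78), ("fp16", 0.915)])
print(choice)  # fp16
```

Running this check in CI on every model export keeps a badly quantized model from reaching the fleet.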
The real challenge: distributed operations
Getting code to prod on 10 devices is doable with SSH and a bash script. Managing 500 heterogeneous devices spread across 50 sites is an engineering problem in itself. It's the friction point we systematically see underestimated in the design phase.
OTA (Over-The-Air) deployment
You must be able to deploy a new version of the model or runtime across the entire fleet without physical intervention. That requires:
- A centralized registry (Harbor, ECR, Artifactory) for artifacts (containers, models, configs).
- A local agent on each device that polls or receives updates (balenaOS, AWS IoT Greengrass, Azure IoT Edge, or custom with MQTT).
- A progressive rollout strategy: canary on 5% of the fleet, validate metrics, then full deployment. If it breaks, automatic rollback.
- A robust rollback mechanism. Devices must be able to revert to version N-1 without manual intervention.
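The progressive rollout above can be sketched as two small functions: one that deterministically assigns devices to the canary cohort, one that decides between promotion and rollback. Names, the hashing scheme, and the 1-point error tolerance are assumptions for illustration. Tools like AWS IoT Greengrass or balena have their own rollout mechanisms, which this does not reproduce.

```python
import hashlib


def in_canary(device_id: str, release: str, percent: float) -> bool:
    """Deterministically place a device in the canary cohort for a given release."""
    digest = hashlib.sha256(f"{release}:{device_id}".encode()).digest()
    bucket = int.from_bytes(digest[:4], "big") % 10_000  # 0..9999
    return bucket < percent * 100


def next_step(canary_error_rate: float, baseline_error_rate: float,
              tolerance: float = 0.01) -> str:
    """Continue the rollout, or revert the canary devices to version N-1."""
    if canary_error_rate > baseline_error_rate + tolerance:
        return "rollback"
    return "promote"


fleet = [f"device-{i:03d}" for i in range(500)]  # hypothetical fleet
canary = [d for d in fleet if in_canary(d, "model-v2", percent=5)]

print(len(canary))                      # roughly 5% of the fleet
print(next_step(0.08, 0.02))            # rollback
```

Hashing the release name together with the device ID means the same devices are not always the guinea pigs, while the assignment stays reproducible for a given release.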
We saw a project stall for 3 months because a defective OTA update bricked 40% of the fleet, and technicians had to be sent on-site to manually reflash. Cost: tens of thousands of euros in on-site interventions, plus lost service.
Per-device observability
You need to know in real time the state of each device: deployed version, CPU/RAM/disk, application errors, network connectivity. Without that, you're debugging blind.
Typical stack:
- System metrics: Telegraf or Prometheus node_exporter, shipped to VictoriaMetrics or a central Prometheus.
- Application logs: Fluent Bit locally, aggregation to Loki or CloudWatch. With intelligent sampling to avoid saturating the WAN link.
- Alerting: based on critical metrics (device offline > 5 min, inference rate < threshold, disk > 90%).
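The three alert conditions above fit in a single function. In practice they would live as rules in your monitoring system, not in application code; this sketch just makes the logic explicit. The `DeviceStatus` fields and the 10 FPS floor are assumptions, while the 5-minute and 90% thresholds come from the list.

```python
from dataclasses import dataclass


@dataclass
class DeviceStatus:
    device_id: str
    seconds_since_heartbeat: float
    inference_fps: float
    disk_used_percent: float


def evaluate_alerts(status, min_fps=10.0, max_offline_s=300, max_disk_percent=90.0):
    """Return the list of alert names that fire for one device snapshot."""
    alerts = []
    if status.seconds_since_heartbeat > max_offline_s:  # offline > 5 min
        alerts.append("device_offline")
    if status.inference_fps < min_fps:                  # ASSUMED threshold
        alerts.append("inference_rate_low")
    if status.disk_used_percent > max_disk_percent:     # disk > 90%
        alerts.append("disk_almost_full")
    return alerts


healthy = DeviceStatus("cam-01", seconds_since_heartbeat=12, inference_fps=28.0, disk_used_percent=41.0)
degraded = DeviceStatus("cam-07", seconds_since_heartbeat=420, inference_fps=4.0, disk_used_percent=93.5)

print(evaluate_alerts(healthy))   # []
print(evaluate_alerts(degraded))  # ['device_offline', 'inference_rate_low', 'disk_almost_full']
```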
The classic mistake: monitoring only the application and forgetting the system layer. Result: you discover the device is constantly swapping or storage is full only when everything crashes.
Distributed security
Each edge device is an attack surface. Especially if they're physically accessible (stores, warehouses, industrial sites). Minimum checklist:
- Encryption at rest: full disk encryption (LUKS on top of dm-crypt).
- Mutual authentication: TLS certificates for device ↔ cloud communication.
- Automatic patching: OS vulnerabilities come out regularly, you must be able to patch without intervention.
- Principle of least privilege: devices should only have access to strictly necessary cloud resources (IAM policies or equivalent).
- Signed artifacts: verify signatures of images/models before deployment to avoid malicious code injection.
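The "verify before deploy" step from the checklist can be sketched with the standard library. Important caveat: this example uses HMAC-SHA256, a symmetric stand-in chosen only because it needs no third-party package. With a shared key, any compromised device could forge signatures, so a real fleet would use asymmetric signatures where devices hold only the public key. The function names and key are hypothetical.

```python
import hashlib
import hmac


def sign_artifact(data: bytes, key: bytes) -> str:
    """Build-side: compute a keyed digest over the artifact bytes."""
    return hmac.new(key, data, hashlib.sha256).hexdigest()


def verify_artifact(data: bytes, signature: str, key: bytes) -> bool:
    """Device-side: refuse any artifact whose digest does not match."""
    expected = sign_artifact(data, key)
    return hmac.compare_digest(expected, signature)  # constant-time comparison


key = b"demo-key-do-not-use"            # placeholder; never hardcode real keys
model = b"\x00fake-model-weights\x01"   # stands in for a model or image file
signature = sign_artifact(model, key)

ok = verify_artifact(model, signature, key)
tampered_ok = verify_artifact(model + b"!", signature, key)
print(ok, tampered_ok)  # True False
```

The deployment agent should run this check before swapping versions, and keep N-1 in place if it fails.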
We audited a project where devices all used the same SSH key hardcoded in the image. One compromised device = entire fleet compromised. Don't do that.
When NOT to do edge
Because it's not always the right answer. A few signals that should keep you in the cloud:
- Tolerable latency: if 200-500 ms round-trip is fine, the cloud is simpler.
- Reliable connectivity: if your sites have fiber or stable 4G/5G, the offline argument no longer holds.
- Low volumes: a few devices (< 20) don't justify setting up a complete OTA infrastructure.
- Limited ops team: managing distributed edge requires DevOps/SRE skills. If you don't have the bandwidth, the cloud stays safer.
- Constrained budget: initial cost (hardware + OTA infra dev) is non-negligible. If ROI isn't clear, validate a cloud PoC first.
Edge adds complexity. Only pay for it if a measurable gain—in latency, network costs, availability—justifies it. Otherwise, you're accumulating debt for nothing.
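The four triggers can be written down as a small checklist function, which is handy in an architecture review. This is a sketch of the reasoning in this article, not a scoring model: the field names are invented, and the 200 ms default mirrors the "stay in the cloud" threshold used earlier.

```python
from dataclasses import dataclass


@dataclass
class UseCase:
    required_latency_ms: float
    bandwidth_prohibitive: bool
    must_run_offline: bool
    data_must_stay_on_site: bool


def edge_triggers(uc: UseCase, cloud_round_trip_ms: float = 200.0):
    """Return the triggers that apply; an empty list means 'stay in the cloud'."""
    triggers = []
    if uc.required_latency_ms < cloud_round_trip_ms:
        triggers.append("latency")
    if uc.bandwidth_prohibitive:
        triggers.append("bandwidth")
    if uc.must_run_offline:
        triggers.append("offline")
    if uc.data_must_stay_on_site:
        triggers.append("sovereignty")
    return triggers


retail_video = UseCase(50, bandwidth_prohibitive=True, must_run_offline=False, data_must_stay_on_site=False)
back_office = UseCase(500, bandwidth_prohibitive=False, must_run_offline=False, data_must_stay_on_site=False)

print(edge_triggers(retail_video))  # ['latency', 'bandwidth']
print(edge_triggers(back_office))   # []
```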
Checklist before launching
If after all this you're still convinced edge is the right option, here are the milestones to set before scaling:
- PoC on 3-5 devices: validate the technical stack (hardware, runtime, model) and measure real performance.
- Automate OTA deployment: don't scale without it. Test rollout + rollback on the PoC.
- Set up observability: metrics, logs, alerting. Validate that you see what's happening in prod.
- Simulate a network outage: cut the cloud link and verify the device keeps working, then resyncs correctly.
- Test security: basic pen test, verify encryption, certificates, IAM policies.
- Document runbooks: what do we do if a device is offline? If a version causes problems? If storage is full?
Once these six points are validated on the PoC, you can scale progressively (10, 50, 200 devices). But not before.
What we take away from the field
Edge computing is a real technical answer when latency, bandwidth, or offline availability become constraining. But it's also a real operational challenge that shouldn't be underestimated. Projects that succeed are those that invest from the start in OTA tooling, observability, and security—not those that tell themselves they'll optimize "later".
If you're evaluating an edge project and want to challenge the architecture or ops roadmap with people who've done it in prod, [we're here for that](https://abbeal.com/contact). We prefer asking the right questions upfront rather than debugging a bricked fleet six months after go-live.
