AI agents in production: 7 patterns French banks taught us.

Field notes from BNP, SocGen and a top-3 French insurer. What changes when your agent runs against COBOL cores, ACPR audit logs and a 4 ms p99 budget.

Most AI-agents-in-production posts you read on Hacker News are written from a US tech-company perspective: greenfield stack, modern data lake, Stripe-like engineering culture. The reality on European banking floors is different. The cores are COBOL. The audit trail is mandated by ACPR. The p99 budget is 4 milliseconds because the trading desk does not negotiate. After two years shipping AI agents inside BNP Paribas, Société Générale and a top-3 French insurer, here are seven patterns that work — and that the usual demos miss.

1. The mainframe is not your enemy

Every consulting deck starts with "first we modernise the legacy." The bank says no — that mainframe processes 800 million transactions per day with five-9s availability and you will not touch it. The pattern that ships: treat the mainframe as a graph node. Wrap COBOL programs in MQ Series adapters, expose them as deterministic tools to the agent. The agent never writes to the core; it reads, reasons, drafts, and a human approves. We deployed this pattern at SocGen for a credit pre-screening assistant. Time-to-decision dropped from 11 minutes to 90 seconds. Mainframe untouched. Auditors happy.
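
To make the pattern concrete, here is a minimal sketch of the tool wrapper, assuming IBM MQ via the pymqi client. The queue names, payload shape and the score_applicant tool are illustrative, not SocGen's actual setup; the invariant is that the agent only ever calls a deterministic, read-only function.

```python
import json
import pymqi  # IBM MQ client for Python

QMGR, CHANNEL, CONN = "QM1", "APP.SVRCONN", "mainframe.internal(1414)"

def call_cobol_program(request_queue: str, reply_queue: str, payload: dict) -> dict:
    """Request-reply over MQ: put on the request queue, wait for the COBOL reply."""
    qmgr = pymqi.connect(QMGR, CHANNEL, CONN)
    try:
        md = pymqi.MD()
        md.ReplyToQ = reply_queue.encode()
        pymqi.Queue(qmgr, request_queue).put(json.dumps(payload).encode(), md)
        gmo = pymqi.GMO(Options=pymqi.CMQC.MQGMO_WAIT, WaitInterval=5000)  # 5 s
        reply = pymqi.Queue(qmgr, reply_queue).get(None, pymqi.MD(), gmo)
        return json.loads(reply)
    finally:
        qmgr.disconnect()

# What the agent sees: a deterministic tool. It reads and drafts; it never
# writes to the core, and a human approves whatever it produces.
def score_applicant(applicant_id: str) -> dict:
    return call_cobol_program("SCORE.REQUEST", "SCORE.REPLY", {"id": applicant_id})
```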

2. ACPR audit trail is not a feature, it is the spine

French banks answer to ACPR (Autorité de contrôle prudentiel et de résolution). Every model decision must be reproducible, every prompt versioned, every output explainable to a human inspector five years from now. Your eval suite cannot live in a Notion page. Build the audit trail first: each agent call writes a structured event with prompt hash, model version, retrieved documents, output, human override status. We use an append-only Postgres table partitioned by month, replicated to S3 Glacier. ACPR walks in, you hand them a SQL endpoint and an explanation document. We have done this twice in production and it turned six-month inspections into two-day reviews.
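
A minimal sketch of that spine, assuming Postgres range partitioning; the table, column names and the monthly partition are illustrative.

```python
import hashlib
import json
import uuid
from datetime import datetime, timezone

DDL = """
CREATE TABLE agent_audit (
    event_id       uuid PRIMARY KEY,
    occurred_at    timestamptz NOT NULL,
    prompt_hash    text NOT NULL,     -- sha256 of the exact prompt sent
    model_version  text NOT NULL,     -- e.g. provider/model@revision
    retrieved_docs jsonb NOT NULL,    -- ids and hashes of the RAG context
    output         text NOT NULL,
    human_override text NOT NULL      -- 'approved' | 'overridden' | 'auto'
) PARTITION BY RANGE (occurred_at);

CREATE TABLE agent_audit_2025_06 PARTITION OF agent_audit
    FOR VALUES FROM ('2025-06-01') TO ('2025-07-01');

-- The application role gets INSERT only, no UPDATE or DELETE, so the
-- trail is append-only by construction.
"""

def audit_event(prompt: str, model_version: str, doc_ids: list,
                output: str, override_status: str) -> dict:
    """One structured event per agent call, written before the response ships."""
    return {
        "event_id": str(uuid.uuid4()),
        "occurred_at": datetime.now(timezone.utc).isoformat(),
        "prompt_hash": hashlib.sha256(prompt.encode()).hexdigest(),
        "model_version": model_version,
        "retrieved_docs": json.dumps(doc_ids),
        "output": output,
        "human_override": override_status,
    }
```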

3. Fallback chains assume your primary model gets banned in Europe tomorrow

The pattern is well known but the European banking risk profile makes it non-negotiable. Anthropic could lose its EU data residency story overnight; OpenAI rate-limits at 40k tokens per minute on a Friday morning when your trading floor needs answers; Mistral is locally compliant but lighter on tool use. We wire three providers in cascade with a shared interface — Claude Sonnet first, GPT-4o second, Mistral Large third — plus circuit breaker, retry budget and cost ceiling per provider. When Anthropic stuttered for 90 minutes in February, BNP's compliance assistant kept answering at 92% of normal quality, on Mistral. The team only learned about the incident from our weekly ops review.
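
In outline, the cascade looks something like the sketch below. The provider callables stand in for the real SDK clients and the thresholds are illustrative; the point is that availability is a local decision per provider, so a breaker tripping on one backend silently promotes the next.

```python
import time

class Provider:
    """One LLM backend behind the shared interface, with its own breaker and budget."""
    def __init__(self, name, call, cost_ceiling_eur, retry_budget=2,
                 failure_threshold=5, cooldown_s=60):
        self.name, self.call = name, call
        self.cost_ceiling, self.spent = cost_ceiling_eur, 0.0
        self.retry_budget = retry_budget
        self.failure_threshold, self.failures = failure_threshold, 0
        self.cooldown_s, self.opened_at = cooldown_s, None

    def available(self) -> bool:
        if self.spent >= self.cost_ceiling:
            return False   # ceiling hit: skip this provider, don't error
        if self.opened_at and time.time() - self.opened_at < self.cooldown_s:
            return False   # circuit open, still cooling down
        return True

def complete(prompt: str, providers: list) -> str:
    # Cascade order from the text: Claude Sonnet, then GPT-4o, then Mistral Large.
    for p in providers:
        if not p.available():
            continue
        for _ in range(p.retry_budget):
            try:
                text, cost_eur = p.call(prompt)  # shared interface: (text, cost)
                p.spent += cost_eur
                p.failures = 0
                return text
            except Exception:
                p.failures += 1
                if p.failures >= p.failure_threshold:
                    p.opened_at = time.time()    # trip the breaker
                    break                        # fall through to the next provider
    raise RuntimeError("all providers over budget, open, or failing")
```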

4. RAG over Confluence is theatre. RAG over Filenet is the work

Banks have 30 years of regulatory documents in IBM Filenet, EMC Documentum, on-prem SharePoint clusters with Active Directory ACLs that nobody fully understands. The first job is not vector search — it is a permission-aware indexer that respects French banking secrecy (secret bancaire), GDPR, and internal compartmentalisation. We use a two-tier pipeline: extract and chunk in a sandboxed container that never sees the internet, then index into a tenant-scoped vector store with row-level security. Stripe and Anthropic have written excellent posts on RAG architecture; the European banking tax on top is the access-control layer, and it is heavier than the ML.
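
A sketch of the second tier, with the embedder and vector store as stand-ins. The design choice worth copying is that the ACL filter is part of the similarity query itself, so documents the caller cannot read never enter the candidate set, let alone the prompt.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Chunk:
    doc_id: str
    text: str
    allowed_groups: frozenset  # resolved from the AD ACL at index time

class PermissionAwareIndex:
    def __init__(self, embed, store):
        # `store` is any vector store that enforces row-level filters
        # server-side; `embed` maps text to a vector.
        self.embed, self.store = embed, store

    def index(self, doc_id: str, chunks: list, allowed_groups: set):
        for text in chunks:
            self.store.insert(
                vector=self.embed(text),
                payload=Chunk(doc_id, text, frozenset(allowed_groups)),
            )

    def search(self, query: str, user_groups: set, k: int = 8):
        # The group filter travels with the query: no post-hoc pruning,
        # no chance of a forbidden chunk leaking into the context window.
        return self.store.search(
            vector=self.embed(query),
            filter={"allowed_groups": {"any_of": sorted(user_groups)}},
            limit=k,
        )
```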

5. Put the human in the loop, on the latency budget, on purpose

The American pattern: agent acts, log everything, audit later. The French banking pattern: agent drafts, human signs off, agent acts. The cost is one extra second of latency. The benefit is a regulator who lets you ship. At one insurer we instrumented the human approval step itself: how often does the human override the agent? On which document types? What is the time-to-approval distribution? After three months we knew which agent decisions to auto-approve (84% of contract clauses with confidence > 0.95) and which to keep behind a human gate forever (anything touching life insurance underwriting). The ratio shifted by use case, not by some global threshold.
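
A sketch of the gate and its instrumentation; the 0.95 threshold and the two document types mirror the numbers above, everything else is illustrative.

```python
from collections import defaultdict

class ApprovalGate:
    """Routes agent drafts to auto-approval or a human, and measures the humans."""
    AUTO_APPROVE = {"contract_clause": 0.95}        # per use case, never global
    ALWAYS_HUMAN = {"life_insurance_underwriting"}  # permanent human gate

    def __init__(self):
        self.stats = defaultdict(lambda: {"reviewed": 0, "overridden": 0})

    def route(self, doc_type: str, confidence: float) -> str:
        if doc_type in self.ALWAYS_HUMAN:
            return "human"
        threshold = self.AUTO_APPROVE.get(doc_type)
        if threshold is not None and confidence > threshold:
            return "auto"
        return "human"

    def record_review(self, doc_type: str, human_overrode: bool):
        # Months of this data are what justify moving a doc_type from
        # "human" to "auto", one use case at a time.
        s = self.stats[doc_type]
        s["reviewed"] += 1
        s["overridden"] += int(human_overrode)
```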

6. Cost attribution is a sales tool, not a finance tool

If you cannot answer "how much did this agent cost the retail division last quarter" by Tuesday morning, the CFO will kill the project before Q4. We log every LLM call with tenant_id, business_unit, feature_id, prompt_hash, input_tokens, output_tokens and computed USD cost. The calls are aggregated nightly into a chargeback table that the controller pulls into the financial planning workbook. The conversation shifts from "AI is too expensive" to "retail spent €18k last month for an agent that closed €420k of additional contracts". That conversation is how a project survives the next budget cycle.
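
In sketch form, with illustrative prices and table names: the per-call record carries every attribution key named above, and a nightly job rolls it up for the controller.

```python
# Example prices per 1k tokens (input, output); illustrative, not a rate card.
PRICE_PER_1K_USD = {"claude-sonnet": (0.003, 0.015)}

def cost_record(tenant_id, business_unit, feature_id, prompt_hash,
                model, input_tokens, output_tokens) -> dict:
    p_in, p_out = PRICE_PER_1K_USD[model]
    return {
        "tenant_id": tenant_id,
        "business_unit": business_unit,
        "feature_id": feature_id,
        "prompt_hash": prompt_hash,
        "input_tokens": input_tokens,
        "output_tokens": output_tokens,
        "cost_usd": input_tokens / 1000 * p_in + output_tokens / 1000 * p_out,
    }

# Nightly rollup into the chargeback table the controller reads.
NIGHTLY_ROLLUP = """
INSERT INTO chargeback (day, business_unit, feature_id, cost_usd)
SELECT date_trunc('day', created_at), business_unit, feature_id, sum(cost_usd)
FROM llm_calls
WHERE created_at >= current_date - 1 AND created_at < current_date
GROUP BY 1, 2, 3;
"""
```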

7. Vercel-style preview environments save your roadmap

Banks ship slowly because every release goes through change advisory boards that meet on Tuesdays. Borrow Vercel's preview-deployments pattern: every PR spawns a fully isolated agent stack with synthetic data and shadow traffic. Stakeholders click a link, test on a phone, leave comments. By the time you reach the CAB you have video evidence of business users approving. We cut SocGen's release cycle from six weeks to ten days using this pattern alone, with no compliance compromise. The CAB still meets on Tuesdays. They just have less to argue about.
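
One possible shape, assuming a Kubernetes cluster and a Helm chart for the agent stack; the chart path, values and preview URL scheme are assumptions, not a prescription.

```python
import subprocess

def deploy_preview(pr_number: int, image_tag: str) -> str:
    """Spin up an isolated agent stack for one PR; called from CI."""
    ns = f"agent-preview-{pr_number}"
    subprocess.run(
        ["helm", "upgrade", "--install", f"agent-{pr_number}", "charts/agent",
         "--namespace", ns, "--create-namespace",
         "--set", f"image.tag={image_tag}",
         "--set", "data.source=synthetic",   # never production data
         "--set", "traffic.mode=shadow"],    # mirror real flows, serve nothing
        check=True,
    )
    # The link stakeholders click from the PR.
    return f"https://agent-{pr_number}.preview.internal"
```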

These seven patterns are what survives contact with French banking reality. None of them are revolutionary; all of them are absent from the standard San-Francisco-flavoured AI agent playbook. If you treat the European context as a design constraint rather than as friction, you ship.
