Privacy-Preserving AI for Financial Services: AML Without Identity Exposure
AML transaction monitoring requires AI to analyze behavioural patterns. It does not require the AI to know customer names, account numbers, or addresses. Here's how split-knowledge architecture enables compliant AML without identity exposure.
What AML Transaction Monitoring Actually Needs
An AML model evaluates transaction patterns: amounts, frequencies, merchant categories, geographic distribution, velocity changes, structuring indicators. It flags anomalies — a sudden spike in cross-border transfers, transactions just below reporting thresholds, circular fund flows.
None of this requires the model to know that the account belongs to Marc Schuelke at Hauptstrasse 12, Fuerth. The model needs a consistent identifier to track behavioural patterns over time. It does not need that identifier to be a name, IBAN, or customer number.
This is the gap that split-knowledge architecture fills. The AI processes transaction patterns tagged to opaque tokens. It scores risk on behaviour, not identity. When the model flags suspicious activity, a human investigator triggers a controlled re-linkage to identify the customer for SAR filing.
How Transactions Enter Sandbox B
When a financial institution deploys dual-sandbox architecture for AML monitoring, the transaction flow works as follows:
- A customer initiates a transaction. The core banking system holds the full record: account number, customer name, counterparty details, amount, timestamp.
- The transaction reaches the Gateway (dsa-edge). The Gateway calls Sandbox A (dsa-identity) to verify the customer's authentication status.
- The Gateway calls the ID Bridge (dsa-bridge) to obtain a purpose-scoped token for AML monitoring. The token tok_m2p8q1r7(purpose:fraud) has no mathematical relationship to the customer's identity or to tokens issued for other purposes like product recommendations.
- The Gateway constructs a pseudonymised transaction record: token, amount, merchant category, geographic region (generalised to country level, not street address), timestamp, and transaction type. Account numbers, IBANs, and counterparty names are stripped.
- The pseudonymised record flows to Sandbox B (dsa-ai), where the AML model ingests it alongside the token's historical transaction patterns stored in the feature store (Redis).
- Sandbox B scores the transaction against its behavioural model and returns a risk assessment tagged to the token.
At no point does Sandbox B see customer names, account numbers, addresses, or any direct identifier. The model operates on behavioural features and opaque tokens.
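The record construction step can be sketched as an allow-list copy. This is a minimal illustration, not the Gateway's actual code: the class names, field names, and the pseudonymise function are all hypothetical, and the token is assumed to have already been obtained from the ID Bridge.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class CoreTransaction:
    """Full record as held by the core banking system (hypothetical fields)."""
    account_number: str
    customer_name: str
    counterparty_name: str
    counterparty_iban: str
    amount_eur: float
    merchant_category: str
    country_code: str       # assumed ISO 3166-1 alpha-2 code
    street_address: str
    timestamp: datetime
    transaction_type: str


@dataclass(frozen=True)
class PseudonymisedTransaction:
    """Record forwarded to Sandbox B: token plus behavioural fields only."""
    token: str
    amount_eur: float
    merchant_category: str
    region: str             # generalised to country level
    timestamp: datetime
    transaction_type: str


def pseudonymise(tx: CoreTransaction, token: str) -> PseudonymisedTransaction:
    """Copy only allow-listed fields; direct identifiers are never forwarded."""
    return PseudonymisedTransaction(
        token=token,
        amount_eur=tx.amount_eur,
        merchant_category=tx.merchant_category,
        region=tx.country_code,
        timestamp=tx.timestamp,
        transaction_type=tx.transaction_type,
    )
```

Building the outgoing record from an explicit allow-list, rather than deleting fields from the full record, means a field added to the core banking schema later is not forwarded by accident.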
Risk Scoring on Tokens, Not People
The AML model in Sandbox B maintains behavioural profiles keyed to tokens. For tok_m2p8q1r7, the feature store holds:
- Transaction velocity: 14 transactions in the past 7 days (baseline: 8)
- Amount distribution: 73% of transactions between EUR 9,000 and EUR 9,900 (just below the EUR 10,000 reporting threshold)
- Geographic spread: Transactions in 6 countries in 30 days (baseline: 2)
- Merchant category concentration: 88% money transfer services
- Temporal pattern: 91% of transactions occur between 02:00 and 04:00 local time
The model scores this pattern as high risk (0.91). It does not know whether tok_m2p8q1r7 is a student, a business owner, or a politically exposed person. It evaluates the behaviour, not the person.
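As a rough sketch of how such a profile could be derived from a token's pseudonymised history, the function below computes the five features listed above. It does not reproduce the model or its 0.91 score. The record type, the feature names, and the exact band boundaries (EUR 9,000 to 9,900, 02:00 to 04:00) are assumptions taken from the example figures.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, List

NEAR_THRESHOLD_BAND_EUR = (9_000, 9_900)  # assumed "just below EUR 10,000" band
NIGHT_HOURS = (2, 4)                      # assumed 02:00-04:00 local-time window


@dataclass(frozen=True)
class TokenTransaction:
    """Pseudonymised transaction as seen by Sandbox B (hypothetical shape)."""
    token: str
    amount_eur: float
    merchant_category: str
    region: str
    local_time: datetime


def behavioural_features(txs: List[TokenTransaction], now: datetime) -> Dict[str, float]:
    """Summarise one token's history into behavioural features."""
    if not txs:
        raise ValueError("at least one transaction is required")
    total = len(txs)
    low, high = NEAR_THRESHOLD_BAND_EUR
    last_7d = [t for t in txs if now - t.local_time <= timedelta(days=7)]
    near = [t for t in txs if low <= t.amount_eur <= high]
    night = [t for t in txs if NIGHT_HOURS[0] <= t.local_time.hour < NIGHT_HOURS[1]]
    top_category_count = Counter(t.merchant_category for t in txs).most_common(1)[0][1]
    return {
        "velocity_7d": float(len(last_7d)),
        "near_threshold_share": len(near) / total,
        "distinct_regions": float(len({t.region for t in txs})),
        "top_category_share": top_category_count / total,
        "night_share": len(night) / total,
    }
```

Every input here is either a behavioural field or the opaque token, so the same computation would run unchanged whoever the token belongs to.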
This separation has a regulatory advantage. GDPR Art. 22(1) restricts decisions based solely on automated processing that produce "legal effects" or "similarly significantly affect" individuals. An AML risk score attached to an opaque token is not, by itself, a decision about an identified person. The decision about the person (filing a SAR, freezing an account) happens only after re-linkage, with human involvement.
Re-Linkage for SAR Filing: Four-Eyes Approval
When Sandbox B flags tok_m2p8q1r7 with a risk score above the investigation threshold, the re-linkage protocol activates:
Step 1 — Investigation request: A compliance analyst reviews the pseudonymised behavioural data in Sandbox B's investigation dashboard. They see the transaction patterns, the risk score breakdown, and the model's feature attribution. They determine that a SAR investigation is warranted.
Step 2 — Re-linkage request: The analyst submits a re-linkage request to the ID Bridge, specifying:
- Their analyst credentials
- The token: tok_m2p8q1r7
- The legal basis: AML investigation, case reference AML-2026-0847
- The applicable regulation: AMLD6 Art. 33 (obligation to report suspicious transactions)
Step 3 — Four-eyes approval: The Bridge's policy engine verifies:
- The analyst holds the aml_investigator role
- Case AML-2026-0847 exists in the compliance case management system
- The token is flagged with a risk score above the investigation threshold
- A second authorised person (MLRO or deputy) has approved the re-linkage (four-eyes principle required by many national FIU guidelines)
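The four checks can be sketched as a single policy function that returns every failed check. This is an illustrative sketch only: the request shape, the approver role names mlro and mlro_deputy, and the threshold parameter are assumptions, and the case list and risk scores are passed in as plain collections rather than looked up in real systems.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Optional, Set

APPROVER_ROLES = frozenset({"mlro", "mlro_deputy"})  # hypothetical role names


@dataclass(frozen=True)
class RelinkageRequest:
    analyst_id: str
    analyst_roles: FrozenSet[str]
    token: str
    case_reference: str
    approver_id: Optional[str] = None           # second authorised person, if any
    approver_roles: FrozenSet[str] = frozenset()


def evaluate_relinkage(
    request: RelinkageRequest,
    known_cases: Set[str],
    risk_scores: Dict[str, float],
    investigation_threshold: float,
) -> List[str]:
    """Return the failed checks; an empty list means the request may proceed."""
    failures: List[str] = []
    if "aml_investigator" not in request.analyst_roles:
        failures.append("analyst lacks the aml_investigator role")
    if request.case_reference not in known_cases:
        failures.append("case not found in case management system")
    if risk_scores.get(request.token, 0.0) <= investigation_threshold:
        failures.append("token risk score not above investigation threshold")
    if request.approver_id is None or not (APPROVER_ROLES & request.approver_roles):
        failures.append("no approval from an authorised second person")
    elif request.approver_id == request.analyst_id:
        failures.append("approver must differ from requester (four-eyes)")
    return failures
```

Returning all failures instead of stopping at the first gives the compliance team a complete reason for a rejected request, and the same list can be written to the audit log.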
Step 4 — Time-limited identity disclosure: The Bridge returns the customer identity with a 30-minute expiry window. The analyst can now complete the SAR with full customer details as required by the Financial Intelligence Unit.
Step 5 — Immutable audit trail: The re-linkage event is logged with: who requested, who approved, which token, which case, timestamp, and expiry. This audit trail satisfies both GDPR accountability (Art. 5(2)) and AMLD6 record-keeping requirements (Art. 40).
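One way to make such a log tamper-evident is to chain entries by hash, so that altering any past entry breaks verification of everything after it. The sketch below is an assumption about how the trail could be built, not a description of the Bridge's logging. The field names and the all-zero genesis hash are hypothetical, and the 30-minute window comes from Step 4.

```python
import hashlib
import json
from datetime import datetime, timedelta, timezone
from typing import Dict, List

DISCLOSURE_TTL = timedelta(minutes=30)  # expiry window from Step 4
GENESIS_HASH = "0" * 64                 # assumed anchor for the first entry


def _digest(body: Dict[str, str]) -> str:
    """SHA-256 over a canonical (sorted-key) JSON encoding of the entry body."""
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode("utf-8")).hexdigest()


def make_audit_entry(prev_hash: str, requested_by: str, approved_by: str,
                     token: str, case_reference: str,
                     granted_at: datetime) -> Dict[str, str]:
    """Build one log entry that commits to the previous entry's hash."""
    body = {
        "requested_by": requested_by,
        "approved_by": approved_by,
        "token": token,
        "case_reference": case_reference,
        "granted_at": granted_at.isoformat(),
        "expires_at": (granted_at + DISCLOSURE_TTL).isoformat(),
        "prev_hash": prev_hash,
    }
    return {**body, "entry_hash": _digest(body)}


def verify_chain(entries: List[Dict[str, str]]) -> bool:
    """Recompute every hash and check each entry links to its predecessor."""
    prev = GENESIS_HASH
    for entry in entries:
        body = {k: v for k, v in entry.items() if k != "entry_hash"}
        if entry["prev_hash"] != prev or entry["entry_hash"] != _digest(body):
            return False
        prev = entry["entry_hash"]
    return True


def disclosure_is_active(entry: Dict[str, str], now: datetime) -> bool:
    """True while the time-limited identity disclosure has not yet expired."""
    return now < datetime.fromisoformat(entry["expires_at"])
```

Hash chaining makes tampering detectable rather than impossible. In practice the latest hash would also need to be anchored somewhere the log writer cannot modify, such as write-once storage.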
Regulatory Alignment: Three Frameworks, One Architecture
AMLD6 (Anti-Money Laundering Directive 6)
AMLD6 Art. 33 requires obliged entities to report suspicious transactions to their national Financial Intelligence Unit. Art. 40 requires record-keeping of customer due diligence data and transaction records for five years. Art. 42 requires that personal data processed for AML purposes be retained only as long as necessary and subject to appropriate safeguards.
Split-knowledge architecture satisfies all three: the re-linkage protocol enables SAR filing (Art. 33), Sandbox A retains KYC data and Sandbox B retains pseudonymised transaction records for the required period (Art. 40), and the architectural separation is itself the "appropriate safeguard" for retained data (Art. 42).
DORA (Digital Operational Resilience Act)
DORA Art. 6 requires financial entities to maintain an ICT risk management framework. Art. 9 mandates protection and prevention measures. Art. 11 requires ICT response and recovery plans.
The dual-sandbox architecture maps directly to DORA requirements:
| DORA Requirement | Article | How Split-Knowledge Addresses It |
|---|---|---|
| ICT risk management framework | Art. 6 | Each sandbox has a defined risk boundary. Risk assessment is modular — identity risk in A, AI model risk in B, correlation risk in Bridge. |
| Network security management | Art. 9(2) | Kubernetes NetworkPolicies enforce sandbox isolation at the infrastructure layer. No configuration drift — policy-as-code via Helm charts. |
| Detection of anomalous activities | Art. 10 | Independent monitoring per sandbox. Cross-sandbox correlation anomalies (timing attacks, access pattern matching) detected at the Bridge. |
| ICT response and recovery | Art. 11 | Each sandbox can be recovered independently. Bridge token rotation is the primary containment mechanism for cross-sandbox incidents. |
| Threat-led penetration testing | Art. 26 | Each sandbox can be pen-tested independently. Testing Sandbox B does not require access to identity data. Reduced scope, focused tests. |
GDPR
For financial services AML processing, three GDPR provisions are critical:
- Art. 6(1)(c) — Legal obligation: AML monitoring is a legal obligation under AMLD6. This provides the lawful basis for processing transaction data. Split-knowledge architecture strengthens this basis: the AI processes only what is necessary (pseudonymised behavioural data), satisfying the proportionality principle.
- Art. 25 — Data protection by design: The architecture is the DPDD implementation. Pseudonymisation is not a policy choice — it is a structural property enforced by network isolation.
- Art. 35 — DPIA: A Data Protection Impact Assessment for the AML system benefits from the architecture's clear boundaries. The DPIA for Sandbox B covers pseudonymised data processing. The DPIA for re-linkage covers the controlled identity disclosure. Each assessment is narrower and more precise than a monolithic system DPIA.
Breach Benefit Analysis: Finance-Specific
Financial services face the most severe regulatory consequences for data breaches. Consider the difference:
Monolithic AML system breach: The attacker obtains customer names, account numbers, transaction histories, AML risk scores, SAR filing status, PEP flags, and beneficial ownership records — all linked. This is a catastrophic breach. GDPR Art. 34 notification to all affected individuals is almost certainly required. The institution faces reputational damage, regulatory sanction, potential AMLD6 violations (disclosure of SAR status is a criminal offence in most EU jurisdictions under tipping-off provisions), and customer lawsuits.
Sandbox B breach: The attacker obtains transaction patterns and AML risk scores tagged to opaque tokens. They see that tok_m2p8q1r7 has a risk score of 0.91 and a pattern consistent with structuring. They cannot determine who this person is. No customer names, no account numbers, no addresses. The breach risk assessment under GDPR Art. 33 is materially different — pseudonymised data with intact safeguards may not trigger individual notification under Art. 34. No tipping-off violation is possible because the SAR filing status is not linked to an identifiable person in the breached system.
Sandbox A breach: The attacker obtains KYC data — names, addresses, ID documents. Serious, but they get no transaction histories, no AML risk scores, no SAR status. Standard identity breach protocols apply. The attacker cannot determine which customers are under investigation.
Bridge breach: The worst case — the attacker can link tokens to identities. Immediate response: rotate all tokens via the Bridge, invalidating old mappings. The Bridge's HSM-protected encryption, network isolation (no internet-facing access), and multi-party key management (2-of-3 quorum for administrative access) make this the hardest target in the architecture.
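To make the rotation step concrete, here is a minimal in-memory sketch of a token mapping with quorum-gated rotation. It is a hypothetical stand-in, not the Bridge's implementation: the TokenVault class, its method names, and the custodian identifiers are invented for illustration, tokens are assumed to be random values, and HSM protection is out of scope.

```python
import secrets
from typing import Dict, Optional, Set, Tuple


class TokenVault:
    """Hypothetical in-memory stand-in for the Bridge's token mapping."""

    def __init__(self, custodians: Set[str], quorum: int = 2) -> None:
        self._custodians = set(custodians)
        self._quorum = quorum
        self._token_by_key: Dict[Tuple[str, str], str] = {}
        self._key_by_token: Dict[str, Tuple[str, str]] = {}

    def _new_token(self) -> str:
        # Random tokens carry no derivable link to the customer or the purpose.
        while True:
            token = "tok_" + secrets.token_hex(8)
            if token not in self._key_by_token:
                return token

    def issue(self, customer_id: str, purpose: str) -> str:
        """Return the stable token for this (customer, purpose) pair."""
        key = (customer_id, purpose)
        if key not in self._token_by_key:
            token = self._new_token()
            self._token_by_key[key] = token
            self._key_by_token[token] = key
        return self._token_by_key[key]

    def resolve(self, token: str) -> Optional[str]:
        """Re-link a token to a customer id, or None if the token is unknown."""
        key = self._key_by_token.get(token)
        return key[0] if key is not None else None

    def rotate_all(self, approvals: Set[str]) -> Dict[str, str]:
        """Replace every token once enough custodians approve; return old -> new."""
        if len(self._custodians & set(approvals)) < self._quorum:
            raise PermissionError("quorum not met for token rotation")
        migration: Dict[str, str] = {}
        for key, old_token in list(self._token_by_key.items()):
            new_token = self._new_token()
            self._key_by_token[new_token] = key
            self._token_by_key[key] = new_token
            migration[old_token] = new_token
        for old_token in migration:
            del self._key_by_token[old_token]  # old mappings no longer resolve
        return migration
```

The returned old-to-new map is an assumption about how downstream stores keyed on tokens, such as Sandbox B's feature store, would be re-keyed. It is as sensitive as the rotation itself and would need the same protection as the mapping.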
The net effect: split-knowledge architecture transforms a single catastrophic breach scenario into three contained scenarios, each with a smaller blast radius and a clearer response playbook.