Architecture Deep Dive
How The Veil Works
The Veil splits every AI workflow into two network-isolated sandboxes. Sandbox A holds identity data. Sandbox B runs AI inference. They cannot communicate directly — an opaque-token bridge is the only link. The AI model never knows who the data belongs to.
Last Updated: April 2026
The Core Invariant
No single component holds both identity and AI insights. This is not a policy — it is an infrastructure constraint.
- Sandbox A knows WHO — names, emails, account numbers, authentication credentials. It has zero visibility into AI model outputs or behavioural inferences.
- Sandbox B knows WHAT — risk scores, sentiment analysis, clinical predictions. It receives only opaque pseudonymous tokens. It cannot resolve those tokens to identities.
Kubernetes NetworkPolicies and Docker network segmentation enforce this boundary. A compromised application process in Sandbox B cannot open a TCP connection to Sandbox A. The network layer drops the packets.
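As an illustration of that network-layer enforcement, a default-deny NetworkPolicy for Sandbox B might look like the fragment below. All names, namespaces, and labels here are hypothetical, not taken from the actual deployment — the point is that no rule permits traffic to or from the identity namespace, so the CNI drops those packets regardless of what the application does:

```yaml
# Hypothetical policy: Sandbox B pods may only exchange traffic with the
# Gateway. With no rule allowing the identity namespace, a compromised
# process in Sandbox B cannot open a connection to Sandbox A.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: sandbox-b-isolation      # illustrative name
  namespace: sandbox-b           # illustrative namespace
spec:
  podSelector: {}                # applies to every pod in the namespace
  policyTypes: [Ingress, Egress]
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              role: gateway      # only the Gateway may call in
  egress:
    - to:
        - namespaceSelector:
            matchLabels:
              role: gateway      # results flow back out via the Gateway only
```

Because the policy is an allow list, forgetting a rule fails closed: traffic that is not explicitly permitted never leaves the pod network.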
Why Not Just Redact?
Redaction is a software promise. A single regex miss, an edge-case encoding, or an overlooked log statement leaks PII into the AI pipeline. The system is only as strong as every line of redaction code across every release.
The Veil replaces that software promise with an infrastructure guarantee:
| Approach | Failure mode | Blast radius |
|---|---|---|
| Redaction | Bug in regex / serializer | Full PII exposed to AI model |
| The Veil | Application-level bug | Network blocks the connection — AI still cannot reach identity store |
This distinction matters most when the AI is an external cloud provider, when multiple parties need access (auditors, consultants), or when regulation demands architectural enforcement — as GDPR Article 25 and EU AI Act Article 10 do.
The Data Flow
Every request follows the same eight-step path. Identity and inference never share a network hop.
1. User submits a request to the Gateway (the only public endpoint).
2. Gateway PII firewall strips identity fields from the payload before any downstream forwarding.
3. Sandbox A authenticates the user via OIDC, validates authorisation, and issues an opaque pseudonymous token through the ID Bridge.
4. Gateway forwards the request to Sandbox B carrying only the opaque token and the context payload — no names, no emails, no account numbers.
5. Sandbox B runs inference (LLM, scoring model, classifier). It processes context against the token. It cannot identify anyone.
6. Sandbox B returns its result tagged to the pseudonymous token.
7. Gateway maps the token back to the authenticated user session via the ID Bridge. This is the only point where token and identity coexist — in-memory, never persisted together.
8. User sees the final result. The AI model never learned who they are.
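The eight steps can be condensed into a small sketch. Every function and field name below is illustrative, not the actual service API; the intent is only to show where identity and inference touch (step 7) and where they never do (steps 4–6):

```python
# Minimal end-to-end sketch of the request path. Names are hypothetical.
import secrets

# ID Bridge state: token -> user. Lives on the identity side, never in Sandbox B.
_token_map: dict[str, str] = {}

def strip_pii(payload: dict) -> dict:
    """Step 2: the PII firewall drops identity fields before forwarding."""
    return {k: v for k, v in payload.items() if k not in {"name", "email", "account"}}

def issue_token(user_id: str) -> str:
    """Step 3: the ID Bridge mints an opaque token for the authenticated user."""
    token = secrets.token_hex(16)
    _token_map[token] = user_id
    return token

def run_inference(token: str, context: dict) -> dict:
    """Steps 5-6: Sandbox B scores pseudonymised context; it cannot identify anyone."""
    return {"token": token, "risk_score": 0.17}  # placeholder score

def handle_request(user_id: str, payload: dict) -> dict:
    context = strip_pii(payload)              # step 2
    token = issue_token(user_id)              # step 3
    result = run_inference(token, context)    # steps 4-6
    user = _token_map[result["token"]]        # step 7: mapped back via the Bridge
    return {"user": user, "risk_score": result["risk_score"]}  # step 8
```

Note that `run_inference` receives only the token and stripped context — the identity mapping exists solely on the Bridge side of the boundary.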
Components
Gateway (dsa-edge)
REST edge service and PII firewall. The only component that touches both the user session and the inference path — but it never forwards both identity and token to the same downstream service. In-memory session cache, wait-polling for async results, graceful shutdown.
Sandbox A — Identity Vault (dsa-identity)
Stores and protects identity data. PostgreSQL with row-level security, column-level encryption, OIDC provider for authentication, and DSAR/erasure endpoints for GDPR compliance. Has no network route to Sandbox B.
ID Bridge (dsa-bridge)
Generates opaque pseudonymous tokens (HMAC-SHA256 with random nonce) with no derivable relationship to real identities. Manages epoch-based token rotation (configurable, default 7 days) with 14-day grace periods ensuring zero downtime. Master keys integrate with HashiCorp Vault, AWS KMS, or Azure Key Vault.
Re-linkage — mapping a token back to a real identity — is the most sensitive operation in the platform. Every re-linkage request must pass through a seven-step governance chain:
1. Legal basis citation — the requester must specify the legal ground (e.g., GDPR Art. 6(1)(c), clinical emergency).
2. Case reference — a traceable case ID linking the request to a specific business need.
3. Requester role validation — only designated roles can initiate re-linkage.
4. Jurisdiction check — re-linkage can be restricted by legal jurisdiction, enforcing data sovereignty.
5. Four-eyes approval — a second independently authorised individual must approve.
6. Time-limited access — re-linkage grants access to specific attributes for a configurable window, not permanent access.
7. Mandatory post-access review — within a configurable period (default 7 days), the access event must be reviewed and signed off.
A break-glass mechanism exists for clinical or operational emergencies. Break-glass bypasses the four-eyes requirement but triggers: elevated audit logging, automatic DPO notification, and mandatory post-access review within 24 hours.
Sandbox B — AI Processing (dsa-ai)
Async inference engine with pluggable LLM adapters — Anthropic Claude, OpenAI, Mistral, or local models via Ollama. Redis job queue for async workloads. Receives only opaque tokens and context. Has no network route to Sandbox A or the identity database.
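One way such pluggable adapters could be shaped — sketched here under the assumption that the Claude, OpenAI, Mistral, and Ollama backends sit behind a single interface; the protocol and class names are invented for illustration:

```python
# Hypothetical adapter interface for Sandbox B's pluggable model backends.
from typing import Protocol

class InferenceAdapter(Protocol):
    def infer(self, token: str, context: str) -> str: ...

class EchoAdapter:
    """Local stand-in adapter. A real adapter would call its provider's API,
    forwarding only the pseudonymised context — never an identity."""
    def infer(self, token: str, context: str) -> str:
        return f"[{token[:8]}] processed {len(context)} chars"

def run_job(adapter: InferenceAdapter, token: str, context: str) -> dict:
    # Sandbox B tags every result with the opaque token, never a user ID,
    # so swapping adapters cannot change the privacy posture.
    return {"token": token, "output": adapter.infer(token, context)}
```

Because the privacy guarantee lives in the network boundary rather than the adapter, replacing one backend with another is a one-class change.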
Audit Service (dsa-audit)
Decision records are appended to a cryptographic hash chain. The application role has INSERT and SELECT only. Deletion is permitted only for GDPR Article 17 erasure requests and retention expiry, executed through audited security-definer functions that write a signed erasure event to a separate append-only log alongside every deletion.
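The tamper evidence of such a hash chain can be shown in a few lines. This is a minimal sketch of the general technique, not the audit service's actual record format — each record hashes its predecessor, so altering any entry breaks every later link:

```python
# Minimal append-only hash chain sketch; field names are illustrative.
import hashlib
import json

GENESIS = "0" * 64  # sentinel hash for the first record

def append_record(chain: list[dict], event: dict) -> list[dict]:
    """Append an event, chaining it to the previous record's hash."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS
    body = json.dumps(event, sort_keys=True)
    record_hash = hashlib.sha256((prev_hash + body).encode()).hexdigest()
    chain.append({"event": event, "prev": prev_hash, "hash": record_hash})
    return chain

def verify(chain: list[dict]) -> bool:
    """Recompute every link; any edited record invalidates all records after it."""
    prev = GENESIS
    for rec in chain:
        body = json.dumps(rec["event"], sort_keys=True)
        if rec["prev"] != prev or rec["hash"] != hashlib.sha256((prev + body).encode()).hexdigest():
            return False
        prev = rec["hash"]
    return True
```

This is also why erasure needs the signed side-log the text describes: deleting a record legitimately (Article 17) would otherwise be indistinguishable from tampering.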
Compliance Mapping
The Veil was designed against specific regulatory requirements, not retrofitted after the fact.
| Regulation | Requirement | The Veil Mechanism |
|---|---|---|
| GDPR Art. 25 | Data protection by design and by default | Pseudonymisation is the default processing mode. Identity isolation is enforced at the network layer. Data minimisation is structural — Sandbox B cannot access more data than needed. |
| GDPR Art. 32 | Security of processing | Column-level encryption in Sandbox A. Network-level isolation via Kubernetes NetworkPolicies (16 policies across 7 namespaces). Append-only audit hash chain. Token rotation with configurable policies. |
| EU AI Act Art. 10 | Data governance for high-risk AI systems | Training and inference data passes through the Gateway PII firewall. Sandbox B only receives pseudonymised context. Full data lineage via the audit hash chain. AI-provider agnostic — swap models without changing the privacy posture. |
| EU AI Act Art. 15 | Accuracy, robustness, and cybersecurity | Infrastructure-level isolation limits the attack surface for adversarial inputs. Pluggable LLM adapters allow model validation without architectural changes. Append-only logs support post-deployment monitoring and anomaly detection. |
Breach Scenario: What an Attacker Gets
Assume full compromise of a single sandbox. What does the attacker walk away with?
| Compromised Component | Attacker obtains | Attacker cannot obtain |
|---|---|---|
| Sandbox A | Names, emails, account numbers, authentication credentials | AI outputs, risk scores, behavioural inferences, model parameters — none of this exists in Sandbox A |
| Sandbox B | AI outputs, pseudonymous tokens, model responses | Real identities — tokens are opaque and non-reversible without the ID Bridge |
| ID Bridge | Token-to-identity mappings | AI outputs or inference results — the Bridge never receives them. An attacker still needs Sandbox B data to build a linked profile. |
No single compromised component produces a linked profile of "person X has risk score Y." An attacker must breach multiple isolated systems across separate network boundaries, each with independent credentials, to reconstruct that link.