GDPR Article 25 in Practice: How We Built Privacy Into AI Infrastructure
Article 25 demands data protection by design — not as an afterthought. Here's how we mapped each requirement to specific infrastructure decisions: NetworkPolicies, column-level encryption, row-level security, and pseudonymous token bridges.
What Article 25 Actually Requires
GDPR Article 25(1) requires the controller to implement "appropriate technical and organisational measures, such as pseudonymisation, which are designed to implement data-protection principles [...] in an effective manner and to integrate the necessary safeguards into the processing." These measures must be determined "taking into account the state of the art, the cost of implementation and the nature, scope, context and purposes of processing."
Article 25(2) adds a default-minimisation rule: the controller must ensure that "by default, only personal data which are necessary for each specific purpose of the processing are processed." This applies to the amount of data collected, the extent of processing, the period of storage, and accessibility.
Recital 78 clarifies the intent: controllers should adopt "internal policies and implement measures which meet in particular the principles of data protection by design and data protection by default," including "minimising the processing of personal data, pseudonymising personal data as soon as possible, [and] transparency with regard to the functions and processing of personal data."
Most organisations treat these as policy requirements. We treated them as architecture requirements.
The Split-Knowledge Approach
The Veil enforces a structural invariant: Sandbox A knows WHO (identity, PII). Sandbox B knows WHAT (AI inferences, behavioural data). They cannot communicate directly. The only link between them is the ID Bridge, which issues opaque, non-reversible pseudonymous tokens.
This is not application-level redaction. A redaction bug leaks PII. Our separation is enforced at the Kubernetes network layer -- the AI processing environment has no network route to the identity vault. The enforcement mechanism is infrastructure, not code.
Mapping Article 25 to Infrastructure
| Article 25 Requirement | The Veil Implementation | Specific Technology |
|---|---|---|
| Pseudonymisation "as soon as possible" (Recital 78) | Gateway strips identity before forwarding to AI sandbox; only opaque tokens reach Sandbox B | ContextValidator in Gateway with regex-based PII detection for email, SSN, IBAN, credit card, phone patterns |
| Data minimisation by default (Art. 25(2)) | Sandbox B receives only whitelisted context keys per vertical and use case; all other fields rejected | Per-vertical ContextConfig with allowed_context_keys map and max_context_value_length limits |
| Technical measures at time of processing (Art. 25(1)) | Identity and AI sandboxes run in separate Kubernetes namespaces with deny-all default NetworkPolicies | default-deny-all NetworkPolicy deployed to every namespace; explicit egress rules per service |
| Integrate safeguards into processing (Art. 25(1)) | Column-level encryption on high-sensitivity fields; row-level security isolates tenant data | AES-256-GCM with versioned key prefix in crypto.go; PostgreSQL RLS policy vertical_isolation on identities table |
| Limit accessibility by default (Art. 25(2)) | Sandbox B egress restricted to audit service only; no path to identity data | ai-egress NetworkPolicy allows only dsa-audit:50051 and intra-namespace traffic |
| State of the art measures (Art. 25(1)) | Infrastructure-level enforcement rather than application-level controls | Kubernetes NetworkPolicy with namespaceSelector and per-port restrictions across 7 namespaces |
How the Gateway Strips PII
The Gateway service (dsa-edge) is the only component that sees both the user's identity and the inference token. It never forwards both to the same downstream service. Before any request reaches Sandbox B, two layers of PII filtering run:
- Context key whitelisting. The `ContextValidator` maintains a map of `vertical -> use_case -> allowed keys`. When a client submits an inference request, any context key not in the whitelist is rejected outright. If no whitelist exists for a vertical/use_case pair, the validator operates fail-closed -- all non-empty context is refused.
- PII pattern scanning. Every context value is scanned against compiled regex patterns for emails, phone numbers, SSNs, credit card numbers, and IBANs. The scanner also normalises values through URL-decoding and base64-decoding to catch encoding-based bypass attempts. Non-ASCII characters trigger an immediate rejection because Unicode lookalikes can evade ASCII regex patterns.
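The fail-closed whitelisting in the first step can be sketched as follows. This is a minimal illustration, not the actual `ContextValidator`; the `allowedKeys` structure stands in for the per-vertical `ContextConfig` described above.

```go
package main

import "fmt"

// allowedKeys is a hypothetical stand-in for the per-vertical ContextConfig:
// vertical -> use_case -> set of permitted context keys.
var allowedKeys = map[string]map[string]map[string]bool{
	"banking": {
		"loan_offer": {"account_tier": true, "region": true},
	},
}

// ValidateContext rejects any context key not explicitly whitelisted.
// If no whitelist exists for the vertical/use_case pair, it fails closed:
// any non-empty context is refused.
func ValidateContext(vertical, useCase string, ctx map[string]string) error {
	keys, ok := allowedKeys[vertical][useCase]
	if !ok {
		if len(ctx) > 0 {
			return fmt.Errorf("no whitelist for %s/%s: context refused", vertical, useCase)
		}
		return nil
	}
	for k := range ctx {
		if !keys[k] {
			return fmt.Errorf("context key %q not in whitelist", k)
		}
	}
	return nil
}

func main() {
	// A non-whitelisted key ("email") is rejected even though others pass.
	err := ValidateContext("banking", "loan_offer", map[string]string{"email": "x"})
	fmt.Println(err != nil)
}
```

The fail-closed branch is the important design choice: forgetting to configure a whitelist results in rejected requests, not silently forwarded PII.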
On the return path, a separate DLP scanner (dlp.Scanner) checks AI-generated responses before they reach the client. If the LLM hallucinates PII from training data, the scanner replaces matches with tagged placeholders like [REDACTED-EMAIL] or [REDACTED-IBAN], flags dlp_redacted: true in the response, and emits an audit event with pattern match counts.
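The tagged-placeholder replacement on the return path can be sketched like this. The two patterns here are an illustrative subset, not the scanner's real pattern set; production patterns for IBANs and card numbers are considerably more involved.

```go
package main

import (
	"fmt"
	"regexp"
)

// piiPatterns is an illustrative subset of DLP patterns; each tag becomes
// part of the placeholder, e.g. [REDACTED-EMAIL].
var piiPatterns = map[string]*regexp.Regexp{
	"EMAIL": regexp.MustCompile(`[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}`),
	"SSN":   regexp.MustCompile(`\b\d{3}-\d{2}-\d{4}\b`),
}

// Redact replaces every match with a tagged placeholder and reports whether
// anything was redacted, mirroring the dlp_redacted response flag.
func Redact(s string) (string, bool) {
	redacted := false
	for tag, re := range piiPatterns {
		if re.MatchString(s) {
			redacted = true
			s = re.ReplaceAllString(s, "[REDACTED-"+tag+"]")
		}
	}
	return s, redacted
}

func main() {
	out, flagged := Redact("Contact alice@example.com for details")
	fmt.Println(out, flagged)
}
```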
Column-Level Encryption in the Identity Vault
When we designed the identity vault (Sandbox A), we chose AES-256-GCM for column-level encryption of sensitive fields. The sensitive_fields column in the identities table stores encrypted bytes with a format of [version:1byte][nonce:12bytes][ciphertext+tag]. The version prefix enables key rotation without re-encrypting all rows at once -- new writes use the current key version, and reads check the version byte to select the correct decryption key.
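A minimal sketch of this blob format follows. It is an illustration of the layout, not the code in crypto.go; the in-memory key registry stands in for a real KMS, and the all-zero key is a placeholder.

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"errors"
	"fmt"
)

// keysByVersion is a hypothetical key registry; the real service would
// resolve versions against a KMS. Blob layout:
// [version:1byte][nonce:12bytes][ciphertext+tag].
var keysByVersion = map[byte][]byte{
	1: make([]byte, 32), // placeholder 256-bit key, illustration only
}

const currentVersion byte = 1

// Encrypt seals plaintext under the current key version and prefixes the
// version byte so future reads can pick the right key.
func Encrypt(plaintext []byte) ([]byte, error) {
	block, err := aes.NewCipher(keysByVersion[currentVersion])
	if err != nil {
		return nil, err
	}
	gcm, err := cipher.NewGCM(block)
	if err != nil {
		return nil, err
	}
	nonce := make([]byte, gcm.NonceSize()) // 12 bytes for GCM
	if _, err := rand.Read(nonce); err != nil {
		return nil, err
	}
	out := append([]byte{currentVersion}, nonce...)
	return gcm.Seal(out, nonce, plaintext, nil), nil // appends ciphertext+tag
}

// Decrypt reads the version byte, selects the matching key, and opens the
// sealed payload. Rows encrypted under old versions keep decrypting.
func Decrypt(blob []byte) ([]byte, error) {
	if len(blob) < 1+12 {
		return nil, errors.New("blob too short")
	}
	key, ok := keysByVersion[blob[0]]
	if !ok {
		return nil, errors.New("unknown key version")
	}
	block, err := aes.NewCipher(key)
	if err != nil {
		return nil, err
	}
	gcm, err := cipher.NewGCM(block)
	if err != nil {
		return nil, err
	}
	return gcm.Open(nil, blob[1:13], blob[13:], nil)
}

func main() {
	blob, _ := Encrypt([]byte("jane@example.com"))
	pt, _ := Decrypt(blob)
	fmt.Println(string(pt))
}
```

Rotation then amounts to adding a new entry to the key registry and bumping `currentVersion`; old rows stay readable until they are lazily re-encrypted.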
For searchability on encrypted fields, the system computes HMAC-SHA256 blind indices. The email_hash and sensitive_hash columns store these values, allowing exact-match lookups without decrypting every row. This satisfies Article 25's data minimisation principle: the database can locate a record by its blind index without exposing the plaintext to the query engine.
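The blind-index derivation is small enough to show in full. This is a sketch under one assumption worth making explicit: `indexKey` must be a dedicated HMAC key, separate from the encryption key, so the index reveals nothing about ciphertexts.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// indexKey is a hypothetical dedicated HMAC key; in practice it would come
// from the same key-management path as the encryption keys.
var indexKey = []byte("illustrative-only-blind-index-key")

// BlindIndex derives a deterministic HMAC-SHA256 tag: identical inputs
// always produce identical hashes, so the database can answer
// WHERE email_hash = $1 without ever seeing plaintext.
func BlindIndex(value string) string {
	mac := hmac.New(sha256.New, indexKey)
	mac.Write([]byte(value))
	return hex.EncodeToString(mac.Sum(nil))
}

func main() {
	fmt.Println(BlindIndex("jane@example.com") == BlindIndex("jane@example.com"))
}
```

Note the trade-off: determinism is what makes exact-match lookup work, and it is also why blind indices support only exact matches, never range or substring queries.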
Row-Level Security for Tenant Isolation
PostgreSQL row-level security enforces vertical (tenant) isolation directly at the database layer:
```sql
ALTER TABLE identities ENABLE ROW LEVEL SECURITY;
CREATE POLICY vertical_isolation ON identities
USING (vertical = current_setting('app.current_vertical', true));
ALTER TABLE identities FORCE ROW LEVEL SECURITY;
```
The FORCE ROW LEVEL SECURITY directive applies even to the table owner, providing defence-in-depth. If app.current_vertical is not set in the session, no rows are visible -- a safe default that prevents accidental cross-tenant data exposure. This maps directly to Article 25(2): data accessibility is restricted by default, not by policy.
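In practice, a hypothetical application session against this policy would look like the following; the connection pool sets the tenant before each query, and a session that skips the `SET` sees zero rows.

```sql
-- Illustrative session, assuming the vertical_isolation policy above.
SET app.current_vertical = 'banking';
SELECT id, email_hash FROM identities;  -- only rows where vertical = 'banking'
RESET app.current_vertical;             -- subsequent queries return no rows
```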
Network Isolation: The Core Invariant
The Helm deployment includes 16 NetworkPolicy resources across 7 namespaces (dsa-edge, dsa-identity, dsa-bridge, dsa-ai, dsa-audit, dsa-observability, dsa-ingest). Every namespace starts with a default-deny-all policy that blocks all ingress and egress. Services then receive explicit, minimal egress rules.
The critical policies:
- `dsa-identity` (Sandbox A) can reach `dsa-bridge:50052` and `dsa-audit:50051`. It has no egress rule to `dsa-ai`. This is the core invariant: the identity vault cannot send data to the AI sandbox.
- `dsa-ai` (Sandbox B) can reach `dsa-audit:50051` and intra-namespace pods (Redis, Ollama). It has no egress rule to `dsa-identity`. The AI sandbox cannot query identity data.
- `dsa-bridge` can reach `dsa-ai` only on port `50055` -- a dedicated token rotation gRPC endpoint. It cannot reach port `50054` (the inference endpoint), enforcing least privilege even within the bridge-to-AI path.
- `dsa-ai` external egress for cloud LLM providers is gated behind `global.llmEgressEnabled` and restricted to port 443, with RFC 1918 and link-local addresses excluded, blocking cloud metadata endpoint access at `169.254.0.0/16`.
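The two policy shapes that matter most can be sketched as plain manifests. These are simplified illustrations of the pattern described above, not the Helm chart's actual templates; label selectors and DNS rules are assumptions.

```yaml
# Baseline: selects every pod in the namespace and allows nothing.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
  namespace: dsa-ai
spec:
  podSelector: {}
  policyTypes: ["Ingress", "Egress"]
---
# Sketch of ai-egress: Sandbox B may reach only the audit service on 50051
# and pods within its own namespace (Redis, Ollama). No rule names
# dsa-identity, so no route to the identity vault exists.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: ai-egress
  namespace: dsa-ai
spec:
  podSelector: {}
  policyTypes: ["Egress"]
  egress:
    - to:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: dsa-audit
      ports:
        - port: 50051
          protocol: TCP
    - to:
        - podSelector: {}  # intra-namespace traffic
```

Because NetworkPolicies are additive allow-lists on top of a deny-all baseline, the absence of a rule is itself the enforcement: there is nothing to misconfigure back open without an explicit new policy.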
A Market Moving Toward Infrastructure Enforcement
The market is validating this direction. Gartner projects AI governance platforms at $492M in 2026, surpassing $1B by 2030. Confidential computing — processing data inside hardware-encrypted enclaves — is projected at $59.4B by 2028. A leader in that space raised $90M from Goldman Sachs with 150+ enterprise customers, a strong signal that enterprises want infrastructure-level guarantees rather than contractual promises.
GDPR enforcement is accelerating in parallel: €2.3B in fines in 2025 alone, a 38% increase year-over-year. Meta's €240M fine for Article 25 design-phase failures — before any breach occurred — set the precedent that architecture choices themselves are enforceable. France's CNIL is building tools (PANAME project) to detect PII in trained models, moving toward infrastructure validation rather than accepting software assurances.
The question is no longer whether infrastructure-level enforcement will be expected. It's whether your architecture will be ready when auditors start asking for it.
What This Means for Auditors
When an auditor asks "how do you implement data protection by design?", the answer is not a policy document. It is a set of Kubernetes manifests, Go source files, and SQL migrations that structurally prevent the AI system from accessing identity data. The verification scope shrinks: instead of auditing every application code path for PII leakage, the auditor checks that NetworkPolicies are deployed and that the identity vault's encryption and RLS migrations have run.
Article 25 says "appropriate technical and organisational measures." We chose to make the technical measures architectural -- enforced by infrastructure that does not depend on developers remembering to call the right function.