GDPR Article 25 in Practice: How We Built Privacy Into AI Infrastructure
Article 25 demands data protection by design — not as an afterthought. Here's how we mapped each requirement to specific infrastructure decisions: NetworkPolicies, column-level encryption, row-level security, and pseudonymous token bridges.
What Article 25 Actually Requires
GDPR Article 25(1) requires the controller to implement "appropriate technical and organisational measures, such as pseudonymisation, which are designed to implement data-protection principles [...] in an effective manner and to integrate the necessary safeguards into the processing." These measures must be determined "taking into account the state of the art, the cost of implementation and the nature, scope, context and purposes of processing."
Article 25(2) adds a default-minimisation rule: the controller must ensure that "by default, only personal data which are necessary for each specific purpose of the processing are processed." This applies to the amount of data collected, the extent of processing, the period of storage, and accessibility.
Recital 78 clarifies the intent: controllers should adopt "internal policies and implement measures which meet in particular the principles of data protection by design and data protection by default," including "minimising the processing of personal data, pseudonymising personal data as soon as possible, [and] transparency with regard to the functions and processing of personal data."
Most organisations treat these as policy requirements. We treated them as architecture requirements.
The Split-Knowledge Approach
The Veil enforces a structural invariant: Sandbox A knows WHO (identity, PII). Sandbox B knows WHAT (AI inferences, behavioural data). They cannot communicate directly. The only link between them is the ID Bridge, which issues opaque, non-reversible pseudonymous tokens.
This is not application-level redaction. A redaction bug leaks PII. Our separation is enforced at the Kubernetes network layer -- the AI processing environment has no network route to the identity vault. The enforcement mechanism is infrastructure, not code.
Mapping Article 25 to Infrastructure
| Article 25 Requirement | The Veil Implementation | Specific Technology |
|---|---|---|
| Pseudonymisation "as soon as possible" (Recital 78) | Gateway strips identity before forwarding to AI sandbox; only opaque tokens reach Sandbox B | ContextValidator in Gateway with regex-based PII detection for email, SSN, IBAN, credit card, phone patterns |
| Data minimisation by default (Art. 25(2)) | Sandbox B receives only whitelisted context keys per vertical and use case; all other fields rejected | Per-vertical ContextConfig with allowed_context_keys map and max_context_value_length limits |
| Technical measures at time of processing (Art. 25(1)) | Identity and AI sandboxes run in separate Kubernetes namespaces with deny-all default NetworkPolicies | default-deny-all NetworkPolicy deployed to every namespace; explicit egress rules per service |
| Integrate safeguards into processing (Art. 25(1)) | Column-level encryption on high-sensitivity fields; row-level security isolates tenant data | AES-256-GCM with versioned key prefix in crypto.go; PostgreSQL RLS policy vertical_isolation on identities table |
| Limit accessibility by default (Art. 25(2)) | Sandbox B egress restricted to audit service only; no path to identity data | ai-egress NetworkPolicy allows only dsa-audit:50051 and intra-namespace traffic |
| State of the art measures (Art. 25(1)) | Infrastructure-level enforcement rather than application-level controls | Kubernetes NetworkPolicy with namespaceSelector and per-port restrictions across 7 namespaces |
How the Gateway Strips PII
The Gateway service (dsa-edge) is the only component that sees both the user's identity and the inference token. It never forwards both to the same downstream service. Before any request reaches Sandbox B, two layers of PII filtering run:
- Context key whitelisting. The `ContextValidator` maintains a map of `vertical -> use_case -> allowed keys`. When a client submits an inference request, any context key not in the whitelist is rejected outright. If no whitelist exists for a vertical/use_case pair, the validator operates fail-closed -- all non-empty context is refused.
- PII pattern scanning. Every context value is scanned against compiled regex patterns for emails, phone numbers, SSNs, credit card numbers, and IBANs. The scanner also normalises values through URL-decoding and base64-decoding to catch encoding-based bypass attempts. Non-ASCII characters trigger an immediate rejection because Unicode lookalikes can evade ASCII regex patterns.
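The fail-closed whitelisting in the first step can be sketched as follows. This is a minimal illustration, not the actual `ContextValidator`; the `allowedKeys` structure stands in for the per-vertical `ContextConfig` described above.

```go
package main

import "fmt"

// allowedKeys is a hypothetical stand-in for the per-vertical ContextConfig:
// vertical -> use_case -> set of permitted context keys.
var allowedKeys = map[string]map[string]map[string]bool{
	"banking": {
		"loan_offer": {"account_tier": true, "region": true},
	},
}

// ValidateContext rejects any context key not explicitly whitelisted.
// If no whitelist exists for the vertical/use_case pair, it fails closed:
// any non-empty context is refused.
func ValidateContext(vertical, useCase string, ctx map[string]string) error {
	keys, ok := allowedKeys[vertical][useCase]
	if !ok {
		if len(ctx) > 0 {
			return fmt.Errorf("no whitelist for %s/%s: context refused", vertical, useCase)
		}
		return nil
	}
	for k := range ctx {
		if !keys[k] {
			return fmt.Errorf("context key %q not in whitelist", k)
		}
	}
	return nil
}

func main() {
	// A non-whitelisted key ("email") is rejected even though others pass.
	err := ValidateContext("banking", "loan_offer", map[string]string{"email": "x"})
	fmt.Println(err != nil)
}
```

The fail-closed branch is the important design choice: forgetting to configure a whitelist results in rejected requests, not silently forwarded PII.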
On the return path, a separate DLP scanner (dlp.Scanner) checks AI-generated responses before they reach the client. If the LLM hallucinates PII from training data, the scanner replaces matches with tagged placeholders like [REDACTED-EMAIL] or [REDACTED-IBAN], flags dlp_redacted: true in the response, and emits an audit event with pattern match counts.
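The tagged-placeholder replacement on the return path can be sketched like this. The two patterns here are an illustrative subset, not the scanner's real pattern set; production patterns for IBANs and card numbers are considerably more involved.

```go
package main

import (
	"fmt"
	"regexp"
)

// piiPatterns is an illustrative subset of DLP patterns; each tag becomes
// part of the placeholder, e.g. [REDACTED-EMAIL].
var piiPatterns = map[string]*regexp.Regexp{
	"EMAIL": regexp.MustCompile(`[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}`),
	"SSN":   regexp.MustCompile(`\b\d{3}-\d{2}-\d{4}\b`),
}

// Redact replaces every match with a tagged placeholder and reports whether
// anything was redacted, mirroring the dlp_redacted response flag.
func Redact(s string) (string, bool) {
	redacted := false
	for tag, re := range piiPatterns {
		if re.MatchString(s) {
			redacted = true
			s = re.ReplaceAllString(s, "[REDACTED-"+tag+"]")
		}
	}
	return s, redacted
}

func main() {
	out, flagged := Redact("Contact alice@example.com for details")
	fmt.Println(out, flagged)
}
```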
Column-Level Encryption in the Identity Vault
When we designed the identity vault (Sandbox A), we chose AES-256-GCM for column-level encryption of sensitive fields. The sensitive_fields column in the identities table stores encrypted bytes with a format of [version:1byte][nonce:12bytes][ciphertext+tag]. The version prefix enables key rotation without re-encrypting all rows at once -- new writes use the current key version, and reads check the version byte to select the correct decryption key.
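A minimal sketch of this blob format follows. It is an illustration of the layout, not the code in crypto.go; the in-memory key registry stands in for a real KMS, and the all-zero key is a placeholder.

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"errors"
	"fmt"
)

// keysByVersion is a hypothetical key registry; the real service would
// resolve versions against a KMS. Blob layout:
// [version:1byte][nonce:12bytes][ciphertext+tag].
var keysByVersion = map[byte][]byte{
	1: make([]byte, 32), // placeholder 256-bit key, illustration only
}

const currentVersion byte = 1

// Encrypt seals plaintext under the current key version and prefixes the
// version byte so future reads can pick the right key.
func Encrypt(plaintext []byte) ([]byte, error) {
	block, err := aes.NewCipher(keysByVersion[currentVersion])
	if err != nil {
		return nil, err
	}
	gcm, err := cipher.NewGCM(block)
	if err != nil {
		return nil, err
	}
	nonce := make([]byte, gcm.NonceSize()) // 12 bytes for GCM
	if _, err := rand.Read(nonce); err != nil {
		return nil, err
	}
	out := append([]byte{currentVersion}, nonce...)
	return gcm.Seal(out, nonce, plaintext, nil), nil // appends ciphertext+tag
}

// Decrypt reads the version byte, selects the matching key, and opens the
// sealed payload. Rows encrypted under old versions keep decrypting.
func Decrypt(blob []byte) ([]byte, error) {
	if len(blob) < 1+12 {
		return nil, errors.New("blob too short")
	}
	key, ok := keysByVersion[blob[0]]
	if !ok {
		return nil, errors.New("unknown key version")
	}
	block, err := aes.NewCipher(key)
	if err != nil {
		return nil, err
	}
	gcm, err := cipher.NewGCM(block)
	if err != nil {
		return nil, err
	}
	return gcm.Open(nil, blob[1:13], blob[13:], nil)
}

func main() {
	blob, _ := Encrypt([]byte("jane@example.com"))
	pt, _ := Decrypt(blob)
	fmt.Println(string(pt))
}
```

Rotation then amounts to adding a new entry to the key registry and bumping `currentVersion`; old rows stay readable until they are lazily re-encrypted.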
For searchability on encrypted fields, the system computes HMAC-SHA256 blind indices. The email_hash and sensitive_hash columns store these values, allowing exact-match lookups without decrypting every row. This satisfies Article 25's data minimisation principle: the database can locate a record by its blind index without exposing the plaintext to the query engine.
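The blind-index derivation is small enough to show in full. This is a sketch under one assumption worth making explicit: `indexKey` must be a dedicated HMAC key, separate from the encryption key, so the index reveals nothing about ciphertexts.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// indexKey is a hypothetical dedicated HMAC key; in practice it would come
// from the same key-management path as the encryption keys.
var indexKey = []byte("illustrative-only-blind-index-key")

// BlindIndex derives a deterministic HMAC-SHA256 tag: identical inputs
// always produce identical hashes, so the database can answer
// WHERE email_hash = $1 without ever seeing plaintext.
func BlindIndex(value string) string {
	mac := hmac.New(sha256.New, indexKey)
	mac.Write([]byte(value))
	return hex.EncodeToString(mac.Sum(nil))
}

func main() {
	fmt.Println(BlindIndex("jane@example.com") == BlindIndex("jane@example.com"))
}
```

Note the trade-off: determinism is what makes exact-match lookup work, and it is also why blind indices support only exact matches, never range or substring queries.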
Row-Level Security for Tenant Isolation
PostgreSQL row-level security enforces vertical (tenant) isolation directly at the database layer:
```sql
ALTER TABLE identities ENABLE ROW LEVEL SECURITY;
CREATE POLICY vertical_isolation ON identities
USING (vertical = current_setting('app.current_vertical', true));
ALTER TABLE identities FORCE ROW LEVEL SECURITY;
```
The FORCE ROW LEVEL SECURITY directive applies even to the table owner, providing defence-in-depth. If app.current_vertical is not set in the session, no rows are visible -- a safe default that prevents accidental cross-tenant data exposure. This maps directly to Article 25(2): data accessibility is restricted by default, not by policy.
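In practice, a hypothetical application session against this policy would look like the following; the connection pool sets the tenant before each query, and a session that skips the `SET` sees zero rows.

```sql
-- Illustrative session, assuming the vertical_isolation policy above.
SET app.current_vertical = 'banking';
SELECT id, email_hash FROM identities;  -- only rows where vertical = 'banking'
RESET app.current_vertical;             -- subsequent queries return no rows
```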
Network Isolation: The Core Invariant
The Helm deployment includes 16 NetworkPolicy resources across 7 namespaces (dsa-edge, dsa-identity, dsa-bridge, dsa-ai, dsa-audit, dsa-observability, dsa-ingest). Every namespace starts with a default-deny-all policy that blocks all ingress and egress. Services then receive explicit, minimal egress rules.
The critical policies:
- `dsa-identity` (Sandbox A) can reach `dsa-bridge:50052` and `dsa-audit:50051`. It has no egress rule to `dsa-ai`. This is the core invariant: the identity vault cannot send data to the AI sandbox.
- `dsa-ai` (Sandbox B) can reach `dsa-audit:50051` and intra-namespace pods (Redis, Ollama). It has no egress rule to `dsa-identity`. The AI sandbox cannot query identity data.
- `dsa-bridge` can reach `dsa-ai` only on port `50055` -- a dedicated token rotation gRPC endpoint. It cannot reach port `50054` (the inference endpoint), enforcing least privilege even within the bridge-to-AI path.
- `dsa-ai` external egress for cloud LLM providers is gated behind `global.llmEgressEnabled` and restricted to port 443, with RFC 1918 and link-local addresses excluded, blocking cloud metadata endpoint access at `169.254.0.0/16`.
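The two policy shapes that matter most can be sketched as plain manifests. These are simplified illustrations of the pattern described above, not the Helm chart's actual templates; label selectors and DNS rules are assumptions.

```yaml
# Baseline: selects every pod in the namespace and allows nothing.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
  namespace: dsa-ai
spec:
  podSelector: {}
  policyTypes: ["Ingress", "Egress"]
---
# Sketch of ai-egress: Sandbox B may reach only the audit service on 50051
# and pods within its own namespace (Redis, Ollama). No rule names
# dsa-identity, so no route to the identity vault exists.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: ai-egress
  namespace: dsa-ai
spec:
  podSelector: {}
  policyTypes: ["Egress"]
  egress:
    - to:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: dsa-audit
      ports:
        - port: 50051
          protocol: TCP
    - to:
        - podSelector: {}  # intra-namespace traffic
```

Because NetworkPolicies are additive allow-lists on top of a deny-all baseline, the absence of a rule is itself the enforcement: there is nothing to misconfigure back open without an explicit new policy.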
A Market Moving Toward Infrastructure Enforcement
The market is validating this direction. Gartner projects AI governance platforms at $492M in 2026, surpassing $1B by 2030. Confidential computing — processing data inside hardware-encrypted enclaves — is projected at $59.4B by 2028. A leader in that space raised $90M from Goldman Sachs with 150+ enterprise customers, a strong signal that enterprises want infrastructure-level guarantees rather than contractual promises.
GDPR enforcement is accelerating in parallel: €2.3B in fines in 2025 alone, a 38% increase year-over-year. Meta's €240M fine for Article 25 design-phase failures — before any breach occurred — set the precedent that architecture choices themselves are enforceable. France's CNIL is building tools (PANAME project) to detect PII in trained models, moving toward infrastructure validation rather than accepting software assurances.
The question is no longer whether infrastructure-level enforcement will be expected. It's whether your architecture will be ready when auditors start asking for it.
What This Means for Auditors
When an auditor asks "how do you implement data protection by design?", the answer is not a policy document. It is a set of Kubernetes manifests, Go source files, and SQL migrations that structurally prevent the AI system from accessing identity data. The verification scope shrinks: instead of auditing every application code path for PII leakage, the auditor checks that NetworkPolicies are deployed and that the identity vault's encryption and RLS migrations have run.
Article 25 says "appropriate technical and organisational measures." We chose to make the technical measures architectural -- enforced by infrastructure that does not depend on developers remembering to call the right function.