DSA Scanner
What does your data actually expose?
The DSA Scanner is the entry product for The Veil Core. Run it on your real data to see exactly what PII your AI pipelines are leaking: names, addresses, IBANs, health data, and quasi-identifiers. The resulting report is the starting point for a scoped assessment with The Veil Core.
The Visibility Gap
Most organisations don't know what PII their AI systems actually see. They assume their data is clean, or rely on regex patterns that miss context-dependent identifiers. The gap between what you think is exposed and what actually is exposed creates compliance risk you can't quantify.
Without visibility into what your AI pipelines ingest, you can't assess risk, prove compliance, or prioritise remediation. You need a scan of your real data — not a theoretical assessment.
How the Scanner Works
| Step | What Happens | Where |
|---|---|---|
| 1. Provide Data | Representative records from your systems: CSV, JSON, or ServiceNow export. Data stays on your infrastructure and is never uploaded to external services. | Your Environment |
| 2. Three-Layer Detection | Scanner runs known-entity matching (Cologne phonetics + Levenshtein) and NER detection (Presidio with spaCy DE/EN). In the full Assessment, the LLM PII Shield (fine-tuned Qwen 2.5 7B) runs in parallel with NER detection. | Scanner Engine |
| 3. Exposure Report | Detailed report showing every PII element found, confidence scores, field-level risk assessment, and remediation recommendations. | HTML Report |
Three Layers of Detection
Layer 1 — Known-Entity Matching
Cologne phonetics and Levenshtein distance matching against known identities. Catches name variations, misspellings, and phonetic equivalents that exact-match lookups miss.
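To make the idea concrete, here is a minimal sketch of how the two techniques can be combined. It is not the scanner's implementation: the function names, the `max_distance` threshold, and the rule "match if either test passes" are assumptions chosen for illustration.

```python
import re

_UMLAUTS = str.maketrans({"Ä": "A", "Ö": "O", "Ü": "U"})
_SIMPLE = {"B": "1", "F": "3", "V": "3", "W": "3", "G": "4", "K": "4",
           "Q": "4", "L": "5", "M": "6", "N": "6", "R": "7", "S": "8", "Z": "8"}

def cologne_phonetics(name: str) -> str:
    """Encode a name with Cologne phonetics (Kölner Phonetik)."""
    # upper() turns "ß" into "SS"; anything outside A-Z is dropped.
    s = re.sub(r"[^A-Z]", "", name.upper().translate(_UMLAUTS))
    digits = []
    for i, ch in enumerate(s):
        prev = s[i - 1] if i > 0 else ""
        nxt = s[i + 1] if i + 1 < len(s) else ""
        if ch in "AEIJOUY":
            code = "0"
        elif ch == "H":
            code = ""  # H carries no code
        elif ch == "P":
            code = "3" if nxt == "H" else "1"
        elif ch in "DT":
            code = "8" if nxt in {"C", "S", "Z"} else "2"
        elif ch == "C":
            if i == 0:
                code = "4" if nxt in set("AHKLOQRUX") else "8"
            elif prev in {"S", "Z"}:
                code = "8"
            else:
                code = "4" if nxt in set("AHKOQUX") else "8"
        elif ch == "X":
            code = "8" if prev in {"C", "K", "Q"} else "48"
        else:
            code = _SIMPLE[ch]
        digits.append(code)
    # Collapse runs of the same digit, then drop every 0 except a leading one.
    collapsed = re.sub(r"(\d)\1+", r"\1", "".join(digits))
    return collapsed[:1] + collapsed[1:].replace("0", "")

def levenshtein(a: str, b: str) -> int:
    """Edit distance counting insertions, deletions, and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

def match_known(candidate: str, known_names: list[str], max_distance: int = 1) -> list[str]:
    """Hypothetical matcher: a hit is a phonetic match OR a near-identical spelling."""
    code = cologne_phonetics(candidate)
    hits = []
    for known in known_names:
        same_sound = code != "" and code == cologne_phonetics(known)
        close_spelling = levenshtein(candidate.lower(), known.lower()) <= max_distance
        if same_sound or close_spelling:
            hits.append(known)
    return hits

print(match_known("Mayer", ["Meier", "Schmidt"]))
```

In this sketch "Meier" and "Mayer" both encode to 67, so the phonetic test links them even though their spellings differ by two characters. The edit-distance test covers the opposite case, where a single typo changes how a name sounds.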
Layer 2 — NER Detection
Presidio with spaCy DE/EN models plus custom German recognisers. Detects IBANs, tax IDs, health insurance numbers, addresses, and standard PII categories.
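As an illustration of what a pattern-plus-validation recogniser does for a structured identifier, the sketch below finds IBAN candidates with a regular expression and keeps only those that pass the standard mod-97 checksum. It is standalone Python, not Presidio code and not the scanner's recognisers; the pattern and the function names are assumptions.

```python
import re

# Loose candidate pattern (an assumption for this sketch): country code,
# two check digits, then 11 to 30 alphanumerics, optionally space-separated.
# A production recogniser would also enforce per-country lengths.
IBAN_CANDIDATE = re.compile(r"\b[A-Z]{2}\d{2}(?: ?[A-Z0-9]){11,30}\b")

def iban_checksum_ok(iban: str) -> bool:
    """Mod-97 check: move the first four characters to the end,
    map letters to numbers (A=10 ... Z=35), and test remainder == 1."""
    compact = iban.replace(" ", "").upper()
    rearranged = compact[4:] + compact[:4]
    numeric = "".join(str(int(ch, 36)) for ch in rearranged)
    return int(numeric) % 97 == 1

def find_ibans(text: str) -> list[str]:
    """Return checksum-valid IBANs found in text, with spaces removed."""
    return [
        m.group().replace(" ", "")
        for m in IBAN_CANDIDATE.finditer(text)
        if iban_checksum_ok(m.group())
    ]

sample = (
    "Bitte überweisen Sie auf DE89 3704 0044 0532 0130 00. "
    "Alte Angabe: DE89 3704 0044 0532 0130 01."
)
print(find_ibans(sample))
```

The second string in the sample has the right shape but fails the checksum, so it is not reported. Validating after the pattern match is what keeps lookalike strings such as order numbers out of the findings.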
Layer 3 — LLM PII Shield
Fine-tuned Qwen 2.5 7B running on a dedicated Ollama instance. Catches context-dependent PII that rule-based systems miss. Runs in parallel with Layer 2 for ensemble detection.
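One way to picture the ensemble step is sketched below: two detectors run concurrently on the same text and overlapping findings collapse to the higher-scoring one. Everything here is hypothetical, including the `Finding` type, the merge rule, and the stub detectors, which stand in for the NER layer and the model call and only look up fixed phrases.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass

@dataclass(frozen=True)
class Finding:
    start: int
    end: int
    entity: str
    score: float
    source: str

def phrase_detector(source, phrases):
    """Build a stub detector that flags fixed phrases (stand-in for a real layer)."""
    def detect(text):
        out = []
        for phrase, entity, score in phrases:
            i = text.find(phrase)
            if i >= 0:
                out.append(Finding(i, i + len(phrase), entity, score, source))
        return out
    return detect

def merge_findings(findings):
    """Where spans overlap, keep only the higher-scoring finding."""
    merged = []
    for f in sorted(findings, key=lambda f: (f.start, -f.score)):
        if merged and f.start < merged[-1].end:
            if f.score > merged[-1].score:
                merged[-1] = f
        else:
            merged.append(f)
    return merged

def run_ensemble(text, detectors):
    """Run all detectors concurrently, then merge their findings."""
    with ThreadPoolExecutor(max_workers=len(detectors)) as pool:
        batches = list(pool.map(lambda detect: detect(text), detectors))
    return merge_findings([f for batch in batches for f in batch])

text = "Anna Schmidt, the only cardiologist in Husum, called."
ner_stub = phrase_detector("ner", [("Anna Schmidt", "PERSON", 0.85)])
llm_stub = phrase_detector("llm", [
    ("Anna Schmidt", "PERSON", 0.70),
    ("the only cardiologist in Husum", "QUASI_IDENTIFIER", 0.60),
])
for finding in run_ensemble(text, [ner_stub, llm_stub]):
    print(finding.entity, finding.source, text[finding.start:finding.end])
```

Both stubs flag the name, so only the higher-scoring finding survives the merge. The quasi-identifier phrase is reported by the second stub alone, which is the kind of context-dependent finding the LLM layer is meant to add.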
Layers 1–2 run in the initial scan; Layer 3 (LLM PII Shield) runs inside the full Assessment engagement for The Veil Core. The DSA Scanner is not a self-serve SaaS — every scan is scoped via email before it starts.
Book an Assessment