PII Protection
VaultProxy automatically detects and anonymizes Polish personally identifiable information (PII) in prompts before forwarding them to AI providers. This page documents every PII type that VaultProxy recognizes.
How It Works
VaultProxy uses a multi-layer detection pipeline:
- HerBERT NER -- A Polish-language Named Entity Recognition model based on HerBERT (a Polish RoBERTa variant) identifies names, organizations, and locations.
- Checksum validation -- Numeric identifiers like PESEL, NIP, and ID card numbers are validated using their built-in checksum algorithms.
- Regex patterns -- Structured data like phone numbers, email addresses, postal codes, and IBAN numbers are matched with regular expressions.
- Dictionary matching -- Street prefixes, city names, and other contextual clues help identify addresses and locations.
Detected PII is replaced with typed placeholders (e.g., [PESEL_1], [IMIĘ_NAZWISKO_1]) before the prompt reaches the AI provider. The original values are held in memory for up to 60 seconds during request processing and are never written to disk or logged.
Supported PII Types
Names (Imię i nazwisko)
| Detection | HerBERT NER + Polish name dictionary |
| Placeholder | [IMIĘ_NAZWISKO_N] |
| Input | Proszę sprawdzić konto Jana Kowalskiego |
| Output | Proszę sprawdzić konto [IMIĘ_NAZWISKO_1] |
Handles Polish inflected forms (Kowalski / Kowalskiego / Kowalskiemu).
PESEL
| Detection | 11-digit pattern + checksum validation (modulo 10) |
| Placeholder | [PESEL_N] |
| Input | PESEL pacjenta: 02271409867 |
| Output | PESEL pacjenta: [PESEL_1] |
Only valid PESEL numbers (passing checksum) are anonymized to avoid false positives.
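For reference, the PESEL checksum multiplies the first ten digits by the weights 1, 3, 7, 9, 1, 3, 7, 9, 1, 3. The eleventh digit must equal (10 - sum mod 10) mod 10. The Python sketch below illustrates only that check. `is_valid_pesel` is a hypothetical helper, not VaultProxy code, and it does not validate the birth date encoded in the number.

```python
# Standard PESEL check-digit weights for digits 1-10.
PESEL_WEIGHTS = (1, 3, 7, 9, 1, 3, 7, 9, 1, 3)


def is_valid_pesel(pesel: str) -> bool:
    """Return True if `pesel` is 11 ASCII digits with a correct check digit."""
    if len(pesel) != 11 or not (pesel.isascii() and pesel.isdigit()):
        return False
    # zip() stops after the 10 weights, so the check digit is excluded from the sum.
    total = sum(int(d) * w for d, w in zip(pesel, PESEL_WEIGHTS))
    check = (10 - total % 10) % 10
    return check == int(pesel[10])
```

For example, `is_valid_pesel("44051401359")` returns `True`, and changing the last digit to 8 makes it return `False`.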
NIP (Tax ID)
| Detection | 10-digit pattern + weighted checksum validation |
| Placeholder | [NIP_N] |
| Input | NIP firmy: 5261040828 |
| Output | NIP firmy: [NIP_1] |
Supports formats with and without dashes (e.g., 526-104-08-28).
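The NIP check digit is the weighted sum of the first nine digits (weights 6, 5, 7, 2, 3, 4, 5, 6, 7) modulo 11. A remainder of 10 means the number is invalid. The following is a minimal sketch with a hypothetical `is_valid_nip` helper. It strips dashes and spaces first, so both formats above validate the same way.

```python
# Standard NIP check-digit weights for digits 1-9.
NIP_WEIGHTS = (6, 5, 7, 2, 3, 4, 5, 6, 7)


def is_valid_nip(raw: str) -> bool:
    """Validate a NIP written with or without dashes or spaces."""
    nip = raw.replace("-", "").replace(" ", "")
    if len(nip) != 10 or not (nip.isascii() and nip.isdigit()):
        return False
    check = sum(int(d) * w for d, w in zip(nip, NIP_WEIGHTS)) % 11
    # A remainder of 10 never equals a single digit, so such numbers are rejected.
    return check == int(nip[9])
```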
REGON
| Detection | 9- or 14-digit pattern + checksum validation |
| Placeholder | [REGON_N] |
| Input | REGON: 012100784 |
| Output | REGON: [REGON_1] |
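REGON check digits are also weighted sums modulo 11, but a remainder of 10 is treated as 0. The weights depend on the length: 8, 9, 2, 3, 4, 5, 6, 7 for the 9-digit form, and 2, 4, 8, 5, 0, 9, 7, 3, 6, 1, 2, 4, 8 for the 14-digit form. Below is a sketch with a hypothetical `is_valid_regon` helper, shown for illustration only.

```python
# Check-digit weights for the 9-digit and 14-digit REGON forms.
REGON_WEIGHTS = {
    9: (8, 9, 2, 3, 4, 5, 6, 7),
    14: (2, 4, 8, 5, 0, 9, 7, 3, 6, 1, 2, 4, 8),
}


def is_valid_regon(regon: str) -> bool:
    """Return True if `regon` has 9 or 14 digits and a correct check digit."""
    weights = REGON_WEIGHTS.get(len(regon))
    if weights is None or not (regon.isascii() and regon.isdigit()):
        return False
    check = sum(int(d) * w for d, w in zip(regon, weights)) % 11
    if check == 10:  # a remainder of 10 maps to check digit 0
        check = 0
    return check == int(regon[-1])
```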
ID Card Number (Dowód osobisty)
| Detection | 3 letters + 6 digits pattern + checksum validation |
| Placeholder | [DOWÓD_N] |
| Input | Dowód: ABS123456 |
| Output | Dowód: [DOWÓD_1] |
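On Polish ID cards, the first digit (position 4) is the check digit. Letters are valued A=10 through Z=35. The three letters are weighted 7, 3, 1, the remaining five digits are weighted 7, 3, 1, 7, 3, and the weighted sum modulo 10 must equal the check digit. Below is a sketch of that rule. `is_valid_id_card` is a hypothetical name, not a VaultProxy function.

```python
import string

LETTER_WEIGHTS = (7, 3, 1)       # series letters, positions 1-3
DIGIT_WEIGHTS = (7, 3, 1, 7, 3)  # serial digits, positions 5-9


def is_valid_id_card(number: str) -> bool:
    """Validate a dowód osobisty number such as ABS123456."""
    number = number.replace(" ", "").upper()
    if len(number) != 9:
        return False
    letters, check_digit, serial = number[:3], number[3], number[4:]
    if not all(c in string.ascii_uppercase for c in letters):
        return False
    if not all(c in string.digits for c in number[3:]):
        return False
    # Letters count as A=10, B=11, ..., Z=35.
    total = sum((ord(c) - ord("A") + 10) * w for c, w in zip(letters, LETTER_WEIGHTS))
    total += sum(int(c) * w for c, w in zip(serial, DIGIT_WEIGHTS))
    return total % 10 == int(check_digit)
```

The sample `ABS123456` passes: 10·7 + 11·3 + 28·1 + 2·7 + 3·3 + 4·1 + 5·7 + 6·3 = 211, and 211 mod 10 = 1, which matches the digit in position 4.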
Passport Number
| Detection | 2 letters + 7 digits (ICAO standard for Polish passports) |
| Placeholder | [PASZPORT_N] |
| Input | Paszport: EA1234567 |
| Output | Paszport: [PASZPORT_1] |
Phone Number
| Detection | Regex for Polish phone formats (+48, 0048, 9-digit) |
| Placeholder | [TELEFON_N] |
| Input | Telefon: +48 600 123 456 |
| Output | Telefon: [TELEFON_1] |
Supports formats: +48600123456, +48 600 123 456, 600-123-456, 48 600 123 456.
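One way to express the listed formats as a single regular expression is shown below. This is an illustrative pattern, not VaultProxy's own. It assumes the 9-digit number is grouped 3-3-3 with optional spaces or dashes, and it uses lookarounds so it does not fire inside longer runs of digits.

```python
import re

PHONE_RE = re.compile(
    r"(?<![\d+])"                  # not preceded by a digit or '+'
    r"(?:(?:\+|00)?48[ -]?)?"      # optional country code: +48, 0048 or 48
    r"\d{3}[ -]?\d{3}[ -]?\d{3}"   # 9-digit number grouped 3-3-3
    r"(?!\d)"                      # not followed by another digit
)
```

A pattern this loose also matches any bare 9-digit number (for example, the REGON 012100784) and an 11-digit PESEL that starts with 48. A real pipeline therefore has to decide which detector wins when matches overlap.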
Email Address
| Detection | Standard email regex pattern |
| Placeholder | [EMAIL_N] |
| Input | Email: jan.kowalski@example.com |
| Output | Email: [EMAIL_1] |
Address (Adres)
| Detection | Polish address patterns with prefix keywords (ul., al., os., pl.) + building numbers |
| Placeholder | [ADRES_N] |
| Input | Adres: ul. Marszałkowska 10/5, 00-624 Warszawa |
| Output | Adres: [ADRES_1] |
Recognizes common prefixes: ul., ulica, al., aleja, os., osiedle, pl., plac.
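The following is a sketch of prefix-based matching for the street part of an address. The prefix list comes from the line above. Everything else is an assumption: capitalized street-name words followed by a building number with an optional `/flat` suffix. The pattern stops before the postal code and city, so it does not reproduce how the example above folds `00-624 Warszawa` into `[ADRES_1]`.

```python
import re

ADDRESS_RE = re.compile(
    r"\b(?:ulica|ul\.|aleja|al\.|osiedle|os\.|plac|pl\.)\s+"  # street-type prefix, full or abbreviated
    r"(?:[A-ZĄĆĘŁŃÓŚŹŻ]\w*\.?\s+)+"                          # capitalized street-name words
    r"\d+[A-Za-z]?(?:/\d+[A-Za-z]?)?"                         # building number, optional /flat
)
```

Because `\w` is Unicode-aware for Python `str` patterns, street names with Polish letters such as `ł` or `ś` match without extra character classes.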
Postal Code (Kod pocztowy)
| Detection | Pattern: XX-XXX (2 digits, dash, 3 digits) |
| Placeholder | [KOD_POCZTOWY_N] |
| Input | Kod: 00-624 |
| Output | Kod: [KOD_POCZTOWY_1] |
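A bare `\d{2}-\d{3}` also fires inside other numbers; for example, it matches `00-123` within the phone number `600-123-456`. The sketch below is an assumption rather than VaultProxy's actual pattern. It adds lookarounds that reject any match touching another digit or dash.

```python
import re

# Two digits, a dash, three digits, not adjacent to another digit or dash.
POSTAL_CODE_RE = re.compile(r"(?<![\d-])\d{2}-\d{3}(?![\d-])")
```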
Date of Birth (Data urodzenia)
| Detection | Date patterns in context of birth/age keywords |
| Placeholder | [DATA_URODZENIA_N] |
| Input | Data urodzenia: 14.02.2002 |
| Output | Data urodzenia: [DATA_URODZENIA_1] |
Supports formats: DD.MM.YYYY, DD-MM-YYYY, DD/MM/YYYY, YYYY-MM-DD.
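Context matters here because a date on its own is not necessarily a birth date. The sketch below accepts a date in one of the four listed formats only when a birth keyword appears shortly before it, and it rejects impossible calendar dates. The keyword list, the 40-character window, and the calendar check are all assumptions for illustration.

```python
import re
from datetime import datetime

# Same separator on both sides (DD.MM.YYYY, DD-MM-YYYY, DD/MM/YYYY) or ISO YYYY-MM-DD.
DATE_RE = re.compile(r"\b(?:\d{2}([./-])\d{2}\1\d{4}|\d{4}-\d{2}-\d{2})\b")
# Hypothetical keywords: urodzenia/urodzenie, urodzony/urodzona/urodzone, ur.
BIRTH_KEYWORD_RE = re.compile(r"\b(?:urodzeni[ae]|urodzon[aey]|ur\.)", re.IGNORECASE)
DATE_FORMATS = ("%d.%m.%Y", "%d-%m-%Y", "%d/%m/%Y", "%Y-%m-%d")


def is_calendar_date(value: str) -> bool:
    """True if `value` parses as a real date in one of the supported formats."""
    for fmt in DATE_FORMATS:
        try:
            datetime.strptime(value, fmt)
            return True
        except ValueError:
            continue
    return False


def find_birth_dates(text: str, window: int = 40) -> list[str]:
    """Return dates preceded by a birth keyword within `window` characters."""
    found = []
    for match in DATE_RE.finditer(text):
        context = text[max(0, match.start() - window):match.start()]
        if BIRTH_KEYWORD_RE.search(context) and is_calendar_date(match.group()):
            found.append(match.group())
    return found
```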
IBAN (Bank Account)
| Detection | Polish IBAN pattern: PL + 26 digits, with optional spaces |
| Placeholder | [IBAN_N] |
| Input | Konto: PL61 1090 1014 0000 0712 1981 2874 |
| Output | Konto: [IBAN_1] |
Also detects 26-digit Polish account numbers without the PL prefix.
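The detection row describes a regex. The sketch below shows one such pattern. It assumes the common print format: two check digits followed by six four-digit groups, each optionally preceded by a single space. The sketch also adds the ISO 13616 mod-97 check, which every valid IBAN satisfies. This page does not say VaultProxy performs that check, so treat it as an optional extra filter. Both names are hypothetical.

```python
import re

# PL + 2 check digits + six 4-digit groups, each optionally preceded by one space.
IBAN_PL_RE = re.compile(r"\bPL\d{2}(?: ?\d{4}){6}\b")


def iban_checksum_ok(iban: str) -> bool:
    """ISO 13616 mod-97 check: a valid IBAN leaves remainder 1."""
    compact = iban.replace(" ", "").upper()
    rearranged = compact[4:] + compact[:4]  # move country code + check digits to the end
    numeric = "".join(str(int(ch, 36)) for ch in rearranged)  # 0-9 stay, A=10 ... Z=35
    return int(numeric) % 97 == 1
```

For a 26-digit account number without the prefix (the NRB format), the same check works after prepending `PL`, because the first two NRB digits are the IBAN check digits.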
Credit Card Number
| Detection | 16-digit patterns + Luhn algorithm validation |
| Placeholder | [KARTA_N] |
| Input | Karta: 4532 0151 2345 6789 |
| Output | Karta: [KARTA_1] |
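The Luhn check doubles every second digit counting from the right, subtracting 9 whenever the doubled value exceeds 9, and then requires the total to be divisible by 10. The sketch below follows the table and accepts 16 digits only. `luhn_ok` is a hypothetical helper.

```python
def luhn_ok(card: str) -> bool:
    """Luhn check for a 16-digit card number; spaces and dashes are ignored."""
    digits = card.replace(" ", "").replace("-", "")
    if len(digits) != 16 or not (digits.isascii() and digits.isdigit()):
        return False
    total = 0
    for position, ch in enumerate(reversed(digits)):
        value = int(ch)
        if position % 2 == 1:  # every second digit from the right is doubled
            value *= 2
            if value > 9:
                value -= 9  # same as adding the two digits of the product
        total += value
    return total % 10 == 0
```

The sample `4532 0151 2345 6789` passes because its Luhn total is 70.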
Art. 9 RODO (GDPR) Sensitive Data
| Detection | Keyword matching with Polish inflection (see Art. 9 RODO details) |
| Action | Flagged (not anonymized) -- returns a warning in the response |
| Input | Pacjent ma cukrzycę i jest leczony insuliną |
| Output | Request proceeds with a warning flag about sensitive data categories |
Art. 9 data is flagged, not anonymized. VaultProxy detects its presence and adds a warning to the response metadata. Your application should handle this flag according to your compliance requirements.
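The detection row mentions keyword matching with Polish inflection. A simple way to approximate this is prefix (stem) matching, so that `cukrzyca`, `cukrzycę`, and `cukrzycy` all hit the same entry. In the sketch below, the stem list and category labels are hypothetical and deliberately tiny. See the Art. 9 RODO page for what VaultProxy actually matches.

```python
import re

# Hypothetical stem -> Art. 9 category map; a production list would be much larger.
ART9_STEMS = {
    "cukrzyc": "health",        # cukrzyca, cukrzycę, cukrzycy
    "insulin": "health",        # insulina, insuliną
    "nowotw": "health",         # nowotwór, nowotworu
    "katoli": "religion",       # katolik, katolicki, katolikiem
    "związkow": "trade_union",  # związkowiec, związkowca, związkowy
}
# A stem at the start of a word, followed by any inflectional ending.
STEM_RE = re.compile(r"\b(" + "|".join(map(re.escape, ART9_STEMS)) + r")\w*", re.IGNORECASE)


def art9_categories(text: str) -> set[str]:
    """Return the Art. 9 categories whose stems occur in `text`."""
    return {ART9_STEMS[m.group(1).lower()] for m in STEM_RE.finditer(text)}
```

Prefix matching is cheap, but it can over-match unrelated words that happen to share a stem. Treat its output as a signal to review rather than a verdict.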
Multiple PII in a Single Prompt
When a prompt contains multiple PII items, each gets a unique numbered placeholder:
Input:
Klient Jan Kowalski (PESEL 02271409867) mieszka przy ul. Marszałkowska 10,
00-624 Warszawa. Kontakt: +48 600 123 456, jan.kowalski@example.com.
NIP firmy: 5261040828.
After anonymization:
Klient [IMIĘ_NAZWISKO_1] (PESEL [PESEL_1]) mieszka przy [ADRES_1],
[KOD_POCZTOWY_1] Warszawa. Kontakt: [TELEFON_1], [EMAIL_1].
NIP firmy: [NIP_1].
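The numbering in this example is consistent with a separate counter per PII type. Below is a minimal Python sketch of the replacement step. It makes three explicit assumptions: the detectors have already produced non-overlapping `(start, end, type)` spans, counters are kept per type, and every occurrence gets its own placeholder even if the same value repeats. `anonymize` is a hypothetical helper. The returned dictionary stands in for the in-memory mapping of original values described under How It Works.

```python
from collections import defaultdict


def anonymize(text: str, spans: list[tuple[int, int, str]]) -> tuple[str, dict[str, str]]:
    """Replace each (start, end, pii_type) span with a numbered placeholder.

    Assumes spans are already detected and do not overlap.
    Returns the anonymized text and a placeholder -> original value map.
    """
    counters: dict[str, int] = defaultdict(int)
    mapping: dict[str, str] = {}
    parts: list[str] = []
    cursor = 0
    for start, end, pii_type in sorted(spans):
        counters[pii_type] += 1  # separate counter per PII type (assumption)
        placeholder = f"[{pii_type}_{counters[pii_type]}]"
        mapping[placeholder] = text[start:end]
        parts.append(text[cursor:start])  # untouched text before this span
        parts.append(placeholder)
        cursor = end
    parts.append(text[cursor:])  # trailing text after the last span
    return "".join(parts), mapping
```

With two name spans in `Klient Jan Kowalski i Anna Nowak`, the sketch produces `Klient [IMIĘ_NAZWISKO_1] i [IMIĘ_NAZWISKO_2]`.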
Configuration
You can customize PII detection settings through the API:
# Get current PII settings
curl https://api.vaultproxy.ai/v1/settings/pii \
  -H "Authorization: Bearer vpx_live_YOUR_API_KEY"

# Update settings (e.g., disable specific PII types)
curl -X PATCH https://api.vaultproxy.ai/v1/settings/pii \
  -H "Authorization: Bearer vpx_live_YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "enabled_types": ["pesel", "nip", "names", "phone", "email"],
    "art9_detection": true,
    "art9_action": "flag"
  }'
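You can issue the same two calls from Python using only the standard library. This sketch only builds the requests. The endpoint, headers, and JSON fields are copied from the curl examples above, and `pii_settings_request` is a hypothetical helper name. The response schema is not documented on this page, so the sketch does not parse responses.

```python
import json
import urllib.request
from typing import Optional

# Endpoint copied from the curl examples; the helper itself is hypothetical.
PII_SETTINGS_URL = "https://api.vaultproxy.ai/v1/settings/pii"


def pii_settings_request(api_key: str, patch: Optional[dict] = None) -> urllib.request.Request:
    """Build a GET request (patch is None) or a PATCH request with a JSON body."""
    headers = {"Authorization": f"Bearer {api_key}"}
    if patch is None:
        return urllib.request.Request(PII_SETTINGS_URL, headers=headers, method="GET")
    headers["Content-Type"] = "application/json"
    body = json.dumps(patch).encode("utf-8")
    return urllib.request.Request(PII_SETTINGS_URL, data=body, headers=headers, method="PATCH")


if __name__ == "__main__":
    req = pii_settings_request(
        "vpx_live_YOUR_API_KEY",
        {
            "enabled_types": ["pesel", "nip", "names", "phone", "email"],
            "art9_detection": True,
            "art9_action": "flag",
        },
    )
    print(req.get_method(), req.full_url)
    # To actually send it (requires a real API key):
    # with urllib.request.urlopen(req) as resp:
    #     print(resp.status)
```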
Next Steps
- Art. 9 RODO -- Details on sensitive data detection
- Security -- How VaultProxy handles data securely
- API Reference -- Full endpoint documentation