PII Protection

VaultProxy automatically detects and anonymizes Polish personally identifiable information (PII) in prompts before forwarding them to AI providers. This page documents every PII type that VaultProxy recognizes.

How It Works

VaultProxy uses a multi-layer detection pipeline:

  1. HerBERT NER -- A Polish-language Named Entity Recognition model based on HerBERT (a Polish RoBERTa variant) identifies names, organizations, and locations.
  2. Checksum validation -- Identifiers such as PESEL, NIP, REGON, and ID card numbers are validated using their built-in checksum algorithms.
  3. Regex patterns -- Structured data like phone numbers, email addresses, postal codes, and IBAN numbers are matched with regular expressions.
  4. Dictionary matching -- Street prefixes, city names, and other contextual clues help identify addresses and locations.

Detected PII is replaced with typed placeholders (e.g., [PESEL_1], [IMIĘ_NAZWISKO_1]) before the prompt reaches the AI provider. The original values are held in memory for up to 60 seconds during request processing and are never written to disk or logged.

Supported PII Types

Names (Imię i nazwisko)

Detection: HerBERT NER + Polish name dictionary
Placeholder: [IMIĘ_NAZWISKO_N]
Input: Proszę sprawdzić konto Jana Kowalskiego
Output: Proszę sprawdzić konto [IMIĘ_NAZWISKO_1]

Handles Polish inflected forms (Kowalski / Kowalskiego / Kowalskiemu).

PESEL

Detection: 11-digit pattern + checksum validation (modulo 10)
Placeholder: [PESEL_N]
Input: PESEL pacjenta: 02271409862
Output: PESEL pacjenta: [PESEL_1]

Only valid PESEL numbers (passing checksum) are anonymized to avoid false positives.
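
The checksum step can be sketched as follows. This is a minimal illustration, not VaultProxy's actual code: the function name is hypothetical, and the weights are the publicly documented PESEL weights. The sample number in the usage note is a commonly cited valid PESEL.

```python
# Publicly documented PESEL weights for the first ten digits.
PESEL_WEIGHTS = (1, 3, 7, 9, 1, 3, 7, 9, 1, 3)


def is_valid_pesel(candidate: str) -> bool:
    """Return True if `candidate` is 11 digits with a matching check digit.

    Hypothetical helper for illustration; it checks only the checksum,
    not whether the encoded birth date is a real calendar date.
    """
    if len(candidate) != 11 or not (candidate.isascii() and candidate.isdigit()):
        return False
    digits = [int(ch) for ch in candidate]
    weighted_sum = sum(w * d for w, d in zip(PESEL_WEIGHTS, digits))
    expected_check_digit = (10 - weighted_sum % 10) % 10
    return digits[10] == expected_check_digit
```

For example, `is_valid_pesel("44051401359")` returns `True`, while changing the last digit makes it return `False`.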

NIP (Tax ID)

Detection: 10-digit pattern + weighted checksum validation
Placeholder: [NIP_N]
Input: NIP firmy: 5261040828
Output: NIP firmy: [NIP_1]

Supports formats with and without dashes (e.g., 526-104-08-28).
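
A minimal sketch of the weighted NIP check, assuming dashes and spaces are stripped before validation. The function name is hypothetical; the weights and the modulo 11 rule are the publicly documented NIP algorithm, not necessarily the exact code VaultProxy runs.

```python
# Publicly documented NIP weights for the first nine digits.
NIP_WEIGHTS = (6, 5, 7, 2, 3, 4, 5, 6, 7)


def is_valid_nip(candidate: str) -> bool:
    """Validate a NIP given with or without dashes (hypothetical helper)."""
    compact = candidate.replace("-", "").replace(" ", "")
    if len(compact) != 10 or not (compact.isascii() and compact.isdigit()):
        return False
    digits = [int(ch) for ch in compact]
    remainder = sum(w * d for w, d in zip(NIP_WEIGHTS, digits)) % 11
    # A remainder of 10 can never equal a single digit, so such numbers fail.
    return remainder == digits[9]
```

Both `is_valid_nip("5261040828")` and `is_valid_nip("526-104-08-28")` return `True`.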

REGON

Detection: 9- or 14-digit pattern + checksum validation
Placeholder: [REGON_N]
Input: REGON: 012100784
Output: REGON: [REGON_1]
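
The REGON checksum can be sketched like this. The helper names are hypothetical, and the weight vectors are the publicly documented ones for the 9-digit and 14-digit forms. The sketch assumes a 14-digit REGON must also carry a valid 9-digit prefix.

```python
# Publicly documented REGON weights.
REGON9_WEIGHTS = (8, 9, 2, 3, 4, 5, 6, 7)
REGON14_WEIGHTS = (2, 4, 8, 5, 0, 9, 7, 3, 6, 1, 2, 4, 8)


def _regon_check_digit(digits, weights) -> int:
    # Weighted sum modulo 11; a remainder of 10 maps to check digit 0.
    return sum(w * d for w, d in zip(weights, digits)) % 11 % 10


def is_valid_regon(candidate: str) -> bool:
    """Validate a 9- or 14-digit REGON (hypothetical helper)."""
    if len(candidate) not in (9, 14):
        return False
    if not (candidate.isascii() and candidate.isdigit()):
        return False
    digits = [int(ch) for ch in candidate]
    # The first nine digits must form a valid REGON on their own.
    if _regon_check_digit(digits[:8], REGON9_WEIGHTS) != digits[8]:
        return False
    if len(digits) == 14:
        return _regon_check_digit(digits[:13], REGON14_WEIGHTS) == digits[13]
    return True
```

With the example above, `is_valid_regon("012100784")` returns `True`.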

ID Card Number (Dowód osobisty)

Detection: 3 letters + 6 digits pattern + checksum validation
Placeholder: [DOWÓD_N]
Input: Dowód: ABS123456
Output: Dowód: [DOWÓD_1]
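
A minimal sketch of the ID card check, assuming the publicly documented scheme in which letters map to values 10 to 35 and the first digit after the series is the check digit. The function names are hypothetical.

```python
# Weights for every character except the check digit (position 4).
ID_CARD_WEIGHTS = (7, 3, 1, 7, 3, 1, 7, 3)


def char_value(ch: str) -> int:
    # Digits keep their value; letters map to A=10, B=11, ..., Z=35.
    return int(ch) if ch.isdigit() else ord(ch) - ord("A") + 10


def is_valid_id_card(candidate: str) -> bool:
    """Validate a Polish ID card number such as ABS123456 (hypothetical helper)."""
    if len(candidate) != 9 or not candidate.isascii():
        return False
    series, number = candidate[:3], candidate[3:]
    if not (series.isalpha() and series.isupper() and number.isdigit()):
        return False
    check_digit = int(number[0])
    payload = series + number[1:]  # everything except the check digit
    total = sum(w * char_value(ch) for w, ch in zip(ID_CARD_WEIGHTS, payload))
    return total % 10 == check_digit
```

With the example above, `is_valid_id_card("ABS123456")` returns `True`.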

Passport Number

Detection: 2 letters + 7 digits (ICAO standard for Polish passports)
Placeholder: [PASZPORT_N]
Input: Paszport: EA1234567
Output: Paszport: [PASZPORT_1]

Phone Number

Detection: Regex for Polish phone formats (+48, 0048, 9-digit)
Placeholder: [TELEFON_N]
Input: Telefon: +48 600 123 456
Output: Telefon: [TELEFON_1]

Supports formats: +48600123456, +48 600 123 456, 600-123-456, 48 600 123 456.
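
One way to cover the listed formats with a single regular expression is sketched below. The pattern and the name `PHONE_RE` are illustrative assumptions, not VaultProxy's actual expression. The digit lookarounds keep it from matching inside longer digit runs such as a PESEL, although an 11-digit number that happens to start with 48 would still match.

```python
import re

# Illustrative pattern: optional +48 / 0048 / 48 prefix, then nine digits
# in groups of three, optionally separated by a space or a dash.
PHONE_RE = re.compile(
    r"(?<!\d)"                      # not preceded by another digit
    r"(?:(?:\+48|0048|48)[ -]?)?"   # optional country prefix
    r"\d{3}[ -]?\d{3}[ -]?\d{3}"    # nine-digit subscriber number
    r"(?!\d)"                       # not followed by another digit
)


def find_phone_numbers(text: str) -> list[str]:
    """Return every substring that looks like a Polish phone number."""
    return PHONE_RE.findall(text)
```

For instance, `find_phone_numbers("Telefon: +48 600 123 456 lub 600-123-456")` returns `['+48 600 123 456', '600-123-456']`.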

Email Address

Detection: Standard email regex pattern
Placeholder: [EMAIL_N]
Input: Email: [email protected]
Output: Email: [EMAIL_1]

Address (Adres)

Detection: Polish address patterns with prefix keywords (ul., al., os., pl.) + building numbers
Placeholder: [ADRES_N]
Input: Adres: ul. Marszałkowska 10/5, 00-624 Warszawa
Output: Adres: [ADRES_1]

Recognizes common prefixes: ul., ulica, al., aleja, os., osiedle, pl., plac.

Postal Code (Kod pocztowy)

Detection: Pattern XX-XXX (2 digits, dash, 3 digits)
Placeholder: [KOD_POCZTOWY_N]
Input: Kod: 00-624
Output: Kod: [KOD_POCZTOWY_1]

Date of Birth (Data urodzenia)

Detection: Date patterns in context of birth/age keywords
Placeholder: [DATA_URODZENIA_N]
Input: Data urodzenia: 14.02.2002
Output: Data urodzenia: [DATA_URODZENIA_1]

Supports formats: DD.MM.YYYY, DD-MM-YYYY, DD/MM/YYYY, YYYY-MM-DD.
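
Context-dependent date matching could look roughly like the sketch below. The keyword list (`data urodzenia`, `urodzony`/`urodzona`, `ur.`) is an assumption made for illustration, since the actual keywords VaultProxy uses are not listed here. A date is reported only when it directly follows one of those keywords.

```python
import re

# The four documented date layouts.
DATE_PATTERN = (
    r"\d{2}\.\d{2}\.\d{4}|\d{2}-\d{2}-\d{4}|\d{2}/\d{2}/\d{4}|\d{4}-\d{2}-\d{2}"
)

# Assumed birth-related keywords; the real list may differ.
BIRTH_DATE_RE = re.compile(
    r"(?:data\s+urodzenia|urodzon[yae]|ur\.)\s*:?\s*(" + DATE_PATTERN + r")(?!\d)",
    re.IGNORECASE,
)


def find_birth_dates(text: str) -> list[str]:
    """Return dates that appear right after a birth-related keyword."""
    return [match.group(1) for match in BIRTH_DATE_RE.finditer(text)]
```

Under these assumptions, `find_birth_dates("Data urodzenia: 14.02.2002")` returns `['14.02.2002']`, while an invoice date such as `"Faktura z dnia 14.02.2002"` yields an empty list.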

IBAN (Bank Account)

Detection: Polish IBAN pattern (PL + 26 digits, with optional spaces)
Placeholder: [IBAN_N]
Input: Konto: PL61 1090 1014 0000 0712 1981 2874
Output: Konto: [IBAN_1]

Also detects 26-digit Polish account numbers without the PL prefix.
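
The detection rule above is pattern-based. A pattern match can additionally be confirmed with the standard IBAN mod 97 check, sketched below. Whether VaultProxy applies this check is not stated here, so treat it as an optional extra. The function name is hypothetical, and a bare 26-digit number is assumed to be a Polish account number.

```python
def is_valid_polish_iban(candidate: str) -> bool:
    """Check a Polish IBAN, or a bare 26-digit account number, with mod 97."""
    compact = candidate.replace(" ", "").upper()
    if not compact.startswith("PL"):
        compact = "PL" + compact  # assume a bare number is a Polish account
    body = compact[2:]
    if len(body) != 26 or not (body.isascii() and body.isdigit()):
        return False
    # Standard IBAN check: move the country code and check digits to the end,
    # replace letters with numbers (P=25, L=21), and test the remainder mod 97.
    rearranged = body[2:] + "2521" + body[:2]
    return int(rearranged) % 97 == 1
```

The example account validates both as `"PL61 1090 1014 0000 0712 1981 2874"` and as the bare number `"61109010140000071219812874"`.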

Credit Card Number

Detection: 16-digit patterns + Luhn algorithm validation
Placeholder: [KARTA_N]
Input: Karta: 4532 0151 2345 6789
Output: Karta: [KARTA_1]
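
The Luhn validation can be sketched as follows. The function name is hypothetical, and the sketch assumes spaces and dashes are stripped before the 16-digit check.

```python
def passes_luhn(candidate: str) -> bool:
    """Return True for a 16-digit card number with a valid Luhn checksum."""
    compact = candidate.replace(" ", "").replace("-", "")
    if len(compact) != 16 or not (compact.isascii() and compact.isdigit()):
        return False
    total = 0
    # Walk from the rightmost digit; double every second digit and
    # subtract 9 when the doubled value exceeds 9.
    for index, ch in enumerate(reversed(compact)):
        digit = int(ch)
        if index % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0
```

With the example above, `passes_luhn("4532 0151 2345 6789")` returns `True`.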

Art. 9 RODO Sensitive Data

Detection: Keyword matching with Polish inflection (see Art. 9 RODO details)
Action: Flagged (not anonymized) -- returns a warning in the response
Input: Pacjent ma cukrzycę i jest leczony insuliną
Output: Request proceeds with a warning flag about sensitive data categories

Warning: Art. 9 data is flagged, not anonymized. VaultProxy detects its presence and adds a warning to the response metadata. Your application should handle this flag according to your compliance requirements.

Multiple PII in a Single Prompt

When a prompt contains multiple PII items, each gets a unique numbered placeholder:

Input:

Klient Jan Kowalski (PESEL 02271409862) mieszka przy ul. Marszałkowska 10,
00-624 Warszawa. Kontakt: +48 600 123 456, [email protected].
NIP firmy: 5261040828.

After anonymization:

Klient [IMIĘ_NAZWISKO_1] (PESEL [PESEL_1]) mieszka przy [ADRES_1],
[KOD_POCZTOWY_1] Warszawa. Kontakt: [TELEFON_1], [EMAIL_1].
NIP firmy: [NIP_1].
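
The numbering scheme can be sketched as below. This is an illustration under stated assumptions, not VaultProxy's implementation: detections are taken as ready-made `(type, value)` pairs, a repeated value reuses its earlier placeholder, and the function name `anonymize` is hypothetical.

```python
from collections import defaultdict


def anonymize(text: str, detections: list[tuple[str, str]]) -> tuple[str, dict[str, str]]:
    """Replace detected values with numbered, typed placeholders.

    Returns the rewritten text and a placeholder -> original value mapping.
    Uses naive substring replacement, which is enough for a sketch.
    """
    counters: dict[str, int] = defaultdict(int)
    placeholder_for: dict[tuple[str, str], str] = {}
    mapping: dict[str, str] = {}
    for pii_type, value in detections:
        key = (pii_type, value)
        if key not in placeholder_for:
            counters[pii_type] += 1  # numbering restarts per PII type
            placeholder = f"[{pii_type}_{counters[pii_type]}]"
            placeholder_for[key] = placeholder
            mapping[placeholder] = value
        text = text.replace(value, placeholder_for[key])
    return text, mapping
```

Calling it with `[("IMIĘ_NAZWISKO", "Jan Kowalski"), ("NIP", "5261040828")]` on a prompt containing both values yields `[IMIĘ_NAZWISKO_1]` and `[NIP_1]` in the text, plus a mapping from each placeholder back to its original value.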

Configuration

You can customize PII detection settings through the API:

# Get current PII settings
curl https://api.vaultproxy.ai/v1/settings/pii \
  -H "Authorization: Bearer vpx_live_YOUR_API_KEY"

# Update settings (e.g., disable specific PII types)
curl -X PATCH https://api.vaultproxy.ai/v1/settings/pii \
  -H "Authorization: Bearer vpx_live_YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "enabled_types": ["pesel", "nip", "names", "phone", "email"],
    "art9_detection": true,
    "art9_action": "flag"
  }'

Next Steps