PII Protection

Ovexa automatically detects and anonymizes Polish personally identifiable information (PII) in prompts before forwarding them to AI providers. This page documents every PII type that Ovexa recognizes.

How It Works

Ovexa uses a multi-layer detection pipeline:

HerBERT NER -- A Polish-language Named Entity Recognition model based on HerBERT (a Polish RoBERTa variant) identifies names, organizations, and locations.
Checksum validation -- Identifiers such as PESEL, NIP, REGON, and ID card numbers are validated using their built-in checksum algorithms.
Regex patterns -- Structured data like phone numbers, email addresses, postal codes, and IBAN numbers are matched with regular expressions.
Dictionary matching -- Street prefixes, city names, and other contextual clues help identify addresses and locations.

Detected PII is replaced with typed placeholders (e.g., <PESEL_1>, <PERSON_1>) before the prompt reaches the AI provider. The original values are held in memory for up to 60 seconds during request processing and are never written to disk or logged.

Supported PII Types

Names (Imię i nazwisko)


Detection	HerBERT NER + Polish name dictionary
Placeholder	`<PERSON_N>`
Input	`Proszę sprawdzić konto Jana Kowalskiego`
Output	`Proszę sprawdzić konto <PERSON_1>`

Handles Polish inflected forms (Kowalski / Kowalskiego / Kowalskiemu).

PESEL


Detection	11-digit pattern + weighted checksum validation (modulo 10)
Placeholder	`<PESEL_N>`
Input	`PESEL pacjenta: 02271409862`
Output	`PESEL pacjenta: <PESEL_1>`

To avoid false positives, only 11-digit numbers that pass the PESEL checksum are anonymized.
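The checksum step can be sketched as follows. This is a minimal illustration of the publicly documented PESEL check digit rule, not Ovexa's actual implementation; the function name `is_valid_pesel` is hypothetical.

```python
# Weights applied to the first 10 digits of a PESEL number.
PESEL_WEIGHTS = (1, 3, 7, 9, 1, 3, 7, 9, 1, 3)


def is_valid_pesel(value: str) -> bool:
    """Return True if `value` is 11 ASCII digits and its check digit matches."""
    if len(value) != 11 or not (value.isascii() and value.isdigit()):
        return False
    digits = [int(ch) for ch in value]
    # zip() stops after the 10 weights, so the check digit is excluded here.
    weighted_sum = sum(w * d for w, d in zip(PESEL_WEIGHTS, digits))
    expected_check = (10 - weighted_sum % 10) % 10
    return digits[10] == expected_check
```

A candidate that fails this test would be left in the prompt unchanged, which is what keeps arbitrary 11-digit strings such as order numbers from being replaced.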

NIP (Tax ID)


Detection	10-digit pattern + weighted checksum validation
Placeholder	`<NIP_N>`
Input	`NIP firmy: 5261040828`
Output	`NIP firmy: <NIP_1>`

Supports formats with and without dashes (e.g., 526-104-08-28).
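As a rough sketch, the weighted checksum for NIP can be written like this. It follows the publicly documented modulo 11 rule; the function name `is_valid_nip` and the separator handling are assumptions, not Ovexa's actual code.

```python
import re

# Weights applied to the first 9 digits of a NIP number.
NIP_WEIGHTS = (6, 5, 7, 2, 3, 4, 5, 6, 7)


def is_valid_nip(value: str) -> bool:
    """Validate a NIP given with or without dashes or spaces."""
    compact = re.sub(r"[\s-]", "", value)
    if not re.fullmatch(r"[0-9]{10}", compact):
        return False
    digits = [int(ch) for ch in compact]
    remainder = sum(w * d for w, d in zip(NIP_WEIGHTS, digits)) % 11
    # A remainder of 10 can never equal a single digit, so such numbers fail.
    return remainder == digits[9]
```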

REGON


Detection	9 or 14-digit pattern + checksum validation
Placeholder	`<REGON_N>`
Input	`REGON: 012100784`
Output	`REGON: <REGON_1>`
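The REGON checksum can be sketched as below, assuming the publicly documented weights for the 9-digit and 14-digit variants. The helper names are hypothetical and this is not Ovexa's actual implementation.

```python
# Weights for the 9-digit REGON (first 8 digits) and the 14-digit REGON
# (first 13 digits).
REGON9_WEIGHTS = (8, 9, 2, 3, 4, 5, 6, 7)
REGON14_WEIGHTS = (2, 4, 8, 5, 0, 9, 7, 3, 6, 1, 2, 4, 8)


def _regon_check_digit(digits, weights):
    """Weighted sum modulo 11; a remainder of 10 maps to check digit 0."""
    remainder = sum(w * d for w, d in zip(weights, digits)) % 11
    return 0 if remainder == 10 else remainder


def is_valid_regon(value: str) -> bool:
    """Validate a 9-digit or 14-digit REGON."""
    if len(value) not in (9, 14) or not (value.isascii() and value.isdigit()):
        return False
    digits = [int(ch) for ch in value]
    # The first 9 digits of a 14-digit REGON form a 9-digit REGON themselves.
    if digits[8] != _regon_check_digit(digits[:8], REGON9_WEIGHTS):
        return False
    if len(digits) == 14:
        return digits[13] == _regon_check_digit(digits[:13], REGON14_WEIGHTS)
    return True
```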

ID Card Number (Dowód osobisty)


Detection	3 letters + 6 digits pattern + checksum validation
Placeholder	`<PL_ID_CARD_N>`
Input	`Dowód: ABS123456`
Output	`Dowód: <PL_ID_CARD_1>`
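For illustration, the ID card checksum can be sketched as follows. It uses the publicly documented rule in which letters map to the values 10 to 35 and the first digit acts as the check digit; the function name `is_valid_id_card` is hypothetical and the normalization step is an assumption.

```python
import re

# Weights for all 9 characters; the check digit (index 3) carries weight 9,
# so a valid number yields a weighted sum divisible by 10.
ID_CARD_WEIGHTS = (7, 3, 1, 9, 7, 3, 1, 7, 3)


def is_valid_id_card(value: str) -> bool:
    """Validate a Polish ID card number such as 'ABS123456'."""
    compact = value.replace(" ", "").upper()
    if not re.fullmatch(r"[A-Z]{3}[0-9]{6}", compact):
        return False
    # Letters map to 10..35 (A=10, B=11, ...); digits keep their own value.
    values = [
        ord(ch) - ord("A") + 10 if ch.isalpha() else int(ch)
        for ch in compact
    ]
    total = sum(w * v for w, v in zip(ID_CARD_WEIGHTS, values))
    return total % 10 == 0
```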

Passport Number


Detection	2 letters + 7 digits (ICAO standard for Polish passports)
Placeholder	`<PASSPORT_N>`
Input	`Paszport: EA1234567`
Output	`Paszport: <PASSPORT_1>`

Phone Number


Detection	Regex for Polish phone formats (+48, 0048, 9-digit)
Placeholder	`<PHONE_NUMBER_N>`
Input	`Telefon: +48 600 123 456`
Output	`Telefon: <PHONE_NUMBER_1>`

Supports formats: +48600123456, +48 600 123 456, 600-123-456, 48 600 123 456.
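The formats listed above could be matched with a pattern along these lines. This is only an illustrative sketch: the regular expression and the function name `find_phone_numbers` are assumptions, and Ovexa's actual rules may differ.

```python
import re

PHONE_RE = re.compile(
    r"(?<![0-9])"                    # do not start inside a longer digit run
    r"(?:(?:\+48|0048|48)[ -]?)?"    # optional country prefix
    r"[0-9]{3}[ -]?[0-9]{3}[ -]?[0-9]{3}"  # 9 digits, optionally grouped
    r"(?![0-9])"                     # do not stop inside a longer digit run
)


def find_phone_numbers(text: str) -> list[str]:
    """Return every substring that looks like a Polish phone number."""
    return [match.group(0) for match in PHONE_RE.finditer(text)]
```

The two lookarounds keep the pattern from firing on a slice of a longer identifier, for example the first nine digits of an 11-digit PESEL.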

Email Address


Detection	Standard email regex pattern
Placeholder	`<EMAIL_ADDRESS_N>`
Input	`Email: [email protected]`
Output	`Email: <EMAIL_ADDRESS_1>`

Address (Adres)


Detection	Polish address patterns with prefix keywords (ul., al., os., pl.) + building numbers
Placeholder	`<PL_ADDRESS_N>`
Input	`Adres: ul. Marszałkowska 10/5, 00-624 Warszawa`
Output	`Adres: <PL_ADDRESS_1>`

Recognizes common prefixes: ul., ulica, al., aleja, os., osiedle, pl., plac.

Postal Code (Kod pocztowy)


Detection	Pattern: `XX-XXX` (2 digits, dash, 3 digits)
Placeholder	`<PL_POSTAL_CODE_N>`
Input	`Kod: 00-624`
Output	`Kod: <PL_POSTAL_CODE_1>`

Date of Birth (Data urodzenia)


Detection	Date patterns in context of birth/age keywords
Placeholder	`<DATE_OF_BIRTH_N>`
Input	`Data urodzenia: 14.02.2002`
Output	`Data urodzenia: <DATE_OF_BIRTH_1>`

Supports formats: DD.MM.YYYY, DD-MM-YYYY, DD/MM/YYYY, YYYY-MM-DD.

IBAN (Bank Account)


Detection	Polish IBAN pattern: `PL` + 26 digits, with optional spaces
Placeholder	`<IBAN_CODE_N>`
Input	`Konto: PL61 1090 1014 0000 0712 1981 2874`
Output	`Konto: <IBAN_CODE_1>`

Also detects 26-digit Polish account numbers without the PL prefix.
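A pattern of this kind might look like the sketch below. The regular expression and function names are assumptions. The mod-97 check comes from the IBAN standard and is shown as an optional extra filter; this page does not state that Ovexa applies it.

```python
import re

# 'PL' (optional) + 2 check digits + 24 account digits, optionally grouped
# in blocks of four separated by single spaces.
IBAN_RE = re.compile(r"\b(?:PL ?)?[0-9]{2}(?: ?[0-9]{4}){6}\b")


def iban_mod97_ok(value: str) -> bool:
    """Apply the standard IBAN mod-97 check to a Polish account number."""
    compact = value.replace(" ", "").upper()
    if compact.isdigit():
        compact = "PL" + compact  # bare 26-digit account number
    if not re.fullmatch(r"PL[0-9]{26}", compact):
        return False
    # Move the country code and check digits to the end, then map
    # letters to numbers (A=10 ... Z=35) and test the remainder.
    rearranged = compact[4:] + compact[:4]
    numeric = "".join(str(int(ch, 36)) for ch in rearranged)
    return int(numeric) % 97 == 1
```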

Credit Card Number


Detection	16-digit patterns + Luhn algorithm validation
Placeholder	`<CREDIT_CARD_N>`
Input	`Karta: 4532 0151 2345 6789`
Output	`Karta: <CREDIT_CARD_1>`
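The Luhn validation step can be sketched as follows. The algorithm itself is standard; the function name `luhn_ok` and the handling of spaces and dashes are assumptions rather than Ovexa's actual code.

```python
import re


def luhn_ok(number: str) -> bool:
    """Return True if `number` is a 16-digit card number passing the Luhn check."""
    compact = re.sub(r"[ -]", "", number)
    if not re.fullmatch(r"[0-9]{16}", compact):
        return False
    total = 0
    # Walk from the rightmost digit and double every second digit.
    for index, ch in enumerate(reversed(compact)):
        digit = int(ch)
        if index % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9  # same as summing the two digits of the product
        total += digit
    return total % 10 == 0
```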

KRS (National Court Register)


Detection	10-digit pattern with context keywords
Placeholder	`<KRS_N>`
Input	`KRS: 0000123456`
Output	`KRS: <KRS_1>`

Supports formats with and without leading zeros.

Driving License (Prawo jazdy)


Detection	Pattern: `XXXXX/XX/XXXX` (e.g., `12345/06/1234`)
Placeholder	`<PL_DRIVING_LICENSE_N>`
Input	`Prawo jazdy: 12345/06/1234`
Output	`Prawo jazdy: <PL_DRIVING_LICENSE_1>`

Vehicle Registration Plate (Tablica rejestracyjna)


Detection	Polish plate format: 2-3 letters + space + 4-5 alphanumeric characters
Placeholder	`<PL_VEHICLE_PLATE_N>`
Input	`Rejestracja: WA 12345`
Output	`Rejestracja: <PL_VEHICLE_PLATE_1>`

The plate pattern alone carries a low base confidence score, so a match is anonymized only when strong context keywords appear nearby. This avoids false positives.

Location (Miejscowość)


Detection	HerBERT NER + Polish city name dictionary with grammatical inflection
Placeholder	`<LOCATION_N>`
Input	`Mieszka w Warszawie`
Output	`Mieszka w <LOCATION_1>`

Handles Polish inflected forms (Warszawa / Warszawie / Warszawy).

Art. 9 RODO Sensitive Data


Detection	Keyword matching with Polish inflection (see Art. 9 RODO details)
Action	Flagged (not anonymized) -- returns a warning in the response
Input	`Pacjent ma cukrzycę i jest leczony insuliną`
Output	Request proceeds with a warning flag about sensitive data categories

Warning: Art. 9 data is flagged, not anonymized. Ovexa detects its presence and adds a warning to the response metadata. Your application should handle this flag according to your compliance requirements.

Multiple PII in a Single Prompt

When a prompt contains multiple PII items, each gets a unique numbered placeholder:

Input:

```
Klient Jan Kowalski (PESEL 02271409862) mieszka przy ul. Marszałkowska 10,
00-624 Warszawa. Kontakt: +48 600 123 456, [email protected].
NIP firmy: 5261040828.
```

After anonymization:

```
Klient <PERSON_1> (PESEL <PESEL_1>) mieszka przy <PL_ADDRESS_1>,
<PL_POSTAL_CODE_1> Warszawa. Kontakt: <PHONE_NUMBER_1>, <EMAIL_ADDRESS_1>.
NIP firmy: <NIP_1>.
```
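The numbering scheme can be illustrated with a short sketch. It assumes detection has already produced non-overlapping `(start, end, type)` spans and that counters run per PII type in order of appearance; the function name `anonymize` and the returned mapping are hypothetical, not Ovexa's actual interface.

```python
from collections import defaultdict


def anonymize(text: str, spans: list[tuple[int, int, str]]) -> tuple[str, dict[str, str]]:
    """Replace each detected span with a typed, numbered placeholder.

    `spans` holds non-overlapping (start, end, pii_type) tuples.
    Returns the anonymized text and a placeholder -> original value mapping.
    """
    counters: dict[str, int] = defaultdict(int)
    mapping: dict[str, str] = {}
    pieces: list[str] = []
    cursor = 0
    for start, end, pii_type in sorted(spans):
        counters[pii_type] += 1
        placeholder = f"<{pii_type}_{counters[pii_type]}>"
        mapping[placeholder] = text[start:end]
        pieces.append(text[cursor:start])  # untouched text before the span
        pieces.append(placeholder)
        cursor = end
    pieces.append(text[cursor:])  # untouched tail
    return "".join(pieces), mapping
```

Keeping the mapping only in memory, as in this sketch, matches the statement that original values are never written to disk or logged.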

Configuration

You can customize PII detection settings through the API:

```shell
# Get current PII settings
curl https://api.ovexa.ai/v1/settings/pii \
  -H "Authorization: Bearer vpx_live_YOUR_API_KEY"

# Update settings (e.g., disable specific PII types)
curl -X PATCH https://api.ovexa.ai/v1/settings/pii \
  -H "Authorization: Bearer vpx_live_YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "enabled_types": ["pesel", "nip", "names", "phone", "email"],
    "art9_detection": true,
    "art9_action": "flag"
  }'
```
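The same update could be issued from Python using only the standard library. This is a sketch that mirrors the curl call above: the endpoint, header, and field names are taken from that example, while the function name `build_pii_settings_request` is hypothetical. The request is built but not sent, so it can be inspected first.

```python
import json
import urllib.request

API_URL = "https://api.ovexa.ai/v1/settings/pii"


def build_pii_settings_request(
    api_key: str,
    enabled_types: list[str],
    art9_detection: bool = True,
    art9_action: str = "flag",
) -> urllib.request.Request:
    """Build (but do not send) a PATCH request that updates PII settings."""
    payload = {
        "enabled_types": enabled_types,
        "art9_detection": art9_detection,
        "art9_action": art9_action,
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        method="PATCH",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


# To send it (response format not documented on this page):
# with urllib.request.urlopen(request) as response:
#     print(response.status)
```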

Next Steps

Art. 9 RODO -- Details on sensitive data detection
Security -- How Ovexa handles data securely
API Reference -- Full endpoint documentation
