Entity Types & Detection
PrivaiShield detects 19+ entity types out of the box. You can also define custom patterns, configure allowlists, and tune confidence thresholds.
Built-in Entity Types
These entity types are detected automatically. Types marked "enhanced" require enhanced mode for best accuracy; they still work in standard mode but with pattern-based detection only.
| Description | Examples |
|---|---|
| Personal names (first, last, full) | John Smith; Dr. Jane Doe; Robert J. Johnson |
| Email addresses | john@example.com; support+tag@company.io |
| Phone numbers (US, international) | 555-123-4567; +1 (555) 123-4567; 555.123.4567 |
| Social Security Numbers | 123-45-6789; 123 45 6789 |
| Street addresses | 123 Main St, Boston MA 02101; 456 Oak Ave Apt 2B |
| Dates of birth, calendar dates | 01/15/1990; March 15, 1990; 1990-01-15 |
| Credit/debit card numbers | 4532-8821-9934-1122; 4532 8821 9934 1122 |
| Bank account and routing numbers | Account #12345678; Routing: 021000021 |
| Medical Record Numbers | MRN: 847293; MRN-12345678 |
| IPv4 and IPv6 addresses | 192.168.1.1; 2001:0db8:85a3::8a2e:0370:7334 |
| Web URLs | https://example.com/path; ftp://files.example.org |
| Driver's license numbers | DL# A1234567; License: 123456789 |
| Passport numbers | Passport: 123456789; PP# A12345678 |
| Vehicle Identification Numbers | 1HGBH41JXMN109186 |
| Company and organization names | Acme Corporation; IBM; Stanford University |
| Geographic locations (cities, states, countries) | Boston; California; United States |
| Timestamps and date-time combinations | 2024-01-15T10:30:00Z; Jan 15 at 10:30 AM |
| Age values | 35 years old; age 42; 42yo |
| Medical conditions and diagnoses | Type 2 Diabetes; hypertension; COVID-19 |

Custom Patterns
Define custom entity types using regex patterns. Custom patterns are applied alongside built-in detectors.
{
  "text": "Employee ID: EMP-12345, Badge: BDG-A9876",
  "mode": "enhanced",
  "customPatterns": [
    {
      "type": "EMPLOYEE_ID",
      "pattern": "EMP-\\d{5}",
      "description": "Internal employee ID"
    },
    {
      "type": "BADGE_NUMBER",
      "pattern": "BDG-[A-Z]\\d{4}",
      "description": "Building badge number"
    }
  ]
}

{
  "redacted": "Employee ID: [EMPLOYEE_ID_1], Badge: [BADGE_NUMBER_1]",
  "entities": [
    {
      "type": "EMPLOYEE_ID",
      "token": "[EMPLOYEE_ID_1]",
      "confidence": 1.0
    },
    {
      "type": "BADGE_NUMBER",
      "token": "[BADGE_NUMBER_1]",
      "confidence": 1.0
    }
  ]
}

- Use double backslashes (\\d) in JSON for regex escapes
- Patterns are case-insensitive by default
- Custom patterns have confidence 1.0 (exact match)
- Test patterns in the API Playground first
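A pattern can also be sanity-checked locally before it is sent. The sketch below uses only Python's standard `re` and `json` modules, not the PrivaiShield SDK. `check_pattern` is a hypothetical helper, and the sketch assumes the service's regex dialect agrees with Python's for simple patterns like these. It compiles with `re.IGNORECASE` to mirror the case-insensitive default noted above.

```python
import json
import re


def check_pattern(pattern: str, text: str) -> list[str]:
    """Return every substring of `text` matched by `pattern`.

    Hypothetical local helper, not part of any SDK. Uses re.IGNORECASE
    to mirror the documented case-insensitive default.
    """
    return [m.group(0) for m in re.finditer(pattern, text, re.IGNORECASE)]


text = "Employee ID: EMP-12345, Badge: BDG-A9876"
patterns = {
    "EMPLOYEE_ID": r"EMP-\d{5}",
    "BADGE_NUMBER": r"BDG-[A-Z]\d{4}",
}

# Run each candidate pattern against the sample text.
matches = {name: check_pattern(p, text) for name, p in patterns.items()}
print(matches)

# json.dumps doubles each backslash, which is the escaping JSON requires.
payload = json.dumps({"type": "EMPLOYEE_ID", "pattern": patterns["EMPLOYEE_ID"]})
print(payload)
```

Serializing with `json.dumps` rather than writing the JSON by hand avoids having to count backslashes manually.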
const result = await client.redact({
  text: "Case #: CASE-2024-0001",
  customPatterns: [
    {
      type: "CASE_NUMBER",
      pattern: "CASE-\\d{4}-\\d{4}",
      description: "Legal case number"
    }
  ]
});

result = client.redact(
    text="Case #: CASE-2024-0001",
    custom_patterns=[
        {
            "type": "CASE_NUMBER",
            "pattern": r"CASE-\d{4}-\d{4}",
            "description": "Legal case number"
        }
    ]
)

Allowlists
Prevent specific values from being redacted using allowlists. Useful for company names, product names, or other terms that shouldn't be treated as PII.
{
  "text": "Contact John Smith at Acme Corporation headquarters",
  "mode": "enhanced",
  "allowlist": {
    "ORGANIZATION": ["Acme Corporation", "Acme Corp"],
    "PERSON": ["John Smith"]
  }
}

{
  "redacted": "Contact John Smith at Acme Corporation headquarters",
  "entities": [],
  "allowlistMatches": [
    {"type": "PERSON", "value": "John Smith"},
    {"type": "ORGANIZATION", "value": "Acme Corporation"}
  ]
}

Configure a global allowlist in your account settings that applies to all API requests.
Pass an allowlist with each request to override or extend the global allowlist for specific use cases.
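The request and response above suggest how an allowlist acts on detections: allowlisted values are reported under `allowlistMatches` and left out of `entities`. Here is a minimal sketch of that split, run on the client side. `apply_allowlist` is a hypothetical helper, the `value` field on detections and the exact-string comparison are assumptions, and the service's actual matching rules may differ.

```python
def apply_allowlist(detections, allowlist):
    """Split detections into (entities_to_redact, allowlist_matches).

    Hypothetical illustration only. Assumes each detection carries its
    matched text under "value" and that allowlist entries are compared
    as exact strings.
    """
    to_redact, matches = [], []
    for detection in detections:
        allowed_values = allowlist.get(detection["type"], [])
        if detection["value"] in allowed_values:
            matches.append({"type": detection["type"], "value": detection["value"]})
        else:
            to_redact.append(detection)
    return to_redact, matches


# Made-up detections for illustration.
detections = [
    {"type": "PERSON", "value": "John Smith"},
    {"type": "ORGANIZATION", "value": "Acme Corporation"},
    {"type": "PERSON", "value": "Jane Doe"},
]
allowlist = {
    "ORGANIZATION": ["Acme Corporation", "Acme Corp"],
    "PERSON": ["John Smith"],
}

to_redact, allowlist_matches = apply_allowlist(detections, allowlist)
print(to_redact)
print(allowlist_matches)
```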
const result = await client.redact({
  text: "Contact support@privaishield.com for help",
  allowlist: {
    EMAIL: ["support@privaishield.com", "sales@privaishield.com"],
    ORGANIZATION: ["PrivaiShield"]
  }
});

Confidence Tuning
Each detected entity includes a confidence score (0-1). You can set minimum confidence thresholds to control detection sensitivity.
| Threshold | Behavior | Use Case |
|---|---|---|
| 0.5 | Aggressive - catch more potential PII | High-security environments |
| 0.7 | Balanced - good accuracy with few false positives | General use (default) |
| 0.9 | Conservative - only high-confidence matches | When false positives are costly |
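To see what the thresholds in the table mean in practice, here is a minimal sketch of threshold filtering done on the client side. It is not PrivaiShield's implementation: `filter_by_confidence` is a hypothetical helper, the scores are made up, and treating the threshold as inclusive (`>=`) is an assumption. The optional `per_type` argument mimics a per-entity-type map such as the `entityConfidence` request field.

```python
def filter_by_confidence(entities, threshold=0.7, per_type=None):
    """Keep entities whose confidence meets the applicable threshold.

    Hypothetical illustration. `per_type` maps an entity type to its own
    threshold and takes precedence over the global `threshold`.
    """
    per_type = per_type or {}
    return [
        entity
        for entity in entities
        if entity["confidence"] >= per_type.get(entity["type"], threshold)
    ]


# Made-up detections for illustration.
detections = [
    {"type": "PERSON", "confidence": 0.95},
    {"type": "ORGANIZATION", "confidence": 0.75},
    {"type": "LOCATION", "confidence": 0.55},
]

for threshold in (0.5, 0.7, 0.9):
    kept = filter_by_confidence(detections, threshold)
    print(threshold, [entity["type"] for entity in kept])
```

A lower threshold keeps more candidates, so more text is redacted. A higher threshold keeps only the strongest matches.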
{
  "text": "Contact Dr. Smith at the clinic",
  "mode": "enhanced",
  "confidence": 0.8
}

{
  "text": "Contact Dr. Smith at the clinic",
  "mode": "enhanced",
  "entityConfidence": {
    "PERSON": 0.9,
    "ORGANIZATION": 0.7,
    "LOCATION": 0.8
  }
}

Several factors influence an entity's confidence score:
- Pattern match strength: Exact regex matches score higher than fuzzy matches
- Context signals: Keywords like 'SSN:', 'email:', etc. boost confidence
- Format validation: Values matching expected formats (Luhn check, etc.) score higher
- ML model confidence: Enhanced mode uses model probability scores
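As an illustration of the format validation mentioned above, the sketch below implements the standard Luhn checksum used for payment card numbers. How PrivaiShield weighs this check in its score is not stated here. `luhn_valid` is a hypothetical helper name, and `4111 1111 1111 1111` is a widely used test card number.

```python
def luhn_valid(number: str) -> bool:
    """Return True if the digits in `number` pass the Luhn checksum.

    Non-digit characters such as spaces and hyphens are ignored.
    """
    digits = [int(ch) for ch in number if ch.isdigit()]
    if len(digits) < 2:
        return False
    total = 0
    # Walk from the rightmost digit, doubling every second one.
    for index, digit in enumerate(reversed(digits)):
        if index % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9  # same as summing the two digits of the product
        total += digit
    return total % 10 == 0


print(luhn_valid("4111 1111 1111 1111"))
print(luhn_valid("4111-1111-1111-1112"))
```

A detector could treat a passing checksum as extra evidence that a 16-digit string is a card number rather than an arbitrary identifier.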
Selecting Specific Entity Types
By default, all entity types are detected. You can specify which types to detect to improve performance or focus on specific PII categories.
{
  "text": "Contact john@example.com or call 555-123-4567",
  "mode": "enhanced",
  "entities": ["EMAIL", "PHONE"]
}

{
  "text": "Contact john@example.com or call 555-123-4567",
  "mode": "enhanced",
  "excludeEntities": ["DATE", "URL", "IP_ADDRESS"]
}
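The two request bodies above can be produced by a small builder. This is a sketch, not SDK code: `build_redact_request` is a hypothetical helper, and rejecting a request that sets both `entities` and `excludeEntities` is a choice made here for clarity, not documented API behavior.

```python
import json


def build_redact_request(text, mode="enhanced", include=None, exclude=None):
    """Build a request body using the field names shown above.

    Hypothetical helper. Refusing to combine include and exclude lists is
    an assumption of this sketch, not a documented rule.
    """
    if include and exclude:
        raise ValueError("pass either include or exclude, not both")
    body = {"text": text, "mode": mode}
    if include:
        body["entities"] = list(include)
    if exclude:
        body["excludeEntities"] = list(exclude)
    return body


text = "Contact john@example.com or call 555-123-4567"
only_contact = build_redact_request(text, include=["EMAIL", "PHONE"])
skip_noise = build_redact_request(text, exclude=["DATE", "URL", "IP_ADDRESS"])

print(json.dumps(only_contact, indent=2))
print(json.dumps(skip_noise, indent=2))
```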