Entity Types

Entity Types & Detection

PrivaiShield detects 19+ entity types out of the box. You can also define custom patterns, configure allowlists, and tune confidence thresholds.

Built-in Entity Types

These entity types are detected automatically. Types marked "enhanced recommended" are most accurate in enhanced mode; they still work in standard mode, but with pattern-based detection only.

PERSON (enhanced recommended)

Personal names (first, last, full)

John Smith; Dr. Jane Doe; Robert J. Johnson
EMAIL

Email addresses

john@example.com; support+tag@company.io
PHONE

Phone numbers (US, international)

555-123-4567; +1 (555) 123-4567; 555.123.4567
SSN

Social Security Numbers

123-45-6789; 123 45 6789
ADDRESS (enhanced recommended)

Street addresses

123 Main St, Boston MA 02101; 456 Oak Ave Apt 2B
DATE

Dates of birth, calendar dates

01/15/1990; March 15, 1990; 1990-01-15
CREDIT_CARD

Credit/debit card numbers

4532-8821-9934-1122; 4532 8821 9934 1122
BANK_ACCOUNT

Bank account and routing numbers

Account #12345678; Routing: 021000021
MRN

Medical Record Numbers

MRN: 847293; MRN-12345678
IP_ADDRESS

IPv4 and IPv6 addresses

192.168.1.1; 2001:0db8:85a3::8a2e:0370:7334
URL

Web URLs

https://example.com/path; ftp://files.example.org
DRIVER_LICENSE

Driver's license numbers

DL# A1234567; License: 123456789
PASSPORT

Passport numbers

Passport: 123456789; PP# A12345678
VIN

Vehicle Identification Numbers

1HGBH41JXMN109186
ORGANIZATION (enhanced recommended)

Company and organization names

Acme Corporation; IBM; Stanford University
LOCATION (enhanced recommended)

Geographic locations (cities, states, countries)

Boston; California; United States
DATE_TIME

Timestamps and date-time combinations

2024-01-15T10:30:00Z; Jan 15 at 10:30 AM
AGE (enhanced recommended)

Age values

35 years old; age 42; 42yo
MEDICAL_CONDITION (enhanced recommended)

Medical conditions and diagnoses

Type 2 Diabetes; hypertension; COVID-19

Custom Patterns

Define custom entity types using regex patterns. Custom patterns are applied alongside built-in detectors.

API Request with Custom Patterns (JSON)
{
  "text": "Employee ID: EMP-12345, Badge: BDG-A9876",
  "mode": "enhanced",
  "customPatterns": [
    {
      "type": "EMPLOYEE_ID",
      "pattern": "EMP-\\d{5}",
      "description": "Internal employee ID"
    },
    {
      "type": "BADGE_NUMBER",
      "pattern": "BDG-[A-Z]\\d{4}",
      "description": "Building badge number"
    }
  ]
}
Response (JSON)
{
  "redacted": "Employee ID: [EMPLOYEE_ID_1], Badge: [BADGE_NUMBER_1]",
  "entities": [
    {
      "type": "EMPLOYEE_ID",
      "token": "[EMPLOYEE_ID_1]",
      "confidence": 1.0
    },
    {
      "type": "BADGE_NUMBER",
      "token": "[BADGE_NUMBER_1]",
      "confidence": 1.0
    }
  ]
}
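
To make the token format in the response concrete, here is a rough local approximation in Python. It is a sketch only, not the service's implementation: the function name `redact_locally` is hypothetical, and numbering every match with a fresh index per type is an assumption, since the response above shows only one match per type.

```python
import re


def redact_locally(text, custom_patterns):
    """Replace each custom-pattern match with a numbered [TYPE_N] token."""
    entities = []
    for spec in custom_patterns:
        count = 0  # per-type counter, restarted for every pattern

        def to_token(match, spec=spec):
            nonlocal count
            count += 1
            token = f"[{spec['type']}_{count}]"
            entities.append(
                {"type": spec["type"], "token": token, "confidence": 1.0}
            )
            return token

        # re.IGNORECASE approximates the documented case-insensitive default.
        text = re.sub(spec["pattern"], to_token, text, flags=re.IGNORECASE)
    return {"redacted": text, "entities": entities}


result = redact_locally(
    "Employee ID: EMP-12345, Badge: BDG-A9876",
    [
        {"type": "EMPLOYEE_ID", "pattern": r"EMP-\d{5}"},
        {"type": "BADGE_NUMBER", "pattern": r"BDG-[A-Z]\d{4}"},
    ],
)
print(result["redacted"])
# Employee ID: [EMPLOYEE_ID_1], Badge: [BADGE_NUMBER_1]
```

Because patterns are applied one after another, a later pattern could in principle match inside an earlier token; a more careful version would collect all matches first and resolve overlaps before substituting.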
Pattern Tips
  • Use double backslashes (\\d) in JSON for regex escapes
  • Patterns are case-insensitive by default
  • Custom patterns have confidence 1.0 (exact match)
  • Test patterns in the API Playground first
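
The escaping tip can also be checked locally before sending a request. The sketch below uses only Python's standard `json` and `re` modules. It assumes the service's regex dialect behaves like Python's for simple patterns such as this one, which this page does not confirm.

```python
import json
import re

# The regex itself contains a single backslash: EMP-\d{5}
pattern = r"EMP-\d{5}"

# Serializing to JSON doubles the backslash, as in the request example.
payload = json.dumps({"type": "EMPLOYEE_ID", "pattern": pattern})
print(payload)
# {"type": "EMPLOYEE_ID", "pattern": "EMP-\\d{5}"}

# Parsing the JSON restores the single-backslash regex.
restored = json.loads(payload)["pattern"]
assert restored == pattern

# Approximate the documented case-insensitive default with re.IGNORECASE.
matcher = re.compile(restored, re.IGNORECASE)
print(matcher.findall("Employee ID: EMP-12345, alias emp-67890"))
# ['EMP-12345', 'emp-67890']
```

Letting a JSON library do the serialization avoids hand-counting backslashes, which is where most escaping mistakes come from.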
JavaScript SDK
const result = await client.redact({
  text: "Case #: CASE-2024-0001",
  customPatterns: [
    {
      type: "CASE_NUMBER",
      pattern: "CASE-\\d{4}-\\d{4}",
      description: "Legal case number"
    }
  ]
});
Python SDK
result = client.redact(
    text="Case #: CASE-2024-0001",
    custom_patterns=[
        {
            "type": "CASE_NUMBER",
            "pattern": r"CASE-\d{4}-\d{4}",
            "description": "Legal case number"
        }
    ]
)

Allowlists

Prevent specific values from being redacted using allowlists. Useful for company names, product names, or other terms that shouldn't be treated as PII.

API Request with Allowlist (JSON)
{
  "text": "Contact John Smith at Acme Corporation headquarters",
  "mode": "enhanced",
  "allowlist": {
    "ORGANIZATION": ["Acme Corporation", "Acme Corp"],
    "PERSON": ["John Smith"]
  }
}
Response, values preserved (JSON)
{
  "redacted": "Contact John Smith at Acme Corporation headquarters",
  "entities": [],
  "allowlistMatches": [
    {"type": "PERSON", "value": "John Smith"},
    {"type": "ORGANIZATION", "value": "Acme Corporation"}
  ]
}
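
The request and response above suggest that allowlisted values are checked per entity type and reported separately instead of being redacted. The Python sketch below illustrates that behavior locally. The `detections` structure and the `apply_allowlist` helper are hypothetical, and exact, case-sensitive matching is an assumption; the service may compare values differently.

```python
def apply_allowlist(detections, allowlist):
    """Split detections into values to redact and values the allowlist preserves."""
    to_redact, allowlist_matches = [], []
    for item in detections:
        allowed_values = allowlist.get(item["type"], [])
        if item["value"] in allowed_values:
            # Preserved in the text and reported, mirroring "allowlistMatches".
            allowlist_matches.append(
                {"type": item["type"], "value": item["value"]}
            )
        else:
            to_redact.append(item)
    return to_redact, allowlist_matches


detections = [
    {"type": "PERSON", "value": "John Smith"},
    {"type": "ORGANIZATION", "value": "Acme Corporation"},
    {"type": "EMAIL", "value": "john@example.com"},
]
allowlist = {
    "ORGANIZATION": ["Acme Corporation", "Acme Corp"],
    "PERSON": ["John Smith"],
}

to_redact, matches = apply_allowlist(detections, allowlist)
print(to_redact)  # only the EMAIL detection remains
print(matches)    # the PERSON and ORGANIZATION values are preserved
```

Keying the allowlist by entity type means "John Smith" is preserved only when it is detected as a PERSON, not when the same string is detected as some other type.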
Global Allowlist

Configure a global allowlist in your account settings that applies to all API requests.

Request-level Allowlist

Pass an allowlist with each request to override or extend the global allowlist for specific use cases.

JavaScript SDK
const result = await client.redact({
  text: "Contact support@privaishield.com for help",
  allowlist: {
    EMAIL: ["support@privaishield.com", "sales@privaishield.com"],
    ORGANIZATION: ["PrivaiShield"]
  }
});

Confidence Tuning

Each detected entity includes a confidence score (0-1). You can set minimum confidence thresholds to control detection sensitivity.

Threshold | Behavior | Use Case
0.5 | Aggressive: catch more potential PII | High-security environments
0.7 | Balanced: good accuracy with few false positives | General use (default)
0.9 | Conservative: only high-confidence matches | When false positives are costly
Global Confidence Threshold (JSON)
{
  "text": "Contact Dr. Smith at the clinic",
  "mode": "enhanced",
  "confidence": 0.8
}
Per-Entity Confidence (JSON)
{
  "text": "Contact Dr. Smith at the clinic",
  "mode": "enhanced",
  "entityConfidence": {
    "PERSON": 0.9,
    "ORGANIZATION": 0.7,
    "LOCATION": 0.8
  }
}
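
One way to read the two request shapes above is that a per-entity threshold, when present, takes the place of the global one for that type. The Python sketch below illustrates that reading. It is an assumption, as is the use of `>=` for the comparison; `filter_by_confidence` and the sample scores are hypothetical.

```python
DEFAULT_THRESHOLD = 0.7  # the documented default


def filter_by_confidence(detections, confidence=DEFAULT_THRESHOLD,
                         entity_confidence=None):
    """Keep detections whose score meets the threshold for their type."""
    entity_confidence = entity_confidence or {}
    kept = []
    for item in detections:
        # Assumed precedence: per-entity value first, then the global value.
        threshold = entity_confidence.get(item["type"], confidence)
        if item["confidence"] >= threshold:
            kept.append(item)
    return kept


detections = [
    {"type": "PERSON", "value": "Dr. Smith", "confidence": 0.85},
    {"type": "ORGANIZATION", "value": "the clinic", "confidence": 0.75},
    {"type": "LOCATION", "value": "the clinic", "confidence": 0.60},
]

# Global default of 0.7: PERSON and ORGANIZATION pass, LOCATION does not.
print([d["type"] for d in filter_by_confidence(detections)])
# ['PERSON', 'ORGANIZATION']

# Per-entity thresholds from the request above: only ORGANIZATION passes.
per_entity = {"PERSON": 0.9, "ORGANIZATION": 0.7, "LOCATION": 0.8}
print([d["type"] for d in
       filter_by_confidence(detections, entity_confidence=per_entity)])
# ['ORGANIZATION']
```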
Confidence Score Factors
  • Pattern match strength: Exact regex matches score higher than fuzzy matches
  • Context signals: Keywords such as 'SSN:' or 'email:' boost confidence
  • Format validation: Values that pass a format check (for example, the Luhn check for card numbers) score higher
  • ML model confidence: Enhanced mode uses model probability scores
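
The Luhn check named under format validation is a standard checksum for payment card numbers. A minimal Python version is sketched below for reference. How PrivaiShield weighs the result in its score is not described here, so this shows only the check itself.

```python
def luhn_valid(number: str) -> bool:
    """Return True if the digits in `number` pass the Luhn checksum."""
    digits = [int(ch) for ch in number if ch.isdigit()]  # ignore spaces, dashes
    if len(digits) < 2:
        return False
    total = 0
    # Walk right to left, doubling every second digit.
    for index, digit in enumerate(reversed(digits)):
        if index % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9  # same as adding the two digits of the product
        total += digit
    return total % 10 == 0


print(luhn_valid("4111 1111 1111 1111"))  # True (a common test card number)
print(luhn_valid("4111 1111 1111 1112"))  # False (last digit changed)
```

A candidate that passes this check is more likely to be a real card number than an arbitrary 16-digit string, which is why a detector can treat it as a stronger signal.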

Selecting Specific Entity Types

By default, all entity types are detected. You can specify which types to detect to improve performance or focus on specific PII categories.

Detect Only Specific Types (JSON)
{
  "text": "Contact john@example.com or call 555-123-4567",
  "mode": "enhanced",
  "entities": ["EMAIL", "PHONE"]
}
Exclude Specific Types (JSON)
{
  "text": "Contact john@example.com or call 555-123-4567",
  "mode": "enhanced",
  "excludeEntities": ["DATE", "URL", "IP_ADDRESS"]
}
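
The effect of the two fields on the set of built-in types can be pictured with a small Python sketch. The helper `resolve_entity_types` is hypothetical, and rejecting requests that set both fields is an assumption; this page does not say how `entities` and `excludeEntities` combine.

```python
# The 19 built-in type names listed earlier on this page.
BUILT_IN_TYPES = {
    "PERSON", "EMAIL", "PHONE", "SSN", "ADDRESS", "DATE", "CREDIT_CARD",
    "BANK_ACCOUNT", "MRN", "IP_ADDRESS", "URL", "DRIVER_LICENSE", "PASSPORT",
    "VIN", "ORGANIZATION", "LOCATION", "DATE_TIME", "AGE", "MEDICAL_CONDITION",
}


def resolve_entity_types(entities=None, exclude_entities=None):
    """Return the set of built-in types a request would detect."""
    if entities and exclude_entities:
        # Assumption: the two fields are treated as mutually exclusive.
        raise ValueError("use either entities or exclude_entities, not both")
    if entities:
        return set(entities)
    return BUILT_IN_TYPES - set(exclude_entities or [])


print(sorted(resolve_entity_types(entities=["EMAIL", "PHONE"])))
# ['EMAIL', 'PHONE']
print(len(resolve_entity_types(exclude_entities=["DATE", "URL", "IP_ADDRESS"])))
# 16
```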