Open Source Package

prompt-sanitizer

Keep PII and secrets out of your LLM provider's hands.

A local-first sanitization library for Python, TypeScript and Ruby — no cloud calls, no telemetry, no third-party APIs. Works entirely inside your process.

255 tests passing · 0.3 ms FAST-mode latency · 100% API key recall · 3 runtimes

Privacy by default

Detects and removes PII, API keys, and secrets from prompts before they reach any LLM provider. Works entirely inside your process — zero network calls.

Bidirectional vault

Replaces sensitive values with placeholders like [EMAIL_1], stores the mapping in a session vault, then restores originals in the model's response.

Zero-overhead option

FAST mode uses only regex — sub-millisecond latency, no model downloads, no GPU needed. Add local NER in SMART/FULL mode when you need name detection.
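To make the FAST-mode idea concrete, here is a minimal regex-only detector in Python. The patterns, the `Span` tuple, and `detect_fast` are illustrative assumptions, not prompt-sanitizer's actual rule set or API, and they cover only four of the supported types.

```python
import re
from typing import NamedTuple


class Span(NamedTuple):
    type: str   # entity type, e.g. "EMAIL"
    value: str  # the matched text
    start: int  # offset of the first character
    end: int    # offset one past the last character


# Illustrative patterns only (an assumption, not the library's rules).
PATTERNS = {
    "EMAIL": re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b"),
    "AWS_KEY": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "GITHUB_TOKEN": re.compile(r"\bghp_[A-Za-z0-9]{36}\b"),
    "JWT": re.compile(r"\beyJ[\w-]+\.[\w-]+\.[\w-]+"),
}


def detect_fast(text: str) -> list[Span]:
    """Run every pattern over the text and return matches in reading order."""
    spans = [
        Span(kind, m.group(), m.start(), m.end())
        for kind, pattern in PATTERNS.items()
        for m in pattern.finditer(text)
    ]
    return sorted(spans, key=lambda s: s.start)
```

Pure regex matching needs no model download and no GPU, which is the property FAST mode relies on.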

How it works

Five steps — every prompt goes through this pipeline

01

Prompt arrives

Your app receives a user prompt that may contain PII, secrets, or sensitive data.

02

Detect entities

The regex engine (FAST) scans every token for structured values such as emails and API keys; in SMART/FULL mode a local NER model also finds names, organizations, and locations.
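Detectors can disagree about the same stretch of text: a `DATABASE_URL` such as `postgres://u:pw@db.example.com/app` also contains an email-shaped substring. The page does not say how prompt-sanitizer resolves this, so the sketch below assumes a simple policy (keep the longest span, drop anything overlapping it). `Span` and `resolve_overlaps` are hypothetical names.

```python
from typing import NamedTuple


class Span(NamedTuple):
    type: str
    value: str
    start: int
    end: int


def resolve_overlaps(spans: list[Span]) -> list[Span]:
    """Keep non-overlapping spans, preferring longer ones (hypothetical policy)."""
    # Longest spans first; ties broken by the earlier start offset.
    ranked = sorted(spans, key=lambda s: (-(s.end - s.start), s.start))
    kept: list[Span] = []
    for span in ranked:
        # A span survives only if it lies entirely before or after every kept span.
        if all(span.end <= k.start or span.start >= k.end for k in kept):
            kept.append(span)
    return sorted(kept, key=lambda s: s.start)
```

With this policy the whole connection string becomes one `DATABASE_URL` entity instead of leaking the host around an `[EMAIL_1]` placeholder.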

03

Replace & vault

Each detected value is swapped with a typed placeholder like [EMAIL_1] and stored in a session vault.
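A minimal sketch of this step, assuming the detector has already produced non-overlapping spans. It also assumes that a repeated value reuses its placeholder, which the page does not state. `Span` and `anonymize` are hypothetical names, not the library's API.

```python
from typing import NamedTuple


class Span(NamedTuple):
    type: str
    value: str
    start: int
    end: int


def anonymize(text: str, spans: list[Span]) -> tuple[str, dict[str, str]]:
    """Swap each span for a typed placeholder; return (clean_text, vault)."""
    vault: dict[str, str] = {}                 # placeholder -> original value
    by_value: dict[tuple[str, str], str] = {}  # (type, value) -> placeholder
    counters: dict[str, int] = {}              # next number per entity type
    ordered = sorted(spans, key=lambda s: s.start)

    # Number placeholders in reading order, so [EMAIL_1] is the first email.
    labels = []
    for s in ordered:
        key = (s.type, s.value)
        if key not in by_value:
            counters[s.type] = counters.get(s.type, 0) + 1
            placeholder = f"[{s.type}_{counters[s.type]}]"
            by_value[key] = placeholder
            vault[placeholder] = s.value
        labels.append(by_value[key])

    # Replace right to left so the offsets of earlier spans stay valid.
    out = text
    for s, label in reversed(list(zip(ordered, labels))):
        out = out[:s.start] + label + out[s.end:]
    return out, vault
```

Replacing from the end of the string backwards avoids recomputing offsets after each substitution changes the text length.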

04

Send clean text

Only sanitized text with placeholders leaves your process. Your LLM provider never sees raw data.

05

Deanonymize response

After inference, originals are restored in the model's response before it reaches the user.
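Restoration can be a single regex pass over the model's reply. This is a sketch, assuming placeholders follow the `[TYPE_N]` shape used in this page's examples; `deanonymize` here is a standalone hypothetical function, not the library's session method.

```python
import re

# Matches placeholders such as [EMAIL_1] or [CREDIT_CARD_2].
PLACEHOLDER = re.compile(r"\[[A-Z][A-Z_]*_\d+\]")


def deanonymize(text: str, vault: dict[str, str]) -> str:
    """Restore originals; placeholders missing from the vault are left as-is."""
    return PLACEHOLDER.sub(lambda m: vault.get(m.group(), m.group()), text)
```

Doing it in one pass means a restored value that happens to look like a placeholder is never substituted a second time, and a placeholder the model invented (such as `[EMAIL_9]`) simply stays in the text.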

Live example (SMART mode: detecting the name "Alice" requires local NER)
```text
// Input to your app
"Email Alice at alice@example.com, JWT: eyJhbGci..."
// After sanitize()
"Email [PERSON_1] at [EMAIL_1], JWT: [JWT_1]"
// What your LLM receives
"Email [PERSON_1] at [EMAIL_1], JWT: [JWT_1]"
// LLM response (still safe)
"I'll draft a reply to [PERSON_1] at [EMAIL_1]"
// After deanonymize()
"I'll draft a reply to Alice at alice@example.com"
```
The LLM provider never saw "Alice" or her email.

Three modes

Choose based on your latency and accuracy needs

FAST (default)

Regex + secrets only

Best for high-throughput or edge workloads, or when you only need structured PII and API keys. No model downloads; 0.3 ms median latency.

  • 0.3 ms latency
  • 100% API key recall
  • Zero ML dependencies
  • Edge / serverless ready
SMART (recommended)

FAST + local NER

Adds detection of person names, organization names, and locations via a local transformer model. No cloud: the model runs on your machine.

  • ~88% person recall
  • Org & location NER
  • Piiranha (Python) / Xenova (JS)
  • Still fully local
FULL (compliance)

SMART + synthetic + audit

Generates realistic fake replacements (via Faker) instead of bare placeholders, and writes tamper-evident hashed audit events to SQLite.

  • Realistic fake values
  • Hashed audit events
  • SQLite or in-memory log
  • GDPR / HIPAA workflows
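The page does not specify how the "tamper-evident hashed audit events" work. One common way to get both properties is to store only a SHA-256 digest of each sensitive value and chain every row's hash to the previous row's, so a later edit breaks verification. The sketch below does that with Python's standard `sqlite3` and `hashlib`; the table layout and the `open_log`, `record`, and `verify` names are assumptions, not the schema behind `SQLiteAuditLog`.

```python
import hashlib
import json
import sqlite3

GENESIS = "0" * 64  # stand-in "previous hash" for the first row


def open_log(path: str = ":memory:") -> sqlite3.Connection:
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS audit ("
        "id INTEGER PRIMARY KEY, entity_type TEXT, value_hash TEXT, "
        "prev_hash TEXT, row_hash TEXT)"
    )
    return db


def _row_hash(entity_type: str, value_hash: str, prev_hash: str) -> str:
    payload = json.dumps([entity_type, value_hash, prev_hash])
    return hashlib.sha256(payload.encode()).hexdigest()


def record(db: sqlite3.Connection, entity_type: str, value: str) -> None:
    """Append one event. The raw value is never stored, only its SHA-256."""
    last = db.execute("SELECT row_hash FROM audit ORDER BY id DESC LIMIT 1").fetchone()
    prev_hash = last[0] if last else GENESIS
    value_hash = hashlib.sha256(value.encode()).hexdigest()
    db.execute(
        "INSERT INTO audit (entity_type, value_hash, prev_hash, row_hash) "
        "VALUES (?, ?, ?, ?)",
        (entity_type, value_hash, prev_hash, _row_hash(entity_type, value_hash, prev_hash)),
    )
    db.commit()


def verify(db: sqlite3.Connection) -> bool:
    """Recompute the chain. Editing any row, or deleting or reordering rows
    before the last one, makes this return False."""
    prev_hash = GENESIS
    rows = db.execute(
        "SELECT entity_type, value_hash, prev_hash, row_hash FROM audit ORDER BY id"
    ).fetchall()
    for entity_type, value_hash, stored_prev, stored_hash in rows:
        if stored_prev != prev_hash or _row_hash(entity_type, value_hash, prev_hash) != stored_hash:
            return False
        prev_hash = stored_hash
    return True
```

An unkeyed SHA-256 of a short value such as an SSN can be brute-forced, so a production log would more likely use a keyed hash (for example Python's `hmac` with a secret key).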

Installation

Zero configuration — one package, two runtimes

```bash
npm install prompt-sanitizer

# optional: local NER support (SMART/FULL mode)
npm install @huggingface/transformers

# optional: realistic fake replacements
npm install @faker-js/faker
```

Quick start

Full examples covering the three most common use cases

```python
from prompt_sanitizer import Mode, Sanitizer, SQLiteAuditLog

# ── FAST mode (default) ────────────────────────────────────
# Regex + secrets only, so the name "Alice" is not detected here.
s = Sanitizer()
result = s.sanitize("Hi, I'm Alice. My email is alice@example.com")
print(result.text)      # "Hi, I'm Alice. My email is [EMAIL_1]"
print(result.entities)  # [DetectedEntity(type=EMAIL, value="alice@example.com")]

# ── Bidirectional session ──────────────────────────────────
session = s.session()
clean = session.anonymize("Call Alice at (415) 867-5309")
reply = call_llm(clean)             # call_llm: your own LLM client; it sees "[PHONE_1]"
final = session.deanonymize(reply)  # originals restored in the response

# ── FULL mode with audit log ───────────────────────────────
audit = SQLiteAuditLog("./audit.db")
full  = Sanitizer(mode=Mode.FULL, audit_log=audit)
full.sanitize("Contact alice@example.com re: claim 123-45-6789")
print(audit.export(format="json", since="1d"))
```

Framework integrations

Drop-in support for the most popular AI and web frameworks

Vercel AI

Wrap any Vercel AI SDK generate call — PII is sanitized before the request and restored in the response automatically.

```typescript
import { generateText } from "ai";
import { openai } from "@ai-sdk/openai";
import { Sanitizer } from "prompt-sanitizer";
import { wrapGenerate } from "prompt-sanitizer/integrations/vercel-ai";

const sanitizer = new Sanitizer();
const safeGenerate = wrapGenerate(sanitizer, generateText);

// PII is sanitized before sending, restored in the response
const { text } = await safeGenerate({
  model: openai("gpt-4o"),
  prompt: "My email is alice@example.com. Summarize this.",
});
```
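The wrapper idea is not specific to TypeScript. Here is the same pattern sketched in Python: a higher-order function sanitizes the prompt, calls the model function, and restores originals in the reply. `wrap_generate` and its email-only regex are hypothetical stand-ins for illustration, not prompt-sanitizer's Python API or detector.

```python
import re
from typing import Callable

# Hypothetical sketch, not prompt-sanitizer's API. Toy detector: emails only.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
PLACEHOLDER = re.compile(r"\[EMAIL_\d+\]")


def wrap_generate(generate: Callable[[str], str]) -> Callable[[str], str]:
    """Return a function that sanitizes the prompt and restores the reply."""

    def safe_generate(prompt: str) -> str:
        vault: dict[str, str] = {}  # placeholder -> original, fresh per call

        def swap(m: re.Match) -> str:
            value = m.group()
            for placeholder, original in vault.items():
                if original == value:
                    return placeholder  # reuse for repeated values
            placeholder = f"[EMAIL_{len(vault) + 1}]"
            vault[placeholder] = value
            return placeholder

        clean = EMAIL.sub(swap, prompt)
        reply = generate(clean)  # the model only ever sees `clean`
        return PLACEHOLDER.sub(lambda m: vault.get(m.group(), m.group()), reply)

    return safe_generate
```

Creating the vault inside `safe_generate` keeps mappings from leaking between unrelated calls, which mirrors the per-session vault described above.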

Supported PII & secret types

Every detected type maps to a named placeholder

| Type | Example |
|---|---|
| EMAIL | alice@example.com |
| PHONE | (415) 867-5309 |
| SSN | 123-45-6789 |
| CREDIT_CARD | 4111 1111 1111 1111 |
| API_KEY | sk-proj-... |
| JWT_TOKEN | eyJhbGci... |
| PERSON_NAME | Dr. John Smith |
| ORGANIZATION | Acme Corp |
| IP_ADDRESS | 192.168.1.1 |
| IBAN | GB82WEST12345698765432 |
| AWS_KEY | AKIA... |
| GITHUB_TOKEN | ghp_... |
| PRIVATE_KEY | -----BEGIN RSA... |
| DATABASE_URL | postgres://user:pass@... |
| LOCATION | New York, NY |
| DATE | 1990-03-21 |
| CUSTOM | your own patterns |

Plus: ADDRESS, ZIP_CODE, PASSPORT, DRIVING_LICENSE, CRYPTO_ADDRESS, MAC_ADDRESS, and more.
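The CUSTOM type lets you add your own patterns. The page does not show the registration API, so the following is a hypothetical registry (`CustomPatterns`, `add`, and `find` are invented names) illustrating the idea: each user-defined type is a named regex, and matches come back in reading order.

```python
import re
from dataclasses import dataclass, field


@dataclass
class CustomPatterns:
    """Hypothetical registry of user-defined entity types."""

    patterns: dict[str, re.Pattern] = field(default_factory=dict)

    def add(self, entity_type: str, regex: str) -> None:
        # Compile once at registration time, not on every scan.
        self.patterns[entity_type] = re.compile(regex)

    def find(self, text: str) -> list[tuple[str, str]]:
        """Return (entity_type, value) pairs ordered by position in the text."""
        hits = [
            (m.start(), entity_type, m.group())
            for entity_type, pattern in self.patterns.items()
            for m in pattern.finditer(text)
        ]
        return [(entity_type, value) for _, entity_type, value in sorted(hits)]
```

For example, registering `EMPLOYEE_ID` as `\bEMP-\d{6}\b` would let an internal identifier get its own `[EMPLOYEE_ID_1]`-style placeholder.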

Feature comparison

See how prompt-sanitizer compares to other tools

Tools compared: prompt-sanitizer (FAST / SMART / FULL), Presidio (Python only), LLM Guard (Python only), OpenRedaction (JS only)

Features compared:
  • Runs fully local (no cloud)
  • Zero ML dependencies (base mode)
  • 100% API key / secret detection
  • Person & org name detection (NER)
  • Bidirectional vault (anonymize + restore)
  • Synthetic realistic replacements
  • Tamper-evident audit log
  • JavaScript / TypeScript support
  • Python support
  • Ruby / Rails support
  • Framework middleware (Express, Next.js…)
  • Custom entity patterns

Based on public documentation. See README for sources.

Start sanitizing today

One install. Zero cloud deps. Your LLM provider never sees raw PII.