Open Source Package

prompt-sanitizer

Keep PII and secrets out of your LLM provider's hands.

A local-first sanitization library for Python, TypeScript and Ruby — no cloud calls, no telemetry, no third-party APIs. Works entirely inside your process.

255 tests passing
0.3 ms FAST latency
100% API key recall
Runtimes: Python, TypeScript, Ruby

GitHub npm PyPI RubyGems

Privacy by default

Detects and removes PII, API keys, and secrets from prompts before they reach any LLM provider. Works entirely inside your process — zero network calls.

Bidirectional vault

Replaces sensitive values with placeholders like [EMAIL_1], stores the mapping in a session vault, then restores originals in the model's response.

Zero-overhead option

FAST mode uses only regex — sub-millisecond latency, no model downloads, no GPU needed. Add local NER in SMART/FULL mode when you need name detection.

How it works

Five steps — every prompt goes through this pipeline

Prompt arrives

Your app receives a user prompt that may contain PII, secrets, or sensitive data.

Detect entities

The regex engine (FAST), optionally combined with a local NER model (SMART/FULL), scans the prompt for emails, names, API keys, and more.

Replace & vault

Each detected value is swapped with a typed placeholder like [EMAIL_1] and stored in a session vault.

Send clean text

Only sanitized text with placeholders leaves your process. Your LLM provider never sees raw data.

Deanonymize response

After inference, originals are restored in the model's response before it reaches the user.

Live example

// Input to your app

"Email Alice at alice@example.com, JWT: eyJhbGci..."

// After sanitize()

"Email [PERSON_1] at [EMAIL_1], JWT: [JWT_1]"

// What your LLM receives

"Email [PERSON_1] at [EMAIL_1], JWT: [JWT_1]"

// LLM response (still safe)

"I'll draft a reply to [PERSON_1] at [EMAIL_1]"

// After deanonymize()

"I'll draft a reply to Alice at alice@example.com"

LLM provider never saw "Alice" or her email
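
The replace-and-restore flow in the example above can be sketched with the standard library alone. This is a minimal illustration, not the library's implementation: `SketchVault`, its methods, and both regular expressions are hypothetical names invented here, and only emails are detected.

```python
import re

# Illustrative email pattern only; production detectors are more thorough.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PLACEHOLDER_RE = re.compile(r"\[[A-Z_]+_\d+\]")


class SketchVault:
    """Hypothetical session vault: maps typed placeholders to originals."""

    def __init__(self):
        self.by_placeholder = {}  # "[EMAIL_1]" -> "alice@example.com"
        self.by_value = {}        # "alice@example.com" -> "[EMAIL_1]"
        self.counters = {}        # "EMAIL" -> 1

    def placeholder_for(self, kind, value):
        # Reuse the same placeholder when a value repeats within the session.
        if value in self.by_value:
            return self.by_value[value]
        self.counters[kind] = self.counters.get(kind, 0) + 1
        placeholder = f"[{kind}_{self.counters[kind]}]"
        self.by_placeholder[placeholder] = value
        self.by_value[value] = placeholder
        return placeholder

    def anonymize(self, text):
        return EMAIL_RE.sub(
            lambda m: self.placeholder_for("EMAIL", m.group(0)), text
        )

    def deanonymize(self, text):
        # Placeholders the vault never issued are left untouched.
        return PLACEHOLDER_RE.sub(
            lambda m: self.by_placeholder.get(m.group(0), m.group(0)), text
        )


vault = SketchVault()
clean = vault.anonymize("Email alice@example.com, then cc alice@example.com")
reply = "I'll draft a reply to [EMAIL_1]"  # stand-in for the model's response
print(clean)                     # Email [EMAIL_1], then cc [EMAIL_1]
print(vault.deanonymize(reply))  # I'll draft a reply to alice@example.com
```

Keeping a reverse map (`by_value`) means a repeated value always gets the same placeholder, so the model can still tell that two mentions refer to the same thing.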

Three modes

Choose based on your latency and accuracy needs

FAST (Default)

Regex + secrets only

Best for high-throughput, edge workloads, or when you only care about structured PII and API keys. No model downloads. 0.3 ms median latency.

0.3 ms latency
100% API key recall
Zero ML dependencies
Edge / serverless ready
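
To give a feel for what regex-only detection looks like, here is a small sketch using Python's `re` module. The three patterns follow publicly documented token shapes (AWS access key IDs, classic GitHub personal access tokens, and three-segment JWTs); they are assumptions made for illustration, not the rules this package ships, and `find_secrets` and `redact` are hypothetical helpers.

```python
import re

# Illustrative patterns based on publicly documented token formats.
SECRET_PATTERNS = {
    "AWS_KEY": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "GITHUB_TOKEN": re.compile(r"\bghp_[A-Za-z0-9]{36}\b"),
    "JWT_TOKEN": re.compile(
        r"\beyJ[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+"
    ),
}


def find_secrets(text):
    """Return (type, start, end) tuples sorted by position in the text."""
    hits = []
    for kind, pattern in SECRET_PATTERNS.items():
        for match in pattern.finditer(text):
            hits.append((kind, match.start(), match.end()))
    return sorted(hits, key=lambda hit: hit[1])


def redact(text):
    # Replace from right to left so earlier offsets stay valid.
    for kind, start, end in reversed(find_secrets(text)):
        text = text[:start] + f"[{kind}]" + text[end:]
    return text


sample = "key=AKIAIOSFODNN7EXAMPLE token=ghp_" + "a" * 36
print(redact(sample))  # key=[AWS_KEY] token=[GITHUB_TOKEN]
```

Because every pattern is anchored on a fixed prefix, a scan like this needs no model and no network access, which is consistent with the latency profile described for FAST mode.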

SMART (Recommended)

FAST + local NER

Adds person names, organisation names, and locations via a local transformer model. No cloud — the model runs on your machine.

~88% person recall
Org & location NER
Piiranha (Python) / Xenova (JS)
Still fully local

FULL (Compliance)

SMART + synthetic + audit

Generates realistic fake replacements (via Faker) instead of bare placeholders, and writes tamper-evident hashed audit events to SQLite.

Realistic fake values
Hashed audit events
SQLite or in-memory log
GDPR / HIPAA workflows
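
One common way to make an audit log tamper-evident is a hash chain, where each event's hash covers the previous event's hash. The sketch below shows that idea with `hashlib` and `sqlite3`. It is an assumption about how such a log could work, not a description of the package's `SQLiteAuditLog`; `ChainedAuditLog` and its schema are hypothetical.

```python
import hashlib
import json
import sqlite3


def sha256_hex(data: str) -> str:
    return hashlib.sha256(data.encode("utf-8")).hexdigest()


class ChainedAuditLog:
    """Hypothetical hash-chained log; not the library's SQLiteAuditLog."""

    GENESIS = "0" * 64

    def __init__(self, path=":memory:"):
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS events ("
            "id INTEGER PRIMARY KEY, payload TEXT NOT NULL, "
            "prev_hash TEXT NOT NULL, hash TEXT NOT NULL)"
        )

    def append(self, entity_type, raw_value):
        # Store only a digest of the sensitive value, never the value itself.
        payload = json.dumps(
            {"type": entity_type, "value_sha256": sha256_hex(raw_value)},
            sort_keys=True,
        )
        row = self.db.execute(
            "SELECT hash FROM events ORDER BY id DESC LIMIT 1"
        ).fetchone()
        prev_hash = row[0] if row else self.GENESIS
        self.db.execute(
            "INSERT INTO events (payload, prev_hash, hash) VALUES (?, ?, ?)",
            (payload, prev_hash, sha256_hex(prev_hash + payload)),
        )
        self.db.commit()

    def verify(self):
        # Recompute every link; an edited row, or one removed from the
        # middle of the log, breaks the chain.
        expected_prev = self.GENESIS
        for payload, prev_hash, stored in self.db.execute(
            "SELECT payload, prev_hash, hash FROM events ORDER BY id"
        ):
            if prev_hash != expected_prev:
                return False
            if sha256_hex(prev_hash + payload) != stored:
                return False
            expected_prev = stored
        return True


log = ChainedAuditLog()
log.append("EMAIL", "alice@example.com")
log.append("SSN", "123-45-6789")
print(log.verify())  # True
```

A chain like this does not detect truncation of the newest rows unless the latest hash is also recorded somewhere else, and a plain SHA-256 of a low-entropy value such as an SSN can be brute-forced, so a keyed hash (HMAC) would be the safer choice in practice.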

Installation

Zero configuration — one package, two runtimes

bash

npm install prompt-sanitizer

# optional: local NER support (SMART/FULL mode)
npm install @huggingface/transformers

# optional: realistic fake replacements
npm install @faker-js/faker

Quick start

Full examples covering the three most common use cases

python

from prompt_sanitizer import Mode, Sanitizer, SQLiteAuditLog

# ── FAST mode (default) ────────────────────────────────────
s = Sanitizer()
result = s.sanitize("Hi, I'm Alice. My email is alice@example.com")
print(result.text)      # "Hi, I'm [PERSON_1]. My email is [EMAIL_1]"
print(result.entities)  # [DetectedEntity(type=PERSON, value="Alice"), ...]

# ── Bidirectional session ──────────────────────────────────
session = s.session()
clean = session.anonymize("Call Alice at (415) 867-5309")
# call_llm() stands in for your own LLM client call
reply = call_llm(clean)             # LLM sees "[PERSON_1]" not "Alice"
final = session.deanonymize(reply)  # originals restored in the response

# ── FULL mode with audit log ───────────────────────────────
audit = SQLiteAuditLog("./audit.db")
full  = Sanitizer(mode=Mode.FULL, audit_log=audit)
full.sanitize("Contact alice@example.com re: claim 123-45-6789")
print(audit.export(format="json", since="1d"))

Framework integrations

Drop-in support for the most popular AI and web frameworks

Vercel AI

Wrap any Vercel AI SDK generate call — PII is sanitized before the request and restored in the response automatically.

typescript

import { generateText } from "ai";
import { openai } from "@ai-sdk/openai";
import { Sanitizer } from "prompt-sanitizer";
import { wrapGenerate } from "prompt-sanitizer/integrations/vercel-ai";

const sanitizer = new Sanitizer();
const safeGenerate = wrapGenerate(sanitizer, generateText);

// PII is sanitized before sending, restored in the response
const { text } = await safeGenerate({
  model: openai("gpt-4o"),
  prompt: "My email is alice@example.com. Summarize this.",
});
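
The wrapping pattern itself is language-agnostic. As a rough Python sketch of the same idea (sanitize the prompt, call the model, restore the response), under the assumption that each call gets its own vault: `wrap_generate`, `OneShotSession`, and `fake_llm` are hypothetical names for illustration and say nothing about how `wrapGenerate` is implemented.

```python
import re

EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


class OneShotSession:
    """Hypothetical stand-in for a sanitizer session (emails only)."""

    def __init__(self):
        self.vault = {}  # placeholder -> original value

    def anonymize(self, text):
        def swap(match):
            placeholder = f"[EMAIL_{len(self.vault) + 1}]"
            self.vault[placeholder] = match.group(0)
            return placeholder

        return EMAIL_RE.sub(swap, text)

    def deanonymize(self, text):
        for placeholder, original in self.vault.items():
            text = text.replace(placeholder, original)
        return text


def wrap_generate(make_session, generate):
    """Wrap `generate` so it only ever receives sanitized prompts."""

    def safe_generate(prompt, **kwargs):
        session = make_session()  # fresh vault per request
        reply = generate(session.anonymize(prompt), **kwargs)
        return session.deanonymize(reply)

    return safe_generate


seen = []  # records what the fake model actually received


def fake_llm(prompt):
    seen.append(prompt)
    return "Noted: " + prompt


safe = wrap_generate(OneShotSession, fake_llm)
out = safe("My email is alice@example.com. Summarize this.")
print(seen[0])  # My email is [EMAIL_1]. Summarize this.
print(out)      # Noted: My email is alice@example.com. Summarize this.
```

Creating the session inside the wrapper keeps placeholder mappings from leaking between unrelated requests.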

Supported PII & secret types

Every detected type maps to a named placeholder

Type	Example
EMAIL	alice@example.com
PHONE	(415) 867-5309
SSN	123-45-6789
CREDIT_CARD	4111 1111 1111 1111
API_KEY	sk-proj-...
JWT_TOKEN	eyJhbGci...
PERSON_NAME	Dr. John Smith
ORGANIZATION	Acme Corp
IP_ADDRESS	192.168.1.1
IBAN	GB82WEST12345698765432
AWS_KEY	AKIA...
GITHUB_TOKEN	ghp_...
PRIVATE_KEY	-----BEGIN RSA...
DATABASE_URL	postgres://user:pass@...
LOCATION	New York, NY
DATE	1990-03-21
CUSTOM	your own patterns

Plus: ADDRESS, ZIP_CODE, PASSPORT, DRIVING_LICENSE, CRYPTO_ADDRESS, MAC_ADDRESS, and more.
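
For structured types such as CREDIT_CARD, a pattern match is often paired with a checksum so that arbitrary digit runs are not flagged. The sketch below combines a loose regex with the Luhn check. Whether this package validates checksums is not stated here, so treat it as one possible approach; `CARD_RE`, `luhn_valid`, and `find_cards` are hypothetical names.

```python
import re

# 13 to 19 digits, optionally separated by single spaces or hyphens.
CARD_RE = re.compile(r"\b(?:\d[ -]?){12,18}\d\b")


def luhn_valid(number: str) -> bool:
    """Luhn checksum: double every second digit counting from the right."""
    digits = [int(ch) for ch in number if ch.isdigit()]
    total = 0
    for index, digit in enumerate(reversed(digits)):
        if index % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0


def find_cards(text):
    """Return candidate card numbers that also pass the Luhn check."""
    return [
        match.group(0)
        for match in CARD_RE.finditer(text)
        if luhn_valid(match.group(0))
    ]


text = "Card 4111 1111 1111 1111, ref 4111 1111 1111 1112"
print(find_cards(text))  # ['4111 1111 1111 1111']
```

The second number has the right shape but fails the checksum, so it is left alone.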

Feature comparison

See how prompt-sanitizer compares to other tools

Feature	prompt-sanitizer (FAST / SMART / FULL)	Presidio (Python only)	LLM Guard (Python only)	OpenRedaction (JS only)
Runs fully local (no cloud)
Zero ML dependencies (base mode)
100% API key / secret detection
Person & org name detection (NER)
Bidirectional vault (anonymize + restore)
Synthetic realistic replacements
Tamper-evident audit log
JavaScript / TypeScript support
Python support
Ruby / Rails support
Framework middleware (Express, Next.js…)
Custom entity patterns

— partial support · Based on public documentation. See README for sources.

Start sanitizing today

One install. Zero cloud deps. Your LLM provider never sees raw PII.
