Security & Compliance

Technical documentation for security teams evaluating PII Scrambler for enterprise use. Every claim on this page is independently verifiable.

Zero transmission

No APIs, no uploads, no external calls that carry file data. File content never leaves the browser.

1.2M+ name dictionary

Names from 106 countries — the largest client-side name dictionary of any PII tool.

No server runtime

Static HTML/JS/CSS export. No Node.js, no backend, no database.

Fully auditable

Open source. Inspect the network tab yourself to verify.

Architecture Overview

PII Scrambler is built with Next.js using output: "export" — this produces a fully static site. The build output is plain HTML, JavaScript, and CSS files. There is no server-side code, no API routes, no database, and no runtime environment.

File processing runs entirely in a Web Worker — a separate browser thread that handles parsing, PII detection, and file rebuilding. The Web Worker communicates with the main thread exclusively through postMessage with transferable ArrayBuffers.

The deployed application consists of:

●Static HTML, JS bundles, and CSS
●Two name dictionary files (3.2 MB + 6.3 MB, served from same origin)
●A PDF.js worker script (for PDF text extraction)

No environment variables are used. No secrets or API keys exist anywhere in the codebase.

Data Flow

File selection

User selects a file via drag-drop or file picker. The file is read into an ArrayBuffer in browser memory.

Worker transfer

The ArrayBuffer is transferred (zero-copy) to a Web Worker thread. The main thread no longer holds this data.
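The ownership hand-off described above can be observed directly. The following is a minimal sketch, not the application's code: it uses structuredClone with a transfer list as a stand-in for worker.postMessage(message, [buffer]), which applies the same transfer rules, so the snippet runs without spinning up a worker. The sample bytes are arbitrary.

```typescript
// Sketch: transferring an ArrayBuffer moves ownership instead of copying it.
// structuredClone is used here only as a stand-in for worker.postMessage.
const original = new ArrayBuffer(3);
new Uint8Array(original).set([80, 73, 73]); // three bytes of sample data

// Transfer the buffer. The returned buffer now owns the underlying memory.
const moved: ArrayBuffer = structuredClone(original, { transfer: [original] });

// After the transfer the sender's buffer is detached and reports zero length,
// while the receiver sees all three bytes.
const senderBytes = original.byteLength;
const receiverBytes = moved.byteLength;
console.log({ senderBytes, receiverBytes });
```

Because the sender's buffer is detached rather than duplicated, the main thread cannot read the file bytes again once the worker owns them.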

Text extraction

The appropriate file processor extracts text content. PDF uses pdfjs-dist, DOCX uses jszip, XLSX uses the xlsx library. All libraries run in-browser.

PII detection

14 regex patterns scan for structured PII (emails, credit cards across all major networks, SSNs, NI numbers, dates of birth, IP addresses, phone numbers, postcodes, and VINs). Three-tier name detection — including a 1,229,656-name dictionary covering 106 countries — runs in parallel. All data is loaded from static files bundled with the app.
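As an illustration of the regex stage, here is a minimal sketch of a pattern table and scanner. The three patterns, the label names SSN and IP_ADDRESS, and the function name detectPii are simplified assumptions for this example; they are not the 14 patterns the tool ships.

```typescript
// Hypothetical, simplified pattern table. Real-world patterns need more care
// (for example, range checks on IP octets and validation of card numbers).
type PiiMatch = { label: string; start: number; end: number; text: string };

const PATTERNS: Array<{ label: string; regex: RegExp }> = [
  { label: "EMAIL", regex: /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/ },
  { label: "SSN", regex: /\b\d{3}-\d{2}-\d{4}\b/ },
  { label: "IP_ADDRESS", regex: /\b(?:\d{1,3}\.){3}\d{1,3}\b/ },
];

function detectPii(text: string): PiiMatch[] {
  const matches: PiiMatch[] = [];
  for (const { label, regex } of PATTERNS) {
    // Build a fresh global regex per scan so no lastIndex state leaks between calls.
    const scanner = new RegExp(regex.source, "g");
    let m: RegExpExecArray | null;
    while ((m = scanner.exec(text)) !== null) {
      matches.push({ label, start: m.index, end: m.index + m[0].length, text: m[0] });
    }
  }
  // Return matches in document order.
  return matches.sort((a, b) => a.start - b.start);
}

const found = detectPii("Mail jane@example.com from 10.0.0.1, SSN 123-45-6789.");
console.log(found.map((m) => `${m.label}:${m.text}`));
```

Returning character offsets rather than replaced text keeps detection separate from the rebuild step, so each file processor can apply the spans in its own format.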

File rebuild

The processor rebuilds the file in its original format with PII replaced by labels like [EMAIL], [NAME], [PHONE_NUMBER]. For PDFs, pages containing PII are rendered to images and rebuilt with the original text content stream removed — only non-PII text is re-added as a selectable layer. This ensures redacted text cannot be recovered via copy-paste or text extraction.
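The label substitution for plain text can be sketched as follows. This covers only the text-replacement idea; the PDF path (rendering pages to images and rebuilding the text layer) is not shown. The Span type and the redact function are hypothetical names, and the rule for overlapping spans (keep the one that starts first) is an assumption.

```typescript
type Span = { label: string; start: number; end: number };

// Replace each detected span with "[LABEL]".
function redact(text: string, spans: Span[]): string {
  // Drop spans that overlap an earlier one, keeping whichever starts first.
  const ordered = [...spans].sort((a, b) => a.start - b.start);
  const kept: Span[] = [];
  let cursor = 0;
  for (const span of ordered) {
    if (span.start >= cursor) {
      kept.push(span);
      cursor = span.end;
    }
  }
  // Apply replacements right to left so earlier offsets stay valid.
  let out = text;
  for (const { label, start, end } of kept.slice().reverse()) {
    out = out.slice(0, start) + `[${label}]` + out.slice(end);
  }
  return out;
}

const sample = "Contact Jane Doe at jane@example.com";
const redacted = redact(sample, [
  { label: "NAME", start: 8, end: 16 },
  { label: "EMAIL", start: 20, end: 36 },
]);
console.log(redacted); // Contact [NAME] at [EMAIL]
```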

Download

The cleaned file ArrayBuffer is transferred back to the main thread. A download is triggered via URL.createObjectURL. The object URL is immediately revoked and all references released for garbage collection.

At no point in this flow does data leave the browser's memory boundary. There is no network transmission of file content.

Network Activity

What you will see in the Network tab

On page load: HTML document, JS bundles, CSS, font files (DM Sans from Google Fonts, PX Grotesk from same origin), and two name dictionary files: /names-first.txt (3.2 MB) and /names-last.txt (6.3 MB) from same origin. These are preloaded static assets bundled with the app — no different from loading CSS or JS.

On first PDF process: /pdf.worker.min.mjs is loaded from same origin (Mozilla's PDF.js worker).

On file process: Nothing. All resources are already loaded and cached.

What you will NOT see

✕XHR/fetch to external domains

✕WebSocket connections

✕Tracking pixels or beacons

✕Google Analytics or any analytics

✕Mixpanel, Segment, Amplitude

✕Sentry, Bugsnag, or error reporting

✕CDN calls for PII processing

✕Any outbound POST requests

What It Won't Do

No server-side processing

No file uploads to any server

No data persistence (no localStorage, IndexedDB, or cookies for user data)

No user accounts or authentication

No analytics or telemetry of any kind

No third-party tracking scripts

No external API calls for PII processing

No error reporting to external services

No A/B testing frameworks

No session recording or heatmaps

No advertising or marketing SDKs

No data collection whatsoever

Technical Evidence

Static export configuration

// next.config.ts
import type { NextConfig } from "next";

const nextConfig: NextConfig = {
  output: "export",  // Fully static export, no server runtime
  webpack: (config) => {
    config.resolve.fallback = {
      fs: false,       // No fs polyfill in the browser bundle
      path: false,     // No path polyfill
      crypto: false,   // No crypto polyfill
      // ... all Node.js modules disabled
    };
    return config;
  },
};

export default nextConfig;

No API routes

The src/app/ directory contains only page routes and layout files. There is no api/ subdirectory. No server-side request handlers exist anywhere in the codebase.

Worker isolation

The Web Worker runs in a separate thread within the same browser security context. Data is transferred via postMessage using Transferable ArrayBuffers (zero-copy, single-owner semantics). The worker terminates after each file is processed.

Dependency Audit

Package	Purpose	Network activity
next	Static site generation framework	None
react / react-dom	UI rendering	None
pdfjs-dist	PDF text extraction (Mozilla PDF.js)	None
pdf-lib	PDF modification (content stream replacement, image embedding, text layer rebuild)	None
jszip	ZIP manipulation (DOCX files are ZIP archives)	None
xlsx	Excel spreadsheet parsing	None
papaparse	CSV parsing	None
compromise	NLP library for named entity recognition	None

Every dependency operates exclusively on in-memory data. None make network requests, transmit telemetry, or access external resources.

Name Detection & Data Provenance

1,229,656 unique names from 106 countries

The largest client-side name dictionary of any PII tool: 437k first names and 793k surnames, precision-filtered at build time to remove ~4,000 non-name words (countries, cities, common English words) while preserving every legitimate name. Dictionary lookup is combined with patronymic suffix and prefix pattern recognition to cover naming conventions that lookup alone would miss.

Three-tier detection

Tier 1

Contextual heuristics

Detects names near honorifics (Mr., Dr.), salutations (Dear), form labels (Name:), signature blocks, and email-derived patterns.
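A contextual cue of this kind can be approximated with a single Unicode-aware regular expression. The sketch below is an assumption-laden simplification: the cue list, the limit of three capitalised words, and the names NAME_CUE and namesFromContext are all illustrative, and signature blocks and email-derived patterns are not covered.

```typescript
// Hypothetical cue pattern: an honorific, salutation, or form label followed
// by one to three capitalised words. \p{Lu} and \p{L} match any Unicode
// uppercase letter or letter, so names with diacritics are captured.
const NAME_CUE =
  /(?:\b(?:Mr|Mrs|Ms|Dr)\.|\bDear|\bName:)\s+(\p{Lu}[\p{L}'-]*(?:\s\p{Lu}[\p{L}'-]*){0,2})/gu;

function namesFromContext(text: string): string[] {
  const names: string[] = [];
  NAME_CUE.lastIndex = 0; // reset shared regex state before each scan
  let m: RegExpExecArray | null;
  while ((m = NAME_CUE.exec(text)) !== null) {
    names.push(m[1]);
  }
  return names;
}

const hits = namesFromContext("Dear Amélie Dupont, please meet Dr. Okafor. Name: Li Wei");
console.log(hits);
```

A cue-based match is strong evidence on its own because the surrounding text asserts that a name follows, which is why this tier can catch names absent from any dictionary.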

Tier 2

NLP

Named entity recognition via compromise.js — a client-side NLP library. No external API calls.

Tier 3

Dictionary + suffix patterns

1,229,656-name dictionary (106 countries), precision-filtered at build time to exclude non-name words, plus cultural suffix/prefix pattern matching. Multi-tier agreement boosts confidence; common English words and blocklisted terms are penalised.
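One way to picture how tier agreement and penalties might combine is a simple additive score. Everything numeric here is an illustrative assumption (the weights, the agreement bonus, the penalty, the threshold, and the tiny common-word list); the tool's actual scoring is not documented on this page.

```typescript
type TierVotes = { contextual: boolean; nlp: boolean; dictionary: boolean };

// Illustrative values only, not the tool's configuration.
const WEIGHTS = { contextual: 0.5, nlp: 0.3, dictionary: 0.3 };
const AGREEMENT_BONUS = 0.2;     // added when two or more tiers agree
const COMMON_WORD_PENALTY = 0.4; // subtracted for common English words
const THRESHOLD = 0.5;           // score needed to treat a token as a name

const COMMON_WORDS = new Set(["will", "mark", "rose", "may"]); // tiny sample list

function nameConfidence(token: string, votes: TierVotes): number {
  let score = 0;
  let agreeing = 0;
  for (const tier of ["contextual", "nlp", "dictionary"] as const) {
    if (votes[tier]) {
      score += WEIGHTS[tier];
      agreeing++;
    }
  }
  if (agreeing >= 2) score += AGREEMENT_BONUS;
  if (COMMON_WORDS.has(token.toLowerCase())) score -= COMMON_WORD_PENALTY;
  return Math.max(0, Math.min(1, score)); // clamp to [0, 1]
}

const strong = nameConfidence("Okafor", { contextual: true, nlp: false, dictionary: true });
const weak = nameConfidence("Will", { contextual: false, nlp: false, dictionary: true });
console.log({ strong, weak, threshold: THRESHOLD });
```

Under these assumed values, a dictionary hit alone on a common word such as "Will" stays below the threshold, while the same word would pass if a contextual cue also fired.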

Cultural naming convention coverage

Beyond the dictionary, PII Scrambler recognises culturally-specific suffix and prefix patterns for surnames that wouldn't appear in standard English name lists:

●Slavic -ović, -ski, -enko, -chuk
●Arabic Al-, El-, Bin-, Abu-, Bint-
●Turkish -oğlu
●Persian -zadeh, -pour, -nejad
●Georgian -dze, -shvili
●Armenian -ian, -yan
●Greek -opoulos, -idis, -akis
●Scandinavian -sson, -ström
●Romanian -escu, -eanu
●Portuguese -eiro, -eira

Full Unicode diacritics support including Turkish dotless-i, Polish stroke-l, Scandinavian stroke-o, and Latin Extended characters.
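The suffix and prefix check can be sketched as a normalised string comparison. The affixes below are copied from the list above; the function name matchesSurnamePattern and the minimum-stem-length guard are assumptions. Short suffixes such as -ian also match ordinary words, so a match like this is best treated as weak evidence to be combined with other signals.

```typescript
// NFC normalisation makes composed characters (ć, ğ, ö) compare reliably
// regardless of how the input text encoded them.
const SURNAME_SUFFIXES = [
  "ović", "ski", "enko", "chuk", "oğlu", "zadeh", "pour", "nejad", "dze", "shvili",
  "ian", "yan", "opoulos", "idis", "akis", "sson", "ström", "escu", "eanu", "eiro", "eira",
].map((s) => s.normalize("NFC"));

const SURNAME_PREFIXES = ["al-", "el-", "bin-", "abu-", "bint-"];

function matchesSurnamePattern(word: string): boolean {
  const w = word.normalize("NFC").toLowerCase();
  // A prefix counts only if something follows it.
  if (SURNAME_PREFIXES.some((p) => w.startsWith(p) && w.length > p.length)) return true;
  // Assumed guard: require at least two characters of stem before the suffix.
  return SURNAME_SUFFIXES.some((s) => w.endsWith(s) && w.length >= s.length + 2);
}

const patternResults = {
  greek: matchesSurnamePattern("Papadopoulos"),
  arabic: matchesSurnamePattern("Al-Rashid"),
  turkish: matchesSurnamePattern("Kılıçoğlu"),
  ordinaryWord: matchesSurnamePattern("Table"),
};
console.log(patternResults);
```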

Data sources (public-domain and open-source)

●philipperemy/name-dataset: 730k first names and 983k last names sourced from 106 countries. Precision-filtered at build time to remove ~4,000 common English words, country names, and city names. Open-source, MIT licensed.
●US Census Bureau 2010: ~162,000 surnames, frequency-ranked. Public domain.
●NameDatabases (GitHub): ~20,000 first names and ~85,000 surnames. Open-source.
●International supplement: 500+ hand-curated names covering South Asian, East Asian, Middle Eastern, European, and African naming conventions.

Name data is compiled at build time via npm run build:names and stored as static text files. No runtime fetching from external sources occurs.
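A build step of this kind might look roughly like the sketch below. It is not the project's build:names script: the function name buildNameList, the lowercase newline-delimited output format, the two-character minimum, and the sample blocklist are all assumptions made for illustration.

```typescript
// Hypothetical build-time filter: normalise, de-duplicate, and drop
// blocklisted non-name words before writing a static dictionary file.
function buildNameList(rawNames: string[], blocklist: Set<string>): string[] {
  const seen = new Set<string>();
  for (const raw of rawNames) {
    const name = raw.trim().normalize("NFC").toLowerCase();
    if (name.length < 2) continue;     // skip blanks and single initials
    if (blocklist.has(name)) continue; // e.g. countries, cities, common words
    seen.add(name);
  }
  return Array.from(seen).sort();
}

const sampleBlocklist = new Set(["germany", "tokyo", "table"]);
const compiledNames = buildNameList(
  ["Amara", "amara ", "Germany", "Table", "Zoë", "J"],
  sampleBlocklist,
);

// Contents that would be written to a static text file, one name per line.
const dictionaryFile = compiledNames.join("\n");
console.log(dictionaryFile);
```

Doing the filtering once at build time keeps the runtime path to a plain lookup against a static file, which fits the page's claim that nothing is fetched from external sources at runtime.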

Deployment Options

Hosted

Deploy as static files on any CDN or static hosting platform (Vercel, Netlify, S3 + CloudFront, GitHub Pages).

Self-hosted

Run npm run build and deploy the out/ directory to any internal static file server. Full control over infrastructure.

Air-gapped

Build once, copy the output to an isolated network. No internet connection required after the initial build. All assets including name dictionaries are bundled.

Note: The app loads DM Sans from Google Fonts on page load. For air-gapped or fully isolated deployments, this font can be self-hosted. The display font (PX Grotesk) is already bundled locally.

Verification Steps

Every claim on this page can be independently verified. Here's how:

Open browser DevTools → Network tab

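Clear the log, then process a file. Any requests that appear should be same-origin static assets only (for example, /pdf.worker.min.mjs on the first PDF, or the name dictionary files if they are not yet cached). There should be zero requests to external domains.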

Inspect next.config.ts

Confirm output: "export" is set. This guarantees static-only output with no server runtime.

Check the src/app/ directory

Confirm there is no api/ subdirectory. No server-side request handlers exist.

Review package.json

Confirm no analytics, telemetry, or tracking dependencies are listed.

Search for fetch() calls

The only fetch calls in the entire codebase load /names-first.txt and /names-last.txt — static files from the same origin.

For maximum assurance

Clone the repository, build locally, and deploy on an air-gapped network. Process a sensitive document and observe: zero network traffic beyond the static assets.
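The manual checks above can be supplemented with a small audit helper that scans source or bundle text for absolute URLs outside an allow-list. This is a hypothetical sketch: findExternalUrls, the sample text, the example origin pii.example.test, and the font URL are all made up for illustration. In practice the function would be fed the contents of files under src/ or out/.

```typescript
// Hypothetical audit helper: list absolute http(s) URLs in a piece of text
// whose origin is not on the allow-list. Relative paths are ignored.
function findExternalUrls(source: string, allowedOrigins: string[]): string[] {
  const urlPattern = /https?:\/\/[^\s"'`<>)]+/g;
  const external: string[] = [];
  let m: RegExpExecArray | null;
  while ((m = urlPattern.exec(source)) !== null) {
    let origin: string;
    try {
      origin = new URL(m[0]).origin;
    } catch {
      continue; // not a parseable URL, skip it
    }
    if (!allowedOrigins.includes(origin)) external.push(m[0]);
  }
  return external;
}

// Sample input standing in for bundle contents.
const bundleText = `
  fetch("/names-first.txt");
  const font = "https://fonts.googleapis.com/css2?family=DM+Sans";
  const home = "https://pii.example.test/";
`;
const flagged = findExternalUrls(bundleText, ["https://pii.example.test"]);
console.log(flagged);
```

A static scan like this finds only URLs that appear literally in the text, so it complements rather than replaces watching the Network tab during an actual run.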