Security & Compliance

Technical documentation for security teams evaluating PII Scrambler for enterprise use. Every claim on this page is independently verifiable.

Zero transmission

No APIs, no uploads, no external calls that carry file data. File content never leaves the browser.

1.2M+ name dictionary

Names from 106 countries — the largest client-side name dictionary of any PII tool.

No server runtime

Static HTML/JS/CSS export. No Node.js, no backend, no database.

Fully auditable

Open source. Inspect the network tab yourself to verify.

01

Architecture Overview

PII Scrambler is built with Next.js using output: "export" — this produces a fully static site. The build output is plain HTML, JavaScript, and CSS files. There is no server-side code, no API routes, no database, and no runtime environment.

File processing runs entirely in a Web Worker — a separate browser thread that handles parsing, PII detection, and file rebuilding. The Web Worker communicates with the main thread exclusively through postMessage with transferable ArrayBuffers.
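The single-owner behaviour of a transferred ArrayBuffer can be observed without a worker. The sketch below is illustrative rather than taken from the PII Scrambler codebase: it uses structuredClone with a transfer list, which applies the same transfer step as postMessage, and the helper name transferToNewOwner is an assumption made for this example.

```typescript
// Illustrative only: shows transfer semantics, not the app's actual worker code.
function transferToNewOwner(buffer: ArrayBuffer): ArrayBuffer {
  // The transfer list moves ownership instead of copying the bytes.
  // worker.postMessage(buffer, [buffer]) performs the same transfer step.
  return structuredClone(buffer, { transfer: [buffer] });
}

const original = new ArrayBuffer(3);
new Uint8Array(original).set([80, 73, 73]); // bytes for "PII"

const moved = transferToNewOwner(original);

console.log(original.byteLength); // 0: the sender's buffer is detached
console.log(moved.byteLength);    // 3: the receiver now owns the bytes
```

After the transfer, the sending side holds only a detached, zero-length buffer, which is what "the main thread no longer holds this data" means in practice.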

The deployed application consists of:

  • Static HTML, JS bundles, and CSS
  • Two name dictionary files (3.2 MB + 6.3 MB, served from same origin)
  • A PDF.js worker script (for PDF text extraction)

No environment variables are used. No secrets or API keys exist anywhere in the codebase.

02

Data Flow

1

File selection

User selects a file via drag-drop or file picker. The file is read into an ArrayBuffer in browser memory.

2

Worker transfer

The ArrayBuffer is transferred (zero-copy) to a Web Worker thread. The main thread no longer holds this data.

3

Text extraction

The appropriate file processor extracts text content. PDF uses pdfjs-dist, DOCX uses jszip, XLSX uses the xlsx library. All libraries run in-browser.

4

PII detection

14 regex patterns scan for structured PII (emails, credit cards across all major networks, SSNs, NI numbers, dates of birth, IP addresses, phone numbers, postcodes, and VINs). Three-tier name detection — including a 1,229,656-name dictionary covering 106 countries — runs in parallel. All data is loaded from static files bundled with the app.

5

File rebuild

The processor rebuilds the file in its original format with PII replaced by labels like [EMAIL], [NAME], [PHONE_NUMBER]. For PDFs, pages containing PII are rendered to images and rebuilt with the original text content stream removed — only non-PII text is re-added as a selectable layer. This ensures redacted text cannot be recovered via copy-paste or text extraction.

6

Download

The cleaned file ArrayBuffer is transferred back to the main thread. A download is triggered via URL.createObjectURL. The object URL is immediately revoked and all references released for garbage collection.

At no point in this flow does data leave the browser's memory boundary. There is no network transmission of file content.
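Steps 4 and 5 above can be sketched as a pattern table plus a replace pass. This is a minimal sketch under stated assumptions: the three patterns are simplified stand-ins written for this example, not the 14 patterns the tool ships, and the label names SSN and IP_ADDRESS are assumed (only [EMAIL], [NAME], and [PHONE_NUMBER] are named on this page).

```typescript
// Hypothetical, simplified patterns; the tool's real patterns are not shown on this page.
interface PiiPattern {
  label: string;
  regex: RegExp;
}

const EXAMPLE_PATTERNS: PiiPattern[] = [
  { label: "EMAIL", regex: /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g },
  { label: "SSN", regex: /\b\d{3}-\d{2}-\d{4}\b/g },
  { label: "IP_ADDRESS", regex: /\b(?:\d{1,3}\.){3}\d{1,3}\b/g },
];

// Replace every match with its bracketed label, one pattern at a time.
function redactStructuredPii(text: string): string {
  return EXAMPLE_PATTERNS.reduce(
    (current, { label, regex }) => current.replace(regex, `[${label}]`),
    text,
  );
}

console.log(redactStructuredPii("Mail jane@example.com from 10.0.0.7, SSN 123-45-6789."));
// Mail [EMAIL] from [IP_ADDRESS], SSN [SSN].
```

Everything here operates on an in-memory string, so the sketch needs no network access, which matches the data flow described above.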

03

Network Activity

What you will see in the Network tab

On page load: HTML document, JS bundles, CSS, font files (DM Sans from Google Fonts, PX Grotesk from same origin), and two name dictionary files: /names-first.txt (3.2 MB) and /names-last.txt (6.3 MB) from same origin. These are preloaded static assets bundled with the app — no different from loading CSS or JS.

On first PDF process: /pdf.worker.min.mjs is loaded from same origin (Mozilla's PDF.js worker).

On file process: Nothing. All resources are already loaded and cached.

What you will NOT see

XHR/fetch to external domains

WebSocket connections

Tracking pixels or beacons

Google Analytics or any analytics

Mixpanel, Segment, Amplitude

Sentry, Bugsnag, or error reporting

CDN calls for PII processing

Any outbound POST requests
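The lists above can also be checked programmatically. The following sketch, offered as one possible approach, filters a list of request URLs down to those whose origin differs from the page's own. The function name, the host scrambler.example, and the exact Google Fonts URL are all assumptions made for illustration.

```typescript
// Returns every URL whose origin differs from the page's own origin.
function findExternalRequests(requestUrls: string[], pageOrigin: string): string[] {
  return requestUrls.filter((raw) => new URL(raw, pageOrigin).origin !== pageOrigin);
}

// In a browser console the input could come from the Resource Timing API:
//   performance.getEntriesByType("resource").map((entry) => entry.name)
const observed = [
  "https://scrambler.example/names-first.txt",    // hypothetical host
  "https://scrambler.example/pdf.worker.min.mjs",
  "https://fonts.googleapis.com/css2?family=DM+Sans", // assumed font URL
];

// Only the Google Fonts stylesheet is reported as external.
console.log(findExternalRequests(observed, "https://scrambler.example"));
```

On a deployment that self-hosts DM Sans, the returned list would be expected to be empty.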

04

What It Won't Do

  • No server-side processing
  • No file uploads to any server
  • No data persistence (no localStorage, IndexedDB, or cookies for user data)
  • No user accounts or authentication
  • No analytics or telemetry of any kind
  • No third-party tracking scripts
  • No external API calls for PII processing
  • No error reporting to external services
  • No A/B testing frameworks
  • No session recording or heatmaps
  • No advertising or marketing SDKs
  • No data collection whatsoever

05

Technical Evidence

Static export configuration

// next.config.ts
import type { NextConfig } from "next";

const nextConfig: NextConfig = {
  output: "export",  // Fully static, no server
  webpack: (config) => {
    // Stub out Node.js built-ins so browser bundles cannot resolve them
    config.resolve.fallback = {
      fs: false,       // No filesystem access
      path: false,     // No path module
      crypto: false,   // No crypto module
      // ... all Node.js modules disabled
    };
    return config;
  },
};

export default nextConfig;

No API routes

The src/app/ directory contains only page routes and layout files. There is no api/ subdirectory. No server-side request handlers exist anywhere in the codebase.

Worker isolation

The Web Worker runs in a separate thread within the same browser security context. Data is transferred via postMessage using Transferable ArrayBuffers (zero-copy, single-owner semantics). The worker terminates after each file is processed.

06

Dependency Audit

Package | Purpose | Network
next | Static site generation framework | None
react / react-dom | UI rendering | None
pdfjs-dist | PDF text extraction (Mozilla PDF.js) | None
pdf-lib | PDF modification (content stream replacement, image embedding, text layer rebuild) | None
jszip | ZIP manipulation (DOCX files are ZIP archives) | None
xlsx | Excel spreadsheet parsing | None
papaparse | CSV parsing | None
compromise | NLP library for named entity recognition | None

Every dependency operates exclusively on in-memory data. None make network requests, transmit telemetry, or access external resources.

07

Name Detection & Data Provenance

1,229,656 unique names from 106 countries

The largest client-side name dictionary of any PII tool: 437k first names and 793k surnames, precision-filtered at build time to remove ~4,000 non-name words (country names, city names, common English words) while preserving every legitimate name. The dictionary is combined with patronymic suffix and prefix pattern recognition to cover naming conventions that dictionary lookup alone would miss.

Three-tier detection

Tier 1

Contextual heuristics

Detects names near honorifics (Mr., Dr.), salutations (Dear), form labels (Name:), signature blocks, and email-derived patterns.

Tier 2

NLP

Named entity recognition via compromise.js — a client-side NLP library. No external API calls.

Tier 3

Dictionary + suffix patterns

1,229,656-name dictionary (106 countries), precision-filtered at build time to exclude non-name words, plus cultural suffix/prefix pattern matching. Multi-tier agreement boosts confidence; common English words and blocklisted terms are penalised.
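How multi-tier agreement and the common-word penalty might combine can be sketched as a small scoring function. Every number below (per-tier points, agreement bonus, penalty, threshold) is invented for illustration, and the names TierSignals, nameConfidence, and isLikelyName are hypothetical; the tool's actual scoring is not documented on this page.

```typescript
// All weights and the threshold are assumptions made for this example.
interface TierSignals {
  contextual: boolean; // Tier 1: honorific, salutation, form label, signature
  nlp: boolean;        // Tier 2: NER tagged the token as a person
  dictionary: boolean; // Tier 3: dictionary or suffix/prefix hit
  commonWord: boolean; // token is a common English word or blocklisted term
}

// Returns a confidence score from 0 to 100.
function nameConfidence(signals: TierSignals): number {
  const hits = [signals.contextual, signals.nlp, signals.dictionary].filter(Boolean).length;
  let score = hits * 35;                  // each agreeing tier adds points
  if (hits >= 2) score += 15;             // bonus when tiers agree
  if (signals.commonWord) score -= 30;    // penalty for ordinary words
  return Math.min(100, Math.max(0, score));
}

function isLikelyName(signals: TierSignals): boolean {
  return nameConfidence(signals) >= 50;
}

// A dictionary-only hit on a common word (e.g. "Will") stays below the threshold.
console.log(isLikelyName({ contextual: false, nlp: false, dictionary: true, commonWord: true }));  // false
// The same token after "Dear" gains a second tier and passes.
console.log(isLikelyName({ contextual: true, nlp: false, dictionary: true, commonWord: true }));   // true
```

Integer points are used instead of fractions so that the threshold comparison is not affected by floating-point rounding.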

Cultural naming convention coverage

Beyond the dictionary, PII Scrambler recognises culturally-specific suffix and prefix patterns for surnames that wouldn't appear in standard English name lists:

Slavic: -ović, -ski, -enko, -chuk

Arabic: Al-, El-, Bin-, Abu-, Bint-

Turkish: -oğlu

Persian: -zadeh, -pour, -nejad

Georgian: -dze, -shvili

Armenian: -ian, -yan

Greek: -opoulos, -idis, -akis

Scandinavian: -sson, -ström

Romanian: -escu, -eanu

Portuguese: -eiro, -eira

Full Unicode diacritics support including Turkish dotless-i, Polish stroke-l, Scandinavian stroke-o, and Latin Extended characters.
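A suffix and prefix check of this kind can be sketched in a few lines. The affixes are copied from the list above; the matching rules (NFC normalisation, lowercasing, and a minimum stem length of two characters) and the function name hasCulturalSurnamePattern are assumptions for illustration, not the tool's documented behaviour.

```typescript
// Affixes come from the list above; the matching rules are assumed.
const SURNAME_SUFFIXES = [
  "ović", "ski", "enko", "chuk", "oğlu", "zadeh", "pour", "nejad",
  "dze", "shvili", "ian", "yan", "opoulos", "idis", "akis",
  "sson", "ström", "escu", "eanu", "eiro", "eira",
];
const SURNAME_PREFIXES = ["al-", "el-", "bin-", "abu-", "bint-"];

function hasCulturalSurnamePattern(token: string): boolean {
  // NFC makes composed and decomposed diacritics compare equal.
  const word = token.normalize("NFC").toLowerCase();
  // Require at least two characters of stem besides the affix (assumed rule).
  const suffixHit = SURNAME_SUFFIXES.some(
    (suffix) => word.length > suffix.length + 1 && word.endsWith(suffix),
  );
  const prefixHit = SURNAME_PREFIXES.some(
    (prefix) => word.length > prefix.length + 1 && word.startsWith(prefix),
  );
  return suffixHit || prefixHit;
}

console.log(hasCulturalSurnamePattern("Yazıcıoğlu"));   // true
console.log(hasCulturalSurnamePattern("Al-Rashid"));    // true
console.log(hasCulturalSurnamePattern("Table"));        // false
```

A pattern hit on its own is a weak signal (many ordinary words end in "-ian" or "-ski"), which is why the page describes it as one input to tier 3 rather than a standalone detector.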

Data sources (all public-domain or open-source)

philipperemy/name-dataset: 730k first names and 983k last names sourced from 106 countries. Precision-filtered at build time to remove ~4,000 common English words, country names, and city names. Open-source, MIT licensed.

US Census Bureau 2010: ~162,000 surnames, frequency-ranked. Public domain.

NameDatabases (GitHub): ~20,000 first names and ~85,000 surnames. Open-source.

International supplement: 500+ hand-curated names covering South Asian, East Asian, Middle Eastern, European, and African naming conventions.

Name data is compiled at build time via npm run build:names and stored as static text files. No runtime fetching from external sources occurs.
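Loading such a static file into a lookup structure might look like the sketch below. It assumes one name per line and case-insensitive matching, neither of which is documented on this page; parseNameList and loadFirstNames are hypothetical names, while /names-first.txt is the path given above.

```typescript
// Assumes one name per line; the real file layout is not documented on this page.
function parseNameList(fileText: string): Set<string> {
  const names = new Set<string>();
  for (const line of fileText.split(/\r?\n/)) {
    const entry = line.trim().toLowerCase();
    if (entry.length > 0) names.add(entry); // skip blank lines
  }
  return names;
}

// Browser-side loading from the same origin, using the path named on this page.
async function loadFirstNames(): Promise<Set<string>> {
  const response = await fetch("/names-first.txt");
  return parseNameList(await response.text());
}

const sampleNames = parseNameList("Amara\nJosé\n\nWei\n");
console.log(sampleNames.size);                                      // 3
console.log(sampleNames.has("josé"), sampleNames.has("table"));     // true false
```

A Set gives constant-time membership checks, which matters when every token in a document is tested against more than a million entries.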

08

Deployment Options

Hosted

Deploy as static files on any CDN or static hosting platform (Vercel, Netlify, S3 + CloudFront, GitHub Pages).

Self-hosted

Run npm run build and deploy the out/ directory to any internal static file server. Full control over infrastructure.

Air-gapped

Build once, copy the output to an isolated network. No internet connection required after the initial build. All assets including name dictionaries are bundled.

Note: The app loads DM Sans from Google Fonts on page load. For air-gapped or fully isolated deployments, this font can be self-hosted. The display font (PX Grotesk) is already bundled locally.

09

Verification Steps

Every claim on this page can be independently verified. Here's how:

1

Open browser DevTools → Network tab

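Clear the log, then process a file. You should see no requests to external domains. The only requests that may appear are same-origin static assets: /pdf.worker.min.mjs on the first PDF, and the name dictionary files if they were not already cached.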

2

Inspect next.config.ts

Confirm output: "export" is set. This guarantees static-only output with no server runtime.

3

Check the src/app/ directory

Confirm there is no api/ subdirectory. No server-side request handlers exist.

4

Review package.json

Confirm no analytics, telemetry, or tracking dependencies are listed.

5

Search for fetch() calls

The only fetch calls in the entire codebase load /names-first.txt and /names-last.txt — static files from the same origin.

6

For maximum assurance

Clone the repository, build locally, and deploy on an air-gapped network. Process a sensitive document and observe: zero network traffic beyond the static assets.
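Step 4 above can be partly automated. The sketch below scans a package.json-shaped object for dependency names that hint at analytics or error reporting. The hint list is a small illustrative sample rather than an exhaustive catalogue, the function and type names are hypothetical, and the example manifest is assembled from the dependency audit table with placeholder versions.

```typescript
// A small illustrative sample of name fragments, not an exhaustive catalogue.
const TRACKING_PACKAGE_HINTS = ["analytics", "sentry", "bugsnag", "mixpanel", "segment", "amplitude"];

interface PackageManifest {
  dependencies?: Record<string, string>;
  devDependencies?: Record<string, string>;
}

// Returns the names of dependencies that contain any of the hints.
function findTrackingDependencies(manifest: PackageManifest): string[] {
  const packageNames = Object.keys({ ...manifest.dependencies, ...manifest.devDependencies });
  return packageNames.filter((packageName) =>
    TRACKING_PACKAGE_HINTS.some((hint) => packageName.toLowerCase().includes(hint)),
  );
}

// Packages from the dependency audit; versions are placeholders.
const auditedManifest: PackageManifest = {
  dependencies: {
    next: "*", react: "*", "react-dom": "*", "pdfjs-dist": "*", "pdf-lib": "*",
    jszip: "*", xlsx: "*", papaparse: "*", compromise: "*",
  },
};

console.log(findTrackingDependencies(auditedManifest)); // []
```

A name-based scan only covers direct dependencies declared in the manifest, so it complements rather than replaces the Network tab check in step 1.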