Hacker News
Show HN: Local personal data redaction for any AI tools (github.com)
12 points by unusual_typo 5 days ago
I built a desktop app that detects and redacts personal data (PII) locally, without sending any text to a server. It supports both rule-based filtering and AI model-based redaction (e.g., openai/privacy-filter). It's free and open source. Please check out the repo and https://pii-gui.vercel.app/
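To make "rule-based filtering" concrete, here is a minimal sketch of regex-based redaction that runs entirely in-process, so no text leaves the machine. The `RULES` table, its two patterns, and the `redact` function are hypothetical illustrations, not the app's actual rules; real PII coverage needs far more than two patterns.

```python
import re

# Hypothetical rule set for illustration only; not the app's real rules.
RULES = {
    # Simple email pattern: local part, "@", domain, dot, TLD.
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    # North American-style phone numbers, e.g. 555-123-4567, with an
    # optional country-code prefix such as "+1 ".
    "PHONE": re.compile(r"(?:\+?\d{1,3}[-. ]?)?\(?\d{3}\)?[-. ]?\d{3}[-. ]?\d{4}"),
}


def redact(text: str) -> str:
    """Replace every rule match with a [LABEL] tag, entirely locally."""
    for label, pattern in RULES.items():
        text = pattern.sub(f"[{label}]", text)
    return text


if __name__ == "__main__":
    print(redact("Mail jane.doe@example.com or call 555-123-4567."))
    # → Mail [EMAIL] or call [PHONE].
```

Rules like these are fast and predictable but miss anything without a fixed shape (names, addresses), which is where a model-based detector helps.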
4 comments

Here are the benchmark results for openai/privacy-filter on an Apple M1 Max (total time and throughput for 1k- and 8k-token inputs). You can find more details in the repo.

   dtype            1k total (ms)    1k tok/s  8k total (ms)    8k tok/s
   ---------------  -------------  ----------  -------------  ----------
   fp32                    620.52       1,664       4,893.86       1,689
   fp16                    654.56       1,578       5,430.17       1,521
   q4                      582.13       1,776       4,635.39       1,784
   q4f16                   648.10       1,594       5,261.56       1,570
   quantized int8          573.94       1,801       4,594.95       1,800
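The two metrics in each row are tied together: throughput is tokens processed divided by wall-clock time. A minimal, hypothetical timing harness along those lines could look like the following. It is not the repo's benchmark script; `fn` stands in for whatever call runs the model on a token sequence.

```python
import time
from typing import Callable, Sequence


def measure(
    fn: Callable[[Sequence[int]], object],
    tokens: Sequence[int],
    runs: int = 5,
) -> tuple[float, float]:
    """Return (mean latency in ms, throughput in tokens/s) over `runs` calls."""
    fn(tokens)  # warm-up call, excluded from timing
    start = time.perf_counter()
    for _ in range(runs):
        fn(tokens)
    elapsed = (time.perf_counter() - start) / runs  # mean seconds per call
    return elapsed * 1000.0, len(tokens) / elapsed
```

Usage would be something like `measure(run_model, token_ids)`, where `run_model` and `token_ids` are hypothetical. The warm-up call keeps one-time costs such as model loading out of the average; whether the repo's numbers do the same isn't stated in the thread.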
I would love an option where, instead of just redacting the PII, it swaps it for something else before the text goes to the AI and then swaps it back when the AI responds. Thanks for sharing the GitHub repo; I might submit a PR if I don't find that feature.
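The swap-and-restore idea is usually called reversible pseudonymization. Here is a minimal sketch, assuming emails are the only PII detected and that the AI echoes placeholders back verbatim. `pseudonymize`, `restore`, and the `<EMAIL_n>` placeholder format are all hypothetical.

```python
import re

# Hypothetical detector: emails only, for illustration.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def pseudonymize(text: str) -> tuple[str, dict[str, str]]:
    """Swap each distinct email for a placeholder; return text and mapping."""
    mapping: dict[str, str] = {}  # placeholder -> original value
    reverse: dict[str, str] = {}  # original value -> placeholder

    def swap(match: re.Match) -> str:
        value = match.group(0)
        if value not in reverse:
            # Same value always gets the same placeholder.
            placeholder = f"<EMAIL_{len(reverse) + 1}>"
            reverse[value] = placeholder
            mapping[placeholder] = value
        return reverse[value]

    return EMAIL.sub(swap, text), mapping


def restore(text: str, mapping: dict[str, str]) -> str:
    """Put the original values back into the AI's response."""
    for placeholder, value in mapping.items():
        text = text.replace(placeholder, value)
    return text
```

Reusing one placeholder per distinct value keeps the text consistent for the model, and the closing `>` stops `<EMAIL_1>` from matching inside `<EMAIL_10>` during restore. The mapping itself stays on the local machine.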
I initially wanted to implement that feature, but I realized it would require modifying the coding agents themselves (e.g., Codex, Claude Code, OpenCode). Hooks or skills would eventually pass the PII to a server anyway, so I decided to share the standalone app first. Feel free to submit a PR!
Nice, local is the right call. What's the local AI model: a small NER model bundled in, or does it call out to something? Curious about the size/footprint for a desktop app.
It uses openai/privacy-filter, which is under 1 GB. I haven't checked resource usage during inference. It runs at about 1k tokens/sec on my MacBook. I'll update the repo with benchmark results. Thanks for the comment!
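For the model-based path, a token-classification (NER-style) model typically labels each token, and those labels are then merged into character spans to redact. The thread doesn't describe openai/privacy-filter's output format, so this sketch assumes BIO tags (`B-NAME`, `I-NAME`, `O`) with character offsets; the label names and the `merge_bio`/`redact_spans` helpers are hypothetical, and overlapping spans are not handled.

```python
from typing import List, Tuple

# Assumed model output per token: (char_start, char_end, BIO label).
Token = Tuple[int, int, str]
Span = Tuple[int, int, str]  # (char_start, char_end, entity type)


def merge_bio(tokens: List[Token]) -> List[Span]:
    """Merge BIO-tagged tokens into (start, end, entity_type) spans."""
    spans: List[Span] = []
    inside = False  # was the previous token part of an entity?
    for start, end, label in tokens:
        if label == "O":
            inside = False
            continue
        prefix, _, kind = label.partition("-")
        # Extend the current span only for an I- tag of the same type
        # directly following an entity token; otherwise start a new span.
        if prefix == "I" and inside and spans[-1][2] == kind:
            spans[-1] = (spans[-1][0], end, kind)
        else:
            spans.append((start, end, kind))
        inside = True
    return spans


def redact_spans(text: str, spans: List[Span]) -> str:
    """Replace spans right-to-left so earlier offsets stay valid."""
    for start, end, kind in sorted(spans, reverse=True):
        text = text[:start] + f"[{kind}]" + text[end:]
    return text
```

Working right-to-left avoids recomputing offsets after each replacement, since editing the end of the string never shifts positions before it.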
Local is the way. Are there any benchmarks of its latency on CPU?
I just ran the benchmark on my MacBook: 582 ms for 1k tokens and 4.64 s for 8k tokens (q4 quantization).