Show HN: Open-source PDF layout analysis running entirely in the browser (embedpdf.com)
3 points by bobsingor 126 days ago
Hi HN, creator of EmbedPDF here.

I recently posted my open-source PDF viewer here, and one thing I really value is that it runs completely offline. I started wondering if we could push that further: could we do full ML layout analysis (detecting tables, headers, columns) directly in the browser?

To my surprise, it actually works.

The catch: It is far from production-ready. It crashes on most phones, and on older computers, it can be incredibly slow.

The why: I believe the future of document processing is local. Many users work with sensitive documents (bank statements, legal contracts) and simply do not want to upload them to a cloud endpoint just to parse a table or analyze layout.

This is a proof of concept for that future, where models get smaller, WASM and WebGPU get faster, and we can keep data entirely on the client side.
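
For anyone wondering how a page might decide between WebGPU and WASM, and bail out early on devices that are likely to crash, here is a minimal sketch. It is not taken from the EmbedPDF code: `chooseBackend`, `probe`, `Capabilities`, and the 4 GB threshold are all hypothetical names and values chosen for illustration. The only browser APIs it assumes are `navigator.gpu.requestAdapter()` and the Chromium-only `navigator.deviceMemory`.

```typescript
// Illustrative sketch only: these names are hypothetical, not EmbedPDF APIs.
type Backend = "webgpu" | "wasm" | "unsupported";

interface Capabilities {
  hasWebGpuAdapter: boolean;            // requestAdapter() resolved to an adapter
  hasWasm: boolean;                     // the WebAssembly global exists
  deviceMemoryGb?: number | undefined;  // navigator.deviceMemory (approximate, Chromium-only)
}

// Assumed threshold; a real value would have to come from measurement.
const MIN_MEMORY_GB = 4;

function chooseBackend(caps: Capabilities): Backend {
  // Decline to start on devices that report too little memory,
  // rather than risk crashing the tab mid-inference.
  if (caps.deviceMemoryGb !== undefined && caps.deviceMemoryGb < MIN_MEMORY_GB) {
    return "unsupported";
  }
  if (caps.hasWebGpuAdapter) return "webgpu";
  if (caps.hasWasm) return "wasm";
  return "unsupported";
}

// Browser-side probe. `nav` is typed loosely so the sketch compiles
// without WebGPU type definitions installed.
async function probe(
  nav: { gpu?: { requestAdapter(): Promise<unknown> }; deviceMemory?: number },
  hasWasm: boolean,
): Promise<Capabilities> {
  let hasWebGpuAdapter = false;
  if (nav.gpu) {
    try {
      // requestAdapter() resolves to null when no suitable adapter is available.
      hasWebGpuAdapter = (await nav.gpu.requestAdapter()) != null;
    } catch {
      hasWebGpuAdapter = false;
    }
  }
  return { hasWebGpuAdapter, hasWasm, deviceMemoryGb: nav.deviceMemory };
}
```

In a browser this would be called roughly as `chooseBackend(await probe(navigator, typeof WebAssembly === "object"))`. Keeping the decision in a pure function separate from the probe makes the fallback order easy to test without a real GPU.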

Demo: https://www.embedpdf.com/layout-analysis

Repo: https://github.com/embedpdf/embed-pdf-viewer

I'd love to hear your thoughts on the performance and where you think browser-based ML is heading.