| HN Mirror


by rawsh 1098 days ago

Documents actually never get uploaded! PDF text extraction happens on the client using a web worker and MuPDF compiled to WASM.

1. The PDF is parsed and chunked on the client

2. Sparse vectors are regenerated for the entire document corpus and the existing vectors are updated

3. Dense vectors are generated for the new text and upserted along with the new sparse values
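To illustrate why step 2 touches the whole corpus, here is a minimal sketch in Python. It assumes the sparse vectors are TF-IDF-style term weights, which the post does not actually state; `chunk_text`, `sparse_vectors`, and the chunk sizes are hypothetical names and values chosen for illustration, not the app's real code.

```python
import math
from collections import Counter


def chunk_text(text, size=200, overlap=50):
    """Split text into overlapping word windows (hypothetical chunking scheme)."""
    if size <= overlap:
        raise ValueError("size must be larger than overlap")
    words = text.split()
    if not words:
        return []
    step = size - overlap
    # Stop once the remaining words are already covered by the previous window.
    last_start = max(len(words) - overlap, 1)
    return [" ".join(words[i:i + size]) for i in range(0, last_start, step)]


def sparse_vectors(chunks):
    """Return one {term: weight} dict per chunk, using smoothed TF-IDF.

    Assumption: TF-IDF stands in for whatever sparse encoder the app uses.
    The IDF factor depends on how many chunks contain each term, so the
    weights of existing chunks shift whenever the corpus grows.
    """
    tokenized = [chunk.lower().split() for chunk in chunks]
    n = len(tokenized)
    # Document frequency: number of chunks in which each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({
            term: count * (math.log((1 + n) / (1 + df[term])) + 1.0)
            for term, count in tf.items()
        })
    return vectors


if __name__ == "__main__":
    corpus = ["vector search on the client", "pdf text on the client"]
    before = sparse_vectors(corpus)
    corpus.append("vector databases store vector embeddings")
    after = sparse_vectors(corpus)
    # The first chunk is unchanged, but its weight for "vector" is not.
    print(before[0]["vector"], after[0]["vector"])
```

Under this assumption, adding a document changes the corpus-wide term statistics, so every stored sparse vector has to be recomputed and updated. A dense embedding from a fixed model depends only on the chunk's own text, which would explain why step 3 only needs to embed the new chunks.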

The original documents stay on your device.

1 comment

Yeah but what is GPT5?