by rawsh
1051 days ago
Documents actually never get uploaded! PDF text extraction happens on the client using a web worker and MuPDF compiled to WASM.

1. The PDF is parsed and chunked on the client.
2. Sparse vectors are regenerated for the entire document corpus, and the existing vectors are updated.
3. Dense vectors are generated for the new text and upserted along with the new sparse values.

The original documents stay on your device.
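A rough TypeScript sketch of steps 1-3 follows. It is an illustration under stated assumptions, not the author's code. The comment doesn't say which sparse or dense models are used, so the sketch assumes TF-IDF for the sparse vectors. TF-IDF weights depend on corpus-wide document frequencies, which would explain why step 2 has to touch every existing chunk. A toy hashing function stands in for a real dense embedding model, and a plain object stands in for the vector database's upsert API. The names `chunkText`, `sparseVectors`, `embed`, and `addDocument` are hypothetical. MuPDF extraction isn't shown, so the sketch starts from already-extracted text.

```typescript
// Assumptions for illustration only: word-count chunking, TF-IDF sparse
// vectors, a toy hashing "embedding", and an in-memory object as the store.

interface SparseVector {
  indices: number[];
  values: number[];
}

interface VectorRecord {
  id: string;
  text: string;
  dense: number[];
  sparse: SparseVector;
}

type Store = Record<string, VectorRecord>;

function tokenize(text: string): string[] {
  return text.toLowerCase().match(/[a-z0-9]+/g) || [];
}

// Step 1: split the extracted text into chunks of at most `size` words.
function chunkText(text: string, size: number): string[] {
  const words = text.split(/\s+/).filter((w) => w.length > 0);
  const chunks: string[] = [];
  for (let i = 0; i < words.length; i += size) {
    chunks.push(words.slice(i, i + size).join(" "));
  }
  return chunks;
}

// Step 2: TF-IDF depends on document frequencies over the whole corpus,
// so adding chunks changes the sparse vector of every existing chunk.
function sparseVectors(corpus: string[]): SparseVector[] {
  const vocab: Record<string, number> = Object.create(null);
  const df: Record<string, number> = Object.create(null);
  let vocabSize = 0;
  const termCounts = corpus.map((text) => {
    const tf: Record<string, number> = Object.create(null);
    for (const t of tokenize(text)) tf[t] = (tf[t] || 0) + 1;
    for (const t of Object.keys(tf)) {
      if (!(t in vocab)) vocab[t] = vocabSize++;
      df[t] = (df[t] || 0) + 1;
    }
    return tf;
  });
  const n = corpus.length;
  return termCounts.map((tf) => {
    const terms = Object.keys(tf);
    return {
      indices: terms.map((t) => vocab[t]),
      // Smoothed IDF: log((1 + n) / (1 + df)) + 1, always positive.
      values: terms.map((t) => tf[t] * (Math.log((1 + n) / (1 + df[t])) + 1)),
    };
  });
}

// Placeholder dense "embedding": hashes tokens into a small unit vector.
// A real app would call an embedding model here instead.
function embed(text: string, dim = 8): number[] {
  const v: number[] = [];
  for (let i = 0; i < dim; i++) v.push(0);
  for (const t of tokenize(text)) {
    let h = 0;
    for (let i = 0; i < t.length; i++) h = (h * 31 + t.charCodeAt(i)) >>> 0;
    v[h % dim] += 1;
  }
  const norm = Math.sqrt(v.reduce((s, x) => s + x * x, 0)) || 1;
  return v.map((x) => x / norm);
}

// Steps 1-3 together. Assumes `docId` has not been added before.
function addDocument(store: Store, docId: string, text: string, chunkSize = 200): void {
  const newChunks = chunkText(text, chunkSize);
  const existingIds = Object.keys(store);
  const newIds = newChunks.map((_, i) => `${docId}#${i}`);
  const corpus = existingIds.map((id) => store[id].text).concat(newChunks);
  const sparse = sparseVectors(corpus);

  // Existing chunks: only the sparse values are refreshed.
  existingIds.forEach((id, i) => {
    store[id].sparse = sparse[i];
  });
  // New chunks: compute dense vectors and upsert them with the sparse values.
  newIds.forEach((id, i) => {
    store[id] = {
      id,
      text: newChunks[i],
      dense: embed(newChunks[i]),
      sparse: sparse[existingIds.length + i],
    };
  });
}
```

For example, calling `addDocument` for one document and then for a second one leaves the first document's dense vectors untouched but gives its chunks new sparse values, because the document frequencies shifted. A real app would also need to keep the vocabulary and document frequencies around so that queries can be encoded the same way.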