Hacker News
by mutant 1063 days ago
What happens to documents uploaded? Can you access them? Are they used in later training?

Documents actually never get uploaded! PDF text extraction happens on the client using a web worker and MuPDF compiled to WASM.

1. PDF parsed and chunked on the client

2. Sparse vectors are regenerated for the entire document corpus and the existing vectors are updated

3. Dense vectors are generated for the new text and upserted along with the new sparse values
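The comment doesn't say which sparse or dense models or which vector store are used, so here is a minimal sketch of the three steps above under stated assumptions. It uses TF-IDF for the sparse values, which shows one plausible reason step 2 touches the whole corpus: IDF depends on every document, so adding one changes the weights of existing chunks. It uses a toy hashed embedding as a hypothetical stand-in for the real dense model and an in-memory dict in place of the vector database. All names here (`chunk_text`, `sparse_vectors`, `embed`, `HybridIndex`) are hypothetical.

```python
import hashlib
import math
from collections import Counter


def chunk_text(text, size=50, overlap=10):
    """Step 1 (assumed scheme): split text into overlapping word windows."""
    words = text.split()
    if not words:
        return []
    step = size - overlap
    return [" ".join(words[i:i + size]) for i in range(0, max(len(words) - overlap, 1), step)]


def sparse_vectors(chunks):
    """TF-IDF weights per chunk (assumed sparse model). IDF uses the whole corpus."""
    tokenized = [c.lower().split() for c in chunks]
    n = len(tokenized)
    df = Counter(term for toks in tokenized for term in set(toks))
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        # Smoothed IDF: log((1 + n) / (1 + df)) + 1
        vectors.append({t: (c / len(toks)) * (math.log((1 + n) / (1 + df[t])) + 1) for t, c in tf.items()})
    return vectors


def embed(text, dim=8):
    """Hypothetical stand-in for a dense embedding model: hashed bag of words, L2-normalized."""
    vec = [0.0] * dim
    for tok in text.lower().split():
        vec[int(hashlib.md5(tok.encode()).hexdigest(), 16) % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


class HybridIndex:
    """In-memory stand-in for a vector store holding dense + sparse values per chunk."""

    def __init__(self):
        self.records = {}  # chunk_id -> {"text", "dense", "sparse"}

    def add_document(self, doc_id, text):
        # Step 1: chunk the new document (client-side in the original).
        new = {f"{doc_id}:{i}": c for i, c in enumerate(chunk_text(text))}
        # Step 2: the corpus changed, so recompute sparse weights for every chunk...
        corpus = {cid: r["text"] for cid, r in self.records.items()}
        corpus.update(new)
        ids = list(corpus)
        sparse = dict(zip(ids, sparse_vectors([corpus[c] for c in ids])))
        # ...and update the existing records in place.
        for cid, rec in self.records.items():
            rec["sparse"] = sparse[cid]
        # Step 3: dense vectors only for the new chunks, upserted with their sparse values.
        for cid, chunk in new.items():
            self.records[cid] = {"text": chunk, "dense": embed(chunk), "sparse": sparse[cid]}


idx = HybridIndex()
idx.add_document("a", "cats purr and cats nap")
idx.add_document("b", "dogs bark and dogs run")
print(sorted(idx.records))
```

The trade-off this illustrates: sparse weights built on corpus-level statistics like IDF have to be recomputed across the corpus on every upload, while a dense embedding depends only on its own chunk's text, so it is computed once per new chunk.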

The original documents stay on your device.

Yeah but what is GPT5?