by codeviking 1739 days ago
We don't retain the uploaded document. We cache only the extracted content, to make things more efficient.

See https://papertohtml.org/about:

> What data do we keep? We cache a copy of the extracted content as well as the extracted images. This allows us to serve the results more quickly when a user uploads the same file again. We do not retain the uploaded files themselves. Cached content is never served to a user who has not provided the exact same document.
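
To illustrate the policy quoted above, here is a minimal sketch of a cache keyed by a hash of the uploaded bytes. It is an assumption for illustration only, not a description of how papertohtml.org is actually implemented: the `ExtractionCache` class, the choice of SHA-256, and the in-memory dictionary are all hypothetical. The point is that only the extracted result is stored, and a lookup succeeds only for someone who supplies the exact same bytes.

```python
import hashlib
from typing import Callable, Dict


class ExtractionCache:
    """Hypothetical content-addressed cache: stores extracted output, never the upload."""

    def __init__(self, extract: Callable[[bytes], str]) -> None:
        # `extract` stands in for whatever turns a document into HTML (assumed).
        self._extract = extract
        self._store: Dict[str, str] = {}

    @staticmethod
    def key_for(document: bytes) -> str:
        # The key is derived from the full file contents, so only a
        # byte-identical upload maps to the same cache entry.
        return hashlib.sha256(document).hexdigest()

    def get_or_extract(self, document: bytes) -> str:
        key = self.key_for(document)
        if key not in self._store:
            # Cache miss: run extraction and keep only its result.
            self._store[key] = self._extract(document)
        return self._store[key]

    def delete(self, document: bytes) -> bool:
        # Supports deletion on request; returns True if an entry was removed.
        return self._store.pop(self.key_for(document), None) is not None
```

In this sketch, a second upload of the same file skips extraction entirely, while any different file produces a different key and never sees another document's cached output.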

Also, we can delete the extracted data on request. Just send a note to accessibility@semanticscholar.org.

Sorry for the confusion!

1 comment

Ah okay, thank you.

> Also, we can delete the extracted data on request.

Just to be 100% clear, you are referring to the cached extracted data, right?

Yup, that's right.
Thank you very much!