by powera 939 days ago
"if they were mostly written by his secretary then Stanford wouldn't have paid to have them archived." - that is straight-up incorrect. Archiving emails is very cheap. And the redactions look to have been done programmatically.

So ePADD redacts everything except the named entities, but that still means going through each message by hand to check that the entities generated by the NLP software are correct. Then there's the time spent fixing ePADD so the import would run correctly with his non-standard email client, the time spent negotiating permissions and restrictions on the collection, etc.
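For anyone unfamiliar with the approach, here's a minimal Python sketch of "redact everything except the named entities." It assumes the entity list has already come out of some NER step. The function name `redact_except_entities` and the `*` mask are made up for illustration; this is not ePADD's actual code or API. It also hints at why the hand review is needed: anything the NER step misses gets masked, and naive matching can keep text it shouldn't.

```python
import re


def redact_except_entities(text: str, entities: list[str], mask: str = "*") -> str:
    """Mask every non-whitespace character that is not part of an entity.

    `entities` stands in for NER output (hypothetical here). Matching is by
    exact substring, so "Don" would also be kept inside "Donald"; that kind
    of false match is one reason a human still has to check the results.
    """
    keep = [False] * len(text)
    for ent in entities:
        if not ent:
            continue  # skip empty strings, which would match everywhere
        for m in re.finditer(re.escape(ent), text):
            for i in range(m.start(), m.end()):
                keep[i] = True
    # Keep entity characters and whitespace; mask everything else.
    return "".join(
        ch if keep[i] or ch.isspace() else mask
        for i, ch in enumerate(text)
    )


print(redact_except_entities("Dear Don, see you at Stanford.", ["Don", "Stanford"]))
# prints: **** Don* *** *** ** Stanford*
```

Keeping whitespace preserves word lengths and layout, which is roughly what you'd expect from a programmatic redaction.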

Cf. https://github.com/search?q=repo%3AePADD%2Fepadd++knuth&type...