by powera 938 days ago
This appears to be a reference to https://epadd.stanford.edu/epadd/collection-detail?collectio... - a collection of "Don Knuth" emails.

The public versions are heavily redacted, but: 1) they all seem to be from his secretary, not him, and 2) most of the messages are neither dictation nor direct copies of messages Knuth wrote elsewhere.

His secretary may have typed them, but if they were mostly written by his secretary then Stanford wouldn't have paid to have them archived. Even though the messages are mostly redacted, actually going through and doing those redactions is still hundreds of hours of work.
"if they were mostly written by his secretary then Stanford wouldn't have paid to have them archived." - that is straight-up incorrect. Archiving emails is very cheap. And the redactions look to have been done programmatically.
So ePADD redacts everything except the named entities, but that still means going through each message by hand to ensure that the entities generated by the NLP software are correct. Plus the time spent fixing ePADD to make the import run correctly with his non-standard email client, the time spent negotiating permissions and restrictions related to the collection, etc.

Cf.: https://github.com/search?q=repo%3AePADD%2Fepadd++knuth&type...
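
For anyone wondering what "redacts everything except the named entities" looks like in practice, here is a minimal Python sketch. It is not ePADD's code: the function name `redact_except_entities` and the `*` mask are made up for illustration, and the entity list is passed in by hand instead of coming from an NLP model.

```python
import re


def redact_except_entities(text, entities, mask="*"):
    """Mask every non-whitespace character that is not part of a listed entity.

    Hypothetical helper for illustration only; `entities` stands in for
    whatever a named-entity recognizer would have produced.
    """
    # Mark the character positions covered by any occurrence of any entity.
    keep = [False] * len(text)
    for entity in entities:
        for match in re.finditer(re.escape(entity), text):
            for i in range(match.start(), match.end()):
                keep[i] = True

    # Keep entity characters and whitespace; mask everything else.
    return "".join(
        ch if keep[i] or ch.isspace() else mask
        for i, ch in enumerate(text)
    )


print(redact_except_entities("Tell Don Knuth hi.", ["Don Knuth"]))
# prints: **** Don Knuth ***
```

The output is only as good as the entity list: if a name is missed it gets masked, and if an ordinary word is tagged as a name it leaks through, which is why each message would still need a manual check.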