by jll29 492 days ago
Slightly related:

In Leidner and Plachouras (2017), we reported an observation originally due to Bentley, namely that McIlroy's spell(1) implementation emailed its author a list of the unknown words whenever it was run over a document. While this is technically a neat way to increase dictionary size by "mining user data", and certain versions of Microsoft Word and Microsoft Edge (see https://news.ycombinator.com/item?id=35208333 ) had the same behavior, it is privacy-violating, at least if users are not informed beforehand [1]. Of course this has to be seen in the cultural context of the UNIX community at the time (people were even wary of using passwords to protect their accounts then), but harm could still be done if an email talked about "malinoma" when that term was still absent from the dictionary, possibly revealing a sensitive condition of the email author or their circle of friends and family to a third party, the software engineer of the speel checker.

[1] https://aclanthology.org/W17-1604.pdf

2 comments

  > the speel checker
        ~~~~~
I've done something similar at my job. I don't think it implicates privacy because 1) it's a list of misses from the (public) dictionary, and 2) the email is sent as the program, not as the user. So if the user misspells melinoma, you have no information on which user that was.
This might be different depending on whether you have 4 users on the system or 400.

Also, traditional Unix makes it easy to find out who was active on the system at a certain time (for example with the "last" command, which reads the wtmp log), plus other data sources (historically regular users were often allowed to read most log files by default, and people often used relatively open file permissions by default, which would at least allow for examining other users' file metadata).
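
To make the correlation concrete, here is a rough sketch in Python. The session list, user names, and timestamps are all made up for illustration, and the `candidates` helper is hypothetical; on a real system the sessions would come from whatever `last` reports out of wtmp, not from a hard-coded list.

```python
from datetime import datetime

# Hypothetical login sessions as (user, login, logout). On a real system
# this information would come from the wtmp log that `last` reads.
SESSIONS = [
    ("alice", datetime(1985, 3, 1, 9, 0), datetime(1985, 3, 1, 12, 0)),
    ("bob", datetime(1985, 3, 1, 11, 30), datetime(1985, 3, 1, 17, 0)),
    ("carol", datetime(1985, 3, 1, 14, 0), datetime(1985, 3, 1, 15, 0)),
]


def candidates(sessions, sent_at):
    """Return the sorted names of users logged in when the report was sent."""
    return sorted(
        {user for user, start, end in sessions if start <= sent_at <= end}
    )


# A word list emailed at 14:30 narrows the possible authors to two users.
print(candidates(SESSIONS, datetime(1985, 3, 1, 14, 30)))
```

With only a handful of users logged in at any given moment, the timestamp of the emailed word list can shrink the set of possible authors considerably, even though the mail itself is sent as the program.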

That's a very rude thing to say about the President's wife. Notice has been taken.