| Grandparent's statement is pretty absolute, but I find myself in agreement with it. Data collection is the right place to intervene, because once collected, data can be copied and misused at any time in the future. > when you read this comment you'll have loaded a page on HN. That means HN's server probably has a log of your IP address, browser agent string, etc. Such logging isn't technically necessary to serve web pages, and ideally shouldn't be done without consent. > Am I spying on you when I read those pages? That's not spying, because the user consented to making their comments public. (Not sure about favorites though, there's a small note on the profile page but maybe the favoriting action should make it more explicit.) > Google Analytics isn't spying on you when it tracks everything you do on 50% of the websites you visit. It's spying if you didn't consent to it. |
That logging is needed as soon as you want any of: non-trivial spam protection, connecting context to errors/exceptions, DoS mitigation, correlating issues across browsers, and a few other things.
For most of those you could theoretically hash the IP, because you're interested in matches, not actual values (although matching on the AS, or at least the /24, makes things easier). But until we migrate to IPv6, hashing doesn't make sense: the IPv4 space is small enough that a plain hash can be reversed by enumerating every address. (And once we move, keeping individual addresses doesn't make sense.)
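To make the "matches, not actual values" idea concrete, here's a rough sketch of what I mean. It truncates the address to its network first and then hashes that. Everything named here is my own assumption for illustration: `SECRET_KEY`, `network_prefix`, `pseudonymize`, and the /64 cutoff for IPv6. I'm also using a keyed hash (HMAC) rather than a bare hash, since a bare hash of a /24 would be trivially enumerable.

```python
import hashlib
import hmac
import ipaddress

# Hypothetical key for illustration; a real deployment would generate it
# randomly and rotate it on some schedule.
SECRET_KEY = b"rotate-me-regularly"


def network_prefix(ip: str, v4_bits: int = 24, v6_bits: int = 64) -> str:
    """Return the enclosing network of an address as a CIDR string."""
    addr = ipaddress.ip_address(ip)
    bits = v4_bits if addr.version == 4 else v6_bits
    # strict=False lets us pass a host address and get its network back.
    return str(ipaddress.ip_network(f"{addr}/{bits}", strict=False))


def pseudonymize(ip: str, key: bytes = SECRET_KEY) -> str:
    """Keyed hash of the network prefix: equal prefixes give equal tokens."""
    prefix = network_prefix(ip)
    return hmac.new(key, prefix.encode("ascii"), hashlib.sha256).hexdigest()


if __name__ == "__main__":
    a = pseudonymize("192.0.2.17")
    b = pseudonymize("192.0.2.200")
    c = pseudonymize("192.0.3.1")
    print(a == b)  # same /24, so the tokens match
    print(a == c)  # different /24, so they don't
```

You can still group requests for spam or DoS purposes by comparing tokens, but the log never holds the address itself. Whether that is good enough depends on how well the key is protected, so treat it as a sketch and not a recommendation.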
Basically the bigger the site, the more important that information is for operations.