by IanChiles 4693 days ago
It'd be awesome to see a dump of the data so that we can examine it on our own - many people may think of ways to use it that you haven't, or draw different conclusions from it. :D

Yeah! This is something that I'm interested in, but there are a bunch of privacy issues with a raw data dump (references to specific files/URLs and possibly passwords), and I haven't gotten around to making sure it's safe yet.

(note: I haven't actually seen any passwords, but that doesn't mean there aren't any)

Oh, that totally slipped my mind. Maybe run something to recognize URLs (replace them with $URL, or some other placeholder), and then obfuscate filenames and the like in a similar way? Normalizing the data like this could also give you much better results for command-line switches and the like, since identical invocations with different arguments would collapse together.
Thanks for caring about the privacy issues.
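
To make the normalization idea concrete, here is a rough sketch in Python. Everything in it is an assumption for illustration: the `normalize` function, the `$FILE` placeholder, and the crude "contains a slash or a dot" test for filenames are not from the original data or tooling. It does nothing about passwords, and it leaves the first word untouched, so it is a starting point rather than something safe to publish from.

```python
import re

# Assumed pattern: anything of the form scheme://... up to whitespace or a quote.
URL_RE = re.compile(r"[A-Za-z][A-Za-z0-9+.-]*://[^\s'\"]+")


def looks_like_path(token):
    """Crude heuristic (an assumption): slashes, dots, or a leading ~ suggest a file."""
    return "/" in token or "." in token or token.startswith("~")


def normalize(line):
    """Replace URLs with $URL and path-like arguments with $FILE.

    The first word (the command) and plain switches are kept as-is.
    """
    line = URL_RE.sub("$URL", line)
    out = []
    for i, tok in enumerate(line.split()):
        if i == 0:
            out.append(tok)  # keep the command name itself
            continue
        flag, sep, value = tok.partition("=")
        if tok.startswith("-") and sep:
            # Switch with an attached value, e.g. --file=/tmp/x
            if looks_like_path(value):
                tok = flag + "=$FILE"
        elif not tok.startswith("-") and looks_like_path(tok):
            tok = "$FILE"
        out.append(tok)
    return " ".join(out)
```

With this, `curl -o out.tar.gz https://example.com/a?b=1` becomes `curl -o $FILE $URL`, so every download looks the same and the switches are easy to count.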

I would guess that the first "word" of each line (just the command name) is clean? Have you seen any evidence to the contrary? That data set alone would tell an interesting story.
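
A first-word-only data set could be produced with something like the sketch below. The `command_counts` name and the `$OTHER` bucket are made up for illustration. The bucket is there because the first word is not guaranteed to be a bare command name: a shell line can start with a variable assignment (`FOO=bar make`) or a path to a script, so anything containing `=` or `/` is lumped together rather than published. Whether that is enough to call the result clean is still an open question.

```python
from collections import Counter


def command_counts(lines):
    """Count the first word of each history line.

    First words containing '=' or '/' are bucketed as $OTHER, on the
    assumption that they may be assignments or private paths.
    """
    counts = Counter()
    for line in lines:
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        first = parts[0]
        if "=" in first or "/" in first:
            first = "$OTHER"
        counts[first] += 1
    return counts
```

Calling `command_counts(history).most_common(20)` on a list of history lines would then give a top-20 table of commands without exposing any arguments.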