by arif_sohaib 2907 days ago
If the reassociation returns multiple accounts, then it should get both the main and the throwaway, unless it is somehow overfitting to the main.

The basic result is horribly overfitted, because "account creation minute" and "account creation hour" are two of the parameters. (They split those up because "account creation time" was effectively a unique feature all on its own, and they wanted a 'harder' problem.)

This is basically an exercise in overfitting - they learned to recognize account data using that exact same data, and called it deanonymization. The data fuzzing was a little interesting, but it's still overfitting from top to bottom.
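To make the creation-time point concrete, here is a rough sketch on made-up data (the 200 accounts, the uniform draw, and the `unique_fraction` helper are all my assumptions, not anything from the paper). It only counts how many accounts are already singled out by the hour and minute together, even though neither field is the full timestamp.

```python
import random
from collections import Counter

def unique_fraction(keys):
    """Fraction of records whose key appears exactly once in the list."""
    counts = Counter(keys)
    return sum(1 for k in keys if counts[k] == 1) / len(keys)

# Hypothetical accounts: (creation hour, creation minute), drawn uniformly.
rng = random.Random(0)
accounts = [(rng.randrange(24), rng.randrange(60)) for _ in range(200)]

hour_only = unique_fraction([hour for hour, _ in accounts])
hour_and_minute = unique_fraction(accounts)

print(f"unique by hour alone:    {hour_only:.2f}")
print(f"unique by hour + minute: {hour_and_minute:.2f}")
```

On a sample this small most accounts should be unique on the pair alone, so a model that sees both fields can largely memorize accounts rather than learn anything about behavior. With far more accounts the pair would collide more often, but it would still narrow the candidates sharply once combined with other features.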

I only skimmed it on my way to work, so I did not get to that part. I was thinking more of feeding tweet content, language usage (words used, sentence length, etc.) and IP data into a generative kind of network, similar to those used to fill in missing content in images. The missing content would then be the username, and it might at least be able to find very similar users.