| HN Mirror



	by arif_sohaib 2907 days ago
	If the reassociation returns multiple accounts, then it should get both the main and the throwaway, unless it is somehow overfitting to the main.

1 comments

Bartweiss 2907 days ago

The basic result is horribly overfitted, because "account creation minute" and "account creation hour" are two of the parameters. (They split those up because "account creation time" was effectively a unique feature all on its own, and they wanted a 'harder' problem.)

This is basically an exercise in overfitting - they learned to recognize account data using that exact same data, and called it deanonymization. The data fuzzing was a little interesting, but it's still overfitting from top to bottom.
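To make the overfitting point concrete, here is a toy sketch. It is not the paper's model or data: the accounts are synthetic, and `nearest_owner`, `accuracy` and `run_demo` are made-up names. A nearest-neighbour lookup on (creation hour, creation minute) gets every account right when scored on the data it memorized, and does roughly as well as guessing on throwaways created at unrelated times.

```python
import random

MINUTES_PER_DAY = 24 * 60


def nearest_owner(train, hour, minute):
    """1-nearest-neighbour on (hour, minute): return the owner whose main
    account was created closest in time of day."""
    t = hour * 60 + minute

    def dist(item):
        (h, m), _owner = item
        d = abs(h * 60 + m - t)
        return min(d, MINUTES_PER_DAY - d)  # wrap around midnight

    return min(train.items(), key=dist)[1]


def accuracy(train, records):
    """Fraction of ((hour, minute), owner) records attributed correctly."""
    hits = sum(nearest_owner(train, h, m) == owner for (h, m), owner in records)
    return hits / len(records)


def run_demo(n_owners=200, seed=0):
    rng = random.Random(seed)
    # Synthetic assumption: every main account has a distinct creation minute,
    # so (hour, minute) is effectively a unique ID.
    main_times = rng.sample(range(MINUTES_PER_DAY), n_owners)
    train = {divmod(t, 60): owner for owner, t in enumerate(main_times)}
    # Each owner's throwaway is created at an independent random time.
    throwaways = [
        (divmod(rng.randrange(MINUTES_PER_DAY), 60), owner)
        for owner in range(n_owners)
    ]
    same_data = accuracy(train, list(train.items()))  # scored on the training data
    held_out = accuracy(train, throwaways)            # scored on unseen accounts
    return same_data, held_out


if __name__ == "__main__":
    same_data, held_out = run_demo()
    print(f"accuracy on the data it was trained on: {same_data:.2f}")
    print(f"accuracy on held-out throwaways:        {held_out:.2f}")
```

The first number is perfect by construction, which is exactly why it says nothing about deanonymizing an account the model has never seen.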

arif_sohaib 2907 days ago

I only skimmed it on my way to work, so I did not get to that part. I was thinking more along the lines of feeding tweet content, language usage (words used, sentence length, etc.) and IP data into a generative kind of network, similar to what is used to fill in images with missing content. The missing content would then be the username, and the network might at least be able to find very similar users.
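A rough sketch of the "find very similar users" part, with a big caveat: this is not the generative network described above, just a much simpler stand-in that compares writing-style profiles by cosine similarity. The function-word list, the scaling of sentence length, and the names `style_vector`, `cosine` and `most_similar` are all assumptions made for illustration, and IP data is left out.

```python
import math
import re
from collections import Counter

# Hypothetical feature choice: a handful of common function words.
FUNCTION_WORDS = ("the", "and", "of", "to", "i", "a", "that", "is")


def style_vector(text):
    """Relative frequency of each function word, plus scaled mean sentence length."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[a-z']+", text.lower())
    if not words or not sentences:
        return [0.0] * (len(FUNCTION_WORDS) + 1)
    counts = Counter(words)
    vec = [counts[w] / len(words) for w in FUNCTION_WORDS]
    # Divide by 50 (arbitrary) so sentence length does not swamp the frequencies.
    vec.append(len(words) / len(sentences) / 50.0)
    return vec


def cosine(u, v):
    """Cosine similarity, defined as 0.0 when either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def most_similar(unknown_text, known_texts):
    """Return the known username whose writing style is closest to unknown_text."""
    target = style_vector(unknown_text)
    return max(
        known_texts,
        key=lambda name: cosine(target, style_vector(known_texts[name])),
    )


if __name__ == "__main__":
    known = {
        "alice": "the cat and the dog. the bird and the fish.",
        "bob": "i think that is it. i know that is so.",
    }
    print(most_similar("the fox and the hen. the cow and the pig.", known))
```

To avoid the overfitting problem from upthread, something like this would have to be scored on text the profiles were not built from.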