by fchollet 4531 days ago
The author of the 2008 Bitcoin whitepaper was identified through textual analysis of his writing. JK Rowling was also identified as the author of a pseudonymously published novel using the same methods.

One important step towards real anonymity would be to completely anonymize your writing style. Make sure the distribution of stop words in your writing is absolutely banal. Make sure not to use your favorite expressions that can be found in your previous writing. Etc. Algorithmically measure your style before posting, and make sure it is non-identifiable.
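
A rough sketch of what "algorithmically measure your style" could look like, assuming the only feature is stop-word frequency. The stop-word list, the function names, and the use of cosine similarity are illustrative assumptions on my part, not what any real stylometry tool does; real tools use far more features.

```python
from collections import Counter
import math
import re

# Small illustrative stop-word list (an assumption; real feature sets are much larger).
STOP_WORDS = ("the", "of", "and", "to", "a", "in", "that", "is",
              "it", "for", "on", "with", "as", "but")


def stop_word_profile(text):
    """Relative frequency of each stop word among all words in the text."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    total = len(words) or 1  # avoid division by zero on empty text
    return [counts[w] / total for w in STOP_WORDS]


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def style_similarity(draft, known_writing):
    """Compare the stop-word profile of a draft against text you already published."""
    return cosine_similarity(stop_word_profile(draft),
                             stop_word_profile(known_writing))
```

You would run `style_similarity(draft, my_old_posts)` before posting. A score close to 1.0 only says the stop-word habits match; a low score on this one feature does not mean the text is non-identifiable.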

9 comments

The JK Rowling analysis was post facto. She wasn't outed by the analysis; the leak came from her lawyer's wife [1].

The analysis added weight to that revelation but it wasn't enough in itself to confirm it for sure.

1: http://www.bbc.co.uk/news/entertainment-arts-25575269

I'm guessing it's now her ex-lawyer's wife.
> The author of the 2008 Bitcoin whitepaper was identified through textual analysis of his writing.

Actually, that's incorrect. Nick Szabo is a candidate to be Satoshi Nakamoto, but the post claiming to use stylometry to out him is garbage. Gwern (who is no stranger to stylometry[1]) explains: http://www.reddit.com/r/Bitcoin/comments/1ruluz/satoshi_naka...

Stylometry was only used on JK Rowling after a tip-off from an anonymous source. Even then, it's not clear how useful it was in outing the author. Tools and algorithms are getting better, but even modern stylometric methods will give you false positives on a large corpus. People simply aren't that unique.

1. http://www.gwern.net/Death%20Note%20script#stylometrics
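
To put the false-positive point in numbers, here is a back-of-the-envelope sketch. It assumes exactly one true author in the candidate pool and a classifier with fixed error rates; the function names and any rates you plug in are hypothetical, not measurements of a real stylometric method.

```python
def expected_false_matches(candidates, false_positive_rate):
    """Expected number of wrong authors flagged when every candidate is tested."""
    return candidates * false_positive_rate


def chance_match_is_real(candidates, false_positive_rate, true_positive_rate):
    """Probability that a flagged candidate is the real author.

    Assumes exactly one real author among the candidates, so the other
    (candidates - 1) people can only produce false hits.
    """
    false_hits = (candidates - 1) * false_positive_rate
    true_hits = true_positive_rate
    return true_hits / (true_hits + false_hits)
```

With a pool of a million candidate writers, even a hypothetical false-positive rate of 0.1% flags about a thousand innocent people, so a single "match" on its own says very little.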

"The author of the 2008 Bitcoin whitepaper was identified through textual analysis of his writing."

wait, what?

> https://likeinamirror.wordpress.com/2013/12/01/satoshi-nakam...

Seems possible, although the textual analysis seems a bit weak. Combined with everything else, it's quite convincing.

No. The analysis is complete bullshit, although you'd never know because he still hasn't approved any of the many critical comments left on the post. For example, http://www.reddit.com/r/Bitcoin/comments/1ruluz/satoshi_naka...
I don't think that your suggestions are reasonable. The most memorable phrases we use are also linked to our understanding of certain specific concepts, on a quite personal level. Essentially, you would be forced to generalize everything, and could be left writing only banal YouTube-style comments rather than anything reflecting your best attempt at getting your thoughts down. At that rate, why bother writing?

I think better protection is simply not to publish much under any alias. If there isn't a large body of text, an alias writing a few thoughts on one or two issues can't really be mined.

He says he uses Google Translate to cycle through two languages and then spellchecks the result in order to avoid that.
That means Google Translate has a copy of his original and modified texts. Long shot, but still a liability.
Nick Szabo is probably also a pseudonym. There is no reference to him on the net made by a reliable, known person. I couldn't find anything about Nick Szabo apart from what he posted himself. Also, no pictures, and nothing connecting him to George Washington University.
The author briefly touches upon this under the section 'Word and character frequency analysis', but I'm not sure this would really help with writing style?
I had a thought in case I ever wanted to write something completely anonymously: run the text through Google Translate and back. That should hopefully butcher all identifying features of the text.

edit: oops, should have read the whole post.

That's exactly what the author did here...
Use JStylo/Anonymouth instead. They know all about the translate trick, plus Google would have the original.
Oh, do we know who Satoshi is? Because I missed that.