Hacker News
by jborichevskiy 2129 days ago
A hunch: I think anonymous writing today (by those who also have a body of non-anonymous writing) will be reliably de-anonymized relatively soon through some sort of comparative analysis by a motivated enough party.

It feels like writing needs an additional layer of obfuscation. For example, one way to give feedback in a small circle of people might be to have a neutral, anonymous third-party rephrase the idea in their own words -- a bit like running it through Google Translate but less deterministic.

This sort of feature could potentially be baked into a service, as a checkbox. Perhaps in order to post you have to "anonymize" n number of other users' posts first.

Some further reading:

https://33bits.wordpress.com/2012/02/20/is-writing-style-suf...

https://en.wikipedia.org/wiki/Stylometry

4 comments

Thank you for your suggestions and suggested reads. Though I think the incentive to de-anonymise or get into stylometry is there only if the post has a larger impact on an organisation or individual. Most of the posts on Vigyaa.io are personal in nature. I don't see why anyone would make the effort to figure out who has written them. The whole obfuscation is necessary if one is blowing the whistle on someone or is posting something offensive. In either case the author will make their own effort to not let their writing style reveal them.

Would love to hear your perspective further.

> The whole obfuscation is necessary if one is blowing the whistle on someone or is posting something offensive.

Agreed -- I don't think personal posts need the anonymization as much as perhaps political discussions in oppressive regimes (at which point getting the technical security stack right seems higher ROI than worrying about de-anonymization).

I'm, like, not familiar at all with stylometry on a working basis, so this is absolutely just, like, my opinion, man. But the few examples I've seen of it were far more likely to cluster texts by subject than by style, e.g. all ethics in one heap, all metaphysics in another, regardless of author. And if you control for something like that (pet subject?) I don't know how much information would remain in the purely structural features of the text, i.e. things like comma placement, but I'd guess it's fairly low. There are only so many ways to say things, and even then some people seem to go straight to the point.

You'd only make yourself identifiable by railing about the same things over and over again, and I think this is something you don't even need a machine to pick up.
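
For anyone curious what the "purely structural" signal could look like, here is a minimal sketch, not a real stylometry tool. It assumes a small hand-picked list of topic-neutral function words and punctuation marks (my choice, purely for illustration) and compares two texts by the rates at which they use them. The names `FUNCTION_WORDS`, `PUNCTUATION`, `style_vector` and `cosine_similarity` are hypothetical.

```python
import math
import re
from collections import Counter

# Assumption: a tiny, hand-picked feature set of topic-neutral words and marks.
# Serious stylometry work uses much larger feature sets and far more text.
FUNCTION_WORDS = ["the", "of", "and", "to", "a", "in", "that", "is", "it", "but", "i", "not"]
PUNCTUATION = [",", ";", ":", "-"]


def style_vector(text):
    """Return the per-token rate of each function word and punctuation mark."""
    # Tokens are lowercase words (apostrophes kept) or one of the tracked marks.
    tokens = re.findall(r"[a-z']+|[,;:\-]", text.lower())
    counts = Counter(tokens)
    total = max(len(tokens), 1)  # avoid dividing by zero on empty input
    return [counts[feature] / total for feature in FUNCTION_WORDS + PUNCTUATION]


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


# Usage: compare an anonymous text against a known writing sample.
known = style_vector("I think that, in the end, it is not the style but the subject.")
anonymous = style_vector("It is the subject, not the style, that gives a writer away.")
score = cosine_similarity(known, anonymous)
```

Because the features ignore content words, subject matter mostly drops out, which is the "control for pet subject" idea above. Whether enough signal remains in short texts is exactly the open question.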

That seems to be more about intention, and I'll agree that even style can be intentionally selected, my dude. Perhaps there are larger leftover pieces, gaps, or other footholds to consider here. I'll agree you might not even need a machine, but I am concerned about unjustifiable ML-empowered humans sifting even more effectively for justified dissidence through style, sentiment, and semantic analysis. Do you think this is an arms race that is won by hand?

I'm not saying I disagree with you. I want to know more about what you think.

Wouldn’t it be possible to have a conversation with GPT-3 and have it generate the writing on some topic for you? Yes, the time it would take to write would no doubt become much longer, but then it would be much more difficult to find out who wrote what beyond getting access to GPT-3 and seeing queries.

Great point, I agree GPT-3 & related models will open up interesting avenues for this process.

I have not played around with it, but I can imagine querying it through a structure like LearnFromAnyone [0] where one would ask for a sentence (paragraph? document?) re-phrased in the style of a chosen person or group of people.

0 - https://learnfromanyone.com/

Look up Anonymouth on GitHub. It's abandonware now, but there might be an updated fork.

That's neat, thank you!