Hacker News
by olliej 1064 days ago
Because people keep acting like these researchers have retroactively removed the anonymity of this forum, or as if everything was anonymous before this was published, let's go over the facts:

1. ejmr made a system that includes hashes that could be trivially linked to ip addresses

2. ejmr claimed posts were anonymous

3. this researcher realized that the hashes could be trivially linked to ip addresses

4. the researcher presumably informed ejmr (as ejmr changed their scheme prior to publication)

5. the researcher published the findings

The posts made on the forum could be linked to ip addresses from step 1 onward. If this series of events had stopped at step 2 or 3, the posts would still not be anonymous, and forum users would still believe that they were.

We know that at step 3 this researcher realized that the forum posts were not anonymous; we have no way of knowing how many other people may have also discovered this.

At step 4, we know ejmr changed their hashing scheme to actually make it [maybe] anonymous, and despite now knowing their existing scheme was not anonymous they did not inform any existing users that their posts were not anonymous.

At step 5 the people using these forums finally discovered that their posts were not actually anonymous, because they were never anonymous. People on that forum, and commenters on HN, act like the researcher was responsible for the technical failure of ejmr, and somehow the act of telling people that their posts were not anonymous is what actually removed anonymity.

Because people continue to struggle with this, let's imagine I made a forum where every post had an id that was computed as the first 10 characters of base64(rot13(ip || iso date)). A decade later someone goes "hang on, this looks like base 64", and then publishes their findings: you can get a post's IP address by decoding the truncated base 64 and reversing rot13.

Is that person responsible for de-anonymizing the users of my forum, or is it my fault for misrepresenting the anonymity of my forum?
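To make the hypothetical concrete, here is a minimal Python sketch of that made-up id scheme. The function names and the sample address (taken from a documentation range) are illustrative assumptions, not anything from a real forum. One caveat the sketch makes visible: base64 decodes in 4-character groups, so the first 8 characters of a 10-character id decode cleanly to only the first 6 bytes of the input, and the rest of the address would still need a small guess-and-check. The point stands that nothing here is secret; it is only an encoding.

```python
import base64
import codecs


def post_id(ip: str, iso_date: str) -> str:
    """Hypothetical id: first 10 chars of base64(rot13(ip || iso date))."""
    scrambled = codecs.encode(ip + iso_date, "rot13")
    encoded = base64.b64encode(scrambled.encode("ascii")).decode("ascii")
    return encoded[:10]


def decode_prefix(pid: str) -> str:
    """Reverse the encoding for the part of the id that decodes cleanly."""
    usable = pid[: len(pid) // 4 * 4]  # keep whole 4-character base64 groups
    raw = base64.b64decode(usable).decode("ascii")
    return codecs.decode(raw, "rot13")  # rot13 is its own inverse


pid = post_id("203.0.113.7", "2023-07-20")
print(decode_prefix(pid))  # 203.0.
```

No key, no secret, and no search was needed to get that far; anyone who recognizes the encoding can do the same.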

2 comments

> ejmr made a system that includes hashes that could trivially be linked to ip addresses

"trivially be linked" = searching 3 quadrillion possibilities?

Suppose that in the near future a quantum computer enables the "trivial" piercing of current anonymity assumptions: should those individuals also be fair game for doxxing, since "they were never anonymous"?

Your casual appropriation of "triviality" to dismiss moral concerns over this paper and the authors' possible motives rings hollow to me.

> "trivially be linked" = searching 3 quadrillion possibilities?

Which is trivial. Doing the same thing many times is literally what computers were invented for. Whether it's 3 times or 3 quadrillion times, it does not matter.
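A rough sketch of the arithmetic behind that, in Python. The hash rate of 10 billion hashes per second is an illustrative assumption, not a measurement from the paper or from any specific hardware; plug in your own number.

```python
def search_time_days(candidates: float, hashes_per_second: float) -> float:
    """Days needed to try every candidate once at a fixed hash rate."""
    seconds = candidates / hashes_per_second
    return seconds / 86_400  # seconds per day


# 3 quadrillion candidates at an ASSUMED 1e10 hashes/second.
days = search_time_days(3e15, 1e10)
print(f"{days:.1f} days")  # 3.5 days
```

Under that assumption the whole space falls in days on one machine, and the work splits perfectly across as many machines as you care to rent.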

> Suppose that in the near future a quantum computer enables the "trivial" piercing of current anonymity assumptions: should those individuals also be fair game for doxxing, since "they were never anonymous"?

There are myriad ways to have provable anonymity, and quantum computers are not magic. Moreover, the best known quantum algorithm for this kind of deanonymization is still Grover's search, which is a square-root improvement rather than anything catastrophic like Shor's. But that's also irrelevant.
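For a sense of scale on the square-root point: Grover's search over N candidates needs on the order of (π/4)·√N oracle queries. A small Python sketch, with a function name of my own choosing, comparing the space mentioned upthread with a properly sized one:

```python
import math


def grover_queries(candidates: float) -> int:
    """Approximate oracle queries for Grover's search: (pi / 4) * sqrt(N)."""
    return math.ceil(math.pi / 4 * math.sqrt(candidates))


small = grover_queries(3e15)     # the search space mentioned upthread
large = grover_queries(2**256)   # a 256-bit secret, for comparison
print(f"{small:.2e}")            # roughly 4.3e+07 queries
print(math.log2(large))          # still close to 128 bits of work
```

A space that is already searchable classically gets easier, while a properly sized one stays far out of reach.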

ejmr's "anonymization" was not anonymous under the standard cryptographic assumptions of 20 years ago, let alone 12 years ago when the software originated.

To be clear, when ejmr was first started:

* SHA-1 was mostly cryptographically broken (that is, it was considered that a sufficiently determined adversary with unlimited money could break it), hence any new use of SHA-1 was definitionally wrong.

* SHA is the wrong family anyway: SHA hashes are built for integrity and authentication and are therefore intentionally extremely fast to compute. It was well established in the _90s_ that such fast hashes are not appropriate for anything other than authentication, alongside numerous demonstrations of breaking password hashes, which is essentially what ejmr was computing.

* ejmr was not salting anything, and literally anyone with actual experience in any actual field using hashes knows that salting hashes is mandatory.
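To illustrate why a fast, unsalted hash over a small input space hides nothing, here is a toy Python sketch. The scheme shown (a bare SHA-1 of the IP string) is a hypothetical stand-in, not ejmr's actual construction, and the address range is a documentation range chosen for the demo.

```python
import hashlib
import ipaddress
from typing import Optional


def toy_id(ip: str) -> str:
    """Hypothetical 'anonymous' id: unsalted SHA-1 of the IP string."""
    return hashlib.sha1(ip.encode("ascii")).hexdigest()


def brute_force(target: str, network: str) -> Optional[str]:
    """Try every address in the network until one hashes to the target."""
    for addr in ipaddress.ip_network(network):
        if toy_id(str(addr)) == target:
            return str(addr)
    return None


leaked = toy_id("198.51.100.23")
print(brute_force(leaked, "198.51.100.0/24"))  # 198.51.100.23
```

There is nothing clever in `brute_force`: it is the "try every option" loop, and widening the network only changes how long the loop runs.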

This isn't "this was anonymous until computers got faster"; this was not anonymous at the time it was first written, under standard cryptographic assumptions. Let's say it cost $10k for this PI to compute those hashes. Then 12 years ago, assuming Moore's law, it would have cost about $5 million to break (under simple assumptions, and I doubled the result to be conservative).
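The back-of-envelope estimate can be reproduced as follows. The 18-month doubling period is an assumption on my part; it is the reading of Moore's law that lands on the figure quoted above, and a 24-month period would give a much smaller number.

```python
def historical_cost(cost_now: float, years_ago: float,
                    doubling_months: float = 18.0) -> float:
    """Scale today's compute cost back in time, doubling once per period."""
    doublings = years_ago * 12 / doubling_months
    return cost_now * 2 ** doublings


base = historical_cost(10_000, 12)  # 8 doublings: $2,560,000
conservative = 2 * base             # doubled for safety: $5,120,000
print(f"${conservative:,.0f}")      # $5,120,000
```

Either way the result is a sum well within reach of a funded adversary at the time.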

That. is. broken.

> Your casual appropriation of "triviality" to dismiss moral concerns over this paper and the authors' possible motives rings hollow to me.

No. My claims are purely related to the claims that the authors of this paper are responsible for deanonymizing people on ejmr, when ejmr catastrophically failed and misled its users.

Your immediate response to my statement about triviality was to repeat "it's a big number", which betrays a gross misunderstanding of the field. Anything involving hashing or cryptography is filled with giant numbers. A non-trivial attack is one that involves doing something clever to reduce the search space to make the attack possible. This attack was _literally_ "we just tried every option as fast as possible". That attack on misuse of hashing operations was identified in the 90s when people demonstrated breaking of password hashes.

This attack is not clever. It does not - afaict - do anything that in any way reduces the complexity from "try every option"; it is a dumb solution to the incompetent "anonymization" performed by ejmr. That "try every option" was an option speaks to how poor the ejmr code was, and how trivial this was.

As for the "morality" of the paper: there are endless "studies" of forum culture and demographics that haven't caused problems.

The only problem I see is that ejmr is refusing to acknowledge that they rolled their own crypto, and predictably got it wrong. That and people like you who seem to believe this mediocre research paper is somehow responsible.

I think the mainstream take is that black or white hat hinges on responsible disclosure?

If that happened, the forum has completely mishandled this and the blame is squarely on them. If it didn't then I guess it's an open question.

No. Black vs white hat is "did you break this and then use it to <do something illegal>?"

The responsible vs. irresponsible disclosure question is "do you tell the responsible party ahead of time and give them time to repair it". From articles it certainly appears that ejmr learned how broken their code was prior to this paper being published.

But responsible vs irresponsible disclosure is not a question of "should this be disclosed at all?", a question the security community as a whole seems to have answered with "yes".

The problem is that ejmr was not anonymous, and if you publish something that is not anonymous, it is forever not anonymous.

The only option would be to not disclose that there was any problem and to not notify people that their posts were not anonymous, and this paper (the actual "research" about where posters lived/worked?) could also not be published. Any acknowledgement or indication that you could get from id to ip in any forum would cause people to go "huh, how did they do that?" and Streisand-effect your way to everyone knowing.

This is of course assuming that no one else interested in commenter identities has ever looked at ejmr either, because these researchers did not do anything clever to break the scheme.

> Black vs white hat is "did you break this and then use it to <do something illegal>?"

That is a very narrow interpretation of "black hat". I think the mainstream take is that black hat includes many legal but ethically dubious actions. Maybe you would call it "grey hat", I don't know. But publishing a vulnerability without responsible disclosure can be considered unethical.

> But responsible vs irresponsible disclosure is not a question of "should this be disclosed at all?", a question the security community as a whole seems to have answered with "yes".

Yes, I don't know if you misread but by 'responsible disclosure' I meant 'tell ejmr about this before publishing'.

> The only option

No. If they were informed about this issue, after changing the schema EJMR could take down all preexisting posts made with the old schema and request public archives to remove them (and reindex new ones). It's not foolproof because many posts may happen to be archived independently but it would be something. And of course notify users.

> But publishing a vulnerability without responsible disclosure can be considered unethical.

Yes, there is debate on that, and there are arguments on either side. But given ejmr went 12 years without changing their "anonymization" scheme, and then changed it a short time prior to an article being published that demonstrated the scheme was broken, I think it's reasonable to presume ejmr was notified prior to publication, and had time to correct the flaw, which is the canonical example of responsible disclosure.

That ejmr did not tell its users is an example of the behavior that the anti-responsible disclosure folk point to. Organizations that say "you should tell us about vulnerabilities in our products, but you cannot tell our users, and neither will we" are a large part of the reason some people oppose responsible disclosure.

> No. If they were informed about this issue, after changing the schema EJMR could take down all preexisting posts made with the old schema and request public archives to remove them (and reindex new ones).

There are multiple existing libraries online to support scraping ejmr specifically, as well as who knows how many archives and search engines we don't know about.

Every person who posted needs to be made aware that their posts could be tracked at least to the IP (though at any institution you're behind a NAT so generally IP != person, and the idea of ISPs having per hour IP<->user logs from a decade ago seems suspect).

Also we know that ejmr found out about the gaping hole somehow - we don't know exactly, we just know they addressed the incompetence, though I assume they're still doing it wrong - and they didn't even pull and re-index their own archive let alone ask anyone else to do so.

> It's not foolproof because many posts may happen to be archived independently but it would be something.

Either you're anonymous or you're not, so you can't just say "we doubt there are any other archives so you're safe". The user IPs are not secret, as they were never secret.

We also have no way to know if anyone else had already done this, and we likely never will.

> And of course notify users.

Which they also did not do.

> Either you're anonymous or you're not

False on many levels, and a dangerous belief.

You are never ever fully anonymous writing online. It all depends on how difficult it is and how determined the threat.

Everything you post can be correlated with your other writing and online activity (even if you fake the style), ISPs can be subpoenaed, and Tor nodes can be compromised. But most people are sorta anonymous because no one bothers to go through the trouble.